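School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
* Email: wangzhengjie@bit.edu.cn
Received: 2023-06-13
Published online: 2024-02-06
Published in print: 2024-01-30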
Shengzhe YUE, Zhengjie WANG. A SLAM in Dynamic Environment Based on Instance Segmentation and Optical Flow[J]. Acta Armamentarii, 2024, 45(1): 156-165. DOI: 10.12382/bgxb.2023.0568.
Traditional semantic simultaneous localization and mapping (SLAM) algorithms remove too many feature points in dynamic environments, which lowers localization accuracy. To address this problem, a visual semantic SLAM algorithm based on instance segmentation and optical flow is proposed. The algorithm uses a Mask R-CNN network to segment potentially dynamic objects in the image at the instance level, and it identifies and removes dynamic objects in the optical flow thread. The remaining static optical flow points and static feature points are then jointly optimized to estimate the pose, so that semantic information and optical flow information are fully fused. The proposed method is validated on public datasets and in an unmanned ground platform experiment. The experimental results show that, on the TUM dataset, the mean localization error of the proposed method is on average 75% lower than that of ORB-SLAM2 and 8.5% lower than that of Dyna-SLAM.
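The abstract does not spell out how the optical flow thread decides which segmented instances are actually moving, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each tracked point already carries an optical flow residual in pixels (the flow left over after compensating for camera motion) and an instance label from the segmentation network, with `0` meaning background. The function name `static_point_mask` and the parameters `thresh_px` and `min_ratio` are hypothetical. An instance is rejected only when most of its points move, so points on a stationary instance of a potentially dynamic class are kept.

```python
import numpy as np


def static_point_mask(residuals, instance_ids, thresh_px=1.0, min_ratio=0.5):
    """Return (keep, dynamic_instances) for a set of tracked points.

    residuals    : per-point flow residual in pixels (assumed precomputed)
    instance_ids : per-point instance label, 0 = background (assumed convention)
    thresh_px    : hypothetical residual threshold for calling a point "moving"
    min_ratio    : hypothetical fraction of moving points that makes an
                   instance dynamic
    """
    residuals = np.asarray(residuals, dtype=float)
    instance_ids = np.asarray(instance_ids, dtype=int)
    keep = np.ones(residuals.shape[0], dtype=bool)
    dynamic_instances = []

    for inst in np.unique(instance_ids):
        if inst == 0:
            continue  # background points are always kept in this sketch
        in_inst = instance_ids == inst
        # Fraction of this instance's points whose residual exceeds the threshold.
        moving_ratio = np.mean(residuals[in_inst] > thresh_px)
        if moving_ratio > min_ratio:
            keep[in_inst] = False  # drop every point on a moving instance
            dynamic_instances.append(int(inst))

    return keep, dynamic_instances


# Toy data: instance 1 is moving, instance 2 is a stationary object.
residuals = np.array([0.2, 0.3, 0.1, 4.0, 5.5, 3.8, 0.4, 0.2, 0.3])
instance_ids = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
keep, dynamic = static_point_mask(residuals, instance_ids)
print(dynamic)      # instances judged dynamic
print(keep.sum())   # number of points retained for pose optimization
```

In this toy case only the points on the moving instance are discarded, while a class-level mask would also have discarded the three points on the stationary instance. That retained set is what a joint optimization over static flow points and static feature points would consume.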