May 2022, 18(3): 2109-2128. doi: 10.3934/jimo.2021059

## Stereo visual odometry based on dynamic and static features division

 College of Missile Engineering, Rocket Force University of Engineering, Xi'an, Shaanxi 710025, China

* Corresponding author: Guangbin Cai

Received June 2020; Revised December 2020; Early access March 2021; Published May 2022

Fund Project: The first author is mainly supported by NSSF of China under Grant (No. 61773387)

Accurate camera pose estimation in dynamic scenes is an important challenge for visual simultaneous localization and mapping (SLAM), and reducing the effect of moving objects on pose estimation is critical. To tackle this problem, a robust visual odometry approach for dynamic scenes is proposed that can precisely distinguish between dynamic and static features. The key to the proposed method is combining the scene flow with the principle that the relative spatial distance between static features is invariant. Moreover, a new threshold is proposed to distinguish dynamic features. The dynamic features are then eliminated after matching with the virtual map points. In addition, a new similarity calculation function is proposed to improve the performance of loop-closure detection. Finally, the camera pose is optimized after a closed loop is obtained. Experiments conducted on the TUM datasets and in actual scenes show that the proposed method reduces tracking errors significantly and estimates the camera pose precisely in dynamic scenes.
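The distance-invariance idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the voting rule, the tolerance `tol`, the support ratio `min_support`, and the function names are all assumptions made for illustration, and the scene-flow step and the paper's own threshold are not modelled. The sketch only shows that points on the rigid background keep their mutual 3D distances between two frames, while a point on a moving object does not.

```python
import math

def distance_change(prev_pts, curr_pts, i, j):
    """Absolute change of the 3D distance between points i and j across two frames."""
    d_prev = math.dist(prev_pts[i], prev_pts[j])
    d_curr = math.dist(curr_pts[i], curr_pts[j])
    return abs(d_curr - d_prev)

def split_static_dynamic(prev_pts, curr_pts, tol=0.05, min_support=0.5):
    """Split point indices into (static, dynamic) by a simple vote.

    Assumption (hypothetical rule, not from the paper): a point is static if
    its distance to at least `min_support` of the other points changes by no
    more than `tol` (same unit as the coordinates, e.g. metres).
    """
    n = len(prev_pts)
    static, dynamic = [], []
    for k in range(n):
        consistent = sum(
            1
            for m in range(n)
            if m != k and distance_change(prev_pts, curr_pts, k, m) <= tol
        )
        if consistent >= min_support * (n - 1):
            static.append(k)
        else:
            dynamic.append(k)
    return static, dynamic

# Toy data: points 0-3 share one rigid motion (camera ego-motion only),
# point 4 additionally moves 0.5 along x on its own.
prev_pts = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0), (1.0, 1.0, 4.0), (0.5, 0.5, 2.5)]
curr_pts = [(0.1, 0.0, 1.8), (1.1, 0.0, 1.8), (0.1, 1.0, 2.8), (1.1, 1.0, 3.8), (1.1, 0.5, 2.3)]
static_idx, dynamic_idx = split_static_dynamic(prev_pts, curr_pts)
```

In this toy example the first four indices end up in `static_idx` and index 4 in `dynamic_idx`. A vote of this kind only works while static points form the majority, which is one reason a real system would combine it with further cues such as the scene flow.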

Citation: Hui Xu, Guangbin Cai, Xiaogang Yang, Erliang Yao, Xiaofeng Li. Stereo visual odometry based on dynamic and static features division. Journal of Industrial and Management Optimization, 2022, 18 (3) : 2109-2128. doi: 10.3934/jimo.2021059
##### References:
[1] P. F. Alcantarilla, J. J. Yebes, J. Almazán et al., On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments, 2012 IEEE International Conference on Robotics and Automation, Saint Paul, Minnesota, USA, IEEE, 2012.
[2] Y. An, B. Li and L. Wang, Calibration of a 3D laser rangefinder and a camera based on optimization solution, J. Ind. Manag. Optim., 17 (2021), 427-445. doi: 10.3934/jimo.2019119.
[3] A. Angeli, D. Filliat and S. Doncieux, Fast and incremental method for loop-closure detection using bags of visual words, IEEE Transactions on Robotics, 24 (2008), 1027-1037.
[4] C. Bibby and I. Reid, Simultaneous localisation and mapping in dynamic environments (SLAMIDE) with reversible data association, Robotics: Science and Systems, Atlanta, Georgia, USA, 2007.
[5] L. Bose and A. Richards, Fast depth edge detection and edge based RGB-D SLAM, IEEE International Conference on Robotics and Automation, Stockholm, Sweden, IEEE, 2016.
[6] C. Cadena, L. Carlone and H. Carrillo, Simultaneous localization and mapping: Present, future, and the robust-perception age, IEEE Transactions on Robotics, 32 (2016), 1309-1332.
[7] C. Choi, A. J. Trevor and H. I. Christensen, RGB-D edge detection and edge-based registration, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, IEEE, 2013.
[8] A. J. Davison, I. D. Reid and N. D. Molton, MonoSLAM: Real-time single camera SLAM, IEEE Transactions on Pattern Analysis & Machine Intelligence, 29 (2007), 1052-1067.
[9] J. Engel, V. Koltun and D. Cremers, Direct sparse odometry, IEEE Transactions on Pattern Analysis & Machine Intelligence, 40 (2018), 611-625.
[10] J. Engel, T. Schöps and D. Cremers, LSD-SLAM: Large-scale direct monocular SLAM, European Conference on Computer Vision, Springer, Zürich, Switzerland, 2014.
[11] J. Fan, On the Levenberg-Marquardt methods for convex constrained nonlinear equations, J. Ind. Manag. Optim., 9 (2013), 227-241. doi: 10.3934/jimo.2013.9.227.
[12] C. Forster, M. Pizzoli and D. Scaramuzza, SVO: Fast semi-direct monocular visual odometry, IEEE International Conference on Robotics and Automation, Hong Kong, China, IEEE, 2014.
[13] C. Forster, Z. Zhang and M. Gassner, SVO: Semi-direct visual odometry for monocular and multicamera systems, IEEE Transactions on Robotics, 33 (2017), 249-265.
[14] J. Fuentes-Pacheco, J. Ruiz-Ascencio and J. M. Rendón-Mancha, Visual simultaneous localization and mapping: A survey, Artificial Intelligence Review, 43 (2015), 55-81.
[15] D.-K. Gu, G.-P. Liu and G.-R. Duan, Robust stability of uncertain second-order linear time-varying systems, J. Franklin Inst., 356 (2019), 9881-9906. doi: 10.1016/j.jfranklin.2019.09.014.
[16] D.-K. Gu and D.-W. Zhang, Parametric control to second-order linear time-varying systems based on dynamic compensator and multi-objective optimization, Appl. Math. Comput., 365 (2020), 124681, 25 pp. doi: 10.1016/j.amc.2019.124681.
[17] D.-K. Gu and D.-W. Zhang, A parametric method to design dynamic compensator for high-order quasi-linear systems, Nonlinear Dynamics, 100 (2020), 1379-1400.
[18] C. Kerl, J. Sturm and D. Cremers, Dense visual SLAM for RGB-D cameras, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, IEEE, 2013.
[19] D. H. Kim and J. H. Kim, Image-based ICP algorithm for visual odometry using a RGB-D sensor in a dynamic environment, Robot Intelligence Technology and Applications, Gwangju, Korea, Springer, 2013.
[20] D. H. Kim, S. B. Han and J. H. Kim, Visual odometry algorithm using an RGB-D sensor and IMU in a highly dynamic environment, Robot Intelligence Technology and Applications, Beijing, China, Springer, 2015.
[21] D. H. Kim and J. H. Kim, Effective background model-based RGB-D dense visual odometry in a dynamic environment, IEEE Transactions on Robotics, 32 (2016), 1565-1573.
[22] M. Labbe and F. Michaud, Appearance-based loop closure detection for online large-scale and long-term operation, IEEE Transactions on Robotics, 29 (2013), 734-745.
[23] S. Li and D. Lee, RGB-D SLAM in dynamic environments using static point weighting, IEEE Robotics and Automation Letters, 2 (2017), 2263-2270.
[24] B. Li, D. Yang and L. Deng, Visual vocabulary tree with pyramid TF-IDF scoring match scheme for loop closure detection, Acta Automatica Sinica, 37 (2011), 665-673.
[25] Y. Li, G. Zhang and F. Wang, An improved loop closure detection algorithm based on historical model set, Robot, 37 (2015), 663-673.
[26] Z. L. Lin, G. L. Zhang and E. Yao, Stereo visual odometry based on motion object detection in the dynamic scene, Acta Optica Sinica, 37 (2017), 187-195.
[27] M. Lourakis and X. Zabulis, Model-based pose estimation for rigid objects, International Conference on Computer Vision Systems, St. Petersburg, Russia, Springer, 2013.
[28] R. Mur-Artal, J. M. M. Montiel and J. D. Tardós, ORB-SLAM: A versatile and accurate monocular SLAM system, IEEE Transactions on Robotics, 31 (2015), 1147-1163.
[29] R. Mur-Artal and J. D. Tardós, ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras, IEEE Transactions on Robotics, 33 (2017), 1255-1262.
[30] D. Nistér, O. Naroditsky and J. Bergen, Visual odometry for ground vehicle applications, Journal of Field Robotics, 23 (2006), 3-20.
[31] Z. Peng, Research on Vision-Based Ego-Motion Estimation and Environment Modeling in Dynamic Environment, Ph.D. dissertation, Zhejiang University, Hangzhou, China, 2013.
[32] D. Scaramuzza and F. Fraundorfer, Visual odometry, IEEE Robotics & Automation Magazine, 18 (2011), 80-92.
[33] J. Sturm, N. Engelhard, F. Endres et al., A benchmark for the evaluation of RGB-D SLAM systems, IEEE International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, IEEE, 2012.
[34] Y. Sun, M. Liu and M. Q. H. Meng, Improving RGB-D SLAM in dynamic environments: A motion removal approach, Robotics and Autonomous Systems, 89 (2017), 110-122.
[35] W. Tan, H. Liu, Z. Dong et al., Robust monocular SLAM in dynamic environments, IEEE International Symposium on Mixed and Augmented Reality, Adelaide, Australia, IEEE, 2013.
[36] G. Younes, D. Asmar and E. Shammas, Keyframe-based monocular SLAM: Design, survey, and future directions, Robotics and Autonomous Systems, 98 (2017), 67-88.

Stereo camera model
Generation of a visual vocabulary tree
Overview of the proposed algorithm in dynamic scenes
Classification of the scene flow based on angles [26]
Invariance of the relative spatial distance of the static points
Construction of the virtual map points
Three static features selected by the algorithm
Dynamic features obtained by the algorithm
Experiment scene sets
Experimental results of ORB-VO in lab scenes
Experimental results of the proposed method in lab scenes
Loop-closure detection result of the inverse proportional function
Loop-closure detection result of the negative exponential power function
Loop-closure detection result of the negative exponential power function
Comparisons between estimated trajectories and the ground truth in walking sequences
Comparisons between estimated trajectories and the ground truth in sitting sequences
Translational drift and rotational drift of VO methods on the TUM dataset
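
Columns marked [m/s] give the RMSE of translational drift; columns marked [$^{\circ}$/s] give the RMSE of rotational drift.

| Sequences | DVO [m/s] | BaMVO [m/s] | SPW-VO [m/s] | Our Method [m/s] | DVO [$^{\circ}$/s] | BaMVO [$^{\circ}$/s] | SPW-VO [$^{\circ}$/s] | Our Method [$^{\circ}$/s] |
|---|---|---|---|---|---|---|---|---|
| sitting-static | 0.0157 | 0.0248 | 0.0231 | 0.0112 | 0.6084 | 0.6977 | 0.7228 | 0.3356 |
| sitting-xyz | 0.0453 | 0.0482 | 0.0219 | 0.0132 | 1.4980 | 1.3885 | 0.8466 | 0.5753 |
| sitting-rpy | 0.1735 | 0.1872 | 0.0843 | 0.0280 | 6.0164 | 5.9834 | 5.6258 | 0.6811 |
| sitting-halfsphere | 0.1005 | 0.0589 | 0.0389 | 0.0151 | 4.6490 | 2.8804 | 1.8836 | 0.6103 |
| walking-static | 0.3818 | 0.1339 | 0.0327 | 0.0293 | 6.3502 | 2.0833 | 0.8085 | 0.5500 |
| walking-xyz | 0.4360 | 0.2326 | 0.0651 | 0.1034 | 7.6669 | 4.3911 | 1.6442 | 2.3273 |
| walking-rpy | 0.4038 | 0.3584 | 0.2252 | 0.2143 | 7.0662 | 6.3898 | 5.6902 | 3.9555 |
| walking-halfsphere | 0.2628 | 0.1738 | 0.0527 | 0.1061 | 5.2179 | 4.2863 | 2.4048 | 2.2983 |
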
RMSE of the ATE of camera pose estimation (m)

| Sequences | ORB-SLAM2 | MR-SLAM | SPW-SLAM | SF-SLAM | Our Method |
|---|---|---|---|---|---|
| sitting-static | 0.0082 | – | – | 0.0081 | 0.0073 |
| sitting-xyz | 0.0094 | 0.0482 | 0.0397 | 0.0101 | 0.0090 |
| sitting-rpy | 0.0197 | – | – | 0.0180 | 0.0162 |
| sitting-halfsphere | 0.0211 | 0.0470 | 0.0432 | 0.0239 | 0.0164 |
| walking-static | 0.1028 | 0.0656 | 0.0261 | 0.0120 | 0.0108 |
| walking-xyz | 0.4278 | 0.0932 | 0.0601 | 0.2251 | 0.0884 |
| walking-rpy | 0.7407 | 0.1333 | 0.1791 | 0.1961 | 0.3620 |
| walking-halfsphere | 0.4939 | 0.1252 | 0.0489 | 0.0423 | 0.0411 |

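For readers unfamiliar with the metric reported in the ATE table, the following is a minimal sketch of how an RMSE of the absolute trajectory error can be computed. It assumes the estimated and ground-truth positions are already associated by timestamp and expressed in the same, aligned coordinate frame; the function name `ate_rmse` and the sample values are hypothetical and are not taken from the paper or from its evaluation code.

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of the per-pose translational error between two position lists.

    Assumption: both lists hold 3D positions of equal length, already
    time-associated and aligned to a common frame.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        math.dist(est, gt) ** 2 for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical two-pose trajectory: the second estimate is 0.2 off in y.
rmse = ate_rmse(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0)],
)
```

Because the errors are squared before averaging, a few large deviations dominate the result. This may explain why the walking sequences, where moving people can cause occasional tracking failures, show much larger values than the sitting sequences.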
