February  2019, 2(1): 29-41. doi: 10.3934/mfc.2019003

SEMANTIC-RTAB-MAP (SRM): A semantic SLAM system with CNNs on depth images

1. Beihang University, Beijing, China

2. Shenzhen Academy of Aerospace Technology, Shenzhen, China

* Corresponding author: bczhang@buaa.edu.cn

Published  March 2019

A SLAM (simultaneous localization and mapping) system can be built on monocular, RGB-D, or stereo cameras. RTAB-MAP is a SLAM system that can build dense 3D maps. In this paper, we present a novel method named SEMANTIC-RTAB-MAP (SRM), which implements a semantic SLAM system based on RTAB-MAP and deep learning. We use the YOLOv2 network to detect target objects in 2D images, then use depth information to localize the targets precisely, and finally add the semantic information to the 3D point clouds. We apply SRM to different scenes, and the results show that it achieves high running speed and accuracy.
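The depth-based localization step can be illustrated with a minimal sketch. It assumes a detector has already returned a bounding box, and it grows a region from the box centre over pixels whose depth changes smoothly. The figure captions indicate that SRM extracts Canny edges on the depth image before region growing; the sketch below substitutes a plain depth-difference threshold for the edge map, so it is a simplification and not the authors' implementation. The function name `grow_region`, the box layout `(top, left, bottom, right)`, and the tolerance `tol` are hypothetical.

```python
from collections import deque


def grow_region(depth, box, tol=0.05):
    """Grow a 4-connected region from the centre of a detection box.

    depth: 2D list of depth values in metres (0 marks missing depth).
    box:   (top, left, bottom, right), bottom/right exclusive (assumed layout).
    tol:   largest depth step between neighbouring pixels that still counts
           as the same surface (hypothetical default).
    """
    top, left, bottom, right = box
    seed = ((top + bottom) // 2, (left + right) // 2)
    if depth[seed[0]][seed[1]] <= 0:
        return set()  # no depth at the seed, nothing to grow from

    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            # Stay inside the detection box.
            if not (top <= nr < bottom and left <= nc < right):
                continue
            # Skip visited pixels and pixels without a depth reading.
            if (nr, nc) in region or depth[nr][nc] <= 0:
                continue
            # A large depth jump is treated as an object boundary.
            if abs(depth[nr][nc] - depth[r][c]) <= tol:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy depth map: an object at about 1 m in front of a wall at 3 m,
# with one missing reading at (3, 3).
depth = [
    [3.0, 3.0, 3.0, 3.0, 3.0],
    [3.0, 1.0, 1.0, 1.0, 3.0],
    [3.0, 1.0, 1.02, 1.0, 3.0],
    [3.0, 1.0, 1.0, 0.0, 3.0],
    [3.0, 3.0, 3.0, 3.0, 3.0],
]
mask = grow_region(depth, box=(0, 0, 5, 5))
```

The returned set of `(row, column)` pixels separates the object from the background that shares its bounding box, which is what allows the label to be attached to the object alone rather than to the whole box.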

Citation: Mingyuan Mao, Hewei Zhang, Simeng Li, Baochang Zhang. SEMANTIC-RTAB-MAP (SRM): A semantic SLAM system with CNNs on depth images. Mathematical Foundations of Computing, 2019, 2 (1) : 29-41. doi: 10.3934/mfc.2019003
References:
[1] R. Q. Charles, H. Su, K. Mo and L. J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017). doi: 10.1109/CVPR.2017.16.

[2] R. Girshick, J. Donahue, T. Darrell and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, (2013), 580-587.

[3] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), 770-778. doi: 10.1109/CVPR.2016.90.

[4] M. Labbé and F. Michaud, Long-term online multi-session graph-based SPLAM with memory management, Autonomous Robots, 3 (2017), 1-18.

[5] M. Labbé and F. Michaud, Online global loop closure detection for large-scale multi-session graph-based SLAM, IEEE/RSJ International Conference on Intelligent Robots and Systems, (2014), 2661-2666. doi: 10.1109/IROS.2014.6942926.

[6] M. Labbé and F. Michaud, Appearance-based loop closure detection for online large-scale and long-term operation, IEEE Transactions on Robotics, 29 (2013), 734-745.

[7] M. Labbé and F. Michaud, Memory management for real-time appearance-based loop closure detection, IEEE/RSJ International Conference on Intelligent Robots and Systems, (2011), 1271-1276. doi: 10.1109/IROS.2011.6094602.

[8] X. Li and R. Belaroussi, Semi-dense 3D semantic mapping from monocular SLAM, 2016.

[9] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, et al., SSD: Single shot multibox detector, European Conference on Computer Vision, Springer International Publishing, (2016), 21-37.

[10] J. McCormac, A. Handa, A. Davison and S. Leutenegger, SemanticFusion: Dense 3D semantic mapping with convolutional neural networks, 2017 IEEE International Conference on Robotics and Automation (ICRA), (2017). doi: 10.1109/ICRA.2017.7989538.

[11] R. Mur-Artal and J. D. Tardós, Probabilistic semi-dense mapping from highly accurate feature-based monocular SLAM, Robotics: Science and Systems, (2015), 1-9. doi: 10.15607/RSS.2015.XI.041.

[12] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man, and Cybernetics, 9 (1979), 62-66. doi: 10.1109/TSMC.1979.4310076.

[13] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, You only look once: Unified, real-time object detection, Computer Vision and Pattern Recognition, (2016), 779-788. doi: 10.1109/CVPR.2016.91.

[14] J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, IEEE Conference on Computer Vision and Pattern Recognition, (2017), 6517-6525. doi: 10.1109/CVPR.2017.690.

[15] J. Redmon and A. Farhadi, YOLOv3: An incremental improvement, 2018.

[16] S. Ren, K. He, R. Girshick and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39 (2017), 1137-1149. doi: 10.1109/TPAMI.2016.2577031.

[17] N. Sünderhauf, T. T. Pham, Y. Latif, M. Milford and I. Reid, Meaningful maps with object-oriented semantic mapping, IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, (2017), 5079-5085.

[18] T. Whelan, S. Leutenegger, R. S. Moreno, B. Glocker and A. Davison, ElasticFusion: Dense SLAM without a pose graph, Robotics: Science and Systems, (2015). doi: 10.15607/RSS.2015.XI.001.

Figure 1. Performance of SRM. (First from Left) Original RGB-D image with a handbag label. (Second from Left) Result of edge detection on the depth image. (Second from Right) Result of region growing. (First from Right) The corresponding 3D point cloud
Figure 2. Results by Li et al. [8]
Figure 3.  The Structure of Memory Management. [7]
Figure 4.  YOLOv1 structure. [13]
Figure 5.  Results by YOLOv2
Figure 6.  Overview of SRM method
Figure 7.  Results by YOLOv2
Figure 8.  The flow chart of our precise localization
Figure 9.  (Top) Original RGB-D image. (Middle Left) Result of edge detection by Canny operator. (Middle Right) Result of region growing. (Bottom) Semantic RGB-D image
Figure 10. The left is the original local point cloud of RTAB-MAP. The right is the corresponding point cloud of SRM. Two refrigerators are recognized by YOLOv2 and painted green (we assume that green represents refrigerators)
Figure 11. (First Column) Original RGB image. (Second Column) Edges of targets extracted from the depth image by the Canny operator. (Third Column) Corresponding local point clouds. The blue sticks in the point clouds in the first three rows are axes. Because we detected different targets separately, some different targets are painted the same color. When we detect them at the same time, we only need to assign different colors to different classes of objects
Figure 12. The left is the original RGB image. The right is the corresponding point cloud. We paint the bottle red, the laptop blue and the handbag green. We do not show the edges of targets extracted from the depth image here because SRM processes different objects one by one, which means we do not have a single image that includes all of their edges. The handbag is not shown completely in the point cloud because the Kinect2 did not capture depth data for that area
Figure 13.  The performance of semi-dense 3D semantic mapping. [8] (Top) Original image. (Bottom) Corresponding point cloud. Red represents buildings. Purple represents cars. Green bounding boxes are added by us. They are not included in the original image
Figure 14.  The left is the original RGB image. The right is the point cloud
Figure 15.  The left is the original RGB image. The right is the point cloud
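The coloring of recognized objects in the point clouds above can be sketched as a back-projection of labelled depth pixels through a pinhole camera model. This is an illustrative sketch under stated assumptions, not the SRM or RTAB-MAP code: the function name `label_point_cloud`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy values are all hypothetical, and a real system would take the intrinsics from the camera calibration.

```python
def label_point_cloud(depth, rgb, mask, label_color, fx, fy, cx, cy):
    """Back-project a depth image to 3D and recolor the masked pixels.

    depth:       2D list of depth values in metres (0 marks missing depth).
    rgb:         2D list of (r, g, b) tuples, same shape as depth.
    mask:        set of (row, col) pixels belonging to the detected object.
    label_color: (r, g, b) color assigned to the object's class.
    fx, fy, cx, cy: pinhole intrinsics (assumed known from calibration).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no depth reading, so no 3D point
            # Standard pinhole back-projection of pixel (u, v) at depth z.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            color = label_color if (v, u) in mask else rgb[v][u]
            points.append((x, y, z, color))
    return points


# Toy 2x2 frame: one pixel has no depth, one pixel belongs to the object.
GREEN = (0, 255, 0)
gray = (10, 10, 10)
cloud = label_point_cloud(
    depth=[[2.0, 0.0], [2.0, 4.0]],
    rgb=[[gray, gray], [gray, gray]],
    mask={(1, 1)},
    label_color=GREEN,
    fx=2.0, fy=2.0, cx=0.5, cy=0.5,
)
```

Pixels without depth produce no point, which is consistent with the note in Figure 12 that the handbag appears incomplete where the sensor returned no depth data.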
