Semantic SLAM

Trajectory Estimation


We adopt the standard Absolute Pose Error (APE) and Relative Pose Error (RPE) as metrics for pose estimation. To evaluate the APE, we first align the predicted trajectory to the ground truth using a rigid transformation. The RPE is evaluated between pairs of frames that are 1 meter apart.
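The two metrics can be sketched in a few lines of NumPy. This is a minimal illustration, not the benchmark's evaluation script: the function names (`rigid_align`, `ape_rmse`, `rpe_translation`) are hypothetical, the alignment is assumed to be a least-squares rigid fit (Kabsch, no scale), the APE is assumed to be an RMSE over positions, and frame pairs for the RPE are assumed to be chosen by ground-truth path length.

```python
import numpy as np

def rigid_align(est, gt):
    """Least-squares rotation R and translation t mapping est onto gt (Kabsch, no scale).

    est, gt: (N, 3) arrays of corresponding positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ape_rmse(est, gt):
    """RMSE of position error after rigid alignment (assumed APE definition)."""
    R, t = rigid_align(est, gt)
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))

def rpe_translation(est_poses, gt_poses, delta=1.0):
    """Mean relative translational error over frame pairs about `delta` meters apart.

    est_poses, gt_poses: (N, 4, 4) homogeneous poses. Pairs are selected by
    ground-truth path length (an assumption). The result is a ratio; multiply
    by 100 for a percentage.
    """
    pos = gt_poses[:, :3, 3]
    steps = np.linalg.norm(np.diff(pos, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(steps)])   # travelled distance per frame
    errors = []
    for i in range(len(dist)):
        # First frame j whose travelled distance is at least delta beyond frame i.
        j = int(np.searchsorted(dist, dist[i] + delta))
        if j >= len(dist):
            break
        rel_gt = np.linalg.inv(gt_poses[i]) @ gt_poses[j]
        rel_est = np.linalg.inv(est_poses[i]) @ est_poses[j]
        err = np.linalg.inv(rel_gt) @ rel_est          # residual relative motion
        errors.append(np.linalg.norm(err[:3, 3]) / (dist[j] - dist[i]))
    return float(np.mean(errors)) if errors else float("nan")
```

Because the APE is computed after alignment, a prediction that differs from the ground truth only by a global rotation and translation scores zero. The RPE needs no alignment, since it compares relative motions only.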

Rank | Method | Setting | Code | APE | RPE | Runtime | Environment
1 | CT-ICP2 | Velodyne laser scans | code | 0.50 | 1.00 % | 0.06 s | 1 core @ 3.5 GHz (C/C++)
2 | SOFT2 | Stereo | - | 0.70 | 0.84 % | 0.1 s | 4 cores @ 2.5 GHz (C/C++)
3 | MOLA-LO + LC | Velodyne laser scans | code | 0.72 | 3.97 % | 0.03 s | >8 cores @ 2.5 GHz (C/C++)
4 | ORB-SLAM2 | - | - | 1.92 | 2.03 % | - | NVIDIA V100
5 | SUMA++ | - | - | 3.13 | 2.72 % | - | NVIDIA V100

References:
CT-ICP2: P. Dellenbach, J. Deschaud, B. Jacquet and F. Goulette: CT-ICP: Real-time Elastic LiDAR Odometry with Loop Closure. International Conference on Robotics and Automation (ICRA) 2022.
SOFT2: I. Cvišić, I. Marković and I. Petrović: SOFT2: Stereo Visual Odometry for Road Vehicles Based on a Point-to-Epipolar-Line Metric. IEEE Transactions on Robotics 2022.
ORB-SLAM2: R. Mur-Artal and J. Tardós: ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. TRO 2017.
SUMA++: X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley and C. Stachniss: SuMa++: Efficient LiDAR-based Semantic SLAM. IROS 2019.


Geometric and Semantic Mapping


We evaluate geometric completion and semantic estimation, and rank the methods according to the confidence-weighted mean intersection-over-union (mIoU). Geometric completion is evaluated via completeness and accuracy at a threshold of 20 cm. Completeness is the fraction of ground truth points whose distance to the closest reconstructed point is below the threshold. Accuracy instead measures the percentage of reconstructed points whose distance to the closest ground truth point is below the threshold. As our ground truth reconstruction may not be complete, we avoid penalizing reconstructed points that fall outside it by dividing the space into observed and unobserved regions, which are determined by the unobserved volume of a 3D occupancy map obtained using OctoMap. We further report the F1 score as the harmonic mean of completeness and accuracy.
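The geometric metrics above can be illustrated with a short NumPy sketch. It is not the benchmark's evaluation code: the function names are hypothetical, nearest neighbors are found by brute force (suitable only for small point sets), and both point clouds are assumed to have already been restricted to the observed region, so the OctoMap-based masking is not shown.

```python
import numpy as np

def nearest_distances(src, dst):
    """Distance from each point in src to its closest point in dst (brute force)."""
    diff = src[:, None, :] - dst[None, :, :]          # (len(src), len(dst), 3)
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def completion_metrics(recon, gt, threshold=0.2):
    """Return (accuracy, completeness, f1) as fractions in [0, 1].

    recon, gt: (N, 3) and (M, 3) point arrays in meters, assumed to be
    pre-filtered to the observed region. threshold: distance in meters (20 cm).
    """
    # Completeness: share of ground truth points with a reconstructed point nearby.
    completeness = float(np.mean(nearest_distances(gt, recon) < threshold))
    # Accuracy: share of reconstructed points with a ground truth point nearby.
    accuracy = float(np.mean(nearest_distances(recon, gt) < threshold))
    denom = completeness + accuracy
    # F1: harmonic mean of completeness and accuracy.
    f1 = 2.0 * completeness * accuracy / denom if denom > 0 else 0.0
    return accuracy, completeness, f1

# Toy example: two of four ground truth points are covered,
# and two of three reconstructed points are close to the ground truth.
gt_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
recon_points = np.array([[0.0, 0.0, 0.1], [1.0, 0.0, 0.1], [5.0, 0.0, 0.0]])
accuracy, completeness, f1 = completion_metrics(recon_points, gt_points)
```

Multiplying the returned fractions by 100 gives percentages in the same form as the table. For large reconstructions, a spatial index such as a k-d tree would replace the brute-force search.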

Rank | Method | Setting | Code | Accuracy | Completeness | F1 | mIoU Class | Runtime | Environment
1 | S-DSP | - | code | 79.15 | 72.45 | 75.64 | 37.59 | 3 s | 1 core @ 3.5 GHz (C/C++)
2 | ORB-SLAM2 + PSPNet | - | - | 81.77 | 74.89 | 78.15 | 32.48 | - | NVIDIA V100
3 | SUMA++ | - | - | 90.98 | 64.19 | 75.27 | 19.40 | - | NVIDIA V100

References:
S-DSP: G. Chen, Z. Wang, W. Dong and J. Alonso-Mora: Particle-based Instance-aware Semantic Occupancy Mapping in Dynamic Environments. 2024.
ORB-SLAM2 + PSPNet: R. Mur-Artal and J. Tardós: ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. TRO 2017.
ORB-SLAM2 + PSPNet: H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.
SUMA++: X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley and C. Stachniss: SuMa++: Efficient LiDAR-based Semantic SLAM. IROS 2019.




