Multi-Object Tracking and Segmentation (MOTS) Evaluation


The Multi-Object Tracking and Segmentation (MOTS) benchmark consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation 2012 and extends its annotations to the Multi-Object Tracking and Segmentation (MOTS) task by adding dense pixelwise segmentation labels for every object. Submitted results are evaluated using the common CLEAR MOT and MT/PT/ML metrics, adapted to the segmentation case.
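For reference, the segmentation-adapted metrics follow the definitions from the MOTS paper cited below: MOTSA combines true positives, false positives, and identity switches, while sMOTSA replaces the true-positive count with the summed mask IoUs of the matches, and MOTSP is the mean IoU over true positives. A minimal Python sketch, assuming per-frame matching (mask IoU > 0.5) has already been performed; the helper name `mots_metrics` is illustrative, not part of the official evaluation code:

```python
def mots_metrics(tp_ious, num_fp, num_ids, num_gt):
    """Compute (sMOTSA, MOTSA, MOTSP) from matching results.

    tp_ious -- mask IoUs of all true-positive matches (each > 0.5)
    num_fp  -- number of false-positive hypothesis masks
    num_ids -- number of identity switches
    num_gt  -- total number of ground-truth masks
    """
    tp = len(tp_ious)
    soft_tp = sum(tp_ious)  # true positives weighted by mask IoU
    motsa = (tp - num_fp - num_ids) / num_gt
    smotsa = (soft_tp - num_fp - num_ids) / num_gt
    motsp = soft_tp / tp if tp else 0.0
    return smotsa, motsa, motsp

# Example: 3 matches with IoUs 0.9/0.8/0.7, 1 false positive,
# no ID switches, 4 ground-truth masks.
smotsa, motsa, motsp = mots_metrics([0.9, 0.8, 0.7], 1, 0, 4)
```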

Important Policy Update: As more and more non-published work and re-implementations of existing work are submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that lead to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms and student research projects are not allowed; such work must be evaluated on a split of the training set. To ensure that our policy is adopted, new users must detail their status, describe their work, and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are more than 6 months old but are still anonymous or have no associated paper. For conferences, 6 months are enough to determine whether a paper has been accepted and to add the bibliography information. For longer review cycles, you need to resubmit your results.
Additional information used by the methods
  • Stereo: Method uses left and right (stereo) images
  • Laser Points: Method uses point clouds from Velodyne laser scanner
  • GPS: Method uses GPS information
  • Online: Online method (frame-by-frame processing, no latency)
  • Additional training data: Use of additional data sources for training (see details)

CAR


Method Setting Code sMOTSA MOTSA MOTSP MOTSAL MODSA MODSP MT ML IDS Frag Runtime Environment
1 ViP-DeepLab 81.00 % 90.70 % 89.90 % 91.80 % 91.80 % 92.20 % 92.20 % 0.60 % 392 580 0.1 s 1 core @ 2.5 Ghz (C/C++)
S. Qiao, Y. Zhu, H. Adam, A. Yuille and L. Chen: ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021.
2 PointTrack 78.50 % 90.90 % 87.10 % 91.80 % 91.80 % 89.70 % 90.80 % 0.60 % 346 645 0.045 s GPU @ 2.5 Ghz (Python)
Z. Xu, W. Zhang, X. Tan, W. Yang, H. Huang, S. Wen, E. Ding and L. Huang: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation. Proceedings of the European Conference on Computer Vision (ECCV) 2020.
3 OPITrack code 78.00 % 90.40 % 87.20 % 91.80 % 91.80 % 89.70 % 91.30 % 0.80 % 542 832 0.09 s 1 core @ 2.5 Ghz (C/C++)
Y. Gao, H. Xu, Y. Zheng, J. Li and X. Gao: An Object Point Set Inductive Tracker for Multi-Object Tracking and Segmentation. IEEE Transactions on Image Processing 2022.
4 MAF_HDA code 77.20 % 87.70 % 88.40 % 88.90 % 88.90 % 90.90 % 82.00 % 0.80 % 415 706 0.09 s 4 cores @ 4.2 Ghz (C/C++)
This is an online method (no batch processing).
Y. Song, Y. Yoon, K. Yoon and M. Jeon: Multi-Object Tracking and Segmentation with Embedding Mask-based Affinity Fusion in Hierarchical Data Association. IEEE Access 2022.
5 ReMOTS 75.90 % 86.70 % 88.20 % 88.70 % 88.70 % 90.70 % 84.50 % 0.60 % 716 905 3 s 1 core @ 2.5 Ghz (Python)
F. Yang, X. Chang, C. Dang, Z. Zheng, S. Sakti, S. Nakamura and Y. Wu: ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation. 2020.
6 GMPHD_SAF 75.40 % 86.70 % 87.50 % 88.20 % 88.20 % 90.10 % 82.00 % 0.60 % 549 874 0.08 s 4 cores @ 4.2 Ghz (C/C++)
This is an online method (no batch processing).
Y. Song and M. Jeon: Online Multi-Object Tracking and Segmentation with GMPHD Filter and Simple Affinity Fusion. arXiv preprint arXiv:2009.00100 2020.
7 MOTSFusion code 75.00 % 84.10 % 89.30 % 84.70 % 84.70 % 91.70 % 66.10 % 6.20 % 201 572 0.44 s GPU @ 2.5 Ghz (Python)
J. Luiten, T. Fischer and B. Leibe: Track to Reconstruct and Reconstruct to Track. IEEE Robotics and Automation Letters 2020.
8 SearchTrack code 74.80 % 86.80 % 86.80 % 88.50 % 88.50 % 89.70 % 80.00 % 1.50 % 614 983 0.19 s GPU @ 2.5 Ghz (Python)
Z. Tsai, Y. Tsai, C. Wang, H. Liao, Y. Lin and Y. Chuang: SearchTrack: Multiple Object Tracking with Object-Customized Search and Motion-Aware Features. BMVC 2022.
9 EagerMOT code 74.50 % 83.50 % 89.60 % 84.80 % 84.80 % 92.10 % 67.10 % 3.50 % 457 811 0.011 s 4 cores @ 3.0 Ghz (Python)
A. Kim, A. Osep and L. Leal-Taixé: EagerMOT: 3D Multi-Object Tracking via Sensor Fusion. IEEE International Conference on Robotics and Automation (ICRA) 2021.
10 TrackR-CNN code 67.00 % 79.60 % 85.10 % 81.50 % 81.50 % 88.30 % 74.90 % 2.30 % 692 1058 0.5 s GPU @ 2.5 Ghz (Python)
P. Voigtlaender, M. Krause, A. Osep, J. Luiten, B. Sekar, A. Geiger and B. Leibe: MOTS: Multi-Object Tracking and Segmentation. CVPR 2019.
11 STC-Seg 66.20 % 81.10 % 82.80 % 82.90 % 82.90 % 86.30 % 71.90 % 2.00 % 676 1093 0.25 s 1 core @ 2.5 Ghz (C/C++)
L. Yan, Q. Wang, S. Ma, J. Wang and C. Yu: Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2022.


PEDESTRIAN


Method Setting Code sMOTSA MOTSA MOTSP MOTSAL MODSA MODSP MT ML IDS Frag Runtime Environment
1 ViP-DeepLab 68.70 % 84.50 % 82.30 % 85.50 % 85.50 % 93.90 % 73.30 % 2.60 % 209 443 0.1 s 1 core @ 2.5 Ghz (C/C++)
S. Qiao, Y. Zhu, H. Adam, A. Yuille and L. Chen: ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021.
2 ReMOTS 66.00 % 81.30 % 82.00 % 83.20 % 83.20 % 94.00 % 62.60 % 5.60 % 391 551 3 s 1 core @ 2.5 Ghz (Python)
F. Yang, X. Chang, C. Dang, Z. Zheng, S. Sakti, S. Nakamura and Y. Wu: ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation. 2020.
3 MAF_HDA code 65.00 % 79.60 % 82.30 % 81.10 % 81.10 % 94.00 % 57.80 % 6.30 % 300 520 0.09 s 4 cores @ 4.2 Ghz (C/C++)
This is an online method (no batch processing).
Y. Song, Y. Yoon, K. Yoon and M. Jeon: Multi-Object Tracking and Segmentation with Embedding Mask-based Affinity Fusion in Hierarchical Data Association. IEEE Access 2022.
4 GMPHD_SAF 62.80 % 78.20 % 81.60 % 80.40 % 80.50 % 93.70 % 59.30 % 4.80 % 474 696 0.08 s 4 cores @ 4.2 Ghz (C/C++)
This is an online method (no batch processing).
Y. Song and M. Jeon: Online Multi-Object Tracking and Segmentation with GMPHD Filter and Simple Affinity Fusion. arXiv preprint arXiv:2009.00100 2020.
5 PointTrack 61.50 % 76.50 % 81.00 % 77.40 % 77.40 % 93.80 % 48.90 % 9.30 % 176 632 0.045 s GPU @ 2.5 Ghz (Python)
Z. Xu, W. Zhang, X. Tan, W. Yang, H. Huang, S. Wen, E. Ding and L. Huang: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation. Proceedings of the European Conference on Computer Vision (ECCV) 2020.
6 OPITrack code 61.00 % 75.70 % 81.30 % 76.90 % 76.90 % 93.80 % 53.00 % 8.50 % 233 707 0.09 s 1 core @ 2.5 Ghz (C/C++)
Y. Gao, H. Xu, Y. Zheng, J. Li and X. Gao: An Object Point Set Inductive Tracker for Multi-Object Tracking and Segmentation. IEEE Transactions on Image Processing 2022.
7 SearchTrack code 60.60 % 78.90 % 78.20 % 80.70 % 80.80 % 93.00 % 60.40 % 4.40 % 390 714 0.19 s GPU @ 2.5 Ghz (Python)
Z. Tsai, Y. Tsai, C. Wang, H. Liao, Y. Lin and Y. Chuang: SearchTrack: Multiple Object Tracking with Object-Customized Search and Motion-Aware Features. BMVC 2022.
8 MOTSFusion code 58.70 % 72.90 % 81.50 % 74.20 % 74.20 % 94.10 % 47.40 % 15.60 % 279 534 0.44 s 1 core @ 2.5 Ghz (C/C++)
J. Luiten, T. Fischer and B. Leibe: Track to Reconstruct and Reconstruct to Track. IEEE Robotics and Automation Letters 2020.
9 EagerMOT code 58.10 % 72.00 % 81.50 % 73.30 % 73.30 % 94.10 % 43.30 % 13.70 % 270 633 0.011 s 4 cores @ 3.0 Ghz (Python)
A. Kim, A. Osep and L. Leal-Taixé: EagerMOT: 3D Multi-Object Tracking via Sensor Fusion. IEEE International Conference on Robotics and Automation (ICRA) 2021.
10 MPNTrackSeg code 57.30 % 77.00 % 76.00 % 77.70 % 77.70 % 91.90 % 56.30 % 9.60 % 162 620 0.08 s 8 cores @ 2.5 Ghz (Python)
G. Brasó, O. Cetintas and L. Leal-Taixé: Multi-Object Tracking and Segmentation Via Neural Message Passing. International Journal of Computer Vision 2022.
11 MG-MOTS 54.40 % 70.80 % 78.50 % 72.40 % 72.50 % 93.50 % 41.50 % 19.60 % 351 737 41 s GPU @ 2.5 Ghz (Python)
This is an online method (no batch processing).
J. Seong: Online and real-time mask-guided multi-person tracking and segmentation. Pattern Recognition Letters 2023.
12 TrackR-CNN code 47.30 % 66.10 % 74.60 % 68.40 % 68.40 % 91.80 % 45.60 % 13.30 % 481 861 0.5 s GPU @ 2.5 Ghz (Python)
P. Voigtlaender, M. Krause, A. Osep, J. Luiten, B. Sekar, A. Geiger and B. Leibe: MOTS: Multi-Object Tracking and Segmentation. CVPR 2019.
13 STC-Seg 42.60 % 57.70 % 75.60 % 59.60 % 59.60 % 92.60 % 27.80 % 18.50 % 408 780 0.25 s 1 core @ 2.5 Ghz (C/C++)
L. Yan, Q. Wang, S. Ma, J. Wang and C. Yu: Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2022.


Citation

If you use this dataset in your research, please cite:
@inproceedings{Voigtlaender2019CVPR,
  author = {Paul Voigtlaender and Michael Krause and Aljosa Osep and Jonathon Luiten and Berin Balachandar Gnana Sekar and Andreas Geiger and Bastian Leibe},
  title = {MOTS: Multi-Object Tracking and Segmentation},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2019}
}


