Multi-Object Tracking and Segmentation (MOTS) Evaluation


The Multi-Object Tracking and Segmentation (MOTS) benchmark consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation 2012 and extends its annotations to the Multi-Object Tracking and Segmentation (MOTS) task by adding dense pixelwise segmentation labels for every object. Submitted results are evaluated using the common CLEAR MOT and MT/PT/ML metrics, adapted to the segmentation case.
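For reference, the segmentation-adapted metrics follow the definitions from the MOTS paper cited below: MOTSA combines true positives, false positives, and identity switches, while sMOTSA replaces the true-positive count with the summed mask IoUs of the matches, and MOTSP is the mean IoU over true positives. A minimal Python sketch, assuming per-frame matching (mask IoU > 0.5) has already been performed; the helper name `mots_metrics` is illustrative, not part of the official evaluation code:

```python
def mots_metrics(tp_ious, num_fp, num_ids, num_gt):
    """Compute (sMOTSA, MOTSA, MOTSP) from matching results.

    tp_ious -- mask IoUs of all true-positive matches (each > 0.5)
    num_fp  -- number of false-positive hypothesis masks
    num_ids -- number of identity switches
    num_gt  -- total number of ground-truth masks
    """
    tp = len(tp_ious)
    soft_tp = sum(tp_ious)  # true positives weighted by mask IoU
    motsa = (tp - num_fp - num_ids) / num_gt
    smotsa = (soft_tp - num_fp - num_ids) / num_gt
    motsp = soft_tp / tp if tp else 0.0
    return smotsa, motsa, motsp

# Example: 3 matches with IoUs 0.9/0.8/0.7, 1 false positive,
# no ID switches, 4 ground-truth masks.
smotsa, motsa, motsp = mots_metrics([0.9, 0.8, 0.7], 1, 0, 4)
```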

Important Policy Update: As more and more non-published work and re-implementations of existing work are submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that lead to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms and student research projects are not allowed; such work must be evaluated on a split of the training set. To ensure that our policy is adopted, new users must detail their status, describe their work, and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are more than 6 months old but are still anonymous or have no associated paper. For conferences, 6 months are enough to determine whether a paper has been accepted and to add the bibliography information. For longer review cycles, you need to resubmit your results.
Additional information used by the methods
  • Stereo: Method uses left and right (stereo) images
  • Laser Points: Method uses point clouds from Velodyne laser scanner
  • GPS: Method uses GPS information
  • Online: Online method (frame-by-frame processing, no latency)
  • Additional training data: Use of additional data sources for training (see details)

CAR


Method Setting Code sMOTSA MOTSA MOTSP MOTSAL MODSA MODSP MT ML IDS Frag Runtime Environment
1 ViP-DeepLab 81.00 % 90.70 % 89.90 % 91.80 % 91.80 % 92.20 % 92.20 % 0.60 % 392 580 0.1 s 1 core @ 2.5 Ghz (C/C++)
S. Qiao, Y. Zhu, H. Adam, A. Yuille and L. Chen: ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021.
2 PointTrack 78.50 % 90.90 % 87.10 % 91.80 % 91.80 % 89.70 % 90.80 % 0.60 % 346 645 0.045 s GPU @ 2.5 Ghz (Python)
Z. Xu, W. Zhang, X. Tan, W. Yang, H. Huang, S. Wen, E. Ding and L. Huang: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation. Proceedings of the European Conference on Computer Vision (ECCV) 2020.
3 OPITrack code 78.00 % 90.40 % 87.20 % 91.80 % 91.80 % 89.70 % 91.30 % 0.80 % 542 832 0.09 s 1 core @ 2.5 Ghz (C/C++)
Y. Gao, H. Xu, Y. Zheng, J. Li and X. Gao: An Object Point Set Inductive Tracker for Multi-Object Tracking and Segmentation. IEEE Transactions on Image Processing 2022.
4 MAF_HDA code 77.20 % 87.70 % 88.40 % 88.90 % 88.90 % 90.90 % 82.00 % 0.80 % 415 706 0.09 s 4 cores @ 4.2 Ghz (C/C++)
This is an online method (no batch processing).
Y. Song, Y. Yoon, K. Yoon and M. Jeon: Multi-Object Tracking and Segmentation with Embedding Mask-based Affinity Fusion in Hierarchical Data Association. IEEE Access 2022.
5 ReMOTS 75.90 % 86.70 % 88.20 % 88.70 % 88.70 % 90.70 % 84.50 % 0.60 % 716 905 3 s 1 core @ 2.5 Ghz (Python)
F. Yang, X. Chang, C. Dang, Z. Zheng, S. Sakti, S. Nakamura and Y. Wu: ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation. 2020.
6 GMPHD_SAF 75.40 % 86.70 % 87.50 % 88.20 % 88.20 % 90.10 % 82.00 % 0.60 % 549 874 0.08 s 4 cores @ 4.2 Ghz (C/C++)
This is an online method (no batch processing).
Y. Song and M. Jeon: Online Multi-Object Tracking and Segmentation with GMPHD Filter and Simple Affinity Fusion. arXiv preprint arXiv:2009.00100 2020.
7 MOTSFusion code 75.00 % 84.10 % 89.30 % 84.70 % 84.70 % 91.70 % 66.10 % 6.20 % 201 572 0.44 s GPU @ 2.5 Ghz (Python)
J. Luiten, T. Fischer and B. Leibe: Track to Reconstruct and Reconstruct to Track. IEEE Robotics and Automation Letters 2020.
8 SearchTrack code 74.80 % 86.80 % 86.80 % 88.50 % 88.50 % 89.70 % 80.00 % 1.50 % 614 983 0.19 s GPU @ 2.5 Ghz (Python)
Z. Tsai, Y. Tsai, C. Wang, H. Liao, Y. Lin and Y. Chuang: SearchTrack: Multiple Object Tracking with Object-Customized Search and Motion-Aware Features. BMVC 2022.
9 EagerMOT code 74.50 % 83.50 % 89.60 % 84.80 % 84.80 % 92.10 % 67.10 % 3.50 % 457 811 0.011 s 4 cores @ 3.0 Ghz (Python)
A. Kim, A. Osep and L. Leal-Taixé: EagerMOT: 3D Multi-Object Tracking via Sensor Fusion. IEEE International Conference on Robotics and Automation (ICRA) 2021.
10 TrackR-CNN code 67.00 % 79.60 % 85.10 % 81.50 % 81.50 % 88.30 % 74.90 % 2.30 % 692 1058 0.5 s GPU @ 2.5 Ghz (Python)
P. Voigtlaender, M. Krause, A. Osep, J. Luiten, B. Sekar, A. Geiger and B. Leibe: MOTS: Multi-Object Tracking and Segmentation. CVPR 2019.
11 STC-Seg 66.20 % 81.10 % 82.80 % 82.90 % 82.90 % 86.30 % 71.90 % 2.00 % 676 1093 0.25 s 1 core @ 2.5 Ghz (C/C++)
L. Yan, Q. Wang, S. Ma, J. Wang and C. Yu: Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2022.


PEDESTRIAN


Method Setting Code sMOTSA MOTSA MOTSP MOTSAL MODSA MODSP MT ML IDS Frag Runtime Environment
1 ViP-DeepLab 68.70 % 84.50 % 82.30 % 85.50 % 85.50 % 93.90 % 73.30 % 2.60 % 209 443 0.1 s 1 core @ 2.5 Ghz (C/C++)
S. Qiao, Y. Zhu, H. Adam, A. Yuille and L. Chen: ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021.
2 ReMOTS 66.00 % 81.30 % 82.00 % 83.20 % 83.20 % 94.00 % 62.60 % 5.60 % 391 551 3 s 1 core @ 2.5 Ghz (Python)
F. Yang, X. Chang, C. Dang, Z. Zheng, S. Sakti, S. Nakamura and Y. Wu: ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation. 2020.
3 MAF_HDA code 65.00 % 79.60 % 82.30 % 81.10 % 81.10 % 94.00 % 57.80 % 6.30 % 300 520 0.09 s 4 cores @ 4.2 Ghz (C/C++)
This is an online method (no batch processing).
Y. Song, Y. Yoon, K. Yoon and M. Jeon: Multi-Object Tracking and Segmentation with Embedding Mask-based Affinity Fusion in Hierarchical Data Association. IEEE Access 2022.
4 GMPHD_SAF 62.80 % 78.20 % 81.60 % 80.40 % 80.50 % 93.70 % 59.30 % 4.80 % 474 696 0.08 s 4 cores @ 4.2 Ghz (C/C++)
This is an online method (no batch processing).
Y. Song and M. Jeon: Online Multi-Object Tracking and Segmentation with GMPHD Filter and Simple Affinity Fusion. arXiv preprint arXiv:2009.00100 2020.
5 PointTrack 61.50 % 76.50 % 81.00 % 77.40 % 77.40 % 93.80 % 48.90 % 9.30 % 176 632 0.045 s GPU @ 2.5 Ghz (Python)
Z. Xu, W. Zhang, X. Tan, W. Yang, H. Huang, S. Wen, E. Ding and L. Huang: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation. Proceedings of the European Conference on Computer Vision (ECCV) 2020.
6 OPITrack code 61.00 % 75.70 % 81.30 % 76.90 % 76.90 % 93.80 % 53.00 % 8.50 % 233 707 0.09 s 1 core @ 2.5 Ghz (C/C++)
Y. Gao, H. Xu, Y. Zheng, J. Li and X. Gao: An Object Point Set Inductive Tracker for Multi-Object Tracking and Segmentation. IEEE Transactions on Image Processing 2022.
7 SearchTrack code 60.60 % 78.90 % 78.20 % 80.70 % 80.80 % 93.00 % 60.40 % 4.40 % 390 714 0.19 s GPU @ 2.5 Ghz (Python)
Z. Tsai, Y. Tsai, C. Wang, H. Liao, Y. Lin and Y. Chuang: SearchTrack: Multiple Object Tracking with Object-Customized Search and Motion-Aware Features. BMVC 2022.
8 MOTSFusion code 58.70 % 72.90 % 81.50 % 74.20 % 74.20 % 94.10 % 47.40 % 15.60 % 279 534 0.44 s 1 core @ 2.5 Ghz (C/C++)
J. Luiten, T. Fischer and B. Leibe: Track to Reconstruct and Reconstruct to Track. IEEE Robotics and Automation Letters 2020.
9 EagerMOT code 58.10 % 72.00 % 81.50 % 73.30 % 73.30 % 94.10 % 43.30 % 13.70 % 270 633 0.011 s 4 cores @ 3.0 Ghz (Python)
A. Kim, A. Osep and L. Leal-Taixé: EagerMOT: 3D Multi-Object Tracking via Sensor Fusion. IEEE International Conference on Robotics and Automation (ICRA) 2021.
10 MPNTrackSeg code 57.30 % 77.00 % 76.00 % 77.70 % 77.70 % 91.90 % 56.30 % 9.60 % 162 620 0.08 s 8 cores @ 2.5 Ghz (Python)
G. Brasó, O. Cetintas and L. Leal-Taixé: Multi-Object Tracking and Segmentation Via Neural Message Passing. International Journal of Computer Vision 2022.
11 MG-MOTS 54.40 % 70.80 % 78.50 % 72.40 % 72.50 % 93.50 % 41.50 % 19.60 % 351 737 41 s GPU @ 2.5 Ghz (Python)
This is an online method (no batch processing).
J. Seong: Online and real-time mask-guided multi-person tracking and segmentation. Pattern Recognition Letters 2023.
12 TrackR-CNN code 47.30 % 66.10 % 74.60 % 68.40 % 68.40 % 91.80 % 45.60 % 13.30 % 481 861 0.5 s GPU @ 2.5 Ghz (Python)
P. Voigtlaender, M. Krause, A. Osep, J. Luiten, B. Sekar, A. Geiger and B. Leibe: MOTS: Multi-Object Tracking and Segmentation. CVPR 2019.
13 STC-Seg 42.60 % 57.70 % 75.60 % 59.60 % 59.60 % 92.60 % 27.80 % 18.50 % 408 780 0.25 s 1 core @ 2.5 Ghz (C/C++)
L. Yan, Q. Wang, S. Ma, J. Wang and C. Yu: Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2022.


Citation

If you use this dataset in your research, please cite:
@inproceedings{Voigtlaender2019CVPR,
  author = {Paul Voigtlaender and Michael Krause and Aljosa Osep and Jonathon Luiten and Berin Balachandar Gnana Sekar and Andreas Geiger and Bastian Leibe},
  title = {MOTS: Multi-Object Tracking and Segmentation},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2019}
}


