Object Tracking Evaluation 2012


The object tracking benchmark consists of 21 training sequences and 29 test sequences. Although we have labeled 8 different classes, only the classes 'Car' and 'Pedestrian' are evaluated in our benchmark, as only those classes have enough labeled instances for a comprehensive evaluation. The labeling process was performed in two steps: First, we hired a set of annotators to label 3D bounding boxes as tracklets in point clouds. Since a single 3D bounding box tracklet (with fixed dimensions) often fits a pedestrian badly, we additionally labeled the left/right boundaries of each object using Mechanical Turk. We also collected labels of each object's occlusion state and computed each object's truncation by backprojecting a car/pedestrian model into the image plane. We evaluate submitted results using the common metrics CLEAR MOT and MT/PT/ML. Since there is no single ranking criterion, we do not rank methods. Our development kit provides details about the data format as well as utility functions for reading and writing the label files.
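As an illustration of reading a label file, the following is a minimal sketch and not the development kit's own reader. The column layout it assumes (frame, track id, type, truncation, occlusion, alpha, 2D box as left/top/right/bottom, 3D dimensions, 3D location, rotation, optional score) is not stated on this page; the readme in the development kit is the authoritative reference. The names TrackLabel and parse_label_line are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TrackLabel:
    frame: int
    track_id: int
    obj_type: str
    truncated: int
    occluded: int
    alpha: float
    bbox: Tuple[float, float, float, float]  # (left, top, right, bottom), 0-based pixels
    score: Optional[float] = None            # only present in result files


def parse_label_line(line: str) -> TrackLabel:
    """Parse one whitespace-separated label line (assumed column layout)."""
    fields = line.split()
    if len(fields) < 17:
        raise ValueError(f"expected at least 17 fields, got {len(fields)}")
    return TrackLabel(
        frame=int(fields[0]),
        track_id=int(fields[1]),
        obj_type=fields[2],
        truncated=int(float(fields[3])),
        occluded=int(float(fields[4])),
        alpha=float(fields[5]),
        bbox=(float(fields[6]), float(fields[7]), float(fields[8]), float(fields[9])),
        # Fields 10..16 (3D dimensions, location, rotation) are skipped here
        # because the 2D evaluation described on this page does not need them.
        score=float(fields[17]) if len(fields) > 17 else None,
    )


# Made-up example line, for illustration only.
example = "0 2 Pedestrian 0 0 -2.52 1106.13 166.57 1204.47 323.41 1.71 0.58 0.95 6.30 1.50 11.80 -2.03"
label = parse_label_line(example)
```

The example line is invented for illustration and does not come from the dataset.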

The goal in the object tracking task is to estimate object tracklets for the classes 'Car' and 'Pedestrian'. We evaluate 2D 0-based bounding boxes in each image. We encourage participants to provide a confidence measure for every frame of each track. For evaluation we only consider detections/objects taller than 25 pixels in the image, and we do not count Vans as false positives for Cars or Sitting Persons as false positives for Pedestrians due to their similarity in appearance. As evaluation criteria we follow the CLEAR MOT [1] and Mostly-Tracked/Partly-Tracked/Mostly-Lost [2] metrics. We do not rank methods by a single criterion, but bold numbers indicate the best method for a particular metric. To make the methods comparable, the time for object detection is not included in the specified runtime.

[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.
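
To make the two metric families concrete, here is a minimal sketch based on the standard definitions in [1] and [2], not on the benchmark's evaluation script. MOTA is computed as one minus the ratio of all errors (false negatives, false positives, identity switches) to the number of ground truth objects. A ground truth trajectory counts as mostly tracked when it is covered for more than 80 % of its length and as mostly lost when covered for less than 20 %. The function names and the handling of the exact boundary values are assumptions.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy from error counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def classify_trajectory(tracked_frames: int, total_frames: int,
                        mt_thresh: float = 0.8, ml_thresh: float = 0.2) -> str:
    """Label one ground truth trajectory as MT, PT or ML by its tracked ratio."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    ratio = tracked_frames / total_frames
    if ratio > mt_thresh:
        return "MT"  # mostly tracked
    if ratio < ml_thresh:
        return "ML"  # mostly lost
    return "PT"      # partly tracked


# Made-up counts, for illustration only.
score = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
labels = [classify_trajectory(t, 10) for t in (9, 5, 1)]
```

The MT and ML columns in the tables below are the percentages of ground truth trajectories that fall into the respective category.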

Note 1: On 01.06.2015 we fixed several bugs in the evaluation script and in the calculation of the CLEAR MOT metrics. We also fixed some problems in the annotations of the training and test sets (almost completely occluded objects are no longer counted as false negatives). Furthermore, from now on vans are not counted as false positives for cars, and sitting persons are not counted as false positives for pedestrians. We have also improved the devkit with new illustrations and re-calculated the results for all methods. Please download the devkit and the annotations/labels with the improved ground truth for training again if you downloaded the files prior to 20.05.2015. Please consider reporting these new numbers for all future submissions. The last leaderboards right before the changes can be found here!

Note 2: On 27.11.2015 we fixed a bug in the evaluation script which prevented van labels from being loaded and led to 'Don't Care' areas being evaluated. Please download the devkit with the corrected evaluation script (if you want to evaluate on the training set) and consider reporting the new numbers for all future submissions. The leaderboard has been updated. The last leaderboards right before the changes can be found here!

Note 3: On 25.05.2016 we fixed a bug in the evaluation script that overcounted ignored detections. Thanks to Adrien Gaidon for reporting this bug. Please download the devkit with the corrected evaluation script (if you want to evaluate on the training set) and consider reporting the new numbers for all future submissions. The leaderboard has been updated. The last leaderboards right before the changes can be found here!

Note 4: On 25.04.2017 a major update of the evaluation script introduced the following changes: the counting of ignored detections was corrected; the handling of occlusion, truncation and minimum height was corrected; and the evaluation summary now includes additional statistics. In detail, submitted detections are ignored (i.e. not considered) if they are classified as a "neighboring class" (i.e. 'Van' for 'Car' or 'Cyclist' for 'Pedestrian'), if they do not exceed the minimum height of 25px or if they overlap a 'Don't Care' area by 0.5 or more. In contrast, ground truth objects are ignored if the occlusion exceeds occlusion level 2, if the truncation exceeds the maximum truncation of 0 or if they belong to a neighboring class (i.e. 'Van' for 'Car' or 'Cyclist' for 'Pedestrian'). We made sure that true positives, false positives, true negatives and false negatives are counted correctly. Finally, the evaluation summary now includes information about the number of ignored detections. We would like to thank the following researchers for detailed feedback: Adrien Gaidon, Jonathan D. Kuck and Jose M. Buenaposada. The last leaderboards right before the changes can be found here!
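
The ignore rules from Note 4 can be sketched as two predicates. This is an illustrative reading of the rules above, not the evaluation script itself. In particular, the overlap with a 'Don't Care' area is assumed here to be the intersection area divided by the detection's own area, which this page does not specify. All function and constant names are hypothetical.

```python
# Neighboring classes as listed in Note 4.
NEIGHBOR_CLASS = {"Car": "Van", "Pedestrian": "Cyclist"}
MIN_HEIGHT = 25.0        # pixels; boxes must exceed this height
MAX_OCCLUSION = 2        # ground truth above this occlusion level is ignored
MAX_TRUNCATION = 0       # ground truth above this truncation is ignored
DONT_CARE_OVERLAP = 0.5  # detections overlapping this much or more are ignored


def overlap_fraction(box, region):
    """Fraction of `box` covered by `region`; both are (left, top, right, bottom).

    Assumption: overlap is normalized by the area of `box`.
    """
    width = min(box[2], region[2]) - max(box[0], region[0])
    height = min(box[3], region[3]) - max(box[1], region[1])
    if width <= 0 or height <= 0:
        return 0.0
    area = (box[2] - box[0]) * (box[3] - box[1])
    return (width * height) / area if area > 0 else 0.0


def ignore_detection(obj_type, box, eval_class, dont_care_regions):
    """True if a submitted detection should be left out of the evaluation."""
    if obj_type == NEIGHBOR_CLASS[eval_class]:
        return True
    if box[3] - box[1] <= MIN_HEIGHT:
        return True
    return any(overlap_fraction(box, r) >= DONT_CARE_OVERLAP for r in dont_care_regions)


def ignore_ground_truth(obj_type, occluded, truncated, eval_class):
    """True if a ground truth object should be left out of the evaluation."""
    return (obj_type == NEIGHBOR_CLASS[eval_class]
            or occluded > MAX_OCCLUSION
            or truncated > MAX_TRUNCATION)
```

Ignored detections and ground truth objects count neither as true positives nor as errors, which is why their number is reported separately in the evaluation summary.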

Additional information used by the methods
  • Stereo: Method uses left and right (stereo) images
  • Laser Points: Method uses point clouds from Velodyne laser scanner
  • GPS: Method uses GPS information
  • Online: Online method (frame-by-frame processing, no latency)
  • Additional training data: Use of additional data sources for training (see details)

CAR


Method Setting Code MOTA MOTP MT ML IDS FRAG Runtime Environment
1 TuSimple
This is an online method (no batch processing).
86.62 % 83.97 % 72.46 % 6.77 % 293 501 0.6 s 1 core @ 2.5 Ghz (Matlab + C/C++)
2 NECMA 84.98 % 83.14 % 70.77 % 9.08 % 33 162 0.5 s 8 cores @ 2.5 Ghz (C/C++)
3 RRC-IIITH
This is an online method (no batch processing).
84.24 % 85.73 % 73.23 % 2.77 % 468 944 0.3 s 1 core @ 2.5 Ghz (C/C++)
4 RBPF 83.64 % 82.25 % 64.77 % 5.85 % 273 651 1 s 1 core @ 2.5 Ghz (Python)
5 IMMDP
This is an online method (no batch processing).
83.04 % 82.74 % 60.62 % 11.38 % 172 365 0.19 s 4 cores @ >3.5 Ghz (Matlab + C/C++)
6 DuEye
This is an online method (no batch processing).
80.64 % 83.52 % 61.85 % 5.85 % 356 991 0.15 s 1 core @ >3.5 Ghz (C/C++)
7 JCSTD
This is an online method (no batch processing).
80.57 % 81.81 % 56.77 % 7.38 % 61 643 0.11 s 1 core @ 2.5 Ghz (Matlab)
8 MCMOT-CPD 78.90 % 82.13 % 52.31 % 11.69 % 228 536 0.01 s 1 core @ 3.5 Ghz (Python)
B. Lee, E. Erdenee, S. Jin, M. Nam, Y. Jung and P. Rhee: Multi-class Multi-object Tracking Using Changing Point Detection. ECCVWORK 2016.
9 NOMT* 78.15 % 79.46 % 57.23 % 13.23 % 31 207 0.09 s 16 cores @ 2.5 Ghz (C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
10 wan
This is an online method (no batch processing).
78.07 % 82.83 % 51.38 % 13.38 % 24 235 0.1 s 1 core @ 2.5 Ghz (C/C++)
11 LP-SSVM* 77.63 % 77.80 % 56.31 % 8.46 % 62 539 0.02 s 1 core @ 2.5 Ghz (Matlab + C/C++)
S. Wang and C. Fowlkes: Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. International Journal of Computer Vision 2016.
12 CCF-MOT
This is an online method (no batch processing).
77.08 % 78.36 % 52.62 % 13.08 % 69 391 1.1 s 1 core @ 3.6 Ghz (MATLAB)
13 MDP
This is an online method (no batch processing).
code 76.59 % 82.10 % 52.15 % 13.38 % 130 387 0.9 s 8 cores @ 3.5 Ghz (Matlab + C/C++)
Y. Xiang, A. Alahi and S. Savarese: Learning to Track: Online Multi-Object Tracking by Decision Making. International Conference on Computer Vision (ICCV) 2015.
Y. Xiang, W. Choi, Y. Lin and S. Savarese: Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection. IEEE Winter Conference on Applications of Computer Vision (WACV) 2017.
14 DSM 76.15 % 83.42 % 60.00 % 8.31 % 296 868 0.1 s GPU @ 1.0 Ghz (Python)
15 SLP* 75.79 % 78.79 % 53.85 % 9.54 % 59 543 0.1 s 1 core @ 2.5 Ghz (Python + C/C++)
16 SCEA*
This is an online method (no batch processing).
75.58 % 79.39 % 53.08 % 11.54 % 104 448 0.06 s 1 core @ 4.0 Ghz (Matlab + C/C++)
J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
17 CIWT*
This method uses stereo information.
This is an online method (no batch processing).
code 75.39 % 79.25 % 49.85 % 10.31 % 165 660 0.28 s 1 core @ 2.5 Ghz (C/C++)
A. Osep, W. Mehner, M. Mathias and B. Leibe: Combined Image- and World-Space Tracking in Traffic Scenes. ICRA 2017.
18 NOMT-HM*
This is an online method (no batch processing).
75.20 % 80.02 % 50.00 % 13.54 % 105 351 0.09 s 8 cores @ 2.5 Ghz (Matlab + C/C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
19 SSP* code 72.72 % 78.55 % 53.85 % 8.00 % 185 932 0.6 s 1 core @ 2.7 Ghz (Python)
P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
20 mbodSSP*
This is an online method (no batch processing).
code 72.69 % 78.75 % 48.77 % 8.77 % 114 858 0.01 s 1 core @ 2.7 Ghz (Python)
P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
21 TENSOR 71.18 % 79.15 % 47.85 % 11.69 % 418 947 0.04 s 1 core @ 2.5 Ghz (Matlab + C/C++)
22 MBKF
This is an online method (no batch processing).
69.77 % 83.03 % 41.23 % 11.38 % 410 971 0.01 s GPU @ 2.5 Ghz (C/C++)
23 DCO-X* code 68.11 % 78.85 % 37.54 % 14.15 % 318 959 0.9 s 1 core @ >3.5 Ghz (Matlab + C/C++)
A. Milan, K. Schindler and S. Roth: Detection- and Trajectory-Level Exclusion in Multiple Object Tracking. CVPR 2013.
24 NOMT 66.60 % 78.17 % 41.08 % 25.23 % 13 150 0.09 s 16 cores @ 2.5 Ghz (C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
25 DBHM*
This is an online method (no batch processing).
66.22 % 77.72 % 52.15 % 8.15 % 1557 2060 0.15 s 4 cores @ 2.5 Ghz (C/C++)
26 RMOT*
This is an online method (no batch processing).
65.83 % 75.42 % 40.15 % 9.69 % 209 727 0.02 s 1 core @ 3.5 Ghz (Matlab)
J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
27 LP-SSVM 61.77 % 76.93 % 35.54 % 21.69 % 16 422 0.05 s 1 core @ 2.5 Ghz (Matlab + C/C++)
S. Wang and C. Fowlkes: Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. International Journal of Computer Vision 2016.
28 NOMT-HM
This is an online method (no batch processing).
61.17 % 78.65 % 33.85 % 28.00 % 28 241 0.09 s 8 cores @ 2.5 Ghz (Matlab + C/C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
29 ODAMOT
This is an online method (no batch processing).
59.23 % 75.45 % 27.08 % 15.54 % 389 1274 1 s 1 core @ 2.5 Ghz (Python)
A. Gaidon and E. Vig: Online Domain Adaptation for Multi-Object Tracking. British Machine Vision Conference (BMVC) 2015.
30 SSP code 57.85 % 77.64 % 29.38 % 24.31 % 7 704 0.6 s 1 core @ 2.7 Ghz (Python)
P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
31 SCEA
This is an online method (no batch processing).
57.03 % 78.84 % 26.92 % 26.62 % 17 461 0.05 s 1 core @ 4.0 Ghz (Matlab + C/C++)
J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
32 mbodSSP
This is an online method (no batch processing).
code 56.03 % 77.52 % 23.23 % 27.23 % 0 699 0.01 s 1 core @ 2.7 Ghz (Python)
P. Lenz, A. Geiger and R. Urtasun: FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation. International Conference on Computer Vision (ICCV) 2015.
33 TDCS
This is an online method (no batch processing).
55.38 % 75.20 % 23.23 % 21.85 % 118 961 0.06 s 1 core @ 2.0 Ghz (Matlab + C/C++)
34 TBD code 55.07 % 78.35 % 20.46 % 32.62 % 31 529 10 s 1 core @ 2.5 Ghz (Matlab + C/C++)
A. Geiger, M. Lauer, C. Wojek, C. Stiller and R. Urtasun: 3D Traffic Scene Understanding from Movable Platforms. Pattern Analysis and Machine Intelligence (PAMI) 2014.
H. Zhang, A. Geiger and R. Urtasun: Understanding High-Level Semantics by Modeling Traffic Patterns. International Conference on Computer Vision (ICCV) 2013.
35 RMOT
This is an online method (no batch processing).
52.42 % 75.18 % 21.69 % 31.85 % 50 376 0.01 s 1 core @ 3.5 Ghz (Matlab)
J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
36 CEM code 51.94 % 77.11 % 20.00 % 31.54 % 125 396 0.09 s 1 core @ >3.5 Ghz (Matlab + C/C++)
A. Milan, S. Roth and K. Schindler: Continuous Energy Minimization for Multitarget Tracking. IEEE TPAMI 2014.
37 MCF 45.92 % 78.25 % 14.92 % 37.23 % 21 581 0.01 s 1 core @ 2.5 Ghz (Python + C/C++)
L. Zhang, Y. Li and R. Nevatia: Global data association for multi-object tracking using network flows. CVPR.
38 HM
This is an online method (no batch processing).
43.85 % 78.34 % 12.46 % 39.54 % 12 571 0.01 s 1 core @ 2.5 Ghz (Python)
A. Geiger: Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms. 2013.
39 FMMOVT V2
This is an online method (no batch processing).
39.40 % 80.05 % 21.08 % 31.08 % 585 1122 0.05 s 1 core @ 2.5 Ghz (Python)
40 DP-MCF code 38.33 % 78.41 % 18.00 % 36.15 % 2716 3225 0.01 s 1 core @ 2.5 Ghz (Matlab)
H. Pirsiavash, D. Ramanan and C. Fowlkes: Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects. IEEE conference on Computer Vision and Pattern Recognition (CVPR) 2011.
41 DCO code 37.28 % 74.36 % 15.54 % 30.92 % 220 612 0.03 s 1 core @ >3.5 Ghz (Matlab + C/C++)
A. Andriyenko, K. Schindler and S. Roth: Discrete-Continuous Optimization for Multi-Target Tracking. CVPR 2012.
42 FMMOVT 31.88 % 77.68 % 21.38 % 34.92 % 511 930 0.05 s 1 core @ 2.5 Ghz (C/C++)
F. Alencar, C. Massera, D. Ridel and D. Wolf: Fast Metric Multi-Object Vehicle Tracking for Dynamical Environment Comprehension. Latin American Robotics Symposium (LARS) 2015.

PEDESTRIAN


Method Setting Code MOTA MOTP MT ML IDS FRAG Runtime Environment
1 TuSimple
This is an online method (no batch processing).
58.15 % 71.93 % 30.58 % 24.05 % 138 818 0.6 s 1 core @ 2.5 Ghz (Matlab + C/C++)
2 ET-MOT
This is an online method (no batch processing).
51.44 % 72.65 % 25.43 % 18.21 % 396 1405 0.7 s GPU @ 2.5 Ghz (Python)
3 MDP
This is an online method (no batch processing).
code 47.22 % 70.36 % 24.05 % 27.84 % 87 825 0.9 s 8 cores @ 3.5 Ghz (Matlab + C/C++)
Y. Xiang, A. Alahi and S. Savarese: Learning to Track: Online Multi-Object Tracking by Decision Making. International Conference on Computer Vision (ICCV) 2015.
Y. Xiang, W. Choi, Y. Lin and S. Savarese: Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection. IEEE Winter Conference on Applications of Computer Vision (WACV) 2017.
4 NOMT* 46.62 % 71.45 % 26.12 % 34.02 % 63 666 0.09 s 16 cores @ 2.5 Ghz (C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
5 MCMOT-CPD 45.94 % 72.44 % 20.62 % 34.36 % 143 764 0.01 s 1 core @ 3.5 Ghz (Python)
B. Lee, E. Erdenee, S. Jin, M. Nam, Y. Jung and P. Rhee: Multi-class Multi-object Tracking Using Changing Point Detection. ECCVWORK 2016.
6 CCF-MOT
This is an online method (no batch processing).
44.52 % 68.38 % 24.40 % 37.11 % 211 976 1.1 s 1 core @ 3.6 Ghz (MATLAB)
7 JCSTD
This is an online method (no batch processing).
44.20 % 72.09 % 16.49 % 33.68 % 53 917 0.11 s 1 core @ 2.5 Ghz (Matlab)
8 SCEA*
This is an online method (no batch processing).
43.91 % 71.86 % 16.15 % 43.30 % 56 641 0.06 s 1 core @ 4.0 Ghz (Matlab + C/C++)
J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
9 RMOT*
This is an online method (no batch processing).
43.77 % 71.02 % 19.59 % 41.24 % 153 748 0.02 s 1 core @ 3.5 Ghz (Matlab)
J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
10 LP-SSVM* 43.76 % 70.48 % 20.62 % 34.36 % 73 809 0.02 s 1 core @ 2.5 Ghz (Matlab + C/C++)
S. Wang and C. Fowlkes: Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. International Journal of Computer Vision 2016.
11 CIWT*
This method uses stereo information.
This is an online method (no batch processing).
code 43.37 % 71.44 % 13.75 % 34.71 % 112 901 0.28 s 1 core @ 2.5 Ghz (C/C++)
A. Osep, W. Mehner, M. Mathias and B. Leibe: Combined Image- and World-Space Tracking in Traffic Scenes. ICRA 2017.
12 MBKF
This is an online method (no batch processing).
42.88 % 72.33 % 27.49 % 20.96 % 501 1430 0.01 s GPU @ 2.5 Ghz (C/C++)
13 NECMA 42.67 % 72.51 % 30.58 % 39.18 % 49 529 0.5 s 8 cores @ 2.5 Ghz (C/C++)
14 NOMT-HM*
This is an online method (no batch processing).
39.26 % 71.14 % 21.31 % 41.92 % 184 863 0.09 s 8 cores @ 2.5 Ghz (Matlab + C/C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
15 LXT-MOT 39.16 % 72.31 % 14.43 % 35.40 % 233 905 0.3 s GPU @ 2.5 Ghz (Python)
16 NOMT 36.93 % 67.75 % 17.87 % 42.61 % 34 789 0.09 s 16 cores @ 2.5 Ghz (C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.
17 RMOT
This is an online method (no batch processing).
34.54 % 68.06 % 14.43 % 47.42 % 81 685 0.01 s 1 core @ 3.5 Ghz (Matlab)
J. Yoon, M. Yang, J. Lim and K. Yoon: Bayesian Multi-Object Tracking Using Motion Context from Multiple Objects. IEEE Winter Conference on Applications of Computer Vision (WACV) 2015.
18 LP-SSVM 33.33 % 67.38 % 12.37 % 45.02 % 72 818 0.05 s 1 core @ 2.5 Ghz (Matlab + C/C++)
S. Wang and C. Fowlkes: Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions. International Journal of Computer Vision 2016.
19 SCEA
This is an online method (no batch processing).
33.13 % 68.45 % 9.62 % 46.74 % 16 717 0.05 s 1 core @ 4.0 Ghz (Matlab + C/C++)
J. Yoon, C. Lee, M. Yang and K. Yoon: Online Multi-object Tracking via Structural Constraint Event Aggregation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2016.
20 CEM code 27.54 % 68.48 % 8.93 % 51.89 % 96 608 0.09 s 1 core @ >3.5 Ghz (Matlab + C/C++)
A. Milan, S. Roth and K. Schindler: Continuous Energy Minimization for Multitarget Tracking. IEEE TPAMI 2014.
21 NOMT-HM
This is an online method (no batch processing).
27.49 % 67.99 % 15.12 % 50.52 % 73 732 0.09 s 8 cores @ 2.5 Ghz (Matlab + C/C++)
W. Choi: Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. ICCV 2015.

Related Datasets

  • TUD Datasets: "TUD Multiview Pedestrians" and "TUD Stadtmitte" Datasets.
  • PETS 2009: The Datasets for the "Performance Evaluation of Tracking and Surveillance" Workshop.
  • EPFL Terrace: Multi-camera pedestrian videos.
  • ETHZ Sequences: Inner City Sequences from Mobile Platforms.

Citation

When using this dataset in your research, we will be happy if you cite us:
@INPROCEEDINGS{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}
}


