A Baseline for 3D Multi-Object Tracking [on] [la] [AB3DMOT]

Submitted on 6 Jul. 2020 05:43 by
Xinshuo Weng (Carnegie Mellon University)

Running time: 0.0047 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
3D multi-object tracking (MOT) is an essential
component technology for many real-time
applications such as autonomous driving or
assistive robotics. However, recent works on 3D
MOT tend to focus on developing accurate systems,
giving less regard to computational cost and
system complexity. In contrast, this work
proposes a simple yet accurate real-time baseline
3D MOT system. We use an off-the-shelf 3D object
detector to obtain oriented 3D bounding boxes from
the LiDAR point cloud. Then, a combination of a 3D
Kalman filter and the Hungarian algorithm is used
for state estimation and data association.
Although our baseline system is a straightforward
combination of standard methods, we obtain
state-of-the-art results. To evaluate our baseline
system, we propose a new 3D MOT extension to the
official KITTI 2D MOT evaluation along with two
new metrics. Our proposed baseline method for 3D
MOT establishes new state-of-the-art performance
on 3D MOT for KITTI, improving the 3D MOTA from
72.23 of prior art to 76.47. Surprisingly, when we
project our 3D tracking results to the 2D image
plane and compare against published 2D MOT
methods, our system places 2nd on the official
KITTI leaderboard. Also, our proposed 3D MOT
method runs at a rate of 214.7 FPS, 65 times
faster than the state-of-the-art 2D MOT system.
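
The predict-then-associate loop described above can be illustrated with a minimal sketch. It is not the authors' implementation and simplifies heavily: only the constant-velocity prediction step of a Kalman filter is shown (no covariance, no measurement update), the association cost is the Euclidean distance between box centers (an assumption; the description above does not state the cost), and the optimal assignment is found by exhaustive search over permutations, which gives the same minimum-cost matching as the Hungarian algorithm for tiny inputs but does not scale. The names `predict`, `associate`, and `max_dist` are hypothetical.

```python
from itertools import permutations
from math import dist, inf


def predict(track, dt=1.0):
    """Constant-velocity prediction: move the track center by velocity * dt."""
    (x, y, z), (vx, vy, vz) = track["center"], track["velocity"]
    return (x + vx * dt, y + vy * dt, z + vz * dt)


def associate(predicted, detections, max_dist=2.0):
    """Match predicted track centers to detection centers at minimum total cost.

    Returns (matches, unmatched_tracks, unmatched_detections) as index lists.
    max_dist is a hypothetical gating threshold in the same unit as the centers.
    """
    n, m = len(predicted), len(detections)
    size = max(n, m)
    # Pad to a square matrix; dummy rows/columns cost nothing.
    cost = [[dist(predicted[i], detections[j]) if i < n and j < m else 0.0
             for j in range(size)] for i in range(size)]
    best, best_cost = (), inf
    # Exhaustive stand-in for the Hungarian algorithm (small inputs only).
    for perm in permutations(range(size)):
        total = sum(cost[i][perm[i]] for i in range(size))
        if total < best_cost:
            best, best_cost = perm, total
    # Keep only real pairs that pass the distance gate.
    matches = [(i, j) for i, j in enumerate(best)
               if i < n and j < m and cost[i][j] <= max_dist]
    matched_tracks = {i for i, _ in matches}
    matched_dets = {j for _, j in matches}
    unmatched_tracks = [i for i in range(n) if i not in matched_tracks]
    unmatched_dets = [j for j in range(m) if j not in matched_dets]
    return matches, unmatched_tracks, unmatched_dets


# Illustrative data, not taken from the benchmark.
tracks = [
    {"center": (0.0, 0.0, 0.0), "velocity": (1.0, 0.0, 0.0)},
    {"center": (10.0, 0.0, 0.0), "velocity": (0.0, 1.0, 0.0)},
]
detections = [(10.2, 1.1, 0.0), (0.9, 0.1, 0.0), (30.0, 0.0, 0.0)]
predicted = [predict(t) for t in tracks]
matches, lost_tracks, new_dets = associate(predicted, detections)
print(matches, lost_tracks, new_dets)
```

In a full tracker, each matched detection would then update its track's Kalman state, unmatched detections would start new tracks, and tracks left unmatched for several frames would be dropped.
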
LaTeX BibTeX:
archivePrefix = {arXiv},
arxivId = {1907.03961},
author = {Weng, Xinshuo and Kitani, Kris},
eprint = {1907.03961},
journal = {arXiv:1907.03961},
title = {{A Baseline for 3D Multi-Object Tracking}},
url = {},
year = {2019}

Detailed Results

From all 29 test sequences, our benchmark computes the commonly used tracking metrics: CLEAR MOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below show all of these metrics.

Benchmark MOTA MOTP MODA MODP
CAR 83.84 % 85.24 % 83.86 % 88.25 %
PEDESTRIAN 39.63 % 64.87 % 40.37 % 90.27 %

Benchmark recall precision F1 TP FP FN FAR #objects #trajectories
CAR 88.32 % 96.98 % 92.44 % 33954 1059 4491 9.52 % 38354 908
PEDESTRIAN 49.85 % 84.73 % 62.77 % 11639 2098 11707 18.86 % 14394 420
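
As a sanity check, the recall, precision, and F1 columns follow from the TP, FP, and FN counts in the same rows using the standard definitions. The sketch below assumes those standard formulas; the function name `detection_scores` is hypothetical and not part of the benchmark's evaluation code.

```python
def detection_scores(tp, fp, fn):
    """Return (recall, precision, F1) as fractions in [0, 1]."""
    recall = tp / (tp + fn)        # share of ground-truth objects that were found
    precision = tp / (tp + fp)     # share of reported objects that were correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return recall, precision, f1


# TP, FP, FN counts copied from the table above.
rows = {
    "CAR": (33954, 1059, 4491),
    "PEDESTRIAN": (11639, 2098, 11707),
}
for name, (tp, fp, fn) in rows.items():
    recall, precision, f1 = detection_scores(tp, fp, fn)
    print(f"{name}: recall {recall:.2%}, precision {precision:.2%}, F1 {f1:.2%}")
```

Rounded to two decimals, these values agree with the recall, precision, and F1 entries reported for both classes.
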

Benchmark MT PT ML IDS FRAG
CAR 66.92 % 21.69 % 11.38 % 9 224
PEDESTRIAN 16.84 % 41.58 % 41.58 % 170 940


[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.
