Method

MOTSFusion
https://github.com/tobiasfshr/MOTSFusion

Submitted on 28 Mar. 2019 13:43 by
Jonathon Luiten (RWTH Aachen University)

Running time: 0.44 s
Environment: GPU (Python)

Method Description:
First, we build tracklets by calculating a segmentation mask for each detection and linking these masks over time using optical flow. We then fuse these tracklets into 3D object reconstructions using depth and ego-motion estimates. These 3D reconstructions are used to estimate the 3D motion of each object, which in turn is used to merge tracklets into long-term tracks, bridging occlusion gaps of up to 20 frames. This also allows us to fill in missing detections.
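The tracklet-merging step described above can be illustrated with a toy sketch: two tracklets are merged into one long-term track when a constant-velocity extrapolation of the first tracklet's 3D position lands near the start of the second, within the 20-frame occlusion limit. All function names and the distance threshold here are illustrative assumptions, not the authors' implementation.

```python
MAX_GAP = 20        # maximum occlusion gap to bridge, per the description
MERGE_DIST = 2.0    # illustrative 3D distance threshold in metres (assumption)

def predict(tracklet, frame):
    """Extrapolate the tracklet's last 3D position to `frame`,
    assuming constant velocity over its last two observations."""
    (f1, p1), (f2, p2) = tracklet[-2], tracklet[-1]
    v = [(b - a) / (f2 - f1) for a, b in zip(p1, p2)]
    return [p + vi * (frame - f2) for p, vi in zip(p2, v)]

def can_merge(a, b):
    """True if tracklet `b` plausibly continues tracklet `a`."""
    gap = b[0][0] - a[-1][0]
    if not 0 < gap <= MAX_GAP:
        return False
    pred = predict(a, b[0][0])
    dist = sum((p - q) ** 2 for p, q in zip(pred, b[0][1])) ** 0.5
    return dist <= MERGE_DIST

# Two tracklets of (frame, (x, y, z)) observations, separated by a 5-frame gap.
a = [(0, (0.0, 0.0, 10.0)), (1, (1.0, 0.0, 10.0))]   # moving +1 m/frame in x
b = [(6, (6.2, 0.0, 10.0)), (7, (7.2, 0.0, 10.0))]
print(can_merge(a, b))  # extrapolated position (6, 0, 10) is 0.2 m from b's start
```

In the full method, this motion-based association also predicts object positions inside the bridged gap, which is what allows missing detections to be filled in.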
Parameters:
Detections = RRC
Latex Bibtex:
@article{luiten2019MOTSFusion,
title={Track to Reconstruct and Reconstruct to Track},
author={Luiten, Jonathon and Fischer, Tobias and Leibe, Bastian},
journal={arXiv preprint arXiv:1910.00130},
year={2019},
publisher={arXiv}
}

Detailed Results

For all 29 test sequences, our benchmark computes the commonly used tracking metrics: CLEARMOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below show all of these metrics.
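For reference, the central CLEAR MOT metric is defined in [1] as accuracy over all frames t:

MOTA = 1 - (Σ_t FN_t + FP_t + IDSW_t) / Σ_t GT_t

where FN, FP, IDSW and GT are per-frame false negatives, false positives, identity switches and ground-truth objects. Note that the benchmark applies its own matching and aggregation conventions, so the published MOTA may differ slightly from a naive computation over the summary counts below.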


Benchmark MOTA MOTP MODA MODP
CAR 84.83 % 85.21 % 85.63 % 88.28 %

Benchmark recall precision F1 TP FP FN FAR #objects #trajectories
CAR 88.76 % 98.02 % 93.16 % 33634 681 4260 6.12 % 38414 1155

Benchmark MT PT ML IDS FRAG
CAR 73.08 % 24.15 % 2.77 % 275 759
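The derived detection metrics in the middle table follow directly from the raw counts: recall, precision and F1 are simple functions of TP, FP and FN.

```python
# Raw counts for the CAR class from the table above.
TP, FP, FN = 33634, 681, 4260

recall = TP / (TP + FN)                              # fraction of GT objects found
precision = TP / (TP + FP)                           # fraction of outputs that are correct
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(f"recall={recall:.2%} precision={precision:.2%} F1={f1:.2%}")
# → recall=88.76% precision=98.02% F1=93.16%, matching the table
```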



[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

