The KITTI Vision Benchmark Suite

Method

Cascade TWiX [on] [C-TWiX]
https://github.com/Guepardow/TWiX/

Submitted on 16 Nov. 2024 16:18 by
Mehdi Miah (Polytechnique Montréal)

Running time:		0.01 s
Environment:		8 cores @ >3.5 GHz (Python)

Method Description:

No CMC, no visual appearances, no ReID, no BEV, no LIDAR, no depth estimation; only spatio-temporal coordinates (xmin, ymin, xmax, ymax, t).

Detections: PermaTrack
Pipeline: Online cascade matching (cf. C-BIoU)

We focused on the association step by training a Transformer-based network to predict the similarity matrix between past tracks and current detections.

Parameters:

Maximal temporal gap at STA/LTA: 0.0 s / 0.8 s
Past window size: 0.4 s
Future window size: 1/fps s
Matching association thresholds at STA/LTA: 0.4 / -0.6

Tracking maximum age: 0.8 s
Tracking minimal score: 50 %
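
The two-stage cascade configured by these parameters can be illustrated with a minimal sketch. It assumes that STA and LTA are two successive association stages (STA for tracks with no temporal gap, LTA for tracks unobserved for up to 0.8 s), that a higher similarity means a more likely match, and that a pair is accepted when its similarity reaches the stage threshold. The function names are hypothetical, and the greedy pairing is a simple stand-in, not necessarily the assignment solver used by C-TWiX.

```python
def greedy_match(sim, threshold):
    """Pair rows (tracks) with columns (detections) by descending similarity.

    Greedy selection is a stand-in for the method's actual assignment solver.
    """
    candidates = sorted(
        ((s, i, j)
         for i, row in enumerate(sim)
         for j, s in enumerate(row)
         if s >= threshold),
        reverse=True,
    )
    used_rows, used_cols, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_rows and j not in used_cols:
            matches.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return matches


def cascade_match(sim_sta, sim_lta, thr_sta=0.4, thr_lta=-0.6):
    """Run the first stage (STA), then the second stage (LTA) on leftover detections."""
    sta = greedy_match(sim_sta, thr_sta)
    taken = {j for _, j in sta}
    # Detections claimed in the first stage are masked out of the second stage.
    masked = [
        [float("-inf") if j in taken else s for j, s in enumerate(row)]
        for row in sim_lta
    ]
    lta = greedy_match(masked, thr_lta)
    return sta, lta


# Toy similarity matrices: 2 active tracks and 1 lost track versus 3 detections.
sim_sta = [[0.9, 0.1, -0.8],
           [0.2, 0.3, -0.9]]
sim_lta = [[0.8, -0.2, -0.7]]
sta_matches, lta_matches = cascade_match(sim_sta, sim_lta)
```

In this toy case, detection 0 goes to active track 0 in the first stage, detection 1 goes to the lost track in the second stage, and detection 2 stays unmatched (in a full tracker it would typically start a new track if its score is high enough).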

Latex Bibtex:

@article{miah2024learningdata,
title = {Learning data association for multi-object tracking using only coordinates},
journal = {Pattern Recognition},
volume = {160},
pages = {111169},
year = {2025},
issn = {0031-3203},
doi = {10.1016/j.patcog.2024.111169},
url = {https://www.sciencedirect.com/science/article/pii/S0031320324009208},
author = {Mehdi Miah and Guillaume-Alexandre Bilodeau and Nicolas Saunier},
keywords = {Tracking, Transformer, Data association, Motion, Multi-object tracking}
}

Detailed Results

From all 29 test sequences, our benchmark computes the commonly used tracking metrics: CLEAR MOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below show all of these metrics.

Benchmark	MOTA	MOTP	MODA	MODP
CAR	90.03 %	85.62 %	91.03 %	88.24 %
PEDESTRIAN	64.32 %	75.52 %	65.34 %	92.28 %

Benchmark	recall	precision	F1	TP	FP	FN	FAR	#objects	#trajectories
CAR	92.83 %	99.14 %	95.88 %	35907	313	2772	2.81 %	44173	1011
PEDESTRIAN	71.66 %	92.24 %	80.66 %	16732	1407	6617	12.65 %	21306	460
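
As a sanity check on the table above, recall, precision and F1 can be recomputed from the TP, FP and FN counts using the standard definitions. This is a minimal sketch; the function name is illustrative and not part of the KITTI evaluation code.

```python
def detection_scores(tp, fp, fn):
    """Return (recall, precision, F1) in percent from raw detection counts."""
    recall = tp / (tp + fn)        # share of ground-truth objects that were found
    precision = tp / (tp + fp)     # share of reported detections that were correct
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x, 2) for x in (recall, precision, f1))


# Counts taken from the CAR and PEDESTRIAN rows of the table.
car = detection_scores(tp=35907, fp=313, fn=2772)
pedestrian = detection_scores(tp=16732, fp=1407, fn=6617)
print("CAR:", car)
print("PEDESTRIAN:", pedestrian)
```

The recomputed values should agree with the recall, precision and F1 columns to two decimals.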

Benchmark	MT	PT	ML	IDS	FRAG
CAR	82.15 %	14.92 %	2.92 %	344	620
PEDESTRIAN	42.61 %	39.86 %	17.53 %	236	896

[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

A project of Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago