The KITTI Vision Benchmark Suite

Method

SAM2-based Multi-object Tracking and Segmentation using Zero-shot Learning [Seg2Track-SAM2]
github.com/hcmr-lab/Seg2Track-SAM2

Submitted on 9 Sep. 2025 16:24 by
Diogo Mendonça (Universidade de Coimbra)

Running time:		1 s
Environment:		GPU @ 1.5 GHz (Python)

Method Description:

This method extends SAM2 to multi-object tracking
and segmentation in a zero-shot setting. Objects are
initialized with a detector and refined over time
through object reinforcement, ensuring consistent
masks across frames without extra training.

Parameters:

detection_threshold=0.5
removal_threshold=0.1
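
The page does not say how the two thresholds are applied. The sketch below shows one plausible reading, under stated assumptions: detections scoring at least detection_threshold start new objects, and tracked objects whose score falls below removal_threshold are dropped. The Candidate class, its score field, and both helper functions are hypothetical names for illustration and are not taken from the Seg2Track-SAM2 code.

```python
from dataclasses import dataclass

DETECTION_THRESHOLD = 0.5  # value from the Parameters list above
REMOVAL_THRESHOLD = 0.1    # value from the Parameters list above


@dataclass
class Candidate:
    """Hypothetical container: an object id with a confidence in [0, 1]."""
    object_id: int
    score: float


def select_new_objects(detections):
    """Assumed semantics: keep detections confident enough to start a track."""
    return [d for d in detections if d.score >= DETECTION_THRESHOLD]


def prune_tracks(tracks):
    """Assumed semantics: drop tracks whose confidence has decayed too far."""
    return [t for t in tracks if t.score >= REMOVAL_THRESHOLD]


detections = [Candidate(1, 0.9), Candidate(2, 0.3)]
tracks = [Candidate(7, 0.4), Candidate(8, 0.05)]
print([d.object_id for d in select_new_objects(detections)])
print([t.object_id for t in prune_tracks(tracks)])
```

Under this reading there is a gap between the two values: an object needs a score of at least 0.5 to be created but is only discarded below 0.1, so an established track can survive frames in which its score is low.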

LaTeX BibTeX:

@misc{mendonça2025seg2tracksam2sam2basedmultiobjecttracking,
title={Seg2Track-SAM2: SAM2-based Multi-object Tracking and Segmentation for Zero-shot Generalization},
author={Diogo Mendonça and Tiago Barros and Cristiano Premebida and Urbano J. Nunes},
year={2025},
eprint={2509.11772},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.11772},
}

Detailed Results

From all 29 test sequences, our benchmark computes the commonly used tracking metrics (adapted for the segmentation case): CLEARMOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below show all of these metrics.

Benchmark	sMOTSA	MOTSA	MOTSP	MODSA	MODSP
CAR	68.70 %	81.00 %	86.20 %	81.20 %	88.90 %
PEDESTRIAN	49.70 %	68.10 %	77.40 %	68.50 %	92.70 %

Benchmark	recall	precision	F1	TP	FP	FN	FAR	#objects	#trajectories
CAR	88.70 %	92.20 %	90.40 %	32616	2757	4144	24.80 %	47510	989
PEDESTRIAN	81.60 %	86.20 %	83.80 %	16882	2711	3815	24.40 %	27345	417
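
The recall, precision, and F1 columns can be checked against the TP, FP, and FN counts in the same table. The short sketch below uses the standard definitions; the function name detection_scores is only illustrative.

```python
def detection_scores(tp: int, fp: int, fn: int) -> dict:
    """Return recall, precision and F1 as fractions in [0, 1]."""
    recall = tp / (tp + fn)            # share of ground-truth masks found
    precision = tp / (tp + fp)         # share of predicted masks that are correct
    f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of the two
    return {"recall": recall, "precision": precision, "f1": f1}


# (TP, FP, FN) copied from the table above.
counts = {"CAR": (32616, 2757, 4144), "PEDESTRIAN": (16882, 2711, 3815)}

for name, (tp, fp, fn) in counts.items():
    s = detection_scores(tp, fp, fn)
    print(f"{name}: recall {100 * s['recall']:.1f} %, "
          f"precision {100 * s['precision']:.1f} %, F1 {100 * s['f1']:.1f} %")
# CAR: recall 88.7 %, precision 92.2 %, F1 90.4 %
# PEDESTRIAN: recall 81.6 %, precision 86.2 %, F1 83.8 %
```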

Benchmark	MT	PT	ML	IDS	FRAG
CAR	71.60 %	25.70 %	2.70 %	95	302
PEDESTRIAN	56.30 %	29.60 %	14.10 %	79	326
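
The accuracy figures in the first table can be recomputed from the counts in the other two. The sketch below assumes the usual MOTS-style definitions: MODSA = (TP - FP) / M, MOTSA = (TP - FP - IDS) / M, and sMOTSA = (sum of IoU over TP - FP - IDS) / M, where M is the number of ground-truth masks. Two further assumptions are made: M is taken as TP + FN, and the IoU sum is approximated as MOTSP x TP using the rounded MOTSP value, so sMOTSA is only reproduced to about the printed precision. The function names are illustrative.

```python
def modsa(tp, fp, num_gt):
    """Detection accuracy: true positives minus false positives, per GT mask."""
    return (tp - fp) / num_gt


def motsa(tp, fp, ids, num_gt):
    """Tracking accuracy: additionally penalises identity switches."""
    return (tp - fp - ids) / num_gt


def smotsa(sum_iou, fp, ids, num_gt):
    """Soft variant: replaces the TP count with the summed IoU of the TPs."""
    return (sum_iou - fp - ids) / num_gt


# TP, FP, FN, IDS and MOTSP copied from the tables above.
rows = {
    "CAR":        dict(tp=32616, fp=2757, fn=4144, ids=95, motsp=0.862),
    "PEDESTRIAN": dict(tp=16882, fp=2711, fn=3815, ids=79, motsp=0.774),
}

for name, r in rows.items():
    num_gt = r["tp"] + r["fn"]        # assumption: M = TP + FN
    sum_iou = r["motsp"] * r["tp"]    # approximation from rounded MOTSP
    print(f"{name}: "
          f"sMOTSA {100 * smotsa(sum_iou, r['fp'], r['ids'], num_gt):.1f} %, "
          f"MOTSA {100 * motsa(r['tp'], r['fp'], r['ids'], num_gt):.1f} %, "
          f"MODSA {100 * modsa(r['tp'], r['fp'], num_gt):.1f} %")
# CAR: sMOTSA 68.7 %, MOTSA 81.0 %, MODSA 81.2 %
# PEDESTRIAN: sMOTSA 49.7 %, MOTSA 68.1 %, MODSA 68.5 %
```

With these assumptions the recomputed values agree with the reported sMOTSA, MOTSA, and MODSA columns for both classes.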


[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.


A project of Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago
