The KITTI Vision Benchmark Suite

Method

S3MOT: Monocular 3D Object Tracking with Selective State Space Model [on] [S3MOT]
https://github.com/bytepioneerX/s3mot

Submitted on 28 Oct. 2024 11:47 by
Haikun Yan (Wuhan University)

Running time:		0.03 s
Environment:		1 core @ 2.5 GHz (Python)

Method Description:

We introduce S3MOT, a Selective State Space
model-based MOT method that efficiently infers 3D
motion and object associations from 2D images through
three core components: (i) Fully Convolutional,
One-stage Embedding (FCOE), which uses dense feature
maps for contrastive learning to enhance the
representational robustness of extracted Re-ID
features, mitigating challenges from occlusions and
perspective variations; (ii) VeloSSM, a specialized
SSM-based encoder-decoder structure, which addresses
scale inconsistency and refines motion predictions by
modeling temporal dependencies in velocity dynamics;
and (iii) Hungarian State Space Model (HSSM), which
employs input-adaptive spatiotemporal scanning and
merging, grounded in SSM principles, to associate
diverse tracking cues efficiently and ensure
reliable tracklet-detection assignments.
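
The tracklet-detection assignment that HSSM targets can be pictured with the classical problem it is named after: given a cost for pairing each tracklet with each detection, pick the one-to-one matching with the lowest total cost. The sketch below is not the authors' HSSM and is not taken from their repository. It solves a tiny instance by exhaustive search, purely as an illustration. The function name `match_tracklets` and the cost values are hypothetical.

```python
from itertools import permutations


def match_tracklets(cost):
    """Return the minimum-total-cost one-to-one matching.

    cost[t][d] is a hypothetical cost of pairing tracklet t with detection d.
    The result is a list of (tracklet_index, detection_index) pairs.
    Exhaustive search is only practical for very small matrices; the
    Hungarian algorithm solves the same problem in polynomial time.
    """
    n_trk = len(cost)
    n_det = len(cost[0]) if cost else 0
    if n_trk > n_det:
        # Enumerate over the smaller side, then flip the pairs back.
        transposed = [[cost[t][d] for t in range(n_trk)] for d in range(n_det)]
        return sorted((t, d) for d, t in match_tracklets(transposed))

    best_pairs, best_total = [], float("inf")
    for dets in permutations(range(n_det), n_trk):
        total = sum(cost[t][d] for t, d in enumerate(dets))
        if total < best_total:
            best_total, best_pairs = total, list(enumerate(dets))
    return best_pairs


# Illustrative costs (assumed values): a greedy pick of the single cheapest
# pair (0, 0) would force the expensive pair (1, 1); the optimal matching
# avoids that.
cost = [
    [0.10, 0.20],
    [0.15, 0.90],
]
print(match_tracklets(cost))  # [(0, 1), (1, 0)]
```

The example shows why a global assignment is preferred over greedy matching: the locally cheapest pair is not part of the best overall solution.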

Parameters:

N/A

Latex Bibtex:

@article{yan2025s3mot,
title={S3MOT: Monocular 3D Object Tracking with
Selective State Space Model},
author={Yan, Zhuohao and Feng, Shaoquan and Li,
Xingxing and Zhou, Yuxuan and Xia, Chunxi and Li,
Shengyu},
journal={arXiv preprint arXiv:2504.18068},
year={2025}
}

Detailed Results

From all 29 test sequences, our benchmark computes the HOTA tracking metrics (HOTA, DetA, AssA, DetRe, DetPr, AssRe, AssPr, LocA) [1] as well as the CLEAR MOT, MT/PT/ML, identity switches, and fragmentation [2,3] metrics. The tables below show all of these metrics.

Benchmark	HOTA	DetA	AssA	DetRe	DetPr	AssRe	AssPr	LocA
CAR	76.86 %	76.95 %	77.41 %	83.79 %	83.41 %	81.01 %	87.99 %	87.87 %
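
As a reading aid for the row above, the HOTA-family columns are related as follows in the HOTA paper [1]. Each quantity is computed at a localization threshold α and then averaged over 19 thresholds. The notation is a summary, not something stated on this page.

```latex
\begin{align}
\mathrm{DetA}_\alpha  &= \frac{|TP|}{|TP| + |FN| + |FP|}, &
\mathrm{DetRe}_\alpha &= \frac{|TP|}{|TP| + |FN|}, &
\mathrm{DetPr}_\alpha &= \frac{|TP|}{|TP| + |FP|}, \\
\mathrm{HOTA}_\alpha  &= \sqrt{\mathrm{DetA}_\alpha \cdot \mathrm{AssA}_\alpha}, &
\mathrm{HOTA} &= \frac{1}{19} \sum_{\alpha \in \{0.05,\, 0.10,\, \ldots,\, 0.95\}} \mathrm{HOTA}_\alpha .
\end{align}
```

Because every column is averaged over α separately, the reported HOTA need not equal the geometric mean of the reported DetA and AssA: here sqrt(76.95 × 77.41) ≈ 77.18, slightly above the reported 76.86 %.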

Benchmark	TP	FP	FN
CAR	32493	1899	2053

Benchmark	MOTA	MOTP	MODA	IDSW	sMOTA
CAR	86.93 %	86.60 %	88.51 %	543	74.27 %

Benchmark	MT rate	PT rate	ML rate	FRAG
CAR	85.08 %	13.69 %	1.23 %	239

Benchmark	# Dets	# Tracks
CAR	34546	1122
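
The MOTA and MODA values can be cross-checked against the count tables using the CLEAR MOT definitions [2]: MOTA = 1 - (FN + FP + IDSW) / GT and MODA = 1 - (FN + FP) / GT, with GT = TP + FN. The sketch below is a consistency check, not the benchmark's evaluation code, and the helper name `clear_mot` is hypothetical. One observation from the arithmetic, which the page itself does not state: the reported percentages are reproduced when 2053 is treated as the false-positive count and 1899 as the miss count, that is, with the FP and FN columns read in the opposite order. That reading also matches # Dets = 32493 + 2053 = 34546.

```python
def clear_mot(tp, fp, fn, idsw):
    """Return (MOTA, MODA) as fractions from CLEAR MOT counts."""
    gt = tp + fn  # number of ground-truth objects
    mota = 1.0 - (fn + fp + idsw) / gt
    moda = 1.0 - (fn + fp) / gt
    return mota, moda


# Counts with the FP/FN columns as labelled in the table.
mota, moda = clear_mot(tp=32493, fp=1899, fn=2053, idsw=543)
print(f"as labelled: MOTA {100 * mota:.2f} %, MODA {100 * moda:.2f} %")
# as labelled: MOTA 86.99 %, MODA 88.56 %

# Assumption: the two columns are swapped (FP = 2053, FN = 1899).
mota, moda = clear_mot(tp=32493, fp=2053, fn=1899, idsw=543)
print(f"swapped:     MOTA {100 * mota:.2f} %, MODA {100 * moda:.2f} %")
# swapped:     MOTA 86.93 %, MODA 88.51 %
```

Only the second reading matches the reported MOTA of 86.93 % and MODA of 88.51 %.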

[1] J. Luiten, A. Ošep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, B. Leibe: HOTA: A Higher Order Metric for Evaluating Multi-object Tracking. IJCV 2020.
[2] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[3] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

A project of Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago
