Method

S3MOT: Monocular 3D Object Tracking with Selective State Space Model [S3MOT]
[Anonymous Submission]

Submitted on 28 Oct. 2024 11:47 by
[Anonymous Submission]

Running time: 0.03 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
We introduce S3MOT, a Selective State Space model-based MOT method that efficiently infers 3D motion and object associations from 2D images through three core components: (i) Fully Convolutional One-stage Embedding (FCOE), which uses dense feature maps for contrastive learning to enhance the representational robustness of extracted Re-ID features, mitigating challenges from occlusions and perspective variations; (ii) VeloSSM, a specialized SSM-based encoder-decoder structure that addresses scale inconsistency and refines motion predictions by modeling temporal dependencies in velocity dynamics; and (iii) the Hungarian State Space Model (HSSM), which employs input-adaptive spatiotemporal scanning and merging, grounded in SSM principles, to associate diverse tracking cues efficiently and ensure reliable tracklet-detection assignments.
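The tracklet-detection assignment that HSSM targets is classically posed as a minimum-cost one-to-one matching over an affinity-derived cost matrix. A minimal brute-force sketch with a hypothetical cost matrix (a real tracker would use the Hungarian algorithm, or HSSM as proposed here):

```python
from itertools import permutations

def best_assignment(cost):
    """Exhaustively find the minimum-cost one-to-one matching between
    tracklets (rows) and detections (columns). Only feasible for tiny
    examples; production trackers use the Hungarian algorithm or, in
    S3MOT, HSSM."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

# Hypothetical 3x3 cost matrix (lower = better tracklet-detection match).
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.6, 0.3],
]
assign, total = best_assignment(cost)  # diagonal matching, total cost 0.6
```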
Parameters:
N/A
LaTeX BibTeX:

Detailed Results

From all 29 test sequences, our benchmark computes the commonly used tracking metrics: CLEAR MOT, MT/PT/ML, identity switches (IDS), and fragmentations (FRAG) [1,2]. The tables below show all of these metrics.
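For reference, MOTA from the CLEAR MOT framework [1] aggregates misses, false positives, and identity switches relative to the number of ground-truth objects. A sketch with hypothetical counts (not the benchmark's internal tallies):

```python
def mota(fn, fp, ids, num_gt):
    """MOTA = 1 - (FN + FP + IDS) / GT, per Bernardin & Stiefelhagen [1]."""
    return 1.0 - (fn + fp + ids) / num_gt

# Hypothetical counts for illustration only.
score = mota(fn=10, fp=5, ids=2, num_gt=100)  # 1 - 17/100 = 0.83
```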


Benchmark MOTA MOTP MODA MODP
CAR 86.96 % 86.56 % 88.65 % 89.55 %

Benchmark recall precision F1 TP FP FN FAR #objects #trajectories
CAR 95.26 % 94.90 % 95.08 % 37711 2028 1875 18.23 % 44931 1433
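The detection-level scores in the table follow directly from its TP/FP/FN counts; a quick consistency check:

```python
tp, fp, fn = 37711, 2028, 1875  # counts from the CAR row above

recall = tp / (tp + fn)         # 37711 / 39586
precision = tp / (tp + fp)      # 37711 / 39739
f1 = 2 * precision * recall / (precision + recall)

# Rounded to two decimals these reproduce the table:
# recall 95.26 %, precision 94.90 %, F1 95.08 %
```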

Benchmark MT PT ML IDS FRAG
CAR 84.77 % 14.00 % 1.23 % 582 762



[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

