Method

S3MOT: Monocular 3D Object Tracking with Selective State Space Model [on] [S3MOT]
[Anonymous Submission]

Submitted on 28 Oct. 2024 11:47 by [Anonymous Submission]

Running time: 0.03 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
We introduce S3MOT, a Selective State Space model-based
MOT method that efficiently infers 3D motion and object
associations from 2D images through three core
components: (i) Fully Convolutional, One-stage Embedding
(FCOE), which uses dense feature maps for contrastive
learning to make the extracted Re-ID features more
robust to occlusions and perspective variations;
(ii) VeloSSM, a specialized SSM-based encoder-decoder
structure that addresses scale inconsistency and refines
motion predictions by modeling temporal dependencies in
velocity dynamics; and (iii) Hungarian State Space Model
(HSSM), which employs input-adaptive spatiotemporal
scanning and merging, grounded in SSM principles, to
associate diverse tracking cues efficiently and ensure
reliable tracklet-detection assignments.
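
To make the notion of a tracklet-detection assignment concrete, the following is a minimal sketch of the classical assignment problem that the Hungarian algorithm solves. It is not the HSSM module described above, which is a learned model; it only illustrates how several cues can be blended into one cost matrix and matched one-to-one. The function names, the cue weight `w_app`, the gating threshold `max_cost`, and the toy distances are all hypothetical, and a brute-force search stands in for the Hungarian algorithm to keep the example free of dependencies.

```python
from itertools import permutations


def combined_cost(app_dist, motion_dist, w_app=0.5):
    """Blend two cue matrices (rows: tracklets, columns: detections).

    w_app is a hypothetical weight on the appearance cue.
    """
    return [
        [w_app * a + (1.0 - w_app) * m for a, m in zip(app_row, mot_row)]
        for app_row, mot_row in zip(app_dist, motion_dist)
    ]


def best_assignment(cost, max_cost=0.7):
    """Exhaustively find the minimum-cost one-to-one matching.

    Assumes there are at least as many detections as tracklets.
    Pairs whose cost exceeds max_cost (a hypothetical gate) are dropped.
    Only suitable for tiny matrices; the Hungarian algorithm solves the
    same problem in polynomial time.
    """
    n_trk, n_det = len(cost), len(cost[0])
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n_det), n_trk):
        total = sum(cost[t][d] for t, d in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return [(t, d) for t, d in enumerate(best_perm) if cost[t][d] <= max_cost]


# Toy example: 2 tracklets, 3 detections (made-up distances in [0, 1]).
app_dist = [[0.1, 0.9, 0.8], [0.7, 0.2, 0.9]]
motion_dist = [[0.2, 0.8, 0.9], [0.9, 0.1, 0.7]]
matches = best_assignment(combined_cost(app_dist, motion_dist))
```

In this toy case tracklet 0 pairs with detection 0 and tracklet 1 with detection 1, leaving detection 2 unmatched; a tracker would typically start a new tracklet from such a leftover detection.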
Parameters:
N/A

Detailed Results

Across all 29 test sequences, our benchmark computes the HOTA tracking metrics (HOTA, DetA, AssA, DetRe, DetPr, AssRe, AssPr, LocA) [1] as well as the CLEAR MOT metrics, the MT/PT/ML rates, identity switches, and fragmentations [2,3]. The tables below show all of these metrics.


Benchmark  HOTA     DetA     AssA     DetRe    DetPr    AssRe    AssPr    LocA
CAR        76.86 %  76.95 %  77.41 %  83.79 %  83.41 %  81.01 %  87.99 %  87.87 %
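
As a rough sanity check on the HOTA row: in [1], HOTA at a single localization threshold is the geometric mean of DetA and AssA, and the final score averages over thresholds. Applying the geometric mean to the already-averaged DetA and AssA is therefore only an approximation. The sketch below, using the values from the table, shows how close that approximation comes; the variable names are illustrative.

```python
import math

# Values copied from the HOTA table above, as fractions.
det_a = 0.7695
ass_a = 0.7741
reported_hota = 0.7686

# Geometric mean of the threshold-averaged scores (approximation only).
approx_hota = math.sqrt(det_a * ass_a)
gap = approx_hota - reported_hota

print(f"approximate HOTA: {approx_hota:.4f}, reported: {reported_hota:.4f}")
```

The approximation gives about 77.2 %, within half a percentage point of the reported 76.86 %; the difference is expected because the benchmark averages the per-threshold HOTA values instead of combining averaged DetA and AssA.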

Benchmark  TP     FP    FN
CAR        32493  1899  2053

Benchmark  MOTA     MOTP     MODA     IDSW  sMOTA
CAR        86.93 %  86.60 %  88.51 %  543   74.27 %

Benchmark  MT rate  PT rate  ML rate  FRAG
CAR        85.08 %  13.69 %  1.23 %   239

Benchmark  # Dets  # Tracks
CAR        34546   1122
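
The CLEAR MOT percentages can be cross-checked against the raw counts. The sketch below assumes the standard definitions from [2], MOTA = 1 - (FN + FP + IDSW) / GT and MODA = 1 - (FN + FP) / GT, with GT = TP + FN; the function name is illustrative. It evaluates the counts twice: once as the columns are labeled, and once with the FP and FN values exchanged.

```python
def clear_mot(tp, fp, fn, idsw):
    """Return (MOTA, MODA) under the standard CLEAR MOT definitions."""
    gt = tp + fn  # number of ground-truth boxes
    mota = 1.0 - (fn + fp + idsw) / gt
    moda = 1.0 - (fn + fp) / gt
    return mota, moda


TP, IDSW = 32493, 543

# Counts read exactly as the table labels them.
as_labeled = clear_mot(TP, fp=1899, fn=2053, idsw=IDSW)

# Same counts with the FP and FN columns exchanged.
swapped = clear_mot(TP, fp=2053, fn=1899, idsw=IDSW)

for name, (mota, moda) in (("as labeled", as_labeled), ("swapped", swapped)):
    print(f"{name}: MOTA {mota:.2%}, MODA {moda:.2%}")
```

Read as labeled, the counts give a MOTA of about 86.99 %, slightly off the reported 86.93 %. With FP = 2053 and FN = 1899, the formulas reproduce the reported MOTA (86.93 %) and MODA (88.51 %), and TP + FP equals the 34546 detections listed above. This is an arithmetic observation only; the page itself does not state how the columns are ordered.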


[1] J. Luiten, A. Ošep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, B. Leibe: HOTA: A Higher Order Metric for Evaluating Multi-object Tracking. IJCV 2020.
[2] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[3] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

