Method

Joint Homography and Interacting Tracking [JHIT]


Submitted on 25 Jul. 2024 07:42 by
Paul Claasen (University of Pretoria)

Running time: 0.01 s
Environment: 1 core @ 3.5 GHz (Python)

Method Description:
By modelling the camera projection matrix as part of the track state vectors, JHIT removes the explicit influence of camera motion compensation techniques on predicted track position states, which is prevalent in previous approaches. Building on this, static and dynamic camera motion models are combined through an IMM filter. To incorporate image-plane information, a simple bounding box motion model predicts bounding box positions. JHIT dynamically mixes bounding-box-based BIoU scores with ground-plane-based Mahalanobis distances in an IMM-like fashion to perform association. Finally, JHIT employs dynamic process and measurement noise estimation techniques.
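
For illustration only (not the authors' code): a minimal Python sketch of the IMM-like mixing of bounding-box BIoU scores and ground-plane Mahalanobis distances described above. The function names, the use of \Omega as a normalising gate, and the role of the buffer parameter b follow the parameter list below, but the exact formulas are assumptions.

import numpy as np

def buffered_iou(box_a, box_b, b=0.0):
    """BIoU: IoU of boxes expanded by a buffer fraction b of their width/height.
    Boxes are (x1, y1, x2, y2)."""
    def expand(box):
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        return (x1 - b * w, y1 - b * h, x2 + b * w, y2 + b * h)
    ax1, ay1, ax2, ay2 = expand(box_a)
    bx1, by1, bx2, by2 = expand(box_b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance of ground-plane measurement z from the
    predicted measurement z_pred, with innovation covariance S."""
    d = np.asarray(z) - np.asarray(z_pred)
    return float(d @ np.linalg.solve(S, d))

def mixed_cost(box_det, box_pred, z_det, z_pred, S, mu_img, omega=100.0, b=0.0):
    """IMM-like mixture of image-plane and ground-plane association costs.
    mu_img is the current probability of the image-plane (bounding-box) model;
    (1 - mu_img) weights the ground-plane model. omega normalises the
    Mahalanobis term (assumption about how \Omega is used)."""
    cost_img = 1.0 - buffered_iou(box_det, box_pred, b)
    cost_gp = min(mahalanobis_sq(z_det, z_pred, S) / omega, 1.0)
    return mu_img * cost_img + (1.0 - mu_img) * cost_gp

A matching step would then run a standard assignment solver (e.g. the Hungarian algorithm) on the resulting detection-track cost matrix.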
Parameters:
\sigma_x=14.843421875000002, \sigma_y=16.249625,
\alpha_1=0.83375, \alpha_2=0.5, d_{conf}=0.5,
d_{high}=0.29999999999999993, \Omega=100.0, b=0.0,
\alpha_3=0.9065624999999999, p_{I,I}=0.9,
p_{G,G}=0.9, p_{s,s}=0.71625, p_{d,d}=0.71625
Latex Bibtex:
@misc{claasen2024interactingmultiplemodelbasedjoint,
      title={Interacting Multiple Model-based Joint Homography Matrix and Multiple Object State Estimation},
      author={Paul Johannes Claasen and Johan Pieter de Villiers},
      year={2024},
      eprint={2409.02562},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.02562},
}

Detailed Results

For all 29 test sequences, our benchmark computes the commonly used tracking metrics: CLEAR MOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below show all of these metrics.
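
For reference (standard definitions from [1], not specific to this submission): MOTA aggregates all error types as
MOTA = 1 - \frac{\sum_t (\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t)}{\sum_t \mathrm{GT}_t},
while MOTP is the average alignment (bounding-box overlap) over all matched object-hypothesis pairs.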


Benchmark    MOTA     MOTP     MODA     MODP
CAR          90.29 %  85.61 %  90.78 %  88.33 %
PEDESTRIAN   63.13 %  74.60 %  65.13 %  92.14 %

Benchmark    recall   precision  F1       TP     FP    FN    FAR      #objects  #trajectories
CAR          94.51 %  97.32 %    95.89 %  36985  1020  2150  9.17 %   47304     861
PEDESTRIAN   74.87 %  88.80 %    81.24 %  17481  2205  5867  19.82 %  24447     358

Benchmark    MT       PT       ML       IDS  FRAG
CAR          84.77 %  12.00 %  3.23 %   168  251
PEDESTRIAN   45.02 %  35.74 %  19.24 %  463  964

[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.
