Method

Instance Diversity-Enriched Active Learning for Monocular 3D Detection [IDEAL-M3D]


Submitted on 15 May 2025 22:40 by
Johannes Meier (TU Munich)

Running time: 0.04 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
Monocular 3D detection suffers from expensive 3D labeling, which makes active learning (AL) crucial for efficient annotation. Existing AL methods for this task are inefficient for two reasons: they select entire images, so non-informative instances get labeled as well, and they rely on uncertainty-based selection, which biases the model toward distant, depth-ambiguous objects. To address this, the paper proposes IDEAL-M3D, the first instance-level AL pipeline for monocular 3D detection. IDEAL-M3D uses an explicitly diverse, fast-to-train ensemble to drive selection and achieves significant resource savings, reaching 3D Average Precision (AP3D) similar to or better than that of the same detector trained on the whole dataset.
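The difference between image-level and instance-level selection can be illustrated with a small sketch. This is not the IDEAL-M3D selection rule, which this page does not specify; it only contrasts the two granularities under a fixed annotation budget. The per-instance informativeness scores, the image and instance names, and both function names are hypothetical.

```python
def select_images(scores, budget):
    """Image-level selection: rank images by their best instance score and
    label every instance in each chosen image until the budget is used up."""
    ranked = sorted(scores, key=lambda img: max(scores[img].values()), reverse=True)
    chosen = []
    for img in ranked:
        if len(chosen) + len(scores[img]) > budget:
            continue  # the whole image does not fit into the remaining budget
        chosen.extend((img, inst) for inst in scores[img])
    return chosen


def select_instances(scores, budget):
    """Instance-level selection: rank all instances globally, label the top ones."""
    flat = [(s, img, inst) for img, insts in scores.items() for inst, s in insts.items()]
    flat.sort(key=lambda t: -t[0])
    return [(img, inst) for _, img, inst in flat[:budget]]


def total_score(scores, chosen):
    """Sum of the scores of the chosen (image, instance) pairs."""
    return sum(scores[img][inst] for img, inst in chosen)


# Hypothetical scores: higher means more informative to label.
scores = {
    "img_a": {"car_0": 0.9, "car_1": 0.1, "car_2": 0.1},
    "img_b": {"ped_0": 0.8, "ped_1": 0.2},
    "img_c": {"cyc_0": 0.7},
}

by_image = select_images(scores, budget=3)
by_instance = select_instances(scores, budget=3)
```

With a budget of three labels, the image-level variant spends two of them on low-scoring instances that happen to share an image with a high-scoring one, while the instance-level variant spends all three on the highest-scoring instances across images.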
Parameters:
Baseline: MonoLSS
LaTeX BibTeX:
@InProceedings{Meier_2026_WACV,
    author    = {Meier, Johannes and Günther, Florian and Marin, Riccardo and Dhaouadi, Oussema and Kaiser, Jacques and Cremers, Daniel},
    title     = {{IDEAL-M3D:} Instance Diversity-Enriched Active Learning for Monocular 3D Detection},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2026}
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
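As a rough illustration of how the two metrics relate, the sketch below computes an orientation similarity term and an interpolated average over recall levels. It assumes the commonly cited form of AOS, in which each true-positive detection contributes (1 + cos Δθ) / 2 and each false positive contributes 0, and it assumes evenly spaced recall levels; the number of levels and the function names are assumptions, not taken from this page or from the official evaluation code.

```python
import math


def orientation_similarity(angle_errors):
    """Mean orientation similarity over all detections at one recall level.

    angle_errors holds the angle difference in radians for each matched
    detection and None for each false positive (which contributes 0).
    """
    if not angle_errors:
        return 0.0
    terms = [0.0 if d is None else (1.0 + math.cos(d)) / 2.0 for d in angle_errors]
    return sum(terms) / len(terms)


def interpolated_average(recalls, values, num_points=40):
    """Average of max_{r' >= r} value(r') over evenly spaced recall levels r.

    Passing precision values gives an AP-style number; passing orientation
    similarity values gives an AOS-style number.
    """
    total = 0.0
    for i in range(num_points):
        level = (i + 1) / num_points
        candidates = [v for r, v in zip(recalls, values) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / num_points
```

Under this definition every detection contributes at most 1, so the orientation score cannot exceed the corresponding detection score, which is consistent with the Detection and Orientation rows of the results table.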


Benchmark                       Easy       Moderate   Hard
Car (Detection)                 96.32 %    93.51 %    85.98 %
Car (Orientation)               96.22 %    93.27 %    85.51 %
Car (3D Detection)              27.06 %    18.87 %    16.73 %
Car (Bird's Eye View)           35.33 %    25.44 %    22.25 %
Pedestrian (Detection)          83.73 %    68.50 %    63.35 %
Pedestrian (Orientation)        75.99 %    61.20 %    56.16 %
Pedestrian (3D Detection)       13.73 %     8.50 %     7.52 %
Pedestrian (Bird's Eye View)    16.74 %    10.65 %     8.83 %
Cyclist (Detection)             75.87 %    58.75 %    50.33 %
Cyclist (Orientation)           68.10 %    50.82 %    43.58 %
Cyclist (3D Detection)           6.93 %     4.12 %     3.71 %
Cyclist (Bird's Eye View)        8.85 %     5.39 %     4.87 %


[Figure] 2D object detection results.
[Figure] Orientation estimation results.
[Figure] 3D object detection results.
[Figure] Bird's eye view results.





