Method

MonoLSS: Learnable Sample Selection For Monocular 3D Detection [MonoLSS]


Submitted on 2 Jan. 2023 08:24 by
Jia Jinrang (Baidu Inc.)

Running time: 0.04 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
In the field of autonomous driving, monocular 3D
detection is a critical task that estimates the 3D
properties (depth, dimension, and orientation) of
objects in a single RGB image. Previous works have
used features in a heuristic way to learn 3D
properties, without considering that inappropriate
features could have adverse effects. In this paper,
we introduce sample selection, so that only suitable
samples are used to train the regression of the 3D
properties. To select samples adaptively, we propose
a Learnable Sample Selection (LSS) module, which is
based on Gumbel-Softmax and a relative-distance
sample divider. The LSS module works under a warm-up
strategy, which improves training stability.
Additionally, since the LSS module dedicated to 3D
property sample selection relies on object-level
features, we further develop a data augmentation
method named MixUp3D to enrich 3D property samples.
MixUp3D conforms to imaging principles and does not
introduce ambiguity.
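The description names Gumbel-Softmax as the basis of the LSS module. The following is a minimal NumPy sketch of the generic Gumbel-Softmax relaxation only, not the authors' implementation: the function name `gumbel_softmax`, the temperature `tau`, and the 49-position example are assumptions made for illustration, and the relative-distance sample divider and warm-up strategy are not modelled here.

```python
import numpy as np


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Return a relaxed one-hot sample over the last axis of `logits`.

    Generic Gumbel-Softmax; `tau` (temperature) is an illustrative parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)
    # Gumbel(0, 1) noise by inverse transform sampling: g = -log(-log(u)).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    # Perturb the logits, scale by the temperature, then apply a stable softmax.
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)
    exp_y = np.exp(y)
    return exp_y / exp_y.sum(axis=-1, keepdims=True)


# Hypothetical usage: soft selection weights over 49 candidate samples
# (for example a 7x7 object-level feature map, an assumed size).
rng = np.random.default_rng(0)
scores = rng.normal(size=49)
weights = gumbel_softmax(scores, tau=0.5, rng=rng)
print(weights.shape, float(weights.sum()))
```

Lower values of `tau` push the weights toward a hard one-hot choice, while higher values spread them more evenly. Because the noise enters additively, the sampled weights remain differentiable with respect to the logits when the same computation is written in an autodiff framework.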
Parameters:
None
LaTeX BibTeX:
@inproceedings{monolss,
  title={MonoLSS: Learnable Sample Selection For Monocular 3D Detection},
  author={Li, Zhenjia and Jia, Jinrang and Shi, Yifeng},
  booktitle={International Conference on 3D Vision},
  year={2024}
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
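To make the orientation metric concrete, the sketch below computes the orientation similarity at a single recall level as it is commonly defined for AOS: each true-positive detection contributes (1 + cos Δθ) / 2, where Δθ is the difference between predicted and ground-truth orientation, each false positive contributes 0, and the sum is divided by the number of detections. AOS then averages these values over recall levels in the same way AP averages precision. The function name and argument layout are assumptions for illustration; this is not the official evaluation code.

```python
import math


def orientation_similarity(delta_thetas, is_true_positive):
    """Orientation similarity at one recall level.

    delta_thetas: angle differences in radians, one per detection.
    is_true_positive: booleans, one per detection; false positives score 0.
    """
    if not delta_thetas:
        return 0.0
    total = 0.0
    for delta, tp in zip(delta_thetas, is_true_positive):
        if tp:
            # 1.0 for a perfect orientation, 0.0 for an estimate off by pi.
            total += (1.0 + math.cos(delta)) / 2.0
    return total / len(delta_thetas)


# Hypothetical example: one perfect estimate, one flipped by 180 degrees,
# and one false positive.
print(orientation_similarity([0.0, math.pi, 0.3], [True, True, False]))
```

Since every detection contributes at most 1, this value never exceeds the precision at the same recall level, which is why AOS is bounded above by AP.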


Benchmark                       Easy      Moderate  Hard
Car (Detection)                 96.19 %   93.42 %   83.62 %
Car (Orientation)               95.99 %   93.11 %   83.14 %
Car (3D Detection)              26.11 %   19.15 %   16.94 %
Car (Bird's Eye View)           34.89 %   25.95 %   22.59 %
Pedestrian (Detection)          82.88 %   67.78 %   60.87 %
Pedestrian (Orientation)        75.13 %   60.28 %   53.85 %
Pedestrian (3D Detection)       17.09 %   11.27 %   10.00 %
Pedestrian (Bird's Eye View)    18.40 %   12.34 %   10.54 %
Cyclist (Detection)             74.54 %   54.63 %   47.98 %
Cyclist (Orientation)           65.31 %   47.09 %   41.74 %
Cyclist (3D Detection)           7.23 %    4.34 %    3.92 %
Cyclist (Bird's Eye View)        8.88 %    5.52 %    4.98 %
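Because AOS is bounded above by AP, one rough way to read the Detection and Orientation rows together is the ratio AOS / AP, which indicates how much of the detection score survives once orientation error is taken into account. The short sketch below computes that ratio from the figures in the table above; the ratio is an informal reading aid assumed here for illustration, not a metric reported by the benchmark.

```python
# (AP, AOS) in percent for Easy, Moderate, Hard, copied from the table above.
results = {
    "Car": ((96.19, 93.42, 83.62), (95.99, 93.11, 83.14)),
    "Pedestrian": ((82.88, 67.78, 60.87), (75.13, 60.28, 53.85)),
    "Cyclist": ((74.54, 54.63, 47.98), (65.31, 47.09, 41.74)),
}


def aos_to_ap_ratio(ap, aos):
    """Element-wise AOS / AP for the three difficulty levels."""
    return tuple(o / d for d, o in zip(ap, aos))


ratios = {name: aos_to_ap_ratio(ap, aos) for name, (ap, aos) in results.items()}
for name, values in ratios.items():
    print(name, " ".join(f"{v:.3f}" for v in values))
```

Read this way, the car orientation estimates are close to the ceiling set by detection, while the pedestrian and cyclist ratios are noticeably lower.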


2D object detection results. [figure omitted]

Orientation estimation results. [figure omitted]

3D object detection results. [figure omitted]

Bird's eye view results. [figure omitted]



