Method

MultiStream Detection Network for LiDAR and Camera based 3D Object Detection [MuStD]
[Anonymous Submission]

Submitted on 12 Sep. 2024 14:33 by [Anonymous Submission]

Running time: 67 ms
Environment: >8 cores @ 2.5 GHz (Python)

Method Description:
Multimodal approaches that fuse LiDAR and RGB
camera data have significant potential to improve
3D object detection accuracy. However, existing
fusion methods often fail to fully integrate 3D
geometric information with RGB spatial
information. To overcome this limitation, we
propose the MultiStream Detection (MuStD) network,
designed to optimize the integration of LiDAR and
RGB data. The MuStD network has three parallel
streams: the LiDAR-PillarNet stream, which
extracts sparse 2D pillar features; the
LiDAR-Height Compression stream, which produces
Bird's-Eye View (BEV) features; and the 3D
Multimodal (MM) stream, which combines RGB and
LiDAR features using UV mapping and polar
coordinate indexing. This architecture captures
both geometric and texture information,
addressing the challenges of feature fusion.
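
The description does not say how the MM stream implements UV mapping or polar coordinate indexing. As a rough illustration only, the sketch below assumes a pinhole camera model for the UV mapping and a uniform range/azimuth grid for the polar indexing. The function names, the intrinsics (fx, fy, cx, cy), and the bin sizes are hypothetical and are not taken from MuStD.

```python
import math


def project_to_uv(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates (u, v).

    Assumes a pinhole model with x right, y down, z forward.
    Returns None for points at or behind the image plane.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (u, v)


def polar_index(x, y, range_res, num_azimuth_bins):
    """Map a ground-plane point (x, y) to a (range_bin, azimuth_bin) pair.

    Azimuth is measured with atan2 and shifted into [0, 2*pi) before
    binning, so the bin index wraps around at the +/- pi boundary.
    """
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    range_bin = int(r // range_res)
    frac = (theta + math.pi) / (2.0 * math.pi)
    azimuth_bin = int(frac * num_azimuth_bins) % num_azimuth_bins
    return (range_bin, azimuth_bin)
```

Under these assumptions, a LiDAR point that has already been transformed into the camera frame can be given an image location with project_to_uv (to look up RGB features) and a polar cell with polar_index (to group nearby points).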
Parameters:
OPTIMIZER: adam_onecycle
LR: 0.01
WEIGHT_DECAY: 0.01
MOMENTUM: 0.9
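
The optimizer name adam_onecycle suggests Adam paired with a one-cycle learning-rate schedule, with LR: 0.01 as the peak value. The submission gives no further schedule details, so the sketch below is only one plausible shape: a cosine warm-up followed by a cosine anneal. The parameters pct_start, div_factor, and final_lr are hypothetical and do not appear in the listing above.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=0.01, pct_start=0.4,
                 div_factor=10.0, final_lr=0.0):
    """Learning rate at a given step of a cosine one-cycle schedule.

    The rate rises from max_lr / div_factor to max_lr over the first
    pct_start fraction of training, then anneals down to final_lr.
    """
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    if not 0.0 < pct_start < 1.0:
        raise ValueError("pct_start must lie strictly between 0 and 1")

    warm_steps = pct_start * total_steps
    start_lr = max_lr / div_factor

    if step < warm_steps:
        # Warm-up phase: cosine ramp from start_lr up to max_lr.
        t = step / warm_steps
        return start_lr + (max_lr - start_lr) * (1.0 - math.cos(math.pi * t)) / 2.0

    # Anneal phase: cosine decay from max_lr down to final_lr.
    t = (step - warm_steps) / (total_steps - warm_steps)
    return final_lr + (max_lr - final_lr) * (1.0 + math.cos(math.pi * t)) / 2.0
```

With these assumed defaults, the rate starts at 0.001, peaks at 0.01 after 40 % of the steps, and decays to 0 by the final step.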

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
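
For reference, the orientation similarity that AOS averages over recall levels is, at a single recall level, the mean of (1 + cos(delta)) / 2 over all detections, where delta is the difference between the predicted and ground-truth orientation and false positives contribute zero. The sketch below computes that single-recall term only; the function and argument names are illustrative and are not part of the benchmark's evaluation code.

```python
import math


def orientation_similarity(angle_diffs, is_true_positive):
    """Orientation similarity at one recall level.

    angle_diffs: orientation error in radians for each detection.
    is_true_positive: True where the detection matched a ground-truth box.
    Each true positive contributes (1 + cos(diff)) / 2, which lies in
    [0, 1]; false positives contribute 0. The sum is divided by the
    total number of detections.
    """
    if len(angle_diffs) != len(is_true_positive):
        raise ValueError("inputs must have the same length")
    if not angle_diffs:
        return 0.0
    total = sum(
        (1.0 + math.cos(diff)) / 2.0
        for diff, matched in zip(angle_diffs, is_true_positive)
        if matched
    )
    return total / len(angle_diffs)
```

Because each term is at most 1 and false positives score 0, this quantity cannot exceed the precision at the same recall level, which is why the Car (Orientation) values in the table sit slightly below the Car (Detection) values.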


Benchmark               Easy      Moderate  Hard
Car (Detection)         97.91 %   97.21 %   94.04 %
Car (Orientation)       97.88 %   97.03 %   93.74 %
Car (3D Detection)      91.03 %   84.36 %   80.78 %
Car (Bird's Eye View)   94.62 %   91.13 %   88.28 %


[Figure: 2D object detection results.]

[Figure: Orientation estimation results.]

[Figure: 3D object detection results.]

[Figure: Bird's eye view results.]



