Method

EPNet++ [EPNet++]


Submitted on 9 Aug. 2021 11:25 by
Zhe Liu (Huazhong University of Science and Technology)

Running time: 0.1 s
Environment: GPU @ 2.5 GHz (Python)

Method Description:
Recently, fusing LiDAR point clouds and camera
images to improve the performance and robustness of
3D object detection has received increasing
attention, as the two modalities are naturally
complementary.
In this paper, we propose EPNet++ for multi-modal
3D object detection by introducing a novel Cascade
Bi-directional Fusion (CB-Fusion) module and a
Multi-Modal Consistency (MC) loss.
More concretely, the CB-Fusion module enhances
point features with rich semantic information
absorbed from the image features in a cascade
bi-directional interaction fusion manner, leading
to more powerful and discriminative feature
representations.
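
As a rough illustration of the idea only (the module name, the sigmoid gating, and the pooled point-to-image path below are assumptions made for this sketch, not the authors' implementation), a single bi-directional fusion step in PyTorch could look as follows:

# Minimal sketch of one bi-directional fusion step, assuming point features of
# shape (B, N, C), an image feature map of shape (B, C, H, W), and per-point
# projection coordinates uv normalized to [-1, 1].
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiDirectionalFusionStep(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Gates decide how much of the other modality each branch absorbs.
        self.img_to_point_gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())
        self.point_to_img_gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.point_proj = nn.Linear(channels, channels)
        self.img_proj = nn.Conv2d(channels, channels, 1)

    def forward(self, point_feats, img_feats, uv):
        # point_feats: (B, N, C); img_feats: (B, C, H, W); uv: (B, N, 2) in [-1, 1]
        # Image -> point: sample image features at the projected point locations.
        sampled = F.grid_sample(img_feats, uv.unsqueeze(1), align_corners=True)  # (B, C, 1, N)
        sampled = sampled.squeeze(2).transpose(1, 2)  # (B, N, C)
        gate_p = self.img_to_point_gate(torch.cat([point_feats, sampled], dim=-1))
        point_out = point_feats + gate_p * self.point_proj(sampled)

        # Point -> image: broadcast a pooled point descriptor over the image grid
        # (a crude stand-in for projecting per-point features back onto the image).
        pooled = point_out.max(dim=1).values  # (B, C)
        pooled_map = pooled[:, :, None, None].expand(-1, -1, *img_feats.shape[2:])
        gate_i = self.point_to_img_gate(torch.cat([img_feats, pooled_map], dim=1))
        img_out = img_feats + gate_i * self.img_proj(pooled_map)
        return point_out, img_out

In EPNet++ such fusion is applied in a cascade across multiple feature scales; the sketch shows only a single step with a deliberately simplified point-to-image path.
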
The MC loss explicitly enforces consistency
between the scores predicted by the two modalities,
yielding more comprehensive and reliable confidence
estimates. Experimental results on the KITTI,
JRDB and SUN-RGBD datasets demonstrate the
superiority of EPNet++ over state-of-the-art
methods.
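
As a hedged sketch only (the symmetric-KL form and the names point_logits / img_logits are assumptions, not necessarily the exact formulation in the paper), a consistency term between the per-proposal classification scores of the two branches could be written as:

# Penalize disagreement between the class distributions predicted by the point
# branch and the image branch; the symmetric KL form is illustrative.
import torch
import torch.nn.functional as F

def consistency_loss(point_logits: torch.Tensor, img_logits: torch.Tensor) -> torch.Tensor:
    # point_logits, img_logits: (N, num_classes) classification logits.
    p = F.log_softmax(point_logits, dim=-1)
    q = F.log_softmax(img_logits, dim=-1)
    kl_pq = F.kl_div(q, p, log_target=True, reduction="batchmean")  # KL(P || Q)
    kl_qp = F.kl_div(p, q, log_target=True, reduction="batchmean")  # KL(Q || P)
    return 0.5 * (kl_pq + kl_qp)
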
Parameters:
None
Latex Bibtex:
@ARTICLE{9983516,
  author={Liu, Zhe and Huang, Tengteng and Li, Bingling and Chen, Xiwu and Wang, Xi and Bai, Xiang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={EPNet++: Cascade Bi-Directional Fusion for Multi-Modal 3D Object Detection},
  year={2022},
  volume={},
  number={},
  pages={1-18},
  doi={10.1109/TPAMI.2022.3228806}}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
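
For reference, the average orientation similarity follows the KITTI definition of Geiger et al.; in the commonly cited 11-recall-point form (the live evaluation samples recall more densely),

\[
  \mathrm{AOS} = \frac{1}{11} \sum_{r \in \{0, 0.1, \dots, 1\}} \max_{\tilde{r} \ge r} s(\tilde{r}),
  \qquad
  s(r) = \frac{1}{|D(r)|} \sum_{i \in D(r)} \frac{1 + \cos \Delta_\theta^{(i)}}{2}\, \delta_i,
\]

where D(r) is the set of detections at recall r, \Delta_\theta^{(i)} is the angular difference between the estimated and ground-truth orientation of detection i, and \delta_i = 1 if detection i has been matched to a ground-truth box (0 for redundant detections).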


Benchmark Easy Moderate Hard
Car (Detection) 96.73 % 95.17 % 92.10 %
Car (Orientation) 96.70 % 95.00 % 91.82 %
Car (3D Detection) 91.37 % 81.96 % 76.71 %
Car (Bird's Eye View) 95.41 % 89.00 % 85.73 %
Pedestrian (Detection) 68.58 % 58.10 % 55.58 %
Pedestrian (Orientation) 51.89 % 43.29 % 40.98 %
Pedestrian (3D Detection) 52.79 % 44.38 % 41.29 %
Pedestrian (Bird's Eye View) 56.24 % 48.47 % 45.73 %
Cyclist (Detection) 80.27 % 68.30 % 63.00 %
Cyclist (Orientation) 79.81 % 67.26 % 61.75 %
Cyclist (3D Detection) 76.15 % 59.71 % 53.67 %
Cyclist (Bird's Eye View) 78.57 % 62.94 % 56.62 %


[Figures: 2D object detection results, orientation estimation results, 3D object detection results, and bird's eye view results.]



