Method

Fine-grained Multi-level Fusion for Anti-occlusion Monocular 3D Object Detection [FMF-occlusion-net]


Submitted on 13 Oct. 2022 06:47 by
He Liu (Tsinghua University)

Running time: 0.16 s
Environment: 1 core @ 2.5 GHz (Python + C/C++)

Method Description:
We propose a deep fine-grained multi-level fusion
architecture for monocular 3D object detection,
together with an additionally designed
anti-occlusion optimization process. We integrate
monocular 3D features with a pseudo-LiDAR filter
generation network between the fine-grained
multi-level layers. The network exploits its
inherent multi-scale structure and promotes the
flow of depth and semantic information across
stages, so the resulting features incorporate
more reliable depth information. We further
propose a novel loss function that alleviates the
occlusion problem.
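
A minimal sketch of the multi-level fusion idea described above, assuming a generic two-branch design in which image features and pseudo-LiDAR-derived depth features are merged at each pyramid level. The module name MultiLevelFusion, the channel sizes, and the concatenation-plus-1x1-convolution fusion are illustrative assumptions, not the authors' released implementation.

# Illustrative sketch only: fuse per-level image features with
# pseudo-LiDAR-derived depth features before the detection head.
# Module names, channel sizes, and the concat + 1x1 conv fusion
# are assumptions, not the authors' released code.
import torch
import torch.nn as nn

class MultiLevelFusion(nn.Module):
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        # One 1x1 projection per pyramid level to merge the two modalities.
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for c in channels
        )

    def forward(self, image_feats, depth_feats):
        # image_feats / depth_feats: lists of per-scale tensors of
        # shape (N, C_l, H_l, W_l) with matching channel counts C_l.
        fused = []
        for conv, img, dep in zip(self.fuse, image_feats, depth_feats):
            fused.append(conv(torch.cat([img, dep], dim=1)))
        return fused

if __name__ == "__main__":
    levels = [(64, 64, 160), (128, 32, 80), (256, 16, 40)]
    img = [torch.randn(1, c, h, w) for c, h, w in levels]
    dep = [torch.randn(1, c, h, w) for c, h, w in levels]
    print([f.shape for f in MultiLevelFusion()(img, dep)])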
Parameters:
We pad and resize the original image to 1720 ×
512 for training and testing. We use the
stochastic gradient descent (SGD) optimizer with
a momentum of 0.9 and a weight decay of 0.0005.
Training runs for 40,000 iterations using the
"poly" learning rate policy with a base learning
rate of 0.01 and a power of 0.9.
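
The "poly" policy scales the base learning rate by (1 - iter / max_iter)^power. Below is a minimal sketch of this schedule together with the SGD settings above; the helper name poly_lr and the placeholder model are ours, not from the paper.

# Sketch of the "poly" learning rate policy with the SGD settings
# listed above; `poly_lr` and the placeholder model are illustrative.
import torch

base_lr, power, max_iter = 0.01, 0.9, 40000

def poly_lr(iteration):
    # lr = base_lr * (1 - iteration / max_iter) ** power
    return base_lr * (1.0 - iteration / max_iter) ** power

model = torch.nn.Linear(10, 1)  # stand-in for the detection network
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=0.0005)

for it in range(max_iter):
    for group in optimizer.param_groups:
        group["lr"] = poly_lr(it)
    # ... forward pass, loss computation, and optimizer.step() go here ...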
Latex Bibtex:
@article{liu2022fine,
  title={Fine-grained Multi-level Fusion for Anti-occlusion Monocular 3D Object Detection},
  author={Liu, He and Liu, Huaping and Wang, Yikai and Sun, Fuchun and Huang, Wenbing},
  journal={IEEE Transactions on Image Processing},
  year={2022},
  publisher={IEEE}
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
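
For reference, AOS weights each matched detection by an orientation-similarity term before averaging over recall levels, as in the KITTI evaluation. A hedged sketch of that per-detection term is below; the function name orientation_similarity is ours, and matching and recall sampling follow the official devkit, not this snippet.

# Per-detection orientation similarity used by AOS: a detection matched
# to a ground-truth box contributes (1 + cos(delta_yaw)) / 2, an
# unmatched detection contributes 0. Illustrative only; the official
# KITTI devkit defines the full evaluation.
import numpy as np

def orientation_similarity(pred_yaw, gt_yaw, matched):
    delta = np.asarray(pred_yaw) - np.asarray(gt_yaw)
    sim = (1.0 + np.cos(delta)) / 2.0
    return np.where(np.asarray(matched, dtype=bool), sim, 0.0)

# Nearly aligned boxes score ~1, opposite orientations score ~0.
print(orientation_similarity([0.05, 1.57], [0.0, -1.57], [True, True]))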


Benchmark                       Easy      Moderate  Hard
Car (Detection)                 92.33 %   78.21 %   61.58 %
Car (Orientation)               91.51 %   75.95 %   59.55 %
Car (3D Detection)              20.28 %   13.12 %    9.56 %
Car (Bird's Eye View)           27.39 %   17.60 %   13.25 %
Pedestrian (Detection)          49.26 %   34.74 %   30.37 %
Pedestrian (Orientation)        38.13 %   26.28 %   22.91 %
Pedestrian (3D Detection)        7.62 %    5.23 %    4.28 %
Pedestrian (Bird's Eye View)     8.69 %    5.62 %    5.25 %
Cyclist (Detection)             37.41 %   23.59 %   21.20 %
Cyclist (Orientation)           23.82 %   15.24 %   13.84 %
Cyclist (3D Detection)           1.87 %    1.60 %    1.66 %
Cyclist (Bird's Eye View)        1.91 %    1.65 %    1.75 %


[Figures: 2D object detection, orientation estimation, 3D object detection, and bird's eye view results.]
