Method

PS-SVDM with a pre-trained model [PS-SVDM]


Submitted on 28 Apr. 2024 11:36 by
YuGuang Shi (University of Science and Technology Beijing)

Running time: 1 s
Environment: 1 core @ 2.5 GHz (Python)

Method Description:
One of the key problems in 3D object detection is
reducing the accuracy gap between LiDAR-based
methods and those based on monocular cameras. A
recently proposed Pseudo-Stereo framework for
monocular 3D detection has received considerable
attention in the community. However, existing
practice has three problems: (1) it relies on a
high-performance monocular depth estimator, (2)
the generated image suffers from visual holes,
deformations, and artifacts, and (3) it is
difficult to make compatible with geometry-based
stereo detectors. In this work, we propose a novel
pseudo-stereo 3D detection framework without depth
estimation, called PS-SVDM. The framework uses a
diffusion model to generate a high-quality virtual
right view from a left image, mimicking the signal
of a stereo camera. With this representation, a
variety of existing stereo image-based detection
algorithms can be applied.
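
The data flow described above (left image, then a synthesized right view, then a stereo detector) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names `pseudo_stereo_detect`, `fake_synthesizer`, and `fake_detector` are hypothetical, and the toy stand-ins merely take the place of the diffusion model and the stereo detector.

```python
from typing import Any, Callable, List

Image = Any  # placeholder type; a real system would use an image array


def pseudo_stereo_detect(
    left: Image,
    synthesize_right: Callable[[Image], Image],
    stereo_detector: Callable[[Image, Image], List[dict]],
) -> List[dict]:
    """Run a stereo detector on a real left view and a generated right view."""
    # Step 1: generate a virtual right view from the left image alone
    # (no depth estimation is involved in this step).
    right = synthesize_right(left)
    # Step 2: pass the (left, virtual right) pair to any stereo detector.
    return stereo_detector(left, right)


# Toy stand-ins (assumptions for illustration only).
def fake_synthesizer(left: Image) -> Image:
    # Rotates each row by one pixel; a real system would call a diffusion model.
    return [row[1:] + row[:1] for row in left]


def fake_detector(left: Image, right: Image) -> List[dict]:
    # Reports a single dummy detection and whether the two views differ.
    return [{"label": "Car", "views_differ": left != right}]


left_image = [[1, 2, 3], [4, 5, 6]]
detections = pseudo_stereo_detect(left_image, fake_synthesizer, fake_detector)
```

Because the detector only sees an image pair, any stereo detector with this call shape could in principle be swapped in without changing the rest of the pipeline.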
Parameters:
\alpha=0.8
LaTeX BibTeX:
@article{shi2023svdm,
title={SVDM: Single-View Diffusion Model for
Pseudo-Stereo 3D Object Detection},
author={Shi, Yuguang},
journal={arXiv preprint arXiv:2307.02270},
year={2023}
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
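
To make the two metrics concrete, the sketch below assumes the commonly described formulation: a per-detection orientation similarity of (1 + cos Δθ) / 2, and an interpolated average over evenly spaced recall levels. The function names and the `num_points` default are assumptions chosen for illustration; the benchmark's own evaluation code determines the exact recall sampling and matching rules.

```python
import math
from typing import Sequence


def orientation_similarity(
    angle_errors: Sequence[float], is_true_positive: Sequence[bool]
) -> float:
    """Mean of (1 + cos(error)) / 2 over all detections.

    Detections that are not true positives contribute 0.
    """
    if not angle_errors:
        return 0.0
    total = 0.0
    for error, matched in zip(angle_errors, is_true_positive):
        if matched:
            total += (1.0 + math.cos(error)) / 2.0
    return total / len(angle_errors)


def interpolated_average(
    recalls: Sequence[float], scores: Sequence[float], num_points: int = 11
) -> float:
    """Average, over evenly spaced recall levels, of the best score
    achieved at any recall greater than or equal to that level."""
    total = 0.0
    for k in range(num_points):
        level = k / (num_points - 1)
        candidates = [s for r, s in zip(recalls, scores) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / num_points


# A perfectly aligned detection scores 1, an opposite-facing one scores 0.
similarity = orientation_similarity([0.0, math.pi], [True, True])

# Passing precision values as scores gives an AP-style number;
# passing orientation similarities gives an AOS-style number.
average = interpolated_average([0.5, 1.0], [1.0, 0.5])
```

Since each similarity term is at most 1, an AOS-style score computed this way cannot exceed the corresponding AP-style score, which matches the pattern in the Detection and Orientation rows of the table.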


Benchmark                     Easy      Moderate  Hard
Car (Detection)               94.49 %   87.55 %   78.21 %
Car (Orientation)             94.20 %   86.88 %   77.34 %
Car (3D Detection)            29.22 %   18.13 %   15.35 %
Car (Bird's Eye View)         38.18 %   24.82 %   20.89 %
Pedestrian (Detection)        46.43 %   34.15 %   30.90 %
Pedestrian (Orientation)      33.74 %   24.19 %   21.63 %
Pedestrian (3D Detection)     12.93 %    8.33 %    7.20 %
Pedestrian (Bird's Eye View)  15.03 %    9.75 %    8.37 %
Cyclist (Detection)           46.46 %   30.95 %   27.00 %
Cyclist (Orientation)         29.75 %   19.50 %   17.08 %
Cyclist (3D Detection)         7.98 %    4.57 %    3.66 %
Cyclist (Bird's Eye View)      9.20 %    5.34 %    4.31 %


Figures (not reproduced here): 2D object detection results, orientation estimation results, 3D object detection results, and bird's eye view results. The page repeats this set of four figures three times.



