The KITTI Vision Benchmark Suite

Method

Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints [Shift R-CNN (mono)]
https://arxiv.org/abs/1905.09970

Submitted on 30 Jan. 2019 16:08 by
Vlad Paunescu (Arnia Software)

Running time:		0.25 s
Environment:		GPU @ 1.5 GHz (Python)

Method Description:

We propose Shift R-CNN, a hybrid model for monocular
3D object detection, which combines deep learning with
the power of geometry. We adapt a Faster R-CNN
network for regressing initial 2D and 3D object properties
and combine it with a least squares solution for the
inverse 2D to 3D geometric mapping problem, using the
camera projection matrix. The closed-form solution of
the mathematical system and the initial output of
the adapted Faster R-CNN are then passed through a
final ShiftNet network that refines the result using our
newly proposed Volume Displacement Loss. Our novel,
geometrically constrained deep learning approach to
monocular 3D object detection obtains top results on
the KITTI 3D Object Detection Benchmark, being the best
among all monocular methods that do not use any
pre-trained network for depth estimation.
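
The least-squares step in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common formulation in which each side of the 2D box is touched by one projected corner of the 3D box, which makes the constraints linear in the unknown translation T. The helper names (`box_corners`, `solve_translation`), the box being centred at its geometric centre, the caller-supplied corner assignment, and the example projection matrix values are all assumptions made for illustration.

```python
import numpy as np

def box_corners(dims, ry):
    """Corners (3x8) of a box centred at the origin, rotated by yaw ry about the camera Y axis."""
    h, w, l = dims
    x = np.array([l, l, -l, -l, l, l, -l, -l]) / 2.0
    y = np.array([h, h, h, h, -h, -h, -h, -h]) / 2.0
    z = np.array([w, -w, -w, w, w, -w, -w, w]) / 2.0
    R = np.array([[np.cos(ry), 0.0, np.sin(ry)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(ry), 0.0, np.cos(ry)]])
    return R @ np.vstack([x, y, z])

def solve_translation(P, box2d, corners, corner_ids):
    """Least-squares translation T so that the chosen corners project onto the box sides.

    box2d is (xmin, ymin, xmax, ymax); corner_ids gives, in the same order,
    the index of the corner assumed to touch each side.
    """
    image_rows = [0, 1, 0, 1]  # x sides use row 0 of P, y sides use row 1
    A = np.zeros((4, 3))
    b = np.zeros(4)
    for k, (side, row, cid) in enumerate(zip(box2d, image_rows, corner_ids)):
        # side = (P[row] . [X; 1]) / (P[2] . [X; 1]) with X = corner + T
        # rearranges to (P[row] - side * P[2]) . [corner + T; 1] = 0.
        m = P[row] - side * P[2]
        A[k] = m[:3]
        b[k] = -(m[:3] @ corners[:, cid] + m[3])
    T, *_ = np.linalg.lstsq(A, b, rcond=None)
    return T

# Synthetic check: project a box at a known position, then recover that position.
P = np.array([[721.5, 0.0, 609.6, 44.9],
              [0.0, 721.5, 172.9, 0.2],
              [0.0, 0.0, 1.0, 0.003]])      # illustrative values only
T_true = np.array([1.5, 1.6, 20.0])
corners = box_corners((1.5, 1.6, 3.9), ry=0.3)
pts = np.vstack([corners + T_true[:, None], np.ones((1, 8))])
proj = P @ pts
u, v = proj[0] / proj[2], proj[1] / proj[2]
box2d = (u.min(), v.min(), u.max(), v.max())
corner_ids = (u.argmin(), v.argmin(), u.argmax(), v.argmax())
T_est = solve_translation(P, box2d, corners, corner_ids)
```

In this synthetic setup the four constraints are consistent, so `T_est` matches `T_true` up to numerical error. With network predictions the system is overdetermined and noisy, and the least-squares solution only approximates the translation, which is the kind of estimate a refinement stage such as ShiftNet would then adjust.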

Parameters:

Not applicable.

LaTeX BibTeX:

@ARTICLE {shiftrcnn,
author = "Andretti Naiden and Vlad Paunescu and Gyeongmo Kim and ByeongMoon Jeon and Marius Leordeanu",
title = "Shift R-CNN: Deep Monocular 3D Object Detection With Closed-form Geometric Constraints",
journal = "ICIP",
year = "2019",
url = "https://arxiv.org/abs/1905.09970"
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).

Benchmark	Easy	Moderate	Hard
Car (Detection)	94.07 %	88.48 %	78.34 %
Car (Orientation)	93.75 %	87.47 %	77.19 %
Car (3D Detection)	6.88 %	3.87 %	2.83 %
Car (Bird's Eye View)	11.84 %	6.82 %	5.27 %
Pedestrian (Detection)	70.86 %	51.30 %	46.37 %
Pedestrian (Orientation)	64.73 %	46.56 %	41.86 %
Pedestrian (3D Detection)	7.95 %	4.66 %	4.16 %
Pedestrian (Bird's Eye View)	8.58 %	5.66 %	4.49 %
Cyclist (Detection)	63.24 %	42.96 %	38.22 %
Cyclist (Orientation)	51.95 %	34.77 %	31.10 %
Cyclist (3D Detection)	0.48 %	0.29 %	0.31 %
Cyclist (Bird's Eye View)	0.76 %	0.38 %	0.41 %
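
The orientation rows above are never higher than the matching detection rows because AOS weights each detection by a cosine similarity between the predicted and ground-truth orientation. The following is a minimal sketch of that per-recall-level similarity term only. The function name is hypothetical, and the detection matching and recall interpolation of the full benchmark evaluation are omitted.

```python
import math

def orientation_similarity(delta_thetas, matched):
    """Mean of (1 + cos(delta_theta)) / 2 over detections; unmatched detections contribute 0."""
    if not delta_thetas:
        return 0.0
    total = sum((1.0 + math.cos(d)) / 2.0 if m else 0.0
                for d, m in zip(delta_thetas, matched))
    return total / len(delta_thetas)

# One perfectly oriented detection and one that is off by 180 degrees.
example = orientation_similarity([0.0, math.pi], [True, True])
```

Each term lies in [0, 1], so a perfectly oriented detection counts fully and one facing the opposite way counts for nothing.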