Method

StereoDETR: Stereo-based Transformer for 3D Object Detection [st] [StereoDETR]
https://github.com/shiyi-mu/StereoDETR-OPEN

Submitted on 25 Nov. 2025 18:15 by
Shiyi Mu (Shanghai University)

Running time: 0.02 s
Environment: GPU @ 2.5 GHz (Python)

Method Description:
We propose StereoDETR, an efficient stereo 3D
object detection framework based on DETR.
StereoDETR consists of two branches: a monocular
DETR branch and a stereo branch. The DETR branch
is built upon 2D DETR with additional channels for
predicting object scale, orientation, and sampling
points. The stereo branch leverages low-cost
multi-scale disparity features to predict
object-level depth maps. The two branches are
coupled solely through a differentiable depth
sampling strategy. StereoDETR achieves
binocular-level accuracy while maintaining
monocular-level inference speed.
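
The description does not say how the differentiable depth sampling is implemented. As an illustration only, the sketch below assumes bilinear interpolation of a depth map at a sub-pixel sampling point, which is one common way to make such a read-out differentiable with respect to the point's coordinates. The function name `sample_depth_bilinear`, the list-of-rows map layout, and the toy values are hypothetical and are not taken from the StereoDETR code.

```python
import math


def sample_depth_bilinear(depth_map, u, v):
    """Bilinearly interpolate a depth map at a sub-pixel location.

    depth_map: list of rows (each a list of floats), assumed layout.
    u: column coordinate, v: row coordinate (both may be fractional).
    This is an illustrative stand-in, not the authors' sampling strategy.
    """
    height, width = len(depth_map), len(depth_map[0])
    # Clamp the sampling point so the 2x2 neighbourhood stays inside the map.
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    # Interpolate along the row direction first, then between the two rows.
    top = (1 - du) * depth_map[v0][u0] + du * depth_map[v0][u1]
    bottom = (1 - du) * depth_map[v1][u0] + du * depth_map[v1][u1]
    return (1 - dv) * top + dv * bottom


# Toy 2x2 depth map in metres (made-up values).
toy_depth = [[10.0, 12.0],
             [14.0, 16.0]]
print(sample_depth_bilinear(toy_depth, 0.5, 0.5))
```

Because the output is a weighted sum whose weights depend smoothly on (u, v), gradients can flow both to the depth values and to the predicted sampling point, which is the property a coupling of this kind would need.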
Parameters:
batch_size=16
LaTeX BibTeX:
@misc{mu2025stereodetrstereobasedtransformer3d,
title={StereoDETR: Stereo-based Transformer
for 3D Object Detection},
author={Shiyi Mu and Zichong Gu and Zhiqi Ai
and Anqi Liu and Yilin Gao and Shugong Xu},
year={2025},
eprint={2511.18788},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.18788},
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).
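
To make the AOS metric more concrete, the following sketch computes the orientation similarity term at a single recall level, assuming the usual definition in which each matched detection contributes (1 + cos(angle error)) / 2 and each unmatched detection contributes 0. The function name and inputs are hypothetical, and averaging over recall levels (which yields the final AOS) is omitted.

```python
import math


def orientation_similarity(angle_errors, num_detections):
    """Orientation similarity at one recall level (illustrative sketch).

    angle_errors: orientation errors in radians for matched detections.
    num_detections: all detections at this recall level, including
        unmatched ones, which contribute zero to the sum.
    """
    if num_detections == 0:
        return 0.0
    # A perfect orientation scores 1, an opposite orientation scores 0.
    matched_score = sum((1.0 + math.cos(err)) / 2.0 for err in angle_errors)
    return matched_score / num_detections


# One perfect and one 90-degree-off match among four detections.
print(orientation_similarity([0.0, math.pi / 2], 4))
```

Since each term is at most 1, a score defined this way cannot exceed the corresponding detection precision, which is consistent with the Orientation rows in the table being no higher than the matching Detection rows.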


Benchmark                       Easy       Moderate   Hard
Car (Detection)                 96.39 %    93.45 %    83.67 %
Car (Orientation)               95.95 %    92.64 %    82.72 %
Car (3D Detection)              59.45 %    41.17 %    35.13 %
Car (Bird's Eye View)           72.77 %    54.53 %    46.41 %
Pedestrian (Detection)          76.89 %    59.62 %    54.58 %
Pedestrian (Orientation)        66.66 %    50.70 %    46.02 %
Pedestrian (3D Detection)       33.18 %    23.25 %    19.86 %
Pedestrian (Bird's Eye View)    36.97 %    25.89 %    22.26 %
Cyclist (Detection)             65.83 %    44.76 %    39.97 %
Cyclist (Orientation)           51.45 %    34.96 %    31.17 %
Cyclist (3D Detection)          39.09 %    24.29 %    20.77 %
Cyclist (Bird's Eye View)       42.19 %    26.78 %    22.74 %


Figures (three sets, four plots each): 2D object detection results; orientation estimation results; 3D object detection results; bird's eye view results.



