The KITTI Vision Benchmark Suite

Depth Prediction Evaluation

The depth completion and depth prediction evaluation are related to our work published in Sparsity Invariant CNNs (3DV 2017). The dataset
contains over 93 thousand depth maps with corresponding raw LiDAR scans and RGB images, aligned with the "raw data" of the KITTI dataset.
Given the large amount of training data, this dataset should allow the training of complex deep learning models for the tasks of depth completion
and single image depth prediction. We also provide manually selected images with unpublished depth maps to serve as a benchmark for those
two challenging tasks.

The structure of all provided depth maps is aligned with the structure of our raw data to easily find corresponding left and right images,
or other provided information.

Note: On 12.04.2018 we fixed a small error in the file data_depth_velodyne.zip. Please download this file again if you have an old version.

All methods providing less than 100 % density have been interpolated using simple background interpolation as explained in the corresponding header file in the development kit.

SILog: Scale invariant logarithmic error [log(m)*100]

sqErrorRel: Relative squared error (percent)
absErrorRel: Relative absolute error (percent)
iRMSE: Root mean squared error of the inverse depth [1/km]
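
The four metrics listed above can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation code: the function name `depth_metrics`, the flat-list input format, and the exact formulas (in particular normalizing the squared error by the squared ground-truth depth, and treating ground-truth values of 0 as invalid pixels) are assumptions made for this example. The development kit remains the authoritative definition.

```python
import math


def depth_metrics(pred_m, gt_m):
    """Compute SILog, sqErrorRel, absErrorRel and iRMSE for one image.

    pred_m, gt_m: flat sequences of depths in meters (hypothetical input
    format). Predictions are assumed to be strictly positive. Pixels with
    gt <= 0 are assumed to be invalid and are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred_m, gt_m) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # SILog: standard deviation of the log-depth differences, times 100.
    log_diff = [math.log(p) - math.log(g) for p, g in pairs]
    mean_log = sum(log_diff) / n
    var_log = sum((d - mean_log) ** 2 for d in log_diff) / n
    silog = math.sqrt(var_log) * 100.0

    # Relative errors, reported in percent.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n * 100.0
    sq_rel = sum((p - g) ** 2 / g ** 2 for p, g in pairs) / n * 100.0

    # iRMSE: RMSE of the inverse depth, with depth converted to km (1/km).
    irmse = math.sqrt(sum((1000.0 / p - 1000.0 / g) ** 2 for p, g in pairs) / n)

    return {"SILog": silog, "sqErrorRel": sq_rel,
            "absErrorRel": abs_rel, "iRMSE": irmse}


if __name__ == "__main__":
    # A prediction that is off by a constant factor has SILog close to zero,
    # while the relative errors and iRMSE are non-zero.
    print(depth_metrics([20.0, 40.0], [10.0, 20.0]))
```

Because SILog measures only the spread of the log-depth differences, a globally scaled prediction is not penalized by it, whereas sqErrorRel, absErrorRel and iRMSE are.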

Important Policy Update: As more and more unpublished work and re-implementations of existing work are submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that lead to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms and student research projects are not allowed. Such work must be evaluated on a split of the training set. To ensure that our policy is adopted, new users must detail their status, describe their work and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are 6 months old but are still anonymous or do not have a paper associated with them. For conferences, 6 months is enough to determine whether a paper has been accepted and to add the bibliography information. For longer review cycles, you need to resubmit your results.

Additional information used by the methods

Additional training data: Use of additional data sources for training (see details)

Rank	Method	Setting	Code	SILog	sqErrorRel	absErrorRel	iRMSE	Runtime	Environment
1	G2I			7.34	0.93	6.01	7.37	0.1 s	1 core @ 2.5 Ghz (C/C++)

2	×Net			7.51	0.93	6.14	7.62	0.1 s	1 core @ 2.5 Ghz (C/C++)

3	UniDepthV2			7.74	0.91	5.53	7.19	0.1 s	GPU @ 2.5 Ghz (Python + C/C++)

4	UniDepth		code	8.13	1.09	6.54	8.24	0.1 s	GPU @ 2.5 Ghz (Python)
L. Piccinelli, Y. Yang, C. Sakaridis, M. Segu, S. Li, L. Van Gool and F. Yu: UniDepth: Universal Monocular Metric Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024.
5	HyperDepth			9.16	1.55	7.70	10.15	0.1 s	1 core @ 2.5 Ghz (Python)

6	RegDepth			9.19	1.55	7.71	10.18	0.1 s	1 core @ 2.5 Ghz (Python)

7	MSFusion			9.37	1.51	7.62	10.15	0.1 s	1 core @ 2.5 Ghz (Python)

8	DCDepth		code	9.60	1.54	7.83	10.12	0.07 s	1 core @ 2.5 Ghz (Python)
K. Wang, Z. Yan, J. Fan, W. Zhu, X. Li, J. Li and J. Yang: DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain. Advances in Neural Information Processing Systems (NeurIPS) 2024.
9	NDDepth		code	9.62	1.59	7.75	10.62	0.1 s	GPU @ 2.5 Ghz (Python)
S. Shao, Z. Pei, W. Chen, X. Wu and Z. Li: NDDepth: Normal-Distance Assisted Monocular Depth Estimation. International Conference on Computer Vision (ICCV) 2023.
10	IEBins		code	9.63	1.60	7.82	10.68	0.1 s	GPU @ 2.5 Ghz (Python)
S. Shao, Z. Pei, X. Wu, Z. Liu, W. Chen and Z. Li: IEBins: Iterative Elastic Bins for Monocular Depth Estimation. Advances in Neural Information Processing Systems (NeurIPS) 2023.
11	VA-DepthNet		code	9.84	1.66	7.96	10.44	0.1 s	1 core @ 2.5 Ghz (Python)
C. Liu, S. Kumar, S. Gu, R. Timofte and L. Van Gool: VA-DepthNet: A Variational Approach to Single Image Depth Prediction. International Conference on Learning Representations (ICLR) 2023.
12	DiffusionDepth-I		code	9.85	1.64	8.06	10.58	0.2 s	1 core @ 2.5 Ghz (C/C++)
Y. Duan, X. Guo and Z. Zhu: Diffusiondepth: Diffusion denoising approach for monocular depth estimation. arXiv preprint arXiv:2303.05021 2023.
13	iDisc		code	9.89	1.77	8.11	10.73	0.1 s	1 core @ 2.5 Ghz (C/C++)
L. Piccinelli, C. Sakaridis and F. Yu: iDisc: Internal Discretization for Monocular Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023.
14	MG		code	9.93	1.68	7.99	10.63	0.1 s	1 core @ 2.5 Ghz (C/C++)
C. Liu, S. Kumar, S. Gu, R. Timofte and L. Van Gool: Single Image Depth Prediction Made Better: A Multivariate Gaussian Take. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023.
15	URCDC-Depth		code	10.03	1.74	8.24	10.71	0.1 s	1 core @ 2.5 Ghz (C/C++)
S. Shao, Z. Pei, W. Chen, R. Li, Z. Liu and Z. Li: URCDC-Depth: Uncertainty Rectified Cross-Distillation with CutFlip for Monocular Depth Estimation. IEEE Transactions on Multimedia (TMM) 2023.
16	BinsFormer		code	10.14	1.69	8.23	10.90	0.1 s	1 core @ 2.5 Ghz (C/C++)
Z. Li, X. Wang, X. Liu and J. Jiang: BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation. arXiv preprint arXiv:2204.00987 2022.
17	TrapNet			10.15	1.66	7.92	10.45	0.1 s	1 core @ 2.5 Ghz (Python)
C. Ning and H. Gan: Trap Attention: Monocular Depth Estimation with Manual Traps. Proceedings of the IEEE/CVF International Conference on Computer Vision and Pattern Recognition 2023.
18	PixelFormer			10.28	1.82	8.16	10.84	0.1 s	1 core @ 2.5 Ghz (Python)
A. Agarwal and C. Arora: Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention. WACV 2023.
19	RED-T			10.36	1.92	8.11	10.82	0.1 s	GPU @ 2.5 Ghz (Python)
K. Shim, J. Kim, G. Lee and B. Shim: Depth-Relative Self Attention for Monocular Depth Estimation. 2023.
20	NeWCRFs			10.39	1.83	8.37	11.03	0.1 s	1 core @ 2.5 Ghz (Python)
W. Yuan, X. Gu, Z. Dai, S. Zhu and P. Tan: NeWCRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2022.
21	DepthFormer		code	10.69	1.84	8.68	11.39	0.1 s	1 core @ 2.5 Ghz (Python)
Z. Li, Z. Chen, X. Liu and J. Jiang: Depthformer: Exploiting long-range correlation and local information for accurate monocular depth estimation. arXiv preprint arXiv:2203.14211 2022.
22	ViP-DeepLab			10.80	2.19	8.94	11.77	0.1 s	GPU @ 2.5 Ghz (Python)
S. Qiao, Y. Zhu, H. Adam, A. Yuille and L. Chen: ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2021.
23	SideRT			11.42	2.25	9.28	11.88	0.02 s	GPU @ 1.5 Ghz (Python)
C. Shu, Z. Chen, L. Chen, K. Ma, M. Wang and H. Ren: SideRT: A Real-time Pure Transformer Architecture for Single Image Depth Estimation. 2022.
24	PWA			11.45	2.30	9.05	12.32	0.06 s	GPU @ 2.5 Ghz (Python)
S. Lee, J. Lee, B. Kim, E. Yi and J. Kim: Patch-Wise Attention Network for Monocular Depth Estimation. Proceedings of the AAAI Conference on Artificial Intelligence 2021.
25	BANet			11.55	2.31	9.34	12.17	0.04 s	GPU @ 1.5 Ghz (Python + C/C++)
S. Aich, J. Vianney, M. Islam, M. Kaur and B. Liu: Bidirectional Attention Network for Monocular Depth Estimation. IEEE International Conference on Robotics and Automation (ICRA) 2021.
26	BTS		code	11.67	2.21	9.04	12.23	0.06 s	GPU @ 2.5 Ghz (Python + C/C++)
J. Lee, M. Han, D. Ko and I. Suh: From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation. 2019.
27	DL_61 (DORN)		code	11.77	2.23	8.78	12.98	0.5 s	GPU @ 2.5 Ghz (Python + C/C++)
H. Fu, M. Gong, C. Wang, K. Batmanghelich and D. Tao: Deep Ordinal Regression Network for Monocular Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018.
28	RefinedMPL			11.80	2.31	10.09	13.39	0.05 s	GPU @ 2.5 Ghz (Python + C/C++)
J. Vianney, S. Aich and B. Liu: RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving. arXiv preprint arXiv:1911.09712 2019.
29	DLE		code	11.81	2.22	9.09	12.49	0.09 s	NVIDIA Tesla V100
C. Liu, S. Gu, L. Gool and R. Timofte: Deep Line Encoding for Monocular 3D Object Detection and Depth Prediction. Proceedings of the British Machine Vision Conference (BMVC) 2021.
30	PFANet			11.84	2.46	9.23	12.63	0.1 s	GPU @ 2.5 Ghz (Python)
Y. Xu, C. Peng, M. Li, Y. Li and S. Du: Pyramid Feature Attention Network for Monocular Depth Prediction. 2021 IEEE International Conference on Multimedia and Expo (ICME) 2021.
31	GAC		code	12.13	2.61	9.41	12.65	0.05 s	GPU @ 2.5 Ghz (Python)
Y. Liu, Y. Yuan and M. Liu: Ground-aware Monocular 3D Object Detection for Autonomous Driving. IEEE Robotics and Automation Letters 2021.
32	DL_SORD_SL			12.39	2.49	10.10	13.48	0.8 s	GPU @ 2.5 Ghz (Python + C/C++)
R. Diaz and A. Marathe: Soft Labels for Ordinal Regression. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
33	VNL		code	12.65	2.46	10.15	13.02	0.5 s	1 core @ 2.5 Ghz (C/C++)
Y. Wei, Y. Liu, C. Shen and Y. Yan: Enforcing geometric constraints of virtual normal for depth prediction. 2019.
34	P3Depth		code	12.82	2.53	9.92	13.71	0.1 s	GPU @ 2.5 Ghz (Python)
V. Patil, C. Sakaridis, A. Liniger and L. Van Gool: P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
35	MS-DPT		code	12.83	3.62	11.01	13.43	0.1 s	GPU @ 2.5 Ghz (Python)
J. Song and S. Lee: Knowledge Distillation of Multi-scale Dense Prediction Transformer for Self-supervised Depth Estimation. 2023.
36	DS-SIDENet_ROB			12.86	2.87	10.03	14.40	0.35 s	GPU @ 2.5 Ghz (Python)
H. Ren, M. El-Khamy and J. Lee: Deep Robust Single Image Depth Estimation Neural Network Using Scene Understanding. IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) 2019.
37	DL_SORD_SQ			13.00	2.95	10.38	13.78	0.88 s	GPU @ 2.5 Ghz (Python + C/C++)
R. Diaz and A. Marathe: Soft Labels for Ordinal Regression. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
38	PAP			13.08	2.72	10.27	13.95	0.18 s	GPU @ 2.5 Ghz (Python + C/C++)
Z. Zhang, Z. Cui, C. Xu, Y. Yan, N. Sebe and J. Yang: Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
39	CADepth-Net		code	13.34	3.33	10.67	13.61	0.08 s	1 core @ 2.5 Ghz (Python)
J. Yan, H. Zhao, P. Bu and Y. Jin: Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation. 2021.
40	VGG16-UNet			13.41	2.86	10.60	15.06	0.16 s	GPU @ 2.5 Ghz (Python + C/C++)
X. Guo, H. Li, S. Yi, J. Ren and X. Wang: Learning monocular depth by distilling cross-domain stereo networks. Proceedings of the European Conference on Computer Vision (ECCV) 2018.
41	DORN_ROB			13.53	3.06	10.35	15.96	2 s	GPU @ 2.5 Ghz (Python)
H. Fu, M. Gong, C. Wang, K. Batmanghelich and D. Tao: Deep Ordinal Regression Network for Monocular Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018.
42	g2s		code	14.16	3.65	11.40	15.53	0.04 s	GPU @ 1.5 Ghz (Python)
H. Chawla, A. Varma, E. Arani and B. Zonooz: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation. 2021 IEEE International Conference on Robotics and Automation (ICRA) 2021.
43	MT-SfMLearner			14.25	3.72	12.52	15.83	0.04 s	GPU @ 1.5 Ghz (Python)
A. Varma, H. Chawla, B. Zonooz and E. Arani: Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics. Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, 2022.
44	MLDA-Net			14.42	3.41	11.67	16.12	0.2 s	1 core @ 2.5 Ghz (Python)
X. Song, W. Li, D. Zhou, Y. Dai, J. Fang, H. Li and L. Zhang: MLDA-Net: Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation. IEEE Transactions on Image Processing 2021.
45	DABC_ROB			14.49	4.08	12.72	15.53	0.7 s	GPU @ 2.0 Ghz (Matlab)
R. Li, K. Xian, C. Shen, Z. Cao, H. Lu and L. Hang: Deep attention-based classification network for robust depth prediction. Proceedings of the Asian Conference on Computer Vision (ACCV) 2018.
46	BTSREF_RVC		code	14.67	3.12	12.42	16.84	0.1 s	1 core @ >3.5 Ghz (Python)
J. Lee, M. Han, D. Ko and I. Suh: From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326 2019.
47	SDNet		code	14.68	3.90	12.31	15.96	0.2 s	GPU @ 2.5 Ghz (C/C++)
M. Ochs, A. Kretz and R. Mester: SDNet: Semantic Guided Depth Estimation Network. German Conference on Pattern Recognition (GCPR) 2019.
48	APMoE_base_ROB		code	14.74	3.88	11.74	15.63	0.2 s	GPU @ 3.5 Ghz (Matlab), Geforce Titan X
S. Kong and C. Fowlkes: Pixel-wise Attentional Gating for Parsimonious Pixel Labeling. arxiv 1805.01556 2018.
49	DiPE			14.84	4.04	12.28	15.69	0.01 s	GPU @ 2.5 Ghz (Python)
H. Jiang, L. Ding, Z. Sun and R. Huang: DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth and Ego-motion from Monocular Videos. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020.
50	CSWS_E_ROB			14.85	3.48	11.84	16.38	0.2 s	1 core @ 2.5 Ghz (C/C++), Titan GTX 108
M. Bo Li: Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference. 2018.
51	R-MSFM		code	15.09	3.57	11.80	17.60	1 s	1 core @ 2.5 Ghz (Python)
Z. Zhou, X. Fan, P. Shi and Y. Xin: R-msfm: Recurrent multi-scale feature modulation for monocular depth estimating. Proceedings of the IEEE/CVF international conference on computer vision 2021.
52	HBC			15.18	3.79	12.33	17.86	0.05 s	GPU @ 2.5 Ghz (Python)
H. Jiang and R. Huang: Hierarchical Binary Classification for Monocular Depth Estimation. IEEE International Conference on Robotics and Biomimetics 2019.
53	SGDepth		code	15.30	5.00	13.29	15.80	0.1 s	GPU @ 2.5 Ghz (Python)
M. Klingner, J. Termöhlen, J. Mikolajczyk and T. Fingscheidt: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance. ECCV 2020.
54	DHGRL			15.47	4.04	12.52	15.72	0.2 s	GPU @ 2.5 Ghz (Python)
Z. Zhang, C. Xu, J. Yang, Y. Tai and L. Chen: Deep hierarchical guidance and regularization learning for end-to-end depth estimation. Pattern Recognition 2018.
55	GCNDepth		code	15.54	4.26	12.75	15.99	0.05 s	GPU @ 2.5 Ghz (Python)
A. Masoumian, H. Rashwan, S. Abdulwahab, J. Cristiano and D. Puig: GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network. arXiv preprint arXiv:2112.06782 2021.
56	packnSFMHR_RVC		code	15.80	4.73	12.28	17.96	0.5 s	GPU @ 2.5 Ghz (Python)
V. Guizilini, R. Ambrus, S. Pillai, A. Raventos and A. Gaidon: 3D Packing for Self-Supervised Monocular Depth Estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) .
57	MultiDepth		code	16.05	3.89	13.82	18.21	0.01 s	GPU @ 1.5 Ghz (Python)
L. Liebel and M. Körner: MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification. IEEE Intelligent Transportation Systems Conference (ITSC) 2019.
58	LSIM			17.92	6.88	14.04	17.62	0.08 s	GPU @ 2.5 Ghz (Python)
M. Goldman, T. Hassner and S. Avidan: Learn Stereo, Infer Mono: Siamese Networks for Self-Supervised, Monocular, Depth Estimation. Computer Vision and Pattern Recognition Workshops (CVPRW) 2019.

Related Datasets

SYNTHIA Dataset: SYNTHIA is a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations as well as pixel-wise depth information. The dataset consists of more than 200,000 HD images from video streams and more than 20,000 HD images from independent snapshots.
Middlebury Stereo Evaluation: The classic stereo evaluation benchmark, featuring four test images in version 2 of the benchmark, with very accurate ground truth from a structured light system. 38 image pairs are provided in total.
Make3D Range Image Data: Images with small-resolution ground truth used to learn and evaluate depth from single monocular images.
Virtual KITTI Dataset: Virtual KITTI contains 50 high-resolution monocular videos (21,260 frames) generated from five different virtual worlds in urban settings under different imaging and weather conditions.
Scene Flow Dataset: The Freiburg Scene Flow Dataset collection has been used to train convolutional networks for disparity, optical flow, and scene flow estimation. The collection contains more than 39000 stereo frames in 960x540 pixel resolution, rendered from various synthetic sequences.

Citation

When using this dataset in your research, we will be happy if you cite us:
@inproceedings{Uhrig2017THREEDV,
author = {Jonas Uhrig and Nick Schneider and Lukas Schneider and Uwe Franke and Thomas Brox and Andreas Geiger},
title = {Sparsity Invariant CNNs},
booktitle = {International Conference on 3D Vision (3DV)},
year = {2017}
}

A project of Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago