The KITTI Vision Benchmark Suite

A. Geiger | P. Lenz | C. Stiller | R. Urtasun

Semantic Segmentation Evaluation

This is the KITTI semantic segmentation benchmark. It consists of 200 semantically annotated train as well as 200 test images corresponding to the KITTI Stereo and Flow Benchmark 2015. The data format and metrics are conform with The Cityscapes Dataset.

The data can be downloaded here:

Note: On 12.04.2018 we have fixed several annotation errors in the dataset, please download the dataset again if you have an old version.

Our evaluation table ranks all methods according to the PASCAL VOC intersection-over-union metric (IoU). IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively. Like in cityscapes we also use an instance-level intersection over union iIoU = iTP/(iTP+FP+iFN). In contrast to the standard IoU measure, iTP and iFN are computed by weighting the contribution of each pixel by the ratio of the class’ average instance size to the size of the respective ground truth instance.

IoU class: Intersection over Union for each class IoU=TP/(TP+FP+FN)
iIoU class: Instance Intersection over Union iIoU=iTP/(iTP+FP+iFN)
IoU category: Intersection over Union for each category IoU=TP/(TP+FP+FN)
iIoU category: Instance Intersection over Union for each category iIoU=iTP/(iTP+FP+iFN)

Important Policy Update: As more and more non-published work and re-implementations of existing work is submitted to KITTI, we have established a new policy: from now on, only submissions with significant novelty that are leading to a peer-reviewed paper in a conference or journal are allowed. Minor modifications of existing algorithms or student research projects are not allowed. Such work must be evaluated on a split of the training set. To ensure that our policy is adopted, new users must detail their status, describe their work and specify the targeted venue during registration. Furthermore, we will regularly delete all entries that are 6 months old but are still anonymous or do not have a paper associated with them. For conferences, 6 month is enough to determine if a paper has been accepted and to add the bibliography information. For longer review cycles, you need to resubmit your results.

Additional information used by the methods

Laser Points: Method uses point clouds from Velodyne laser scanner
Depth: Method uses depth from stereo.
Video: Method uses 2 or more temporally adjacent images
Additional training data: Use of additional data sources for training (see details)

	Method	Setting	Code	IoU class	iIoU class	IoU category	iIoU category	Runtime	Environment
1	DepthMatch			76.60	48.48	90.16	75.05	1.2 s	1 core @ 2.5 Ghz (Python)
J. Huang, J. Li, S. Vityazev, A. Dvorkovich and R. Fan: DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization. IEEE Signal Processing Letters 2025.
2	WRP		code	76.44	50.92	89.63	73.69	1 s	GPU @ >3.5 Ghz (Python)
A. Ganeshan, A. Vallet, Y. Kudo, S. Maeda, T. Kerola, R. Ambrus, D. Park and A. Gaidon: Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle- consistency. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2021.
3	UJS_model			75.11	47.71	89.53	75.75	0.26 s	1 core @ 2.5 Ghz (Python + C/C++)
L. Yingfeng Cai and Z. Hai Wang: Multi-Target Pan-Class Intrinsic Relevance Driven Model for Improving Semantic Segmentation in Autonomous Driving. IEEE Transactions on Image Processing (TIP) 2021.
4	RoadFormer+			73.13	45.88	88.75	73.46	0.04 s	1 core @ 2.5 Ghz (C/C++)
J. Huang, J. Li, N. Jia, Y. Sun, C. Liu, Q. Chen and R. Fan: RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion. IEEE Transactions on Intelligent Vehicles 2024.
5	VideoProp-LabelRelax			72.82	48.68	88.99	75.26	n s	GPU @ 1.5 Ghz (Python + C/C++)
B. Yi Zhu*: Improving Semantic Segmentation via Video Propagation and Label Relaxation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
6	SN_DN161_fat_pyrx8			68.89	40.45	87.06	67.93	1 s	6 x Tesla V100
P. Bevandić, M. Oršić, I. Grubišić, J. Šarić and S. Šegvić: Multi-domain semantic segmentation with overlapping labels. WACV 2022.
7	MSeg1080_RVC		code	62.64	31.62	86.59	68.05	0.49 s	1 core @ 3.0 Ghz (Python)
J. Lambert, Z. Liu, O. Sener, J. Hays and V. Koltun: MSeg: A Composite Dataset for Multi- domain Semantic Segmentation. Computer Vision and Pattern Recognition (CVPR) 2020.
8	Chroma UDA			60.36	31.70	80.73	61.91	0.4 s	GPU @ 2.5 Ghz (Python)
O. Erkent and C. Laugier: Semantic Segmentation with Unsupervised Domain Adaptation Under Varying Weather Conditions for Autonomous Vehicles. IEEE Robotics and Automation Letters 2020.
9	IfN-DomAdap-Seg			59.50	30.28	81.57	61.91	1 s	GPU @ 2.0 Ghz (Python)
J. Bolte, M. Kamp, A. Breuer, S. Homoceanu, P. Schlicht, F. Hüger, D. Lipinski and T. Fingscheidt: Unsupervised Domain Adaptation to Improve Image Segmentation Quality Both in the Source and Target Domain. Proc. of CVPR - Workshops 2019.
10	SegStereo		code	59.10	28.00	81.31	60.26	0.6 s	Nvidia GTX Titan Xp
G. Yang, H. Zhao, J. Shi, Z. Deng and J. Jia: SegStereo: Exploiting Semantic Information for Disparity Estimation. ECCV 2018.
11	MCANet		code	58.52	24.00	83.04	54.06	0.003 s	1 GPU NVIDIA geForce RTX
T. Singha, D. Pham and A. Krishna: Multi-encoder Context Aggregation Network for Structured and Unstructured Urban Street Scene Analysis. IEEE Access 2023.
12	SDBNetV2			56.77	23.11	81.08	50.77	0.004 s	1 GPU NVIDIA geForce RTX
T. Singha, D. Pham and A. Krishna: Improved Short-term Dense Bottleneck network for efficient scene analysis. Computer Vision and Image Understanding 2023.
13	TiCoSS			54.77	23.99	79.77	58.07	0.30 s	1 core @ 2.5 Ghz (Python + C/C++)

14	EPNet			54.59	24.24	79.47	51.83	0.03 s	NVIDIA 4090 (Python)

15	SGDepth		code	53.04	24.36	78.65	55.95	0.1 s	GPU @ 2.5 Ghz (Python)
M. Klingner, J. Termöhlen, J. Mikolajczyk and T. Fingscheidt: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance. ECCV 2020.
16	SDBNet		code	51.80	18.72	78.00	44.46	0.01 s	1 GPU NVIDIA geForce RTX
T. Singha, D. Pham and A. Krishna: SDBNet: Lightweight Real-Time Semantic Segmentation Using Short-Term Dense Bottleneck. Proc. DICTA 2022.
17	SDNet		code	51.14	17.74	79.62	50.45	0.2 s	GPU @ 2.5 Ghz (Python + C/C++)
M. Ochs, A. Kretz and R. Mester: SDNet: Semantic Guided Depth Estimation Network. German Conference on Pattern Recognition (GCPR) 2019.
18	test			50.07	20.41	73.86	40.32	1 s	1 core @ 2.5 Ghz (C/C++)

19	SFRSeg		code	49.27	17.43	77.91	46.88	0.005 s	1 GPU NVIDIA geForce RTX
T. Singha, D. Pham and A. Krishna: A real-time semantic segmentation model using iteratively shared features in multiple sub- encoders. Pattern Recognition 2023.
20	APMoE_seg_ROB		code	47.96	17.86	78.11	49.17	0.2 s	GPU @ 3.5 Ghz (Matlab/C++)
S. Kong and C. Fowlkes: Pixel-wise Attentional Gating for Parsimonious Pixel Labeling. arxiv 1805.01556 2018.
21	LIISIESS			46.73	19.82	76.04	49.91	NA s	1 core @ 2.5 Ghz (C/C++)
L. Sun, J. Bockman and S. Changming: A Framework for Leveraging Inter-image Information in Stereo Images for Enhanced Semantic Segmentation. IEEE Transactions on Instrumentation and Measurement 2023.

Table as LaTeX | Only published Methods

Related Datasets

The Cityscapes Dataset: The cityscapes dataset was recorded in 50 German cities and offers high quality pixel-level annotations of 5 000 frames in addition to a larger set of 20 000 weakly annotated frames.
Wilddash: Wilddash is a benchmark for semantic and instance segmentation. It aims to improve the expressiveness of performance evaluation for computer vision algorithms in regard to their robustness under real-world conditions.

Citation

When using this dataset in your research, we will be happy if you cite us:
@article{Alhaija2018IJCV,
author = { AlhaijaandHassan and MustikovelaandSiva and MeschederandLars and GeigerandAndreas and RotherandCarsten},
title = {Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes},
journal = {International Journal of Computer Vision (IJCV)},
year = {2018}
}

The KITTI Vision Benchmark Suite

A project of Karlsruhe Institute of Technologyand Toyota Technological Institute at Chicago

Semantic Segmentation Evaluation

Related Datasets

Citation

A project of Karlsruhe Institute of Technology
and Toyota Technological Institute at Chicago