Novel View Synthesis

Novel View Appearance Synthesis (50% Drop Rate)


We select 5 static scenes with a driving distance of ∼ 50 meters each for evaluating NVS at a 50% drop rate. We select one frame every ∼ 0.8 meters of driving distance (corresponding to the overall average distance between frames) to avoid redundancy when the vehicle is slow. We release 50% of the frames for training and retain 50% for evaluation. Our evaluation table ranks all methods according to the peak signal-to-noise ratio (PSNR). We also evaluate structural similarity index (SSIM) and perceptual similarity (LPIPS).
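For reference, PSNR, SSIM and LPIPS can be computed with standard libraries. The following is a minimal sketch, not the benchmark's evaluation code; it assumes rendered and ground-truth images as HxWx3 float32 arrays in [0, 1], scikit-image for PSNR/SSIM, and the `lpips` package with an AlexNet backbone (the backbone choice here is an assumption).

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')  # assumed backbone; 'vgg' is another common choice

def nvs_metrics(pred, gt):
    """pred, gt: HxWx3 float32 images in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW torch tensors scaled to [-1, 1]
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None] * 2.0 - 1.0
    with torch.no_grad():
        lp = lpips_fn(to_tensor(pred), to_tensor(gt)).item()
    return psnr, ssim, lp
```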

| # | Method | Setting | Code | PSNR | SSIM | LPIPS | Runtime | Environment |
|---|--------|---------|------|------|------|-------|---------|-------------|
| 1 | ExtraGS | | | 23.58 | 0.868 | 0.148 | 0.01 s | GPU @ 2.5 GHz (Python) |
| 2 | HUGS | | | 23.38 | 0.870 | 0.121 | 0.02 s | 1 core @ 2.5 GHz (C/C++) |
| 3 | MVSRegNeRF | | | 22.48 | 0.829 | 0.256 | 2 s | 1 core @ 2.5 GHz (C/C++) |
| 4 | PointNeRF++ | | code | 22.44 | 0.828 | 0.212 | 20 s | 1 core @ 2.5 GHz (C/C++) |
| 5 | PNF | | | 22.07 | 0.820 | 0.221 | 15 s | GPU @ 2.5 GHz (Python) |
| 6 | mip-NeRF | | code | 21.54 | 0.778 | 0.365 | 10 s | 1 core @ 2.5 GHz (Python) |
| 7 | NeRF | uses stereo information | code | 21.18 | 0.779 | 0.343 | 10 s | 1 core @ 2.5 GHz (Python) |
| 8 | FVS | | code | 20.00 | 0.790 | 0.193 | 0.2 s | 1 core @ 2.5 GHz (C/C++) |
| 9 | PBNR | | code | 19.91 | 0.811 | 0.191 | 0.1 s | 1 core @ 2.5 GHz (C/C++) |
| 10 | Point-NeRF | | code | 19.44 | 0.796 | 0.266 | 1 s | 1 core @ 2.5 GHz (C/C++) |
| 11 | PCL | uses stereo information | | 12.81 | 0.576 | 0.549 | 0.2 s | 1 core @ 2.5 GHz (C/C++) |

References:
3. F. Bian, S. Xiong, R. Yi and L. Ma: Multi-view Stereo-regulated NeRF for Urban Scene Novel View Synthesis. The Visual Computer 2024.
4. W. Sun, E. Trulls, Y. Tseng, S. Sambandam, G. Sharma, A. Tagliasacchi and K. Yi: PointNeRF++: A Multi-Scale, Point-Based Neural Radiance Field. ECCV 2024.
5. A. Kundu, K. Genova, X. Yin, A. Fathi, C. Pantofaru, L. Guibas, A. Tagliasacchi, F. Dellaert and T. Funkhouser: Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation. CVPR 2022.
6. J. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla and P. Srinivasan: Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. ICCV 2021.
7. B. Mildenhall, P. Srinivasan, M. Tancik, J. Barron, R. Ramamoorthi and R. Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020.
8. G. Riegler and V. Koltun: Free View Synthesis. ECCV 2020.
9. G. Kopanas, J. Philip, T. Leimkühler and G. Drettakis: Point-Based Neural Rendering with Per-View Optimization. Computer Graphics Forum (Proc. EGSR) 2021.
10. Q. Xu, Z. Xu, J. Philip, S. Bi, Z. Shu, K. Sunkavalli and U. Neumann: Point-NeRF: Point-Based Neural Radiance Fields. CVPR 2022.
11. Y. Liao, J. Xie and A. Geiger: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. arXiv 2021.


Novel View Semantic Synthesis (50% Drop Rate)


Our evaluation table ranks all methods according to the confidence-weighted mean intersection-over-union (mIoU). The weighted IoU of one class is defined as \(\text{IoU} = \frac{\sum_{i\in{\{\text{TP}\}}}c_{i}}{\sum_{i\in{\{\text{TP, FP, FN}\}}}c_{i}}\), where \(\{\text{TP}\}\) and \(\{\text{TP, FP, FN}\}\) are the sets of image pixels in the intersection and the union for that class, respectively, and \(c_i \in [0, 1]\) denotes the confidence value at pixel \(i\). In contrast to the standard evaluation, where \(c_i=1\) for all pixels, we adopt confidence-weighted evaluation metrics that leverage the uncertainty to account for the ambiguity in our automatically generated annotations.
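The definition above translates directly into a per-class computation. Below is a minimal sketch under assumed inputs (HxW integer label maps `pred` and `gt` and an HxW confidence map `conf`), not the released evaluation script; setting all confidences to 1 recovers the standard IoU.

```python
import numpy as np

def confidence_weighted_iou(pred, gt, conf, cls):
    """Weighted IoU of one class: sum of confidences over TP pixels
    divided by the sum over the union of TP, FP and FN pixels."""
    tp = (pred == cls) & (gt == cls)     # intersection
    union = (pred == cls) | (gt == cls)  # TP, FP and FN
    denom = conf[union].sum()
    return conf[tp].sum() / denom if denom > 0 else np.nan

def confidence_weighted_miou(pred, gt, conf, classes):
    # mean over classes, ignoring classes absent from both prediction and ground truth
    return np.nanmean([confidence_weighted_iou(pred, gt, conf, c) for c in classes])
```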

| # | Method | Setting | Code | mIoU Class | mIoU Category | Runtime | Environment |
|---|--------|---------|------|------------|---------------|---------|-------------|
| 1 | PNF | | | 73.06 | 84.97 | 15 s | GPU @ 2.5 GHz (Python) |
| 2 | HUGS | | | 72.65 | 85.64 | 0.02 s | 1 core @ 2.5 GHz (C/C++) |
| 3 | GT Image + PSPNet | | | 63.82 | 78.25 | 0.2 s | 1 core @ 2.5 GHz (C/C++) |
| 4 | FVS + PSPNet | | | 60.86 | 74.61 | 0.4 s | 1 core @ 2.5 GHz (C/C++) |
| 5 | PBNR + PSPNet | | | 58.43 | 71.99 | 1 s | 1 core @ 2.5 GHz (C/C++) |
| 6 | NeRF + PSPNet | | | 49.57 | 69.14 | 15 s | GPU @ 2.5 GHz (Python) |
| 7 | mip-NeRF + PSPNet | | | 48.25 | 67.47 | 15 s | GPU @ 2.5 GHz (Python) |
| 8 | PCL + PSPNet | | code | 37.21 | 44.55 | 0.4 s | 1 core @ 2.5 GHz (C/C++) |

References:
1. A. Kundu, K. Genova, X. Yin, A. Fathi, C. Pantofaru, L. Guibas, A. Tagliasacchi, F. Dellaert and T. Funkhouser: Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation. CVPR 2022.
3. Y. Liao, J. Xie and A. Geiger: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. arXiv 2021.
4. G. Riegler and V. Koltun: Free View Synthesis. ECCV 2020.
5. G. Kopanas, J. Philip, T. Leimkühler and G. Drettakis: Point-Based Neural Rendering with Per-View Optimization. Computer Graphics Forum (Proc. EGSR) 2021.
6. B. Mildenhall, P. Srinivasan, M. Tancik, J. Barron, R. Ramamoorthi and R. Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020.
7. J. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla and P. Srinivasan: Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. ICCV 2021.
8. Y. Liao, J. Xie and A. Geiger: KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. arXiv 2021.
All "+ PSPNet" entries additionally use H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia: Pyramid Scene Parsing Network. CVPR 2017.


Novel View Appearance Synthesis (90% Drop Rate)


We select 10 static scenes with a driving distance of ∼ 50 meters each for evaluating NVS at a 90% drop rate. We select one frame every ∼ 4.0 meters of driving distance to avoid redundancy when the vehicle is slow. We release 50% of the frames for training and retain 50% for evaluation. Our evaluation table ranks all methods according to the peak signal-to-noise ratio (PSNR). We also evaluate structural similarity index (SSIM) and perceptual similarity (LPIPS).
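The distance-based thinning can be pictured with a short sketch. This is an illustration only, not the benchmark's selection code; `positions` (camera centers in driving order, e.g. from the released poses) is an assumed input.

```python
import numpy as np

def subsample_by_distance(positions, step=4.0):
    """positions: Nx3 camera centers in driving order.
    Keep one frame each time `step` meters of driving distance accumulate,
    so frames recorded while the vehicle is slow are not over-represented."""
    seg = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative driving distance
    keep, next_mark = [], 0.0
    for i, d in enumerate(dist):
        if d >= next_mark:
            keep.append(i)
            next_mark = d + step
    return keep  # indices of the selected frames
```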

| # | Method | Setting | Code | PSNR | SSIM | LPIPS | Runtime | Environment |
|---|--------|---------|------|------|------|-------|---------|-------------|
| 1 | DGNerf | | code | 17.33 | 0.714 | 0.397 | 1 s | 1 core @ 2.5 GHz (C/C++) |
| 2 | MVSRegNeRF | | | 17.20 | 0.702 | 0.424 | 2 s | 1 core @ 2.5 GHz (C/C++) |
| 3 | NeRF | uses stereo information | | 15.74 | 0.648 | 0.590 | 10 s | 1 core @ 2.5 GHz (C/C++) |

References:
2. F. Bian, S. Xiong, R. Yi and L. Ma: Multi-view Stereo-regulated NeRF for Urban Scene Novel View Synthesis. The Visual Computer 2024.
3. B. Mildenhall, P. Srinivasan, M. Tancik, J. Barron, R. Ramamoorthi and R. Ng: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020.


Novel View Semantic Synthesis (90% Drop Rate)


Our evaluation table ranks all methods according to the confidence-weighted mean intersection-over-union (mIoU). The weighted IoU of one class is defined as \(\text{IoU} = \frac{\sum_{i\in{\{\text{TP}\}}}c_{i}}{\sum_{i\in{\{\text{TP, FP, FN}\}}}c_{i}}\), where \(\{\text{TP}\}\) and \(\{\text{TP, FP, FN}\}\) are the sets of image pixels in the intersection and the union for that class, respectively, and \(c_i \in [0, 1]\) denotes the confidence value at pixel \(i\). In contrast to the standard evaluation, where \(c_i=1\) for all pixels, we adopt confidence-weighted evaluation metrics that leverage the uncertainty to account for the ambiguity in our automatically generated annotations.