Method

Point Virtual Transformer V1 [PointViT V1]


Submitted on 5 Aug. 2025 10:14 by
Veerain Sood (Texas A&M)

Running time: 0.006 s
Environment: 1 core @ 2.5 GHz (Python + C/C++)

Method Description:
PointViT V1 is a single-stage transformer architecture
operating directly on voxelized LiDAR points.
It introduces depth-based virtual points,
integrates self-attention with local depthwise
convolutions, and employs a multiscale voxel
feature encoder.
Parameters:
Number of transformer heads: 4
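
The fusion of self-attention with a local depthwise convolution described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the token count, feature width, shared Q/K/V projection, and the residual combination of the two branches are all assumptions made for brevity (only the head count of 4 comes from the parameters above).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads=4):
    """x: (n_tokens, d_model). Q/K/V share the input slice for this sketch."""
    n, d = x.shape
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]
        attn = softmax(q @ k.T / np.sqrt(dh))   # (n, n) attention weights
        heads.append(attn @ v)                   # (n, dh) per head
    return np.concatenate(heads, axis=1)         # (n, d)

def depthwise_conv1d(x, kernel):
    """Per-channel 1-D convolution along the token axis (local mixing)."""
    n, d = x.shape
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(n):
        out[i] = (xp[i:i + k] * kernel[:, None]).sum(axis=0)
    return out

def block(x, kernel):
    # Global attention branch + local depthwise-conv branch, residual sum.
    return x + multi_head_self_attention(x) + depthwise_conv1d(x, kernel)

tokens = np.random.default_rng(0).normal(size=(16, 32))  # 16 voxel tokens
y = block(tokens, np.array([0.25, 0.5, 0.25]))
print(y.shape)  # (16, 32)
```

The design intuition is that attention captures long-range relations between voxel tokens while the depthwise convolution supplies cheap local geometric context; a real implementation would add learned Q/K/V projections and per-channel kernels.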
LaTeX BibTeX:
@misc{sood2026pointvirtualtransformer,
  title={Point Virtual Transformer},
  author={Veerain Sood and Bnalin and Gaurav Pandey},
  year={2026},
  eprint={2602.06406},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.06406},
}

Detailed Results

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).


Benchmark               Easy      Moderate  Hard
Car (Detection)         99.36 %   94.04 %   86.46 %
Car (Orientation)       99.34 %   93.98 %   86.37 %
Car (3D Detection)      91.16 %   79.93 %   72.51 %
Car (Bird's Eye View)   95.94 %   89.95 %   82.40 %


2D object detection results.



Orientation estimation results.



3D object detection results.



Bird's eye view results.



