Instead of relying solely on point cloud features, we leverage the mature field of 2D object detection to reduce the search space in 3D. We then use the Pillar Feature Encoding network for object localization in the reduced point cloud. We also propose a novel approach for masking point clouds to further improve object localization. On the KITTI test set, our method outperforms other multi-sensor state-of-the-art approaches for 3D pedestrian localization (Bird's Eye View) while achieving a significantly faster runtime of 14 Hz.
@inproceedings{paigwar:hal-03354114,
  TITLE = {{Frustum-PointPillars: A Multi-Stage Approach for 3D Object Detection using RGB Camera and LiDAR}},
  AUTHOR = {Paigwar, Anshul and Sierra-Gonzalez, David and Erkent, {\"O}zg{\"u}r and Laugier, Christian},
  URL = {https://hal.archives-ouvertes.fr/hal-03354114},
  BOOKTITLE = {{International Conference on Computer Vision, ICCV, Workshop on Autonomous Vehicle Vision}},
  ADDRESS = {California, United States},
  YEAR = {2021},
  MONTH = Oct,
  PDF = {https://hal.archives-ouvertes.fr/hal-03354114/file/Frustum_Pointpillars_ICCV.pdf},
  HAL_ID = {hal-03354114},
  HAL_VERSION = {v1},
}