Andreas Geiger

Publications of Bernhard Jaeger

End-to-end Autonomous Driving: Challenges and Frontiers
L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger and H. Li
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Abstract: The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. To facilitate future research, we maintain an active repository that contains up-to-date links to relevant literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.
Latex Bibtex Citation:
@article{Chen2024PAMI,
  author = {Li Chen and Penghao Wu and Kashyap Chitta and Bernhard Jaeger and Andreas Geiger and Hongyang Li},
  title = {End-to-end Autonomous Driving: Challenges and Frontiers},
  journal = {Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2024}
}
Geometric Transform Attention
T. Miyato, B. Jaeger, M. Welling and A. Geiger
International Conference on Learning Representations (ICLR), 2024
Abstract: As transformers are equivariant to the permutation of input tokens, encoding the positional information of tokens is necessary for many tasks. However, since existing positional encoding schemes have been initially designed for NLP tasks, their suitability for vision tasks, which typically exhibit different structural properties in their data, is questionable. We argue that existing positional encoding schemes are suboptimal for 3D vision tasks, as they do not respect their underlying 3D geometric structure. Based on this hypothesis, we propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation determined by the geometric relationship between queries and key-value pairs. By evaluating on multiple novel view synthesis (NVS) datasets in the sparse wide-baseline multi-view setting, we show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models without any additional learned parameters and only minor computational overhead.
Latex Bibtex Citation:
@inproceedings{Miyato2024ICLR,
  author = {Takeru Miyato and Bernhard Jaeger and Max Welling and Andreas Geiger},
  title = {Geometric Transform Attention},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2024}
}
Hidden Biases of End-to-End Driving Models
B. Jaeger, K. Chitta and A. Geiger
International Conference on Computer Vision (ICCV), 2023
Abstract: End-to-end driving systems have recently made rapid progress, in particular on CARLA. Independent of their major contribution, they introduce changes to minor system components. Consequently, the source of improvements is unclear. We identify two biases that recur in nearly all state-of-the-art methods and are critical for the observed progress on CARLA: (1) lateral recovery via a strong inductive bias towards target point following, and (2) longitudinal averaging of multimodal waypoint predictions for slowing down. We investigate the drawbacks of these biases and identify principled alternatives. By incorporating our insights, we develop TF++, a simple end-to-end method that ranks first on the Longest6 and LAV benchmarks, gaining 11 driving score over the best prior work on Longest6.
Latex Bibtex Citation:
@inproceedings{Jaeger2023ICCV,
  author = {Bernhard Jaeger and Kashyap Chitta and Andreas Geiger},
  title = {Hidden Biases of End-to-End Driving Models},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year = {2023}
}
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving
K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz and A. Geiger
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Abstract: How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g. object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
Latex Bibtex Citation:
@article{Chitta2022PAMI,
  author = {Kashyap Chitta and Aditya Prakash and Bernhard Jaeger and Zehao Yu and Katrin Renz and Andreas Geiger},
  title = {TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving},
  journal = {Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2023}
}
End-to-end Autonomous Driving: Challenges and Frontiers
L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger and H. Li
Arxiv, 2023
Abstract: The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
Latex Bibtex Citation:
@article{Chen2023ARXIVa,
  author = {Li Chen and Penghao Wu and Kashyap Chitta and Bernhard Jaeger and Andreas Geiger and Hongyang Li},
  title = {End-to-end Autonomous Driving: Challenges and Frontiers},
  journal = {Arxiv},
  year = {2023}
}


eXTReMe Tracker