Autonomous vehicle (AV) research is undergoing a rapid shift. The field is being reshaped by the emergence of reasoning-based vision–language–action (VLA) models that bring human-like reasoning to AV decision-making. These models can be viewed as implicit world models operating in a semantic space, allowing AVs to solve complex problems step-by-step and to generate reasoning traces that mirror human thought processes. This shift extends beyond the models themselves: traditional open-loop evaluation is no longer sufficient to rigorously assess such models, and new evaluation tools are required.
Recently, NVIDIA introduced Alpamayo, a family of models, simulation tools, and datasets to enable development of reasoning-based AV architectures. Our goal is to provide researchers and developers with a versatile, fast, and scalable platform for evaluating, and ultimately training, modern reasoning-based AV architectures in realistic closed-loop settings.
In this blog, we introduce Alpamayo and how to get up and running with reasoning-based AV development:
- Part 1: Introducing NVIDIA Alpamayo 1, an open, 10B reasoning VLA model, as well as how to use the model to both generate trajectory predictions and review the corresponding reasoning traces.
- Part 2: Introducing the Physical AI dataset, one of the largest and most geographically diverse open AV datasets available, which enables training and evaluating these models.
- Part 3: Introducing NVIDIA AlpaSim, an open-source simulation tool designed for evaluating end-to-end models in closed loop.
These three key components provide the essential pieces needed to start building reasoning-based VLA models: a base model, large-scale data for training, and a simulator for testing and evaluation.
Part 1: Alpamayo 1, an open reasoning VLA for AVs
Get started with the Alpamayo reasoning VLA model in just three steps.
Step 1: Access Alpamayo model weights and code
The Hugging Face repository contains pretrained model weights, which can be loaded with the corresponding code on GitHub.
Step 2: Prepare your environment
The Alpamayo GitHub repository contains steps to set up your development environment, including installing uv (if not already installed) and creating a Python virtual environment.
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
# Setup the virtual environment
uv venv ar1_venv
source ar1_venv/bin/activate
# Install pip in the virtual environment (if missing)
./ar1_venv/bin/python -m ensurepip
# Install Jupyter notebook package
./ar1_venv/bin/python -m pip install notebook
uv sync --active
Finally, the model requires access to gated Hugging Face resources. Request access here, then authenticate with your Hugging Face token (you can get your token here).
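One way to authenticate is from Python via the huggingface_hub package (the Hugging Face CLI login command works just as well); the snippet below is a minimal sketch:
# Authenticate to Hugging Face so the gated Alpamayo resources can be downloaded.
from huggingface_hub import login

# Use the token created in your Hugging Face account settings; alternatively,
# set the HF_TOKEN environment variable and skip this call entirely.
login(token="hf_...")  # replace with your own token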
Step 3: Run the Alpamayo reasoning VLA
The model repository includes a notebook that will download the Alpamayo model weights, load some example data from the NVIDIA PhysicalAI-AV Dataset, run the model on it, and visualize the output trajectories and their associated reasoning traces.
Specifically, the example data contains the ego-vehicle passing a construction zone, with 4 timesteps (columns) from 4 cameras (rows: front_left, front_wide, front_right, front_tele) visualized below.


After running this through the Alpamayo model, an example output you might see in the notebook is “Nudge to the left to increase clearance from the construction cones encroaching into the lane,” with the corresponding predicted trajectory and ground truth trajectory visualized below.


If you would like to generate more trajectories and reasoning traces, feel free to change the num_traj_samples=1 argument in the inference call to a higher number.
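For reference, such a call might look roughly like the sketch below. This is hypothetical: apart from num_traj_samples, every class, method, and variable name is an illustrative placeholder, so follow the notebook in the repository for the exact API.
# Hypothetical sketch: names other than num_traj_samples are placeholders.
prediction = model.infer(
    example_batch,            # multi-camera clip loaded by the notebook
    num_traj_samples=8,       # raise from the default of 1 to sample more trajectories
)
for trajectory, reasoning in zip(prediction.trajectories, prediction.reasoning_traces):
    print(reasoning)          # one reasoning trace per sampled trajectory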
Part 2: Physical AI AV dataset for large-scale, diverse AV data
The PhysicalAI-Autonomous-Vehicles dataset provides one of the largest, most geographically diverse collections of multi-sensor data for AV researchers to build the next generation of physical AI-based end-to-end driving systems.


It contains a total of 1,727 hours of driving recorded in 25 countries and over 2,500 cities (coverage shown below, with color indicating the number of clips per country). The dataset captures diverse traffic, weather conditions, obstacles, and pedestrians in the environment. Overall, it consists of 310,895 clips that are each 20 seconds long. The sensor data includes multi-camera and LiDAR coverage for all clips, and radar coverage for 163,850 clips.
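As a quick sanity check, the clip count and clip length quoted above line up with the reported total driving time:
# Verify that 310,895 clips of 20 seconds each add up to the stated 1,727 hours.
clips = 310_895
clip_length_s = 20
print(f"{clips * clip_length_s / 3600:,.0f} hours")  # prints "1,727 hours"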


To get started with the Physical AI AV Dataset, the physical_ai_av GitHub repository contains a Python developer kit and documentation (in the form of a wiki). In fact, this package was already used in Part 1 to load a sample of the dataset for Alpamayo 1.
Part 3: AlpaSim, a closed-loop simulation for AV evaluation
AlpaSim overview


AlpaSim is built on a microservice architecture centered around the Runtime (see Figure 6), which orchestrates all simulation activity. Individual services, such as the Driver, Renderer, TrafficSim, Controller, and Physics, run in separate processes and can be assigned to different GPUs. This design offers two major benefits:
- Clear, modular APIs via gRPC, making it easy to integrate new services without dependency conflicts.
- Arbitrary horizontal scaling, allowing researchers to allocate compute where it matters most. For example, if driver inference becomes the bottleneck, simply launch additional driver processes. If rendering is the bottleneck, dedicate more GPUs to rendering. And if a rendering process cannot handle multiple scenes concurrently, you can run multiple renderer instances on the same GPU to maximize utilization.
But horizontal scaling alone isn't the whole story. The real power of AlpaSim lies in how the Runtime enables pipeline parallelism (see Figure 7).
In traditional sequential rollouts, components must wait on each other; for instance, the driver must pause after each inference step until the renderer produces the next perception input. AlpaSim removes this bottleneck: while one scene is rendering, the driver can run inference for another scene. This overlap dramatically improves GPU utilization and throughput. Scaling even further, driver inference can be batched across many scenes, while multiple rendering processes generate perception inputs in parallel.
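The scheduling idea can be illustrated with a small, self-contained asyncio sketch (conceptual only, not AlpaSim code): while one scene waits on rendering, another scene makes progress, so the total wall-clock time for two scenes approaches that of a single scene.
# Conceptual sketch of pipeline parallelism across scenes (not AlpaSim code).
import asyncio
import time

async def render(scene: int, step: int) -> str:
    await asyncio.sleep(0.1)   # stand-in for the renderer producing a frame
    return f"scene{scene}-frame{step}"

async def drive(observation: str) -> None:
    await asyncio.sleep(0.1)   # stand-in for driver inference

async def rollout(scene: int, steps: int = 5) -> None:
    for step in range(steps):
        observation = await render(scene, step)
        await drive(observation)

async def main() -> None:
    start = time.perf_counter()
    await asyncio.gather(rollout(0), rollout(1))   # both scenes advance concurrently
    print(f"two scenes in {time.perf_counter() - start:.1f}s, "
          "vs roughly twice that when run strictly one after the other")

asyncio.run(main())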


A shared ecosystem
We provide initial implementations for all core services, including rendering via the NVIDIA Omniverse NuRec 3DGUT algorithm, a reference controller, and driver baselines. We will also be adding additional driver models, including Alpamayo 1 and CAT-K, in the coming weeks.
The platform also ships initially with roughly 900 reconstructed scenes, each 20 seconds long, and the Physical AI AV Dataset, giving researchers an immediate way to evaluate end-to-end models in realistic closed-loop scenarios. In addition, AlpaSim offers extensive configurability, from camera parameters and rendering frequency to artificial latencies and many other simulation settings.
Beyond these built-in components, we see AlpaSim evolving into a broader collaborative ecosystem. Eventually, labs can seamlessly plug in their own driving, rendering, or traffic models and compare approaches directly on shared benchmarks.
AlpaSim in action
AlpaSim is already powering several of our internal research efforts.
First, in our recently proposed Sim2Val framework, we demonstrated that AlpaSim rollouts are realistic enough to meaningfully improve real-world validation. By incorporating simulated trajectories into our evaluation pipeline, we were able to reduce variance in key real-world metrics by up to 83%, enabling faster and more confident model assessments.
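The Sim2Val paper describes the actual estimator; purely as a generic illustration of how a correlated simulated metric can shrink the variance of a real-world estimate, here is a self-contained control-variates sketch on synthetic numbers (not the Sim2Val method itself):
# Generic control-variates illustration with synthetic data (not Sim2Val itself):
# a simulated metric correlated with the real one reduces estimator variance.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(1.0, 0.5, size=200)               # real-world metric samples
sim = 0.8 * real + rng.normal(0.0, 0.2, size=200)   # correlated simulated metric
sim_mean = 0.8                                      # assumed known from many cheap sim rollouts

beta = np.cov(real, sim)[0, 1] / np.var(sim)        # control-variate coefficient
adjusted = real - beta * (sim - sim_mean)           # variance-reduced estimate of the real metric

print(f"plain estimate:    mean {real.mean():.3f}, variance {real.var():.3f}")
print(f"adjusted estimate: mean {adjusted.mean():.3f}, variance {adjusted.var():.3f}")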
Second, we rely on AlpaSim for closed-loop evaluation of our Alpamayo 1 model. By replaying reconstructed scenes and allowing the policy to drive end-to-end, we compute a DrivingScore that reflects performance under realistic traffic conditions.
Beyond evaluation, we’re leveraging AlpaSim for closed-loop training using our concurrently released RoaD algorithm. RoaD effectively mitigates covariate shift between open-loop training and closed-loop deployment while being significantly more data-efficient than traditional reinforcement learning.


Getting started with AlpaSim
Start using AlpaSim for your own model evaluation in just three steps.
Step 1: Access AlpaSim
The open-source repository contains the necessary software, with scene reconstruction artifacts available from the NVIDIA Physical AI Open Dataset.
Step 2: Prepare your environment
First, make sure to follow the onboarding steps in ONBOARDING.md.
Then, perform initial setup and installation with the following command:
source setup_local_env.sh
This will compile protos, download an example driver model, download a sample scene from Hugging Face, and install the alpasim_wizard command-line tool.
Step 3: Run the simulation
Use the wizard to build, run, and evaluate a simulation rollout:
alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial
The simulation logs and output can be found in the created tutorial directory. For a visualization of the results, an mp4 file is created at tutorial/eval/videos/clipgt-05bb8212..._0.mp4, which will look similar to the following.
For more details about the output, and much more information about using AlpaSim, see TUTORIAL.md.
Overall, this example demonstrates how real-world drives can be replayed with an end-to-end policy, including all static and dynamic objects from the original scene. From this starting point, and with the flexible plug-and-play architecture of AlpaSim, users can tweak contender behavior, modify camera parameters, and iterate on the policy.
Integrating your policy
Driving policies are easily swappable through generic APIs, allowing developers to test their state-of-the-art implementations.
Step 1: gRPC integration
AlpaSim uses gRPC as the interface between components: a sample implementation of the driver component can be used as inspiration for conforming to the driver interface.
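As a purely hypothetical sketch of what a driver service can look like, the skeleton below follows standard gRPC Python conventions; the real service, message, and method names are defined by the AlpaSim protos and the sample driver, so treat every name here as illustrative.
# Hypothetical driver service skeleton; driver_pb2/driver_pb2_grpc stand in for
# stubs generated with grpcio-tools from the AlpaSim driver proto, and all
# service, message, and method names are illustrative placeholders.
from concurrent import futures

import grpc
import driver_pb2        # assumed generated stub (placeholder name)
import driver_pb2_grpc   # assumed generated stub (placeholder name)

class MyDriver(driver_pb2_grpc.DriverServicer):
    def Infer(self, request, context):
        # request carries the rendered perception input for the current step;
        # run your own policy here and pack its output into the response.
        return driver_pb2.InferResponse()  # fill in your predicted trajectory

def serve(port: int = 50051) -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    driver_pb2_grpc.add_DriverServicer_to_server(MyDriver(), server)
    server.add_insecure_port(f"[::]:{port}")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()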
Step 2: Reconfigure and run
AlpaSim is highly customizable through YAML file descriptions, including the specification of components used by the sim at runtime. Create a new configuration file for your model (an example skeleton is shown below):
# driver_configs/my_model.yaml
# @package _global_
services:
  driver:
    image:
    command:
      - ""
And run:
alpasim_wizard +deploy=local wizard.log_dir=$PWD/my_model +driver_configs=my_model.yaml
Examples of customization using the CLI:
You can change the configuration when running the wizard, for example:
# Different scene
alpasim_wizard +deploy=local wizard.log_dir=$PWD/custom_run \
  scenes.scene_ids=['clipgt-02eadd92-02f1-46d8-86fe-a9e338fed0b6']
# More rollouts
alpasim_wizard +deploy=local wizard.log_dir=$PWD/custom_run \
  runtime.default_scenario_parameters.n_rollouts=8
# Different simulation length
alpasim_wizard +deploy=local wizard.log_dir=$PWD/custom_run \
  runtime.default_scenario_parameters.n_sim_steps=200
Configuration is managed via Hydra – see src/wizard/configs/base_config.yaml for all available options.
To download the scene referenced above in Figure 9, you can run the following command:
hf download --repo-type=dataset \
  --local-dir=data/nre-artifacts/all-usdzs \
  nvidia/PhysicalAI-Autonomous-Vehicles-NuRec \
  sample_set/25.07_release/Batch0001/02eadd92-02f1-46d8-86fe-a9e338fed0b6/02eadd92-02f1-46d8-86fe-a9e338fed0b6.usdz
Scaling your runs
AlpaSim adapts to fit your hardware configuration through coordination and parallelization of services, efficiently facilitating large test suites, perturbation studies, and training.
alpasim_wizard +deploy=local wizard.log_dir=$PWD/test_suite +experiment=my_test_suite.yaml runtime.default_scenario_parameters.n_rollouts=16
Conclusion: Putting it all together
The future of autonomous driving relies on powerful end-to-end models, and AlpaSim provides the capability to quickly test and iterate on those models, accelerating research efforts. In this blog we introduced the Alpamayo 1 model, the Physical AI AV dataset, and the AlpaSim simulator. Together, they provide a complete framework for developing reasoning-based AV systems: a model, large amounts of data to train it, and a simulator for evaluation.
Putting it all together, below is an example of Alpamayo 1 driving closed-loop through a construction zone inside AlpaSim, demonstrating the model's reasoning and driving capabilities as well as AlpaSim's ability to evaluate AV models in a variety of realistic driving environments.
Happy coding!
