The NVIDIA Cosmos™ family of open world models is redefining how we model and simulate the real world. Built for robotics, autonomous systems, and simulation-driven AI, Cosmos world foundation models (WFMs) enable machines to see, imagine, and reason about physical reality.
With the launch of Cosmos Predict 2.5 and Cosmos Transfer 2.5, world generation takes another major step forward. These models extend the Cosmos family to longer horizons, richer viewpoints, and more adaptive domain transformations, laying the groundwork for scalable physical AI.
🔮 Cosmos Predict 2.5: World Generation
Cosmos Predict 2.5 merges what were once three separate models (Text2World, Image2World, and Video2World) into a single, unified architecture capable of generating consistent, controllable video worlds from several input modalities. Trained on 200 million high-quality clips and refined into a unified model with a new reinforcement learning (RL) algorithm, it outperforms Cosmos Predict 1 in quality and prompt alignment when generating high-quality synthetic video data from single frames.
✨ Key Highlights
- One Powerful Model: Cosmos Predict 2.5 combines the capabilities of Text2World, Image2World, and Video2World in a single model and uses Cosmos Reason 1, a physical AI reasoning vision language model (VLM), as its text encoder, cutting the compute cost and time needed to post-train and build your physical AI workflows.
- Extended Video Horizons: Produces sequences up to 30 seconds long while maintaining spatial-temporal coherence, essential for simulation, long-horizon prediction, and robotic planning.
- Multi-View Generation: Creates synchronized camera views for realistic multi-camera setups in autonomous vehicle (AV) training or robot vision with camera control.
- Grounded Prompt Alignment: Integrates Cosmos Reason as a text-scene encoder, tightening semantic grounding and reducing hallucinations.
- Efficiency-Driven Design: Improves overall quality, inference speed, and resource efficiency through architectural refinement, despite its scale and capability.
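Extended horizons of the kind described above are commonly produced by autoregressive chunking: the model generates one short segment at a time, conditioning each new segment on the tail frames of the previous one to preserve spatial-temporal coherence. The following is a minimal, purely illustrative sketch of that pattern; `generate_chunk` is a hypothetical placeholder, not the actual Cosmos Predict 2.5 API.

```python
import numpy as np

# Illustrative sketch of chunked long-horizon video generation.
# `generate_chunk` is a stand-in for a world model call (NOT the
# real Cosmos Predict 2.5 interface): it produces the next chunk
# of frames conditioned on the tail of the video so far.

FRAMES_PER_CHUNK = 24   # e.g. one second at 24 fps
CONTEXT_FRAMES = 4      # overlap used to keep chunks coherent

def generate_chunk(context: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Placeholder world model: returns FRAMES_PER_CHUNK frames of shape (H, W, C)."""
    h, w, c = context.shape[1:]
    # A real model would predict frames conditioned on `context`;
    # here we just emit noise with the right shape.
    return rng.random((FRAMES_PER_CHUNK, h, w, c)).astype(np.float32)

def generate_video(first_frame: np.ndarray, seconds: int, fps: int = 24) -> np.ndarray:
    rng = np.random.default_rng(0)
    frames = [first_frame[None]]  # start from a single conditioning frame
    total = seconds * fps
    while sum(f.shape[0] for f in frames) < total:
        # Condition each chunk on the last CONTEXT_FRAMES frames generated.
        context = np.concatenate(frames)[-CONTEXT_FRAMES:]
        frames.append(generate_chunk(context, rng))
    return np.concatenate(frames)[:total]

video = generate_video(np.zeros((64, 64, 3), dtype=np.float32), seconds=30)
print(video.shape)  # (720, 64, 64, 3): 30 seconds at 24 fps
```

The key design point is the overlap: because each chunk sees the previous chunk's final frames, the generated world stays continuous rather than resetting at every segment boundary.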
🔁 Cosmos Transfer 2.5: Spatially Controlled World Transformation
While Predict 2.5 creates worlds, Transfer 2.5 transforms them, enabling high-fidelity, spatially conditioned world-to-world translation. Compared with Cosmos Transfer 1-7B, Cosmos Transfer 2.5-2B is far smaller, with better prompt and physics alignment, and produces less hallucination and error accumulation in long video generations.
✨ Key Highlights
- Smaller, faster, and enhanced quality: It's 3.5x smaller than its predecessor yet faster and higher quality, optimized for deployment in both research and production pipelines.
- Policy training for robots: Robot policy models trained with Cosmos Transfer 2.5-2B augmentation significantly outperform others in generalizing to novel environments.
- Better adherence to control signals for autonomous vehicles: Evaluation of 3D lane and cuboid detection on generated multi-view videos, using real-world scenarios as the control input, shows up to a 60% improvement over the previous model (Transfer1-7B-Sample-AV), using LATR for lane detection and BEVFormer for cuboid detection.
- Multi-camera consistency for autonomous vehicles: Cosmos Transfer 2.5 improves on Cosmos Transfer 1 by distributing control blocks more evenly throughout the network for smoother integration of conditioning information.
- Less error accumulation: Transfer 2.5 shows less error accumulation across all four control modalities (edge/blur/depth/segmentation) compared with Cosmos-Transfer1-7B.
Other Updates from Cosmos Platform
🧠 Cosmos Reason 1: Reasoning Vision Language Model
Part of the Cosmos WFM family, NVIDIA Cosmos Reason is an open, customizable, 7-billion-parameter reasoning vision language model (VLM) for physical AI and robotics. The model enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world. Cosmos Reason has topped the Physical Reasoning leaderboard. The model is also available as an NVIDIA NIM, which offers secure, easy-to-use microservices for deploying high-performance generative AI across any environment.
🔍 Cosmos Dataset Search: Large-scale Data Search and Retrieval
To speed up model post-training, NVIDIA Cosmos Dataset Search is a vector-based workflow that lets physical AI developers instantly search and retrieve targeted scenarios from massive training datasets. It uses the Cosmos Embed NIM for highly accurate semantic search and connects to NVIDIA Cosmos Curator to refine datasets and retrieve queried data efficiently and accurately. With the ability to search billions of clips in seconds, Cosmos Dataset Search dramatically shortens post-training, cutting development cycles from years to days.
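The vector-based retrieval described above works by embedding both clips and text queries into a shared vector space and ranking clips by similarity to the query. The sketch below shows only the retrieval mechanics with a toy hash-based embedding; it is not the Cosmos Embed NIM, and the toy embedding carries no real semantics (in the actual workflow, similar texts and clips map to nearby vectors).

```python
import hashlib
import numpy as np

# Illustrative sketch of vector-based semantic search over clip metadata.
# The embedding function is a deterministic toy stand-in, NOT a real model.

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding seeded from a hash of the text."""
    seed = int(hashlib.sha256(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-norm, so dot product = cosine similarity

clip_descriptions = [
    "pedestrian crossing at night in the rain",
    "highway merge in heavy fog",
    "robot arm picking a red cube from a bin",
]
index = np.stack([embed(d) for d in clip_descriptions])  # (N, dim) search index

def search(query: str, top_k: int = 2) -> list[tuple[str, float]]:
    q = embed(query)
    scores = index @ q                      # cosine similarity per clip
    ranked = np.argsort(scores)[::-1][:top_k]  # highest similarity first
    return [(clip_descriptions[i], float(scores[i])) for i in ranked]

for desc, score in search("nighttime rainy pedestrian scene"):
    print(f"{score:+.3f}  {desc}")
```

At scale, the brute-force dot product over the index would be replaced by an approximate nearest-neighbor index, which is what makes searching billions of clips in seconds feasible.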
🧩 Use Cases and Workflows
🔗 Cosmos Cookbook The Cosmos Cookbook offers developers step-by-step recipes and post-training scripts to quickly build, customize, and deploy NVIDIA's Cosmos world foundation models for robotics and autonomous systems.
Read the 🔗 latest white paper from the NVIDIA Research team for more details on the capabilities and benchmarks of Cosmos Predict 2.5 and Cosmos Transfer 2.5.
🧠 Resources
💪 Get Started today
All Cosmos WFMs are available on Hugging Face – model checkpoints here.
- Cosmos Predict 2.5 – Multimodal world foundation model for generating next frames based on input prompts. Explore the GitHub repository for inference and post-training scripts.
- Cosmos Transfer 2.5 – Multicontrol world foundation model for data augmentation from structured video inputs. Explore GitHub for inference and post-training scripts.
Join our community for regular updates, Q&A, livestreams, and hands-on tutorials!

