Building powerful physical AI models requires diverse, controllable, and physically grounded data at scale. Collecting large-scale, diverse real-world datasets for training can be expensive, time-intensive, and dangerous. NVIDIA Cosmos open world foundation models (WFMs) address these challenges by enabling scalable, high-fidelity synthetic data generation for physical AI and the augmentation of existing datasets.
The NVIDIA Cosmos Cookbook is a comprehensive guide for using Cosmos WFMs and tools. It includes step-by-step recipes for inference, curation, post-training, and evaluation.
For scalable data-generation workflows, the Cookbook includes a wide range of recipes based on NVIDIA Cosmos Transfer, a world-to-world style transfer model. In this blog, we'll sample Cosmos Transfer recipes to alter video backgrounds, add new environmental conditions to driving data, and generate data for use cases such as robotics navigation and smart city traffic scenarios.
Augmenting video data
To scale existing real datasets, developers often look to generate realistic variations of the same scene by modifying backgrounds, lighting, or object properties without breaking temporal consistency.
The Multi-Control Recipes section of the Cookbook demonstrates how to use various control modalities to perform guided video augmentations with Cosmos Transfer. The core concepts explain how strategically combining control modalities is key to achieving high-fidelity, structurally consistent video results. Developers can use depth, edge, segmentation, and vis controls, together with text prompts, to precisely adjust video attributes such as background, lighting, object geometry, color, or texture while maintaining the temporal and spatial consistency of specified regions.
This recipe is especially useful for robotics developers, for whom capturing human gestures (e.g., waving or greeting) across different environments and conditions is expensive and time-consuming.
Control modalities
- Depth: Maintains 3D realism and spatial consistency by respecting distance and perspective.
- Segmentation: Used to completely transform objects, people, or backgrounds.
- Edge: Preserves the unique structure, shape, and layout of the video.
- Vis: Applies a smoothing/blur effect by default, so the underlying visual characteristics remain unchanged.
Technical overview
- Control fusion: Combines multiple conditioning signals (edge, seg, vis) to balance geometric preservation and photorealistic synthesis.
- Mask-aware editing: Binary or inverted masks define editable regions, ensuring localized transformations.
- Parameterization: Each modality's influence is tuned via control_weight in JSON configs, enabling reproducible control across editing tasks.
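As an illustration of that parameterization, the per-modality weights sit side by side in a single control spec. The fragment below is only a sketch with placeholder values (not tuned settings), following the same control_weight convention used in the full example configs later in this post:
"edge": { "control_weight": 0.3 },  // light structural guidance: keeps shapes and layout roughly intact
"seg": { "control_weight": 0.7 },   // stronger semantic control: lets selected regions be re-rendered
"vis": { "control_weight": 0.2 }    // mild appearance anchoring to the source video
A higher weight gives that control signal more influence over the generated video.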
Core recipes
1. Background change: Replace the background with a realistic alternative using filtered_edge, seg (mask_inverted), and vis to preserve subject motion.


2. Lighting change: Modify illumination conditions (e.g., day to nighttime, indoor to outdoor) using edge + vis.


3. Color/texture change: Alter surface appearance with pure edge control for stable structure retention. This preserves all other structures as defined by object edges.


4. Object change: Transform object class or shape using low-weight edge, high-weight seg (mask), and moderate vis.


Example commands
Start with Cosmos Transfer 2.5 here. You'll find the configurations for all the core recipes used in this tutorial here.
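For instance, a lighting-change run (recipe 2 above) could use a spec along the lines of the sketch below. It follows the base-parameter format shown in the autonomous driving example later in this post; the file paths, prompt, seed, and weights here are illustrative placeholders, so refer to the linked configurations for the exact values used in the tutorial.
{
    "seed": 1,
    "prompt_path": "assets/prompt_lighting.json",  // hypothetical prompt file describing the target lighting, e.g., a nighttime scene
    "video_path": "assets/input_clip.mp4",  // hypothetical input video
    "guidance": 3,
    "edge": {
        "control_weight": 0.5  // preserve scene structure and layout
    },
    "vis": {
        "control_weight": 0.3  // retain the overall look while the lighting is re-rendered
    }
}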
Generating new environments for autonomous driving development


This recipe collection demonstrates how Cosmos Transfer can be used for domain adaptation and synthetic data augmentation in autonomous vehicle (AV) research. By transforming real-world or simulated driving videos across diverse environmental conditions, developers can create rich datasets for training more robust perception or planning models.
Technical overview
- Multi-control inference: The pipeline combines four control modalities (depth, edge, seg, and vis), each with a tunable control_weight parameter to balance realism, structure, and semantic fidelity.
- Prompt-conditioned generation: Text prompts define conditions such as "night with bright street lamps," "winter with heavy snow," or "sunset with reflective roads."
Example command for base parameters
{
    // Update the parameter values for control weights, seed, and guidance in this JSON file
    "seed": 5000,
    "prompt_path": "assets/prompt_av.json",  // Update the prompt in the JSON file accordingly
    "video_path": "assets/av_car_input.mp4",
    "guidance": 3,
    "depth": {
        "control_weight": 0.4
    },
    "edge": {
        "control_weight": 0.1
    },
    "seg": {
        "control_weight": 0.5
    },
    "vis": {
        "control_weight": 0.1
    }
}
More example commands for this workflow can be found here.
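To sweep several environmental conditions in one pass, a small helper script can stamp out one spec per prompt. The sketch below is a minimal example that assumes each generated JSON follows the base-parameter format above; the prompt-file layout and output names are placeholders, so match them to the actual format of assets/prompt_av.json.
import json
from pathlib import Path

# Illustrative target conditions; swap in the environments you need.
conditions = {
    "night": "night with bright street lamps",
    "winter": "winter with heavy snow",
    "sunset": "sunset with reflective roads",
}

# Base parameters mirroring the example spec above.
base_spec = {
    "seed": 5000,
    "video_path": "assets/av_car_input.mp4",
    "guidance": 3,
    "depth": {"control_weight": 0.4},
    "edge": {"control_weight": 0.1},
    "seg": {"control_weight": 0.5},
    "vis": {"control_weight": 0.1},
}

out_dir = Path("configs_av_variants")
out_dir.mkdir(exist_ok=True)

for name, prompt in conditions.items():
    # The {"prompt": ...} layout is an assumption; match the real format of assets/prompt_av.json.
    prompt_path = out_dir / f"prompt_{name}.json"
    prompt_path.write_text(json.dumps({"prompt": prompt}, indent=2))

    spec = dict(base_spec)  # shallow copy is fine; nested weights are not modified
    spec["prompt_path"] = str(prompt_path)
    spec_path = out_dir / f"spec_{name}.json"
    spec_path.write_text(json.dumps(spec, indent=2))
    print(f"Wrote {spec_path}")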
Making robots more mobile with Sim2Real data augmentation


Robotics navigation models often struggle to generalize from simulation to reality due to visual and physical domain gaps. The Sim2Real Data Augmentation recipe demonstrates how Cosmos Transfer improves Sim2Real performance for mobile robots by generating photorealistic, domain-adapted data from simulation.
Technical overview
The pipeline integrates with NVIDIA X-Mobility and Mobility Gen:
- Mobility Gen: Built on Isaac Sim, it generates high-fidelity datasets with RGB, depth, and segmentation ground truth for wheeled and legged robots.
- X-Mobility: Learns navigation policies from both on-policy and off-policy data.
- Cosmos Transfer: Applies multimodal controls (edge: 0.3, seg: 1.0) to vary lighting, materials, and textures while preserving geometry, motion, and annotations.


Example commands to prepare inputs for Cosmos Transfer
uv run scripts/examples/transfer1/inference-x-mobility/xmob_dataset_to_videos.py data/x_mobility_isaac_sim_nav2_100k data/x_mobility_isaac_sim_nav2_100k_input_videos
uv run scripts/examples/transfer1/inference-x-mobility/xmob_dataset_to_videos.py data/x_mobility_isaac_sim_random_160k data/x_mobility_isaac_sim_random_160k_input_videos
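Once the input videos are prepared, the transfer step applies the control weights quoted above (edge: 0.3, seg: 1.0). Below is a minimal control-spec sketch, assuming the same base-parameter format as the autonomous driving example and using placeholder paths:
{
    "seed": 0,
    "prompt_path": "assets/prompt_environment.json",  // hypothetical prompt describing the target lighting, materials, and textures
    "video_path": "data/x_mobility_isaac_sim_nav2_100k_input_videos/episode_0000.mp4",  // placeholder clip name from the prepared inputs
    "guidance": 3,
    "edge": {
        "control_weight": 0.3  // light structural guidance to preserve geometry and motion
    },
    "seg": {
        "control_weight": 1.0  // full semantic control to vary lighting, materials, and textures
    }
}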
More example commands for this workflow can be found here.
Generating synthetic data for smart city applications


Also included in the Cookbook is an end-to-end workflow that generates photorealistic synthetic data for urban traffic scenarios, accelerating the development of perception and vision-language models (VLMs) for smart city applications. The workflow simulates dynamic city traffic scenes in CARLA, which are then processed through Cosmos Transfer to produce high-quality, visually authentic videos and annotated datasets.


Access the synthetic data generation workflow here.
In synthetic data generation, assessing the quality of generated content is crucial to ensure realistic and reliable results. Read this case study demonstrating how Cosmos Reason, a reasoning vision language model, can be used to evaluate physical plausibility, assessing whether the interactions and movements in synthetic videos align with the fundamental laws and constraints of real-world physics.
How to use and contribute your own synthetic data generation recipe
To use the Cosmos Cookbook, start by exploring the inference or post-training recipes, which provide step-by-step instructions for tasks like video generation, sim-to-real augmentation, or model training. Each recipe outlines a workflow and points you to the relevant executable scripts in the scripts/ directory.
For deeper background on topics such as control modalities, data curation, or evaluation, see the concepts guides. All recipes include setup requirements and command examples to help you reproduce or adapt results.
As an open source community platform, the Cosmos Cookbook brings together NVIDIA engineers, researchers, and developers to share practical techniques and extend the ecosystem through collaboration. Contributors are welcome to add new recipes, refine workflows, and share insights to advance post-training and deployment best practices for Cosmos models. Follow the steps below to contribute to the main Cookbook repository.
1. Fork and set up
Fork the Cosmos Cookbook repository, then clone and configure:
git clone https://github.com/YOUR-USERNAME/cosmos-cookbook.git
cd cosmos-cookbook
git remote add upstream https://github.com/nvidia-cosmos/cosmos-cookbook.git
# Install dependencies
just install
# Verify setup
just serve-internal # Visit http://localhost:8000
2. Create a branch
git checkout -b recipe/descriptive-name # or docs/, fix/, etc.
3. Make changes
Add your content following the templates below, then test:
just serve-internal # Preview changes
just test # Run validation
4. Commit and push
git add .
git commit -m "Add Transfer weather augmentation recipe"
git push origin recipe/descriptive-name
5. Create pull request
Open a pull request from your fork and submit it for review.
6. Address feedback
Update your branch based on review comments:
git add .
git commit -m "Address review feedback"
git push origin recipe/descriptive-name
The PR updates automatically. Once approved, the team will merge your contribution.
7. Sync your fork
Before starting recent work:
git checkout main
git fetch upstream
git merge upstream/main
More details on templates and guidelines can be found here.
Get started
Explore more recipes in the Cosmos Cookbook for your own use cases.
The Cosmos Cookbook is designed to create a dedicated space where the Cosmos team and community can openly share and contribute practical knowledge. We'd love to receive your patches and contributions to help build this valuable resource together. Learn more about how to contribute.
Learn more about NVIDIA Research at NeurIPS.
At the forefront of AI innovation, NVIDIA Research continues to push the boundaries of technology in machine learning, self-driving cars, robotics, graphics, simulation, and more. Explore the cutting-edge breakthroughs now.
Stay up to date by subscribing to NVIDIA news, following NVIDIA AI on LinkedIn, Instagram, X, and Facebook, and joining the NVIDIA Cosmos forum.
