How to Scale Data Generation for Physical AI with the NVIDIA Cosmos Cookbook

Building powerful physical AI models requires diverse, controllable, and physically grounded data at scale. Collecting large-scale, diverse real-world datasets for training can be expensive, time-intensive, and dangerous. NVIDIA Cosmos open world foundation models (WFMs) address these challenges by enabling scalable, high-fidelity synthetic data generation for physical AI and the augmentation of existing datasets.

The NVIDIA Cosmos Cookbook is a comprehensive guide for using Cosmos WFMs and tools. It includes step-by-step recipes for inference, curation, post-training, and evaluation. 

For scalable data-generation workflows, the Cookbook includes a wide range of recipes based on NVIDIA Cosmos Transfer, a world-to-world style transfer model. In this blog, we’ll sample Cosmos Transfer recipes to alter video backgrounds, add new environmental conditions to driving data, and generate data for use cases such as robotics navigation and urban traffic scenarios.

Augmenting video data

To scale existing, real datasets, developers often look to generate realistic variations of the same scene by modifying backgrounds, lighting, or object properties without breaking temporal consistency.

The Multi-Control Recipes section in the Cookbook demonstrates how to use various control modalities to perform guided video augmentations with Cosmos Transfer. The accompanying core concepts explain how strategically combining control modalities achieves high-fidelity, structurally consistent video results. Developers can use depth, edge, segmentation, and vis controls, together with text prompts, to precisely adjust video attributes such as background, lighting, object geometry, color, or texture while maintaining the temporal and spatial consistency of specified regions.

This recipe is particularly useful for robotics developers, for whom capturing human gestures (e.g., waving or greeting) across different environments and conditions is expensive and time-consuming.

Control modalities

  • Depth: Maintains 3D realism and spatial consistency by respecting distance and perspective.
  • Segmentation: Used to completely transform objects, people, or backgrounds.
  • Edge: Preserves the original structure, shape, and layout of the video.
  • Vis: By default, applies a smoothing/blur effect, keeping the underlying visual characteristics unchanged.

Technical overview

  • Control fusion: Combines multiple conditioning signals (edge, seg, vis) to balance geometric preservation and photorealistic synthesis.
  • Mask-aware editing: Binary or inverted masks define editable regions, ensuring localized transformations.
  • Parameterization: Each modality’s influence is tuned via control_weight in JSON configs, enabling reproducible control across editing tasks.

Core recipes

1. Background change: Replace the background with a realistic alternative using filtered_edge, seg (mask_inverted), and vis to preserve subject motion.

Figure 1. Background change using Cosmos Transfer

2. Lighting change: Modify illumination conditions (e.g., day to night, indoor to outdoor) using edge + vis.

Figure 2. Lighting change using Cosmos Transfer

3. Color/texture change: Alter surface appearance with pure edge control for stable structure retention. This preserves all other structures as defined by object edges.

Figure 3. Color and texture change using Cosmos Transfer

4. Object change: Transform object class or shape using low-weight edge, high-weight seg (mask), and moderate vis.

Figure 4. Object change using Cosmos Transfer

Example commands

Start with Cosmos Transfer 2.5 here. You’ll find the configurations for all the core recipes used in this tutorial here.
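As an illustration, here is a minimal sketch of what a background-change configuration (recipe 1) could look like, written in the same style as the JSON example shown later in this post for the AV workflow. The file paths, weight values, and the exact key names for filtered_edge and the inverted mask are assumptions for illustration only; refer to the linked configurations for the actual schema and tuned values.

{
    // Illustrative background-change configuration; values and key names are assumptions
    "seed": 1,
    "prompt_path": "assets/prompt_background_ocean.json",   // Prompt describing the new background (placeholder path)
    "video_path": "assets/person_waving_input.mp4",         // Input video (placeholder path)
    "guidance": 3,
    "filtered_edge": {
        "control_weight": 0.3        // Preserves the subject's structure and motion
    },
    "seg": {
        "control_weight": 1.0,
        "mask_inverted": true        // Restricts edits to the region outside the subject mask
    },
    "vis": {
        "control_weight": 0.2        // Light appearance guidance from the source video
    }
}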

Generating new environments for autonomous driving development

Figure 5. Cosmos Transfer output showcasing domain adaptation and synthetic data augmentation in autonomous driving use cases

This recipe collection demonstrates how Cosmos Transfer can be used for domain adaptation and synthetic data augmentation in autonomous vehicle (AV) research. By transforming real-world or simulated driving videos across diverse environmental conditions, developers can create rich datasets for training more robust perception or planning models.

Technical overview

  • Multi-control inference: The pipeline combines four control modalities (depth, edge, seg, and vis), each with tunable control_weight parameters to balance realism, structure, and semantic fidelity.
  • Prompt-conditioned generation: Text prompts define conditions such as “night with bright street lamps,” “winter with heavy snow,” or “sunset with reflective roads.”

Example command for base parameters

{
    // Update the parameter values for control weights, seed, and guidance in the JSON file below
    "seed": 5000,
    "prompt_path": "assets/prompt_av.json",           // Update the prompt within the json file accordingly
    "video_path": "assets/av_car_input.mp4",
    "guidance": 3,
    "depth": {
        "control_weight": 0.4
    },
    "edge": {
        "control_weight": 0.1
    },
    "seg": {
        "control_weight": 0.5
    },
    "vis": {
        "control_weight": 0.1
    }
}
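To produce the range of conditions shown in Figure 5, the same configuration is rerun with a different prompt and seed, and the control weights can be rebalanced depending on how much of the scene’s appearance should change. The variant below is only a sketch with assumed values (for example, a slightly stronger edge weight to hold lane and vehicle geometry under a heavy appearance change such as snow); it is not a tuned setting from the recipe.

{
    // Illustrative variant: same schema, new seed and prompt, rebalanced weights (assumed values)
    "seed": 7000,
    "prompt_path": "assets/prompt_av.json",           // Prompt updated to a new condition, e.g., "winter with heavy snow"
    "video_path": "assets/av_car_input.mp4",
    "guidance": 3,
    "depth": {
        "control_weight": 0.4
    },
    "edge": {
        "control_weight": 0.2
    },
    "seg": {
        "control_weight": 0.4
    },
    "vis": {
        "control_weight": 0.1
    }
}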

More example commands for this workflow can be found here.

Making robots more mobile with Sim2Real data augmentation

Figure 6. A GIF showing the input RGB video and segmentation mask (top) and the photorealistic output from Cosmos Transfer 1 (bottom).

Robotics navigation models often struggle to generalize from simulation to reality due to visual and physical domain gaps. The Sim2Real Data Augmentation recipe demonstrates how Cosmos Transfer improves Sim2Real performance for mobile robots by generating photorealistic, domain-adapted data from simulation.

Technical overview

The pipeline integrates with NVIDIA X-Mobility and Mobility Gen:

  • Mobility Gen: Built on Isaac Sim, it generates high-fidelity datasets with RGB, depth, and segmentation ground truth for wheeled and legged robots.
  • X-Mobility: Learns navigation policies from both on-policy and off-policy data.
  • Cosmos Transfer: Applies multimodal controls (edge: 0.3, seg: 1.0) to vary lighting, materials, and textures while preserving geometry, motion, and annotations.
Figure 7. Visual showing how the model trained with Cosmos-augmented data successfully identifies the transparent obstacle and navigates around it, demonstrating enhanced perception capabilities for challenging transparent objects.

Example command to prepare inputs for Cosmos Transfer

uv run scripts/examples/transfer1/inference-x-mobility/xmob_dataset_to_videos.py data/x_mobility_isaac_sim_nav2_100k data/x_mobility_isaac_sim_nav2_100k_input_videos
uv run scripts/examples/transfer1/inference-x-mobility/xmob_dataset_to_videos.py data/x_mobility_isaac_sim_random_160k data/x_mobility_isaac_sim_random_160k_input_videos
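With the input videos prepared, the Transfer run for this recipe uses the control weights listed above (edge: 0.3, seg: 1.0). A control configuration in the spirit of the JSON example earlier in this post might look like the sketch below; the exact schema used by the Transfer 1 X-Mobility scripts may differ, and the prompt and video paths are placeholders.

{
    // Illustrative Sim2Real augmentation configuration; paths and prompt are placeholders
    "seed": 0,
    "prompt_path": "assets/prompt_warehouse_variation.json",   // Prompt varying lighting, materials, and textures
    "video_path": "data/x_mobility_isaac_sim_nav2_100k_input_videos/example_episode.mp4",
    "guidance": 3,
    "edge": {
        "control_weight": 0.3        // Loose structural guidance so appearance can change
    },
    "seg": {
        "control_weight": 1.0        // Strong semantic control to preserve object identity and annotations
    }
}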

More example commands for this workflow can be found here.

Generating synthetic data for smart city applications

Figure 8. Synthetic data generation pipeline for smart city

Also included in the Cookbook is an end-to-end workflow that generates photorealistic synthetic data for urban traffic scenarios, accelerating the development of perception and vision-language models (VLMs) for smart city applications. The workflow simulates dynamic city traffic scenes in CARLA, which are then processed through Cosmos Transfer to produce high-quality, visually authentic videos and annotated datasets.

Figure 9. Synthetic video of a busy traffic intersection during daytime

Access the synthetic data generation workflow here.

In synthetic data generation, assessing the quality of generated content is crucial to ensure realistic and reliable results. Read this case study that demonstrates how Cosmos Reason, a reasoning vision language model, can be used to evaluate physical plausibility, assessing whether the interactions and movements in synthetic videos align with the fundamental laws and constraints of real-world physics.

How to use and contribute your own synthetic data generation recipe

To use the Cosmos Cookbook, start by exploring the inference or post-training recipes, which provide step-by-step instructions for tasks like video generation, sim-to-real augmentation, or model training. Each recipe outlines a workflow and points you to the relevant executable scripts in the scripts/ directory.

For deeper background on topics such as control modalities, data curation, or evaluation, see the concepts guides. All recipes include setup requirements and command examples to help you reproduce or adapt results.

As an open source community platform, the Cosmos Cookbook brings together NVIDIA engineers, researchers, and developers to share practical techniques and extend the ecosystem through collaboration. Contributors are welcome to add new recipes, refine workflows, and share insights to advance post-training and deployment best practices for Cosmos models. Follow the steps below to contribute to the main Cookbook repository.

1. Fork and set up

Fork the Cosmos Cookbook repository, then clone and configure:

  • git clone https://github.com/YOUR-USERNAME/cosmos-cookbook.git
  • cd cosmos-cookbook
  • git remote add upstream https://github.com/nvidia-cosmos/cosmos-cookbook.git
  • # Install dependencies
  • just install
  • # Verify setup
  • just serve-internal  # Visit http://localhost:8000

2. Create a branch

git checkout -b recipe/descriptive-name  # or docs/, fix/, etc.

3. Make changes

Add your content following the templates below, then test:

  • just serve-internal  # Preview changes
  • just test            # Run validation

4. Commit and push

  • git add .
  • git commit -m "Add Transfer weather augmentation recipe"
  • git push origin recipe/descriptive-name

5. Create pull request

Create a pull request and submit it for review.

6. Address feedback

Update your branch based on review comments:

  • git add .
  • git commit -m "Address review feedback"
  • git push origin recipe/descriptive-name

The PR updates automatically. Once approved, the team will merge your contribution.

7. Sync your fork

Before starting new work:

  • git checkout main
  • git fetch upstream
  • git merge upstream/main

More details on templates and guidelines can be found here.

Get started

Explore more recipes in the Cosmos Cookbook for your own use cases.

The Cosmos Cookbook is designed to create a dedicated space where the Cosmos team and community can openly share and contribute practical knowledge. We’d love to receive your patches and contributions to help build this valuable resource together. Learn more about how to contribute.

Learn more about NVIDIA Research at NeurIPS.

At the forefront of AI innovation, NVIDIA Research continues to push the boundaries of technology in machine learning, self-driving cars, robotics, graphics, simulation, and more. Explore the cutting-edge breakthroughs now.

Stay up to date by subscribing to NVIDIA news, following NVIDIA AI on LinkedIn, Instagram, X, and Facebook, and joining the NVIDIA Cosmos forum.




