Latest Open Models and Datasets



Humanoid pick and place
NVIDIA Isaac GR00T N1 used for object manipulation.

At its annual GTC conference, NVIDIA unveiled a trio of open-source releases geared toward accelerating physical AI development: Cosmos Transfer, a new suite of world foundation models (WFMs) with multicontrols; a highly curated Physical AI Dataset; and NVIDIA Isaac GR00T N1, the first open model for general humanoid reasoning. Together, these releases mark a significant step for physical AI, giving developers tools and resources to advance robotics systems and autonomous vehicle technology.



Latest World Foundation Model – Cosmos Transfer

Cosmos Transfer, the latest addition to NVIDIA's Cosmos™ world foundation models (WFMs), introduces a new level of control and accuracy in generating virtual world scenes.

Available in a 7-billion-parameter size, the model uses multicontrols to guide the generation of high-fidelity world scenes from structural inputs, ensuring precise spatial alignment and scene composition.



How it works

The model is built by training a separate ControlNet for each sensor modality used to capture the simulated world.

Input types include 3D bounding box maps, trajectory maps, depth maps, and segmentation maps.

  • At inference time, developers can use various input types to guide the output, including structured visual or geometric data such as segmentation maps, depth maps, edge maps, human motion keypoints, LiDAR scans, trajectories, HD maps, and 3D bounding boxes.
  • The control signals from each control branch are multiplied by their corresponding adaptive spatiotemporal control maps and then summed before being added to the transformer blocks of the base model.
  • The generated output is a photorealistic video sequence with controlled layout, object placement, and motion. Developers can control the output in multiple ways, such as preserving both structure and appearance, or allowing appearance variations while maintaining structure.
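
The weighting step described in the second bullet can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Cosmos Transfer implementation: the function name fuse_controls, the branch names, and the toy list-of-lists layout (one feature vector per spatiotemporal position, one scalar weight per position) are all hypothetical stand-ins for the real tensors.

```python
from typing import Dict, List

# Hypothetical layout: one feature vector per spatiotemporal position.
Features = List[List[float]]


def fuse_controls(
    base_hidden: Features,
    branch_outputs: Dict[str, Features],
    weight_maps: Dict[str, List[float]],
) -> Features:
    """Add the weighted sum of control-branch outputs to the base hidden states."""
    fused = [list(vec) for vec in base_hidden]  # copy so the input is not mutated
    for name, branch in branch_outputs.items():
        weights = weight_maps[name]
        if len(branch) != len(fused) or len(weights) != len(fused):
            raise ValueError(f"shape mismatch for control branch {name!r}")
        # Scale each position's control signal by that position's weight,
        # then accumulate it onto the base features.
        for pos, (vec, w) in enumerate(zip(branch, weights)):
            for ch, value in enumerate(vec):
                fused[pos][ch] += w * value
    return fused


# Two positions, two channels, two control branches (toy values).
base = [[1.0, 1.0], [0.0, 0.0]]
branches = {
    "depth": [[2.0, 0.0], [4.0, 4.0]],
    "segmentation": [[0.0, 2.0], [2.0, 2.0]],
}
weights = {"depth": [0.5, 0.25], "segmentation": [0.5, 0.75]}
print(fuse_controls(base, branches, weights))  # [[2.0, 2.0], [2.5, 2.5]]
```

Because the weights vary per position, one modality can dominate in some regions of the scene while another dominates elsewhere, which is the kind of control the adaptive maps are meant to provide.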

Outputs from Cosmos Transfer across various environments and weather conditions.

Cosmos Transfer, coupled with the NVIDIA Omniverse platform, is driving controllable synthetic data generation for robotics and autonomous vehicle development at scale. Find more Cosmos Transfer examples on GitHub.

Cosmos Transfer samples built by post-training the base model are also available for autonomous vehicles.



Open Physical AI Dataset

NVIDIA has also released the Physical AI Dataset, an open-source dataset on Hugging Face for developing physical AI. This commercial-grade, pre-validated dataset consists of 15 terabytes of data representing more than 320,000 trajectories for robotics training, plus up to 1,000 Universal Scene Description (OpenUSD) assets, including a SimReady collection.

The dataset is designed for post-training foundation models such as the Cosmos Predict world foundation models, providing developers with high-quality, diverse data to enhance their AI models.



Purpose-Built Model for Humanoids – NVIDIA Isaac GR00T N1

Another exciting announcement is the release of NVIDIA Isaac GR00T N1, the world's first open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments. The NVIDIA Isaac GR00T-N1-2B model is available on Hugging Face.

Isaac GR00T N1 was trained on an expansive humanoid dataset, consisting of real captured data, synthetic data generated using components of the NVIDIA Isaac GR00T Blueprint, and internet-scale video data. It’s adaptable through post-training for specific embodiments, tasks and environments.

Isaac GR00T N1 uses a single model and set of weights to enable manipulation behaviors on various humanoid robots, such as the Fourier GR-1 and 1X Neo. It demonstrates robust generalization across a variety of tasks, including grasping and manipulating objects with one or both arms, as well as transferring items between arms. It can also execute complex, multi-step tasks that require sustained contextual understanding and the integration of diverse skills. These capabilities make it well suited for applications in material handling, packaging, and inspection.

Isaac GR00T N1 features a dual-system architecture inspired by human cognition, consisting of the following complementary components:

  • Vision-Language Model (System 2): This methodical thinking system is based on NVIDIA-Eagle with SmolLM-1.7B. It interprets the environment through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the appropriate actions.
  • Diffusion Transformer (System 1): This action model generates continuous actions to control the robot's movements, translating the plan made by System 2 into precise, continuous robot movements.
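
The hand-off between the two systems can be pictured with a toy data-flow sketch. It makes no claim about the actual GR00T N1 networks or API: system2_plan and system1_actions are hypothetical names, a dictionary lookup stands in for the vision-language model, and plain linear interpolation stands in for the diffusion transformer. The only point is the division of labor, where one component decides what to do and the other turns that decision into a chunk of continuous actions.

```python
from typing import Dict, List, Sequence, Tuple


def system2_plan(scene_objects: Dict[str, Tuple[float, float]], instruction: str) -> List[float]:
    """Toy 'System 2': choose a target position named in the instruction.

    Stand-in for the vision-language model; a real model would output learned
    representations rather than an explicit coordinate.
    """
    for name, position in scene_objects.items():
        if name in instruction:
            return list(position)
    raise ValueError("instruction does not mention a known object")


def system1_actions(start: Sequence[float], target: Sequence[float], steps: int) -> List[List[float]]:
    """Toy 'System 1': emit continuous waypoints from start toward target.

    Stand-in for the diffusion transformer; linear interpolation is used here
    only to produce a smooth action chunk conditioned on the plan.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    chunk = []
    for i in range(1, steps + 1):
        frac = i / steps
        chunk.append([s + frac * (t - s) for s, t in zip(start, target)])
    return chunk


# Hypothetical scene: object names mapped to 2D positions.
scene = {"cup": (0.4, 0.2), "box": (0.0, 0.6)}
plan = system2_plan(scene, "pick up the cup")
actions = system1_actions((0.0, 0.0), plan, steps=4)
print(plan, len(actions))
```

In this sketch the planner runs once and the action generator consumes its output; how often each system actually runs, and what passes between them, is described in the research paper rather than here.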



Path Forward

Post-training is the path forward for advancing autonomous systems, creating specialized models for downstream physical AI tasks.

Check out GitHub for Cosmos Predict and Cosmos Transfer inference scripts. Explore the Cosmos Transfer research paper for more details.

The NVIDIA Isaac GR00T-N1-2B model is available on Hugging Face. Sample datasets and PyTorch scripts for post-training with custom user datasets, compatible with the Hugging Face LeRobot format, can be found on GitHub. For more information about the Isaac GR00T N1 model, see the research paper.

Follow NVIDIA on Hugging Face for more updates.


