NVIDIA Nemotron isn't just a family of models. It's an open collection of models, datasets, and training recipes that anyone can use to build, customize, and deploy their own AI systems.
From lightweight edge models to frontier-scale LLMs, Nemotron gives developers full visibility into how the models are trained, what data they use, and how to adapt them to new use cases. No black boxes.
NVIDIA also uses Nemotron internally to learn how to design the next generation of accelerated infrastructure, including GPUs, software, and networking, by experimenting in the open with the community.
TL;DR
What it is: Nemotron is NVIDIA's family of open models, datasets, and recipes for training and deploying AI at any scale. Models come in three sizes: Nano, Super, and Ultra, covering everything from edge devices to data centers, including both text and multimodal variants.
Why it matters: Because you can inspect the data, modify the models, and deploy them however you like.
Key breakthroughs:
- Hybrid Transformer + Mamba Architecture: Nemotron Nano V2 fuses Transformers with Mamba-2 state-space layers to deliver up to 20× faster inference and on-device reasoning without sacrificing accuracy.
- Thinking budgets: configurable reasoning depth to balance answer quality against cost in production.
- FP4 pre-training: 4-bit precision training on Blackwell GPUs, achieving world-class results at dramatically lower energy use.
- Data-centric optimization: curated and synthetic datasets that cut pre-training time by up to 4× while improving model accuracy.
For developers:
Use Nemotron as your foundation for custom AI, from fine-tuning enterprise copilots to deploying agentic systems at the edge.
Run, adapt, and extend: everything you need, including models and datasets, is open on huggingface.co/nvidia
Help us define the future: vote for the Nemotron features we should prioritize on our roadmap.
| Model | Parameters | Modality | Strengths | Ideal Use Cases |
|---|---|---|---|---|
| NVIDIA-Nemotron-Nano-9B-v2 | 9B | Text | Hybrid Transformer–SSM architecture delivering 6–20× faster inference with transformer-level accuracy. Optimized for speed and efficiency. | Edge and near-edge AI agents, chatbots, and lightweight copilots. |
| Llama-3.1-Nemotron-Nano-VL-8B-V1 | 8B | Multimodal (Vision + Language) | Combines Nemotron reasoning with Llama 3.1 vision-language capabilities for cross-modal understanding. | Multimodal document intelligence, OCR parsing, and AI assistants that “see and reason.” |
| Llama-3.3-Nemotron-Super-49B-v1.5 | 49B | Text | Balanced accuracy and performance for enterprise-scale AI. Built with Nemotron datasets and reasoning recipes. | Enterprise copilots, RAG systems, workflow automation, and domain fine-tuning. |
| Llama-3.1-Nemotron-Ultra-253B-v1 | 253B | Text | Frontier-scale reasoning and alignment optimized for research. Co-designed with NVIDIA’s full-stack hardware and data infrastructure. | Large-scale research, long-context reasoning, and infrastructure acceleration. |
License: Open model license and repo-specific terms are published per asset.
Why NVIDIA Builds Nemotron
Nemotron advances a simple idea: openness accelerates progress.
By releasing open weights, open datasets, and transparent training and alignment recipes, developers can reproduce, audit, and extend what NVIDIA builds, then tailor it to their data, compliance needs, and domain-specific use cases.
At the same time, Nemotron is part of NVIDIA's strategy of "extreme co-design" that Jensen Huang discussed on the Bg2 Pod. This is where every layer of the stack, across chips, systems, software, algorithms, and data, is designed together as one unified system. Training and optimizing models in the open provides insights that shape NVIDIA's hardware and software roadmap, from GPU architectures and networking to memory scheduling and kernel design.
This means every improvement discovered through Nemotron (faster reasoning, better convergence, lower energy) eventually flows back into the platform you use, whether you're serving a small Nano model at the edge or scaling Ultra in the cloud.
Listen to this podcast to learn why we're building Nemotron and have committed to the most open approach to model development.
Breakthroughs in Efficiency and Accuracy
The latest Nemotron research brings together architectural innovation, precision advances, and intelligent reasoning controls, all designed to make models both smarter and faster.
Nemotron Nano V2 — Hybrid Transformer + Mamba Architecture
Nemotron Nano V2 introduces a hybrid Transformer–Mamba architecture that combines the long-range reasoning power of Transformers with the sequential efficiency of Mamba-2 state-space layers.
Most attention layers are replaced with Mamba modules, which process sequences in linear time and constant memory per token, while a few key attention layers remain to preserve full-context reasoning.
The result: 6–20× higher inference throughput on the same hardware with minimal loss in accuracy across reasoning and generation benchmarks. This efficiency makes on-device and near-edge AI assistants practical for real-time decision-making, and for multimodal agents that need to think fast and act locally.
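To see why swapping attention for state-space layers changes the memory profile, here is a toy NumPy sketch (not the actual Nemotron implementation): a Mamba-style recurrence carries a fixed-size state from token to token, while a causal-attention step must append to a KV cache that grows with sequence length. All matrices and dimensions are illustrative.

```python
import numpy as np

def ssm_step(h, x, A, B, C):
    """One Mamba-style state-space step: constant memory per token.
    The hidden state h has a fixed size regardless of sequence length."""
    h = A @ h + B @ x          # update the recurrent state
    y = C @ h                  # emit an output representation
    return h, y

def attention_step(cache, x, Wq, Wk, Wv):
    """One causal-attention step: the KV cache grows with every token."""
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    cache.append((k, v))       # memory grows linearly with sequence length
    ks = np.stack([kv[0] for kv in cache])
    vs = np.stack([kv[1] for kv in cache])
    logits = ks @ q
    w = np.exp(logits - logits.max())   # softmax over all past tokens
    w /= w.sum()
    return cache, w @ vs

rng = np.random.default_rng(0)
d, n = 8, 16                                   # model dim, state dim
A = 0.9 * np.eye(n)                            # stable toy recurrence
B = rng.normal(size=(n, d)) * 0.1
C = rng.normal(size=(d, n)) * 0.1
Wq = Wk = Wv = np.eye(d)

h, cache = np.zeros(n), []
for t in range(100):
    x = rng.normal(size=d)
    h, y_ssm = ssm_step(h, x, A, B, C)
    cache, y_attn = attention_step(cache, x, Wq, Wk, Wv)

print(h.shape)      # the SSM state stays (16,) no matter how long the sequence
print(len(cache))   # the attention cache holds one entry per generated token
```

The hybrid design keeps a few attention layers precisely because the fixed-size state is lossy: attention can always look back at exact past tokens, which the recurrence cannot.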
FP4 Pre-Training — 4-Bit Precision at Full Intelligence
Nemotron demonstrates energy-efficient FP4 training on Blackwell GPUs using NVIDIA's Transformer Engine. This results in world-class accuracy at dramatically lower energy cost, proving that four-bit precision can train frontier-level models without sacrificing intelligence.
This breakthrough helps reduce the carbon footprint and infrastructure cost of large-scale AI development.
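The core idea can be illustrated with a small fake-quantization sketch in NumPy. This is not the Transformer Engine's FP4 path, just a hedged illustration of the E2M1 format's 16 representable values combined with per-block scaling, the general recipe behind low-precision training; block size and tensor shapes are illustrative.

```python
import numpy as np

# E2M1 (FP4) representable magnitudes: 1 sign bit, 2 exponent bits, 1 mantissa bit
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(w, block=32):
    """Per-block FP4 fake-quantization: scale each block so its max maps to 6.0
    (the largest FP4 magnitude), snap to the grid, then rescale back."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0                      # avoid dividing an all-zero block
    scaled = w / scale
    # nearest FP4 grid point for each magnitude, sign preserved
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(scaled) * FP4_GRID[idx] * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4, 64))         # toy weight matrix
wq = quantize_fp4(w).reshape(4, 64)

err = np.abs(w - wq).mean() / np.abs(w).mean()
print(f"mean relative error: {err:.3f}")         # modest, despite only 16 levels
```

Per-block scales are what make such a coarse grid workable: each small group of weights gets its own dynamic range instead of sharing one across the whole tensor.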
Thinking Budgets — Reason Smarter, Spend Less
Reasoning quality often scales with how long a model "thinks." Nemotron introduces configurable thinking budgets, allowing developers and businesses to control reasoning depth, balancing answer quality against operational cost.
- Shorter thinking = faster, cheaper responses.
- Longer thinking = deeper reasoning and better accuracy.
Now you can tune reasoning the same way you tune batch size or context length.
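Conceptually, a thinking budget caps how many tokens the model may spend inside its reasoning span before it is forced to answer. The exact control interface is model- and runtime-specific; the sketch below only simulates the idea on a plain token stream, with the `<think>` tags and the `generate_with_budget` helper as illustrative assumptions, not NVIDIA's API.

```python
def generate_with_budget(stream, think_budget, open_tag="<think>", close_tag="</think>"):
    """Cap the tokens spent inside a <think>...</think> span.
    Once the budget is exhausted, close the span and drop the rest
    of the reasoning so the model proceeds straight to its answer."""
    out, spent, state = [], 0, "answer"      # state: answer | thinking | skipping
    for tok in stream:
        if tok == open_tag:
            state = "thinking"
            out.append(tok)
        elif tok == close_tag:
            if state == "thinking":
                out.append(tok)
            state = "answer"                 # leaving (or already forced out of) the span
        elif state == "thinking":
            spent += 1
            if spent <= think_budget:
                out.append(tok)
            else:
                out.append(close_tag)        # budget spent: close the span early
                state = "skipping"           # discard the remaining reasoning tokens
        elif state == "answer":
            out.append(tok)
    return out, spent

# Toy "model output": six reasoning tokens, then an answer
toks = ["<think>", "a", "b", "c", "d", "e", "f", "</think>", "42"]
capped, spent = generate_with_budget(iter(toks), think_budget=3)
print(capped)   # ['<think>', 'a', 'b', 'c', '</think>', '42']
```

A larger budget simply lets more of the reasoning span through, which is the quality-versus-cost dial described above.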
Sizing Overview
Nemotron includes text and multimodal large language models across three weight classes:
- Nano – small, fast, and edge-ready
- Super – mid-range, balanced for enterprise tasks
- Ultra – frontier-scale for cutting-edge research
All models are trained with shared open data recipes, offering reproducible baselines that anyone can extend.
Data-Centric Efficiency
Smarter data, not just bigger data, drives Nemotron's performance gains.
Refined pre-training datasets, curated and enhanced with synthetic data, speed up convergence by up to 4×, producing more capable models on the same compute budget.
This "data flywheel" approach means developers can train faster, cheaper, and with higher accuracy, continuously improving performance through better data.
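As a toy illustration of the data-centric idea, the sketch below applies two of the simplest curation steps: exact deduplication on normalized text and a minimum-length filter. Real Nemotron pipelines use far richer quality classifiers and synthetic augmentation; every name and threshold here is illustrative.

```python
import hashlib
import re

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def curate(docs, min_words=5):
    """Keep each document once, and only if it is long enough to teach anything."""
    seen, kept = set(), []
    for doc in docs:
        key = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if key in seen:
            continue                      # drop near-verbatim duplicates
        seen.add(key)
        if len(doc.split()) >= min_words:
            kept.append(doc)              # drop fragments too short to be useful
    return kept

corpus = [
    "The gradient of a sum is the sum of the gradients.",
    "the gradient of a   sum is the sum of the gradients.",   # duplicate variant
    "Short snippet.",                                          # too short
    "Mamba layers process sequences in linear time and constant memory.",
]
print(len(curate(corpus)))   # 2 documents survive curation
```

Even these crude filters shrink the corpus the model must chew through, which is one ingredient in the faster convergence claimed above.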
Open Datasets for AI Development
NVIDIA makes its pretraining and post-training datasets available on Hugging Face so developers can inspect the underlying data and use it to train their own models.
Nemotron datasets play a critical role in how efficiently models learn. Smarter data leads to smarter models with the same, or even less, compute. These datasets are optimized not just for size but for efficiency, diversity, and reasoning quality, showing how intelligent data design can speed up model convergence and improve real-world performance.
These open datasets cover a wide range of domains and training stages:
| Dataset | Strengths |
|---|---|
| Nemotron-Pretraining-Code-v1 | Preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of intelligent, globally capable models. |
| Nemotron-Post-Training-Dataset-v2 | Post-training dataset that improves math, code, general reasoning, and instruction-following capabilities. |
| Llama-Nemotron-VLM-Dataset-v1 | High-quality annotations that support world-class vision-language understanding, with a permissive license for training. |
| Nemotron-Personas-Datasets | Synthetically generated personas grounded in real-world demographic, geographic, and personality-trait distributions. |
Projects like Nemotron-MIND focus on structured math dialogue reasoning, while Nemotron-CrossThink combines topics from science, math, and the humanities to push reasoning beyond narrow domains.
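One practical pattern behind multi-domain datasets like these is mixing sources in controlled ratios during training. The sketch below shows weighted sampling across a few hypothetical source pools; the source names, contents, and weights are illustrative, not Nemotron's actual data mix.

```python
import random

def mix_batch(sources, weights, n, seed=0):
    """Draw n training examples from several dataset pools in a fixed ratio.
    Returns (source_name, example) pairs so the mix can be audited."""
    rng = random.Random(seed)            # seeded for reproducible mixes
    names = list(sources)
    batch = []
    for _ in range(n):
        name = rng.choices(names, weights=weights, k=1)[0]
        pool = sources[name]
        batch.append((name, pool[rng.randrange(len(pool))]))
    return batch

sources = {
    "code": ["def f(): ...", "for i in range(3): ..."],
    "math": ["2 + 2 = 4", "d/dx x^2 = 2x"],
    "personas": ["A retired teacher from Lyon who enjoys chess ..."],
}
batch = mix_batch(sources, weights=[0.5, 0.3, 0.2], n=10)
print(len(batch))
```

Tuning these ratios is one of the levers a data-centric recipe exposes: the same pools can yield a code-heavy or reasoning-heavy model without touching the architecture.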
Real-World Applications
Nemotron isn't just a research effort; it's the foundation behind practical, open-source workflows developers can deploy today. Here are a few tutorials to help teams start building on it:
- RAG Agent: Use Nemotron with the NeMo framework to create retrieval-augmented generation (RAG) agents that pull from your private data and deliver context-aware answers in real time.
- Computer Use Agent: Build an agent that can autonomously execute multi-step tasks in the Bash shell, including navigating files, summarizing documents, and analyzing log files.
- Report Generator AI Agent: Learn how to build an agent that automatically compiles and summarizes reports using open Nemotron models, easily customizable for enterprise or research use.
- AI Coding Agent: Build a powerful AI coding assistant directly on your workstation. Adjust the model's thinking budget to balance reasoning depth, speed, and compute cost.
- Multimodal Document Intelligence: Combine NVIDIA Llama Nemotron Nano VL with vision-language reasoning to parse, understand, and summarize complex documents containing text, charts, and images.
- Deep Researcher Blueprint: Explore the AI-Q NVIDIA Blueprint, a reference design for building research-grade agentic workflows that plan, reason, and synthesize insights across multiple data sources.
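The retrieval loop at the heart of the RAG pattern above can be sketched in a few lines of plain Python: score documents against the query (here with simple bag-of-words counts standing in for a real embedding model), rank by cosine similarity, and splice the top hits into the prompt. Everything here is illustrative; a production agent would use the NeMo framework and a proper retriever.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, docs, k=2):
    """Rank documents by bag-of-words similarity to the query."""
    qv = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(qv, Counter(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    """Splice the retrieved context into a grounded prompt for the model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Nemotron Nano targets edge devices with a hybrid Transformer-Mamba design.",
    "FP4 pre-training runs on Blackwell GPUs via the Transformer Engine.",
    "The cafeteria menu changes every Tuesday.",
]
prompt = build_prompt("Which Nemotron model targets edge devices?", docs)
print(prompt.splitlines()[1])   # the most relevant document lands first
```

Grounding the prompt in retrieved context is what lets the agent answer from private data the model was never trained on.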
With Nemotron, developers can start small, scale fast, and customize deeply, all while keeping data ownership, transparency, and control.
Transparency & Openness
Nemotron commits to inspectable lineage (models + data), shareable recipes, and reproducible evaluation so teams can audit, customize, and deploy safely. If your policies require changing the data mix, language balance, or alignment method, the artifacts and documentation are there so you can reproduce, and improve on, our results.
We encourage forks, PRs, and benchmark reproductions. Bring your findings back; that's how the whole community gets faster, safer, and smarter together.
Construct the Future With Us
Every breakthrough in AI starts with collaboration.
More than an NVIDIA project, Nemotron is meant to be a platform for builders, from weekend hackers to enterprise developers.
Whether you're fine-tuning models, optimizing inference, or experimenting with different architectures, your insights directly shape how the next generation of AI applications and infrastructure will evolve.
This is your invitation to:
- Customize your model. Fine-tune Nemotron with your own data using the NeMo framework.
- Contribute back. Fork models, improve datasets, or publish new recipes that make training faster and safer.
- Experiment. Use open weights to test new ideas in reasoning, alignment, or multimodal learning.
- Benchmark. Share what works (and what doesn’t) so others can learn faster.
- Influence our upcoming features. Vote on which features we should prioritize for future versions of Nemotron.
- Build together. Join thousands of developers using open NVIDIA models to power everything from agentic AI to robotics and edge systems.
Nemotron and NeMo are your building blocks for AI applications: open, accelerated, and ready to customize.
Every pull request, dataset, or benchmark helps design the AI infrastructure of the future. Learn more about our approach to Nemotron on the latest NVIDIA AI Podcast.
Let's build it together.

