Agentic AI systems need models with the specialized depth to solve dense technical problems autonomously. They must excel at reasoning, coding, and long-context analysis, while remaining efficient enough to run repeatedly at scale.
Multi-agent systems generate up to 15x the tokens of normal chats, re-sending history, tool outputs, and reasoning steps at every turn. Over long tasks, this “context explosion” causes goal drift, where agents gradually lose alignment with the original objective. And using massive reasoning models for every sub-task—the “thinking tax”—makes multi-agent applications too expensive and sluggish for practical use.
Today, we’re releasing Nemotron 3 Super to address these limitations. The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging. This model follows the introduction of Nemotron 3 Nano in December.
Super addresses the “thinking tax” with its hybrid mixture-of-experts (MoE) architecture, delivering over 5x the throughput of the previous Nemotron Super. It tackles the “context explosion” with a native 1M-token context window that gives agents long-term memory for aligned, high-accuracy reasoning. The model is fully open, with open weights, datasets, and recipes, so developers can easily customize, optimize, and deploy it on their own infrastructure.
What makes Nemotron 3 Super different
Nemotron 3 Super isn’t just a bigger Nano. It introduces architectural innovations that mitigate some of the typical efficiency-accuracy tradeoffs of high-capacity reasoning models:
- Latent MoE that consults 4x as many experts for the same inference cost by compressing tokens before they reach the experts.
- Multi-token prediction (MTP) that predicts multiple future tokens in a single forward pass, dramatically reducing generation time for long sequences and enabling built-in speculative decoding.
- Hybrid Mamba-Transformer backbone integrating Mamba layers for sequence efficiency with Transformer layers for precision reasoning, delivering higher throughput with 4x improved memory and compute efficiency.
- Native NVFP4 pretraining optimized for NVIDIA Blackwell, significantly cutting memory requirements and speeding up inference by 4x on NVIDIA B200 compared to FP8 on NVIDIA H100, while maintaining accuracy.
- Multi-environment reinforcement learning (RL) post-training across 21 environment configurations using NVIDIA NeMo Gym and NVIDIA NeMo RL, with more than 1.2 million environment rollouts.
These benefits come together to create a model that’s well suited to long-running autonomous agents. On PinchBench—a new benchmark for measuring how well LLMs perform as the brain of an OpenClaw agent—Nemotron 3 Super scores 85.6% across the full test suite, making it the best open model in its class.
See it in action
If you want to go hands-on with Nemotron 3 Super, follow the tutorial video below. It walks you through how to use the model, from build.nvidia.com to OpenCode.
Diving deep into the architecture
Hybrid Mamba-Transformer MoE backbone
Super builds on the same hybrid philosophy as Nano but at a fundamentally different scale. The backbone interleaves three layer types:
Mamba-2 layers handle the vast majority of sequence processing. State space models (SSMs) provide linear-time complexity with respect to sequence length, which is what makes the 1M-token context window practical rather than theoretical. When an agent must reason over an entire codebase, a long conversation history, or a stack of retrieved documents, Mamba layers keep the memory footprint manageable.
Transformer attention layers are interleaved at key depths. Pure SSMs can struggle with precise associative recall—the kind of task where you need to find one specific fact buried in a long context. The attention layers preserve this capability, ensuring that Super maintains high-fidelity retrieval even when the “needle” sits in the middle of a haystack of conflicting information.
MoE layers scale effective parameter count without the cost of dense computation. Only a subset of experts activates per token, keeping latency low and throughput high—critical when many agents are running concurrently in a shared deployment.
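To make the interleaving concrete, here is a minimal schematic in Python. The layer counts and the every-Nth placement of attention and MoE layers are illustrative assumptions, not the published Nemotron 3 Super configuration.

```python
# Schematic sketch of a hybrid Mamba-Transformer MoE stack.
# Layer counts and placement intervals are invented for illustration.
from dataclasses import dataclass

@dataclass
class LayerSpec:
    kind: str   # "mamba2" | "attention" | "moe"
    index: int

def build_hybrid_stack(n_layers: int = 48,
                       attn_every: int = 8,
                       moe_every: int = 4) -> list[LayerSpec]:
    """Mostly Mamba-2 layers, with attention interleaved at key depths
    for associative recall and MoE layers for sparse capacity."""
    stack = []
    for i in range(n_layers):
        if (i + 1) % attn_every == 0:
            kind = "attention"   # precise retrieval over long context
        elif (i + 1) % moe_every == 0:
            kind = "moe"         # sparse expert capacity
        else:
            kind = "mamba2"      # linear-time sequence processing
        stack.append(LayerSpec(kind, i))
    return stack

if __name__ == "__main__":
    for spec in build_hybrid_stack(16):
        print(spec.index, spec.kind)
```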


Latent MoE
Standard MoE architectures route tokens directly from the model’s full hidden dimension to the experts. As models grow, this routing layer becomes a bottleneck—it increases compute costs and limits how many experts you can practically deploy.
Super introduces latent MoE: Before routing decisions are made, token embeddings are projected into a compressed, low-rank latent space. Expert computation happens in this smaller dimension, and results are projected back to the full model dimension afterward.
Why this matters in practice:
More experts, same cost. By compressing tokens before they reach the experts, latent MoE enables the model to consult 4x as many experts for the same computational cost.
Finer-grained specialization. With more experts available, the model can afford highly specialized routing—for instance, activating distinct experts for Python syntax versus SQL logic—that fires only when strictly necessary. This granularity is especially valuable in agentic settings, where a single conversation may span tool calls, code generation, data analysis, and conversational reasoning within a few turns.
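As a rough illustration of the mechanism, here is a minimal latent-MoE sketch in PyTorch. The dimensions, expert count, and top-k value are invented for the example; only the compress-route-expand structure reflects the description above.

```python
# Minimal latent-MoE sketch: compress, route and compute in the latent
# space, then expand. All sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMoE(nn.Module):
    def __init__(self, d_model=4096, d_latent=512, n_experts=64, top_k=4):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)   # compress
        self.up = nn.Linear(d_latent, d_model, bias=False)     # restore
        self.router = nn.Linear(d_latent, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_latent, 4 * d_latent),
                          nn.GELU(),
                          nn.Linear(4 * d_latent, d_latent))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        z = self.down(x)                       # route in the latent space
        weights = F.softmax(self.router(z), dim=-1)
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(z)
        for k in range(self.top_k):            # naive loop, for clarity
            for e in range(len(self.experts)):
                mask = top_i[:, k] == e
                if mask.any():
                    out[mask] += top_w[mask, k, None] * self.experts[e](z[mask])
        return self.up(out)                    # back to full model width
```

Because the router and experts all operate on d_latent rather than d_model, the per-token expert cost shrinks with the compression ratio, which is what lets the expert count grow at fixed inference cost.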


Multi-token prediction (MTP)
Standard language models are trained to predict one token at a time—a fundamentally myopic objective. Super is trained with MTP, where specialized prediction heads forecast several future tokens simultaneously from each position.
This has two concrete advantages:
Stronger reasoning during training. Predicting multiple future tokens forces the model to internalize longer-range structure and logical dependencies. Rather than learning to guess plausible next words, the model must learn to anticipate coherent sequences. This produces measurable gains on chain-of-thought tasks where each step must follow logically from the last.
Built-in speculative decoding at inference. By predicting multiple future tokens simultaneously in a single forward pass, MTP dramatically reduces the time required to generate long sequences. The MTP heads provide draft predictions that can be verified in parallel, enabling up to 3x wall-clock speedups for structured generation tasks like code and tool calls—without requiring a separate draft model.
Both advantages stem from the same design decision. Unlike architectures that train independent prediction heads per offset, Super uses a shared-weight design across all MTP heads. This keeps the parameter overhead minimal while improving training stability—the heads learn to agree on coherent continuations rather than diverging into offset-specific shortcuts. The same weight sharing also makes the speculative drafts more consistent at longer draft lengths, which is where independently trained heads typically degrade.
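Here is a minimal sketch of a shared-weight MTP head. The weight sharing across offsets follows the design described above; the learned offset embedding used to distinguish prediction positions is an assumption for illustration.

```python
# Shared-weight multi-token prediction head: one projection serves every
# future offset, conditioned on a small offset embedding (an assumed detail).
import torch
import torch.nn as nn

class SharedMTPHead(nn.Module):
    def __init__(self, d_model=1024, vocab=32000, n_future=4):
        super().__init__()
        self.n_future = n_future
        self.proj = nn.Linear(d_model, vocab, bias=False)  # shared weights
        self.offset_emb = nn.Embedding(n_future, d_model)  # which offset?

    def forward(self, hidden):               # hidden: (batch, seq, d_model)
        logits = []
        for k in range(self.n_future):
            h = hidden + self.offset_emb.weight[k]    # condition on offset k
            logits.append(self.proj(h))               # (batch, seq, vocab)
        return torch.stack(logits, dim=1)    # drafts for t+1 .. t+n_future
```

At inference time, the per-offset logits serve as the speculative drafts that the main model verifies in parallel.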
Native NVFP4 pretraining
Most quantized models start as full-precision and get compressed after training, which inevitably introduces accuracy loss. Super takes a different approach: The vast majority of floating-point multiply-accumulate operations during pretraining run in NVFP4, the NVIDIA 4-bit floating-point format. Optimized for Blackwell, this significantly cuts memory requirements and speeds up inference compared to FP8, while maintaining accuracy.
Training natively in reduced precision means the model learns to be accurate within the constraints of 4-bit arithmetic from the very first gradient update. The result is a model that’s mathematically stable and accurate despite running on a significantly reduced memory footprint.
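For intuition, the following sketch simulates 4-bit block-scaled quantization in the spirit of NVFP4, snapping values to the E2M1 magnitude grid with a per-16-element block scale. It is a numerical simulation for building intuition, not the hardware datapath, and the scaling scheme is simplified.

```python
# Simulated 4-bit block-scaled quantize/dequantize round trip.
# E2M1 magnitudes and the 16-element block follow the FP4 format;
# everything else here is a simplification.
import numpy as np

E2M1_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_nvfp4(x: np.ndarray, block: int = 16) -> np.ndarray:
    x = x.reshape(-1, block)
    # One scale per block maps the block's max magnitude onto the grid.
    scale = np.abs(x).max(axis=1, keepdims=True) / E2M1_LEVELS[-1]
    scale = np.where(scale == 0, 1.0, scale)
    scaled = x / scale
    # Snap each magnitude to the nearest representable level, keep the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_LEVELS).argmin(axis=-1)
    return (np.sign(scaled) * E2M1_LEVELS[idx] * scale).ravel()

if __name__ == "__main__":
    w = np.random.randn(64).astype(np.float32)
    print("max abs error:", np.abs(w - fake_nvfp4(w)).max())
```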
How we trained Nemotron 3 Super
Nemotron 3 Super is trained in three sequential phases, each building on the last. Pretraining establishes broad world knowledge and language understanding at scale. Supervised fine-tuning shapes the model’s behavior around the task types it will encounter in deployment. Reinforcement learning then refines that behavior against verifiable outcomes across diverse agentic environments.
Pretraining
Super is pretrained on 25 trillion tokens using NVFP4, the NVIDIA 4-bit floating-point format optimized for NVIDIA Blackwell. Rather than quantizing a full-precision model after the fact, Super trains natively in reduced precision from the first gradient update—meaning the model learns to be accurate within the constraints of 4-bit arithmetic throughout pretraining, not only at inference. The pretraining corpus spans 10 trillion unique curated tokens, with the model seeing 25 trillion total tokens across the run, including additional compute focused on reasoning and coding.
Supervised fine-tuning
Before reinforcement learning, Super undergoes supervised fine-tuning on about 7 million SFT samples. These are drawn from a broader post-training corpus of 40 million samples covering reasoning, instruction following, coding, safety, and multi-step agent tasks. This stage establishes the behavioral foundation that RL then refines. The model learns the format and structure of correct responses across task types, giving the subsequent RL phase a stable starting point rather than optimizing from a raw pretrained checkpoint.
Multi-environment reinforcement learning
To align Super with real agentic behavior, the model is post-trained using reinforcement learning across diverse environments in NeMo Gym, the NVIDIA open source library for building and scaling RL training environments. These environments evaluate the model’s ability to perform sequences of actions—generating correct tool calls, writing functional code, producing multi-part plans that satisfy verifiable criteria—not just producing satisfying single-turn responses. These trajectories form the core training data for running reinforcement learning at scale with the NeMo RL open library.
This trajectory-based reinforcement produces a model that behaves reliably in multi-step workflows, reduces reasoning drift, and handles the kinds of structured operations common in agentic pipelines.
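The sketch below shows what trajectory-based collection can look like in the abstract. The Environment protocol, verify() reward, and collect_rollout helper are hypothetical stand-ins for illustration, not the NeMo Gym API.

```python
# Hypothetical sketch of trajectory collection for agentic RL.
# The interfaces here are illustrative, not NeMo Gym's.
from typing import Protocol

class Environment(Protocol):
    def reset(self) -> str: ...                           # initial task prompt
    def step(self, action: str) -> tuple[str, bool]: ...  # observation, done
    def verify(self) -> float: ...                        # verifiable reward

def collect_rollout(env: Environment, policy, max_turns: int = 16):
    """Roll the policy through one multi-step task and attach the
    verifiable end-of-episode reward to the whole trajectory."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_turns):
        action = policy(obs)                 # tool call, code, or plan step
        next_obs, done = env.step(action)
        trajectory.append((obs, action))
        obs = next_obs
        if done:
            break
    return trajectory, env.verify()          # reward from a checked outcome
```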
Benchmarking Nemotron 3 Super
Nemotron 3 Super achieves leading accuracy across numerous important agentic benchmarks while maintaining high throughput.


The “Super + Nano” deployment pattern
Nemotron 3 Nano is an excellent choice for achieving high accuracy on targeted, individual steps within an agentic workflow. However, when multi-agent applications escalate to complex, multi-step activities, they require a high-capacity model for superior planning and reasoning. Consider a computer-use agent that must decide between different modalities of tools in order to, say, create a presentation with 10 high-quality slides.
Nemotron 3 Super is ideal for this use. For example, in software development, simple merge requests can be addressed by Nemotron 3 Nano, while complex coding tasks that require deeper understanding of the codebase can be handled by Nemotron 3 Super. Expert-level coding tasks can be escalated to proprietary models.
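A minimal sketch of this routing pattern is below; the model names and the complexity heuristic are illustrative assumptions.

```python
# Hypothetical "Super + Nano" router: send simple steps to the small
# model, escalate complex multi-step work to the high-capacity one.
NANO = "nemotron-3-nano"    # targeted, individual steps
SUPER = "nemotron-3-super"  # planning and deep reasoning

def pick_model(task: str, n_files_touched: int, needs_planning: bool) -> str:
    if needs_planning or n_files_touched > 5:
        return SUPER        # complex coding task or multi-step plan
    return NANO             # simple merge request, single tool call

print(pick_model("fix typo in README", n_files_touched=1, needs_planning=False))
```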
Building with Super’s open resources
Nemotron 3 Super is fully open—weights, datasets, and recipes—so developers can easily customize, optimize, and deploy the model on their own infrastructure for maximum privacy and security.
Model weights
Full parameter checkpoints for Nemotron 3 Super are available on Hugging Face and through NVIDIA NIM. The NVIDIA Nemotron Open Model License gives enterprises the flexibility to maintain data control and deploy anywhere.
End-to-end training and evaluation recipes
We’re releasing the complete training and evaluation recipe for Nemotron 3 Super, covering the full pipeline from pretraining through alignment. This enables developers to reproduce Super’s training, adapt the recipe for domain-specific variants, or use it as a starting point for their own hybrid architecture research.
Deployment cookbooks
We’ve built ready-to-use cookbooks for major inference engines, each with configuration templates, performance tuning guidance, and reference scripts; a minimal serving sketch follows the list:
- vLLM Cookbook: High-throughput continuous batching and streaming for Super.
- SGLang Cookbook: Fast, lightweight inference optimized for multi-agent tool-calling workloads.
- NVIDIA TensorRT LLM Cookbook: Fully optimized TensorRT LLM engines with latent MoE kernels for production-grade, low-latency deployment.
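As a minimal example of the serving side, the following uses vLLM's offline inference API. The Hugging Face model ID is a placeholder; check the model card for the actual checkpoint name.

```python
# Minimal vLLM offline inference sketch (placeholder model ID).
from vllm import LLM, SamplingParams

llm = LLM(model="nvidia/Nemotron-3-Super")          # placeholder ID
params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Summarize this merge request: ..."], params)
print(outputs[0].outputs[0].text)
```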
Fine-tuning cookbooks
Explore our Nemotron 3 Super customization cookbooks to efficiently fine-tune the model for your domain (LoRA/SFT) or advance its agentic reasoning capabilities (GRPO/DAPO). A minimal LoRA sketch is shown below.
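Here is a minimal LoRA setup using Hugging Face PEFT; the model ID and target module names are placeholders, and the cookbooks above carry the tested configurations.

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT.
# Model ID and target_modules are placeholder assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-3-Super", trust_remote_code=True)    # placeholder ID
config = LoraConfig(r=16, lora_alpha=32,
                    target_modules=["q_proj", "v_proj"],  # assumed names
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()   # adapters are a tiny fraction of 120B
```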
Open datasets
Nemotron 3 Super is built on a fully open, end-to-end data pipeline that spans pretraining, post-training, and interactive reinforcement learning—giving developers reproducible building blocks for agentic AI.
- Pretraining corpora: 10 trillion curated tokens, trained over 25 trillion total seen tokens, plus an additional 10 billion tokens focused on reasoning and 15 million coding problems. All aggressively deduplicated and quality-filtered to maximize signal-to-noise.
- Post-training datasets: 40 million new supervised and alignment samples covering reasoning, instruction following, coding, safety, and multi-step agent tasks across supervised fine-tuning, preference data, and RL trajectories (about 7 million used directly for SFT).
- RL tasks and environments: Interactive RL across 21 environment configurations and 37 datasets (~10 of which are being released), including software-engineering-style agent training and tool-augmented search/planning tasks—moving beyond static text into dynamic, verifiable execution workflows and generating ~1.2 million environment rollouts during training.
Open training and evaluation infrastructure
NVIDIA publishes development techniques and tools, giving researchers and enterprises the flexibility to customize Nemotron 3 Super or build their own reasoning models. All recipes integrate with the Nemotron GitHub repository, NeMo Gym, NeMo RL, NVIDIA NeMo Data Designer, NVIDIA NeMo Curator, and NVIDIA NeMo Evaluator—providing a complete, reproducible pipeline from data to deployment.
All Nemotron models are released with an open evaluation approach, including a published evaluation recipe that enables anyone to rerun and inspect the full evaluation pipeline for Nemotron 3 Super.
Get started
Nemotron 3 Super is live now. Available across leading inference platforms and packaged as NVIDIA NIM, Super can run anywhere from the workstation to the cloud. Try it on Perplexity with a Pro subscription, or through the API, OpenRouter, or build.nvidia.com.
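As a quick-start sketch, the snippet below calls the OpenAI-compatible NVIDIA API endpoint; the model ID is a placeholder, so check build.nvidia.com for the exact name, and set NVIDIA_API_KEY in your environment.

```python
# Quick-start call against the OpenAI-compatible NVIDIA endpoint.
# Model ID is a placeholder; see build.nvidia.com for the exact name.
import os
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key=os.environ["NVIDIA_API_KEY"])
resp = client.chat.completions.create(
    model="nvidia/nemotron-3-super",            # placeholder ID
    messages=[{"role": "user", "content": "Plan a 10-slide deck outline."}],
    max_tokens=512)
print(resp.choices[0].message.content)
```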
Download the weights from Hugging Face, launch an optimized instance through NVIDIA NIM, fine-tune with Unsloth, or start with the cookbooks to get running in minutes.
Super is also available through Baseten, Cloudflare, DeepInfra, Fireworks AI, FriendliAI, Inference.net, Lightning AI, Modal, Nebius, and Together AI.
Check out our GitHub repository, which has getting-started instructions for platforms like OpenCode, OpenHands, and OpenClaw.
For the complete technical details, read the Nemotron 3 Super technical report.
Stay up to date on NVIDIA Nemotron by subscribing to NVIDIA news and following NVIDIA AI on LinkedIn, X, Discord, and YouTube. Visit the Nemotron developer page for resources to get started. Explore open Nemotron models and datasets on Hugging Face and Blueprints on build.nvidia.com. And engage with Nemotron livestreams, tutorials, and the developer community on the NVIDIA forum and Discord.
