NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks

The NVIDIA Blackwell architecture powered the fastest time to train across every MLPerf Training v5.1 benchmark, marking a clean sweep in the most recent round of results. As developers experiment with new architectures and models continue to grow in size, more training compute is essential. Meeting this need for delivered compute requires innovation across every layer of the AI stack—from chips and systems to software—advancing performance at an unprecedented pace.

MLPerf Training v5.1 is the latest in the long-running series of industry benchmarks designed to measure AI training performance. This version measures the time to train seven models, representing a wide range of use cases, each to a specified target accuracy. The Blackwell architecture, which powers both NVIDIA Blackwell and NVIDIA Blackwell Ultra GPUs, delivered the highest performance on every benchmark, both at maximum scale and at each submitted scale.

Benchmark | Time to train | Maximum submission scale
Llama 3.1 405B pretraining | 10 minutes | 5,120 Blackwell GPUs
Llama 3.1 8B pretraining | 5.2 minutes | 512 Blackwell Ultra GPUs
Llama 2 70B LoRA fine-tuning | 0.40 minutes | 512 Blackwell Ultra GPUs
FLUX.1 | 12.5 minutes | 1,152 Blackwell GPUs
DLRM-DCNv2 | 0.71 minutes | 64 Blackwell GPUs
R-GAT | 0.84 minutes | 256 Blackwell GPUs
RetinaNet | 1.4 minutes | 512 Blackwell GPUs
Table 1. The NVIDIA platform delivers the fastest time to train on every model currently tested in MLPerf Training

MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 5.0-0082, 5.1-0002, 5.1-0004, 5.1-0060, 5.1-0070, 5.1-0072. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

The NVIDIA platform was also the only one to submit results on all benchmarks. In this post, we take a closer look at these results and the technology innovations that powered them.

NVIDIA makes the industry’s first FP4 training submissions with NVFP4

Innovation in low-precision AI data formats is a key enabler of the performance gains delivered by the Blackwell architecture, which powers Blackwell and Blackwell Ultra GPUs. The Blackwell architecture incorporates hardware acceleration for FP4 data formats, including the NVIDIA-designed NVFP4 format. Blackwell GPUs offer peak FP4 throughput per clock that is twice that of FP8. Blackwell Ultra GPUs build upon that innovation, increasing peak FP4 throughput per clock to 3x that of FP8.

As shown in the paper, Pretraining Large Language Models with NVFP4, NVFP4 provides higher accuracy for the same number of tokens used during training, or achieves the same accuracy using significantly fewer tokens, compared to the industry MXFP4 data format. This means faster time to train to a specified accuracy and faster time to deployment with lower training costs.

This round, NVIDIA adopted NVFP4 in every large language model (LLM) benchmark in MLPerf Training by incorporating many of the techniques recommended in the paper. NVIDIA submissions also carefully applied "healing"—a process by which higher precisions are used during certain parts of the training process—to improve accuracy. Specifically, NVIDIA submissions kept the final few training iterations in FP8 precision.
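A minimal sketch of such a healing schedule is shown below, purely as an illustration of the idea. The `nvfp4_autocast` and `fp8_autocast` context managers, and the `heal_steps` value, are hypothetical stand-ins for whatever low-precision machinery a training stack provides; this is not the actual NVIDIA submission code.

```python
# Illustrative sketch: run most iterations in NVFP4, then "heal" the final
# iterations in FP8. The precision context managers below are placeholders.
import contextlib
import torch

@contextlib.contextmanager
def nvfp4_autocast():
    # Stand-in: a real implementation would quantize GEMM inputs to NVFP4.
    yield

@contextlib.contextmanager
def fp8_autocast():
    # Stand-in: a real implementation would quantize GEMM inputs to FP8.
    yield

def train(model, optimizer, data_loader, total_steps, heal_steps=200):
    """Train in NVFP4 for most steps; switch to FP8 for the last heal_steps."""
    for step, (inputs, targets) in zip(range(total_steps), data_loader):
        in_healing_phase = step >= total_steps - heal_steps
        precision_ctx = fp8_autocast() if in_healing_phase else nvfp4_autocast()

        with precision_ctx:
            loss = torch.nn.functional.cross_entropy(model(inputs), targets)

        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
```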

These submissions required innovation at every layer of the technology stack, including hardware acceleration of NVFP4 directly in Blackwell and Blackwell Ultra silicon; acceleration libraries including NVIDIA cuBLAS, NVIDIA Transformer Engine, and NVIDIA Megatron-Core; and new numerical techniques.

Blackwell Ultra delivers a big leap for LLM training

NVIDIA submitted the first MLPerf Training results on Blackwell Ultra using an NVIDIA AI cluster codenamed "Theia," after the Greek goddess of sight and vision. It features a total of 512 Blackwell Ultra GPUs, built from multiple NVIDIA GB300 NVL72 rack-scale systems connected using NVIDIA Quantum-X800 InfiniBand.

Blackwell Ultra GPUs incorporate several important enhancements compared to Blackwell GPUs, including:

  • 1.5x peak NVFP4 throughput. Blackwell Ultra GPUs feature updated Tensor Cores that increase peak FP4 throughput per clock by 1.5x compared to Blackwell GPUs. This helps speed up math-bound GEMM operations.
  • 2x softmax for attention. Blackwell Ultra GPUs feature an upgraded special function unit (SFU), providing 2x accelerated throughput for key softmax operations, which can be critical in the attention layer. In MLPerf benchmarks, this results in up to a 1.3x speedup in the attention block.
  • 1.5x larger HBM3e capacity. Blackwell Ultra GPUs incorporate higher-capacity HBM3e stacks, which are now 12-Hi compared to 8-Hi in Blackwell GPUs. On the Llama 2 70B LoRA benchmark, this enabled the entire model to fit in a single GPU, with no CPU offloading required, eliminating model-parallel communication overheads and improving GEMM efficiency.

Blackwell Ultra GPU innovations, adoption of the NVFP4 format, and software optimizations delivered large increases in pretraining and LLM fine-tuning performance with the same number of GPUs compared to the most recent NVIDIA submissions using the Hopper architecture.

[Figure: Two sets of bar charts, showing performance starting with Hopper submissions in prior rounds, followed by Blackwell GB200 NVL72 submissions in v5.0, and finally Blackwell Ultra GB300 NVL72 submissions in v5.1. The speedups shown are 1x, ~2x, and 4x+ for Llama 3.1 405B, and 1x, ~3x, and ~5x for Llama 2 70B LoRA, respectively.]
Figure 1. Relative Llama 3.1 405B pretraining and Llama 2 70B LoRA fine-tuning performance at 512-GPU and 8-GPU scales, respectively

MLPerf Training v4.1, v5.0, and v5.1, closed division. Results from entries: 4.1-0050, 5.0-0076, 5.0-0067, 5.1-0058, 5.1-0060. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

Moreover, the latest NVIDIA Quantum-X800 networking platform—composed of NVIDIA ConnectX-8 SuperNICs, NVIDIA Quantum-X800 InfiniBand switches, and NVIDIA LinkX cables—was used to connect the multiple GB300 NVL72 racks that form the Theia cluster. This marks the industry's first and only 800 Gb/s networking submitted to MLPerf Training.

NVIDIA Blackwell sets new Llama 3.1 405B training record

On Llama 3.1 405B, the largest and most challenging benchmark in MLPerf Training v5.1, NVIDIA set a new time-to-train record of 10 minutes, powered by 5,120 Blackwell GPUs. This is 2.7x faster than the fastest submission using Blackwell GPUs in the previous round.*

Two major factors contributed to this large speedup. First, the use of NVFP4 training recipes and general software enhancements enabled the submission using 2,560 Blackwell GPUs to achieve a score of 18.79 minutes. That is 3x faster than the previous NVIDIA submission with the same number of NVIDIA Hopper architecture GPUs.* Second, effective performance per Blackwell GPU increased by 42% when comparing the 2,496 Blackwell GPU submission last round to the 2,560 Blackwell GPU submission this round.*

* MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 5.0-0067, 5.0-0002, 5.0-0003, 5.0-0004, 5.1-0003, 5.1-0004, 5.1-0071. Performance per GPU is not an official MLPerf metric, and is derived by dividing the ratios of delivered performance and scales submitted. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

[Figure: A dark green line indicates the MLPerf Training v5.0 baseline, scaling from 512 Blackwell GPUs to 2,496 Blackwell GPUs. A lighter green line indicates Blackwell submissions in MLPerf Training v5.1, with points at 512, 2,560, and 5,120 GPUs. At 2,560 GPUs, performance per GPU in v5.1 is indicated as 1.4x that of v5.0 at 2,496 GPUs. At 5,120 GPUs, a 2.7x increase in performance at max scale is indicated.]
Figure 2. Performance scaling with the number of Blackwell GPUs submitted in both MLPerf Training v5.0 and MLPerf Training v5.1

MLPerf™ Training v5.0 and v5.1 results retrieved from www.mlcommons.org on November 12, 2025, from the following entries: 5.0-0001, 5.0-0002, 5.0-0003, 5.0-0004, 5.0-0005, 5.0-0013, 5.0-0014, 5.1-0003, 5.1-0004, 5.1-0071. Performance per GPU is not an official MLPerf metric, and is derived by dividing the ratios of delivered performance and scales submitted. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.

This submission also used a total of 5,120 Blackwell GPUs—more than doubling the largest submitted scale of 2,496 Blackwell GPUs in the prior round—connected using NVLink for scale-up within a rack and NVIDIA Quantum-2 InfiniBand for scale-out across multiple racks. Performance increased by 2.7x, meaning that the gains came from both a larger scale and increased effective performance per GPU.

Scaling efficiency, which measures how much performance increases as GPUs are added, was 85% when scaling by 10x from 512 Blackwell GPUs to 5,120 Blackwell GPUs.
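As the footnotes explain, these per-GPU and efficiency figures are derived from the ratios of delivered performance and submitted scales. The sketch below shows that arithmetic using the two v5.1 data points quoted in this post (18.79 minutes at 2,560 GPUs and 10 minutes at 5,120 GPUs); note that the official 85% figure uses a 512-GPU baseline whose time is not listed here, so this is an illustration of the formula rather than a reproduction of that number.

```python
def scaling_efficiency(time_small, gpus_small, time_large, gpus_large):
    """Efficiency = achieved speedup divided by the ideal (linear) speedup."""
    speedup = time_small / time_large          # how much faster the larger run finished
    ideal_speedup = gpus_large / gpus_small    # what perfectly linear scaling would give
    return speedup / ideal_speedup

# Using the two Llama 3.1 405B v5.1 points quoted above:
print(scaling_efficiency(18.79, 2560, 10.0, 5120))  # ~0.94, i.e. ~94% efficiency from 2,560 to 5,120 GPUs
```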

This is critical, as it enables model builders to scale up training runs, accelerating time to train and time to revenue, while ensuring that each of those incremental GPUs achieves high utilization.

Blackwell Ultra sets the bar for Llama 3.1 8B training performance

To ensure that MLPerf Training results represent modern AI use cases, the benchmark is regularly updated. This round, BERT-large was replaced by Llama 3.1 8B, which provides a substantial increase in capability and training complexity while remaining a simple, accessible LLM for a broader range of platforms.

The NVIDIA platform delivered the highest performance on the Llama 3.1 8B training benchmark, both in terms of performance at a given number of GPUs and performance at scale.

Llama 3.1 8B submissions also benefited from several full-stack optimizations. 

One was the use of NVFP4 training recipes, which enabled performance increases while maintaining accuracy, even with a much smaller model.

Next, with increased context lengths, attention becomes a critical component of end-to-end LLM pretraining performance. Previous NVIDIA LLM pretraining submissions used BF16 precision for the inputs of the batched-matrix-multiply (BMM) computations in the attention block. This round, NVIDIA submissions used FP8 precision for the attention BMM inputs on the Llama 3.1 8B pretraining benchmark, applied to both the forward and backward pass computations.

Our FP8 recipe achieved up to 1.3x higher performance in the attention kernel of MLPerf benchmarks compared to its BF16 counterpart, while still meeting the accuracy requirements of the benchmark.

The FP8 attention recipe used for the pretraining benchmarks this round uses per-tensor current scaling FP8 for the query (Q), key (K), and value (V) tensors, as well as for the gradient of the output (dO) used in backward propagation. FP8 attention resulted in a 5% end-to-end speedup on the Llama 3.1 8B model. The FP8 attention implementation, for both delayed scaling and current scaling recipes, is available in the NVIDIA cuDNN library, which is used in NVIDIA MLPerf submissions through the NVIDIA Transformer Engine library.
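To make "per-tensor current scaling" concrete, here is a simplified, self-contained sketch of the idea in PyTorch: the scale is computed from the live amax of each tensor rather than from a history of past amax values. The actual kernels live in cuDNN and are driven through Transformer Engine, as described above; the function name and shapes below are purely illustrative.

```python
import torch

E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3

def fp8_current_scale(t: torch.Tensor):
    """Per-tensor 'current' scaling: compute amax from the live tensor,
    scale it into the FP8 range, and return the quantized tensor together
    with the descale factor a consumer kernel would need."""
    amax = t.abs().max().clamp(min=1e-12)
    scale = E4M3_MAX / amax                       # scale into FP8 range
    t_fp8 = (t * scale).to(torch.float8_e4m3fn)   # quantize (PyTorch >= 2.1)
    return t_fp8, 1.0 / scale                     # descale for the consumer

# In the FP8 attention recipe, Q, K, V (forward) and dO (backward) are each
# quantized this way before the batched matrix multiplies.
q = torch.randn(1024, 8, 128)
q_fp8, q_descale = fp8_current_scale(q)
```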

Other software optimizations implemented for the pretraining models include the following, which focused on eliminating device-to-device memory copies and tensor concatenations:

  • Implementing a fused RoPE kernel in Transformer Engine that takes a combined Q/K/V input and outputs separate Q, K, and V tensors. This avoids splitting the Q, K, V tensors in the forward pass and concatenating the dQ, dK, dV tensors in the backward pass.
  • Avoiding conversion of the attention input to the BSHD layout by using the SBHD attention layout instead, as shown in the sketch after this list. This change was implemented in Megatron-LM. In this notation, B stands for batch size, S for sequence length, H for the number of attention heads, and D for head dimension, consistent with Transformer Engine notation.
  • Fusing the amax computation into the producer operations.
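The following minimal sketch illustrates why keeping the SBHD layout matters: converting to BSHD forces a transpose plus a contiguous copy of the whole activation tensor. The tensor sizes are illustrative, not taken from the benchmark configuration.

```python
import torch

S, B, H, D = 4096, 2, 32, 128  # sequence, batch, heads, head dim (illustrative sizes)

# Megatron-style activations naturally arrive as SBHD: [S, B, H, D].
q_sbhd = torch.randn(S, B, H, D, device="cuda" if torch.cuda.is_available() else "cpu")

# Converting to BSHD for the attention kernel forces a transpose + contiguous(),
# i.e. an extra device-to-device copy of the entire tensor:
q_bshd = q_sbhd.transpose(0, 1).contiguous()   # materializes a new [B, S, H, D] buffer

# Passing the SBHD tensor directly to an attention implementation that accepts
# this layout (Transformer Engine exposes a qkv_format option) skips that copy.
```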

Highest performance on the new FLUX.1 benchmark

Another benchmark update was the addition of the FLUX.1 image generation model, replacing Stable Diffusion v2. On this test, NVIDIA once again set the bar, delivering the fastest time to train at scale of 12.5 minutes using 1,152 Blackwell GPUs. NVIDIA was also the only platform to submit results on this benchmark, highlighting both the performance and versatility of the NVIDIA training stack.

Llama 2 70B LoRA software optimizations

This round, several fusion optimizations were implemented that significantly benefited the Llama 2 70B LoRA fine-tuning benchmark. The core idea is the use of LoRALinearLayer, which combines the LoRA adapters and the frozen GEMM within the same module. Building this abstraction enables the cast operations, scaling operations, and the addition to the frozen GEMM output to be fused.
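A minimal sketch of such a module is shown below. The class name comes from the post, but the implementation (rank, scaling, initialization) is illustrative; in the actual submission, the point of the abstraction is that the casts, scaling, and addition can be fused into the GEMM at the kernel level rather than executed as the separate PyTorch ops written here.

```python
import torch
import torch.nn as nn

class LoRALinearLayer(nn.Module):
    """Frozen base linear plus low-rank LoRA adapters in one module, so that
    casts, scaling, and the final addition can be handled (and fused) together."""
    def __init__(self, in_features, out_features, rank=16, alpha=32.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        self.lora_a = nn.Parameter(torch.randn(in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, out_features))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen GEMM and LoRA path computed side by side; keeping both in one
        # module is what allows a backend to fuse the cast/scale/add epilogue.
        frozen_out = self.base(x)
        lora_out = (x @ self.lora_a) @ self.lora_b
        return frozen_out + self.scaling * lora_out
```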

Key takeaways

NVIDIA is innovating on a one-year rhythm, with innovation across GPU, CPU, scale-up networking, scale-out networking, system architecture, and software, to drive up performance, drive down the cost of intelligence, and pave the way for new AI breakthroughs.

See more NVIDIA performance data on the Data Center Deep Learning Product Performance Hub and Performance Explorer pages.


