AI has entered an industrial phase.
What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of information.
To deliver these capabilities at scale, next-generation AI factories must process hundreds of thousands of input tokens to provide the long context required for agentic reasoning, complex workflows, and multimodal pipelines, while sustaining real-time inference under constraints on power, reliability, security, deployment velocity, and cost.
The NVIDIA Rubin platform was designed specifically for this new reality.
Extreme co-design is the foundation of the Rubin platform. GPUs, CPUs, networking, security, software, power delivery, and cooling are architected together as a single system rather than optimized in isolation. By doing so, the Rubin platform treats the data center, not a single GPU server, as the unit of compute. This approach establishes a new foundation for producing intelligence efficiently, securely, and predictably at scale. It ensures that performance and efficiency hold up in production deployments, not just in isolated component benchmarks.
This technical deep dive explains why AI factories demand a new architectural approach; how NVIDIA Vera Rubin NVL72 functions as a rack-scale architecture; and how the Rubin platform’s silicon, software, and systems translate into sustained performance and lower cost per token at scale.
The blog is organized as follows:
- Why AI factories need a new platform: The shift to reasoning-driven, always-on AI and the constraints that now define scale: power, reliability, security, and speed of deployment.
- Meet the NVIDIA Rubin platform: The rack-scale platform thesis and the core breakthroughs that enable sustained intelligence production.
- Six new chips, one AI supercomputer: The six-chip architecture and how GPUs, CPUs, networking, and infrastructure operate as one coherent system.
- From chips to systems: NVIDIA Vera Rubin superchip to DGX SuperPOD: How Rubin scales from superchips to racks to NVIDIA DGX SuperPOD-scale AI factory deployments.
- Software and developer experience: The software stack that makes rack-scale programmable, from NVIDIA CUDA and NVIDIA CUDA-X to training and inference frameworks.
- Operating at AI factory scale: The production foundations: operations, reliability, security, energy efficiency, and ecosystem readiness.
- Performance and efficiency at scale: How Rubin converts architecture into real gains at scale, including one-fourth as many GPUs to train, 10x higher inference throughput, and 10x lower cost per token.
- Why Rubin is the AI factory platform: How extreme co-design delivers predictable performance, economics, and scalability in real deployments.
1. Why AI factories need a new platform
AI factories differ fundamentally from traditional data centers. Rather than serving intermittent, human-driven requests, they function as always-on intelligence production systems, where efficiency in reasoning, context handling, and data movement, not just the peak compute of a server, determines performance.
Modern AI workloads increasingly depend on reasoning and agentic models that execute multi-step inference over extremely long contexts. These workloads concurrently stress every layer of the platform: delivered compute performance, GPU-to-GPU communication, interconnect latency, memory bandwidth and capacity, utilization efficiency, and power delivery. Even small inefficiencies, when multiplied across trillions of tokens, undermine cost, throughput, and competitiveness.
This dynamic is captured by three scaling laws driving AI progress:
- Pre-training scaling: where models learn their inherent knowledge
- Post-training scaling: where models learn to think through fine-tuning and reinforcement
- Test-time scaling: where models reason by generating more tokens during inference


As these scaling laws compound, infrastructure requirements intensify. NVIDIA Blackwell NVL72 was the first rack-scale architecture, freeing GPUs, CPUs, and interconnects from the confines of the traditional server boundary and elevating the rack to the primary unit of integration. This shift enabled major advances in scale-up bandwidth, efficiency, and deployability, and underpins many of today’s largest AI deployments.
As AI factories are pushed to deliver more intelligence, lower cost per token, and greater business impact, there is relentless demand to extend rack-scale performance while maintaining data-center-scale determinism within tightly constrained power and cooling limits.
2. Meet the NVIDIA Rubin platform
The NVIDIA Rubin platform was designed for the shift in how intelligence is produced at scale, applying extreme co-design across compute, networking, power delivery, cooling, and system architecture to enable sustained intelligence production at AI factory scale.
At the platform level, Rubin delivers five generational breakthroughs:


Together, these capabilities allow Rubin-based systems to behave as predictable, secure, continuously available units of intelligence production rather than collections of independent components.
The flagship of the Rubin platform is the Vera Rubin NVL72 rack-scale system, engineered so that the entire rack operates as a coherent machine within a larger AI factory. The NVL72 system is optimized not only for peak performance, but for sustained intelligence production: predictable latency, high utilization across heterogeneous execution phases, and efficient conversion of power into usable intelligence.


This rack-scale perspective sets the foundation for understanding how the Rubin platform’s chips have been architected to operate as one system.
3. Six new chips, one AI supercomputer
Extreme co-design is expressed most clearly at the chip level.
The Rubin platform is built from six new chips, each engineered for a specific role in the AI factory and designed from the outset to operate as part of a unified rack-scale system. Rather than treating compute, networking, and infrastructure as loosely coupled layers, Rubin integrates them directly into the architecture. This ensures that communication, coordination, security, and efficiency are first-class design considerations.


The six new chips are:
- NVIDIA Vera CPU: 88 NVIDIA custom-designed Olympus cores optimized for the next generation of AI factories with full Arm compatibility.
- NVIDIA Rubin GPU: High-performance AI compute with HBM4 and a new NVIDIA Transformer Engine.
- NVIDIA NVLink 6 switch: Sixth-generation scale-up fabric delivering 3.6 TB/s GPU-to-GPU bandwidth.
- NVIDIA ConnectX-9: High-throughput, low-latency networking interface on the endpoint for scale-out AI.
- NVIDIA BlueField-4 data processing unit (DPU): A dual-die package combining:
  - A 64-core NVIDIA Grace CPU for infrastructure offload and security.
  - An integrated NVIDIA ConnectX-9 high-speed networking chip for tightly coupled data movement.
- NVIDIA Spectrum-6 Ethernet switch: Scale-out connectivity using co-packaged optics for efficiency and reliability.
Together, these chips form a synchronized architecture in which GPUs execute transformer-era workloads, CPUs orchestrate data and control flow, scale-up and scale-out fabrics move tokens and state efficiently, and dedicated infrastructure processors operate and secure the AI factory itself.
In the sections that follow, we examine each of these building blocks in detail, starting with the Vera CPU, which orchestrates data movement, memory, and control flow to sustain GPU utilization at AI factory scale.
Vera CPU: Purpose-built for AI factories
As AI factories scale, GPU performance alone is no longer sufficient to sustain throughput. High utilization across thousands of GPUs depends on how efficiently data, memory, and control flow through the system. The Vera CPU is designed specifically for this role, acting as the high-bandwidth, low-latency data movement engine that keeps AI factories operating efficiently at scale.
Rather than functioning as a conventional general-purpose host, Vera is optimized for orchestration, data movement, and coherent memory access across the rack. Paired with Rubin GPUs as a host CPU, or deployed as a standalone platform for agentic processing, Vera enables higher sustained utilization by removing CPU-side bottlenecks that emerge in training and inference environments.


From NVIDIA Grace to Vera—scaling the CPU for AI factories
NVIDIA Grace established NVIDIA’s approach to high-bandwidth, energy-efficient CPU design. Vera extends that foundation with increased core density, significantly higher memory bandwidth, expanded coherency, and full confidential computing support, all tailored for AI factory workloads.
As shown in the table below, Vera delivers 2.4x higher memory bandwidth and 3x greater memory capacity to support data-intensive workloads, while doubling NVLink-C2C bandwidth to sustain coherent CPU–GPU operation at rack scale. Together, these advances elevate the CPU from a supporting role to a key enabler of next-generation GPU efficiency in AI factories.
| Feature | Grace CPU | Vera CPU |
| Cores | 72 Neoverse V2 cores | 88 NVIDIA Custom Olympus cores |
| Threads | 72 | 176 (spatial multithreading) |
| L2 cache per core | 1 MB | 2 MB |
| Unified L3 cache | 114 MB | 162 MB |
| Memory bandwidth | Up to 512 GB/s | Up to 1.2 TB/s |
| Memory capacity | Up to 480 GB LPDDR5X | Up to 1.5 TB LPDDR5X |
| SIMD | 4x 128b SVE2 | 6x 128b SVE2 FP8 |
| NVLink-C2C | 900 GB/s | 1.8 TB/s |
| PCIe/CXL | Gen5 | Gen6/CXL 3.1 |
| Confidential compute | NA | Supported |
NVIDIA Olympus core with spatial multithreading
At the heart of the Vera CPU are 88 NVIDIA custom Olympus cores, designed for high single-thread performance and energy efficiency with full Arm compatibility. The cores employ a wide, deep microarchitecture with improved branch prediction, prefetching, and load-store performance, optimized for control-heavy and data-movement-intensive workloads.
Vera introduces spatial multithreading (SMT), enabling two hardware threads per core while partitioning compute resources spatially rather than temporally. This approach increases throughput and virtual CPU density while maintaining predictable performance and strong isolation, a critical requirement for multi-tenant AI factories.
Scalable Coherency Fabric—deterministic data movement
The second-generation NVIDIA Scalable Coherency Fabric (SCF) connects all 88 Olympus cores to a unified cache and memory subsystem on a single monolithic compute die. SCF delivers consistent latency and sustains over 90% of peak memory bandwidth under load, eliminating bottlenecks between cores and memory controllers.
By providing deterministic, high-throughput data movement across the CPU, SCF ensures that orchestration and data-processing workloads scale linearly as core count increases. This is crucial for keeping GPUs fed with data and commands at AI factory scale.
Memory bandwidth and coherent execution
Vera pairs SCF with up to 1.5 TB of LPDDR5X memory delivering up to 1.2 TB/s of bandwidth at low power. Small Outline Compression Attached Memory Modules (SOCAMM) with LPDDR5X improve serviceability and fault isolation, supporting AI factory uptime requirements.
Second-generation NVLink-C2C provides 1.8 TB/s of coherent bandwidth between Vera CPUs and Rubin GPUs, enabling a unified address space across CPU and GPU memory. Applications can treat LPDDR5X and HBM4 as a single coherent pool, reducing data movement overhead and enabling techniques such as KV-cache offload and efficient multi-model execution.
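On coherent NVLink-C2C systems, much of this movement can happen through a shared address space; on conventional PCIe-attached systems the same pattern is expressed as explicit, overlapped host-device copies. The minimal PyTorch sketch below shows that conventional KV-cache offload pattern; the shapes and the layer-prefetch scheme are illustrative assumptions, not the platform's actual API.

```python
import torch

# Minimal sketch (assumed shapes): stage a KV-cache block in host memory and
# prefetch it back to the GPU on a side stream, overlapping with compute.
num_layers, kv_heads, head_dim, seq_len = 32, 8, 128, 8192

# Pinned host memory enables asynchronous, high-bandwidth host-to-device copies.
kv_cache_host = torch.empty(
    (num_layers, 2, seq_len, kv_heads, head_dim),
    dtype=torch.float16,
    pin_memory=True,
)

device = torch.device("cuda")
copy_stream = torch.cuda.Stream()

def prefetch_layer(layer_idx: int) -> torch.Tensor:
    """Copy one layer's KV block to the GPU on a dedicated copy stream."""
    with torch.cuda.stream(copy_stream):
        return kv_cache_host[layer_idx].to(device, non_blocking=True)

# Prefetch layer i+1 while attention for layer i runs on the default stream,
# then synchronize before the block is consumed.
block = prefetch_layer(0)
torch.cuda.current_stream().wait_stream(copy_stream)
```

The wider the CPU-GPU link, the deeper the context that can live off-GPU without this prefetch path becoming the bottleneck, which is the practical payoff of the coherent-bandwidth increase described above.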


Software compatibility and secure operation
Vera supports the Arm v9.2 architecture and integrates seamlessly with the Arm software ecosystem. Major Linux distributions, AI frameworks, and orchestration platforms run unmodified, allowing existing infrastructure software to scale onto Vera-based systems without disruption.
Confidential computing is supported natively, enabling secure execution across CPU–GPU boundaries and across multi-socket configurations while preserving performance.
The data engine for AI factories
Vera is a purpose-built CPU engineered to keep GPUs fully utilized by efficiently moving, processing, and coordinating data at AI factory scale. Rather than acting as a passive host, Vera functions as a data engine that accelerates control-heavy and communication-intensive paths, including data staging, scheduling, orchestration, and agentic workflows. It also delivers exceptional standalone performance for analytics, cloud, storage, and infrastructure services.
By combining Olympus CPU cores, second-generation SCF, high-bandwidth LPDDR5X memory, and coherent NVLink-C2C connectivity, Vera ensures Rubin GPUs remain productive across training, post-training, and inference workloads, even as execution shifts between compute-, memory-, and communication-dominated phases.
In the following section, we examine the Rubin GPU, the execution engine that transforms this coherent, rack-scale foundation into sustained training and inference performance.
Rubin GPU: Execution engine for transformer-era AI
With the Vera CPU providing the orchestration and data-movement foundation, the Rubin GPU serves as the execution engine that turns rack-scale capability into intelligence. It is designed for continuous training, post-training, and inference in always-on AI factories.
Modern AI workloads—including reasoning, mixture-of-experts (MoE), long-context inference, and reinforcement learning—are not limited by peak floating point operations (FLOPS) alone. They are constrained by whether execution efficiency can be sustained across compute, memory, and communication. The Rubin GPU is purpose-built for this reality, optimizing the full execution path that turns power, bandwidth, and memory into tokens at scale.
To sustain throughput under these conditions, the Rubin GPU advances its architecture across three tightly coupled dimensions: compute density, memory bandwidth, and rack-scale communication.


At the silicon level, Rubin builds on NVIDIA’s proven GPU foundation while scaling every critical subsystem for transformer-era workloads. The GPU integrates 224 Streaming Multiprocessors (SMs) equipped with sixth-generation Tensor Cores optimized for low-precision NVFP4 and FP8 execution. These Tensor Cores are tightly coupled with expanded Special Function Units (SFUs) and execution pipelines designed to accelerate attention, activation, and sparse compute paths common in modern AI models.
Building on NVIDIA Blackwell, Rubin extends NVIDIA’s extreme hardware–software co-design to deliver higher sustained throughput and lower cost per token across training, post-training, and inference workloads. Improved NVFP4 support increases arithmetic density and efficiency, allowing more useful computation per watt while maintaining model accuracy. By integrating low-precision execution deeply into both the architecture and software stack, Rubin translates advances in numerical formats directly into real-world gains in throughput, utilization, and AI factory economics.
Across the full device, Rubin delivers a step-function increase in sustained throughput across pre-training, post-training, and inference. By increasing scale-up bandwidth, improving collective efficiency, and sustaining higher utilization under communication-heavy execution, Rubin raises the effective performance ceiling for large-scale training while delivering significant gains in post-training and inference workflows.
Sustained compute and execution scaling
Rubin scales compute capability, Transformer Engine support, and execution balance together to avoid the utilization cliffs that limit real-world throughput.
The table below highlights how core compute characteristics have evolved since Blackwell.
| Feature | Blackwell | Rubin |
| Transistors (full chip) | 208B | 336B |
| Compute dies | 2 | 2 |
| NVFP4 inference (PFLOPS) | 10 | 50* |
| FP8 training (PFLOPS) | 5 | 17.5 |
| Softmax acceleration (SFU EX2 ops/clk/SM, FP32 / FP16) | 16 / 32 | 64 |
*Transformer Engine compute
Converging AI and scientific computing
The launch of the NVIDIA Rubin platform marks a new phase in scientific computing, where AI and simulation increasingly reinforce each other. In many supercomputing environments today, simulations are treated as endpoints—computationally intensive runs that produce a single result. Increasingly, high-fidelity simulations are also used as engines for dataset generation, producing training data for AI models that augment traditional solvers.
These AI models can act as intelligent preconditioners, accelerate convergence, or serve as fast surrogate models in iterative workflows. While AI surrogates can deliver dramatic speedups—sometimes with reduced precision—classical simulation remains essential for establishing ground truth and final validation. The result is a converging workload profile that demands strong performance across both AI and scientific computing.
The table below compares the FP32 and FP64 compute capability of the NVIDIA Hopper, Blackwell, and Rubin GPUs.
| Feature | Hopper GPU | Blackwell GPU | Rubin GPU |
| FP32 vector (TFLOPS) | 67 | 80 | 130 |
| FP32 matrix (TFLOPS) | 495 | 227* | 400* |
| FP64 vector (TFLOPS) | 34 | 40 | 33 |
| FP64 matrix (TFLOPS) | 67 | 150* | 200* |
*Peak performance using Tensor Core-based emulation algorithms
The matrix performance shown above is achieved through a combination of architectural enhancements and software techniques that deliver higher effective throughput relative to prior generations. This reflects NVIDIA’s continued focus on application-level performance rather than isolated peak metrics.
Across both AI and scientific computing, the NVIDIA extreme co-design philosophy prioritizes sustained performance on real workloads. Analysis of production simulation codes shows that the highest sustained FP64 performance often comes from matrix-multiply kernels. Hopper used dedicated hardware to accelerate these paths. With Blackwell and now Rubin, NVIDIA has evolved this strategy, achieving high FP64 matrix throughput through multiple passes over lower-precision execution units while preserving architectural flexibility for converged workloads.
At the same time, dedicated FP64 vector performance remains critical for scientific applications that are not dominated by matrix kernels. In these cases, performance is constrained by data movement through registers, caches, and high-bandwidth memory (HBM) rather than raw compute. A balanced GPU design therefore provisions sufficient FP64 resources to saturate available memory bandwidth, avoiding over-allocation of compute capability that cannot be effectively utilized.
With the Rubin platform, real application performance continues to improve with each generation. The figure below shows projected gains across representative high-performance computing (HPC) simulation codes, driven by architectural and system-level improvements rather than increases in raw FP64 vector throughput.


Transformer Engine
The third-generation NVIDIA Transformer Engine builds on prior innovations with new hardware-accelerated adaptive compression designed to boost NVFP4 performance while preserving accuracy. This capability enables up to 50 petaFLOPS of NVFP4 inference compute.
Fully compatible with Blackwell GPUs, the new Transformer Engine preserves the existing programming model, allowing previously optimized code to transition seamlessly to Rubin while automatically benefiting from higher arithmetic density and improved execution efficiency.
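Because the programming model carries over, the familiar Transformer Engine workflow is a reasonable way to picture how low-precision execution is expressed in code. The sketch below uses today's publicly documented FP8 path in transformer_engine.pytorch; treating NVFP4 as a drop-in follow-on is an assumption that depends on software enablement at platform availability.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Low-precision GEMMs via the Transformer Engine autocast context.
# DelayedScaling manages per-tensor scale factors from amax history.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the GEMM runs in FP8; outputs remain in higher precision
```

Code written against this interface would, per the compatibility statement above, continue to run on Rubin and pick up the architectural gains without source changes.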
Memory and decode efficiency
As context lengths grow and inference becomes increasingly interactive, achieved memory performance becomes a dominant factor in overall efficiency. The Rubin GPU incorporates a new generation of high-bandwidth memory, HBM4, which doubles interface width compared to HBM3e.
Through new memory controllers, deep co-engineering with the memory ecosystem, and tighter compute-memory integration, the Rubin GPU nearly triples memory bandwidth compared to Blackwell.
Key characteristics include:
- Up to 288 GB of HBM4 per GPU
- Aggregate bandwidth of up to 22 TB/s
- Improved decode and front-end efficiency to keep execution pipelines fed under load
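To see why both capacity and bandwidth matter at these context lengths, a back-of-envelope KV-cache estimate is useful. The model dimensions below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Rough KV-cache sizing for long-context inference (illustrative numbers only).
num_layers   = 61         # transformer layers
kv_heads     = 8          # grouped-query KV heads
head_dim     = 128
bytes_per_el = 1          # FP8 KV cache
context_len  = 1_000_000  # one million tokens

# Keys and values stored per token across all layers:
kv_bytes_per_token = 2 * num_layers * kv_heads * head_dim * bytes_per_el
total_gb = kv_bytes_per_token * context_len / 1e9
print(f"{kv_bytes_per_token / 1e3:.0f} KB/token -> {total_gb:.0f} GB per context")
# ~125 KB per token, so a single million-token context approaches 125 GB --
# a large share of one GPU's 288 GB of HBM4 before weights are even counted.
```

Every decoded token also has to re-read that cache, which is why the jump to up to 22 TB/s of bandwidth matters as much as the capacity itself.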


Together, these advances enable the Rubin GPU to sustain long-context inference, high-batch MoE execution, and interactive reasoning without sacrificing concurrency or utilization.
Scale-up interconnect—built for communication-dominated AI
The Rubin platform supports sixth-generation NVIDIA NVLink (NVLink 6) for GPU-to-GPU communication inside the system, NVIDIA NVLink-C2C (chip-to-chip) for coherent CPU-GPU connectivity with Vera CPUs, and PCIe Gen6 for host and device integration.
NVIDIA NVLink 6 delivers 3.6 TB/s of bidirectional GPU-to-GPU bandwidth per GPU, doubling scale-up bandwidth over the prior generation. Within an NVL72 system, this enables all-to-all communication across 72 GPUs with predictable latency, a critical requirement for MoE routing, collectives, and synchronization-heavy inference paths.
By eliminating scale-up bottlenecks, the Rubin GPU ensures that communication doesn’t cap utilization as model size, expert count, and reasoning depth increase.
The table below compares GPU interconnect bandwidth from Blackwell to Rubin.
| Interconnect | Blackwell | Rubin |
| NVLink (GPU-GPU) (GB/s, bidirectional) | 1,800 | 3,600 |
| NVLink-C2C (CPU-GPU) (GB/s, bidirectional) | 900 | 1,800 |
| PCIe interface (GB/s, bidirectional) | 256 (Gen 6) | 256 (Gen 6) |
Built for AI factory workloads
The NVIDIA Rubin GPU is optimized for the workloads that define modern AI factories, where performance is governed less by peak compute and more by sustained efficiency across compute, memory, and communication. These workloads include MoE models dominated by dynamic all-to-all communication, agentic pipelines that interleave reasoning with tool use, and long-running training and post-training workflows that must maintain high utilization over extended periods.
By combining adaptive execution with massive scale-up bandwidth, the Rubin platform keeps GPUs productive across all phases of execution, including compute-heavy kernels, memory-intensive attention, and communication-bound expert dispatch, rather than optimizing only for dense matrix math. This is not a point upgrade over prior generations. The Rubin platform rebalances GPU architecture for continuous operation at scale, working in concert with the Vera CPU, NVLink 6 scale-up, and platform software to efficiently convert power and silicon into usable intelligence across the rack.
In the following section, we examine NVLink 6 switching, the rack-scale fabric that enables 72 GPUs to operate as a single, tightly coupled system.
NVLink 6 Switch: The rack-scale scale-up fabric
At AI factory scale, communication is a primary determinant of performance. MoE routing, collective operations, synchronization-heavy training, and reasoning inference all rely on fast, predictable all-to-all data movement. When scale-up bandwidth falls short, GPUs sit idle and cost per token rises.
NVLink 6 is designed to eliminate this bottleneck. It’s the scale-up fabric of the Rubin platform, enabling 72 Rubin GPUs inside an NVL72 system to operate as a single, tightly coupled accelerator with uniform latency and sustained bandwidth under communication-dominated workloads.


Each Rubin GPU connects to NVLink 6 with 3.6 TB/s of bidirectional bandwidth, doubling per-GPU scale-up bandwidth over the prior generation. NVLink 6 switch trays form a single all-to-all topology across the rack, allowing any GPU to communicate with any other GPU with consistent latency and bandwidth.
This uniform topology removes hierarchical bottlenecks and hop-dependent behavior. From the software perspective, the rack behaves as one large accelerator, simplifying scaling for communication-heavy models.
All-to-all scaling for MoE and reasoning
Fast MoE training and inference use expert parallelism (EP), which relies on fine-grained, dynamic routing of tokens across experts that may reside on different GPUs. These patterns generate frequent, bursty communication that overwhelms hierarchical or partially connected fabrics.
NVLink 6 is deployed as a full all-to-all fabric across the NVL72 system. Expert routing, synchronization, and collectives scale efficiently across all 72 GPUs without saturating links or introducing unpredictable latency.
For MoE inference at scale, NVLink 6 delivers up to 2x higher throughput compared to the prior generation for all-to-all operations.
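The communication pattern behind these numbers is a token exchange among all expert-parallel ranks. A minimal PyTorch/NCCL sketch of that dispatch step is shown below; the token counts and hidden size are placeholders, and on an NVL72-class system the exchange stays on the NVLink scale-up fabric.

```python
import torch
import torch.distributed as dist

# Expert-parallel dispatch sketch: every rank exchanges a token buffer with
# every other rank in one all-to-all. Launch with torchrun, one process/GPU.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
world = dist.get_world_size()
torch.cuda.set_device(rank % torch.cuda.device_count())

tokens_per_peer, hidden = 512, 4096  # illustrative sizes
send = torch.randn(world * tokens_per_peer, hidden,
                   device="cuda", dtype=torch.bfloat16)
recv = torch.empty_like(send)

# Tokens routed to experts hosted on other GPUs are exchanged in a single
# collective; real MoE layers use uneven splits driven by the router.
dist.all_to_all_single(recv, send)
dist.barrier()
dist.destroy_process_group()
```

Because this traffic is bursty and data dependent, sustaining it depends on the uniform, fully connected topology described above rather than on peak link speed alone.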


In-network compute for collective operations
NVLink 6 integrates NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) in-network compute to accelerate collective operations directly inside the fabric. Portions of all-reduce, reduce-scatter, and all-gather execute within the switch, reducing redundant data movement and GPU synchronization overhead.
Each NVLink 6 switch tray delivers 14.4 TFLOPS of FP8 in-network compute, enabling collective-heavy phases to execute with lower latency and higher efficiency. By offloading collective reductions into the network, SHARP can reduce all-reduce communication traffic by up to 50% and improve tensor-parallel execution time by up to 20% in large-scale AI workloads.
This offload increases effective GPU utilization and improves scaling efficiency as cluster size grows. Results depend on model architecture, parallelism strategy, participant count, and NCCL configuration.
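From application code, in-network reductions are reached through NCCL rather than a separate API. The sketch below shows environment switches that exist in current NCCL releases for enabling NVLink SHARP (NVLS) and CollNet paths; whether they are required, or already on by default, on a given Rubin software stack is an assumption to verify against the NCCL documentation for your version.

```python
import os
import torch
import torch.distributed as dist

# Opt in to NCCL's in-switch reduction algorithms before initializing the
# process group (defaults vary by NCCL version and platform).
os.environ.setdefault("NCCL_NVLS_ENABLE", "1")     # NVLink SHARP, scale-up
os.environ.setdefault("NCCL_COLLNET_ENABLE", "1")  # CollNet/SHARP, scale-out

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# A 256 MB FP32 gradient bucket; part of the reduction can execute in-fabric.
grad_bucket = torch.randn(64 * 1024 * 1024, device="cuda")
dist.all_reduce(grad_bucket)
dist.destroy_process_group()
```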
Operability at AI factory scale
Scale-up networking must be operable, not just fast. The NVLink 6 switch tray incorporates new features for resiliency and maintenance, including hot-swappable trays, continued operation with partially populated racks, and dynamic traffic rerouting when a switch goes offline. It also supports in-service software updates and streams fine-grained link telemetry through the switch interfaces for real-time monitoring.
Together, software-defined routing, detailed telemetry, and serviceable switch trays enable traffic to be dynamically rerouted around faults or maintenance events without draining the rack or interrupting active workloads. These capabilities allow NVLink 6 to meet the zero-downtime expectations of production AI factories.
By doubling per-GPU bandwidth, enabling uniform all-to-all connectivity, and accelerating collectives directly inside the fabric, NVLink 6 allows communication-heavy workloads to scale predictably at rack scale.
In the following section, we turn to ConnectX-9, which provides the endpoint interface that extends this performance beyond the rack by connecting GPUs to the Spectrum-X Ethernet scale-out fabric.
ConnectX-9: Pushing the limits of AI scale-out bandwidth
ConnectX-9 serves as the intelligent endpoint of the Spectrum-X Ethernet fabric, delivering predictable scale-out performance while enforcing traffic isolation and secure operation as AI factories grow.


In the Vera Rubin NVL72 rack-scale architecture, each compute tray contains four ConnectX-9 SuperNIC boards, delivering 1.6 Tb/s of network bandwidth per Rubin GPU. This ensures GPUs can participate fully in expert dispatch, collective operations, and synchronization without becoming bottlenecked at the network edge.
Endpoint control for bursty AI traffic
AI workloads such as MoE inference and training generate highly correlated traffic patterns. Large numbers of GPUs often attempt to inject data into the network concurrently, creating transient congestion spikes that traditional NICs are not designed to manage.
ConnectX-9 addresses this challenge by enforcing programmable congestion control, traffic shaping, and packet scheduling directly at the endpoint. Working in concert with Spectrum-6 switches, ConnectX-9 prevents congestion from forming in the first place rather than reacting after queues build.
This coordinated endpoint-to-fabric behavior:
- Smooths traffic injection during all-to-all phases
- Reduces head-of-line blocking and victim flows
- Maintains high effective bandwidth under load
Performance isolation for multi-tenant AI factories
As AI factories consolidate workloads, isolation becomes as important as throughput. Bursty or misconfigured jobs must not degrade cluster-wide performance.
ConnectX-9 enforces fairness and isolation at the endpoint, ensuring that every job or tenant receives predictable network behavior regardless of the activity of others. This capability is critical for shared AI infrastructure, where inference, training, and post-training workloads often run concurrently on the same fabric.
By shifting enforcement to the endpoint, the platform avoids relying solely on switch-level mechanisms, improving scalability and reducing operational complexity.
Secure endpoints for AI infrastructure
ConnectX-9 also plays a central role in securing AI factory networking. Integrated cryptographic engines support high-throughput encryption for data in motion and data at rest, enabling secure operation without sacrificing performance.
Key security capabilities include:
- Data-in-transit encryption acceleration for IP Security (IPsec) and Platform Security Protocol (PSP) to secure GPU-to-GPU communications
- Data-at-rest encryption acceleration to secure storage platforms
- Secure boot, firmware authentication, and device attestation
These features allow AI factories to operate securely in shared, cloud, or regulated environments while maintaining near-native network performance.
From endpoint control to infrastructure offload
ConnectX-9 completes the Spectrum-X Ethernet scale-out architecture by controlling how traffic enters the fabric. By shaping, scheduling, isolating, and securing communication at the endpoint, it ensures that AI factory networks behave predictably under real workloads.
With fabric-level behavior defined by Spectrum-6 and endpoint behavior enforced by ConnectX-9, the remaining challenge is how to operate, secure, and manage this infrastructure at scale without consuming valuable CPU and GPU resources.
That responsibility shifts to BlueField-4 DPUs, which provide the software-defined infrastructure layer for operating the AI factory itself. In the following section, we examine how BlueField-4 powers networking, storage, security, and control services across the Rubin platform.
BlueField-4 DPU: Powering the operating system of the AI factory
As AI infrastructure grows to thousands of GPUs and petabytes of data, AI factories must be operated with the rigor, automation, and control of modern cloud infrastructure. The challenge extends beyond connecting GPUs to orchestrating highly distributed systems that can scale, secure, and operate AI workloads efficiently. Applying cloud-scale principles to AI infrastructure requires automation, elasticity, and end-to-end security to be foundational from the start.
Meeting these demands calls for a specialized processor dedicated to the infrastructure layer itself. NVIDIA BlueField-4 fulfills this role by handling control, security, data movement, and orchestration independently of AI computation. In effect, BlueField-4 is the processor powering the operating system of the AI factory, purpose-built to connect, secure, and manage the infrastructure that powers AI at scale.
Within the Rubin platform, BlueField-4 operates as a software-defined control plane for the AI factory, enforcing security, isolation, and operational determinism independently of host CPUs and GPUs. By offloading and accelerating infrastructure services onto a dedicated processing layer, BlueField-4 enables AI factories to scale while maintaining consistent performance, strong isolation, and efficient operations.


BlueField-4 integrates a 64-core Grace CPU and high-bandwidth LPDDR5X memory along with ConnectX-9 networking, delivering up to 800 Gb/s of ultra-low-latency Ethernet or InfiniBand connectivity while running infrastructure services directly on the DPU.
The table below highlights key advancements in BlueField-4 compared to BlueField-3 across bandwidth, compute, and memory. These improvements allow AI factories to scale pods and services without infrastructure becoming a limiting factor.
| Feature | BlueField-3 | BlueField-4 |
| Bandwidth | 400 Gb/s | 800 Gb/s |
| Compute | 16 Arm Cortex-A78 cores | 64 Arm Neoverse V2 cores (6x compute performance) |
| Memory bandwidth | 75 GB/s | 250 GB/s |
| Memory capacity | 32 GB | 128 GB |
| Cloud networking | 32K hosts | 128K hosts |
| Data-in-transit encryption | 400 Gb/s | 800 Gb/s |
| NVMe storage disaggregation | 10M IOPS at 4K | 20M IOPS at 4K |
This generational increase allows AI factories to scale pods, services, and tenants while also advancing infrastructure operations, efficiency, and cybersecurity.
Infrastructure acceleration at AI factory scale
In traditional systems, infrastructure services run on host CPUs, introducing variability, contention, and security risk as workloads scale. BlueField-4 eliminates this coupling by executing networking, storage, telemetry, and security services entirely off-host. This separation delivers:
- Deterministic infrastructure behavior independent of workload mix
- Higher GPU and CPU utilization for AI execution
- Improved fault isolation and operational resilience
NVIDIA DOCA provides a consistent software foundation across BlueField generations, enabling reuse of infrastructure services while allowing rapid innovation without disrupting application workloads.
Built for secure, multi-tenant operation
As AI factories increasingly adopt bare-metal and multi-tenant deployment models, maintaining strong infrastructure control and isolation becomes essential, particularly for environments processing proprietary data, regulated content, and high-value models.
As part of the Rubin platform, BlueField-4 introduces Advanced Secure Trusted Resource Architecture (ASTRA), a system-level trust architecture that establishes a trust domain within the compute tray. ASTRA provides AI infrastructure builders with a single, trusted control point to securely provision, isolate, and operate large-scale AI environments without compromising performance.
By isolating control, data, and management planes from tenant workloads, BlueField ASTRA enables secure bare-metal operation, strong multi-tenant isolation, and trusted infrastructure control that operates independently of host software.
NVIDIA Inference Context Memory Storage—AI-native storage infrastructure
As inference workloads evolve toward long-context, multi-turn, and multi-agent execution, the inference state increasingly outlives individual GPU execution windows. KV caches and reusable context must persist beyond GPU memory and be accessed efficiently across requests.
The Rubin platform introduces NVIDIA Inference Context Memory Storage, powered by BlueField-4. The inference context memory storage platform establishes an AI-native infrastructure tier that provides pod-level access to shared, latency-sensitive inference context, enabling efficient reuse of inference state for long-context and agentic workloads.
It provides the infrastructure for context memory by extending GPU memory capacity, enabling high-speed sharing across nodes, boosting tokens per second by up to 5x, and delivering up to 5x higher power efficiency compared with traditional storage.
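Conceptually, a context memory tier is a lookup from a shared prompt prefix to a persisted KV block that can be pulled back next to the GPU on a cache hit. The illustrative sketch below uses a plain in-process store as a stand-in for that tier; the class and its interface are hypothetical and are not the product's API.

```python
import hashlib
from typing import Optional
import torch

class ContextStore:
    """Toy stand-in for a pod-level inference context tier: persist a prefix's
    KV block off-GPU and restore it when the same prefix is seen again."""

    def __init__(self) -> None:
        self._blocks: dict[str, torch.Tensor] = {}

    @staticmethod
    def _key(prefix_tokens: list[int]) -> str:
        return hashlib.sha256(repr(prefix_tokens).encode()).hexdigest()

    def put(self, prefix_tokens: list[int], kv_block: torch.Tensor) -> None:
        # Here the block lands in host memory; a real tier would place it in a
        # shared, network-attached context store.
        self._blocks[self._key(prefix_tokens)] = kv_block.detach().to("cpu")

    def get(self, prefix_tokens: list[int]) -> Optional[torch.Tensor]:
        hit = self._blocks.get(self._key(prefix_tokens))
        return None if hit is None else hit.to("cuda", non_blocking=True)
```

The value of making this tier pod-level rather than per-GPU is that any replica serving the same agent or conversation can reuse the cached prefix instead of recomputing it.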
Operating the AI factory as a system
BlueField-4 establishes infrastructure as a first-class architectural layer of the AI factory. By operating the control, security, data movement, and orchestration planes on a dedicated processing layer, it enables AI factories to remain predictable, secure, and efficient at scale.
Within the Rubin platform, NVLink defines scale-up behavior, ConnectX-9 and Spectrum-X Ethernet switches govern scale-out and scale-across communication, and BlueField-4 operates the AI factory itself.
Spectrum-6 Ethernet switch: Scale-out and scale-across for AI factories
AI factories must also scale beyond a single Vera Rubin NVL72 system and often must scale across geographically dispersed data centers. Performance is then determined not only by bandwidth, but by how predictably the network behaves under synchronized, bursty AI traffic.
To support both scale-out and scale-across AI factory deployments, the Rubin platform introduces NVIDIA Spectrum-X Ethernet Photonics, a new generation of Spectrum-X Ethernet switching based on co-packaged optics, advancing NVIDIA’s purpose-built Ethernet fabric for accelerated computing.


Spectrum-6 is engineered specifically for AI workloads, where traffic is highly synchronized, bursty, and asymmetric. Spectrum-6 doubles per-switch-chip bandwidth to 102.4 Tb/s using 200G PAM4 SerDes, enabling dense, high-port-count fabrics optimized for AI traffic patterns.
High effective bandwidth, fine-grained telemetry, and hardware-assisted performance isolation enable deterministic behavior in large, multi-tenant AI fabrics, while remaining fully standards-based and interoperable with open networking software.
Spectrum-X Ethernet fabric
Unlike off-the-shelf Ethernet, Spectrum-X Ethernet delivers predictable, low-latency, high-bandwidth connectivity at scale through advanced congestion control, adaptive routing, and lossless Ethernet behavior. These capabilities minimize jitter, tail latency, and packet loss under sustained AI load.
Anchored on Spectrum-6, Spectrum-X Ethernet was co-designed with the Rubin platform to ensure that routing behavior, congestion control, and telemetry reflect real AI communication patterns rather than traditional enterprise networking assumptions. This alignment allows scale-out performance to track application behavior, not theoretical peak throughput.
Spectrum-X Ethernet also incorporates Spectrum-XGS Ethernet scale-across technology, which adds distance-aware congestion control for large, geographically distributed AI deployments. End-to-end telemetry and deterministic routing enable efficient load balancing across sites, keeping multi-site AI factories operating at high utilization.
Spectrum-X Ethernet Photonics: Redefining network efficiency at AI scale
Spectrum-X Ethernet Photonics fundamentally improves network efficiency by eliminating pluggable transceivers and DSP retimers. Integrated silicon photonics combined with external laser arrays reduce component count and failure points compared with network fabrics based on traditional pluggable transceivers. Spectrum-X Ethernet Photonics delivers:
- ~5x higher network power efficiency
- Lower end-to-end latency
- Dramatically improved signal integrity
By reducing optical loss from ~22 dB to ~4 dB, Spectrum-X Ethernet achieves up to 64x higher signal integrity. It enables higher uptime, simplified serviceability with high-density MMC-12 cabling, and lower total cost of ownership for large training and inference clusters.
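As a quick consistency check (my arithmetic, not a figure from the post), the ~64x signal-integrity number lines up with the stated loss reduction: going from ~22 dB to ~4 dB of optical loss is an 18 dB improvement, which is roughly a 63x linear-power ratio.

```python
# dB-to-linear conversion for the stated optical-loss improvement.
improvement_db = 22 - 4            # ~18 dB less loss
linear_ratio = 10 ** (improvement_db / 10)
print(f"{linear_ratio:.1f}x")      # ~63.1x, consistent with the ~64x claim
```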


Built for real AI traffic patterns
Modern MoE training and inference introduce a variable all-to-all communication phase driven by stochastic expert token dispatch. These workloads generate highly bursty traffic that can overwhelm traditional Ethernet fabrics, resulting in packet loss, congestion collapse, and degraded job completion times.
Spectrum-X Ethernet addresses this at the fabric level through coordinated congestion control and adaptive routing across switches and endpoints. The result is significantly faster job completion for expert dispatch and collective operations under real AI load.


Advancing the fabric without re-architecting the network
Spectrum-X Ethernet evolves generation-over-generation through end-to-end co-design across switch silicon, optics, SuperNICs, and system software. This delivers coordinated gains in bandwidth, signaling, and scalability without requiring a fundamental fabric redesign, allowing customers to scale AI clusters predictably as performance requirements grow.
| Feature | Blackwell (switch) | Blackwell (SuperNIC) | Rubin (switch) | Rubin (SuperNIC) |
| Key component | Spectrum-X SN5000 series | ConnectX-8 SuperNIC | Spectrum-X SN6000 series | ConnectX-9 SuperNIC |
| Chip | Spectrum-4 | ConnectX-8 | Spectrum-6 | ConnectX-9 |
| Maximum bandwidth | 51.2 Tb/s per switch chip (64 x 800 Gb/s) | 800 Gb/s (2 x 400G) per GPU | 102.4 Tb/s per switch chip (128 x 800 Gb/s) | 1,600 Gb/s (2 x 800 Gb/s) per GPU |
| SerDes | 100G PAM4 | 100/200G PAM4 | 200G PAM4 | 200G PAM4 |
| Protocol | Ethernet | Ethernet, InfiniBand | Ethernet | Ethernet, InfiniBand |
| Connectivity | OSFP | OSFP, QSFP112 | OSFP | OSFP, QSFP112 |
4. From chips to systems: NVIDIA Vera Rubin superchip to DGX SuperPOD
AI factory performance is not determined by individual chips in isolation, but by how those chips are composed into systems that can be deployed, operated, and scaled reliably. The Rubin platform is designed with this progression in mind, moving deliberately from silicon-level innovation to rack-scale systems and finally to full AI factory deployments.
This section traces that progression, starting with the Vera Rubin superchip as the foundational compute building block, then scaling through the NVL72 rack architecture and its integrated networking fabrics, and culminating in the NVIDIA DGX SuperPOD as the deployment-scale unit of an AI factory. At each step, the goal is the same: preserve the efficiency, coherence, and utilization gains achieved at the chip level as the system scales outward.
NVIDIA Vera Rubin superchip
At the heart of the Rubin platform is the NVIDIA Vera Rubin superchip, the foundational compute building block that tightly integrates AI execution with high-bandwidth data movement and orchestration. Each superchip combines two Rubin GPUs with one Vera CPU through the memory-coherent NVLink-C2C interconnect, collapsing traditional CPU-GPU boundaries into a unified, rack-scale execution domain.
This approach is not new for NVIDIA. Starting with NVIDIA Grace Hopper and continuing through subsequent generations, close CPU-GPU integration has been a core design principle, co-optimizing compute, memory, and interconnect to sustain utilization under real training and inference workloads.
In the Vera Rubin superchip, the CPU functions as a data engine tightly coupled to GPU execution. This coupling enables low-latency coordination, shared memory access, and efficient orchestration across training, post-training, and inference workloads. Rather than acting as an external host, the Vera CPU participates directly in execution, handling data movement, scheduling, synchronization, and execution flow without introducing bottlenecks.
By integrating GPU compute with a high-bandwidth CPU data engine on a single host processing motherboard, the superchip improves data locality, reduces software overhead, and sustains higher utilization across heterogeneous execution phases. It serves as the architectural bridge between chip-level innovation and rack-scale intelligence.


Vera Rubin NVL72 compute tray
The compute tray translates the Vera Rubin superchip into a deployable, serviceable unit designed for AI factory scale. Each tray integrates two superchips, power delivery, cooling, networking, and management into a modular, cable-free assembly optimized for density, reliability, and ease of operation.
A redesigned internal liquid manifold and universal quick-disconnects support significantly higher flow rates than prior generations, enabling stable performance under sustained, high-power workloads. The modular compute tray uses independent front and rear bays to streamline assembly and repair. Although the compute tray must be taken offline during maintenance, the modular cable-free design reduces service time by up to 18x.


ConnectX-9 SuperNICs provide high-bandwidth scale-out connectivity (1.6 Tb/s per GPU), while BlueField-4 DPUs offload networking, storage, and security services, allowing CPUs and GPUs to remain focused on AI execution.


Vera Rubin NVL72 NVLink switch tray
To transform multiple compute trays into a single coherent system, Vera Rubin introduces the NVLink 6 switch tray.
Each switch tray contains four NVLink 6 switch chips, doubling per-GPU scale-up bandwidth as well as the in-network compute for accelerating collective operations directly inside the fabric. This is critical for MoE routing, synchronization-heavy inference, and communication-intensive training phases where scale-up efficiency directly determines cost and latency.
By integrating scale-up networking as a first-class rack component, the NVLink switch tray ensures that performance scales predictably as models, batch sizes, and reasoning depth continue to increase.


Spectrum-X Ethernet switching for scale-out AI factories
NVLink 6 allows 72 GPUs to behave as one coherent accelerator inside the rack. Spectrum-X Ethernet extends that capability beyond the rack, enabling predictable, high-throughput scale-out connectivity across rows and data centers, without the variability that traditional Ethernet often introduces under synchronized AI traffic.
AI factory communication patterns are fundamentally different from enterprise workloads. MoE dispatch, collective operations, and synchronization-heavy phases generate bursty, asymmetric, and highly correlated flows that may amplify congestion, tail latency, and performance jitter at scale. Spectrum-X Ethernet is engineered specifically for these patterns through coordinated congestion control, adaptive routing, and end-to-end telemetry that keep effective bandwidth high and performance repeatable under load.
Within the Vera Rubin NVL72 platform, Spectrum-X is realized through the combination of Spectrum-6 switches and ConnectX-9 SuperNIC endpoints in the compute nodes. Together, they form a tightly co-designed scale-out system where the fabric and endpoints cooperate to shape traffic, isolate workloads, and prevent hotspots, enabling high utilization in multi-job, multi-tenant AI factories.


NVIDIA DGX SuperPOD: the AI factory deployment unit
DGX SuperPOD represents the blueprint for deployment-scale realization of the Rubin platform. Built with eight DGX Vera Rubin NVL72 systems, it defines the minimum unit at which AI factory economics, reliability, and performance converge in production environments.
Unlike traditional clusters assembled from discrete components, DGX SuperPOD is designed as a complete system. Every layer, from silicon and interconnects to orchestration and operations, is co-designed and validated to deliver sustained utilization, predictable latency, and efficient conversion of power into tokens at scale.
Within each NVIDIA DGX Vera Rubin NVL72 system, 72 Rubin GPUs operate as a single coherent accelerator through NVLink 6. Spectrum-X Ethernet extends the platform beyond the rack with deterministic, high-throughput scale-out connectivity, allowing multiple DGX Vera Rubin NVL72 systems to be composed into a DGX SuperPOD. Integrated with NVIDIA Mission Control software and certified storage, these elements create a validated, production-ready AI factory building block, capable of scaling to tens of thousands of GPUs.
This design enables DGX SuperPOD to deliver true AI factory capabilities: continuous operation, high-uptime serviceability, and consistent performance across training, post-training, and real-time inference workloads.


5. Software and developer experience
Vera Rubin has also been designed to accelerate innovation without forcing developers to re-architect their software. At its foundation, the platform maintains full CUDA backward compatibility across hardware generations, ensuring existing models, frameworks, and workflows run seamlessly while automatically benefiting from generational improvements in compute, memory, and interconnect.
CUDA-X libraries—the performance foundation
The CUDA platform encompasses a programming model, core libraries, and communication stacks that accelerate applications and expose the full distributed capabilities of the rack-scale system. Developers can program Rubin GPUs as individual devices or as part of a single 72-GPU NVLink domain using the NVIDIA Collective Communications Library (NCCL), NVIDIA Inference Transfer Library (NIXL), and NVLink-aware collectives. This design enables models to scale across the rack without custom partitioning, topology-aware workarounds, or manual orchestration.
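In practice, most developers reach that 72-GPU NVLink domain through NCCL-backed collectives in their framework of choice. The PyTorch sketch below builds one process group per rack so collectives stay on the scale-up fabric; the RACK_SIZE constant and the rank-to-rack mapping are assumptions for illustration, not a prescribed topology API.

```python
import torch
import torch.distributed as dist

RACK_SIZE = 72  # assumed NVLink scale-up domain size

# Launch with torchrun, one process per GPU, across one or more racks.
dist.init_process_group(backend="nccl")
rank, world = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank % torch.cuda.device_count())

# One process group per rack; new_group must be called by every rank.
rack_groups = [
    dist.new_group(list(range(start, min(start + RACK_SIZE, world))))
    for start in range(0, world, RACK_SIZE)
]
my_rack = rack_groups[rank // RACK_SIZE]

# A collective scoped to the rack group runs over the scale-up fabric.
shard = torch.randn(1024, device="cuda")
gathered = [torch.empty_like(shard) for _ in range(dist.get_world_size(my_rack))]
dist.all_gather(gathered, shard, group=my_rack)
dist.destroy_process_group()
```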


At the kernel and library layer, NVIDIA provides highly optimized building blocks for the most demanding AI workloads. Libraries such as NVIDIA cuDNN, NVIDIA CUTLASS, FlashInfer, and a new Transformer Engine deliver peak efficiency for attention, activation, and narrow-precision execution. These components are tightly coupled with Rubin’s Tensor Cores, HBM4 memory subsystem, and NVLink 6 interconnect, enabling sustained performance across dense, sparse, and communication-heavy workloads.
Together, these libraries allow developers to focus on model behavior rather than hardware-specific tuning, while still extracting maximum performance from the underlying platform.
Large-scale training—from research to production with NVIDIA NeMo
Higher-level frameworks build directly on the Rubin platform to maximize developer productivity and scalability. PyTorch and JAX ship with native NVIDIA acceleration, enabling training, post-training, and inference workflows to scale across racks with minimal code changes.
At the core of NVIDIA’s training and customization stack is the NVIDIA NeMo Framework, which provides an end-to-end workflow for building, adapting, aligning, and deploying large models at AI factory scale. NeMo unifies data curation, large-scale distributed training, alignment, and parameter-efficient customization into a single, production-oriented framework. Through NVIDIA NeMo Run, developers can configure, launch, and manage experiments consistently across local environments, SLURM clusters, and Kubernetes-based AI factories.


For extreme-scale training, NeMo integrates tightly with NVIDIA Megatron Core, which serves as the underlying distributed training engine. Megatron Core provides advanced parallelism strategies, optimized data loaders, and support for modern model architectures including dense LLMs, MoE, state-space models, and multimodal networks. This integration allows NeMo to scale training across thousands of GPUs while abstracting the complexity of parallelism and communication from the user.
NeMo also supports advanced post-training workflows, including reinforcement learning and alignment techniques such as reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), proximal policy optimization (PPO), and supervised fine-tuning. These capabilities enable developers to move seamlessly from pre-training to alignment and customization within a single framework—without re-architecting pipelines.
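To make one of these techniques concrete, the sketch below shows the standard DPO objective as published in the literature, not NeMo's internal implementation. Inputs are summed token log-probabilities of each response under the policy being trained and under a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct preference optimization: push the policy to prefer the chosen
    response over the rejected one, relative to the reference model."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```

Post-training loops like this are communication-light per step but run for many iterations, which is one reason sustained utilization matters as much here as in pre-training.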
To link ecosystem workflows, NVIDIA NeMo Megatron Bridge enables bidirectional checkpoint conversion and verification between Hugging Face and Megatron formats. This tool allows models to move reliably between community tooling, NeMo-based training, reinforcement learning, and optimized inference deployments, while preserving correctness and reproducibility.
Inference frameworks and optimization—serving real-time intelligence
The Rubin platform has been architected to deliver significant gains for modern inference workloads, which are increasingly defined by low latency, high concurrency, and communication-heavy execution. The platform integrates with widely used open source and NVIDIA inference frameworks—including SGLang, NVIDIA TensorRT-LLM, vLLM, and NVIDIA Dynamo—to enable efficient execution of long-context, MoE, and agentic workloads as software support arrives with platform availability.
The NVIDIA Model Optimizer extends inference performance through quantization, pruning, distillation, and speculative decoding, translating architectural advances directly into lower latency and lower cost per token. At the serving layer, NVLink-enabled communication, disaggregated inference, LLM-aware routing, KV-cache offloading to storage, and Kubernetes autoscaling are exposed through Dynamo, enabling scalable serving of communication-intensive workloads such as MoE inference and multi-agent pipelines.
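For a sense of the developer-facing surface, the sketch below uses vLLM's offline generation API, one of the frameworks named above. The model identifier and tensor-parallel degree are placeholders to adapt to your checkpoint and GPU topology; serving through Dynamo or a similar layer sits on top of engines like this rather than replacing them.

```python
from vllm import LLM, SamplingParams

# Offline batch generation with a tensor-parallel engine (placeholder model).
llm = LLM(model="org/your-moe-model", tensor_parallel_size=8)

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=512)
prompts = ["Summarize the design goals of a rack-scale AI factory."]

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```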


A developer-ready programmable rack-scale platform
NVIDIA’s architecture is designed from the ground up to maximize platform software performance and developer usability at rack scale. By integrating platform software and developer experience directly into the architecture, the Rubin platform is not only powerful, but practical to deploy and program. Developers can focus on models, agents, and services rather than infrastructure complexity, while operators retain control over performance, reliability, and efficiency at AI factory scale.
6. Operating at AI factory scale
Operating an AI factory at scale requires more than raw performance. It demands systems that can run continuously, securely, efficiently, and predictably in real-world data center environments. The Rubin platform is engineered not only to deliver breakthrough compute capability, but to sustain it over time through intelligent reliability, full-stack security, energy-aware design, and a mature rack ecosystem. Together, these capabilities ensure that AI factories built on the Rubin platform can scale rapidly, operate with minimal disruption, and convert power, infrastructure, and silicon into usable intelligence at industrial scale.
Deployment and operations
NVIDIA Mission Control accelerates every aspect of AI factory operations, from configuring Vera Rubin NVL72 deployments to integrating with facilities to managing clusters and workloads. Enabled by intelligent, integrated software, enterprises gain improved control over cooling and power events and redefine infrastructure resiliency. Mission Control enables faster response with rapid leak detection, unlocks access to NVIDIA’s latest efficiency innovations, and maximizes AI factory productivity with autonomous recovery.


Mission Control offers a validated implementation for enterprises to simplify and scale how AI factories are deployed and operated throughout the entire cluster lifecycle:
- Seamless workload orchestration: Empower model builders with simplified workload management through NVIDIA Run:ai functionality.
- Power optimizations: Balance power requirements and tune GPU performance for various workload types with developer-selectable controls.
- Autonomous recovery engine: Identify, isolate, and recover from problems without manual intervention for maximum productivity and infrastructure resiliency.
- Customizable dashboards: Track key performance indicators with access to critical telemetry data about your cluster and easy-to-configure dashboards.
- Continuous health checks: Validate hardware and cluster performance throughout the life cycle of your infrastructure.
Enterprise software and lifecycle support
NVIDIA AI Enterprise provides the enterprise-grade software foundation required to operate AI factories at scale. It delivers a validated, supported software stack that spans application development libraries, frameworks, and microservices, as well as infrastructure software for GPU management. It enables predictable performance, security, and stability for production AI deployments.


For agentic AI development, NVIDIA AI Enterprise includes NVIDIA NIM, NeMo, and other containerized libraries and microservices that enable optimized inference, model training, and customization through standardized APIs. With support for NVIDIA, partner, and community AI models, NIM microservices enable enterprises to deploy agentic AI capabilities faster.
Moreover, application development SDKs, frameworks, and libraries translate the Rubin platform’s architectural capabilities into performance improvements. CUDA, Transformer Engine, cuDNN, and related libraries are validated as an accelerated stack, ensuring that hardware advances are mechanically realized by higher-level frameworks and services.
For infrastructure management, NVIDIA AI Enterprise integrates with Kubernetes through purpose-built operators and validated GPU, networking, and virtualization drivers. These components enable secure multi-tenant operation, workload orchestration, and cluster-wide observability, and allow operators to maximize utilization while maintaining reliability and compliance.
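As a concrete illustration of that Kubernetes integration, the sketch below uses the standard Kubernetes Python client to request a GPU through the `nvidia.com/gpu` extended resource exposed by the NVIDIA GPU Operator’s device plugin. The namespace, pod name, and container image tag are placeholders chosen for the example, not values prescribed by NVIDIA AI Enterprise.

```python
# Minimal sketch: scheduling a GPU workload on a cluster managed by the
# NVIDIA GPU Operator. Namespace, names, and image tag are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inference-worker", namespace="ai-factory"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="server",
                image="nvcr.io/nvidia/tritonserver:latest",  # placeholder tag
                # The device plugin advertises GPUs as the extended resource
                # nvidia.com/gpu; requesting it pins the container to a GPU
                # and lets operators enforce per-tenant quotas.
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="ai-factory", body=pod)
```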
Delivered with long-term support, regular security updates, and compatibility validation across hardware generations, NVIDIA AI Enterprise serves as the software backbone of NVIDIA AI factories. It transforms rack-scale systems into a programmable, secure, and operable production platform across data center, cloud, and edge environments.
NVIDIA AI Enterprise is supported by a broad ecosystem of partners, including solution integrators, data and enterprise platforms, hybrid and multi-cloud providers, and AIOps solutions. It integrates seamlessly with existing enterprise software stacks to enable production-grade AI and accelerate time to market.
Reliability, availability, and serviceability
AI factories are no longer batch systems that can afford maintenance windows. They are always-on environments running continuous training, real-time inference, retrieval, and analytics. Vera Rubin NVL72 is engineered for this reality, introducing a rack-scale RAS architecture designed to maximize uptime, improve goodput (the amount of useful AI work actually completed over time), and ensure predictable completion of long-running AI workloads.
In this context, goodput reflects how effectively the system converts powered-on time into finished training steps, completed inference requests, and delivered tokens, without losses from job restarts, checkpoint rollbacks, stragglers, or performance degradation caused by component faults. Even brief interruptions or localized failures can materially reduce goodput when workloads span thousands of GPUs and run for days or even weeks.
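To make the goodput definition concrete, here is a minimal sketch of the arithmetic, assuming you already have per-job accounting of lost GPU-hours; the field names are illustrative and are not part of any NVIDIA API.

```python
# Minimal sketch: computing goodput from hypothetical job accounting.
# Goodput here is the fraction of powered-on GPU-hours that produced
# useful, non-discarded work.

def goodput(powered_on_gpu_hours: float,
            restart_gpu_hours: float,
            rollback_gpu_hours: float,
            straggler_gpu_hours: float) -> float:
    """Useful GPU-hours divided by total powered-on GPU-hours."""
    wasted = restart_gpu_hours + rollback_gpu_hours + straggler_gpu_hours
    return (powered_on_gpu_hours - wasted) / powered_on_gpu_hours

# Example: a 7-day job on 4,608 GPUs that loses 6 hours of cluster time
# to a restart plus a rollback to the last checkpoint.
total = 4608 * 24 * 7          # powered-on GPU-hours
lost = 4608 * 6                # GPU-hours lost to the interruption
print(f"goodput = {goodput(total, lost, 0.0, 0.0):.3f}")  # ~0.964
```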
Resiliency in the Rubin platform is designed end to end, spanning silicon, interconnect, and physical system architecture. The result is a unified, intelligent approach to reliability that allows the system to isolate faults, reroute traffic, and continue executing workloads without interruption, enabling zero planned downtime at rack scale while preserving sustained throughput and predictable job completion.
Rack-scale resiliency: Designed from the ground up
Vera Rubin NVL72 is built on a third-generation NVIDIA MGX rack design that treats reliability and serviceability as first-order architectural requirements. Compute trays, NVLink switch trays, and power and cooling infrastructure are modular, hot-swappable, and designed for in-field replacement without draining racks or interrupting active workloads.
As shown in the figure below, a cable-free, hose-free, fanless compute tray architecture eliminates many manual PCIe, networking, and management connections within the tray, removing common assembly and repair friction seen in prior cabled tray designs. This mechanical simplification enables up to 18x faster assembly compared with previous-generation tray architectures and significantly reduces service time during in-field maintenance, lowering deployment time and ongoing operational overhead.
A mature ecosystem of more than 80 MGX partners ensures global manufacturability, service readiness, and scalable deployment, allowing AI factories to ramp quickly while maintaining consistent reliability at scale.


Intelligent resiliency across the interconnect
At the system level, NVIDIA NVLink Intelligent Resiliency enables racks to remain fully operational during maintenance, partial population, or component replacement. Using software-defined routing and intelligent failover, traffic is dynamically rerouted around faults without disrupting active training or inference jobs.
This capability is critical as AI factories scale to thousands of GPUs. Rather than treating interruptions as stop-the-world events, the system adapts in real time, maintaining high utilization and predictable performance even as components are serviced or replaced, which improves goodput.
Silicon-level health monitoring with zero downtime
At the heart of this architecture is the Rubin GPU’s second-generation Reliability, Availability, and Serviceability (RAS) engine, which delivers continuous, in-system health monitoring without taking GPUs offline. Health checks are performed during idle execution windows, enabling full diagnostics with no impact on running workloads.
The RAS engine supports in-field SRAM repair and zero-downtime self-testing during execution, increasing effective mean time between failures and improving overall system yield. This capability is especially important for long-running training jobs and persistent inference services, where unplanned interruptions can be costly or unacceptable.
Vera CPUs complement GPU-level resiliency with in-system CPU core validation, reduced diagnostic times, and SOCAMM LPDDR5X memory designed for improved serviceability and fault isolation.
Predictive operations at AI factory scale
These hardware capabilities are paired with NVIDIA AI-powered predictive management, which analyzes thousands of hardware and software telemetry signals across the rack. Potential issues are identified early, localized precisely, and addressed proactively. Operators can rebalance workloads, adjust checkpoint strategies, activate standby capacity, or schedule maintenance without impacting service-level objectives.
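The production predictive models correlate thousands of signals, but a simple trailing-window outlier check conveys the basic idea. The sketch below is illustrative only; it assumes a single telemetry series, such as memory temperature or correctable-error counts, and has nothing to do with the actual predictive-management implementation.

```python
# Illustrative only: a rolling-threshold check on one GPU telemetry series,
# standing in for the far richer models used by predictive management.
import random
from statistics import mean, stdev

def flag_anomalies(samples: list[float], window: int = 60, z: float = 4.0) -> list[int]:
    """Return indices whose value deviates more than z sigmas from the trailing window."""
    flagged = []
    for i in range(window, len(samples)):
        hist = samples[i - window:i]
        mu, sigma = mean(hist), stdev(hist)
        if sigma > 0 and abs(samples[i] - mu) > z * sigma:
            flagged.append(i)
    return flagged

random.seed(0)
series = [70 + random.gauss(0, 0.5) for _ in range(120)]
series[100] += 12.0            # injected fault signature
print(flag_anomalies(series))  # expect the injected point (index 100) to be flagged
```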
Together, these capabilities transform RAS from a reactive process into an intelligent, predictive system that minimizes downtime, reduces operational complexity, and ensures AI workloads complete on schedule.
With Vera Rubin NVL72, reliability is no longer a limiting factor for scale. From silicon to system, the platform is engineered to keep AI factories running continuously, efficiently, and predictably at unprecedented scale.
Full-stack confidential computing
As AI factories move into production, security requirements expand from protecting individual devices to protecting entire systems operating continuously at scale. Modern AI workloads routinely process proprietary training data, regulated content, and high-value models, often in shared or cloud environments where infrastructure cannot be implicitly trusted. Meeting these requirements demands security that spans silicon, interconnect, and system software, without introducing performance penalties or operational friction.
Vera Rubin NVL72 was designed with full-stack confidential computing as a foundational capability, extending trust from individual components to the entire rack.
Third-generation confidential computing: rack-level security
As shown in the figure below, Vera Rubin NVL72 extends confidential computing beyond individual devices to create a unified, rack-scale trusted execution environment spanning CPUs, GPUs, and interconnects. This design enables sensitive AI workloads to run securely at scale with near-native performance, even in shared or cloud environments.


AI factories increasingly process proprietary data, regulated content, and mission-critical models that cannot be exposed, even to the infrastructure they run on. Vera Rubin NVL72 addresses this requirement by delivering end-to-end encryption across CPU-to-GPU, GPU-to-GPU, and device I/O paths, allowing enterprises to deploy secure training, inference, retrieval, and analytics pipelines without sacrificing throughput or latency.
From device-level security to rack-scale trust
NVIDIA has advanced GPU security over multiple generations. Hopper introduced high-performance confidential computing for GPUs. Blackwell expanded these capabilities, eliminating the traditional tradeoff between security and performance. Vera Rubin NVL72 completes this progression by unifying CPU and GPU security into a single coherent trust domain across the entire rack.
This rack-level approach ensures that proprietary models, training data, embeddings, and inference prompts remain protected not only from other tenants, but also from the underlying cloud provider infrastructure itself.
Cryptographic attestation and verifiable compliance
Vera Rubin NVL72 integrates with NVIDIA remote attestation services (NRAS) to provide cryptographic proof of system integrity. Organizations can verify that CPUs, GPUs, NICs, firmware, drivers, and the running workload match known-good reference measurements supplied by NVIDIA, achieving a zero-trust architecture at rack scale.
The platform supports both on-demand attestation through NVIDIA Attestation Cloud services and deployment models that require cached results or fully air-gapped operation. This flexibility allows enterprises to meet stringent regulatory, compliance, and data-sovereignty requirements while maintaining operational efficiency.
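The admission logic that attestation enables can be sketched in a few lines: compare the measurements reported in verified attestation claims against known-good reference values before a confidential workload is scheduled. The dictionary layout and digests below are invented for illustration and do not reflect the NRAS response format.

```python
# Conceptual sketch of a zero-trust admission check. The claims are assumed
# to have already been cryptographically verified and decoded by an
# attestation service; the keys and digests here are made up for the example.

def admit(claims: dict[str, str], reference: dict[str, str]) -> bool:
    """Admit a workload only if every device's measurement matches the reference."""
    for device, expected in reference.items():
        if claims.get(device) != expected:
            print(f"refusing to schedule: {device} failed attestation")
            return False
    return True

claims = {"gpu0-firmware": "sha384:ab12cd", "nic0-firmware": "sha384:9f3e77"}
reference = {"gpu0-firmware": "sha384:ab12cd", "nic0-firmware": "sha384:9f3e77"}
assert admit(claims, reference)  # schedule only on a rack that attests cleanly
```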
Unified security across the entire rack
Vera Rubin NVL72 establishes a unified security domain using a combination of industry standards and NVIDIA technologies, including:
- TEE Device Interface Security Protocol (TDISP) for device-level trust
- PCIe integrity and data encryption (IDE) for secure I/O
- NVLink-C2C encryption for protected CPU-to-GPU and CPU-to-CPU communication
- NVLink encryption for secure GPU-to-GPU data movement at scale
Together, these capabilities enable a fully encrypted, coherent trusted execution environment designed to scale to the world’s largest AI models and most demanding enterprise workloads. From the user’s device to cloud-scale AI factories, Vera Rubin NVL72 delivers full-stack confidential computing that protects every type of data, even the most sensitive workloads, wherever it runs.
Energy for tokens: thermal and power innovations
AI factories can draw hundreds of megawatts of power. Yet by the time that power reaches the GPUs doing the work, roughly 30% of it is lost to power conversion, distribution, and cooling. This energy is consumed by systems that support compute but do not directly generate tokens, the fundamental unit of AI output. Known as parasitic energy, it represents billions of dollars in wasted potential revenue at scale.


Every watt wasted is a watt that could have been used to generate tokens. As AI becomes a primary engine of information creation, improving energy efficiency directly translates into higher throughput, lower cost per token, and better sustainability.
Cutting parasitic energy means delivering more usable power to GPUs, the engines that produce tokens. The Rubin platform has been engineered to minimize these hidden costs through simpler power paths, higher-efficiency cooling, and system-level orchestration designed for always-on AI factories.
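A back-of-the-envelope calculation shows why parasitic energy matters economically. The facility size and energy-per-token figure below are assumptions chosen only to illustrate the arithmetic; the roughly 30% loss comes from the discussion above.

```python
# Back-of-the-envelope illustration of parasitic energy. All numbers are
# assumptions for the arithmetic, not NVIDIA-published figures.
facility_mw = 100.0          # assumed facility power
parasitic_fraction = 0.30    # ~30% lost before reaching the GPUs (per the text)
energy_per_mtok_kwh = 100.0  # assumed kWh consumed per million tokens delivered

compute_mw = facility_mw * (1 - parasitic_fraction)
wasted_mw = facility_mw * parasitic_fraction

# Tokens per hour the wasted megawatts could otherwise have produced:
wasted_mtok_per_hour = wasted_mw * 1000 / energy_per_mtok_kwh
print(f"{compute_mw:.0f} MW reaches compute; {wasted_mw:.0f} MW is parasitic")
print(f"~{wasted_mtok_per_hour:.0f}M tokens/hour of forgone output at the assumed efficiency")
```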
Traditional data centers rely heavily on air cooling, which consumes significant energy to move and condition air. Vera Rubin NVL72 systems instead use warm-water, single-phase direct liquid cooling (DLC) with a 45-degree Celsius supply temperature. Liquid cooling captures heat far more efficiently than air, enabling higher operating temperatures, reducing fan and chiller energy, and supporting dry-cooler operation with minimal water usage.
Building on Blackwell’s liquid-cooled design, Vera Rubin further increases cooling efficiency by nearly doubling liquid flow rates at the same CDU pressure head. This ensures rapid heat removal under sustained, extreme workloads, preventing thermal throttling and keeping performance consistent. Less energy spent on cooling means more energy available for compute and higher sustained utilization across the AI factory.
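The benefit of higher flow follows directly from the single-phase heat transfer relation Q = m_dot * c_p * dT: at a fixed coolant temperature rise, removable heat scales linearly with flow rate. The flow and temperature values below are assumptions for illustration, not Vera Rubin NVL72 specifications.

```python
# Why flow rate matters for single-phase liquid cooling: Q = m_dot * c_p * dT.
# Doubling flow at the same coolant temperature rise doubles removable heat.
# The baseline flow and temperature rise are assumptions for illustration.
cp_water = 4186.0   # J/(kg*K), specific heat of water
flow_kg_s = 1.0     # assumed baseline coolant mass flow
delta_t = 10.0      # assumed coolant temperature rise, K

q_base = flow_kg_s * cp_water * delta_t            # watts removed at baseline flow
q_doubled = (2 * flow_kg_s) * cp_water * delta_t   # ~2x heat removal at 2x flow
print(f"{q_base/1e3:.1f} kW -> {q_doubled/1e3:.1f} kW removable heat")
```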
Rack-level power smoothing and site-level energy storage
AI workloads are inherently dynamic. Large-scale training introduces synchronized all-to-all communication phases with megawatt-scale power ramps, while inference generates sharp, bursty demand spikes.


Without mitigation, these swings can stress power delivery networks, violate grid constraints, or force operators to overbuild infrastructure or throttle GPUs, both of which waste energy and limit deployable compute.
Rubin AI factories address this challenge with a multi-layered approach.


At the rack level, Vera Rubin NVL72 evens out power swings with power smoothing and incorporates roughly 6x more local energy buffering than Blackwell Ultra, absorbing rapid power transients directly at the source. The figure below shows the effect of rack-level power smoothing in operation: synchronized AI workload power swings are reshaped into controlled ramps bounded by a stable power ceiling and floor, with local energy buffering absorbing rapid transients at the source. The result is a smoother, more predictable power profile that aligns GPU execution with data center and grid constraints.


The figure below breaks this behavior down into the three complementary mechanisms that make it possible. Together, controlled ramps, enforced limits, and local energy storage operate as a coordinated system, reducing peak demand, limiting ramp-rate violations, and stabilizing power delivery without throttling performance. These mechanisms allow AI factories to plan around sustained power rather than worst-case spikes, directly increasing deployable compute per megawatt.
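A toy simulation helps show how the three mechanisms interact. The controller below applies a ramp-rate limit and a ceiling/floor to grid draw, while a local energy buffer supplies or absorbs the residual. All parameters are invented for illustration; this is not the Vera Rubin NVL72 control algorithm.

```python
# Toy sketch of rack-level power smoothing: controlled ramps, enforced
# limits, and a local energy buffer. All values are illustrative.

def smooth(requested_kw, ceiling_kw=120.0, floor_kw=60.0, ramp_kw_per_step=10.0):
    """Return grid draw per step and the cumulative energy (kW-steps) that the
    local buffer has supplied (positive) or absorbed (negative)."""
    grid = requested_kw[0]
    buffer_supplied = 0.0
    grid_draw, buffer_trace = [], []
    for demand in requested_kw:
        # 1) Controlled ramps: bound how fast grid draw may change per step.
        step = max(-ramp_kw_per_step, min(ramp_kw_per_step, demand - grid))
        grid += step
        # 2) Enforced limits: keep grid draw between a floor and a ceiling.
        grid = max(floor_kw, min(ceiling_kw, grid))
        # 3) Local energy buffer covers whatever the grid does not deliver.
        buffer_supplied += demand - grid
        grid_draw.append(grid)
        buffer_trace.append(buffer_supplied)
    return grid_draw, buffer_trace

# Synchronized all-to-all phases look like square-wave demand at the rack:
demand = [70, 70, 130, 130, 130, 70, 70, 130, 130, 70]
grid_draw, buffer_trace = smooth(demand)
print(grid_draw)  # demand spikes become bounded ramps capped at the ceiling
```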


At the site level, battery energy storage systems (BESS) provide fast-response capacity to handle grid events and maintain stability without interrupting workloads.
AI infrastructure power management works by using the NVIDIA Domain Power Service (DPS) to provide power domain-level controls and enable the NVIDIA Workload Power Profile Solution (WPPS) for each job to optimize performance per watt for schedulers like SLURM and NVIDIA Mission Control. Mission Control provides cluster-wide telemetry, coordinated power-aware policies, and integration with facilities (including energy-optimized power profiles and building management system interfaces) for efficient large-scale operations. Low-level GPU telemetry, power capping, and health control are handled through the NVIDIA System Management Interface (SMI) and NVIDIA Data Center GPU Manager (DCGM) APIs.
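At the lowest layer, the same telemetry and capping hooks are reachable from scripts. The sketch below shells out to nvidia-smi (the command-line front end of the System Management Interface) to read power draw and, optionally, apply a power cap; the 600 W value is an arbitrary example rather than a Rubin specification, setting limits requires administrative privileges, and DCGM offers richer APIs for fleet-scale telemetry.

```python
# Minimal sketch: per-GPU power telemetry and power capping via nvidia-smi.
# The cap value is an arbitrary example, not a platform specification.
import subprocess

def gpu_power_draw_watts() -> list[float]:
    """Query instantaneous power draw for every GPU in the node."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [float(line) for line in out.splitlines() if line.strip()]

def set_power_limit(gpu_index: int, watts: int) -> None:
    """Apply a software power cap to one GPU (requires admin privileges)."""
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)], check=True)

if __name__ == "__main__":
    print(gpu_power_draw_watts())
    # set_power_limit(0, 600)  # uncomment only with appropriate privileges
```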


By reducing peak-to-average power ratios, Vera Rubin NVL72 enables operators to provision more GPUs per megawatt of available grid capacity and plan around sustained power rather than worst-case spikes. This improves utilization, lowers infrastructure overhead, and directly increases tokens produced per unit of energy.
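The provisioning effect is easy to see with a small worked example: when racks must be provisioned for their peak rather than average draw, lowering the peak-to-average ratio frees headroom for more racks under the same grid budget. The ratios and per-rack power below are assumptions for illustration only.

```python
# Illustrative arithmetic: a lower peak-to-average power ratio lets an
# operator provision more racks per megawatt. All values are assumptions.
grid_budget_mw = 50.0
rack_avg_kw = 130.0

par_unsmoothed = 1.5    # assumed peak-to-average ratio without smoothing
par_smoothed = 1.15     # assumed ratio with rack-level smoothing

racks_unsmoothed = int(grid_budget_mw * 1000 / (rack_avg_kw * par_unsmoothed))
racks_smoothed = int(grid_budget_mw * 1000 / (rack_avg_kw * par_smoothed))
print(racks_unsmoothed, "->", racks_smoothed, "racks within the same grid budget")
```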
Power optimization and grid awareness for sustainable AI factory scale
AI factories don’t operate in isolation. They are tightly coupled to utility grids that impose limits on ramp rates, peak demand, and operational stability. Managing these constraints manually is impractical at scale and can result in forced throttling or downtime. NVIDIA is building a Vera Rubin NVL72 AI factory research center in Manassas, Va., to optimize and validate the reference design for 100 MW up to gigawatt-scale AI factories. The reference design integrates the Vera Rubin NVL72 rack designs at scale with power and cooling infrastructure and implements APIs to connect grid power controls with the AI factory telemetry and controls.
Vera Rubin NVL72 AI factories integrate the NVIDIA Omniverse DSX reference design for software-defined power control. DSX Flex translates electric utility signals into actionable cluster-level power events. DSX Boost enforces ramp-rate compliance and dynamically orchestrates workload power budgets across the factory.
Together, these capabilities allow AI factories to remain compliant with grid requirements while keeping workloads running at high utilization. By coordinating power behavior across racks, nodes, and jobs, DSX enables Vera Rubin NVL72 AI factories to provision up to 30% more GPU capacity within the same power envelope, directly increasing token output and revenue potential.
A seamless transition enabled by a mature ecosystem


Vera Rubin NVL72 is built on the third-generation NVIDIA MGX rack architecture, preserving the same physical rack footprint while advancing performance, reliability, and serviceability. This continuity is intentional. By evolving the platform without forcing disruptive infrastructure changes, NVIDIA enables exponential gains in AI capability while maintaining a predictable and efficient deployment model.
With Vera Rubin NVL72 delivering up to 3.6 exaFLOPS of AI inference compute per rack, the challenge is no longer just performance, but how quickly that performance can be deployed at scale. The MGX design ensures that power, cooling, mechanical integration, and service workflows are already proven, allowing partners and operators to focus on accelerating time to production rather than redesigning infrastructure.
This consistency translates directly into faster ramps. Vera Rubin is supported by a mature ecosystem of more than 80 MGX partners spanning system manufacturers, integrators, and data-center solution providers, many of whom are already ramping the platform. These partners bring hard-earned operational experience from prior generations, reducing risk and accelerating global deployment.
For data-center operators, this means a smooth transition to Vera Rubin with minimal friction. Existing facilities can adopt the next generation of agentic AI infrastructure without retooling layouts, retraining service teams, or requalifying fundamental rack designs. The result is faster deployment, predictable operations, and the ability to scale AI factories quickly as demand grows.
Vera Rubin’s mature ecosystem ensures that platform innovation does not come at the cost of deployment velocity, enabling enterprises and cloud providers to move from innovation to production at unprecedented speed.
Where operations meets performance
Taken together, these capabilities define what it means to operate at AI factory scale. Vera Rubin NVL72 combines zero-downtime reliability, full-stack security, energy-aware system design, and a mature rack ecosystem to ensure that performance gains translate into real, sustained output in production environments. By removing operational, power, and deployment bottlenecks, the platform allows AI factories to focus on what matters most: delivering more intelligence per watt, per rack, and per data center. With this foundation in place, the next section examines how Vera Rubin converts these system-level benefits into measurable performance gains at scale.
7. Performance and efficiency at scale
A useful way to understand the performance impact of Vera Rubin NVL72 is through the lens of model evolution. The industry is simultaneously pushing toward extreme-scale training, exemplified by 10-trillion-parameter mixture-of-experts (MoE) models, and toward the low-latency inference required for reasoning agents and complex workflows. At this scale, the challenge is no longer peak throughput in isolation, but how efficiently an entire platform converts infrastructure into sustained model progress.
As the industry has advanced from Hopper to Blackwell and now Rubin, performance gains increasingly come from architectural efficiency rather than brute-force scaling. Vera Rubin NVL72 shifts the performance frontier on both ends, delivering the architectural density required to train giant MoE models without unmanageable cluster sprawl, while also enabling the sustained execution efficiency needed for real-time, high-reasoning inference.
Unlocking the 10T MoE era via extreme co-design
Training the next generation of frontier models requires extreme co-design. As parameter counts continue to climb, the industry is rapidly approaching a point where 10T MoE architectures become operationally viable. These models offer enormous capability and more efficient inference, but they introduce substantial communication overhead during training due to dynamic expert routing and frequent all-to-all exchanges.
The Rubin platform is designed to absorb this overhead through tight co-design across compute, memory, and networking. Higher compute density per rack and more efficient interconnects reduce the cost of synchronization and expert communication, allowing training efficiency to scale rather than collapse as cluster size increases.
The figure below illustrates the impact of this co-design using a fixed training objective. To train a 10T MoE model on 100 trillion tokens within a one-month window, Vera Rubin NVL72 achieves the goal using roughly one-quarter the number of GPUs required by Blackwell NVL72. Instead of scaling out to ever-larger clusters to meet aggressive timelines, Rubin concentrates effective training capacity into fewer GPUs.
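The shape of that sizing calculation can be sketched with the common approximation of roughly 6 FLOPs per active parameter per training token. The active parameter count and the sustained per-GPU throughput values below are placeholders chosen only to show how a 4x gap in delivered training throughput becomes a one-quarter GPU count; they are not published Blackwell or Rubin figures, and they are not NVIDIA's methodology.

```python
# Rough sizing sketch using the common ~6 * active_params * tokens FLOPs
# approximation for forward plus backward passes. All numeric inputs are
# assumptions for illustration only.
SECONDS_PER_MONTH = 30 * 24 * 3600

def gpus_needed(active_params: float, tokens: float,
                sustained_flops_per_gpu: float, months: float = 1.0) -> int:
    total_flops = 6.0 * active_params * tokens
    return int(total_flops / (sustained_flops_per_gpu * months * SECONDS_PER_MONTH)) + 1

# Example: a 10T-parameter MoE with an assumed ~1T active parameters per token,
# trained on 100T tokens within one month. A platform that sustains 4x the
# effective training throughput needs roughly one-quarter the GPUs.
for label, sustained in [("platform A", 2.0e15), ("platform B", 8.0e15)]:
    print(label, gpus_needed(1e12, 1e14, sustained))
```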


This reduction in required GPU count represents a structural shift in large-scale training. By minimizing cluster sprawl and communication overhead, Vera Rubin NVL72 eliminates much of the complexity that has historically limited MoE scalability. Architectural efficiency, not raw GPU volume, becomes the dominant factor in making 10T-class models practical at scale.
Real-time reasoning at scale
The shift toward multi-agent AI systems fundamentally changes inference behavior. Instead of short, stateless requests, agents now operate with persistent context, continuously exchanging state across turns and across agents. Each request may carry tens of thousands of tokens, including conversation history, tool definitions, structured API schemas, retrieved RAG context, and intermediate outputs from other agents in the workflow. Maintaining responsiveness under this sustained context load requires far more than peak compute; it demands high sustained throughput across compute, memory, and communication.
At the same time, modern “thinking” models, such as Moonshot AI’s Kimi-K2-Thinking, introduce an additional execution phase. Before producing a final response, these models generate long internal reasoning sequences, significantly increasing output token counts. For workloads requiring on the order of 8,000 output tokens, conventional user inference rates, roughly 50 tokens per second per user, translate into multi-minute response times. At scale, this latency compounds across concurrent users, degrading both user experience and system efficiency.
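The latency penalty follows from simple division, as the short calculation below shows; the higher per-user rates are illustrative targets, not quoted Rubin figures.

```python
# The arithmetic behind the "waiting for thought" penalty.
reasoning_tokens = 8_000             # long internal reasoning sequence
for tps_per_user in (50, 250, 500):  # 50 tok/s is the baseline cited above;
                                     # higher rates are illustrative targets
    minutes = reasoning_tokens / tps_per_user / 60
    print(f"{tps_per_user:>3} tok/s/user -> {minutes:.1f} min before the answer starts")
```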
Vera Rubin NVL72 is designed to remove this bottleneck. By sustaining high throughput at elevated interactivity levels, the platform enables reasoning-heavy inference without sacrificing responsiveness. The figure below illustrates this generational shift. On the Kimi-K2-Thinking workload, Vera Rubin NVL72 delivers up to 10x higher token factory throughput per megawatt than the NVIDIA Blackwell GB200 NVL72 system at comparable user interactivity. While prior architectures experience steep throughput collapse as TPS per user increases, Vera Rubin NVL72 maintains efficiency across the operating range required for fluid, interactive reasoning. This enables large 1-trillion-parameter MoE models to serve real-time agentic workloads without the “waiting for thought” penalty.


Beyond throughput, Vera Rubin NVL72 fundamentally shifts the economics of reasoning inference. The figure below shows cost per million tokens as a function of output latency for the same workload. For long-context, reasoning-dominated inference, Vera Rubin NVL72 delivers up to 10x lower cost per million tokens compared with Blackwell NVL72.
The advantage is most pronounced at the service levels required for interactive agents, where prior platforms may encounter an efficiency wall and costs rise steeply to incrementally improve responsiveness. Vera Rubin remains cost-efficient across this region, transforming long-chain reasoning from a premium capability into a scalable, production-ready service.
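The link between throughput per megawatt and cost per token can be sketched from energy alone. Real serving economics also include capital, networking, software, and facility overhead, and the electricity price and throughput values below are assumptions, but the proportionality is the point: a 10x gain in tokens per megawatt yields roughly 10x lower energy cost per million tokens at a fixed interactivity level.

```python
# Illustrative energy-only cost per million tokens. Electricity price and
# throughput-per-megawatt values are assumptions, not measured figures.
electricity_usd_per_mwh = 80.0

def usd_per_million_tokens(tokens_per_sec_per_mw: float) -> float:
    tokens_per_mwh = tokens_per_sec_per_mw * 3600.0
    return electricity_usd_per_mwh / (tokens_per_mwh / 1e6)

# A 10x gain in throughput per megawatt gives ~10x lower energy cost per token:
for tokens_per_sec_per_mw in (2.0e5, 2.0e6):
    print(f"{usd_per_million_tokens(tokens_per_sec_per_mw):.3f} USD per 1M tokens")
```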


Redefining the Pareto frontier
Together, these results redefine the traditional tradeoff between responsiveness and efficiency in AI inference. Where prior platforms forced operators to choose between low latency and reasonable cost, Vera Rubin NVL72 sustains both simultaneously. This allows large-context, reasoning-heavy models to operate interactively at scale, transforming high-intelligence inference from a premium capability into a production-standard service.
8. Why Rubin is the AI factory platform
AI infrastructure has reached an inflection point. As models evolve toward long-context reasoning, agentic execution, and continuous post-training, performance is no longer determined by any single component. It is determined by how efficiently an entire system converts power, silicon, and data movement into usable intelligence at scale.
Rubin has been purpose-built for this reality.
Rather than optimizing isolated chips, the Rubin platform treats the data center as the unit of compute. Through extreme co-design across GPUs, CPUs, scale-up and scale-out networking, infrastructure offload, power delivery, cooling, security, and system software, Vera Rubin enables AI factories to operate as coherent, predictable, and continuously available systems.
At the execution layer, Rubin GPUs deliver sustained throughput for compute-, memory-, and communication-dominated workloads. Vera CPUs act as high-bandwidth data engines, streaming data efficiently to the GPUs and accelerating system-level orchestration without becoming a bottleneck. NVLink 6 unifies the rack into a single, coherent system, enabling predictable performance across all GPUs. BlueField-4 completes the stack by operating the AI factory itself, offloading infrastructure services and enforcing security, isolation, and control at scale. Spectrum-X Ethernet and ConnectX-9 then extend this deterministic behavior beyond the rack, enabling efficient, scalable AI factories across multi-rack deployments.
Most importantly, these capabilities are not theoretical. They are delivered as a validated, production-ready platform through the DGX SuperPOD, supported by NVIDIA Mission Control, enterprise software, and a mature MGX ecosystem. This design allows organizations to deploy secure AI factories faster, operate them more reliably, and scale them more efficiently as demand grows.
The result is a fundamental shift in AI economics. By maximizing utilization, reducing operational friction, and minimizing wasted power, the Rubin platform lowers the cost per token while increasing tokens per watt and tokens per rack. What once required sprawling, fragile clusters can now be delivered with higher density, higher reliability, and predictable performance.
The Rubin platform is not just the next generation of accelerated computing. It is the platform that enables AI factories to move from experimentation to industrial-scale intelligence production.
9. Learn more
Explore the Rubin platform, Vera CPU, Vera Rubin NVL72, NVIDIA NVLink 6 switch, NVIDIA ConnectX-9 SuperNIC, NVIDIA BlueField-4 DPU, NVIDIA Spectrum-6 Ethernet switch, DGX SuperPOD configurations, and other deployment options at nvidia.com. And read the CES press release.
Acknowledgments
Thanks to Alex Sandu, Amr Elmeleegy, Ashraf Eassa, Brian Sparks, Casey Dugas, Chris Hoge, Chris Porter, Dave Salvator, Eduardo Alvarez, Erik Kilos, Farshad Ghodsian, Fred Oh, Gilad Shainer, Harry Petty, Ian Buck, Itay Ozery, Ivan Goldwasser, Jamie Li, Jesse Clayton, Joe DeLaere, Jonah Alben, Kirthi Devleker, Laura Martinez, Nate Dwarika, Praveen Menon, Rohil Bhargava, Ronil Prasad, Santosh Bhavani, Scot Schultz, Shar Narasimhan, Shruti Koparkar, Stephanie Perez, Taylor Allison, and Traci Psaila, together with many other NVIDIA product leaders, engineers, architects, and partners who contributed to this post.
