Artificial intelligence is token-driven. Every prompt, reasoning step, and agent interaction generates tokens. Over the past year, token consumption has grown manyfold and now exceeds 10 quadrillion tokens per year. And while the vast majority of tokens to date have been generated by humans interacting with AI, the new era is one in which most tokens will be generated by AI interacting with AI.
Modern agentic systems plan tasks, invoke tools, execute code, retrieve data, and coordinate across continuous multistep workflows with numerous AI agents. These interactions generate large volumes of reasoning tokens, expand KV cache, and require CPU-based sandboxed environments to test and validate results generated by accelerated computing systems. This places low-latency, high-throughput demands across GPUs, CPUs, scale-up domains, scale-out networks, and storage.
Delivering useful intelligence for these modern agentic systems requires fleets of purpose-built rack-scale systems that function together as one coherent AI supercomputer. This post introduces the NVIDIA Vera Rubin POD, a set of five specialized rack-scale systems built on the third-generation NVIDIA MGX rack architecture for the era of agentic AI.
Introducing NVIDIA Vera Rubin POD
Built through extreme co-design of seven chips spanning compute, networking, and storage, NVIDIA Vera Rubin introduces the most sophisticated POD-scale AI platform. The platform features 40 racks, 1.2 quadrillion transistors, nearly 20,000 NVIDIA dies, 1,152 NVIDIA Rubin GPUs, 60 exaflops, and 10 PB/s total scale-up bandwidth.
The Vera Rubin POD introduces five new, distinct, purpose-built rack-scale systems for agentic AI workloads that require high throughput, extreme low-latency inference, dense CPU sandboxing, and massive context memory storage. Together, these racks form one cohesive system that can power the world's most energy- and cost-efficient data centers.


Every chip in the POD scales with a third-generation NVIDIA MGX rack, supported by an ecosystem of more than 80 partners and a global supply chain experienced in bringing large-scale AI systems to market. This enables fast deployments and seamless transitions, with every NVIDIA MGX rack sharing the same power, cooling, and mechanical envelopes.
There are two types of MGX racks, both with copper spines designed for performance, resiliency, and energy efficiency. The MGX NVL rack is connected by NVIDIA NVLink, and the new NVIDIA MGX ETL rack is connected by one of two types of spines: NVIDIA Spectrum-X Ethernet or NVIDIA Groq 3 LPU direct chip-to-chip links.
NVIDIA Vera Rubin NVL72: Platform for the four scaling laws
NVIDIA Vera Rubin NVL72 is the core rack-scale compute engine of the latest AI factory. Integrating 72 NVIDIA Rubin GPUs and 36 NVIDIA Vera CPUs connected through a massive NVLink copper spine, it acts as one giant GPU. NVIDIA Vera Rubin NVL72 is designed for the four scaling laws of AI: pretraining, post-training, test-time scaling, and agentic scaling. It is optimized for complex mixture-of-experts (MoE) routing and the heavy, compute-bound context phase of AI inference. It delivers up to 4x higher training performance, up to 10x higher inference performance per watt, and one-tenth the token cost relative to NVIDIA Blackwell.
NVIDIA Groq 3 LPX: Inference accelerator racks
Co-designed with the NVIDIA Vera Rubin platform for the massive context and low-latency demands of agentic AI, NVIDIA Groq 3 LPX features 256 language processing units (LPUs) per rack. It pairs with Vera Rubin NVL72 to eliminate the tradeoff between high-speed interactivity and throughput. By fusing high-bandwidth SRAM-only LPUs with Rubin GPUs and their large HBM capacity, the system delivers low latency and high throughput at long context lengths—supercharging user interactivity for trillion-parameter models without sacrificing system throughput. Vera Rubin NVL72 plus LPX delivers up to 35x more tokens and up to 10x more revenue opportunity for trillion-parameter models relative to Blackwell. To learn more, see Inside NVIDIA Groq 3 LPX.
NVIDIA Vera CPU rack: Agentic AI and reinforcement learning at scale
The NVIDIA Vera CPU rack integrates up to 256 NVIDIA Vera CPUs in a dense, liquid-cooled rack to provide scalable, energy-efficient capacity. A single rack can sustain over 22,500 concurrent reinforcement learning (RL) or agent sandbox environments, maximizing the environments available to test, execute, and validate results from the Vera Rubin NVL72 and LPX racks. Vera CPU racks provide the foundation for large-scale agentic AI and reinforcement learning, delivering results twice as efficiently and 50% faster than traditional rack-scale CPUs. Learn more about how the Vera CPU delivers high-performance bandwidth and efficiency for AI factories.
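The pattern of fanning out many validation sandboxes across CPU cores can be sketched in a few lines. This is an illustrative toy only: a real agentic or RL sandbox runs untrusted, model-generated code under OS-level process or container isolation, while this sketch uses threads and a trivial ground-truth check.

```python
from concurrent.futures import ThreadPoolExecutor

def run_sandbox(task_id: int) -> bool:
    """Execute and validate one candidate result in isolation.

    A real sandbox would run untrusted, model-generated code in a
    separate container; here we just validate a toy computation.
    """
    candidate = sum(range(task_id + 1))      # stand-in for a model's answer
    expected = task_id * (task_id + 1) // 2  # closed-form ground truth
    return candidate == expected

def validate_batch(n_tasks: int, n_workers: int = 32) -> float:
    """Fan validation tasks out across workers and return the pass rate."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_sandbox, range(n_tasks)))
    return sum(results) / n_tasks
```

At rack scale, the same fan-out shape applies with tens of thousands of environments spread across 256 CPUs rather than threads in one process.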
NVIDIA BlueField-4 STX: AI-native storage
The NVIDIA BlueField-4 STX rack is built with the NVIDIA BlueField-4 processor, which combines the Vera CPU and ConnectX-9 SuperNIC, and scales out with Spectrum-X Ethernet networking.
It hosts the NVIDIA CMX context memory storage platform, a new class of AI-native storage infrastructure that seamlessly extends GPU context capacity across the POD and accelerates inference by offloading KV cache into a dedicated, high-bandwidth storage layer. CMX is optimized to store and serve massive context memory (KV cache), treating temporary inference context as an AI-native, shared data type that can be reused across turns, sessions, and agents. This delivers up to 5x higher tokens per second and up to 5x higher power efficiency than traditional storage approaches.
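The core idea of a tiered KV cache can be illustrated with a small sketch: a fast "HBM" tier of limited size, backed by a larger storage tier that absorbs cold contexts and serves them back when a session returns. All names and sizes here are assumptions for illustration, not the CMX API.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a small, fast 'HBM' tier backed by a
    large 'context storage' tier. Illustrative only."""

    def __init__(self, hbm_slots: int):
        self.hbm = OrderedDict()   # fast tier, kept in LRU order
        self.storage = {}          # large, slower tier
        self.hbm_slots = hbm_slots

    def put(self, session_id: str, kv_blocks: list):
        self.hbm[session_id] = kv_blocks
        self.hbm.move_to_end(session_id)
        while len(self.hbm) > self.hbm_slots:       # offload coldest context
            victim, blocks = self.hbm.popitem(last=False)
            self.storage[victim] = blocks

    def get(self, session_id: str):
        """Reuse cached context instead of recomputing prefill."""
        if session_id in self.hbm:
            self.hbm.move_to_end(session_id)
            return self.hbm[session_id], "hbm"
        if session_id in self.storage:              # fetch back into HBM
            self.put(session_id, self.storage.pop(session_id))
            return self.hbm[session_id], "storage"
        return None, "miss"                         # must recompute prefill
```

The payoff is that a returning turn, session, or agent hits either tier and skips the expensive prefill recomputation; only a true miss pays full price.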
NVIDIA Spectrum-6 SPX: Networking racks
Connecting the entire POD into a single supercomputer are the NVIDIA Spectrum-6 SPX networking racks. The Spectrum-6 SPX networking rack is engineered to accelerate east-west and north-south traffic across AI factories. Configurable with either Spectrum-X Ethernet or NVIDIA Quantum-X800 InfiniBand switches, it delivers low-latency, high-throughput rack-to-rack connectivity at scale.
The Spectrum-6 SPX rack now includes the 102.4 Tb/s Spectrum-6 switch, which features 512 lanes and 200 Gb/s co-packaged optics (CPO) in single- and multi-chip switch offerings. This silicon photonics integration replaces pluggable transceivers, delivering the highest power efficiency and resiliency, low latency and jitter, and nearly perfect effective bandwidth, keeping AI workloads across compute and storage environments perfectly synchronized.
By co-designing these purpose-built racks to operate as one, the Vera Rubin POD is positioned to accelerate every component of agentic AI workloads. This begins with the streamlined NVIDIA MGX rack design that forms the foundation of every rack in the POD.
Third-generation NVIDIA MGX rack-scale architecture
Production-grade AI racks must excel across several critical areas: rapid time to volume, proven performance at scale, deep hardware-software co-design, resiliency and energy efficiency, seamless data center deployment and logistics, readiness for future architectures, and more.
The third-generation NVIDIA MGX rack-scale architecture sets the standard across all of these categories with engineering breakthroughs integrated throughout its mechanical, power, and cooling design.
Enabling resiliency and scalability
The NVIDIA MGX rack prioritizes PCB-based connections with its single-wide design, unlocking completely modular, cable-free, hose-free, and fanless compute and NVLink switch trays for maximum reliability, scalability, and serviceability. Single 19-inch-wide racks also simplify shipping and logistics, accelerating deployment across AI factories.


The rack features a highly modular spine as its backplane, consisting of up to four preintegrated and prevalidated copper cable cartridges that connect every tray as one. The spine holds thousands of cables and shares the same mechanical form factor across both MGX NVL and MGX ETL racks.
Ensuring peak energy efficiency from chip to grid
At the component level, NVIDIA MGX racks feature dynamic power steering, where the system provisions power to the components that need it most. This feature can move power between the CPUs, GPUs, and NVLink switch trays to ensure components in the rack operate at peak energy efficiency, improving performance per watt.
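A minimal sketch of the power-steering idea: guarantee each component a floor, then split the spare rack budget in proportion to each component's excess demand, never granting more than it asked for. This is an illustration of the concept only, not NVIDIA's actual control algorithm, and all numbers are hypothetical.

```python
def steer_power(budget_w: float, demand_w: dict, floor_w: dict) -> dict:
    """Toy dynamic power steering for a rack power budget (watts).

    Each component gets its guaranteed floor; the remaining budget is
    shared in proportion to excess demand, capped at that demand.
    """
    alloc = dict(floor_w)
    spare = budget_w - sum(floor_w.values())
    excess = {k: max(demand_w[k] - floor_w[k], 0.0) for k in demand_w}
    total = sum(excess.values())
    for k in alloc:
        if total > 0:
            alloc[k] += min(excess[k], spare * excess[k] / total)
    return alloc
```

Under a GPU-heavy workload the GPUs absorb most of the spare budget; under a CPU-heavy sandboxing phase, the same budget flows the other way.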


AI training and inference workloads create large load swings. If not managed effectively, load swings could cause significant stress on the electrical grid, data center power infrastructure, and IT equipment.
To guard against power swings, MGX racks feature rack-level energy storage that cushions power transients with capacitors. When workloads demand a burst of power, the capacitors supply the extra power while the grid draw stays flat or ramps up gradually. When workloads suddenly stop, the capacitors recharge while the grid draw stays flat or ramps down.
NVIDIA Vera Rubin NVL72 now introduces Intelligent Power Smoothing. It features 6x more rack-level energy storage (400 J per GPU) versus prior generations, and introduces a new closed-loop system that allows the GPUs to continuously monitor the capacitors' state of charge to more efficiently flatten power profiles. This achieves much smaller AC power variation per minute, reduces peak current demands by up to 25%, and eliminates the need for massive battery packs to protect against large-scale power transients.
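The control loop can be sketched as a ramp-limited grid draw with capacitors absorbing the difference between instantaneous load and grid supply. This is a toy model of the idea, not the actual Vera Rubin firmware; the numbers and the simple controller are assumptions.

```python
def smooth_power(load_w, grid_w, max_ramp_w, cap_j, dt=1.0):
    """Toy closed-loop power smoothing.

    The grid draw chases the workload but may only change by
    max_ramp_w per step; the rack capacitors (cap_j joules) absorb
    the surplus or deficit between grid supply and load.
    """
    charge, trace = cap_j / 2, []
    for p in load_w:
        # Steer the grid draw toward the load, ramp-limited.
        step = max(-max_ramp_w, min(max_ramp_w, p - grid_w))
        grid_w += step
        # Surplus grid power charges the caps; deficits discharge them.
        charge = max(0.0, min(cap_j, charge + (grid_w - p) * dt))
        trace.append(grid_w)
    return trace, charge
```

Even when the workload steps instantly from idle to full power, the grid-side trace only ever moves by the ramp limit per step, which is exactly the flattened power profile the capacitors buy.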


At the power level, provisioning racks at a static Max-P strands power capacity that could otherwise be used to generate tokens. It assumes homogeneous workloads that always require peak power, when in reality AI factories run a mix of workloads with varying power needs.
By provisioning MGX racks at a lower, dynamic Max-Q level, data centers can maximize AI data center throughput by dynamically provisioning the right amount of power to each rack depending on the workload. This frees up stranded power, unlocks up to 30% more GPUs in the same power budget with 45°C liquid cooling, and boosts performance per watt.
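The arithmetic behind the claim is simple: for a fixed facility budget, lowering the per-rack provisioning level raises the rack count. The rack power figures below are hypothetical placeholders chosen only to show the shape of the calculation, not published specifications.

```python
def racks_supported(facility_w: int, per_rack_w: int) -> int:
    """How many racks a fixed facility power budget can host."""
    return facility_w // per_rack_w

# Hypothetical numbers for illustration -- not published rack specs.
facility = 100_000_000   # 100 MW facility power budget
max_p    = 140_000       # provision per rack for worst-case peak draw
max_q    = 107_000       # provision per rack for realistic mixed workloads

static  = racks_supported(facility, max_p)   # racks at static Max-P
dynamic = racks_supported(facility, max_q)   # racks at dynamic Max-Q
gain = dynamic / static - 1                  # roughly 30% more racks
```

The exact gain depends entirely on the real gap between peak and typical draw, which is what workload-aware dynamic provisioning exploits.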
Unlocking larger energy budgets for compute
All MGX racks are universally designed to operate with 45°C (113°F) warm-water inlet temperatures, so data centers already designed for liquid cooling are guaranteed a seamless transition without redesigning cooling infrastructure. Figure 5 shows a schematic representation of an infrastructure layout that provides 41°C (105.8°F) water to coolant distribution units (CDUs), which in turn supply coolant at 45°C (113°F) to the AI racks.


Operating at 45°C enables data centers in many climates to use ambient air and closed-loop dry coolers, reducing the need for compressors, driving down PUE, and unlocking larger energy budgets for compute. Lower inlet temperatures of 35°C require data centers to divert massive amounts of facility power or water to cooling, while higher inlet temperatures maximize the amount of grid power converted directly into tokens. This yields significant data center power savings—enough to allocate up to 10% additional Vera Rubin NVL72 racks for more token generation in the same power budget.
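The link between PUE and compute budget is direct: IT power is grid power divided by PUE, so shaving the cooling overhead frees a proportional slice for racks. The PUE values below are hypothetical, chosen only to illustrate the relationship.

```python
# Hypothetical PUE figures for illustration only.
grid_w = 100_000_000                  # total facility grid power (100 MW)
pue_chilled, pue_dry = 1.25, 1.12     # compressor-based vs dry cooling

it_chilled = grid_w / pue_chilled     # power left for IT with chillers
it_dry     = grid_w / pue_dry         # power left for IT with dry coolers
extra = it_dry / it_chilled - 1       # fractional gain in compute budget
```

With these placeholder numbers the dry-cooled facility frees roughly an extra tenth of its budget for compute, the same order as the article's 10%-more-racks figure.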
MGX racks can be 100% liquid-cooled, leveraging the same data center cooling infrastructure as prior generations. The third-generation MGX rack features new internal tray manifolds, rack UQD08 manifolds, and liquid-cooled busbars supporting up to 5,000 A. The coolant used for the rack will depend on the customer and data center, but many will continue to use de-ionized water or propylene glycol-based fluid (PG25), which can last up to 10 years in a closed-loop system with minimal liquid maintenance.
Open standard
Underpinning these features is an open, standardized MGX rack architecture. The first mass-production rack-scale system shipped with NVIDIA Blackwell in 2024. NVIDIA contributed the design to the Open Compute Project (OCP), reinforcing its commitment to open source technologies and enabling the entire ecosystem to rapidly innovate and accelerate adoption. NVIDIA has built an ecosystem of more than 80 global partners, creating a highly efficient, globally diversified supply chain that is experienced in bringing rack-scale AI systems to market.
NVIDIA MGX NVL racks
As independent third-party SemiAnalysis InferenceMax benchmarks demonstrate, NVIDIA rack-scale systems deliver 50x higher performance per watt and 35x lower cost per token (NVIDIA GB300 NVL72 versus NVIDIA H200), which translates directly into higher revenues and higher operating margins.
In 2024, NVIDIA shipped the first NVIDIA GB200 NVL72 rack-scale systems. In 2025, NVIDIA GB300 NVL72 shipped. Now, NVIDIA Vera Rubin NVL72 is in full production, on track to ship in the second half of 2026.
Streamlined design of NVIDIA Vera Rubin NVL72
NVIDIA Vera Rubin NVL72 is an engineering marvel designed to drop seamlessly into existing data center footprints. It will feature nearly 2x more transistors than NVIDIA GB200 NVL72 while delivering 10x more performance per watt through extreme co-design. The rack integrates 72 NVIDIA Rubin GPUs, 36 NVIDIA Vera CPUs, ConnectX-9 SuperNICs, and BlueField-4 DPUs across 18 compute trays, alongside nine NVLink switch trays. In total, the rack houses 1.3 million individual components and nearly 1,300 chips, all packed into a single-wide third-generation NVIDIA MGX rack weighing roughly 4,000 lbs, or about the weight of a pickup truck.


Compute and NVLink Switch trays
Enabling these 72 GPUs to act as a single unified engine is sixth-generation NVLink. It delivers 3.6 TB/s of bandwidth per GPU and 260 TB/s of scale-up bandwidth per rack—more than the bandwidth of the entire global internet. This high-speed data transfer happens in the NVLink spine at the back of the rack, which features four modular, preintegrated cable cartridges housing 5,000 copper cables over two miles in total length.
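The per-rack figure follows directly from the per-GPU figure, which is a quick sanity check worth making explicit:

```python
per_gpu_tbps = 3.6                 # NVLink bandwidth per Rubin GPU (TB/s)
gpus_per_rack = 72
scale_up_tbps = per_gpu_tbps * gpus_per_rack   # 259.2 TB/s, i.e. ~260 TB/s
```

So the quoted 260 TB/s per rack is the 72 GPUs' aggregate NVLink bandwidth, rounded.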
The compute trays inside the Vera Rubin NVL72 are completely redesigned from NVIDIA Blackwell. Each features a robust PCB midplane designed to fit in a single-wide rack, unlocking a cable-free, hose-free, and fanless design. This simplification drops compute tray assembly time from nearly two hours to just five minutes—up to 20x faster assembly and serviceability.
Each compute tray features two NVIDIA Vera Rubin superchips with 17,000 components each—roughly five times as many components as a contemporary smartphone. The superchips are connected to the front modular bays that house eight ConnectX-9 SuperNICs and one BlueField-4 DPU through the PCB midplane.


Vera Rubin NVL72 introduces new rack-scale resiliency features designed to maximize uptime and goodput for large AI clusters. The NVLink switch trays support operational resiliency features that allow administrators to place switches into maintenance mode and replace them while the rack continues operating. The architecture also supports continued operation even when multiple switch trays are unavailable, minimizing disruption during maintenance.
At the silicon level, NVIDIA Rubin GPUs continuously run nondisruptive health checks, and NVIDIA Vera CPUs feature in-system testing and SOCAMM memory for faster serviceability. Together, these chip-to-rack innovations reduce operational overhead and build on the resiliency improvements seen with Blackwell clusters.
NVIDIA Vera Rubin Ultra NVL576
NVIDIA Vera Rubin Ultra introduces a new two-layer, all-to-all NVLink topology that will enable developers to scale up to 576 GPUs. Vera Rubin Ultra NVL576 will combine eight separate MGX NVL racks, each with 72 Rubin Ultra GPUs, into a single 576-GPU NVLink domain with copper and direct optical connections. It will be built using the same MGX rack-scale ecosystem for the fastest time to production.
Demonstrating this massive multirack NVLink topology, Polyphe is NVIDIA's internal, fully functional GB200-based prototype of the multirack NVL576 scale-up architecture.


NVIDIA Kyber NVL1152: The next generation
To scale beyond NVL576, a new MGX rack, NVIDIA Kyber, will be introduced. NVIDIA Kyber is the next-generation MGX NVL rack design that will double the NVLink domain per rack to fit 144 GPUs.


NVIDIA Kyber will scale up into a massive all-to-all NVL1152 supercomputer using similar direct optical interconnects for rack-to-rack scale-up. Kyber provides the foundation for the next era of extreme scale-up AI computing using NVIDIA Feynman. Kyber will first be introduced with Vera Rubin Ultra as a standalone NVL144 system, providing customers with three options for Vera Rubin Ultra NVLink scale-up domains: NVL72, NVL144, and the flagship NVL576.
NVIDIA MGX ETL racks
While NVIDIA MGX NVL racks provide massive scale-up compute domains, agentic AI workflows demand highly specialized nodes for extreme low-latency inference, CPU sandboxing, and accelerated context memory for KV cache. To support these diverse needs, Vera Rubin introduces the MGX ETL rack architecture, a new, fully configurable MGX rack designed with a Spectrum-X Ethernet spine or a direct chip-to-chip spine, leveraging the same rack-scale ecosystem as MGX NVL racks.


MGX ETL shares the same form factor and physical infrastructure as MGX NVL racks and is designed to operate under the same mechanical, power, and cooling envelope. Both racks will share the same key rack components built by the experienced MGX ecosystem: racks, chassis, trays, cable cartridges, liquid cooling manifolds, quick disconnects, busbars (standard and liquid-cooled), support bracketry, side rails, power shelves, leak containment trays, tray handles, and more.
MGX ETL will use preintegrated and prevalidated copper cable cartridges with either a Spectrum-X Ethernet spine or a direct chip-to-chip spine. MGX ETL will leverage the established MGX ecosystem and supply chain that is experienced in building the rack architecture in high volume over multiple years.
NVIDIA Spectrum-X Ethernet spine
MGX ETL with a Spectrum-X Ethernet spine will be the foundation for the Vera CPU rack and the BlueField-4 STX storage rack in the Vera Rubin POD. The rack is highly configurable and can also be built to house up to 256 Rubin GPUs (HGX Rubin NVL8 systems), XPUs, or more.


In this design, 1U MGX ETL switch trays (based on Spectrum-6) sit in the middle of the rack. Rear-facing ports connect to the copper spine, while 32 front-facing OSFP cages provide optical transceiver connectivity to the rest of the POD.
MGX ETL leverages a Spectrum-X Multiplane topology that fans out the 200 Gb/s lanes across multiple switches, delivering full all-to-all connectivity among nodes within the rack while maintaining a single network tier. The preintegrated copper spine provides resilient, power-efficient connectivity (enabling connectivity between ETL racks with a single tier of optics) and extends purpose-built Spectrum-X Ethernet with zero jitter, noise isolation, and load balancing across the entire 256-chip rack.
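The multiplane fan-out can be illustrated with a toy model: lane p of every node connects to switch p, so any pair of nodes shares a one-hop path on every plane while the fabric stays a single tier. This sketches the idea only; it is not the Spectrum-X implementation, and the node and switch names are invented for illustration.

```python
def multiplane_map(n_nodes: int, n_planes: int) -> dict:
    """Toy multiplane fabric: lane p of each node goes to switch p."""
    return {node: {plane: f"switch-{plane}" for plane in range(n_planes)}
            for node in range(n_nodes)}

def one_hop_planes(fabric: dict, a: int, b: int) -> list:
    """Planes on which nodes a and b land on the same switch."""
    return [p for p in fabric[a] if fabric[a][p] == fabric[b][p]]
```

Because every plane gives every node pair a one-hop path, losing a plane degrades bandwidth rather than connectivity, and traffic can be balanced across the surviving planes.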
Direct chip-to-chip spine
Designed for extreme low-latency inference, the LPX rack connects 256 LPUs as one. It features 32 compute trays, each with eight LPUs, connected by a direct chip-to-chip spine consisting of two copper cable cartridges that create an intricate point-to-point topology over thousands of paired copper cable connections. These cables make up the direct chip-to-chip spine at the back of the rack, using the same cable cartridge mechanical form factor as other MGX racks. This massive interconnected fabric enables the entire 256-LPU rack to act as a single fast inference engine deployed alongside Vera Rubin NVL72.
When scaled to multiple LPX racks in data center deployments, the direct chip-to-chip links are maintained across racks, enabling multiple LPX racks to operate as a single, incredibly fast inference engine.
NVIDIA Vera Rubin DSX AI factory platform
NVIDIA Vera Rubin DSX is the AI factory platform that provides a blueprint and reference design for co-designed AI infrastructure from chip to grid. It maximizes grid-power-to-token efficiency and goodput, and accelerates time to first production.


NVIDIA Vera Rubin DSX unifies chips, systems, software libraries, APIs, and a global partner ecosystem into a single architecture that tightly integrates compute, networking, storage, power, cooling, and facility controls across the entire AI factory. This enables ecosystem partners to rapidly design, deploy, and scale gigawatt AI factories with maximum token throughput per watt and improved uptime from the resiliency and energy efficiency built into the DSX platform end to end.
Learn more about NVIDIA Vera Rubin POD
AI infrastructure is rapidly evolving from discrete chips, standalone servers, and rack-scale systems to co-designed POD-scale supercomputers and AI factories. Modern agentic AI workloads are driving a shift toward purpose-built AI infrastructure that integrates compute, networking, and storage into a single cohesive supercomputer. The NVIDIA Vera Rubin POD unifies five rack-scale systems with key mechanical, power, and cooling innovations from the third-generation NVIDIA MGX rack, delivering scalability, resiliency, and energy efficiency.
At AI factory scale, the NVIDIA Vera Rubin DSX Reference Design and the NVIDIA Omniverse DSX Blueprint for AI factory digital twins provide a unified framework for constructing and operating AI factories. Together, these innovations deliver dramatic gains in performance, cost efficiency, and energy savings to power the era of agentic applications.
Join us for NVIDIA GTC 2026 and watch the GTC keynote with NVIDIA founder and CEO Jensen Huang.
