The NVIDIA GB200 NVL72 pushes AI infrastructure to new limits, enabling breakthroughs in training large language models and running scalable, low-latency inference workloads. Increasingly, Kubernetes plays a central role in deploying and scaling these workloads efficiently, whether on-premises or in the cloud. However, rapidly evolving AI workloads, infrastructure requirements, and new hardware architectures pose new challenges for Kubernetes orchestration and resource management.
In this post, we introduce a new Kubernetes abstraction called ComputeDomains to hide the complexity involved in ensuring each worker of a multi-node workload can perform secure, GPU-to-GPU memory operations across node boundaries over a multi-node NVLink fabric.
Made available as part of the NVIDIA DRA driver for GPUs, ComputeDomains bridge low-level GPU constructs (NVIDIA NVLink and NVIDIA IMEX) with modern Kubernetes-native scheduling concepts (dynamic resource allocation, or DRA for short) to provide the foundational support required for running distributed, multi-node workloads on modern GPU hardware. Without ComputeDomains, multi-node NVLink setups would have to be manually defined and glued in place, limiting the flexibility Kubernetes is designed to provide and coming at the cost of security isolation, fault isolation, and cost efficiency.
While this work has been validated on NVIDIA DGX GB200, NVIDIA’s blueprint for GB200 NVL72 systems, ComputeDomains are designed to generalize across any current or future architecture that supports multi‑node NVLink, including future NVL576 systems.
In this post, we focus on the basics: what ComputeDomains are, why they are needed, and how you can use them to run your own distributed, multi-node workloads on Kubernetes.
From single-node to multi-node GPU computing
To grasp why ComputeDomains are essential, it helps to look briefly at how GPU system design has evolved over time.
Earlier generations of NVIDIA DGX systems maximized performance by packing as many GPUs as possible into a single server connected with high-bandwidth NVLink. This design delivered strong intra-node scaling but was limited to workloads that fit inside a single system. With the introduction of NVIDIA Multi-Node NVLink (MNNVL), that limitation disappears. GPUs in different servers can now communicate at full NVLink bandwidth through NVIDIA NVLink Switches, transforming an entire rack into a single, unified GPU fabric. This enables seamless performance scaling across nodes and forms the basis for ultra-fast distributed training and inference.
GPU communication libraries such as NVIDIA NCCL and NVIDIA NVSHMEM have been extended to take advantage of this fabric, while frameworks such as PyTorch build on top of them for fast cross-node, cross-GPU communication. These libraries automatically detect and use the fastest available fabric (e.g., NVLink, RDMA, InfiniBand, or Ethernet), so distributed applications achieve optimal performance without code changes, regardless of topology.
With ComputeDomains, we provide the recommended way to support Multi-Node NVLink on Kubernetes. As such, they already serve as the common layer on top of which several higher-level components in the overall NVIDIA Kubernetes stack are built, including the KAI Scheduler, NVIDIA Dynamo, and NVIDIA DGX Cloud Lepton.
The following figure depicts the NVIDIA GB200 NVL72 rack topology used by DGX GB200 systems. This is just one example of the kind of system that ComputeDomains unlock on Kubernetes.


Supporting multi-node NVLink on Kubernetes
So, what goes into supporting multi-node NVLink on Kubernetes, and how do ComputeDomains help with that? The key piece is the NVIDIA Internode Memory Exchange Service (IMEX), software at the GPU-driver level that lets GPUs communicate across nodes. With IMEX, every individual GPU memory export/import operation is subject to fine-grained access control. IMEX operates across a group of nodes known as an IMEX domain.
Refer to the figure below for a better understanding of the relationship between NVLink domains, IMEX domains, and the other levels of GPU partitioning that are possible in a multi-node NVLink environment.


ComputeDomains can be thought of as a generalization of IMEX domains. While IMEX domains exist at the driver layer and define which nodes can communicate via NVLink, ComputeDomains generalize this idea and extend it into Kubernetes. They represent the higher-level concept of connectivity (or reachability) between the distributed workers of a multi-node workload. The fact that IMEX is used underneath to enable that connectivity is an implementation detail.
In essence, ComputeDomains dynamically create, manage, and tear down IMEX domains as multi‑node workloads are scheduled to nodes and run to completion.
Instead of requiring static, pre-configured IMEX setups, ComputeDomains respond to scheduling events in real time, automatically forming IMEX domains across the set of nodes where a distributed job lands.
IMEX essentially provides reconfigurable isolation boundaries, and ComputeDomains manage those in a fluid, transparent way. With ComputeDomains, each workload gets its own isolated IMEX domain and shared IMEX channel, ensuring GPU-to-GPU communication between all workers of a job while being securely isolated from other jobs. A ComputeDomain follows the workload and dynamically adjusts its topology as the workload grows or shrinks. When the workload finishes, its corresponding IMEX domain and channels are automatically cleaned up, freeing resources for future jobs.
Isolation without compromising on utilization
As indicated above, IMEX primitives are meant to be an implementation detail hidden underneath the ComputeDomain abstraction. With that said, we argue that a robust, battle-tested solution for dynamically forming IMEX domains around a workload is fundamentally needed for three reasons:
- Security isolation: In a zero-trust environment, there is a clear need for neighboring GPU jobs to be securely isolated despite being physically NVLink-connected.
- Fault isolation: Neighboring jobs, even when trusted, must not step on each other's toes.
- Cost efficiency: Resource utilization should be kept high even with (1) and (2) in place, which is particularly relevant in multi-tenant environments.
Security isolation could arguably be achieved with static NVLink partitions, but that would drastically inhibit resource utilization.
In a trusted environment, security isolation may not always be the strongest concern. However, job reliability always is, and, as a result, so is fault isolation. An IMEX domain is a stateful distributed system. It is naturally subject to failure scenarios and transient conditions that may result in a degraded or inconsistent state. Especially at scale, this can occur at a tangible rate. In those situations, the blast radius needs to be contained to just a single job.
Conceptually, the safest way to maximize fault isolation is to tie an individual IMEX domain, both temporally and spatially, to just one specific workload, which is what the ComputeDomain implementation ensures under the hood.
Without ComputeDomains, one would have to statically set up long-lived IMEX domains and hence compromise on both (1) and (2). Any home-grown solution for dynamically orchestrating IMEX domains would eventually evolve into something like ComputeDomains and would turn out to be difficult to build. By providing a generic solution, we can save our users from having to go through that effort themselves, and centralize lessons learned.
Using ComputeDomains in Kubernetes
ComputeDomains are provided by the NVIDIA DRA driver for GPUs. In the near term, the DRA driver will be shipped with the NVIDIA GPU Operator. For now, it must be installed manually with a Helm chart.
Detailed installation instructions and prerequisites can be found here. In general, Kubernetes 1.32 or later is required, with the DRA APIs as well as CDI enabled. Be sure to enable ComputeDomain support upon DRA driver installation (that is the default), and to run it in an environment that has NVLink partitions set up spanning multiple nodes (for example, in a GB200 NVL72 rack, or across racks).
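For orientation, below is a minimal installation sketch. The Helm repository URL, chart name, and namespace shown are assumptions that may change between releases; follow the linked installation instructions for the authoritative steps and required values.
# Add the NVIDIA Helm repository (URL assumed) and refresh the local index.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update

# Install the DRA driver for GPUs. Chart and namespace names are assumptions;
# your cluster may need additional values (for example, the host driver root).
helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
  --namespace nvidia-dra-driver-gpu --create-namespace

# Sanity check: the ComputeDomain CRD should be registered afterwards.
kubectl get crd computedomains.resource.nvidia.com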
The driver is under heavy development. We recommend staying up to date by following our GitHub project; you can read about the latest release (v25.8.0) here.
Deploying workloads
Let's walk through an example of creating and using a ComputeDomain. The following Kubernetes specification declares a ComputeDomain with the name compute-domain-0:
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: compute-domain-0
spec:
  numNodes: 0 # <-- this field is deprecated and should always be set to 0
  channel:
    resourceClaimTemplate:
      name: compute-domain-0-rct
No workload refers to this ComputeDomain yet. At this point, it is merely an API object. A ComputeDomain follows the workload: it will form just in time around workload pods once they are actually scheduled onto nodes.
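As a quick usage sketch (assuming the manifest above is saved as compute-domain.yaml, a hypothetical filename), the object can be created and inspected with standard kubectl commands:
# Create the ComputeDomain API object; no IMEX daemons are started yet.
kubectl apply -f compute-domain.yaml

# Inspect it via its API group (resource.nvidia.com, as declared above).
kubectl get computedomains.resource.nvidia.com compute-domain-0 -o yaml

# Listing resource claim templates is one way to confirm the domain was
# reconciled; exact behavior may vary by driver release.
kubectl get resourceclaimtemplates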
Next, let's specify a workload and put compute-domain-0 to use by referencing it in the workload.
Say we want to run a job distributed among 18 nodes. The goal is to use four GPUs per node and to establish (all-to-all) NVLink reachability between all 72 GPUs involved.
To that end, in this case, we are going to run one Kubernetes pod per node. Each pod requests:
- 4 GPUs.
- To land in the same NVLink partition as all the other pods of this workload (for physical reachability).
- To join the previously specified ComputeDomain (for logical reachability).
The following Kubernetes deployment specification example achieves all that, with key concepts explained inline:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-mnnvl-workload
spec:
  # Ask for this deployment to consist of 18 pods.
  replicas: 18
  selector:
    matchLabels:
      job: ex1
  template:
    metadata:
      labels:
        job: ex1
    spec:
      # Associate all pods in this deployment with the specific ComputeDomain
      # that was previously created. To that end, refer to the so-called
      # resource claim template associated with that domain. The name of that
      # template in this case is defined as `compute-domain-0-rct` in the
      # ComputeDomain API object. Here we also define a new name `cd-0` that
      # is consumed by the container spec below.
      resourceClaims:
      - name: cd-0
        resourceClaimTemplateName: compute-domain-0-rct
      # Define a `podAffinity` rule to make sure that all pods land on nodes
      # in the same NVLink partition. Specifically, require all pods to land on
      # nodes that have the _same_ value set for the `nvidia.com/gpu.clique`
      # node label. This label is set by the NVIDIA GPU Operator (based on
      # static NVLink configuration state).
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: job
                operator: In
                values:
                - ex1
            topologyKey: nvidia.com/gpu.clique
      containers:
      - name: ctr
        image: ubuntu:22.04
        command: ["mnnvl-workload"]
        resources:
          claims:
          - name: cd-0 # See `resourceClaims` above.
          limits:
            nvidia.com/gpu: 4 # Request 4 GPUs.
For clarity, the example above makes the connection to the previously specified ComputeDomain by declaring resourceClaimTemplateName: compute-domain-0-rct. The concept of the resource claim template may make more sense now: under the hood, one unique resource claim is generated per pod in this deployment.
The example above also shows a typical way to make sure that a set of pods gets placed onto nodes that are all part of the same NVLink partition (by aligning on the nvidia.com/gpu.clique node label value). When a ComputeDomain is supposed to expand beyond an individual NVLink partition, this constraint must be removed or modified.
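To check which clique values are currently set on your nodes, and to verify placement after applying the deployment, commands along these lines can help (the label name and the job=ex1 selector are taken from the example above):
# Show each node's NVLink clique label (set by the NVIDIA GPU Operator).
kubectl get nodes -L nvidia.com/gpu.clique

# After applying the deployment, confirm that all 18 pods landed on nodes
# sharing the same clique value.
kubectl get pods -l job=ex1 -o wide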
Complete and comprehensive examples (including a set of acceptance tests that can be run to verify that ComputeDomains are set up and working correctly) can be found in the DRA driver documentation.
Known limitations and future work
Version 25.8.0 of the NVIDIA DRA driver for GPUs includes significant improvements for ComputeDomains. Beyond that, more enhancements are on the roadmap toward more flexible scheduling and ease of use.
Here are two of the currently known limitations and the work planned to alleviate them:
- Currently, only one pod per node can be part of any given ComputeDomain. Users have to be aware of how many GPUs are available in a node, and then typically grab all of them from within a single workload pod. The application in that pod then has to subdivide its work across those GPUs. We are planning to remove this constraint to make the notion of individual nodes less relevant. It will then be possible for the application to be composed of many single-GPU pods that may or may not be placed next to one another on the same node. In that mode, the unit of interest is the individual GPU, not the individual node; node boundaries become almost transparent.
- Currently, at most one ComputeDomain is supported per node. This constraint stems from the choice of providing each workload with its own dedicated IMEX domain (and the fact that at most one IMEX daemon can run per node). If a ComputeDomain occupies only a fraction of a node's set of GPUs, the remaining GPUs in that node cannot be part of another ComputeDomain. For example, a six-GPU ComputeDomain in a GB200 rack would always render a number of GPUs unavailable for other ComputeDomains (two in the best case, 18 in the worst case). Lifting that constraint allows for increased resource utilization on the one hand but, on the flip side, may weaken fault isolation between workloads. No universal treatment exists, and we will allow users to pick their sweet spot in the trade-off spectrum between cost efficiency and isolation strength. This work is planned and tracked here.
Additional initiatives are in progress, for example to further enhance robustness at scale and to improve overall debuggability. Follow the issue tracker on GitHub and browse the milestone view for an up-to-date peek into the roadmap. We also encourage you to submit questions, bug reports, and requests for enhancements to the issue tracker.
Summary
As advanced multi-node GPU architectures like NVIDIA GB200 NVL72 begin to push the boundaries of what's possible in high-performance AI infrastructure, Kubernetes needs abstractions that can understand and manage the topology of these modern GPU systems. ComputeDomains address this challenge by bridging low-level constructs such as NVLink and IMEX domains with Kubernetes-native scheduling and DRA.
ComputeDomains dynamically form, manage, and tear down IMEX domains as workloads move across the cluster, enabling secure, high-bandwidth GPU-to-GPU connectivity without manual setup. The latest v25.8.0 release of the NVIDIA DRA driver for GPUs extends this model with elasticity and fault tolerance, allowing ComputeDomains to expand or contract with workloads, recover automatically from node loss, and speed up startup times for distributed jobs.
For infrastructure teams, these changes mean that multi-node training and inference on GB200 NVL72 or DGX GB200 systems can run with minimal setup. For developers, it means that running distributed training or inference across complex, NVLink-connected GPU fabrics now feels as simple as deploying a standard Kubernetes workload. Together, these innovations make ComputeDomains a cornerstone for scalable, topology-aware AI orchestration on NVIDIA GB200 NVL72 and future platforms.
See the NVIDIA DRA driver for GPUs and its latest v25.8.0 release to get started. And check out these other resources:
