Over the past few years, AI inference has evolved from single-model, single-pod deployments into complex, multicomponent systems. A model deployment may now consist of several distinct components, such as prefill, decode, vision encoders, and key value (KV) routers. In addition, entire agentic pipelines are emerging, where multiple such model instances collaborate to perform reasoning, retrieval, or multimodal tasks.
This shift has changed the scaling and orchestration problem from “run N replicas of a pod” to “coordinate a group of components as one logical system.” Managing such a system requires scaling and scheduling the appropriate pods together, recognizing that each component has distinct configuration and resource needs, starting them in a deliberate order, and placing them within the cluster with network topology in mind. Ultimately, the goal is to orchestrate the system and scale its components with awareness of their dependencies as a whole, rather than one pod at a time.
To address these challenges, today we’re announcing that NVIDIA Grove, a Kubernetes API for running modern ML inference workloads on Kubernetes clusters, is now available within NVIDIA Dynamo as a modular component. Grove is fully open source and available on the ai-dynamo/grove GitHub repo.
How NVIDIA Grove orchestrates inference as a whole
Grove allows you to scale your multinode inference deployment from a single replica to data center scale, supporting tens of thousands of GPUs. With Grove, you can describe your entire inference serving system in Kubernetes (for example, prefill, decode, routing, or any other component) as a single Custom Resource (CR).
From that one spec, the platform coordinates hierarchical gang scheduling, topology‑aware placement, multilevel autoscaling, and explicit startup ordering. You get precise control of how the system behaves without stitching together scripts, YAML files, or custom controllers.
Originally motivated by the challenges of orchestrating multinode, disaggregated inference systems, Grove is flexible enough to map naturally to any real-world inference architecture—from traditional single-node aggregated inference to agentic pipelines with multiple models. Grove enables developers to define complex AI stacks in a concise, declarative, and framework-agnostic manner.
The key requirements for multinode disaggregated serving are detailed below.
Multilevel autoscaling for interdependent components
Modern inference systems need autoscaling at multiple levels: individual components (prefill workers for traffic spikes), related component groups (prefill leaders with their workers), and whole service replicas for overall capacity. These levels affect each other: scaling prefill workers may require more decode capacity, and new service replicas need proper component ratios. Traditional pod-level autoscaling can’t handle these interdependencies.
System-level lifecycle management with recovery and rolling updates
Recovery and updates must operate on complete service instances, not individual Kubernetes pods. A failed prefill worker must properly reconnect to its leader after a restart, and rolling updates must preserve network topology to maintain low latency. The platform must treat multicomponent systems as single operational units optimized for both performance and availability.
Flexible hierarchical gang scheduling
The AI workload scheduler should support flexible gang scheduling that goes beyond traditional all-or-nothing placement. Disaggregated serving creates a new challenge: the inference system needs to guarantee essential component combinations (at least one prefill and one decode worker, for example) while allowing independent scaling of each component type. The challenge is that prefill and decode components should scale at different ratios based on workload patterns.
Traditional gang scheduling prevents this independent scaling by forcing everything into groups that must scale together. The system needs policies that enforce minimum viable component combinations while enabling flexible scaling.
Topology-aware scheduling
Component placement affects performance. On systems like NVIDIA GB200 NVL72, scheduling the related prefill and decode pods on the same NVIDIA NVLink domain optimizes KV-cache transfer latency. The scheduler must understand physical network topology, placing related components near each other while spreading replicas for availability.
Role‑aware orchestration and explicit startup ordering
Components have different jobs, configurations, and startup requirements. For instance, prefill and decode leaders execute different startup logic than workers, and workers can’t start before leaders are ready. The platform needs role-specific configuration and dependency enforcement for reliable initialization.
Put together, this is the bigger picture: inference teams need a simple and declarative way to describe their system as it is actually operated (multiple roles, multiple nodes, clear multilevel dependencies) and have the platform schedule, scale, heal, and update to that description.
Grove primitives
High-performance inference frameworks use Grove hierarchical APIs to express role-specific logic and multilevel scaling, enabling consistent, optimized deployment across diverse cluster environments. Grove achieves this by orchestrating multicomponent AI workloads using three hierarchical custom resources in its Workload API.
In the example shown in Figure 1, PodClique A represents a frontend component, B and C represent prefill-leader and prefill-worker, and D and E represent decode-leader and decode-worker.


- PodCliques represent groups of Kubernetes pods with specific roles, such as prefill leader or worker, decode leader or worker, or a frontend service, each with independent configuration and scaling logic.
- PodCliqueScalingGroups bundle tightly coupled PodCliques that must scale together, such as the prefill leader and prefill workers that together represent one model instance.
- PodCliqueSets define the entire multicomponent workload, specifying startup ordering, scaling policies, and gang-scheduling constraints that ensure all components start together or fail together. When scaling for additional capacity, Grove creates complete replicas of the entire PodGangSet and defines spread constraints that distribute these replicas across the cluster for high availability, while keeping each replica’s components network-packed for optimal performance. The commands after this list show one way to inspect these objects in a running cluster.
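To make the hierarchy concrete, you can list these objects directly once Grove is installed and a workload is deployed. This is a minimal sketch: the resource names come from the Grove CRDs (shown in Step 3 later in this post), and <namespace> is a placeholder for your deployment’s namespace.

# List the Grove Workload API objects backing a deployment
kubectl get podcliquesets.grove.io -n <namespace>
kubectl get podcliquescalinggroups.grove.io -n <namespace>
kubectl get podcliques.grove.io -n <namespace>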


A Grove-enabled Kubernetes cluster brings two key components together: the Grove operator and a scheduler capable of understanding PodGang resources, such as the KAI Scheduler, an open source subcomponent of the NVIDIA Run:ai platform.
When a PodCliqueSet resource is created, the Grove operator validates the specification and automatically generates the underlying Kubernetes objects required to realize it. This includes the constituent PodCliques, PodCliqueScalingGroups, and the associated pods, services, secrets, and autoscaling policies. As part of this process, Grove also creates PodGang resources, which are part of the Scheduler API and translate workload definitions into concrete scheduling constraints for the cluster’s scheduler.
Each PodGang encapsulates detailed requirements for its workload, including minimum replica guarantees, network topology preferences to optimize inter-component bandwidth, and spread constraints to maintain availability. Together, these ensure topology-aware placement and efficient resource utilization across the cluster.
The scheduler continuously watches for PodGang resources and applies gang-scheduling logic, ensuring that all required components are scheduled together, or not at all until resources are available. Placement decisions are made with GPU topology awareness and cluster locality in mind.
The result is a coordinated deployment of multicomponent AI systems, where prefill services, decode workers, and routing components start in the correct order, are placed close together in the network for performance, and recover cohesively as a group. This prevents resource fragmentation, avoids partial deployments, and enables stable, efficient operation of complex model-serving pipelines at scale.
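If you want to see the constraints the scheduler receives, you can inspect the generated PodGang objects directly. This is an illustrative check, assuming the podgangs.scheduler.grove.io CRD listed in Step 3 below; <namespace> and <podgang-name> are placeholders.

# List the PodGang resources generated by the Grove operator
kubectl get podgangs.scheduler.grove.io -n <namespace>
# Inspect one PodGang to see gang membership, topology preferences, and spread constraints
kubectl get podgangs.scheduler.grove.io <podgang-name> -n <namespace> -o yaml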
How to get started with Grove using Dynamo
This section walks you through deploying a disaggregated serving architecture with KV routing using Dynamo and Grove. The setup uses the Qwen3 0.6B model and demonstrates the ability of Grove to manage distributed inference workloads with separate prefill and decode workers.
Note: This is a foundational example designed to help you understand the core concepts. For more complex deployments, refer to the ai-dynamo/grove GitHub repo.
Prerequisites
First, ensure that you have the following components ready in your Kubernetes cluster:
- Kubernetes cluster with GPU support
- kubectl configured to access your cluster
- Helm CLI installed
- Hugging Face token secret (referenced as hf-token-secret), which can be created with the following command:
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<HF_TOKEN>
Note: In the command, replace <HF_TOKEN> with your actual Hugging Face token. Keep this token secure and never commit it to source control.
Step 1: Create a namespace
kubectl create namespace vllm-v1-disagg-router
Step 2: Install Dynamo CRDs and Dynamo Operator with Grove
# 1. Set environment
export NAMESPACE=vllm-v1-disagg-router
export RELEASE_VERSION=0.5.1
# 2. Install CRDs
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default
# 3. Install Dynamo Operator + Grove
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace --set "grove.enabled=true"
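As a quick sanity check, you can confirm that both Helm releases deployed successfully before continuing. This is a minimal sketch; note that dynamo-crds was installed into the default namespace in the commands above.

# Verify the Helm releases
helm list --namespace default        # should list dynamo-crds
helm list --namespace ${NAMESPACE}   # should list dynamo-platform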
Step 3: Confirm Grove installation
kubectl get crd | grep grove
Expected output:
podcliques.grove.io
podcliquescalinggroups.grove.io
podcliquesets.grove.io
podgangs.scheduler.grove.io
podgangsets.grove.io
Step 4: Create the DynamoGraphDeployment configuration
Create a DynamoGraphDeployment manifest that defines a disaggregated serving architecture with one frontend, two decode workers, and one prefill worker, and save it as dynamo-grove.yaml:
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: dynamo-grove
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-v1-disagg-router
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
      envs:
        - name: DYN_ROUTER_MODE
          value: kv
    VllmDecodeWorker:
      dynamoNamespace: vllm-v1-disagg-router
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 2
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-0.6B
    VllmPrefillWorker:
      dynamoNamespace: vllm-v1-disagg-router
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-0.6B
            - --is-prefill-worker
Step 5: Deploy the configuration
kubectl apply -f dynamo-grove.yaml -n ${NAMESPACE}
Step 6: Confirm the deployment
Confirm that the operator and Grove pods were created:
kubectl get pods -n ${NAMESPACE}
Expected output:
NAME READY STATUS RESTARTS AGE
dynamo-grove-0-frontend-w2xxl 1/1 Running 0 10m
dynamo-grove-0-vllmdecodeworker-57ghl 1/1 Running 0 10m
dynamo-grove-0-vllmdecodeworker-drgv4 1/1 Running 0 10m
dynamo-grove-0-vllmprefillworker-27hhn 1/1 Running 0 10m
dynamo-platform-dynamo-operator-controller-manager-7774744kckrr 2/2 Running 0 10m
dynamo-platform-etcd-0 1/1 Running 0 10m
dynamo-platform-nats-0 2/2 Running 0 10m
Step 7: Test the deployment
First, port-forward the frontend:
kubectl port-forward svc/dynamo-grove-frontend 8000:8000 -n ${NAMESPACE}
Then test the endpoint:
curl http://localhost:8000/v1/models
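Because the frontend exposes an OpenAI-compatible API, you can also send a test chat completion to the deployed model. This is an illustrative request assuming the standard /v1/chat/completions route; adjust the prompt and max_tokens as needed.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Briefly explain disaggregated inference."}],
    "max_tokens": 64
  }'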
Optionally, you can inspect the PodClique resource to see how Grove groups pods together, including replica counts:
kubectl get podclique dynamo-grove-0-vllmdecodeworker -n vllm-v1-disagg-router -o yaml
Ready for more?
NVIDIA Grove is fully open source and available on the ai-dynamo/grove GitHub repo. We invite you to try Grove in your own Kubernetes environments, whether with Dynamo, as a standalone component, or alongside other high-performance AI inference engines.
Explore the Grove Deployment Guide and ask questions on GitHub or Discord. To see Grove in action, visit NVIDIA Booth #753 at KubeCon 2025 in Atlanta. We welcome contributions, pull requests, and feedback from the community.
To learn more, check out these additional resources:
Acknowledgments
The NVIDIA Grove project acknowledges the valuable contributions of all the open source developers, testers, and community members who have participated in its evolution, with special thanks to SAP (Madhav Bhargava, Saketh Kalaga, Frank Heine) for their exceptional contributions and support. Open source thrives on collaboration. Thank you for being a part of Grove.
