Streamline AI Infrastructure with NVIDIA Run:ai on Microsoft Azure



Modern AI workloads, ranging from large-scale training to real-time inference, demand dynamic access to powerful GPUs. However, Kubernetes environments have limited native support for GPU management, which results in challenges such as inefficient GPU utilization, lack of workload prioritization and preemption, limited visibility into GPU consumption, and difficulty enforcing governance and quota policies across teams.

In containerized environments, orchestrating GPU resources effectively helps maximize performance and efficiency. NVIDIA Run:ai simplifies this process with intelligent GPU resource management, enabling organizations to scale AI workloads with speed, agility, and governance. 

In this blog post, we’ll explore how NVIDIA Run:ai, now generally available on the Microsoft Marketplace, helps organizations streamline AI infrastructure on Azure. You’ll learn how it optimizes GPU utilization, enforces governance and quotas, and dynamically schedules AI workloads across teams and projects. We’ll also cover its seamless integration with Azure Kubernetes Service, support for hybrid cloud environments, and the tools it provides for managing clusters, node pools, and the complete AI lifecycle. By the end, you’ll see how NVIDIA Run:ai simplifies AI orchestration, boosts performance, and enables scalable, cost-efficient AI operations.

Managing AI workloads with NVIDIA Run:ai

NVIDIA Run:ai offers a Kubernetes-native AI orchestration platform designed specifically for managing AI and machine-learning workloads. It provides a flexible layer that enables dynamic, policy-based scheduling of GPU resources across teams and workloads. This platform optimizes GPU utilization while enforcing governance, quotas, and workload prioritization.

Key capabilities include:

  - Dynamic, policy-based scheduling of GPU resources across teams and workloads
  - Workload prioritization and preemption to keep high-priority jobs moving
  - GPU sharing, so multiple workloads can run efficiently on the same hardware
  - Governance and quota enforcement across teams and projects
  - Visibility into GPU consumption through dashboards and usage analytics

How NVIDIA Run:ai works on Azure

NVIDIA Run:ai integrates seamlessly with Microsoft Azure’s GPU-accelerated virtual machine (VM) families, optimizing performance and simplifying the management of AI workloads. 

Azure offers a broad collection of GPU-enabled VM families tailored to distinct needs: the NC-family, optimized for compute-intensive and high-performance computing (HPC) tasks; the ND-family, purpose-built for deep learning and AI research; the NG-family, designed for cloud gaming and remote desktop experiences; and the NV-family, focused on visualization, rendering, and virtual desktop workloads. Together, these GPU-powered families provide the flexibility and performance required to accelerate innovation across AI, graphics, and simulation workloads.

These VMs leverage NVIDIA GPUs, including the T4, A10, A100, H100, and H200, as well as the GB200 Grace Blackwell Superchip. Many of these VMs are equipped with high-speed NVIDIA Quantum InfiniBand networking to deliver the low-latency, high-throughput performance required for advanced AI and deep-learning applications.
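If you want to check which GPU-enabled sizes are offered in your region, the Azure SDK for Python can enumerate them. A minimal sketch, assuming the azure-identity and azure-mgmt-compute packages and a subscription ID of your own:

```python
# pip install azure-identity azure-mgmt-compute
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# N-series sizes (NC, ND, NG, NV) are the GPU-enabled families.
for size in compute.virtual_machine_sizes.list(location="eastus"):
    if size.name.startswith("Standard_N"):
        print(size.name, size.number_of_cores, "cores,", size.memory_in_mb, "MB RAM")
```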

On the software side, NVIDIA Run:ai tightly integrates with Azure’s cloud infrastructure to offer a seamless experience for AI workloads. NVIDIA Run:ai leverages Azure Kubernetes Service (AKS) to orchestrate and virtualize GPU resources efficiently across diverse AI projects.

Moreover, NVIDIA Run:ai works with Azure Blob Storage to handle large datasets and model storage, facilitating smooth data access and transfer between on-premises and cloud resources. This close integration allows organizations to maximize GPU utilization while taking full advantage of Azure’s security and storage capabilities.
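As an illustration of that data path, here is a minimal sketch that pulls a dataset file from Blob Storage before a training run, using the azure-storage-blob SDK; the account, container, and blob names are placeholders:

```python
# pip install azure-identity azure-storage-blob
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Account, container, and blob names below are placeholders.
service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
container = service.get_container_client("training-datasets")

# Pull a dataset shard to local disk before the training job starts.
with open("train-000.parquet", "wb") as local_file:
    local_file.write(container.download_blob("train-000.parquet").readall())
```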

Want a visual walkthrough? Watch the demo video for a step-by-step guide to deploying NVIDIA Run:ai on Microsoft Azure.

Running AI workloads with Azure Kubernetes Service (AKS)

Azure Kubernetes Service (AKS) provides a managed Kubernetes environment that simplifies cluster management and scaling. NVIDIA Run:ai enhances AKS by adding an intelligent orchestration layer that dynamically manages GPU resources.

With NVIDIA Run:ai on AKS, AI workloads are scheduled based on real-time priorities and resource availability. This reduces idle GPU time and maximizes throughput by allowing multiple workloads to share GPUs efficiently. It also supports multi-node and multi-GPU training jobs, enabling enterprises to scale their AI pipelines seamlessly.
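To make that concrete, the sketch below submits a single-GPU pod through the Kubernetes Python client and hands it to the Run:ai scheduler. The scheduler name, namespace, and image are assumptions for illustration, not verbatim from this post:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()  # kubeconfig from `az aks get-credentials`

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(
        name="train-job-a",
        namespace="team-a",  # hypothetical team namespace
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",  # assumption: the Run:ai scheduler's name
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:24.05-py3",  # placeholder NGC image tag
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"},  # one full GPU
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="team-a", body=pod)
```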

Teams can use namespaces and quota policies within AKS to isolate workloads, ensuring fair access and governance. Keep reading for tips on getting started. 
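Run:ai layers its own projects and quotas on top, but the plain-Kubernetes building blocks look like this: a namespace per team plus a ResourceQuota that caps GPU requests (names are hypothetical):

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# A namespace per team isolates its workloads ("team-a" is hypothetical).
core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name="team-a"))
)

# A ResourceQuota caps how many GPUs the team's pods can request in total.
core.create_namespaced_resource_quota(
    namespace="team-a",
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="team-a-gpu-quota"),
        spec=client.V1ResourceQuotaSpec(hard={"requests.nvidia.com/gpu": "8"}),
    ),
)
```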

Supporting hybrid infrastructure for today’s businesses

As organizations grow and AI workloads become more complex, many companies are adopting hybrid strategies that mix on-premises data centers with cloud platforms like Azure. This approach allows businesses to keep sensitive workloads on-premises while leveraging the cloud’s scalability and flexibility for other tasks. Effectively managing resources across these environments is crucial to balancing performance, cost, and control. 

Companies like Deloitte and Dell Technologies have observed that mixing local infrastructure with cloud resources using a hybrid approach with NVIDIA Run:ai improves GPU utilization and enables smoother sharing of compute capacity across on-site and cloud environments. Similarly, institutions like Johns Hopkins University are using NVIDIA Run:ai, running workloads both on-premises and on Azure, to scale their experiments more efficiently, reduce wait times for GPU resources, and enable faster iteration while maintaining control over sensitive data and specialized tools critical for their work.

Get started on Microsoft Marketplace

NVIDIA Run:ai is available as a private offer on Microsoft Marketplace. The private listing ensures flexible deployment, custom licensing, and seamless integration into your existing enterprise agreement. To request a private offer:

  1. Visit NVIDIA Run:ai and select “Get Started.” 
  2. Complete the “Contact Us About NVIDIA Run:ai” form.
  3. An NVIDIA representative will be in touch with you to create a tailored private offer.
  4. Once the offer has been accepted, you can connect your AKS cluster to NVIDIA Run:ai by following these steps:
    1. Create an Azure AKS cluster using the instructions in the AKS documentation.
    2. Install the NVIDIA Run:ai control plane.
    3. Install the NVIDIA Run:ai cluster.
    4. Access the NVIDIA Run:ai user interface (UI) using your fully qualified domain name and confirm that the cluster status shows “Connected” (a quick verification sketch follows below).
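As a sanity check after step 4, you can also confirm the Run:ai components are healthy from the Kubernetes API. This sketch assumes the cluster components run in a namespace named runai; adjust to whatever namespace your installation uses:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()  # kubeconfig from `az aks get-credentials`

# Assumption: the Run:ai cluster components install into a "runai" namespace.
for pod in client.CoreV1Api().list_namespaced_pod(namespace="runai").items:
    print(f"{pod.metadata.name}: {pod.status.phase}")
```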

Getting started with NVIDIA Run:ai on Azure

Once deployed in your AKS cluster, NVIDIA Run:ai provides a clear and comprehensive overview of all your GPU resources. The dashboard offers real-time insights into cluster health, including GPU availability, active workloads, and pending tasks. For instance, in a cluster with four nodes, each hosting eight GPUs, you can see immediately which GPUs are idle and which are in use.

Screenshot of the NVIDIA Run:ai dashboard displaying real-time metrics for an AKS cluster
Figure 1. NVIDIA Run:ai overview dashboard
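The same idle-versus-allocated picture can be reproduced from the Kubernetes API itself. A rough sketch that tallies advertised GPUs against the GPU limits of running pods:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# GPUs advertised by every node in the cluster.
total = sum(
    int(node.status.allocatable.get("nvidia.com/gpu", "0"))
    for node in core.list_node().items
)

# GPUs claimed by currently running pods.
used = 0
for pod in core.list_pod_for_all_namespaces(field_selector="status.phase=Running").items:
    for container in pod.spec.containers:
        limits = (container.resources and container.resources.limits) or {}
        used += int(limits.get("nvidia.com/gpu", "0"))

print(f"{used}/{total} GPUs allocated, {total - used} idle")
```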

Once your AKS cluster is connected to the NVIDIA Run:ai control plane, you can access a unified view of all nodes, including CPU and GPU worker nodes. NVIDIA Run:ai supports heterogeneous GPU environments, enabling management of different GPU types such as A100 and H100 within the same cluster.

Screenshot of the NVIDIA Run:ai Control Plane displaying AKS cluster nodes equipped with both NVIDIA H100 and A100 GPUs
Figure 2. NVIDIA Run:ai Control Plane showing AKS nodes with NVIDIA H100s and A100s in the same cluster
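If you want to see that heterogeneity directly from the API, nodes labeled by NVIDIA GPU feature discovery expose their GPU model. A small sketch (the label below is standard GPU feature discovery output, but verify it is enabled in your cluster):

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()

for node in client.CoreV1Api().list_node().items:
    labels = node.metadata.labels or {}
    # "nvidia.com/gpu.product" is set by NVIDIA GPU feature discovery when enabled.
    product = labels.get("nvidia.com/gpu.product", "no GPU label")
    count = node.status.allocatable.get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: {count} x {product}")
```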

Optimizing GPU resources across clusters and teams

NVIDIA Run:ai lets you group similar nodes into node pools, enabling fine-grained, context-aware scheduling of workloads. This grouping ensures that tasks are matched with the most appropriate GPU or machine type. Node pools can also align with Azure scale sets, dynamically adjusting as you add or remove nodes and providing the flexibility your workloads demand.

Screenshot of the NVIDIA Run:ai Control Plane showing node pools aligned with Azure scale sets, illustrating how GPU resources are organized and managed across different node groups.
Figure 3. NVIDIA Run:ai node pools aligned with Azure scale sets
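Under the hood, pinning a workload to one of these pools maps to a simple node selector: AKS labels each node with its pool (scale set) name. A sketch that targets a hypothetical H100 pool:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="h100-job", namespace="team-a"),
    spec=client.V1PodSpec(
        # AKS sets this label to the node pool name; "h100pool" is hypothetical.
        node_selector={"kubernetes.azure.com/agentpool": "h100pool"},
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:24.05-py3",  # placeholder image
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "8"}),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="team-a", body=pod)
```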

Allocate GPU resources across teams using projects and quotas to optimize utilization. NVIDIA Run:ai guarantees baseline GPU quotas for every team, such as Teams A, B, and C (as shown in Figure 4 below), while allowing some workloads to burst beyond these limits when resources are available. The scheduler fairly preempts workloads when necessary to ensure guaranteed resource access.

Screenshot of the NVIDIA Run:ai dashboard showing GPU allocation across teams using projects and quotas
Figure 4. NVIDIA Run:ai allocating GPUs across teams using projects and quotas
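Run:ai’s quota-aware preemption is configured through its projects rather than raw Kubernetes objects, but the underlying idea resembles a Kubernetes PriorityClass. A plain-Kubernetes analogue, for illustration only:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()

client.SchedulingV1Api().create_priority_class(
    client.V1PriorityClass(
        api_version="scheduling.k8s.io/v1",
        kind="PriorityClass",
        metadata=client.V1ObjectMeta(name="team-a-guaranteed"),
        value=1000,  # higher-value pods win, and may preempt, when GPUs are contended
        preemption_policy="PreemptLowerPriority",
        description="Baseline-quota workloads that may preempt opportunistic ones",
    )
)
```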

Supporting the complete AI lifecycle

NVIDIA Run:ai orchestrates workloads across the entire AI lifecycle, from interactive Jupyter notebooks to single-node and multi-node training jobs, as well as inference workloads. You can run popular frameworks like PyTorch Elastic on dedicated GPU pools or deploy models from Hugging Face and NVIDIA NGC containers natively on the platform. NVIDIA Run:ai also supports NVIDIA Dynamo for dynamic, distributed inference, enabling efficient resource utilization and scalable deployment of AI models across multiple GPUs and nodes.

Screenshot of the NVIDIA Run:ai dashboard showing a list of workloads running on an AKS cluster, including details such as workload name, type (e.g., training or inference), status (e.g., running or pending), and GPU compute information like number of GPUs allocated and usage metrics
Figure 5. View of NVIDIA Run:ai workloads running on AKS
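For a flavor of what a training workload looks like at the Kubernetes level, here is a minimal single-node, multi-GPU Job that launches torchrun; the image, script, and namespace are placeholders:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="multi-gpu-train", namespace="team-a"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="trainer",
                        image="nvcr.io/nvidia/pytorch:24.05-py3",  # placeholder image
                        # torchrun spreads the script across the pod's four GPUs.
                        command=["torchrun", "--standalone",
                                 "--nproc_per_node=4", "train.py"],
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "4"}
                        ),
                    )
                ],
            ),
        ),
    ),
)
client.BatchV1Api().create_namespaced_job(namespace="team-a", body=job)
```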

NVIDIA Run:ai provides detailed usage analytics over various time frames, enabling chargeback or showback to different teams or business units. These insights empower IT and management teams to make informed decisions on scaling GPU infrastructure, ensuring optimal performance and cost-efficiency.

Screenshot of the NVIDIA Run:ai Dashboard displaying GPU usage analytics, including graphs and metrics showing GPU utilization over time.
Figure 6. NVIDIA Run:ai Dashboard showing GPU usage analytics
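If you wanted a back-of-the-envelope showback outside the dashboard, you could approximate GPU-hours per namespace from completed pods still visible to the API. A rough sketch (the platform’s own analytics remain the authoritative source):

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()
gpu_hours: dict[str, float] = {}

for pod in client.CoreV1Api().list_pod_for_all_namespaces().items:
    status = pod.status
    if status is None or status.start_time is None:
        continue
    if status.phase not in ("Succeeded", "Failed"):
        continue
    # Take the latest container finish time as the pod's end time.
    finished = max(
        (
            cs.state.terminated.finished_at
            for cs in (status.container_statuses or [])
            if cs.state and cs.state.terminated and cs.state.terminated.finished_at
        ),
        default=None,
    )
    if finished is None:
        continue
    gpus = sum(
        int(((c.resources and c.resources.limits) or {}).get("nvidia.com/gpu", "0"))
        for c in pod.spec.containers
    )
    hours = (finished - status.start_time).total_seconds() / 3600
    ns = pod.metadata.namespace
    gpu_hours[ns] = gpu_hours.get(ns, 0.0) + gpus * hours

for ns, total in sorted(gpu_hours.items()):
    print(f"{ns}: {total:.1f} GPU-hours")
```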

Conclusion

As AI adoption grows, efficient GPU management becomes critical. NVIDIA Run:ai on Azure offers a robust orchestration platform that simplifies GPU resource management and accelerates AI innovation. 

By combining NVIDIA Run:ai’s intelligent scheduling with Azure’s scalable GPU infrastructure and AI tools, organizations gain a unified, enterprise-ready solution that drives productivity and cost efficiency.

Explore NVIDIA Run:ai on Microsoft Marketplace to experience seamless AI infrastructure management and speed up your AI journey.


