NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to Kubernetes clusters. This capability, built on the open source KAI Scheduler that powers NVIDIA Run:ai, addresses a long-standing challenge in shared GPU infrastructure.
Consider two teams with equal priority sharing a cluster. Team A repeatedly submits smaller jobs, while Team B needs to run a larger job that requires more resources. Each time resources free up, the smaller jobs from Team A fit immediately and get scheduled. The larger job from Team B keeps waiting for enough resources to become available. Before that happens, the next small job from Team A claims the freed capacity. The result: although both teams have equal priority and entitlements, Team A runs job after job while the job from Team B sits in the queue indefinitely.
Time-based fairshare solves this problem by giving the scheduler memory. Instead of calculating fair share at a single point in time, the scheduler now tracks historical resource usage and adjusts each queue’s share based on past consumption. Teams that have used more resources recently receive lower scores for over-quota allocation, while teams that have been waiting receive a boost.
Time-based fairshare results in proportional compute time over days and weeks. This enables true time-sharing of GPU resources, burst access for infrequent large jobs, and resource planning that aligns with weekly or monthly GPU-hour budgets. Importantly, guaranteed quotas and queue priorities continue to work exactly as before.
This post explains the problem in more detail, walks through a real-world use case, and demonstrates how to enable time-based fairshare in NVIDIA Run:ai and KAI Scheduler.
Why is over-quota GPU resource fairness important?
Enterprise deployments have shown a consistent pattern: when organizations move from static GPU allocation to dynamic scheduling, cluster usage becomes far more dynamic. Over-quota resources (the shared pool beyond guaranteed quotas) become one of the most heavily utilized resource types. Teams regularly exceed their guaranteed allocations, leading to higher GPU utilization and more compute time for researchers.
This makes over-quota fairness critical. When a significant portion of cluster value comes from this shared pool, that pool must be divided fairly over time.
How does stateless fair-share scheduling work?
The classical stateless fair-share algorithm divides cluster resources in two phases. First, it allocates the Deserved Quota, the guaranteed resources that each queue is entitled to. This allocation always happens first and is unaffected by historical usage. Time-based fairshare doesn’t change this behavior.
After deserved quotas are satisfied, any remaining capacity becomes the Over-Quota Pool, a shared surplus that queues compete for based on their weights. This is where point-in-time fairness breaks down.
When dividing over-quota resources, the scheduler takes the following steps (a simplified code sketch follows the list):
- Groups queues by priority level and starts with the highest tier
- Calculates fair share based on weights within that tier:
  - Queues using less than their fair share get resources first
  - Breaks ties using workload submission time
- If resources remain, moves to the next priority tier and repeats
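For readers who prefer code, here is a minimal Python sketch of this point-in-time division. It is a simplified model for illustration only, not the KAI Scheduler implementation; the Queue fields and the divide_over_quota helper are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Queue:
    name: str
    priority: int             # higher value = higher priority tier
    weight: float             # over-quota weight within the tier
    allocated: float = 0.0    # over-quota GPUs currently held
    created_at: float = 0.0   # tie-breaker: queue creation timestamp
    pending: list = field(default_factory=list)  # pending job sizes, in GPUs

def divide_over_quota(queues, free_gpus):
    """Simplified point-in-time division of the over-quota pool."""
    # Group queues by priority level and start with the highest tier.
    for priority in sorted({q.priority for q in queues}, reverse=True):
        tier = [q for q in queues if q.priority == priority and q.pending]
        while free_gpus > 0 and any(q.pending for q in tier):
            # Fair share in this tier is proportional to configured weights.
            total_weight = sum(q.weight for q in tier)
            fair_share = {q.name: free_gpus * q.weight / total_weight for q in tier}
            # Queues furthest below their fair share get resources first;
            # ties fall back to creation time, then name.
            candidates = sorted((q for q in tier if q.pending),
                                key=lambda q: (q.allocated - fair_share[q.name],
                                               q.created_at, q.name))
            winner = candidates[0]
            job = winner.pending[0]
            if job > free_gpus:
                break  # the next job in line does not fit right now
            winner.pending.pop(0)
            winner.allocated += job
            free_gpus -= job
        # If resources remain, the loop moves to the next priority tier and repeats.
    return free_gpus
```

Note how, with equal weights and zero allocations, the sort key collapses to the tie-breakers, which is exactly why the same queue keeps winning in the equal-weights case described next.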
Here’s where the problem lies. Consider the following two cases of queues competing for over-quota resources.
When queues have equal weights: Each receives the same calculated fair share. When resources become available after a job completes, both queues are in exactly the same state: same allocation (zero), same fair share, both with pending jobs. The scheduler sees no difference between them, falls back to tie-breakers (queue creation timestamp, then alphabetical order), and the same queue wins every time.
When queues have different weights: The higher-weight queue receives a larger fair share, which is correct. But the point-in-time calculation doesn’t track whether queues actually receive their proportional share over time. For example, if Queue A has weight 3 and Queue B has weight 1, the scheduler correctly calculates that A is entitled to 75% of over-quota resources (3/4) and B to 25% (1/4). But if Queue A submits large workloads while Queue B submits many smaller ones, Queue B’s jobs can more easily fit within its fair share, while Queue A’s large jobs would push it above its fair share. The scheduler continues to prefer Queue B because it appears “underallocated” at each decision point. Over time, Queue B ends up running far more workloads than its 25% entitlement.
In both cases, the scheduler has no memory. It doesn’t know that one team just finished running a job while the other has been waiting for hours.
How does time-based fairshare work?
The core idea of time-based fairshare is simple: for each queue, compare the proportion of over-quota resources it actually consumed over the configured time window against the proportion it should have received based on its weight. Then adjust accordingly.
For example, if Queue A has weight 3 and Queue B has weight 1, Queue A should receive 75% of over-quota resources and Queue B should receive 25%. If the scheduler looks back over the past week and sees that Queue A actually consumed 90% while Queue B received only 10%, it boosts Queue B’s effective weight and reduces Queue A’s, steering future allocations back toward the 75/25 split.
Everything else stays the same. Deserved quotas are still guaranteed first. Priority ordering still applies. Queue hierarchies work as before. Time-based fairshare only changes how the over-quota pool gets divided.
How is time-based fairshare calculated?
The scheduler uses three inputs to adjust the effective weight of each queue:
- Weight: What the queue should get based on its configured weight relative to others
- Usage: What the queue actually consumed over a configurable time window (default: one week)
- K-value: How aggressively the scheduler corrects imbalances. Higher values mean faster correction
When a queue has consumed more than its fair share, its effective weight is reduced. When it has been starved, its effective weight is boosted. This way, allocations naturally drift back toward the intended proportions over time.
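The exact formula lives in the KAI Scheduler and its documentation; the sketch below is only a hypothetical illustration of how these three inputs could combine, reusing the weight-3/weight-1 example from above. The function name and the linear adjustment rule are assumptions, not the production implementation.

```python
def effective_weight(weight, total_weight, usage_share, k=1.0):
    """Hypothetical effective-weight adjustment (not the exact KAI Scheduler formula).

    weight / total_weight -- the share of over-quota resources the queue is entitled to
    usage_share           -- the share it actually consumed over the time window
    k                     -- how aggressively imbalances are corrected
    """
    entitled_share = weight / total_weight
    # Positive deviation means the queue consumed more than its entitlement;
    # negative means it was starved.
    deviation = usage_share - entitled_share
    # Scale the weight down when over-consuming, up when starved.
    return max(weight * (1 - k * deviation), 0.0)

# Example from the text: Queue A (weight 3) consumed 90% of over-quota usage
# in the window, Queue B (weight 1) consumed 10%.
total = 3 + 1
adj_a = effective_weight(3, total, usage_share=0.90)  # entitled to 75%, used 90%
adj_b = effective_weight(1, total, usage_share=0.10)  # entitled to 25%, used 10%
print(adj_a, adj_b)             # roughly 2.55 and 1.15: A shrinks, B grows
print(adj_a / (adj_a + adj_b))  # A's new effective share, roughly 0.69
```

With these numbers, Queue A’s effective weight drops from 3 to about 2.55 and Queue B’s rises from 1 to about 1.15, shifting Queue A’s effective over-quota share from 75% to roughly 69% until the historical imbalance works itself off.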
Time-based fairshare can be enabled or disabled directly from the UI (see the Node Pools section of the NVIDIA Run:ai documentation), while parameters like window size, window type, and decay rates can be tuned via API to balance responsiveness against stability. Because these settings are configured per node pool, administrators can experiment on a dedicated node pool without affecting the rest of the cluster. For full details, see the time-based fairshare documentation.
A few details worth noting:
- Usage is measured against cluster capacity, not against what other queues consumed. This prevents teams from being penalized for using GPUs that would otherwise have sat idle. For example, a queue that consumed 40 GPUs in a 100-GPU cluster is charged 40% usage, even if it accounted for 80% of everything actually running at the time.
- Priority still comes first. Time-based fairshare operates within each priority tier. A high-priority queue still gets resources before lower-priority queues, regardless of historical usage.
Example scenario: One cluster, multiple workload types
This section walks through a practical scenario that shows how time-based fairshare solves resource contention in a heterogeneous cluster.
A 100-GPU cluster is shared by two ML teams with very different workload patterns. The LLM team focuses on post-training and inference, with 30 GPUs guaranteed. The Vision team focuses on computer vision R&D, with 20 GPUs guaranteed. Both teams have equal over-quota weight. The remaining 50 GPUs form the over-quota pool, available for burst workloads.
The LLM team runs customer-facing inference endpoints that serve production traffic. These inference workloads use 10 GPUs continuously. They’re critical and must never be interrupted. The remaining 20 GPUs of their quota, plus access to the over-quota pool, are available for post-training jobs when the team occasionally needs to improve its models based on customer feedback.
The Vision team focuses on computer vision research: running VSCode, testing architectures, hyperparameter sweeps, and training object detection models. They have a steady stream of training jobs that regularly tap into the over-quota pool.
The problem: Burst access becomes blocked
One day, the LLM team finishes analyzing a batch of customer feedback and is ready to launch a post-training run. The job needs 60 GPUs: the 20 GPUs remaining in their quota plus 40 from the over-quota pool. What happens with and without time-based fairshare is outlined below.
To illustrate this scenario, we used the open source time-based fairshare simulator from the KAI Scheduler. This tool lets you model different cluster configurations and visualize how resources are allocated over time. The simulations below show exactly what happens in our example scenario.
Without time-based fairshare
- LLM team’s inference endpoints continue running on their 10 guaranteed GPUs (deserved quota is protected).
- Vision team has been continuously running CV training jobs, consuming over-quota resources.
- LLM team’s 60-GPU post-training job enters the queue.
- Every time over-quota resources free up, the Vision team has more pending jobs ready.
- Vision team’s jobs continue to be scheduled first. This happens because the LLM team’s 40-GPU over-quota request exceeds their fair share. The scheduler won’t allocate beyond fair share while the Vision team still has pending jobs claiming their portion. The LLM team must wait until the Vision team’s over-quota usage drops.
- LLM team’s post-training job waits…and waits…and waits.
The LLM team’s inference services are fine, and the guaranteed quota works perfectly. But their post-training job is effectively starved because a team with continuous workloads monopolizes the over-quota pool. The occasional user never gets their turn.


With time-based fairshare
For detailed instructions on configuring time-based fairshare in the NVIDIA Run:ai UI under node pools, see the NVIDIA Run:ai documentation or the KAI Scheduler documentation.
With time-based fairshare, the scheduler tracks historical usage. When the LLM team submits their post-training job:
- Vision team has accumulated high historical over-quota usage from continuous CV training
- LLM team has minimal historical over-quota usage (they’ve been running jobs within quota)
- LLM team’s effective fair share is boosted because they’ve been “starved” of over-quota resources
- LLM team’s 60-GPU job is scheduled
If the post-training job runs long enough, both teams end up time-sharing over-quota resources. The LLM team runs for a while, accumulating usage. As its historical usage grows, the Vision team becomes relatively more starved and starts getting prioritized. Resources oscillate back and forth (sometimes the LLM job runs, sometimes Vision jobs run), resulting in fair sharing over time rather than one team monopolizing the pool.
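To make the oscillation concrete, here is a deliberately tiny toy model in Python. It assumes a 50-GPU over-quota pool, equal weights, an exponentially decayed usage counter, and a rule that serves the most starved queue first; it is neither the KAI Scheduler algorithm nor the simulator, just a sketch of why allocations swing back and forth once history is tracked.

```python
# Toy model of over-quota time-sharing (not the simulator or the real scheduler).
OVER_QUOTA = 50                   # shared pool from the example scenario
LLM_NEED, VISION_NEED = 40, 50    # over-quota GPUs each team can absorb per step
DECAY = 0.9                       # how quickly old usage ages out of the window

# Starting state: Vision has been bursting for a while, LLM has not.
usage = {"llm": 0.0, "vision": 200.0}   # decayed historical usage per queue

for step in range(20):
    # With equal weights, each queue is entitled to half of over-quota usage;
    # the queue furthest below that entitlement counts as starved.
    total = usage["llm"] + usage["vision"] or 1.0
    starved = min(usage, key=lambda q: usage[q] / total)
    other = "vision" if starved == "llm" else "llm"
    # Serve the starved queue's demand first, then give the rest to the other queue.
    grant = {starved: min(OVER_QUOTA, LLM_NEED if starved == "llm" else VISION_NEED)}
    grant[other] = OVER_QUOTA - grant[starved]
    # Age historical usage and add this step's consumption.
    for q in usage:
        usage[q] = usage[q] * DECAY + grant[q]
    print(f"step {step:2d}: LLM={grant['llm']:2d} GPUs, Vision={grant['vision']:2d} GPUs")
```

Running it shows the LLM queue claiming the pool first, then the two queues trading it back and forth, mirroring the back-and-forth described above.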


Time-based fairshare enables several important patterns, including:
- Protected critical workloads: Inference endpoints and other production services run on guaranteed quota, completely untouched by fairness adjustments.
- Burst access when needed: Teams that don’t regularly consume over-quota resources can still get burst capacity when they need it, without being blocked for long periods or even permanently.
- Fair sharing over time: No team monopolizes the over-quota pool indefinitely. Everyone gets their proportional share across the configured time window.
- Fairer treatment of large workloads: In point-in-time fair share, queues with large jobs often get deprioritized because smaller jobs from other queues fit more easily. Time-based fairshare improves this: as the queue with large jobs accumulates less usage, it becomes increasingly prioritized until it gets a chance to run.
Get started with NVIDIA Run:ai time-based fairshare
Time-based fairshare addresses a fundamental limitation of point-in-time fair-share scheduling: the lack of memory. By tracking historical usage, the scheduler distributes over-quota resources fairly across time windows rather than only at each scheduling decision. Guaranteed quotas remain untouched, so critical workloads like inference endpoints stay protected.
Ready to get started? NVIDIA Run:ai v2.24 includes time-based fairshare with straightforward configuration through the platform UI. Settings are configured per node pool, so it’s easy to experiment on a dedicated pool without imposing the new mode across your entire cluster. For setup details, see the time-based fairshare documentation.
Time-based fairshare is also available in the open source KAI Scheduler. Complete the configuration steps, enable Prometheus, set your parameters, and start scheduling.
Want to try time-based fairshare before deploying it? Try the time-based fairshare simulator, where you can model queue allocations over time. Define your queues, weights, and workloads in a simple YAML file, run the simulation, and visualize how resources oscillate between competing teams.
To learn more about time-based fairshare and other features in the NVIDIA Run:ai v2.24 release, join the upcoming webinar Elevate Your AI Operations With Simplified Workload Management.
