
The Complete Guide to Kubernetes GPU Scheduling

How to configure Kubernetes for GPU workloads — from device plugins to topology-aware scheduling. Plus, where default K8s scheduling falls short and how to fix it.

DeepLM Team
Engineering

GPU Scheduling in Kubernetes: The Basics

Kubernetes supports GPU scheduling through device plugins. NVIDIA's k8s-device-plugin is the most common, but AMD and Intel have their own implementations.

The basic flow:

resources:
  limits:
    nvidia.com/gpu: 2

This requests 2 GPUs for a pod. Kubernetes finds a node with 2 available GPUs and schedules the pod there. Simple — and insufficient.
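In context, the fragment above sits under a container's `resources` field. A minimal complete Pod spec might look like this (the pod name and image are illustrative; any CUDA-enabled image works):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-train            # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3   # example CUDA image
      command: ["python", "train.py"]
      resources:
        limits:
          nvidia.com/gpu: 2   # GPUs go in limits; requests default to match
```

Note that extended resources like `nvidia.com/gpu` must be set in `limits` and cannot be overcommitted, unlike CPU and memory.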

Where Default Scheduling Fails

No Topology Awareness

Default K8s scheduling doesn't consider GPU topology. Two GPUs connected by NVLink deliver dramatically higher inter-GPU bandwidth than two GPUs communicating over PCIe, yet the scheduler treats any pair on a node as interchangeable. For distributed training, this matters enormously.
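You can see the interconnect matrix on any NVIDIA node with `nvidia-smi` (the output depends on the machine's hardware):

```shell
# Print the GPU-to-GPU connectivity matrix (NV#, PIX, PHB, SYS, etc.)
nvidia-smi topo -m
```

The default scheduler never consults this information when placing multi-GPU pods.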

No Fractional GPU Support

A pod requesting 1 GPU gets an entire GPU, even if it only needs 4GB of an 80GB A100. Multi-Instance GPU (MIG) helps, but configuration is manual and static.
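With MIG enabled and the NVIDIA device plugin running in its `mixed` strategy, slices appear as their own resource names. A sketch of a request for one slice (the exact resource name depends on which MIG profiles the operator configured on the node):

```yaml
# Assumes MIG is enabled on the node and the device plugin exposes
# per-profile resources; 1g.10gb is one of seven slices of an 80GB A100.
resources:
  limits:
    nvidia.com/mig-1g.10gb: 1
```

The partitioning itself is fixed until an admin repartitions the GPU, which is why this only partially solves the fractional-GPU problem.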

No Workload-Aware Placement

K8s doesn't know that your inference workload would perform identically on a cheaper GPU, or that your training job needs high-bandwidth interconnect. All GPUs are treated as equivalent.
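The stock workaround is manual: label-based pinning. If NVIDIA's GPU Feature Discovery is labeling your nodes, you can steer an inference pod to a specific GPU model by hand (the label value below is an assumption; actual values vary by GPU model and GFD version):

```yaml
# Manual workload-to-GPU matching via nodeSelector.
# Requires GPU Feature Discovery to be labeling nodes in the cluster.
nodeSelector:
  nvidia.com/gpu.product: NVIDIA-A10   # pin to a cheaper inference GPU
```

This works, but it hard-codes placement decisions that have to be revisited every time the fleet or the workload changes.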

No Cross-Node Optimization

Scheduling decisions are per-pod. There's no global view of fleet utilization or ability to rebalance running workloads.

Better GPU Scheduling

DeepLM's scheduler integrates with K8s as a secondary scheduler:

schedulerName: deeplm-scheduler
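The field above goes in the pod spec. Opting in is per-pod: pods that don't set `schedulerName` continue to use the default scheduler. A sketch with an illustrative name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resnet-training       # illustrative name
spec:
  schedulerName: deeplm-scheduler   # opt this pod in; others are unaffected
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3
      resources:
        limits:
          nvidia.com/gpu: 4
```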

It adds:

  • Topology-aware placement — respects NVLink, PCIe, and cross-node fabric
  • Workload profiling — learns GPU utilization patterns per job type
  • Dynamic rebalancing — suggests or executes workload migration
  • Multi-vendor support — schedules across NVIDIA, AMD, Intel from one API

Getting Started

Install the DeepLM scheduler on any K8s cluster with GPU nodes:

helm install deeplm-scheduler deeplm/scheduler

Or start with DeepLM Insights for observability before switching schedulers.


Optimize your GPU fleet

Try DeepLM Insights — free, open source GPU observability.
