
Kubernetes is blind to NVLink, quietly slowing your multi-GPU AI jobs by up to 40% and wasting the interconnect bandwidth you paid for
Kubernetes users running GPU-accelerated workloads may be paying a hidden performance tax from suboptimal GPU placement. NVLink, NVIDIA's high-bandwidth GPU interconnect, provides up to 900 GB/s of GPU-to-GPU bandwidth, but the default Kubernetes scheduler is not NVLink-aware: it can spread a multi-GPU job across GPUs that must communicate over slower links, degrading multi-GPU communication performance by up to 40%.

To close this gap, a Kubernetes scheduler plugin has been developed that scores nodes based on NVLink topology, pulling real-time GPU allocation data from NVIDIA's Data Center GPU Manager (DCGM) via Prometheus. Scoring weighs four factors: domain integrity (the workload fits within a single NVLink domain), node-workload matching, domain completion (filling partially used domains), and resource efficiency. By keeping workloads within NVLink domain boundaries, the plugin improves cluster-wide efficiency and reduces fragmentation.

This matters most for distributed training workloads in AI and deep learning, where fast GPU-to-GPU communication is critical. Customizable scoring weights and real-time allocation data let operators tune placement for their specific workload mix.
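To make the scoring idea concrete, here is a minimal sketch in Go (the language Kubernetes scheduler plugins are written in) of a weighted, topology-aware node-scoring function. All type and function names (`Weights`, `NodeGPUState`, `scoreNode`) and the specific weight values are hypothetical illustrations, not the plugin's actual API; a real plugin would implement the scheduler framework's Score extension point and read allocation state from DCGM metrics.

```go
package main

import "fmt"

// Weights holds hypothetical, operator-tunable weights for the four
// scoring factors described above.
type Weights struct {
	DomainIntegrity    float64 // request fits inside one NVLink domain
	NodeWorkloadMatch  float64 // request consumes the node's free GPUs exactly
	DomainCompletion   float64 // request fills out a partially used domain
	ResourceEfficiency float64 // fraction of free GPUs the request uses
}

// NodeGPUState is a simplified view of per-node allocation, the kind of
// real-time data the plugin would derive from DCGM via Prometheus.
type NodeGPUState struct {
	FreeGPUs     int // unallocated GPUs on the node
	DomainSize   int // GPUs per NVLink domain (e.g. 8 on an HGX board)
	FreeInDomain int // free GPUs in the best candidate NVLink domain
}

// scoreNode rewards placements that keep a multi-GPU request within
// NVLink domain boundaries and avoid stranding GPUs.
func scoreNode(requested int, n NodeGPUState, w Weights) float64 {
	if requested > n.FreeGPUs {
		return 0 // node cannot host this request at all
	}
	score := 0.0
	if requested <= n.FreeInDomain {
		score += w.DomainIntegrity // stays inside one NVLink domain
	}
	if requested == n.FreeGPUs {
		score += w.NodeWorkloadMatch // exact fit, no leftover GPUs
	}
	if requested == n.FreeInDomain {
		score += w.DomainCompletion // completes a partial domain
	}
	score += w.ResourceEfficiency * float64(requested) / float64(n.FreeGPUs)
	return score
}

func main() {
	w := Weights{DomainIntegrity: 40, NodeWorkloadMatch: 20,
		DomainCompletion: 20, ResourceEfficiency: 20}
	// Node A: 4 free GPUs, all in one NVLink domain.
	// Node B: 4 free GPUs, but only 2 share a domain.
	a := NodeGPUState{FreeGPUs: 4, DomainSize: 8, FreeInDomain: 4}
	b := NodeGPUState{FreeGPUs: 4, DomainSize: 8, FreeInDomain: 2}
	fmt.Printf("node A: %.0f, node B: %.0f\n",
		scoreNode(4, a, w), scoreNode(4, b, w)) // node A: 100, node B: 40
}
```

With these weights, a 4-GPU job scores node A (NVLink-intact) at 100 and node B (fragmented across domains) at 40, so the scheduler would prefer the placement that preserves full NVLink bandwidth.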