GPU sharing

Posted on Fri 05 April 2024 in Nvidia, GPU, Kubernetes • Tagged with Nvidia, GPU, Kubernetes

I came across an interesting 2022 post by Nvidia about GPU sharing in Kubernetes.

The main GPU sharing technologies can be summarized in this table:

| Technology | Description | Microarchitecture | CUDA Version |
|------------|-------------|-------------------|--------------|
| CUDA Streams | Allows concurrent operations within a single CUDA context using software abstraction. | Pascal and later | Not specified |
| Time-Slicing | Oversubscription strategy using the GPU's time-slicing scheduler. | Pascal and later | 11.1 (R455+ drivers) |
| CUDA MPS | MPS (Multi-Process Service) enables concurrent processing of CUDA kernels from different processes, typically MPI jobs. | Not specified | 11.4+ |
| MIG | MIG (Multi-Instance GPU) is a secure partitioning of GPUs into separate instances with dedicated resources. | Ampere architecture | Not specified |
| NVIDIA vGPU | Provides VMs with simultaneous, direct access to a single physical GPU. | Compatible with MIG-supported GPUs | Not specified |
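For context on how GPUs look to a workload before any sharing is applied: a pod requests them through the `nvidia.com/gpu` extended resource, which only accepts whole numbers. A minimal sketch (the pod name and image are illustrative, not from the post):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod            # illustrative name
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.0.0-base-ubuntu22.04  # illustrative image tag
      resources:
        limits:
          nvidia.com/gpu: 1  # integer only; fractional values like 0.5 are rejected
```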

The post also explains how GPUs are advertised as schedulable resources in Kubernetes through the device plugin framework. Since a GPU is exposed as an integer-based resource, it cannot be oversubscribed out of the box; the post describes a way to achieve oversubscription using the time-slicing APIs.
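As a rough sketch of what that looks like in practice: the NVIDIA device plugin can be given a time-slicing configuration that makes one physical GPU advertise multiple replicas to the scheduler. The `replicas` count below is illustrative, not a value from the post:

```yaml
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4   # one physical GPU is advertised as 4 schedulable nvidia.com/gpu units
```

With this config in place, four pods can each request `nvidia.com/gpu: 1` and land on the same physical GPU, sharing it via the GPU's time-slicing scheduler; note that, unlike MIG, this provides no memory or fault isolation between them.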