GPU sharing

Posted on Fri 05 April 2024 in Nvidia, GPU, Kubernetes

I've found this interesting post from 2022 by Nvidia about GPU sharing in Kubernetes.

The main GPU sharing technologies can be summarized in this table:

| Technology | Description | Microarchitecture | CUDA Version |
| --- | --- | --- | --- |
| CUDA Streams | Allows concurrent operations within a single CUDA context using software abstraction. | Pascal and later | Not specified |
| Time-Slicing | Oversubscription strategy using the GPU's time-slicing scheduler. | Pascal and later | 11.1 (R455+ drivers) |
| CUDA MPS | MPS (Multi-Process Service) enables concurrent processing of CUDA kernels from different processes, typically MPI jobs. | Not specified | 11.4+ |
| MIG | MIG (Multi-Instance GPU) is a secure partitioning of GPUs into separate instances for dedicated resources. | Ampere Architecture | Not specified |
| NVIDIA vGPU | Provides VMs with simultaneous, direct access to a single physical GPU. | Compatible with MIG-supported GPUs | Not specified |

The post also explains how GPUs are advertised as schedulable resources in Kubernetes through the device plugin framework. Because a GPU is an integer-based resource, a pod can only request whole GPUs, so oversubscription is not possible by default. The post describes a way of achieving it with the time-slicing APIs.
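
To make the integer accounting concrete, here is a minimal Python sketch of how time-slicing replicas change what a node can accept. It is a toy model, not the device plugin's code: the helpers `advertised_capacity` and `fits` are hypothetical names made up for this example, and the config dict only mirrors the time-slicing configuration of the NVIDIA device plugin as I understand it (a `replicas` count per `nvidia.com/gpu` resource), so treat the field names as an assumption.

```python
# Toy model of integer GPU accounting in Kubernetes.
# Hypothetical helpers for illustration only; this is not the device plugin's API.


def advertised_capacity(physical_gpus: int, replicas: int = 1) -> int:
    """Return how many nvidia.com/gpu units a node would advertise.

    Without time-slicing each physical GPU counts once; with time-slicing
    each physical GPU is advertised `replicas` times.
    """
    if physical_gpus < 0 or replicas < 1:
        raise ValueError("physical_gpus must be >= 0 and replicas must be >= 1")
    return physical_gpus * replicas


def fits(requests: list[int], capacity: int) -> bool:
    """Return True if the integer GPU requests fit within the advertised capacity."""
    return sum(requests) <= capacity


# Assumed shape of a time-slicing config, written as a Python dict.
time_slicing_config = {
    "version": "v1",
    "sharing": {
        "timeSlicing": {
            "resources": [{"name": "nvidia.com/gpu", "replicas": 4}],
        }
    },
}

replicas = time_slicing_config["sharing"]["timeSlicing"]["resources"][0]["replicas"]
pods = [1, 1, 1]  # three pods, each requesting one nvidia.com/gpu

print(fits(pods, advertised_capacity(1)))            # False: one whole GPU, three requests
print(fits(pods, advertised_capacity(1, replicas)))  # True: the same GPU advertised four times
```

The replicas share the same physical GPU by taking turns on it, so this raises the number of pods that can be scheduled but does not give each pod a dedicated slice of memory or compute.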