GPU sharing
Posted on Fri 05 April 2024 in Nvidia, GPU, Kubernetes
I found this interesting post from 2022 by NVIDIA about GPU sharing in Kubernetes.
The main GPU sharing technologies can be summarized in this table:
| Technology | Description | Microarchitecture | CUDA Version |
|---|---|---|---|
| CUDA Streams | Allows concurrent operations within a single CUDA context using software abstraction. | Pascal and later | Not specified |
| Time-Slicing | Oversubscription strategy using the GPU's time-slicing scheduler. | Pascal and later | 11.1 (R455+ drivers) |
| CUDA MPS | MPS (Multi-Process Service) enables concurrent processing of CUDA kernels from different processes, typically MPI jobs. | Not specified | 11.4+ |
| MIG | MIG (Multi-Instance GPU) is a secure partitioning of GPUs into separate instances for dedicated resources. | Ampere architecture | Not specified |
| NVIDIA vGPU | Provides VMs with simultaneous, direct access to a single physical GPU. | Compatible with MIG-supported GPUs | Not specified |
The post also explains how GPUs are advertised as schedulable resources in Kubernetes through the device plugin framework. Because `nvidia.com/gpu` is an integer-based resource, it does not allow oversubscription out of the box; the post describes a way of achieving this with the device plugin's time-slicing APIs.
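As a rough sketch of what this looks like in practice: the NVIDIA device plugin accepts a sharing configuration (typically delivered via a ConfigMap) that tells it to advertise each physical GPU as multiple time-sliced replicas. The exact ConfigMap name and the replica count below are illustrative assumptions, not taken from the post:

```yaml
# Illustrative config for the NVIDIA Kubernetes device plugin.
# With this applied, a node with 1 physical GPU advertises
# nvidia.com/gpu: 4, so up to 4 pods can share it via time-slicing.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # hypothetical name
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4   # assumed replica count for the example
```

Note that time-slicing provides no memory or fault isolation between the sharing pods; each pod still requests `nvidia.com/gpu: 1` and the scheduler simply counts against the inflated replica total.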