GPU sharing

Posted on Fri 05 April 2024 in Nvidia, GPU, Kubernetes • Tagged with Nvidia, GPU, Kubernetes

I've found this interesting post from 2022 by Nvidia about GPU sharing in Kubernetes.

The main GPU sharing technologies can be summarized in this table:

| Technology | Description | Microarchitecture | CUDA Version |
|---|---|---|---|
| CUDA Streams | Allows concurrent operations within a single CUDA context using software abstraction. | Pascal and later | Not specified |
| Time-Slicing | Oversubscription strategy using the GPU's time-slicing scheduler. | Pascal and later | 11.1 (R455+ drivers) |
| CUDA MPS | MPS (Multi-Process Service) enables concurrent processing of CUDA kernels from different processes, typically MPI jobs. | Not specified | 11.4+ |
| MIG | MIG (Multi-Instance GPU) is a secure partitioning of GPUs into separate instances for dedicated resources. | Ampere Architecture | Not specified |
| NVIDIA vGPU | Provides VMs with simultaneous, direct access to a single physical GPU. | Compatible with MIG-supported GPUs | Not specified |

The post also explains how GPUs are advertised as schedulable resources in Kubernetes through the device plugin framework. Because a GPU is an integer-based resource, it cannot be oversubscribed by default. The post describes a way of achieving oversubscription with the time-slicing APIs.
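To see why integer accounting blocks sharing, here is a minimal sketch in plain JavaScript. It is a toy model, not the device plugin's code: advertisedGpus and admitPods are hypothetical helpers, and I am assuming that time-slicing simply advertises each physical GPU as a fixed number of replicas.

```javascript
// Toy model of integer-based GPU accounting (hypothetical helpers).
// Each physical GPU is advertised as `replicas` schedulable units.
function advertisedGpus(physicalGpus, replicas = 1) {
  return physicalGpus * replicas;
}

// Admit pods in order while whole units remain. Fractional requests are
// rejected, mirroring the integer-only nature of the resource.
function admitPods(requests, capacity) {
  const admitted = [];
  let free = capacity;
  for (const [name, units] of Object.entries(requests)) {
    if (!Number.isInteger(units) || units < 1) {
      throw new Error(`${name}: GPU requests must be positive integers`);
    }
    if (units <= free) {
      admitted.push(name);
      free -= units;
    }
  }
  return admitted;
}

const pods = { 'pod-a': 1, 'pod-b': 1, 'pod-c': 1 };
console.log(admitPods(pods, advertisedGpus(1)));    // one GPU, no sharing
console.log(admitPods(pods, advertisedGpus(1, 4))); // same GPU, 4 replicas
```

With a single unit only pod-a is admitted. With four replicas all three pods fit, although they still share the same physical GPU over time.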

Using IPVS in kube-proxy with eksctl

Posted on Mon 20 June 2022 in kubernetes, eksctl, kube-proxy • Tagged with kubernetes, eksctl, kube-proxy

I have a Kubernetes cluster launched with eksctl. I can view and edit the configuration of kube-proxy with:

kubectl edit configmap kube-proxy-config -n kube-system

I see that the default configuration uses the iptables mode. To change it, set the mode parameter to ipvs and assign one of these policies to the scheduler parameter in the ipvs section, which is initially empty:

  • rr: round-robin
  • lc: least connection
  • dh: destination hashing
  • sh: source hashing
  • sed: shortest expected delay
  • nq: never queue
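As a rough illustration of how the first two policies differ, here is a small sketch in plain JavaScript. It is only a model: the real scheduling is done by IPVS in the kernel and also takes weights into account, and makeRoundRobin, leastConnection and the backend addresses are hypothetical.

```javascript
// Illustrative model only; real IPVS scheduling happens in the kernel.

// rr: hand out backends in a fixed rotation.
function makeRoundRobin(backends) {
  let next = 0;
  return () => {
    const chosen = backends[next];
    next = (next + 1) % backends.length;
    return chosen;
  };
}

// lc: pick the backend with the fewest active connections
// (ties go to the first one listed).
function leastConnection(activeConnections) {
  let best = null;
  for (const [backend, count] of Object.entries(activeConnections)) {
    if (best === null || count < activeConnections[best]) {
      best = backend;
    }
  }
  return best;
}

const pick = makeRoundRobin(['10.0.0.1', '10.0.0.2', '10.0.0.3']);
console.log([pick(), pick(), pick(), pick()]); // wraps around to 10.0.0.1
console.log(leastConnection({ '10.0.0.1': 5, '10.0.0.2': 2, '10.0.0.3': 7 }));
```

Round-robin ignores the load and just rotates, while least connection sends the new connection to 10.0.0.2, the backend with the fewest active connections in this example.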

Notice that the corresponding kernel modules must be present on the worker node. You can connect to the node with ssh and check which modules are loaded with:

lsmod | grep ip_vs

In order to apply the configuration, kube-proxy has to be restarted with this command:

kubectl rollout restart ds kube-proxy -n kube-system

On my node, the lsmod command above prints:

ip_vs_sh               16384  0
ip_vs_wrr              16384  0
ip_vs_rr               16384  0
ip_vs                 176128  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          163840  8 xt_conntrack,nf_nat,xt_state,xt_nat,nf_conntrack_netlink,xt_connmark,xt_MASQUERADE,ip_vs
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs

This means that the modules …

Continue reading

Pinning CPUs in Kubernetes using full-pcpus-only with eksctl

Posted on Mon 16 May 2022 in kubernetes, eksctl • Tagged with kubernetes, eksctl

I was trying to use the full-pcpus-only option with eksctl without any luck. In the end, I got it working with this cluster.yaml configuration file:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: k8s-Stokholm-Cluster
  region: eu-north-1

nodeGroups:
  - name: ng-1
    instanceType: c5.4xlarge
    desiredCapacity: 1
    ssh:
      publicKeyPath: /home/joaquin/k8s/
    kubeletExtraConfig:
      cpuManagerPolicy: static
      cpuManagerPolicyOptions:
        full-pcpus-only: "true"
      kubeReserved:
        cpu: "300m"
        memory: "300Mi"
        ephemeral-storage: "1Gi"
      kubeReservedCgroup: "/kube-reserved"
      systemReserved:
        cpu: "300m"
        memory: "300Mi"
        ephemeral-storage: "1Gi"
      featureGates:
        CPUManager: true
        CPUManagerPolicyOptions: true

When my file did not have the correct options, eksctl got stuck with the message:

waiting for at least 1 node(s) to become ready in "ng-1"

To debug the errors, I connected by ssh to the EC2 instance that had been created and checked the logs of the kubelet service with this command:

journalctl -u kubelet.service

In order to have the CPUs pinned to a physical CPU, I had to make the requests and the limits equal (both for CPU and memory …

Continue reading

Reading traces from a file in k6

Posted on Wed 24 November 2021 in k6 • Tagged with k6, kubernetes

I wanted to read a trace of requests per second that I have in a file and use it as the injection pattern in k6. I could do it by reading the values into an array, which is then used as the stages of a ramping arrival rate executor, like this:

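import http from 'k6/http';
// Fill in the papaparse module URL (k6 scripts usually load it from jslib.k6.io).
import papaparse from '';
import { SharedArray } from 'k6/data';

const trace_file = 'PATH_TO_THE_TRACE_FILE';
// Parse the trace once and share it, read-only, among all VUs.
const trace = new SharedArray('another data name', function () {
  return papaparse.parse(open(trace_file)).data;
});

// One stage per trace row: the first column is the target rate for that second.
const stages = [];
for (const row of trace) {
  stages.push({ target: Number(row[0]), duration: '1s' });
}

export const options = {
  discardResponseBodies: true,
  scenarios: {
    contacts: {
      executor: 'ramping-arrival-rate',
      startRate: 1,
      timeUnit: '1s',
      preAllocatedVUs: 50,
      maxVUs: 1000,
      stages: stages,
    },
  },
};

export default function () {
  const url = "http://localhost:8555";
  // Assumption: a plain GET; replace it with the request your test needs.
  http.get(url);
}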
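The conversion from trace rows to stages can also be tried outside k6. This is a minimal sketch in plain JavaScript, assuming that each parsed row is an array of strings whose first column holds the requests per second. traceToStages is a hypothetical helper, not part of k6 or papaparse.

```javascript
// Hypothetical helper: turn parsed CSV rows (arrays of strings) into k6 stages.
function traceToStages(rows, duration = '1s') {
  const stages = [];
  for (const row of rows) {
    const target = Number.parseInt(row[0], 10);
    if (Number.isNaN(target)) continue; // skip blank or non-numeric rows
    stages.push({ target, duration });
  }
  return stages;
}

// Sample rows, including the kind of empty row a trailing newline can produce.
const rows = [['10'], ['25'], [''], ['5']];
console.log(traceToStages(rows));
```

Skipping rows that do not parse as a number keeps a trailing empty line in the trace file from turning into a stage with a meaningless target.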

Integrating Prometheus, InfluxDB and Grafana

Posted on Wed 29 September 2021 in Kubernetes • Tagged with Kubernetes, Prometheus, Telegraf, InfluxDB, Grafana

I've got a Kubernetes cluster prepared to be integrated with Prometheus, i.e., all relevant information is exposed at /metrics and scraped by a Prometheus instance. I want to keep the information long term and I've decided that InfluxDB is the best option for that. In addition, I want to create dashboards using Grafana.

This requires moving the information from Prometheus to InfluxDB. According to this, with InfluxDB v1 it can be done directly with remote writes in Prometheus. However, InfluxDB v2 doesn't allow this way of working. Instead, Telegraf has to be inserted in the middle. Therefore, the whole pipeline is Prometheus 🠮 Telegraf 🠮 InfluxDB 🠮 Grafana.

I'm using EKS to deploy the cluster, but these instructions should work with any other Kubernetes cluster. I'll only assume that kubectl is already configured to work with your cluster. We are going to deploy many services, so you may require new nodes in your cluster.


We are going to use Helm for installing all the components, so first we …

Continue reading