Profiling with nvprof

Posted on Fri 19 April 2024 in Nvidia, GPU, Profiling, Jetson • Tagged with Nvidia, GPU, Profiling, Jetson

I'm working on a Jetson Nano and I want to profile my code. Nvidia provides several tools, of which Nsight Compute is the most powerful. However, it cannot be run on the Jetson Nano, so I have resorted to using nvprof.

nvprof is a command-line profiler for CUDA applications. It is included in the CUDA Toolkit, so you should already have it if CUDA is installed. However, you need to be root to run it. As root, I don't have the CUDA environment set up, so I need to use the full path. These are two commands that I've found useful:

/usr/local/cuda/bin/nvprof --print-gpu-trace ./my_program

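This prints the GPU trace of the program: each kernel launch and memory copy in chronological order, with its start time and duration. It is useful for seeing what the GPU is doing over time.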

/usr/local/cuda/bin/nvprof --metrics all ./my_program

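This collects every metric that nvprof supports on the device and reports the values per kernel. It is useful for seeing how efficiently each kernel uses the GPU. Be aware that nvprof replays kernels to gather metrics, so the run is considerably slower than normal.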
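
Typing the full path for every run gets tedious. As an alternative, here is a minimal sketch that prepends the CUDA binary directory to PATH for the current root shell session. It assumes the default install prefix /usr/local/cuda, the same one used in the commands above; the CUDA_BIN variable name is only illustrative.

```shell
# Assumption: CUDA is installed under the default prefix /usr/local/cuda.
CUDA_BIN=/usr/local/cuda/bin

# Prepend the directory only if it is not already on PATH,
# so sourcing this snippet twice does not duplicate the entry.
case ":$PATH:" in
  *":$CUDA_BIN:"*) ;;
  *) PATH="$CUDA_BIN:$PATH" ;;
esac
export PATH
```

With this in place, nvprof --print-gpu-trace ./my_program should resolve without the full path for as long as that shell stays open.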

In addition, you can use the --log-file option to save the output to a file …



GPU sharing

Posted on Fri 05 April 2024 in Nvidia, GPU, Kubernetes • Tagged with Nvidia, GPU, Kubernetes

I've found this interesting post from 2022 by Nvidia about GPU sharing in Kubernetes.

The main GPU sharing technologies can be summarized in this table:

| Technology | Description | Microarchitecture | CUDA version |
| CUDA Streams | Allows concurrent operations within a single CUDA context using software abstraction. | Pascal and later | Not specified |
| Time-Slicing | Oversubscription strategy using the GPU's time-slicing scheduler. | Pascal and later | 11.1 (R455+ drivers) |
| CUDA MPS | MPS (Multi-Process Service) enables concurrent processing of CUDA kernels from different processes, typically MPI jobs. | Not specified | 11.4+ |
| MIG | MIG (Multi-Instance GPU) is a secure partitioning of GPUs into separate instances for dedicated resources. | Ampere architecture | Not specified |
| NVIDIA vGPU | Provides VMs with simultaneous, direct access to a single physical GPU. | Compatible with MIG-supported GPUs | Not specified |

The post also explains how GPUs are advertised as schedulable resources in Kubernetes through the device plugin framework. However, a GPU is an integer-based resource, so it does not allow for oversubscription. The post describes a way of achieving this with the time-slicing APIs.


How GPGPU came to exist

Posted on Tue 05 March 2024 in GPGPU, CUDA, Nvidia, GPU • Tagged with GPGPU, CUDA, Nvidia, GPU

I've been reading about the history of GPU computing in chapter 2 of "Programming Massively Parallel Processors" by David B. Kirk and Wen-mei W. Hwu. It's a fascinating story of how the GPU came to be used for general-purpose computing.

The story begins in the 1990s, when the first consumer 3D graphics cards were being developed. These cards were designed to accelerate the rendering of 3D graphics for video games. They were able to do this by offloading the rendering work from the CPU to the GPU, which was specifically designed for this task.

The first GPUs were fixed-function, meaning that they could only perform a limited set of operations. However, as the demand for more realistic and complex graphics grew, the capabilities of the GPU were expanded. This led to the development of programmable shaders, which allowed developers to write custom code to control the rendering process. In the beginning, these shaders were still limited to graphics-related tasks and there were different kinds, such as vertex shaders and pixel shaders.

One of the questions I …

