NVIDIA flagship data center GPUs in the NVIDIA Ampere, NVIDIA Hopper, and NVIDIA Blackwell families all exhibit non-uniform memory access (NUMA) behavior, but expose a single memory space. Most programs therefore do not have a problem with memory non-uniformity. However, as bandwidth increases in newer GPU generations, there are significant performance and power gains available when compute and data locality are taken into account.
This post first analyzes the memory hierarchy of NVIDIA GPUs, discussing the power and performance impacts of data transfer over the die-to-die link. It then reviews how to use NVIDIA Multi-Instance GPU (MIG) mode to achieve data localization. Finally, it presents results from running in MIG mode versus unlocalized for the Wilson-Dslash stencil operator use case.
Memory hierarchy in NVIDIA GPUs
Consider the abstract view of the memory hierarchy with two NUMA nodes depicted in Figure 1. When a streaming multiprocessor (SM) on node 0 must access a memory location in the dynamic random-access memory (DRAM) of node 1, it must transfer data over the L2 fabric. In the case of NVIDIA Blackwell GPUs, each NUMA node is a distinct physical die, which adds latency and increases the power required for data transfer. Despite the added complexity, NUMA-unaware code can still achieve peak DRAM bandwidth.


To address these drawbacks, it is beneficial to reduce data transfers between NUMA nodes. When a single memory space is presented to the user, the NVIDIA architecture employs coherent caching in L2 to reduce data transfers between NUMA nodes. This mechanism helps prevent repeated accesses to the same memory address from refetching data over the L2 fabric interface. Ideally, once the address is fetched into the local L2 cache, all subsequent accesses to the same address will hit the cache.
Before the introduction of coherent caching, the unified L2 cache allowed all SMs to achieve peak bandwidth (as in NVIDIA Volta), though latency varied depending on the proximity of the SM to different L2 segments. With the NVIDIA Ampere generation, larger chips introduced a hierarchy of NUMA nodes, each with its own L2 cache and a coherent connection to others.
While large data center GPUs since the NVIDIA Ampere architecture have used this design (unlike smaller gaming GPUs), the L2 fabric connection sustains peak bandwidth, as noted for the NVIDIA Blackwell Ultra architecture.
Two challenges have emerged as GPUs continue to grow: increased latency and power limitations.
- Increased latency: Accessing distant parts of the L2 cache has led to growing latency, which impacts performance, particularly for synchronization.
- Power limitations: On the largest GPUs, power consumption becomes a limiting factor when tensor cores are active. Reducing power consumption through localized L2 access enables decreasing the L2 fabric clock and raising the compute clock through the Dynamic Voltage and Frequency Scaling (DVFS) mechanism associated with GPU Boost. In this way, tensor core performance can be significantly improved (the telemetry commands after this list show one way to observe clocks and power).
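To see this behavior on your own system, clock and power telemetry can be inspected while a workload runs. The following is a minimal sketch using standard nvidia-smi queries, assuming GPU index 0:
$ nvidia-smi -i 0 -q -d CLOCK,POWER   # report the current SM/memory clocks and power draw
$ nvidia-smi dmon -i 0                # stream per-second samples of power, clocks, and utilization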
MIG reduces data transfers between NUMA nodes. Introduced with the NVIDIA Ampere architecture, this feature enables partitioning a single GPU into multiple instances. Through the use of MIG, developers can create one GPU instance per NUMA node, thereby eliminating accesses over the L2 fabric interface.
This approach comes with its own costs, including the overhead of communicating between different GPU instances using PCIe. The next section presents results from running workloads using MIG mode and unlocalized memory to demonstrate the effectiveness of this approach.
Data localization using MIG
MIG enables supported NVIDIA GPUs to be partitioned into multiple isolated instances, each with dedicated high-bandwidth memory, cache, and compute cores. This allows efficient and high-performance GPU utilization across multiple users or workloads. MIG can provide up to 7x more GPU resources on a single GPU. It allows multiple virtual GPUs (vGPUs) and, consequently, virtual machines (VMs) to run in parallel on a single GPU, while providing the isolation guarantees that vGPUs offer.
The capabilities provided by MIG can be leveraged to achieve NUMA node localization. By creating one MIG instance per NUMA node, you can ensure isolation between the GPU instances. This approach helps eliminate traffic between NUMA nodes.
MIG allows the splitting of the physical GPU into GPU instances (GIs), within which one or more compute instances (CIs) are defined. A CI contains all (in the case of a single CI per GI) or a portion of the SMs belonging to a GI. To enable localization within a GI, the idea is to create two GPU instances, one mapped onto each NUMA node. On a Blackwell GPU, you can enable MIG mode and list the available GPU instance profiles, as shown with the code in Figure 2.
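The following is a minimal sketch of the kind of commands Figure 2 refers to, assuming GPU index 0 and administrator privileges (a GPU reset may be required for MIG mode to take effect):
$ sudo nvidia-smi -i 0 -mig 1      # enable MIG mode on GPU 0
$ sudo nvidia-smi mig -i 0 -lgip   # list the available GPU instance profiles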
Because Blackwell has two NUMA nodes (one per chiplet), look for the profile with the most SMs that can be instantiated twice. As shown in Figure 2, this is the profile with ID 9, of which two instances can be created. Each instance has 89 GB of memory and 70 SMs. Using two such instances results in only 70×2=140 SMs in total, rather than the full 148 SMs on the device.
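With the profile identified, the two GPU instances can be created. This is a minimal sketch, assuming profile ID 9 from the listing above and GPU index 0:
$ sudo nvidia-smi mig -i 0 -cgi 9,9   # create two GPU instances, one per NUMA node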
At this point, it is necessary to create a CI in each GPU instance. This can be done using the commands shown in Figure 3 (a rough sketch follows the device listing below). The main GPU and the GPU instances now have their own identifier hash codes. Use these for the two NUMA nodes:
MIG 3g.90gb Device 0: (UUID: MIG-ee2ec0e5-0dda-5591-9ee7-4ae51028b6fa)
MIG 3g.90gb Device 1: (UUID: MIG-2bbb368b-7cb0-53da-b1a4-7ace0652a197)
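A rough sketch of the kind of commands Figure 3 shows, together with how the UUIDs above can be listed (exact flags can vary by driver version):
$ sudo nvidia-smi mig -cci   # create the default compute instance in each GPU instance (add -gi <ID> to target one instance)
$ nvidia-smi -L              # list devices; the MIG instances appear with their UUIDs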
To use these devices, add them to the CUDA_VISIBLE_DEVICES environment variable. For example, to run a two-process MPI job, you can create a wrapper script (wrapper.sh):
#!/bin/bash
# Bind each MPI rank to one MIG instance, then run the command passed in.
case $SLURM_PROCID in
0)
  export CUDA_VISIBLE_DEVICES="MIG-ee2ec0e5-0dda-5591-9ee7-4ae51028b6fa"
  ;;
1)
  export CUDA_VISIBLE_DEVICES="MIG-2bbb368b-7cb0-53da-b1a4-7ace0652a197"
  ;;
esac
"$@"
Then launch the MPI job:
$ mpirun -n 2 ./wrapper.sh my_executable
Finally, when all the work is done, MIG mode can be turned off.
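A corresponding teardown sketch, again assuming GPU index 0 and administrator privileges:
$ sudo nvidia-smi mig -dci      # destroy the compute instances
$ sudo nvidia-smi mig -dgi      # destroy the GPU instances
$ sudo nvidia-smi -i 0 -mig 0   # disable MIG mode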






What are the advantages of localization with MIG?
As an example application to demonstrate the benefits of localization with MIG, consider the Wilson-Dslash stencil operator, a key kernel for lattice quantum chromodynamics (LQCD) drawn from the QUDA library. This library is used to speed up several large LQCD codes, such as Chroma and MILC.
The Dslash kernel is a finite difference operation on a 4D toroidal lattice, where data at each lattice site is updated depending on the values of its eight orthogonal neighbors. The four dimensions in this case are the usual spatial dimensions (X, Y, Z) and the time dimension (T). The kernel is memory bandwidth-bound.
If the lattice is decomposed equally onto two NUMA nodes, say along the time axis, then each domain will need to access sites on the T-dimension boundaries of the other domain. As shown in Figure 5, with the lattice notionally laid out across the two NUMA nodes, green lattice sites on the boundaries of the subdomains need the red sites to complete their stencils. The possible data paths are regular memory access when unlocalized, or MPI message passing through the host in MIG-localized mode.


The most convenient way to access neighbors would be through the shared L2 cache and the interconnect. However, this path is unavailable when operating in MIG mode, and communication between the MIG instances must instead go through MPI over PCIe or NVLink. As a result, this path will be slower compared to accessing the main memory attached to the MIG instance.
Workloads that require little to no communication between the two MIG instances will therefore tend to benefit more from MIG mode. Instead of reading neighbors through the shared L2, one packs the sites on the boundaries and sends them through MPI. This step introduces additional latency (buffer packing, sending, and unpacking). While it saves GPU power by not using the shared L2 cache-to-cache interconnect, it does use power for the transfer through the host (over PCIe, for example).
The amount of data that must be transferred between the two processes is related to the number of face sites to be transmitted in the messages, specifically to the surface three-volume orthogonal to the direction of the split. In this example, the split is always in the T-direction, so each NUMA node notionally ends up with (Ns × Nt)/2 sites, where Ns is the number of sites in the spatial volume and Nt is the length of the time dimension. The surface-to-volume ratio is Ns/(Ns × Nt/2) = 2/Nt. For the problems considered here, Nt=64, so the surface-to-volume ratio stays constant at 1/32 ≈ 3.13%.
Figure 6 shows the unlocalized case. The global memory is made up of two memories connected to the NUMA nodes through memory controllers. The colored highlights on the lattices indicate that data may come from either the local DRAM or from the remote DRAM through the shared L2.


This is to be compared with the baseline case, where MIG is not employed. Neither the data nor the processing is localized in this case, and the scenario is better represented by Figure 8. Each NUMA node receives its data both from its local memory controller and from the other NUMA node. In reality, there is only one global lattice, and the separation into two parts for the two NUMA nodes in the figure is artificial.
In this scenario, thread blocks processing a given set of sites are assigned to the various NUMA nodes purely at the whim of the scheduler. Since the data is distributed evenly over the two NUMA memories, much more data is transferred across the shared L2 than in the MIG-localized case, where only the minimally required surface sites are transferred. This can incur a significant power cost.
On the other hand, the entire operation can be carried out with a single kernel, avoiding the latencies incurred by packing buffers for message passing and by accumulating the received faces at the end.
For the experimental results, look at the speedup in workload execution at various GPU power limits in watts. The speedups are the ratios of the wallclock times taken by the unlocalized and MIG approaches running at identical power limits (for example, both at 700 W).
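The power limit itself can be set with nvidia-smi. This is a minimal sketch, assuming GPU index 0 and that 400 W lies within the range supported by the board:
$ sudo nvidia-smi -i 0 -pl 400   # cap the GPU power limit at 400 W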
As shown in Figure 7, at a GPU power limit of 400 W, MIG outperforms the unlocalized case with speedups of up to 2.25x, depending on the volume of the workload. The reason is that the power consumed by the L2 fabric interface becomes a limiting factor when the GPU is running at a low power limit. With MIG mode, since no L2 fabric power is consumed transferring data between NUMA nodes, workloads can run much faster.
However, when the GPU power limit is increased, MIG mode performs slightly worse in the experiments represented by the gray, dark green, and black lines in Figure 9, and part of the green line. This is because, at higher power limits, the additional latency introduced by the message passing can outweigh the benefits of the localization.


As it turns out, the smaller cases (especially those indicated by the black and dark green lines in Figure 7) never exhaust the available power at higher power limits, even in the unlocalized case. As such, they benefit little from the GPU power savings gained through localization, and at these smaller volumes the latencies due to kernel launches are much more noticeable. The larger volumes (the green line, for example) require more power and hence can gain an advantage over the unlocalized setup even at higher power limits.
Get started with MIG-based NUMA node localization
Local L2 caching in NVIDIA data center GPUs can impact performance in NUMA-unaware workloads. Our experiments using the Wilson-Dslash operator in MIG mode show that when the GPU is running at lower power limits and data transfer over MPI (PCIe/NVLink) is low relative to local memory accesses, MIG-based NUMA node localization can yield speedups of up to 2.25x compared with the unlocalized case at the same power limit.
While systems running at a higher 1,000 W power envelope may achieve greater absolute performance than a 400 W configuration, MIG-based localization provides clear benefits under power-constrained conditions. In lower-power scenarios, it enables significantly faster performance, making it an especially effective optimization when operating within strict power limits.
However, in general, MIG does not offer the flexibility required to consistently achieve effective data localization, especially as interprocess communication overhead becomes more pronounced at higher power limits. MIG is really intended for use cases that are too small to fill a GPU on their own. As a result, it is not recommended for the cases presented in this post. To address these limitations, alternative approaches are under investigation.
To learn more, see Boost GPU Memory Performance with No Code Changes Using NVIDIA CUDA MPS.
