CUDA 13.2 Introduces Enhanced CUDA Tile Support and Recent Python Features

CUDA 13.2 arrives with a major update: NVIDIA CUDA Tile is now supported on devices of compute capability 8.X (NVIDIA Ampere and NVIDIA Ada), in addition to the 10.X and 12.X architectures (NVIDIA Blackwell). In an upcoming release of the CUDA Toolkit, all GPU architectures starting with Ampere will be fully supported. If you're using Ampere, Ada, or Blackwell GPU architectures, check out the cuTile Python Quickstart guide to get started with CUDA Tile.

This post explores the CUDA 13.2 release, which boosts developer productivity with a wide range of new Python additions, including profiling in CUDA Python and debugging Numba kernels. The math libraries provide expanded support for high-performance emulation, and the CUDA Core Compute Libraries (CCCL) continue to add both performance and feature improvements, providing C++ developers with a high-performance, modern interface to GPU programming.

cuTile Python

cuTile Python, the Python DSL expression of the CUDA Tile programming model, is releasing a number of feature enhancements. These include enhanced language support for the following:

  • Recursive functions
  • Closures with capture (nested functions, lambda functions)
  • Custom reduction and scan functions
  • Allowing assignments with type annotations
  • Enhanced array support for Array.slice to create a view on a subarray

We've also provided a simple installation path. The following pip install command installs cuTile Python and pulls in all the needed dependencies, without requiring a separate system-wide installation of the CUDA Toolkit.

pip install cuda-tile[tileiras]

Core enhancements

Core enhancements in CUDA 13.2 are detailed in this section.

memcpy with attributes

A previous release of CUDA (12.8) introduced batched memcpy APIs. These allow you to specify batches of memcopies to be executed with a single function call. You can also specify attributes to better control and optimize the memory transfers.

These APIs enable more control over your memory transfers. However, if you have only a single transfer and also want to use the attribute features, you need to call a batched API with a batch size of 1. This is a bit cumbersome.

To simplify this use case, two new API functions have been added: cudaMemcpyWithAttributesAsync and cudaMemcpy3DWithAttributesAsync. These functions let you take advantage of attributes in your memory calls without requiring the more involved batched interface.

And to simplify your programming, if you already use cudaMemcpyAsync for your transfers and want to use attributes, you can continue using cudaMemcpyAsync. It is overloaded with the same argument list as cudaMemcpyWithAttributesAsync.

Reduced local memory usage on Windows

Local memory (LMEM) on GPUs is allocated on a per-thread basis and used for register spilling, stack variables, and the like. Starting with CUDA 13.2 and CUDA driver R595, when running on Windows in the WDDM driver mode, LMEM usage has been significantly reduced. The effect of this change will be seen primarily in memory-constrained vGPU environments.

Query the properties of a memory pool

CUDA provides the ability to use memory pools for efficient memory management. CUDA 13.2 introduces an API to query the properties of a memory pool from the memory pool handle. These properties are obtained by calling cudaMemPoolGetAttribute with the appropriate flags.

One use case for this new feature is creating a memory pool of the same type as an existing memory pool. For example, when using CUDA Graphs, the cudaGraphAddMemAllocNode API accepts pool properties as a parameter. You can use the properties of a current memory pool to create a new pool with the same properties.

Windows compute drivers default to MCDM instead of TCC

On Microsoft Windows systems, starting with CUDA driver version R595, GPUs on compatible systems that previously started in TCC mode by default will now start in MCDM mode by default. This change should address compatibility issues on some systems where users would see a yellow bang on their TCC GPUs at startup due to an incompatibility with OS/system features. For users with a dependency on TCC, it's still available for now and can be enabled using nvidia-smi -dm 1 -g.

Going forward, we intend to progressively transition to MCDM, because it brings features that were previously reserved for GPUs in WDDM mode:

  • WSL2: MCDM GPUs will show up in WSL2 and be able to run CUDA in WSL
  • Containers: Native (and WSL) containers are supported
  • Advanced memory management APIs: cuMemCreate, cudaMallocAsync, and all their related APIs are now supported
  • RDMA through the same interface as WDDM RDMA, which was released in CUDA Toolkit 13.1
  • Memory oversubscription and trim notification

Because of some extra overhead in MCDM, submission latency is currently slightly higher than on TCC. We are actively working to bring it on par with TCC and native Linux (on both WDDM and MCDM) to ensure WDDM/MCDM becomes a suitable and future-proof driver model for all our GPUs on Windows.

CUDA_DISABLE_PERF_BOOST

CUDA Toolkit 13.2 and CUDA driver versions 580 and later add a new environment variable, CUDA_DISABLE_PERF_BOOST. It disables the default behavior of boosting the GPU to a higher power state when running CUDA applications. Setting this environment variable to 1 disables the boost, which may result in power savings when using features like NVENC/NVDEC.

CUDA Graphs polymorphic function to acquire graph node parameters

CUDA Graphs provide the ability to express a workflow of GPU operations, like kernel launches and memory copies, as a single unit rather than a series of individual commands. CUDA 13.2 adds a new polymorphic API function, cudaGraphNodeGetParams, that lets you obtain the parameters of a graph node. It is a companion to existing polymorphic functions like cudaGraphNodeSetParams, cudaGraphAddNode, and cudaGraphExecNodeSetParams.

Compilers

CUDA 13.2 brings new compiler updates, including support for new host compilers such as Visual Studio 2026, ARM C Language Extension support for gcc, and a single unified toolkit for Tegra and desktop GPUs, which reduces overhead for containers and libraries.

Embedded devices

Previously, CUDA 13.0 (and NVIDIA JetPack 7.0) introduced unified CUDA for Arm, streamlining development for Arm platforms by unifying the CUDA Toolkit across server-class and embedded devices such as NVIDIA Jetson Thor.

Starting with CUDA 13.2 (and the upcoming JetPack 7.2—stay tuned), the same Arm SBSA CUDA Toolkit can be used across all Arm targets. This release also supports NVIDIA Jetson Orin devices on the same CUDA SBSA toolkit. For developers, this means reduced duplication in CI pipelines, simplified container management, and elimination of subtle bugs and inconsistencies that previously came from juggling different SDKs.

CUDA 13.2 and JetPack 7.2 introduce NVIDIA Multi-Instance GPU (MIG) support, allowing the GPU integrated with Jetson Thor to be partitioned into two fully isolated instances, each with dedicated memory, cache, and compute resources. This capability is particularly useful for mixed-criticality applications, such as humanoid robotics, where developers can isolate safety-critical workloads (motor control and safety systems, for example) from noncritical processing tasks.

Without MIG, safety-critical and noncritical workloads running on the same GPU, such as low-latency motor control alongside heavier perception or language models, compete for shared resources. A bursty task with high memory bandwidth demand can steal capacity from safety-critical kernels, causing jitter and missed latency deadlines for control and safety systems.

With MIG, critical and noncritical workloads run on separate GPU instances, each with dedicated compute, memory, and bandwidth. This isolation delivers predictable latency and quality of service for control and critical tasks, while keeping the GPU highly utilized by concurrently running heavier perception or language workloads on the other instance.

Math libraries

CUDA 13.2 introduces improvements for math libraries including NVIDIA cuBLAS and NVIDIA cuSOLVER.

NVIDIA cuBLAS 

A new experimental API with Grouped GEMM now supports MXFP8 for NVIDIA Blackwell GPUs. Prior support (in CUDA 13.1) included FP8 and BF16/FP16 on Blackwell GPUs. Grouped GEMMs for these data types, with CUDA Graphs support, provide a host-synchronization-free implementation with device-side shapes, for speedups of up to 4x over a multistream GEMM implementation in the mixture of experts (MoE) use case.
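To make the "grouped" pattern concrete: a grouped GEMM issues many independent matrix products, of possibly different shapes, as a single call rather than one call (or stream) per product. The following plain-Python sketch illustrates only the semantics; it is not the cuBLAS interface, and the helper names are illustrative.

```python
# Semantic sketch of a grouped GEMM: many independent matrix products of
# possibly different shapes, conceptually issued as one call.
# Plain Python for illustration only; not the cuBLAS API.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def grouped_gemm(groups):
    """Each group is an (A, B) pair; shapes may differ between groups."""
    return [matmul(a, b) for a, b in groups]

# Two groups with different shapes, as in an MoE dispatch
groups = [
    ([[1, 2], [3, 4]], [[5, 6], [7, 8]]),  # 2x2 @ 2x2
    ([[1, 0, 2]], [[1], [2], [3]]),        # 1x3 @ 3x1
]
results = grouped_gemm(groups)
print(results[0])  # [[19, 22], [43, 50]]
print(results[1])  # [[7]]
```

On the GPU, batching these products into one launch avoids per-GEMM host synchronization, which is where the reported speedups come from.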

NVIDIA cuSOLVER 

cuSOLVER APIs for FP64-emulated calculations have been introduced. This allows platforms with a high ratio of INT8-to-FP64 throughput to achieve significant performance gains, particularly for compute-intensive workloads. The benefits of emulation are most apparent in key APIs for QR, LU, and Cholesky factorizations. To learn more about the latest advances in emulation techniques from NVIDIA, see Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS.

Figure 1 shows the results of FP64-emulated DGEQRF, DGETRF, and DPOTRF on NVIDIA B200 systems. The performance benefits increase with matrix size and can reach up to 2x for QR, the most compute-intensive of the three operations, when matrix sizes approach 80K.

Graph showing FP64-emulated results for DGEQRF, DGETRF, and DPOTRF on NVIDIA B200 systems. As matrix sizes go from 20,000 to 80,000, the speedups go from slightly over 1x up to 2x, compared to the non-emulated functions.
Figure 1. FP64-emulated DGEQRF, DGETRF, and DPOTRF on NVIDIA B200 systems

Developer tools

Developer tools new to this release are detailed in this section.

NVIDIA Nsight Python

NVIDIA Nsight Python is a new kernel profiling interface that brings the power of NVIDIA profiling tools directly to Python developers. With this release, you can seamlessly profile CUDA kernels launched through Python frameworks across multiple configurations, directly from Python.

Using just a few decorators, users can automatically configure, profile, and plot kernel performance comparisons. Nsight Python also provides access to the performance data in common Python data structures for advanced analysis. Download Nsight Python from PyPI. You can also contribute to the NVIDIA/nsight-python GitHub repo and visit the NVIDIA Developer Forum with any questions or issues.

@nsight.analyze.plot("02_parameter_sweep.png")
@nsight.analyze.kernel(configs=sizes, runs=10)
def benchmark_matmul_sizes(n: int) -> None:
    """
    Benchmark matrix multiplication across different sizes.
    The 'n' parameter comes from the configs list.
    """
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")

    with nsight.annotate("matmul"):
        _ = a @ b

Numba-CUDA debugging

For the first time, debugging Numba-CUDA kernels running on a GPU is now possible with CUDA-GDB command-line debugging and NVIDIA Nsight Visual Studio Code Edition. Users can set breakpoints, step through statements, and inspect program state, as with host and native CUDA debuggers. This initial support has a limited feature set, and the team is actively looking for feedback to improve it. To learn more, check out the Numba-CUDA debugging documentation and reach out for help or feedback on the Developer Forum.

NVIDIA Nsight Tools updates

NVIDIA Nsight Compute 2026.1 features a new report clustering and merging tool, accessible from the File > Merge Reports menu. It helps users understand data from repeated experiments, separate profiling sessions, or multiprocess applications generating multiple reports.

Screenshot of Nsight Compute profile, illustrating the report clustering tool.
Figure 2. Nsight Compute report clustering tool

A new Register Dependency correlation window on the Source page helps users identify source line dependencies to quickly locate bottlenecks. The CUDA Graphs viewer tool window has been significantly improved to show graphs as they are built and profiled in the interactive profiling mode, and it visually correlates collected results to graph nodes. Nsight Compute is included in the CUDA Toolkit and is available as a standalone download.

NVIDIA Nsight Cloud includes updates to the Nsight Operator for Kubernetes, along with Nsight Streamer Kubernetes and Docker containers for accessing and viewing Nsight tool reports from within a cluster.

NVIDIA Nsight Copilot is a free AI-powered CUDA coding assistant that’s now available to everyone with an NVIDIA Developer account.

NVIDIA Nsight Systems 2026.1 includes:

  • PyTorch profiling improvements to display shape and training parameters for forward and backward extension modules
  • Support for Python 3.14 in the Python sampling feature
  • A new option to capture metrics of GPUDirect Storage DMA operations

CCCL

CUDA 13.2 ships with version 3.2 of CCCL. Highlights include new modern CUDA C++ runtime APIs and new optimized algorithms, including Top-K.

Modern CUDA C++ runtime

CCCL 3.2 introduces new idiomatic C++ interfaces for core CUDA runtime and driver functionality.

If you've written CUDA C++, you've likely built (or adopted) some form of convenience wrappers around today's C-style APIs such as cudaMalloc or cudaStreamCreate.

The new APIs added in CCCL 3.2 are meant to offer the productivity and safety benefits of C++ for core CUDA constructs, so you can spend less time reinventing wrappers and more time writing kernels and algorithms.

Highlights include:

  • New convenient vocabulary types for core CUDA concepts (cuda::stream, cuda::event, cuda::arch_traits)
  • Easier memory management with memory resources and cuda::buffer
  • More powerful and convenient kernel launch with cuda::launch

Example (vector add, revisited):

cuda::device_ref device = cuda::devices[0];
cuda::stream stream{device};
auto pool = cuda::device_default_memory_pool(device);

int num_elements = 1000;
auto A = cuda::make_buffer(stream, pool, num_elements, 1.0);
auto B = cuda::make_buffer(stream, pool, num_elements, 2.0);
auto C = cuda::make_buffer(stream, pool, num_elements, cuda::no_init);

constexpr int threads_per_block = 256;
auto config = cuda::distribute(num_elements);
auto kernel = [] __device__ (auto config, cuda::std::span A, 
                                            cuda::std::span B, 
                                            cuda::std::span C){
    auto tid = cuda::gpu_thread.rank(cuda::grid, config);
    if (tid < A.size())
        C[tid] = A[tid] + B[tid];
};
cuda::launch(stream, config, kernel, config, A, B, C);

Try this example live on Compiler Explorer.

Stay tuned for a deeper dive into the design goals and intended usage patterns, and how these new APIs fit alongside existing CUDA APIs.

New algorithms

Algorithms new to CUDA 13.2 are detailed in this section.

Top-K selection

CCCL 3.2 introduces cub::DeviceTopK (for example, cub::DeviceTopK::MaxKeys) to select the K largest (or smallest) elements without sorting the entire input. For workloads where K is small, this can deliver up to 5x speedups over a full radix sort, and it can reduce memory consumption when you don't need sorted results.
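The key idea is that selecting K extremes is cheaper than ordering everything. As a plain-Python analogy (not the CUB API), this is the difference between heapq.nlargest and a full sort:

```python
# Top-K semantics: select the K largest elements without sorting the whole
# input. heapq keeps only a K-element heap, which is the same idea
# cub::DeviceTopK::MaxKeys exploits on the GPU. Python analogy only.
import heapq

data = [7, 42, 3, 19, 88, 5, 61, 24]
k = 3
top_k = heapq.nlargest(k, data)  # O(n log k) instead of O(n log n)
print(top_k)  # [88, 61, 42]
```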

Top‑K is an active area of ongoing work for CCCL. The roadmap includes planned segmented Top‑K as well as block‑scope and warp‑scope Top‑K variants. To learn more about what's planned and share your most important Top‑K use cases, see NVIDIA/cccl GitHub Issue #5673.

Graph showing normalized execution time of cub::DeviceTopK::MaxKeys for K=3, compared to radix sort. The graph shows values from 2^18 to 2^30 elements. The normalized time of the TopK function compared to radix sort goes from 45% to 20% (lower is better) of the radix sort walltime. 
Figure 3. Normalized execution time comparing the new cub::DeviceTopK::MaxKeys for K=3 to the common solution of performing a full radix sort

Fixed-size segmented reduction

CCCL 3.2 now provides a new cub::DeviceSegmentedReduce variant that accepts a uniform segment_size, eliminating offset iterator overhead in the common case where segments are fixed-size. This enables optimizations for both small segment sizes (up to 66x) and large segment sizes (up to 14x).

// New API accepts a fixed segment_size instead of per-segment begin/end offsets
cub::DeviceSegmentedReduce::Sum(d_temp, temp_bytes, input, output,  
                                num_segments, segment_size);

In Figure 4, the new fixed-size variant shows significant speedups for both small and large segments compared to the existing implementation, which specifies begin and end offsets for each segment.

Normalized execution time comparing the new fixed-size segment overload of cub::DeviceSegmentedReduce to the existing implementation. The new fixed-size variant shows significant speed-up for both small segments (up to 66x) and large segments (up to 14x) compared to the existing implementation that specifies begin and end offsets for each segment.
Figure 4. Normalized execution time comparing the new fixed-size segment overload of cub::DeviceSegmentedReduce to the existing implementation
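Semantically, a fixed-size segmented reduction reduces each equally sized slice of the input independently; because every segment has the same length, no per-segment offset arrays are needed. A plain-Python sketch of the semantics (illustrative only, not the CUB interface):

```python
# Fixed-size segmented reduction: reduce each equally sized segment
# independently. With a uniform segment_size there is no need for
# per-segment begin/end offset arrays. Plain-Python sketch of the semantics.

def segmented_sum(values, segment_size):
    assert len(values) % segment_size == 0, "input must divide into segments"
    return [sum(values[i:i + segment_size])
            for i in range(0, len(values), segment_size)]

input_data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(segmented_sum(input_data, 3))  # [6, 15, 24]
```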

More new algorithms in CCCL 3.2

Segmented Scan: cub::DeviceSegmentedScan provides a segmented version of a parallel scan that efficiently computes a scan operation over multiple independent segments.

Binary Search: cub::DeviceFind::[Upper/LowerBound] performs a parallel search for multiple values in an ordered sequence.

Search: cub::DeviceFind::FindIf searches the unordered input for the first element that satisfies a given condition. Thanks to its early-exit logic, it can be up to 7x faster than searching the entire sequence.
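The early-exit behavior is what distinguishes FindIf from a full-sequence search: it stops at the first match. In plain Python (a sketch of the semantics, not the CUB API; returning the sequence length for "not found" mirrors std::find_if's end iterator):

```python
# FindIf semantics: return the index of the first element satisfying a
# predicate, stopping as soon as a match is found (early exit). A plain-Python
# sketch of what cub::DeviceFind::FindIf computes in parallel on the GPU.

def find_if(seq, pred):
    return next((i for i, x in enumerate(seq) if pred(x)), len(seq))

data = [4, 8, 15, 16, 23, 42]
print(find_if(data, lambda x: x > 20))  # 4 (index of 23)
print(find_if(data, lambda x: x < 0))   # 6 (no match: len(data))
```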

CUDA Python

CuPy now supports CUDA 13.0 and 13.1, and wheels are available on PyPI for CUDA 12 and CUDA 13. This means it's easier than ever to install CuPy without a system-wide CUDA Toolkit.

pip install cupy-cuda12x
pip install cupy-cuda13x

CuPy now implements the CUDA Stream Protocol, enabling direct stream sharing with PyTorch, JAX, and other frameworks that support the protocol. This means zero-copy interoperability without manual pointer management.

# Share a CuPy stream with PyTorch
pytorch_stream = torch.cuda.ExternalStream(cupy_stream)

# Or import an external stream into CuPy
cupy_stream = cupy.cuda.Stream.from_external(pytorch_stream)

Support has been added for ml_dtypes.bfloat16, which brings native reduced-precision computation to CuPy; this type is commonly used in AI training and inference. Performance has improved in some core operations through fast-path optimizations for generalized ufuncs, array operators, and scalar handling. Support for multithreaded applications has also improved. CuPy arrays can now be viewed as cuda::std::mdspan objects through ndarray.mdspan, with control over 32-bit and 64-bit indexing. This gives users more control over arithmetic operations and performance.

cuda.core 0.6 introduces NVML bindings (cuda.bindings.nvml) for GPU monitoring and management, and new nvFatbin bindings (cuda.bindings.nvfatbin) for fat binary manipulation. The new cuda.core.system module provides Pythonic access to system information, such as device thermal monitoring and CPU/GPU affinity, built on top of NVML.

Support for constructing CUDA Graphs has graduated from the experimental namespace and is now available under the main cuda.core namespace. This lets developers capture sequences of operations and replay them with minimal overhead, and it supports advanced patterns such as conditional execution (if_cond and while_loop) and fork-join. The following code shows how the API works:

# Construct a graph by capturing operations
gb = device.create_graph_builder()
gb.begin_building()

# Capture kernel launches in the graph (not executed yet)
launch(gb, LaunchConfig(grid=256, block=256), kernel_a, data_ptr)
launch(gb, LaunchConfig(grid=256, block=256), kernel_b, data_ptr)
launch(gb, LaunchConfig(grid=256, block=256), kernel_c, data_ptr)

# Finalize and instantiate the graph
graph = gb.end_building().complete()

# Launch the graph into an existing CUDA Stream
graph.launch(stream)

For more information, see the cuda.core.GraphBuilder docs and examples.

Start with CUDA 13.2

CUDA 13.2 simplifies high-performance development by continuing to elevate Python as a first-class citizen and introducing productivity-focused language features that bridge the gap between ease of use and peak GPU performance.

Download the CUDA 13.2 Toolkit to get started.

Acknowledgments

Thanks to the following NVIDIA contributors: Jake Hemstad, Becca Zandstein, Jackson Marusarz, Mridula Prakash, Rekha Mukund, Daniel Rodriquez, Bo Dong, Andy Terrel, Raphael Boissel, and Rob Armstrong.


