NVIDIA CUDA 13.1 introduces the biggest and most comprehensive update to the CUDA platform since it was introduced nearly 20 years ago.
In this release, you'll find new features and updates for improving performance and driving accelerated computing, including:
- The launch of NVIDIA CUDA Tile, our tile-based programming model for abstracting away specialized hardware, including tensor cores.
- Runtime API exposure of green contexts.
- Emulation for double and single precisions in NVIDIA cuBLAS.
- A completely rewritten CUDA programming guide, designed for both novice and advanced CUDA programmers.
CUDA Tile programming
To help you create software for current and future GPUs, NVIDIA CUDA 13.1 is launching CUDA Tile, which enables you to write GPU kernels at a layer above SIMT. In SIMT programming today, you specify kernels by partitioning data and defining each thread's path of execution. With CUDA Tile, you can move your code up a layer and specify chunks of data called tiles. You specify the mathematical operations to be performed on those tiles, and the compiler and runtime determine the best way to launch that work onto individual threads. The tile model abstracts away the details of using specialized hardware such as tensor cores, and your tile code will be compatible with future GPU architectures.
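For contrast, here is a minimal conventional SIMT kernel (not cuTile code): every thread computes its own global index and bounds check, and the caller picks the block and grid decomposition by hand. This is the per-thread bookkeeping the tile model lifts you above.

// Conventional SIMT kernel: each thread handles one element and must
// compute its own global index and guard against out-of-bounds access.
__global__ void vector_add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// The caller also chooses the thread-block decomposition explicitly:
// vector_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);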
CUDA 13.1 is releasing two components for tile programming:
- CUDA Tile IR: A new virtual instruction set architecture (ISA) for programming NVIDIA GPUs.
- cuTile Python: A new domain-specific language (DSL) for authoring array- and tile-based kernels in Python.
In this first version of the software:
- CUDA Tile is supported on NVIDIA Blackwell (compute capability 10.x and 12.x) products only. Future versions of CUDA will add support for more architectures.
- We've focused our development efforts on tile programming for AI algorithms. In future releases of CUDA, we'll continue to add more features, functionality, and performance.
- In an upcoming CUDA release, we plan to introduce an implementation in C++.
Check out more details about CUDA Tile IR and cuTile Python.
CUDA software updates
Here are other important software updates included in this release of CUDA.
Runtime exposure of green contexts
Green contexts in CUDA are a lightweight alternative to traditional CUDA contexts, designed to provide developers with a mechanism for finer-grained spatial partitioning and resource provisioning on the GPU. They've been available in the driver API since CUDA 12.4, and starting with this release, green contexts are also available in the runtime API.
Green contexts enable you to define and manage distinct partitions of GPU resources, primarily streaming multiprocessors (SMs), and dedicate a specific set of SMs to a specific context. You can then launch CUDA kernels and manage streams that run only within the resources provisioned for that green context. A typical example is an application with latency-sensitive code that takes priority over all other GPU work. By allocating SM resources to a dedicated green context for this code, and the rest to another green context for everything else, you guarantee that SMs are available and ready for the latency-sensitive computation.
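For reference, below is a minimal sketch of the driver-API flow that has been available since CUDA 12.4; the SM count is an arbitrary placeholder and error handling is omitted, so consult the programming guide for the exact signatures and their new runtime API counterparts.

#include <cuda.h>

// Minimal sketch: provision a green context that owns at least 16 SMs,
// then create a stream whose work runs only on those SMs.
void create_green_context_example()
{
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Query the device's SM resource and split off one group of >= 16 SMs.
    CUdevResource smResource, remaining, partition;
    cuDeviceGetDevResource(dev, &smResource, CU_DEV_RESOURCE_TYPE_SM);
    unsigned int numGroups = 1;
    cuDevSmResourceSplitByCount(&partition, &numGroups, &smResource,
                                &remaining, 0 /*useFlags*/, 16 /*minCount*/);

    // Describe the partition and create the green context over it.
    CUdevResourceDesc desc;
    cuDevResourceGenerateDesc(&desc, &partition, 1);
    CUgreenCtx greenCtx;
    cuGreenCtxCreate(&greenCtx, desc, dev, CU_GREEN_CTX_DEFAULT_STREAM);

    // Streams created from the green context use only the provisioned SMs.
    CUstream stream;
    cuGreenCtxStreamCreate(&stream, greenCtx, CU_STREAM_NON_BLOCKING, 0);
}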
CUDA 13.1 also introduces a more customizable split() API. Developers can construct SM partitions that previously required multiple API calls, and can configure work queues to reduce false dependencies between work submitted in different green contexts.
The CUDA programming guide has more information about these features and the runtime exposure of green contexts.
CUDA Multi-Process Service updates
CUDA 13.1 brings new features and functionality to Multi-Process Service (MPS). For complete information on these new features, see the MPS documentation. A few of the highlights include:
Memory locality optimization partition
Memory locality optimization partition (MLOPart) is a feature of some NVIDIA Blackwell (compute capability 10.0 and 10.3) and newer GPUs that lets users create specialized CUDA devices optimized for improving memory locality. MLOPart devices are derived from a single underlying GPU but present as multiple devices with fewer compute resources and less available memory. Compute capability 10.0 and 10.3 GPUs each have two partitions.
When using MLOPart on supported GPUs, each partition appears as a distinct CUDA device with associated compute and memory resources. Currently, MLOPart is only supported on NVIDIA B200 and NVIDIA B300 products; a future release of CUDA will add support for NVIDIA GB200 and NVIDIA GB300 products.
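Because each partition is exposed as its own CUDA device, standard device enumeration is enough to see it from an application. The following sketch (assuming MLOPart has already been enabled through MPS) simply lists each visible device with its SM count and memory capacity.

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Each MLOPart partition shows up as a separate device with its own
    // (smaller) SM count and memory capacity.
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("Device %d: %s, %d SMs, %.1f GiB\n",
                    i, prop.name, prop.multiProcessorCount,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}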
Static streaming multiprocessor partitioning
As an alternative to the existing dynamic execution resource provisioning available in MPS, static streaming multiprocessor (SM) partitioning is a feature for NVIDIA Ampere architecture (compute capability 8.0) and newer GPUs that provides a way to create exclusive SM partitions for MPS clients.
This mode is enabled by launching the MPS control daemon with the -S or --static-partitioning flag, and its major purpose is to deliver deterministic resource allocation and improved isolation between MPS clients. The fundamental unit of partitioning is a "chunk," which varies in size with the GPU architecture: for instance, 8 SMs on Hopper (compute capability 9.0) and newer discrete GPUs.
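As an illustration, and assuming the flag is simply added to the usual daemon-mode invocation, starting the control daemon with static partitioning might look like the following; check the MPS documentation for the exact command.
nvidia-cuda-mps-control -d --static-partitioning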
Emulation for double and single precisions in cuBLAS
While not strictly a CUDA 13.1 update, the cuBLAS update in NVIDIA CUDA Toolkit 13.0 introduced new APIs and implementations for boosting the performance of double-precision (FP64) matrix multiplications (matmuls). This is achieved through floating-point (FP) emulation on Tensor Cores found in GPU architectures such as NVIDIA GB200 NVL72 and NVIDIA RTX PRO 6000 Blackwell Server Edition. For comprehensive information on GPU compatibility for both FP32 and FP64 emulation, refer to the cuBLAS documentation.
Developer tools are a vital part of the CUDA platform. This release delivers several innovations and feature enhancements, including:
CUDA Tile kernel profiling
NVIDIA Nsight Compute 2025.4 adds support for profiling CUDA Tile kernels. Updates include a new "Result Type" column on the summary page for denoting Tile vs. SIMT kernels. A new "Tile Statistics" section on the details page summarizes tile dimensions and utilization of important pipelines. The source page also supports mapping metrics to the high-level cuTile kernel source.


This Nsight Compute release also adds support for profiling CUDA graph nodes from device-launched graphs, along with source page navigation improvements with clickable label links for both compiler-generated and user-generated labels.
Compile-time patching
NVIDIA Compute Sanitizer 2025.4 adds support for NVIDIA CUDA Compiler (NVCC) compile-time patching through the -fdevice-sanitize=memcheck compiler flag. This patching enhances memory error detection and improves compute sanitizer performance.
Compile-time instrumentation integrates error detection directly into NVCC for faster runs while catching more subtle memory issues, such as illegal accesses between adjacent allocations, through advanced base-and-bounds analysis. This means you can debug memory problems without sacrificing speed, run more tests, and maintain productivity. Right now, only memcheck is supported.
To use this new feature, compile your code with the NVCC flag as follows:
nvcc -fdevice-sanitize=memcheck -o myapp myapp.cu
Then run your application with compute-sanitizer using the memcheck tool.
compute-sanitizer --tool memcheck myapp
For complete information on compile-time patching, refer to the compute-sanitizer documentation.
NVIDIA Nsight Systems
NVIDIA Nsight Systems 2025.6.1 releases concurrently with CUDA Toolkit 13.1, with several new tracing features, including:
- System-wide CUDA trace: --cuda-trace-scope enables tracing across process trees or the entire system.
- CUDA host function trace: Added trace support for CUDA Graph host function nodes and cudaLaunchHostFunc(), which executes on the host and blocks the stream.
- CUDA hardware trace: Hardware-based tracing is now the default when supported; use --trace=cuda-sw to revert to software mode, as shown in the example after this list.
- Green context timeline rows now show SM allocation in tooltips to help users understand GPU resource utilization.
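For example, with myapp standing in for your application binary, a profile that opts back into the software-based CUDA trace could be collected as follows.
nsys profile --trace=cuda-sw ./myapp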
Math libraries
New features across our core CUDA Toolkit math libraries include:
- NVIDIA cuBLAS: A new experimental Grouped GEMM API supports FP8 and BF16/FP16 on Blackwell GPUs. Grouped GEMMs for these data types, together with CUDA Graph support, provide a host-synchronization-free implementation with device-side shapes, for up to 4x speed-up over a multi-stream GEMM implementation in the MoE use case.
- NVIDIA cuSPARSE: A new sparse matrix-vector multiplication (SpMVOp) API with improved performance compared to the CsrMV API. This API supports the CSR format, 32-bit indices, double precision, and user-defined epilogues.
- NVIDIA cuFFT: A new set of APIs, called the cuFFT device API, provides host functions for querying or generating device function code and database metadata in a C++ header file. Designed for the cuFFTDx library, it facilitates the generation of cuFFTDx code blocks by querying cuFFT, which can be linked with the cuFFTDx application to improve performance.
Performance updates on the new Blackwell architectures are also available. Select updates for key APIs and their performance follow.
cuBLAS Blackwell performance
CUDA Toolkit 12.9 introduced block-scaled FP4 and FP8 matmuls on NVIDIA Blackwell. CUDA 13.1 adds performance improvements for these data types and BF16. Speedups on NVIDIA Blackwell and Hopper are shown in Figure 2.


cuSOLVER Blackwell performance
CUDA 13.1 continues to improve the batched SYEVD and GEEV eigendecomposition APIs, delivering further performance gains.
Batched SYEV (cusolverDnXsyevBatched) is a uniform batched version of cuSOLVER's SYEV routine, computing eigenvalues and eigenvectors for symmetric/Hermitian matrices, ideal for solving many small matrices in parallel.
Figure 3 shows tests on a batch size of 5,000 (24 to 256 rows) with about a 2x speedup on the NVIDIA RTX PRO 6000 Blackwell Server Edition compared to the NVIDIA L40S, correlating with expected memory throughput increases.


Figure 4 shows the performance speed-up of cusolverDnXgeev (GEEV), which computes eigenvalues and eigenvectors of a general (non-symmetric) dense matrix. GEEV is a hybrid CPU/GPU algorithm. A single CPU thread manages aggressive early deflation in the QR algorithm, while the GPU handles the rest. Relative performance speed-ups for matrix sizes from 1,024 to 32,768 are shown.


NVIDIA CUDA Core Compute Libraries
NVIDIA CUDA Core Compute Libraries (CCCL) features several innovations and enhancements for CUB.
Deterministic floating-point reductions
Due to the non-associativity of floating-point addition, cub::DeviceReduce historically only guaranteed bitwise-identical results run-to-run on the same GPU. This was implemented as a two-pass algorithm.
NVIDIA CCCL 3.1, part of CUDA 13.1, provides two additional floating-point determinism options so you can make trade-offs between determinism and performance.
- Not-guaranteed: Single-pass reduction using atomics. This isn't guaranteed to produce bitwise-identical results.
- GPU-to-GPU: Based on the reproducible reduction presented in Kate Clark's NVIDIA GTC 2024 talk. Results are always bitwise-identical.
The determinism option can be set through a flag, as shown in the following code.
// Pick your desired trade-off of performance and determinism
// auto env = cuda::execution::require(cuda::execution::determinism::not_guaranteed);
// auto env = cuda::execution::require(cuda::execution::determinism::run_to_run);
// auto env = cuda::execution::require(cuda::execution::determinism::gpu_to_gpu);
cub::DeviceReduce::Sum(..., env);


More convenient single-phase CUB APIs
Nearly every CUB algorithm requires temporary storage for intermediate scratch space. Historically, users had to query and allocate the necessary temporary storage through a two-phase call pattern that's cumbersome and error-prone if arguments aren't passed identically between the two invocations.
CCCL 3.1 adds new overloads to some CUB algorithms that accept a memory resource, so you can skip the temp-storage query/allocate/free pattern.
Before (two-phase)
// Determine temporary storage size
void* d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceScan::ExclusiveSum(d_temp_storage,
                              temp_storage_bytes,
                              d_input, ...);
// Allocate the required temporary storage
cudaMallocAsync(&d_temp_storage,
                temp_storage_bytes, stream);
// Run the actual scan
cub::DeviceScan::ExclusiveSum(d_temp_storage,
                              temp_storage_bytes,
                              d_input, ...);
// Free the temporary storage
cudaFreeAsync(d_temp_storage, stream);
After (single-phase)
// Pool mr uses cudaMallocAsync under the hood
cuda::device_memory_pool mr{cuda::devices[0]};
// Single call. Temp storage is handled by the pool.
cub::DeviceScan::ExclusiveSum(d_input,..., mr);
Learn more
The release of CUDA 13.1 brings many new features and ushers in a new era of GPU programming with CUDA Tile. Check out the CUDA Tile resources, download CUDA Toolkit 13.1, and get started today.
Acknowledgements
Thanks to the following NVIDIA contributors: Jake Hemstad, Becca Zandstein, Jackson Marusarz, Kyrylo Perelygin, and Myrto Papadopoulou.
