Benchmarking LLMs on AI-Generated CUDA Code with ComputeEval 2025.2



Can AI coding assistants write efficient CUDA code? To help measure and improve their capabilities, we created ComputeEval, a robust, open-source benchmark for evaluating AI models and agents on CUDA programming tasks.

Just a few months ago, we announced the first release of ComputeEval. Today, we're introducing its first major expansion, adding more than 100 new CUDA challenges.

With this release, the dataset has grown to a total of 232 CUDA and CUDA Core Compute Libraries (CCCL) problems. We deliberately raised the bar by adding harder challenges that require LLMs to use modern CUDA features such as Tensor Cores, advanced shared memory patterns, and warp-level primitives. The new problems also test the ability to correctly orchestrate features like CUDA Graphs, streams, and events, all within the context of real-world applications such as dynamic simulations.
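To give a flavor of the kind of feature these problems exercise, here is a minimal sketch of a warp-level sum reduction built on the __shfl_down_sync primitive. This is an illustrative example only, not one of the benchmark problems; the kernel name, sizes, and launch configuration are ours.

```cuda
#include <cuda_runtime.h>

// Illustrative warp-level sum reduction (not a benchmark task).
// Each warp reduces its 32 values entirely in registers via shuffles,
// so only one atomic per warp touches global memory.
__global__ void warpReduceSum(const float* in, float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float val = (idx < n) ? in[idx] : 0.0f;

    // Halve the number of active lanes each step until lane 0 holds the sum.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);

    // Lane 0 of each warp contributes its partial sum.
    if ((threadIdx.x & (warpSize - 1)) == 0)
        atomicAdd(out, val);
}

int main()
{
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemset(d_out, 0, sizeof(float));
    // ... fill d_in with data ...

    warpReduceSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The newer problems combine primitives like this with higher-level orchestration (CUDA Graphs, streams, and events), which is where models tend to struggle most.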

LLM performance on CUDA programming

Our team evaluated several leading LLMs on ComputeEval to determine baseline performance metrics and understand the present state of AI-assisted CUDA programming (Table 1).

| Model | ComputeEval 2025.2 (232 problems) pass@1 | ComputeEval 2025.1 (128 problems) pass@1 |
|---|---|---|
| GPT-5 (medium) | 0.5819 | 0.61 |
| Claude Sonnet 4.0 | 0.5517 | 0.64 |
| gpt-oss-20B (high) | 0.5474 | N/A |
| gpt-oss-120B (high) | 0.5302 | N/A |
| Claude Opus 4.0 | 0.5216 | N/A |
| DeepSeek-R1 | 0.4397 | 0.55 |
| gpt-oss-120B (medium) | 0.4224 | N/A |
| gpt-oss-20B (medium) | 0.4224 | N/A |
| gpt-oss-120B (low) | 0.4052 | N/A |
| DeepSeek-V3.1 | 0.3750 | 0.44 |
| Llama 4 Maverick 17B 128E | 0.3448 | 0.47 |
| Llama 3.1 405B | 0.3405 | 0.40 |
| gpt-oss-20B (low) | 0.3319 | 0.41 |
Table 1. Pass@1 accuracy of state-of-the-art LLMs on ComputeEval 2025.1 and 2025.2. The latest version expands the dataset to 232 CUDA programming challenges, providing a more challenging benchmark for AI-assisted coding.
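For context, pass@1 is the standard functional-correctness metric for code generation: a problem counts as solved only if a generated solution compiles and passes all of that problem's tests. When n samples are drawn per problem and c of them pass, the commonly used unbiased estimator is pass@k = 1 − C(n−c, k) / C(n, k), averaged over problems, which for k = 1 reduces to the fraction of passing samples. (This is the general definition of the metric; the exact sampling setup of the ComputeEval harness is documented in the repository.)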

We observed that scores for all models declined with the move to ComputeEval 2025.2. This doesn't indicate that the models have become less capable; rather, it reflects that the benchmark itself has become more difficult. With each release, we're raising the bar, pushing models to demonstrate a deeper understanding of the nuances of accelerated computing.

What’s next and how to get involved

We’ll continue expanding both the dataset and the capabilities of the evaluation framework. Work is already underway to extend ComputeEval’s coverage to additional CUDA-X libraries, including cuBLAS, CUTLASS, cuDNN, RAPIDS, and more. We invite the broader HPC and AI communities to contribute and collaborate. Explore the code on GitHub and access the dataset on Hugging Face.


