How NVIDIA DGX Spark’s Performance Enables Intensive AI Tasks

Today’s demanding AI developer workloads often need more memory than desktop systems provide, or require access to software that laptops and PCs lack. This forces developers to move their work to the cloud or data center.

NVIDIA DGX Spark offers an alternative to cloud instances and data-center queues. The Blackwell-powered compact supercomputer delivers 1 petaflop of FP4 AI compute performance, 128 GB of coherent unified system memory, 273 GB/s of memory bandwidth, and the NVIDIA AI software stack preinstalled. With DGX Spark, you can run large, compute-intensive tasks locally, without moving to the cloud or data center.

We’ll walk you through how DGX Spark’s compute performance, large memory, and preinstalled AI software speed up fine-tuning, image generation, data science, and inference workloads, with benchmarks for each.

Fine-tuning workloads on DGX Spark

Fine-tuning pre-trained models is a common task for AI developers. To show how DGX Spark performs at this workload, we ran three tuning tasks using different methodologies: full fine-tuning, LoRA, and QLoRA.

In full fine-tuning of a Llama 3.2 3B model, we reached a peak of 82,739.2 tokens per second. Tuning a Llama 3.1 8B model using LoRA on DGX Spark reached a peak of 53,657.6 tokens per second. Tuning a Llama 3.3 70B model using QLoRA on DGX Spark reached a peak of 5,079.04 tokens per second.

Because fine-tuning is so memory intensive, none of these workloads can run on a 32 GB consumer GPU.

Fine-tuning

| Model | Method | Backend | Configuration | Peak tokens/sec |
| --- | --- | --- | --- | --- |
| Llama 3.2 3B | Full fine-tuning | PyTorch | Sequence length: 2048; Batch size: 8; Epoch: 1; Steps: 125; BF16 | 82,739.20 |
| Llama 3.1 8B | LoRA | PyTorch | Sequence length: 2048; Batch size: 4; Epoch: 1; Steps: 125; BF16 | 53,657.60 |
| Llama 3.3 70B | QLoRA | PyTorch | Sequence length: 2048; Batch size: 8; Epoch: 1; Steps: 125; FP4 | 5,079.04 |

Table 1. Fine-tuning performance
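
For reference, a LoRA run like the Llama 3.1 8B row in Table 1 can be set up in a few lines with Hugging Face transformers and peft. The sketch below is a minimal illustration, not the benchmark harness used for Table 1; the dataset and the LoRA hyperparameters (rank, target modules) are assumptions chosen for demonstration.

```python
# Minimal LoRA fine-tuning sketch (assumed setup, not NVIDIA's benchmark code).
# Mirrors the Table 1 configuration: sequence length 2048, batch size 4,
# 1 epoch, 125 steps, BF16.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "meta-llama/Llama-3.1-8B"  # gated repo; requires Hugging Face access
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Wrap the base model with low-rank adapters; only these small matrices train.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Stand-in dataset for illustration; swap in your own corpus.
dataset = load_dataset("Abirate/english_quotes", split="train")
def tokenize(batch):
    return tokenizer(batch["quote"], truncation=True, max_length=2048)
tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama31-8b-lora",
                           per_device_train_batch_size=4,
                           num_train_epochs=1, max_steps=125,  # max_steps wins
                           bf16=True, logging_steps=25),
    train_dataset=tokenized,
    # mlm=False makes the collator copy input_ids into labels for causal LM loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because only the adapter matrices receive gradients, LoRA keeps optimizer state small, which is part of why the 8B model fits comfortably in DGX Spark’s 128 GB of unified memory.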

DGX Spark’s image-generation capabilities

Image generation models are always pushing for greater accuracy, higher resolutions, and faster performance. Creating high-resolution images or multiple images per prompt drives the need for more memory, as well as the compute required to generate the images.

DGX Spark’s large GPU memory and powerful compute performance allow you to work with higher-resolution images and higher-precision models to produce better image quality. Support for the FP4 data format enables DGX Spark to generate images quickly, even at high resolutions.

Using the Flux.1 12B model at FP4 precision, DGX Spark can generate a 1K image every 2.6 seconds (see Table 2). DGX Spark’s large system memory provides the capacity needed to run a BF16 SDXL 1.0 model and generate seven 1K images per minute.

Image generation

| Model | Precision | Backend | Configuration | Images/min |
| --- | --- | --- | --- | --- |
| Flux.1 12B Schnell | FP4 | TensorRT | Resolution: 1024×1024; Denoising steps: 4; Batch size: 1 | 23 |
| SDXL 1.0 | BF16 | TensorRT | Resolution: 1024×1024; Denoising steps: 50; Batch size: 2 | 7 |

Table 2. Image-generation performance
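
As a point of reference, the SDXL 1.0 configuration in Table 2 (BF16, 1024×1024, 50 denoising steps, batch size 2) maps directly onto a few lines of Hugging Face diffusers. This sketch uses the stock PyTorch pipeline; the TensorRT backend used for the benchmark is a separate, optimized path.

```python
# Minimal SDXL image-generation sketch with Hugging Face diffusers.
# Settings mirror the Table 2 row; the prompt is illustrative.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,   # BF16, as in Table 2
).to("cuda")

prompt = "a photorealistic mountain lake at sunrise"
images = pipe(
    prompt=[prompt] * 2,          # batch size 2
    height=1024, width=1024,      # 1K resolution
    num_inference_steps=50,       # 50 denoising steps
).images

for i, img in enumerate(images):
    img.save(f"sdxl_{i}.png")
```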

Using DGX Spark for data science

DGX Spark supports foundational CUDA-X libraries like NVIDIA cuML and cuDF. NVIDIA cuML accelerates machine-learning algorithms in scikit-learn, as well as UMAP and HDBSCAN, on GPUs with zero code changes required.

For computationally intensive ML algorithms like UMAP and HDBSCAN, DGX Spark can process 250 MB datasets in seconds (see Table 3). NVIDIA cuDF significantly accelerates common pandas data analysis tasks like joins and string methods. cuDF pandas operations on datasets with tens of millions of records run in just seconds on DGX Spark.

Data science

| Library | Benchmark | Dataset size | Time |
| --- | --- | --- | --- |
| NVIDIA cuML | UMAP | 250 MB | 4 secs |
| NVIDIA cuML | HDBSCAN | 250 MB | 10 secs |
| NVIDIA cuDF pandas | Key data analysis operations (joins, string methods, UDFs) | 0.5 to 5 GB | 11 secs |

Table 3. Data-science performance
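
The zero-code-change workflow looks like the sketch below: activate the cuML accelerator before importing the CPU libraries, and the unchanged umap and hdbscan calls run on the GPU (cuDF pandas works analogously via `%load_ext cudf.pandas` in a notebook or `python -m cudf.pandas script.py`). This assumes a recent cuML release with the `cuml.accel` module; the dataset shape is an assumption, sized to roughly match Table 3’s 250 MB benchmarks.

```python
# Zero-code-change GPU acceleration sketch with cuML's accelerator mode.
# install() must run before the CPU libraries are imported so their
# estimators can be routed to cuML equivalents.
import cuml.accel
cuml.accel.install()

import numpy as np
import umap      # unchanged CPU API, now GPU-backed
import hdbscan   # unchanged CPU API, now GPU-backed

# Synthetic features: 500k rows x 128 float32 cols ~= 256 MB,
# in the same ballpark as the Table 3 datasets.
X = np.random.rand(500_000, 128).astype(np.float32)

embedding = umap.UMAP(n_components=2).fit_transform(X)
labels = hdbscan.HDBSCAN(min_cluster_size=50).fit_predict(embedding)
print(f"found {labels.max() + 1} clusters")
```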

Using DGX Spark for inference

DGX Spark’s Blackwell GPU supports the FP4 data format, specifically the NVFP4 data format, which delivers near-FP8 accuracy (<1% degradation). This enables the use of smaller models without sacrificing accuracy. The smaller data footprint of FP4 also improves performance. Table 4 provides inference performance data for DGX Spark.

DGX Spark supports a variety of 4-bit data formats, including NVFP4 and MXFP4, as well as many backends such as TRT-LLM, llama.cpp, and vLLM. The system’s 1 petaflop of AI performance enables fast prompt processing, as shown in Table 4. Fast prompt processing results in a shorter time to first response token, which delivers a better experience for users and accelerates end-to-end throughput.

Inference (ISL|OSL = 2048|128, BS = 1)

| Model | Precision | Backend | Prompt processing throughput (tokens/sec) | Token generation throughput (tokens/sec) |
| --- | --- | --- | --- | --- |
| Qwen3 14B | NVFP4 | TRT-LLM | 5,928.95 | 22.71 |
| GPT-OSS-20B | MXFP4 | llama.cpp | 3,670.42 | 82.74 |
| GPT-OSS-120B | MXFP4 | llama.cpp | 1,725.47 | 55.37 |
| Llama 3.1 8B | NVFP4 | TRT-LLM | 10,256.90 | 38.65 |
| Qwen2.5-VL-7B-Instruct | NVFP4 | TRT-LLM | 65,831.77 | 41.71 |
| Qwen3 235B (on dual DGX Spark) | NVFP4 | TRT-LLM | 23,477.03 | 11.73 |

Table 4. Inference performance

NVFP4 is a 4-bit floating point format introduced with the NVIDIA Blackwell GPU architecture. MXFP4 (Microscaling FP4) is a 4-bit floating point format created by the Open Compute Project (OCP). ISL (input sequence length) is the number of tokens in the input prompt (a.k.a. prefill tokens); OSL (output sequence length) is the number of tokens generated by the model in response (a.k.a. decode tokens).
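
To give a concrete sense of the TRT-LLM path in Table 4, here is a minimal sketch using TensorRT-LLM’s high-level LLM API. The NVFP4 checkpoint name is a placeholder assumption, and this is illustrative rather than the benchmark setup behind the table.

```python
# Minimal TensorRT-LLM inference sketch via its high-level LLM API.
from tensorrt_llm import LLM, SamplingParams

# Assumed NVFP4-quantized checkpoint name; substitute a real repo id
# or a local path to a prequantized model.
llm = LLM(model="nvidia/Llama-3.1-8B-Instruct-FP4")

prompts = ["Explain why 4-bit formats shrink a model's memory footprint."]
outputs = llm.generate(prompts, SamplingParams(max_tokens=128))

for out in outputs:
    print(out.outputs[0].text)
```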

We also connected two DGX Sparks together via their ConnectX-7 chips to run the Qwen3 235B model. The model uses over 120 GB of memory, including overhead. Such models typically run on large cloud or data-center servers, but the fact that they can run on dual DGX Spark systems shows what’s possible for developer experimentation. As shown in the last row of Table 4, token generation throughput on dual DGX Sparks was 11.73 tokens per second.

The new NVFP4 version of the NVIDIA Nemotron Nano 2 model also performs well on DGX Spark. With the NVFP4 version, you can achieve up to 2x higher throughput with little to no accuracy degradation. Download the model checkpoints from Hugging Face or as an NVIDIA NIM.
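
Pulling checkpoints from Hugging Face is a one-liner with huggingface_hub; in the sketch below, the repo id is a placeholder, not the actual Nemotron Nano 2 NVFP4 repository name.

```python
# Hedged checkpoint-download sketch with huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="nvidia/Nemotron-Nano-2-NVFP4",  # hypothetical repo id; replace it
    local_dir="nemotron-nano-2-nvfp4",
)
print(f"Checkpoints downloaded to {local_dir}")
```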

Get your DGX Spark, join the DGX Spark developer community, and start your AI-building journey today.
