Latest Software and Model Optimizations Supercharge NVIDIA DGX Spark

Since its release, NVIDIA has continued to push the performance of the Grace Blackwell-powered DGX Spark through continuous software optimization and close collaboration with software partners and the open-source community. These efforts are delivering meaningful gains across inference, training and creative workflows.

At CES 2026, the newest DGX Spark software release, combined with the latest model updates and open-source libraries, provides significant performance improvements for both DGX Spark and OEM GB10-based systems.

Scaling large models locally with unified memory and NVFP4

DGX Spark is designed for working locally with large models, featuring 128GB of unified memory in a compact desktop form factor. Two DGX Spark systems can be connected to deliver 256GB of combined memory, enabling developers to run even larger models locally.

The systems connect using ConnectX-7 networking, providing 200 Gbps of bandwidth for fast, low-latency multi-node workloads. 
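
The 200 Gbps figure translates into useful back-of-the-envelope numbers for multi-node work. The sketch below estimates transfer time across the link; the 80% link-efficiency factor is an assumption for illustration, not a measured value.

```python
# Back-of-the-envelope estimate of cross-node transfer time over a
# 200 Gbps ConnectX-7 link (illustrative numbers, not measurements).

def transfer_time_ms(payload_bytes: float, link_gbps: float = 200.0,
                     efficiency: float = 0.8) -> float:
    """Time in ms to move payload_bytes across the link, assuming only
    the given fraction of nominal bandwidth is achievable in practice."""
    effective_bytes_per_s = link_gbps * 1e9 / 8 * efficiency  # bits -> bytes
    return payload_bytes / effective_bytes_per_s * 1e3

# Example: shipping a 1 GiB slice of activations between the two systems.
print(round(transfer_time_ms(1 * 1024**3), 1))  # -> 53.7 (ms at 80% efficiency)
```

At these speeds, exchanging per-layer activations between two nodes adds only tens of milliseconds, which is what makes dual-system inference practical.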

Support for the NVIDIA NVFP4 data format enables next-generation models to dramatically reduce memory footprint while boosting throughput. For instance, running the Qwen-235B model using NVFP4 precision and speculative decoding delivers as much as a 2.6x performance increase compared with FP8 execution on the same dual DGX Spark configuration.

With FP8 precision, the model saturates the combined memory of the two systems, limiting multitasking and overall responsiveness. Quantizing to NVFP4 reduces memory usage by roughly 40% while maintaining high accuracy, allowing developers to achieve FP8-equivalent results with significantly higher performance and enough free memory to run multiple other workloads concurrently. The result is a noticeably more responsive and productive local AI development experience.
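
The memory arithmetic behind this is straightforward: FP8 stores one byte per parameter, NVFP4 four bits plus per-block scaling metadata. The sketch below is illustrative; the 12.5% scale overhead is an assumption (one 8-bit scale per 16-element block), and real deployments add KV cache and runtime buffers on top.

```python
def weights_gb(n_params_b: float, bits_per_param: float,
               scale_overhead: float = 0.0) -> float:
    """Approximate weight memory in GB for n_params_b billion parameters.
    scale_overhead models per-block scaling factors in low-bit formats."""
    return n_params_b * bits_per_param / 8 * (1 + scale_overhead)

fp8 = weights_gb(235, 8)                           # ~235 GB: near the 256 GB dual-node limit
nvfp4 = weights_gb(235, 4, scale_overhead=0.125)   # ~132 GB incl. block-scale metadata
print(round(fp8), round(nvfp4), round(1 - nvfp4 / fp8, 2))  # -> 235 132 0.44
```

That ~44% reduction in weight memory is in line with the roughly 40% overall saving quoted above once runtime buffers are included, and it is the freed headroom that allows other workloads to run alongside the model.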

Open-source collaboration drives additional performance gains

NVIDIA’s collaboration with open-source software partners continues to push performance further. Llama.cpp updates deliver an average 35% performance uplift when running mixture-of-experts (MoE) models on DGX Spark, improving both throughput and efficiency for popular open-source workflows.

Figure 1. DGX Spark Llama.cpp performance improvements

A robust desktop platform for creators

While DGX Spark is an exceptional platform for AI developers, creators can also benefit from its desktop-class capabilities.

By offloading AI workloads to DGX Spark, creators free up their laptop or PC to stay responsive while content is being generated. With 128GB of unified memory, DGX Spark can run large models such as GPT-OSS-120B or FLUX 2 (90GB) at full precision, enabling the highest-quality outputs without compromise.

Leading diffusion models, including FLUX.2 from Black Forest Labs and Qwen-Image from Alibaba, leverage NVFP4 to reduce memory footprint while delivering higher performance.

AI video generation is especially well suited to DGX Spark, as it demands both significant memory and compute. The new LTX-2 audio-video generation model from Lightricks, featuring NVFP4-optimized weights, delivers substantial performance gains over the previous generation, making high-quality video generation practical on the desktop.

DGX Spark now included in the NVIDIA-Certified Systems program

The NVIDIA-Certified Systems program validates system performance across a wide range of accelerated graphics, compute and AI workloads. NVIDIA-Certified Systems provide a trusted foundation for AI development, desktop inference, data science, design and content creation workloads, while also augmenting data center and cloud resources.

DGX Spark and OEM GB10-based systems are now being added to the program, with DGX Spark and partner systems currently in testing.

New playbooks to help you get started faster

To help developers get productive right away, we’re releasing a new set of DGX Spark playbooks that showcase what’s possible with the Blackwell GPU. These playbooks focus on practical, hands-on workflows you can try immediately, including:

  • Nemotron 3 Nano: Run NVIDIA’s efficient 30B-parameter MoE model locally for LLM experimentation.
  • Live VLM WebUI: Stream webcam input into vision-language models for real-time analysis, with GPU utilization monitoring.
  • Isaac Sim / Lab: Build and train robotics applications using GPU-accelerated simulation and reinforcement learning.
  • SGLang and vLLM serving playbooks: Now include a clear model support matrix showing tested and supported models and quantization options.
  • GPU-accelerated quantitative finance and genomics playbooks: Workflows with minimal code changes compared with CPU implementations.
  • Fine-tune with PyTorch: Distributed fine-tuning across two DGX Spark systems for LLMs up to 70B parameters using FSDP and LoRA.
  • Speculative decoding: A new EAGLE-3 with GPT-OSS-120B example uses a built-in drafting head instead of a separate draft model, simplifying deployment and increasing token acceptance rates.
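
Why acceptance rate matters for speculative decoding can be seen with the standard expected-tokens model: if the draft proposes k tokens and each is accepted with probability alpha, the expensive target-model pass emits several tokens at once. This is an illustrative simplification (real acceptance is not independent per token), not the EAGLE-3 algorithm itself.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass when a
    draft proposes k tokens, each accepted independently with prob alpha
    (the verification pass always contributes at least one token)."""
    # sum_{i=0..k} alpha^i = (1 - alpha^(k+1)) / (1 - alpha)
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# Higher acceptance rates (the benefit of a built-in drafting head)
# mean more tokens land per expensive verification pass.
for alpha in (0.6, 0.8, 0.9):
    print(round(expected_tokens_per_step(alpha, k=4), 2))  # -> 2.31, 3.36, 4.1
```

Raising acceptance from 0.6 to 0.9 nearly doubles the tokens produced per verification pass, which is why drafting-head quality translates directly into end-to-end speedup.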

Each playbook is designed to be straightforward and reliable, with clear steps, practical troubleshooting guidance and configurations validated on the latest DGX OS, so you can spend less time setting up and more time building.

Access your DGX Spark from anywhere with NVIDIA Brev

With NVIDIA Brev, your DGX Spark is accessible from anywhere through a secure connection. Brev enables developers to easily spin up AI cloud instances and take advantage of Launchables, using a single click to set up AI environments. At CES, updates to Brev demonstrated the ability to register local compute, such as DGX Spark. Once registered with Brev, you can access your DGX Spark from anywhere. You can also securely share access with your team.

Brev enables hybrid deployment between local and cloud models. Using a router layer, you can keep sensitive tasks, such as email or proprietary data processing, on a local open model running on DGX Spark, while routing general reasoning to frontier models in the cloud. See the NVIDIA LLM Router developer example for implementation details.
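
The routing idea can be sketched in a few lines. This is a minimal illustration with hypothetical endpoint names and a naive keyword rule; the NVIDIA LLM Router example uses a proper classifier, but the dispatch shape is the same.

```python
# Minimal sketch of a local/cloud router layer. Endpoint names and the
# keyword rule are hypothetical, for illustration only.

SENSITIVE_KEYWORDS = {"email", "invoice", "payroll", "confidential"}

def route(prompt: str) -> str:
    """Send prompts touching private data to a local model on DGX Spark;
    route everything else to a cloud frontier model."""
    words = set(prompt.lower().split())
    if words & SENSITIVE_KEYWORDS:
        return "local:dgx-spark/gpt-oss-120b"   # hypothetical local endpoint
    return "cloud:frontier-model"               # hypothetical cloud endpoint

print(route("Summarize this confidential invoice"))  # stays local
print(route("Explain how transformers work"))        # goes to the cloud
```

In practice the router sits in front of both endpoints, so client code calls one API and the sensitive-data guarantee is enforced in a single place.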

Brev support for local compute will be previewed at CES, with official support coming in spring 2026.

Bring your own agent to life

Ready to take it further? NVIDIA and Hugging Face have partnered to show how you can build a personal desktop AI companion. Using DGX Spark with Reachy Mini, you can create a private AI assistant that processes your data locally. Try the NVIDIA and Hugging Face tutorial to get started.

Join the DGX Spark developer community and begin your AI-building journey today.
