A defining strength of the NVIDIA software ecosystem is its commitment to continuous optimization. NVIDIA Jetson AGX Thor launched in August with up to a 5x boost in generative AI performance over NVIDIA Jetson AGX Orin. Through software updates since that release, Jetson Thor now delivers a 7x increase in generative AI throughput.
With this proven approach, demonstrated previously on NVIDIA Jetson Orin and NVIDIA Jetson AGX Xavier, developers can expect these improvements on models such as Llama and DeepSeek, and similar gains are expected for future model releases. In addition to consistent software enhancements, NVIDIA also provides support for leading models, often within days of their launch. This lets developers experiment with the latest AI models early on.
The Jetson Thor platform also supports major quantization formats, including the new NVFP4 format from the NVIDIA Blackwell GPU architecture, helping optimize inference even further. New techniques like speculative decoding are also supported, offering an additional way to speed up generative AI workloads at the edge.
Continuous software optimization
With the recent vLLM container release, Jetson Thor delivers up to 3.5x greater performance on the same model and quantization compared to its launch-day performance in late August. Table 1 shows the output tokens/sec on the Llama 3.3 70B and DeepSeek R1 70B models at launch in August compared to the latest benchmarked numbers from September 2025.
| Family | Model | Jetson AGX Thor Sept 2025 (output tokens/sec) | Jetson AGX Thor Aug 2025 (output tokens/sec) | Jetson AGX Thor speedup compared to launch |
| --- | --- | --- | --- | --- |
| Llama | Llama 3.3 70B | 41.5 | 12.64 | 3.3 |
| DeepSeek | DeepSeek R1 70B | 40.29 | 11.5 | 3.5 |
Table 1. Output tokens/sec on Llama 3.3 and DeepSeek R1 at launch compared to the latest benchmarks
Configuration for these benchmarks: Input Sequence Length: 2048; Output Sequence Length: 128; Max Concurrency: 8; Power Mode: MAXN
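The MAXN power mode matters for reproducing numbers like these. A minimal sketch of locking the module to its maximum power profile follows; note that the MAXN mode index varies by Jetson platform and JetPack release, so confirm it in /etc/nvpmodel.conf on your device first.
```bash
# Query the active power mode, switch to MAXN, and pin clocks for benchmarking.
# NOTE: the MAXN mode index differs across Jetson platforms and JetPack
# releases; verify it in /etc/nvpmodel.conf before running.
sudo nvpmodel -q         # show the currently active power mode
sudo nvpmodel -m 0       # 0 is commonly the MAXN profile (verify on your unit)
sudo jetson_clocks       # lock clocks at maximum for stable benchmark runs
```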
Jetson Thor also now supports EAGLE-3 speculative decoding in vLLM containers to further increase the performance of generative AI models. For example, on Llama 3.3 70B with speculative decoding, you can get 88.62 output tokens/sec, a 7x speedup compared to launch.


Run the latest models with day 0 support
Developers can run the latest and greatest generative AI models at the edge on Jetson Thor with day 0 support. For example, gpt-oss was supported on llama.cpp/ollama on day 0 of its launch on Jetson AGX Thor, and it's supported on vLLM as well. Similarly, you'll find week 0 support for many NVIDIA Nemotron models.
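As an illustration of that day 0 support, a minimal way to try gpt-oss through ollama on Jetson AGX Thor might look like the sketch below. It assumes ollama is already installed on the device, and the 20B variant is used purely as an example.
```bash
# Pull and chat with the 20B gpt-oss variant via ollama (example only;
# assumes ollama is installed on the Jetson and has GPU access).
ollama pull gpt-oss:20b
ollama run gpt-oss:20b "Summarize why on-device inference matters for robotics."
```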
Get max gen AI performance with Jetson Thor
Jetson Thor is powerful for generative AI at the edge, but using it to its full advantage requires the right techniques. This section is your guide to getting the most out of the platform. We'll dive into quantization and speculative decoding, the two key strategies for accelerating LLM and VLM inference. We'll finish with a tutorial showing how to benchmark your models on Jetson Thor. This gives you a clear path for choosing the best model and configuration for your specific use case.
Quantization: Shrinking model size, speeding up inference
At its core, quantization is the process of reducing the numerical precision of a model's data (its weights and activations). Think of it like using fewer decimal places to represent a number: it's not exactly the same, but it's close enough and much more efficient to store and compute with. We typically move from the standard 16-bit formats (like FP16 or BF16) to lower-bit formats like 8-bit or 4-bit.
This gives you two huge wins:
- Smaller memory footprint
This is the key that unlocks larger models on-device. By cutting the number of bytes needed for each parameter, you can load models that would otherwise be too big. As a rule of thumb (the sketch after this list walks through the arithmetic), a 70-billion-parameter model's weights take up about:
- 140 GB in 16-bit floating point (FP16), which won't fit in Thor's 128 GB of memory.
- 70 GB in 8-bit floating point (FP8), which fits with room to spare.
- 35 GB in 4-bit, enabling multiple large models.
- Faster memory access
Smaller weights mean fewer bytes to pull from memory into the compute cores. This directly reduces latency, which is critical for edge applications where time-to-first-token affects responsiveness and user experience.
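The rule of thumb above is just parameter count times bytes per weight. Here is a minimal Python sketch of that arithmetic; it ignores activation and KV-cache overhead, which you'd add on top.
```python
# Back-of-the-envelope weight footprint: parameters x bytes per weight.
# Activations and the KV cache are extra and not counted here.

def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    bytes_per_weight = bits_per_weight / 8
    return num_params * bytes_per_weight / 1e9  # decimal GB, as in the text

for fmt, bits in [("FP16", 16), ("FP8", 8), ("4-bit weights (W4A16)", 4)]:
    gb = weight_footprint_gb(70e9, bits)
    fits = "fits in 128 GB" if gb < 128 else "does not fit in 128 GB"
    print(f"70B @ {fmt}: ~{gb:.0f} GB of weights ({fits})")
```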
Let's look at the two formats that matter most on Jetson Thor.
FP8
FP8 is your go-to for an almost lossless first step in optimization. A 70B model's 16-bit weights are too large for Jetson Thor's memory once you account for activations and the KV cache. By halving the weight memory, FP8 makes it practical to load and run that same model on-device. When properly calibrated, FP8's accuracy is incredibly close to the FP16 baseline (often with a drop of less than 1%), making it a "safe first step" for chat and general workloads, though sensitive tasks like math or code generation may require extra tuning.
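If you don't have a pre-quantized FP8 checkpoint handy, vLLM can quantize weights to FP8 on the fly at load time via its --quantization flag. A hedged sketch follows, using a smaller Llama checkpoint purely as an illustration; the model name and context length are example values, and the flag assumes the Jetson vLLM container exposes standard vLLM options.
```bash
# Serve a model with on-the-fly FP8 weight quantization in vLLM.
# Model name and --max-model-len are illustrative; adjust for your deployment.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --quantization fp8 \
  --max-model-len 8192
```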
W4A16: 4-bit weights and 16-bit activations
W4A16 unlocks massive models at the edge by quantizing static model weights to an ultra-compact 4 bits, while keeping the dynamic, in-flight calculations (the activations) at higher-precision 16 bits. This trade-off makes it possible to fit models with over 175B parameters on a single Jetson Thor, leaving plenty of headroom for their activations. Serving multiple large models at once (for example, two 70B models) is a feat that was a major challenge for previous Jetson generations.
Which format should you use?
Our recommendation is simple: start with W4A16. It typically delivers the highest inference speeds and the lowest memory footprint. If you test the quantized model on your task and find that the accuracy meets your quality bar, stick with it.
If your task is more complex (like nuanced reasoning or code generation) and you find W4A16's accuracy isn't quite there, switch to FP8. It's still fast, keeps memory usage low, and provides more than enough quality for most edge use cases.
Speculative decoding: Boost inference with a draft-verification decoding approach
Once you've picked a quantization format, the next big performance lever is speculative decoding. This technique speeds up inference by using two models: a small, fast "draft" model and your large, accurate "target" model.
Here's how it works:
- The draft model quickly generates a chunk of candidate tokens (a "guess" at what comes next).
- The target model then validates the entire chunk in a single pass instead of generating one token at a time.
This "draft-and-verify" process generates multiple tokens per cycle while guaranteeing the final output is identical to what the target model would produce alone. Your success is measured by the acceptance rate, the percentage of draft tokens accepted. A high rate yields significant latency wins, while a low rate can add overhead, so it's crucial to benchmark with prompts that reflect your workload. Your main lever for improving it is the choice of draft model: start with one architecturally similar to your target, and for specialized domains, consider fine-tuning a custom draft model to maximize the acceptance rate.
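To make the mechanics concrete, here is a toy Python sketch of the draft-and-verify loop under greedy decoding. The draft_model and target_model callables are stand-ins, not real APIs, and real implementations such as EAGLE-3 in vLLM also handle sampling and tree-structured drafts.
```python
# Toy greedy speculative decoding: draft k tokens, verify them with the target
# in one pass, keep the longest accepted prefix, then append the target's own
# next token. `draft_model` and `target_model` are illustrative stand-ins.

def speculative_decode(prompt, draft_model, target_model, k=5, max_new_tokens=64):
    tokens = list(prompt)
    accepted_total = drafted_total = 0

    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft: the small model proposes k candidate tokens, one at a time.
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))

        # 2) Verify: the target scores the whole drafted chunk at once.
        #    `target_model(context, draft)` returns len(draft) + 1 greedy
        #    tokens, where entry i is the target's choice given
        #    context + draft[:i] (a single forward pass in practice).
        target_choices = target_model(tokens, draft)

        # 3) Accept the longest prefix where draft and target agree.
        n_accept = 0
        while n_accept < len(draft) and draft[n_accept] == target_choices[n_accept]:
            n_accept += 1

        # The target always contributes one guaranteed-correct token, so each
        # cycle emits at least one token even when nothing is accepted.
        tokens += draft[:n_accept] + [target_choices[n_accept]]
        accepted_total += n_accept
        drafted_total += len(draft)

    acceptance_rate = accepted_total / max(drafted_total, 1)
    return tokens, acceptance_rate
```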
In our experiments, we found that EAGLE-3 speculative decoding delivered the best speedups. In our benchmarks on Llama 3.3 70B (W4A16), it delivered a 2.5x performance uplift, boosting throughput from 6.27 to 16.19 tokens/sec using vLLM with a concurrency of 1. We benchmarked this using the ShareGPT dataset, but you should always test on your own data to validate performance for your specific use case.
Putting quantization and speculative decoding together
The real magic happens when you combine these techniques. We used vLLM, which has great built-in support for EAGLE-3. Here's an example command we used to serve the Llama 3.3 W4A16 model with speculative decoding enabled.
vllm serve "RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16" --trust_remote_code --speculative-config '{"method":"eagle3","model":"yuhuili/EAGLE3-LLaMA3.3-Instruct-70B","num_speculative_tokens":5}'
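Once the server is up, you can send requests to vLLM's OpenAI-compatible endpoint. A minimal sketch is shown below; the port is vLLM's default and the prompt and max_tokens values are just placeholders.
```bash
# Query the running vLLM server's OpenAI-compatible endpoint (default port 8000).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16",
        "messages": [{"role": "user", "content": "Give me three tips for edge AI deployment."}],
        "max_tokens": 128
      }'
```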
To make getting started easier, NVIDIA is releasing a standalone vLLM container that supports Jetson Thor and is updated monthly with the latest improvements.
Here's a step-by-step guide to finding the best balance between model quality and inference performance:
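A typical way to launch such a container on a Jetson device is sketched below. The image name is a placeholder (check NGC or Jetson AI Lab for the actual Jetson Thor vLLM image and tag), and the runtime and mount flags follow common Jetson container practice rather than a documented command.
```bash
# Launch the vLLM container with GPU access (image name is a placeholder;
# look up the real Jetson Thor vLLM image and tag on NGC / Jetson AI Lab).
docker run --rm -it \
  --runtime nvidia \
  --ipc=host \
  --network host \
  -v ~/models:/models \
  <JETSON_THOR_VLLM_IMAGE:TAG>
# Inside the container, the `vllm serve` command shown above works unchanged.
```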
- Establish a quality baseline. Before optimizing, load your model at the highest precision you can (FP16 preferably, but if the model is too big, FP8 also works) and simply confirm that it performs your task correctly.
- Optimize with quantization. Progressively lower the weight precision (for example, to W4A16), testing for accuracy at each step. Stop when the quality no longer meets your requirements.
- Benchmark against reality. Validate your final setup using a performance benchmark that mimics your workload, whether that involves high concurrency, large context windows, or long output sequences.
If your chosen model still isn't fast enough, repeat this process with a smaller one. To see exactly how to run these performance benchmarks, follow our hands-on tutorial on Jetson AI Lab.
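For reference, a serving benchmark that mirrors the configuration behind Table 1 (2048 input tokens, 128 output tokens, concurrency 8) might look roughly like the sketch below, run against an already-started `vllm serve` endpoint. The `vllm bench serve` subcommand and these flag names are assumptions based on recent vLLM releases; check `vllm bench serve --help` (or the benchmark scripts in the vLLM repo) inside your container.
```bash
# Sketch of a serving benchmark against a running vLLM server.
# Flag names follow recent vLLM releases and may differ in your container;
# the model name matches the serve command used earlier.
vllm bench serve \
  --model RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16 \
  --dataset-name random \
  --random-input-len 2048 \
  --random-output-len 128 \
  --num-prompts 64 \
  --max-concurrency 8
```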
Now you can confidently improve your generative AI model performance on Jetson Thor. Get your Jetson AGX Thor Developer Kit today and download the latest NVIDIA JetPack 7 to start your journey.
