Bringing AI Closer to the Edge and On-Device with Gemma 4



The Gemmaverse expands with the launch of the newest Gemma 4 multimodal and multilingual models, designed to scale across the full spectrum of deployments, from NVIDIA Blackwell in the data center to Jetson at the edge. These models are suited to meet the growing demand for local deployment for AI development and prototyping, secure on-prem requirements, cost efficiency, and latency-sensitive use cases. The latest generation improves both efficiency and accuracy, making these general-purpose models well-suited for a wide range of common tasks:

  • Reasoning: Strong performance on complex problem-solving tasks.
  • Coding: Code generation and debugging for developer workflows.
  • Agents: Native support for structured tool use (function calling). 
  • Vision, video, and audio capability: Enables rich multimodal interactions for use cases such as object recognition, automated speech recognition (ASR), document and video intelligence, and more.
  • Interleaved multimodal input: Freely mix text and pictures in any order inside a single prompt. 
  • Multilingual: Out-of-the-box support for over 35 languages, and pre-trained on over 140 languages. 
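As a concrete illustration of the structured tool-use support above, here is a minimal sketch of an OpenAI-style function-calling request body, the format accepted by the OpenAI-compatible endpoints in vLLM and Ollama. The tool schema and model id are illustrative assumptions, not an official Gemma 4 API:

```python
import json

# Hypothetical tool schema in the common OpenAI-style "tools" format;
# the function name and fields below are illustrative.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body a client would POST to an OpenAI-compatible
# /v1/chat/completions endpoint (as served by vLLM or Ollama).
request_body = {
    "model": "gemma-4-26b-a4b",  # illustrative model id
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather],
    "tool_choice": "auto",
}

print(json.dumps(request_body)[:60])
```

With `tool_choice` set to `auto`, the model decides whether to answer directly or emit a structured call to the declared tool.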

The release includes four models, including Gemma’s first MoE model; each fits on a single NVIDIA H100 GPU and supports over 140 languages. The 31B and 26B-A4B variants are high-performing reasoning models suited to both local and data center environments. The E4B and E2B are the latest edition of the on-device and mobile-focused designs first launched with Gemma 3n.

| Model Name | Architecture Type | Total Parameters | Active or Effective Parameters | Input Context Length (Tokens) | Sliding Window (Tokens) | Modalities |
|---|---|---|---|---|---|---|
| Gemma-4-31B | Dense Transformer | 31B | — | 256K | 1024 | |
| Gemma-4-26B-A4B | MoE – 128 Experts | 26B | 3.8B | 256K | — | |
| Gemma-4-E4B | Dense Transformer | 7.9B with embeddings | 4.5B effective | 128K | 512 | Text, Audio, Vision, Video |
| Gemma-4-E2B | Dense Transformer | 5.1B with embeddings | 2.3B effective | 128K | 512 | Text, Audio, Vision, Video |

Table 1. Overview of the Gemma 4 model family, summarizing architecture types, parameter sizes, effective parameters, supported context lengths, and available modalities to help developers select the right model for data center, edge, and on-device deployments.
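The single-H100 claim is easy to sanity-check with a back-of-envelope estimate: BF16 stores 2 bytes per parameter, so even the largest model's weights sit well under an H100's 80 GB (KV cache and activations add overhead on top). A quick sketch:

```python
def bf16_weight_gib(params_billion: float) -> float:
    """Approximate weight memory in GiB at BF16 (2 bytes per parameter)."""
    return params_billion * 1e9 * 2 / 2**30

# Parameter counts taken from Table 1 above.
for name, params in [("Gemma-4-31B", 31), ("Gemma-4-26B-A4B", 26),
                     ("Gemma-4-E4B", 7.9), ("Gemma-4-E2B", 5.1)]:
    print(f"{name}: ~{bf16_weight_gib(params):.1f} GiB of BF16 weights")

# Gemma-4-31B comes to roughly 57.7 GiB of weights, comfortably under
# 80 GB before KV cache and activation overhead.
```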

Each model is available on Hugging Face with BF16 checkpoints, and an NVFP4 quantized checkpoint for Gemma-4-31B will be available soon for NVIDIA Blackwell developers.

Run intelligent workloads on-device 

As AI workflows and agents become more integrated into everyday applications, the ability to run these models beyond traditional data center environments is becoming critical. The NVIDIA suite of client and edge systems, from RTX GPUs and DGX Spark to Jetson Nano, gives developers the flexibility to manage cost and latency while supporting security requirements for highly regulated industries such as healthcare and finance.

We collaborated with vLLM, Ollama, and llama.cpp to deliver the best local deployment experience for each of the Gemma 4 models. Unsloth also provides day-one support with optimized and quantized models for efficient local deployment via Unsloth Studio.

Check out the RTX AI Garage blog post to get started with Gemma 4 on RTX GPUs and DGX Spark.

| | DGX Spark | Jetson | RTX / RTX PRO |
|---|---|---|---|
| Use Case | AI research and prototyping | Edge AI and robotics | Desktop apps and Windows development |
| Key Highlights | A preinstalled NVIDIA AI software stack and 128 GB of unified memory power local prototyping, fine-tuning, and fully local OpenClaw workflows | Near-zero latency due to architecture features such as conditional parameter loading and per-layer embeddings, which can be cached for faster inference and reduced memory use (more info) | Optimized performance for local inference for hobbyists, creators, and professionals |
| Getting Started Guide | DGX Spark Playbooks for vLLM, Ollama, Unsloth, and llama.cpp deployment guides; NeMo Automodel for fine-tuning on Spark guide | Jetson AI Lab for tutorials and custom Gemma containers | RTX AI Garage for Ollama and llama.cpp guides. RTX PRO owners can use vLLM as well. |

Table 2. Comparison of local deployment options across NVIDIA platforms, highlighting primary use cases, key capabilities, and recommended getting-started resources for DGX Spark, Jetson, and RTX / RTX PRO systems running Gemma 4 models.

Build secure agentic AI workflows with DGX Spark

AI developers and enthusiasts benefit from the GB10 Grace Blackwell Superchip paired with 128 GB of unified memory in DGX Spark, providing the resources needed to run Gemma 4 31B with BF16 model weights. Combined with DGX Linux OS and the full NVIDIA software stack, developers can efficiently prototype and build agentic AI workflows with Gemma 4 while maintaining private, secure on-device execution.

The vLLM inference engine is designed to run LLMs efficiently, maximizing throughput while minimizing memory usage. Using vLLM high-throughput LLM serving on DGX Spark provides a high-performance platform for the largest Gemma 4 models; the vLLM for Inference DGX Spark playbook provides the details to get vLLM running with Gemma 4 on your DGX Spark. Alternatively, get started with Gemma 4 using Ollama or llama.cpp. Users can further fine-tune the models on DGX Spark with NeMo Automodel.
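vLLM's memory efficiency comes largely from PagedAttention, which allocates the KV cache in fixed-size blocks as tokens are generated instead of reserving the full context window per sequence up front. The core idea can be sketched in a few lines (the block size and token counts below are illustrative, not vLLM's actual configuration):

```python
BLOCK_TOKENS = 16  # tokens per KV-cache block (illustrative)

def blocks_needed(seq_len: int) -> int:
    """KV-cache blocks required for a sequence under paged allocation."""
    return -(-seq_len // BLOCK_TOKENS)  # ceiling division

# Naive allocation reserves the full context window for every sequence;
# paged allocation grows with the tokens actually in the sequence.
context_window = 256_000   # Gemma 4 31B context length from Table 1
actual_tokens = 1_200      # a typical short chat exchange

naive = blocks_needed(context_window)
paged = blocks_needed(actual_tokens)
print(paged, naive)  # 75 blocks vs 16000 blocks
```

The freed blocks are what let vLLM batch many more concurrent sequences on the same GPU memory.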

Power physical AI agents with Jetson  

Modern physical AI agents are evolving rapidly with Gemma 4 models that integrate audio, multimodal perception, and deep reasoning capabilities. These advanced models enable robotics systems to move beyond simple task execution, allowing them to understand speech, interpret visual context, and reason intelligently before taking action. On NVIDIA Jetson, developers can run Gemma 4 inference at the edge using llama.cpp and vLLM. Jetson Orin Nano supports the Gemma 4 E2B and E4B variants, enabling multimodal inference on small, embedded, and power-constrained systems, with the same model family scaling across the Jetson platform up to Jetson Thor.

This supports scalable deployment across robotics, smart machines, and industrial automation use cases that depend on low-latency performance and on-device intelligence.

Jetson developers can check out the tutorial and download the container to get started from the Jetson AI Lab.

Production-ready deployment with NVIDIA NIM

Enterprise developers can try the Gemma 4 31B model for free using an NVIDIA-hosted NIM API available in the NVIDIA API catalog for prototyping. For production deployment, they can use prepackaged and optimized NIM microservices for secure, self-hosted deployment with an NVIDIA Enterprise License.

Day 0 fine-tuning with NeMo Framework 

Developers can customize Gemma 4 with their own domain data using the NVIDIA NeMo framework, specifically the NeMo Automodel library, which combines native PyTorch ease of use with optimized performance. Using this fine-tuning recipe for Gemma 4, developers can apply techniques such as supervised fine-tuning (SFT) and memory-efficient LoRA to perform day-0 fine-tuning directly from Hugging Face model checkpoints without the need for conversion.
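LoRA's memory efficiency comes from freezing the pretrained weights and training only a low-rank update to each weight matrix. A minimal NumPy sketch of the forward pass (the hidden size, rank, and scaling factor are illustrative, not the recipe's actual hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                           # hidden size and LoRA rank (illustrative)
alpha = 16                              # LoRA scaling factor

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = x W^T + (alpha/r) * x A^T B^T; only A and B receive gradients."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d))
# With B zero-initialized, the adapter starts as an exact no-op,
# so fine-tuning begins from the pretrained model's behavior.
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable parameters shrink from d*d to 2*d*r per adapted matrix:
print(d * d, 2 * d * r)  # 262144 vs 8192
```

This is why LoRA fits on memory-constrained hardware: optimizer state is kept only for the small A and B matrices, not the full weight.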

Start today 

No matter which NVIDIA GPU you are using, Gemma 4 is supported across the entire NVIDIA AI platform and is available under the commercial-friendly Apache 2.0 license. From Blackwell, with NVFP4 quantized checkpoints coming soon, to Jetson platforms, developers can quickly start deploying these high-accuracy multimodal models, with the flexibility to meet their speed, security, and cost requirements.

Try Gemma on Hugging Face, or test Gemma 4 31B for free using NVIDIA APIs at build.nvidia.com.


