At the edge, our E2B and E4B models redefine on-device utility, prioritizing multimodal capabilities, low-latency processing and seamless ecosystem integration over raw parameter count.
Powerful, accessible, open
To power the next generation of pioneering research and products, we have sized the Gemma 4 models specifically to run and fine-tune efficiently on real-world hardware: from billions of Android devices worldwide, to laptop GPUs, all the way up to developer workstations and accelerators.
Because these models are highly optimized, you can fine-tune Gemma 4 to achieve state-of-the-art performance on your specific tasks. We have already seen incredible success with this approach: INSAIT created a pioneering Bulgarian-first language model (BgGPT), and we worked with Yale University on Cell2Sentence-Scale to discover new pathways for cancer therapy, among many others.
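One reason task-specific fine-tuning is practical on accessible hardware is that adapter methods train only a tiny fraction of the weights. A rough arithmetic sketch of this idea, assuming a LoRA-style adapter (the dimensions and the use of LoRA here are illustrative assumptions, not details from this announcement):

```python
# Illustrative arithmetic: a rank-r LoRA adapter for a d_in x d_out weight
# matrix trains r*(d_in + d_out) parameters while the base weight stays frozen.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # Two low-rank factors: (d_in x r) and (r x d_out).
    return rank * (d_in + d_out)

# Hypothetical numbers: one 4096x4096 attention projection, rank-16 adapter.
full = 4096 * 4096                     # 16,777,216 params if fully fine-tuned
adapter = lora_params(4096, 4096, 16)  # 131,072 trainable params
print(adapter, adapter / full)         # under 1% of the original matrix
```

The same ratio holds layer by layer, which is why adapter fine-tuning of a multi-billion-parameter model can fit on a single consumer GPU.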
Here’s what makes Gemma 4 our most capable open model family yet:
- Advanced reasoning: Capable of multi-step planning and deep logic, Gemma 4 demonstrates significant improvements on math and instruction-following benchmarks that demand it.
- Agentic workflows: Native support for function calling, structured JSON output, and system instructions lets you build autonomous agents that can interact with external tools and APIs and execute workflows reliably.
- Code generation: Gemma 4 generates high-quality code offline, turning your workstation into a local-first AI code assistant.
- Vision and audio: All models natively process images and video, supporting variable resolutions and excelling at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.
- Longer context: Process long-form content seamlessly. The edge models feature a 128K context window, while the larger models offer up to 256K, allowing you to pass entire repositories or long documents in a single prompt.
- 140+ languages: Natively trained on over 140 languages, Gemma 4 helps developers build inclusive, high-performance applications for a global audience.
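The agentic workflow above boils down to a parse-and-dispatch loop on the host side: the model emits a structured JSON tool call, and your code routes it to a real function. A minimal sketch of that loop; the tool name, schema keys, and model output here are hypothetical illustrations, not an official Gemma interface:

```python
import json

def get_weather(city: str) -> str:
    # Stand-in tool; a real agent would call an external API here.
    return f"Sunny in {city}"

# Registry mapping tool names (as the model would emit them) to functions.
TOOLS = {"get_weather": get_weather}

# Example of the structured JSON output a function-calling model might emit.
model_output = '{"tool": "get_weather", "arguments": {"city": "Zurich"}}'

# Parse the structured output and dispatch to the matching tool.
call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # Sunny in Zurich
```

In a full agent loop, `result` would be fed back to the model as a tool response so it can decide the next step.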
Versatile models for diverse hardware
We’re releasing the Gemma 4 model weights in sizes tailored for specific hardware and use cases, ensuring you get frontier-class reasoning wherever you need it:
26B and 31B models: Frontier intelligence, offline on your own computers
Optimized to provide researchers and developers with state-of-the-art reasoning on accessible hardware, our unquantized bfloat16 weights fit efficiently on a single 80GB NVIDIA H100 GPU. For local setups, quantized versions run natively on consumer GPUs to power your IDEs, coding assistants and agentic workflows. Our 26B Mixture of Experts (MoE) model focuses on latency, activating only 3.8 billion of its total parameters during inference to deliver exceptionally fast tokens-per-second, while our 31B dense model maximizes raw quality and provides a strong foundation for fine-tuning.
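The H100 claim can be sanity-checked with back-of-the-envelope arithmetic: bfloat16 stores two bytes per parameter, so both models' unquantized weights come in well under 80 GB. A quick sketch using the parameter counts from the text (this ignores activations, KV cache and framework overhead):

```python
# bfloat16 uses 2 bytes per parameter; weight memory only, nothing else.
BYTES_PER_PARAM_BF16 = 2

def weight_memory_gb(num_params: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM_BF16 / 1e9

print(weight_memory_gb(31e9))   # 62.0 -> 31B dense fits on one 80GB H100
print(weight_memory_gb(26e9))   # 52.0 -> 26B MoE's full weights also fit
print(weight_memory_gb(3.8e9))  # 7.6  -> weights touched per token in the MoE
```

Note the MoE trade-off this makes visible: all 52 GB of weights stay resident, but per-token compute scales with only the ~3.8B active parameters.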
