Today, we’re releasing the stable version of Gemini 2.5 Flash-Lite, our fastest and lowest-cost model ($0.10 / 1M input tokens, $0.40 / 1M output tokens) in the Gemini 2.5 model family. We built 2.5 Flash-Lite to push the frontier of intelligence per dollar, with native reasoning capabilities that can be optionally toggled on for more demanding use cases. Building on the momentum of 2.5 Pro and 2.5 Flash, this model rounds out our set of 2.5 models that are ready for scaled production use.
Our most cost-efficient and fastest 2.5 model yet
Gemini 2.5 Flash-Lite strikes a balance between performance and cost, without compromising on quality, particularly for latency-sensitive tasks like translation and classification.
Here’s what makes it stand out:
- Best-in-class speed: Gemini 2.5 Flash-Lite has lower latency than both 2.0 Flash-Lite and 2.0 Flash on a broad sample of prompts.
- Cost-efficiency: It’s our lowest-cost 2.5 model yet, priced at $0.10 / 1M input tokens and $0.40 / 1M output tokens, allowing you to handle large volumes of requests affordably. We’ve also reduced audio input pricing by 40% from the preview launch.
- Smart and small: It demonstrates all-around higher quality than 2.0 Flash-Lite across a wide range of benchmarks, including coding, math, science, reasoning, and multimodal understanding.
- Fully featured: When you build with 2.5 Flash-Lite, you get access to a 1 million-token context window, controllable thinking budgets, and support for native tools like Grounding with Google Search, Code Execution, and URL Context.
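As a sketch of how a controllable thinking budget is supplied, here is a minimal `generateContent` request body for the Gemini API's REST interface. The field names follow the public API; the prompt text and the budget value of 512 tokens are illustrative, not recommendations.

```python
import json

def build_request(prompt: str, thinking_budget: int = 512) -> dict:
    """Build a generateContent request body for gemini-2.5-flash-lite.

    thinkingBudget caps the tokens the model may spend on internal
    reasoning; a budget of 0 turns thinking off entirely.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# Thinking enabled for a harder task, disabled for a cheap classification.
hard = build_request("Summarize this telemetry log in one sentence.")
cheap = build_request("Label this ticket: billing or technical?", thinking_budget=0)
print(json.dumps(hard, indent=2))
```

The same body would be POSTed to the model's `generateContent` endpoint with your API key; only the `thinkingConfig` block changes between the two calls.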
Gemini 2.5 Flash-Lite in action
Since the launch of 2.5 Flash-Lite, we’ve already seen some incredibly successful deployments. Here are a few of our favorites:
- Satlyt is building a decentralized space computing platform that could transform how satellite data is processed and used, enabling real-time summarization of in-orbit telemetry, autonomous task management, and satellite-to-satellite communication parsing. 2.5 Flash-Lite’s speed has enabled a 45% reduction in latency for critical onboard diagnostics and a 30% decrease in power consumption compared to their baseline models.
- HeyGen uses AI to create avatars for video content and leverages Gemini 2.5 Flash-Lite to automate video planning, analyze and optimize content, and translate videos into over 180 languages. This enables them to offer global, personalized experiences for their users.
- DocsHound turns product demos into documentation by using Gemini 2.5 Flash-Lite to process long videos and extract thousands of screenshots with low latency. This transforms footage into comprehensive documentation and training data for AI agents much faster than traditional methods.
- Evertune helps brands understand how they’re represented across AI models. Gemini 2.5 Flash-Lite is a game-changer for them, dramatically speeding up evaluation and report generation. Its fast performance allows them to quickly scan and synthesize large volumes of model output to offer clients with dynamic, timely insights.
You can start using 2.5 Flash-Lite by specifying “gemini-2.5-flash-lite” in your code. If you are using the preview version, you can switch to “gemini-2.5-flash-lite”, which is the same underlying model. We plan to remove the preview alias of Flash-Lite on August 25th.
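Since the preview alias points at the same underlying model, migrating is a one-string change. A minimal sketch of that switch, building the model's REST endpoint URL from the stable id (the helper function and preview alias string here are illustrative):

```python
BASE = "https://generativelanguage.googleapis.com/v1beta/models"

# Stable model id announced in this post; replaces any preview alias.
STABLE_ID = "gemini-2.5-flash-lite"

def endpoint(model: str = STABLE_ID, method: str = "generateContent") -> str:
    """Return the REST endpoint for a Gemini model id."""
    return f"{BASE}/{model}:{method}"

# Migration is just swapping the id string passed to your client.
print(endpoint())
```

No other request fields need to change: prompts, generation config, and tool settings carry over unchanged to the stable id.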
Ready to start building? Try the stable version of Gemini 2.5 Flash-Lite now in Google AI Studio and Vertex AI.
