Gemini 2.5: Updates to our family of thinking models

Today we’re excited to share updates across the board to our Gemini 2.5 model family:

Gemini 2.5 Pro is now generally available and stable (no changes from the 06-05 preview)

Gemini 2.5 Flash is now generally available and stable (no changes from the 05-20 preview, see pricing updates below)

Gemini 2.5 Flash-Lite is now available in preview

Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, which results in enhanced performance and improved accuracy. Each model has control over the thinking budget, giving developers the ability to choose when and how much the model “thinks” before generating a response.

Overview of our family of Gemini 2.5 thinking models


Introducing Gemini 2.5 Flash-Lite

Today, we’re introducing 2.5 Flash-Lite in preview, with the lowest latency and cost in the 2.5 model family. It’s designed as a cost-effective upgrade from our previous 1.5 and 2.0 Flash models. It also offers better performance across most evals, and lower time to first token while also achieving higher tokens-per-second decode. This model is ideal for high-throughput tasks like classification or summarization at scale.

Gemini 2.5 Flash-Lite is a reasoning model, which allows for dynamic control of the thinking budget with an API parameter. Because Flash-Lite is optimized for cost and speed, “thinking” is off by default, unlike our other models. 2.5 Flash-Lite also supports all of our native tools like Grounding with Google Search, Code Execution, and URL Context, in addition to function calling.
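As a rough illustration of the thinking budget control described above, the sketch below builds a request body with an optional budget. The field names `generationConfig`, `thinkingConfig`, and `thinkingBudget` are assumptions based on the public Gemini API reference rather than anything stated in this post, and `build_request_body` is a hypothetical helper. It only constructs a dictionary and makes no network call.

```python
import json
from typing import Any, Dict, Optional


def build_request_body(prompt: str, thinking_budget: Optional[int] = None) -> Dict[str, Any]:
    """Build a generateContent-style request body (field names are assumed).

    thinking_budget=None leaves the model default untouched; per the post,
    that default is "thinking off" for 2.5 Flash-Lite.
    """
    body: Dict[str, Any] = {
        "contents": [{"parts": [{"text": prompt}]}],
    }
    if thinking_budget is not None:
        # Assumed parameter path: generationConfig.thinkingConfig.thinkingBudget
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


# No budget set: rely on the model's default behavior.
default_body = build_request_body("Classify this review as positive or negative.")

# Explicit budget: allow the model some thinking tokens before it answers.
thinking_body = build_request_body("Summarize the attached report.", thinking_budget=512)
print(json.dumps(thinking_body, indent=2))
```

Leaving the budget unset by default mirrors the cost-and-speed positioning of Flash-Lite: high-volume classification calls stay cheap, and only the requests that benefit from reasoning opt in.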

Benchmarks for Gemini 2.5 Flash-Lite

Updates to Gemini 2.5 Flash and pricing

Over the last year, our research teams have continued to push the Pareto frontier with our Flash model series. When 2.5 Flash was initially announced, we had not yet finalized the capabilities for 2.5 Flash-Lite. We also launched with separate “thinking” and “non-thinking” prices, which led to developer confusion.

With the stable version of Gemini 2.5 Flash rolling out (the same 05-20 model preview we made available at Google I/O), and given the incredible performance of 2.5 Flash, we’re updating the pricing for 2.5 Flash:

$0.30 / 1M input tokens (*up from $0.15 input)

$2.50 / 1M output tokens (*down from $3.50 output)

We removed the thinking vs. non-thinking price difference

We kept a single price tier regardless of input token size

While we strive to maintain consistent pricing between preview and stable releases to minimize disruption, this is a specific adjustment reflecting Flash’s exceptional value, still offering the best cost-per-intelligence available.
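To see how the adjustment affects a given workload, here is a minimal cost comparison using only the per-1M-token prices quoted above. The workload size is hypothetical, `request_cost` is an illustrative helper, and the "old" figures are just the ones listed in this post ($0.15 input, $3.50 output); the previous non-thinking output price is not quoted here and is not modeled.

```python
from decimal import Decimal

# USD per 1M tokens, taken from the price list in this post.
OLD_PRICE = {"input": Decimal("0.15"), "output": Decimal("3.50")}
NEW_PRICE = {"input": Decimal("0.30"), "output": Decimal("2.50")}
PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int, price: dict) -> Decimal:
    """Return the USD cost of a workload at the given per-1M-token prices."""
    return (
        Decimal(input_tokens) / PER_MILLION * price["input"]
        + Decimal(output_tokens) / PER_MILLION * price["output"]
    )


# Hypothetical workload: 10M input tokens and 2M output tokens.
old_cost = request_cost(10_000_000, 2_000_000, OLD_PRICE)
new_cost = request_cost(10_000_000, 2_000_000, NEW_PRICE)
print(f"old: ${old_cost:.2f}  new: ${new_cost:.2f}")  # old: $8.50  new: $8.00
```

With these figures, the new pricing comes out cheaper than the old $0.15 / $3.50 pair whenever output tokens exceed 15% of input tokens, since the extra $0.15 per 1M input tokens is offset by the $1.00 saved per 1M output tokens.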

And with Gemini 2.5 Flash-Lite, we now have an even lower cost option (with or without thinking) for cost- and latency-sensitive use cases that require less model intelligence.

Pricing updates for our Gemini Flash family

If you are using Gemini 2.5 Flash Preview 04-17, the existing preview pricing will remain in effect until its planned deprecation on July 15, 2025, at which point that model endpoint will be turned off. You can transition to the generally available model “gemini-2.5-flash”, or switch to 2.5 Flash-Lite Preview as a lower cost option.

Continued growth of Gemini 2.5 Pro

The growth and demand for Gemini 2.5 Pro continues to be the steepest of any of our models we have ever seen. To allow more customers to build on this model in production, we are making the 06-05 version of the model stable, with the same Pareto frontier price point as before.

We expect Pro to shine in cases where you need the highest intelligence and most capabilities, like coding and agentic tasks. Gemini 2.5 Pro is at the heart of many of the most loved developer tools.

Top developer tools using Gemini 2.5 Pro, featuring Cursor, Bolt, Cline, Cognition, Windsurf, GitHub, Lovable, Replit, and Zed Industries


If you are using 2.5 Pro Preview 05-06, the model will remain available until June 19, 2025, and will then be turned off. If you are using 2.5 Pro Preview 06-05, you can simply update your model string to “gemini-2.5-pro”.
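One way to handle the model string changes is a small lookup applied wherever a model name is configured. The sketch below is only an illustration: the preview identifiers are assumed spellings (this post gives only the display names), routing 05-06 users to “gemini-2.5-pro” is an assumption, and `resolve_model` is a hypothetical helper. The stable strings and shutdown dates come from this post.

```python
from datetime import date

# Preview identifiers are assumed spellings, not quoted from this post.
# Each maps to (stable model string, shutdown date if the post gives one).
MIGRATIONS = {
    "gemini-2.5-flash-preview-04-17": ("gemini-2.5-flash", date(2025, 7, 15)),
    "gemini-2.5-pro-preview-05-06": ("gemini-2.5-pro", date(2025, 6, 19)),
    "gemini-2.5-pro-preview-06-05": ("gemini-2.5-pro", None),
}


def resolve_model(name: str, today: date) -> str:
    """Return the stable model string for a known preview name.

    Names that are not in MIGRATIONS pass through unchanged.
    """
    if name not in MIGRATIONS:
        return name
    stable, shutdown = MIGRATIONS[name]
    if shutdown is not None and today < shutdown:
        days_left = (shutdown - today).days
        print(f"{name} is scheduled to be turned off in {days_left} days; using {stable}")
    return stable


model = resolve_model("gemini-2.5-pro-preview-06-05", date(2025, 6, 17))
print(model)  # gemini-2.5-pro
```

Keeping the mapping in one place means a later endpoint shutdown only needs a single edit rather than a search through every call site.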

We can’t wait to see even more domains benefit from the intelligence of 2.5 Pro and look forward to sharing more about scaling beyond Pro in the near future.
