Today we’re rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. Building upon the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities, while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency. Even with thinking off, developers can maintain the fast speeds of 2.0 Flash, and improve performance.
Our Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of immediately generating an output, the model can perform a “thinking” process to better understand the prompt, break down complex tasks, and plan a response. On complex tasks that require multiple steps of reasoning (like solving math problems or analyzing research questions), the thinking process allows the model to arrive at more accurate and comprehensive answers. In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena, second only to 2.5 Pro.

2.5 Flash has comparable metrics to other leading models for a fraction of the cost and size.
Our most cost-efficient thinking model
2.5 Flash continues to lead as the model with the best price-to-performance ratio.

Gemini 2.5 Flash adds another model to Google’s Pareto frontier of cost to quality.*
Fine-grained controls to manage thinking
We know that different use cases have different tradeoffs in quality, cost, and latency. To give developers flexibility, we’ve enabled setting a thinking budget that offers fine-grained control over the maximum number of tokens a model can generate while thinking. A higher budget allows the model to reason further to improve quality. Importantly, though, the budget sets a cap on how much 2.5 Flash can think, but the model does not use the full budget if the prompt doesn’t require it.

Improvements in reasoning quality as thinking budget increases.
The model is trained to know how long to think for a given prompt, and therefore automatically decides how much to think based on the perceived task complexity.
If you want to keep the lowest cost and latency while still improving performance over 2.0 Flash, set the thinking budget to 0. You can also choose to set a specific token budget for the thinking phase using a parameter in the API or the slider in Google AI Studio and in Vertex AI. The budget can range from 0 to 24576 tokens for 2.5 Flash.
The following prompts show how much reasoning may be used in 2.5 Flash’s default mode.
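As an illustrative sketch of the zero-budget setting, using the same google-genai client and preview model name as the code sample at the end of this post (an API key is required to actually run it):

```python
# Sketch: turn thinking off entirely by capping the thinking budget at 0,
# keeping 2.0 Flash-class latency while retaining the improved base model.
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="How many provinces does Canada have?",
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(thinking_budget=0)
    ),
)
print(response.text)
```

Any budget between 0 and the maximum works the same way; the value is a cap, not a target.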
Prompts requiring low reasoning:
Example 1: “Thank you” in Spanish
Example 2: How many provinces does Canada have?
Prompts requiring medium reasoning:
Example 1: You roll two dice. What’s the probability they add up to 7?
Example 2: My gym has pickup hours for basketball between 9-3pm on MWF and between 2-8pm on Tuesday and Saturday. If I work 9-6pm 5 days a week and want to play 5 hours of basketball on weekdays, create a schedule for me to make it all work.
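The two-dice question in Example 1 above is the kind of short arithmetic that can be verified by brute-force enumeration; a quick sketch of the check:

```python
# Count ordered (die1, die2) outcomes summing to 7 out of the
# 36 equally likely rolls of two fair six-sided dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))     # all 36 ordered pairs
favorable = [pair for pair in outcomes if sum(pair) == 7]
probability = len(favorable) / len(outcomes)
print(len(favorable), len(outcomes), probability)   # 6 36 0.1666...
```

Six of the 36 outcomes sum to 7, so the answer is 6/36 = 1/6.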
Prompts requiring high reasoning:
Example 1: A cantilever beam of length L=3m has a rectangular cross-section (width b=0.1m, height h=0.2m) and is made of steel (E=200 GPa). It is subjected to a uniformly distributed load w=5 kN/m along its entire length and a point load P=10 kN at its free end. Calculate the maximum bending stress (σ_max).
Example 2: Write a function evaluate_cells(cells: Dict[str, str]) -> Dict[str, float] that computes the values of spreadsheet cells.
Each cell contains:
- A number, or
- A formula like "=A1 + B1 * 2" using +, -, *, / and other cells.
Requirements:
- Resolve dependencies between cells.
- Handle operator precedence (*/ before +-).
- Detect cycles and raise ValueError("Cycle detected at.").
- No eval(). Use only built-in libraries.
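For reference, a minimal sketch of what a solution to Example 2 might look like (this is an illustration, not the model's output): memoized recursion handles cell dependencies and cycle detection, and a small recursive-descent parser handles operator precedence without eval(). The exact error message and edge-case handling are assumptions.

```python
import re
from typing import Dict

def evaluate_cells(cells: Dict[str, str]) -> Dict[str, float]:
    results: Dict[str, float] = {}
    visiting = set()  # cells on the recursion stack, for cycle detection

    def value(name: str) -> float:
        if name in results:
            return results[name]
        if name in visiting:
            raise ValueError(f"Cycle detected at {name}")
        visiting.add(name)
        raw = cells[name].strip()
        results[name] = parse(raw[1:]) if raw.startswith("=") else float(raw)
        visiting.discard(name)
        return results[name]

    def parse(expr: str) -> float:
        # Tokenize into cell references, numbers, operators, and parens.
        tokens = re.findall(r"[A-Za-z]+\d+|\d+(?:\.\d+)?|[+\-*/()]", expr)
        pos = 0

        def take() -> str:
            nonlocal pos
            pos += 1
            return tokens[pos - 1]

        def peek() -> str:
            return tokens[pos] if pos < len(tokens) else ""

        def atom() -> float:  # number, cell ref, (expr), or unary minus
            tok = take()
            if tok == "(":
                v = add_sub()
                take()  # consume ")"
                return v
            if tok == "-":
                return -atom()
            if re.fullmatch(r"[A-Za-z]+\d+", tok):
                return value(tok)  # recurse into the referenced cell
            return float(tok)

        def mul_div() -> float:  # '*' and '/' bind tighter than '+' and '-'
            v = atom()
            while peek() in ("*", "/"):
                v = v * atom() if take() == "*" else v / atom()
            return v

        def add_sub() -> float:
            v = mul_div()
            while peek() in ("+", "-"):
                v = v + mul_div() if take() == "+" else v - mul_div()
            return v

        return add_sub()

    for name in cells:
        value(name)
    return results
```

For example, evaluate_cells({"A1": "3", "B1": "=A1 * 2", "C1": "=A1 + B1 * 2"}) yields A1=3, B1=6, and C1=15, while mutually referencing cells raise ValueError.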
Start building with Gemini 2.5 Flash today
Gemini 2.5 Flash with thinking capabilities is now available in preview via the Gemini API in Google AI Studio and in Vertex AI, and in a dedicated dropdown in the Gemini app. We encourage you to experiment with the thinking_budget parameter and explore how controllable reasoning can help you solve more complex problems.
from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="You roll two dice. What's the probability they add up to 7?",
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(
            thinking_budget=1024
        )
    )
)

print(response.text)
Find detailed API references and thinking guides in our developer docs, or get started with code examples from the Gemini Cookbook.
We will continue to improve Gemini 2.5 Flash, with more coming soon, before we make it generally available for full production use.
*Model pricing is sourced from Artificial Analysis & Company Documentation
