Google has introduced its first hybrid reasoning artificial intelligence (AI) model. It emphasizes reasoning ability for handling complex tasks while also reflecting the trend of reducing the cost burden for the many users who want access to high-end models.
Google unveiled the next-generation AI model 'Gemini 2.5 Flash' in preview on the 17th (local time) on its developer platforms.
Gemini 2.5 Flash was developed on top of the existing low-cost, high-performance Gemini 2.0 Flash and is designed to power AI agents.
In particular, Google emphasized that it is a 'thinking' model with improved reasoning ability, with performance strengthened over Gemini 2.0 Flash Thinking, the first in the line to be equipped with reasoning.
Developers can access it through the Gemini API in Google AI Studio and Google Cloud's AI platform 'Vertex AI'. General users can select the model from the drop-down menu in the Gemini mobile and web apps.
Google emphasized that "Gemini 2.5 Flash is our first fully hybrid reasoning model."
It is also designed to let developers set a 'thinking budget' that not only turns the thinking function on or off but also adjusts the balance between quality, cost, and speed.
The thinking function consumes more tokens, which can increase both response time and cost. With this in mind, the model provides a feature that allows developers to set a maximum number of thinking tokens the model may use. The higher the thinking budget, the better the quality of the response but the slower the speed; if the budget is low, responses are faster.
Depending on the complexity of the query, the model can also set the thinking budget automatically. Simple questions are answered quickly without any reasoning, moderately complex writing tasks run medium-level reasoning, and Python code generation and web game production are classified as advanced.
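Purely as an illustration of the tiering described above (the complexity labels mirror the article's examples, but the function and the budget values are hypothetical placeholders, not Google's actual logic), the mapping might be sketched as:

```python
def pick_thinking_budget(complexity: str) -> int:
    """Map an (assumed) task-complexity label to a max thinking-token budget.

    The tiers follow the article's examples; the budget numbers are
    hypothetical placeholders, not Google's actual values.
    """
    budgets = {
        "simple": 0,       # e.g. a factual question: answered with no reasoning
        "medium": 1024,    # e.g. a moderately complex writing task
        "advanced": 8192,  # e.g. Python code generation, web game production
    }
    if complexity not in budgets:
        raise ValueError(f"unknown complexity tier: {complexity}")
    return budgets[complexity]

print(pick_thinking_budget("simple"))    # → 0
print(pick_thinking_budget("advanced"))  # → 8192
```

A budget of 0 corresponds to turning the thinking function off entirely, which matters for the pricing discussed below.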
Speed varies depending on whether the thinking function is activated.
With the thinking function turned off, the model is relatively inexpensive at $0.15 per million input tokens and $0.60 per million output tokens. However, if the thinking function is activated, the charge rises to $3.50 per million tokens, with no distinction between input and output.
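Using the preview prices quoted above, a back-of-the-envelope cost comparison can be sketched as follows (the function simply applies the per-million-token rates as the article describes them; actual billing may differ):

```python
def request_cost_usd(input_tokens: int, output_tokens: int, thinking: bool) -> float:
    """Estimate the cost in USD of one request at the preview rates in the article.

    Thinking off: $0.15/M input tokens, $0.60/M output tokens.
    Thinking on:  $3.50/M tokens, input and output alike
    (per the article's description of the preview pricing).
    """
    million = 1_000_000
    if thinking:
        return (input_tokens + output_tokens) * 3.50 / million
    return input_tokens * 0.15 / million + output_tokens * 0.60 / million

# A request with 10,000 input tokens and 2,000 output tokens:
print(round(request_cost_usd(10_000, 2_000, thinking=False), 4))  # → 0.0027
print(round(request_cost_usd(10_000, 2_000, thinking=True), 4))   # → 0.042
```

Note that with thinking on, this example costs roughly 15 times more, which is why the adjustable budget matters for cost control.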

Benchmarks also showed the improved reasoning. It scored 12.1% on 'Humanity's Last Exam,' known to be a very difficult test, compared with only 5.1% for its predecessor.
However, this is lower than OpenAI's o3, which recorded 20.32–24.9% on the same test.
Google added that it achieved the second-highest performance in the Hard Prompts category of the chatbot benchmark LM Arena, behind only Gemini 2.5 Pro.
Accordingly, Google introduced it as the 'best value' model rather than the 'best performance' model.
Meanwhile, the first major AI company to introduce an integrated reasoning model was Anthropic, which launched Claude 3.7 Sonnet on February 25. Nous Research, known for its open-source 'Hermes' models, also unveiled the open-source 'DeepHermes-3' on February 14.
OpenAI CEO Sam Altman also announced on February 13 that o3 would be integrated into GPT-5, anticipating this trend. GPT-5 is expected to be released in May, as it is currently undergoing safety testing.
By Park Chan, reporter cpark@aitimes.com