Recent Gemini 2.5 capabilities
Native audio output and enhancements to Live API
Today, the Live API is introducing a preview version of audio-visual input and native audio output dialogue, so you can directly build conversational experiences with a more natural and expressive Gemini.
It also allows the user to steer its tone, accent and style of speaking. For example, you can tell the model to use a dramatic voice when telling a story. It also supports tool use, so it can search on your behalf.
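To make this concrete, here is a minimal sketch of a native audio dialogue session, assuming the google-genai Python SDK; the preview model name, config fields and the playback helper are illustrative assumptions rather than final API details.

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Assumed preview model name for native audio dialogue.
MODEL = "gemini-2.5-flash-preview-native-audio-dialog"

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    # Steer tone, accent and style of speaking through the system instruction.
    system_instruction=types.Content(
        parts=[types.Part(text="Use a dramatic storytelling voice.")]
    ),
)

def handle_audio(pcm_bytes: bytes) -> None:
    # Placeholder: stream or write the raw audio as needed.
    pass

async def main():
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        # Send a text turn; the model replies with native audio.
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Tell me a short ghost story.")],
            )
        )
        async for message in session.receive():
            if message.data:  # raw PCM audio bytes from the model
                handle_audio(message.data)

asyncio.run(main())
```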
You can experiment with a set of early features, including:
- Affective Dialogue, in which the model detects emotion in the user’s voice and responds appropriately.
- Proactive Audio, in which the model ignores background conversations and knows when to respond.
- Thinking in the Live API, in which the model leverages Gemini’s thinking capabilities to support more complex tasks.
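These early features are surfaced as session configuration. A minimal sketch, assuming the preview field names enable_affective_dialog and proactivity on the google-genai SDK's LiveConnectConfig; treat the spellings as assumptions and check the Live API reference:

```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    # Affective Dialogue: respond to emotion detected in the user's voice.
    enable_affective_dialog=True,
    # Proactive Audio: ignore background chatter and decide when to respond.
    proactivity=types.ProactivityConfig(proactive_audio=True),
    # Thinking in the Live API is tied to the model variant you connect with,
    # rather than a config flag (assumption based on the preview).
)
```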
We’re also releasing new previews for text-to-speech in 2.5 Pro and 2.5 Flash. These have first-of-its-kind support for multiple speakers, enabling text-to-speech with two voices via native audio out.
Like native audio dialogue, text-to-speech is expressive, and can capture really subtle nuances, such as whispers. It works in over 24 languages and seamlessly switches between them.
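Here is a minimal sketch of the multi-speaker text-to-speech flow, assuming the google-genai Python SDK; the preview model name, speaker labels and prebuilt voice names are assumptions.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# A short two-speaker script; each label maps to a voice below.
script = """Host: Welcome back to the show.
Guest: Thanks, it's great to be here."""

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumed preview TTS model name
    contents=script,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    types.SpeakerVoiceConfig(
                        speaker="Host",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(
                                voice_name="Kore"
                            )
                        ),
                    ),
                    types.SpeakerVoiceConfig(
                        speaker="Guest",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(
                                voice_name="Puck"
                            )
                        ),
                    ),
                ]
            )
        ),
    ),
)

# Raw audio bytes for the rendered two-voice dialogue.
audio = response.candidates[0].content.parts[0].inline_data.data
```

Each speaker label in the script maps to its own prebuilt voice, which is how a single request can render a dialogue with two distinct voices.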
