Updates to Gemini 2.5 from Google DeepMind



Recent Gemini 2.5 capabilities

Native audio output and enhancements to Live API

Today, the Live API is getting a preview version of audio-visual input and native audio output dialogue, so you can directly build conversational experiences with a more natural and expressive Gemini.

It also allows the user to steer its tone, accent, and style of speaking. For example, you can tell the model to use a dramatic voice when telling a story. It also supports tool use, so it can search on your behalf.

You’ll be able to experiment with a set of early features, including:

  • Affective Dialogue, in which the model detects emotion in the user’s voice and responds appropriately.
  • Proactive Audio, in which the model ignores background conversations and knows when to respond.
  • Thinking in the Live API, in which the model leverages Gemini’s thinking capabilities to support more complex tasks.
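In code, these preview features would be toggled when opening a Live API session. A minimal sketch of such a session configuration follows; the field names (`enable_affective_dialog`, `proactivity`, `thinking_config`) are assumptions based on the preview surface and may change:

```python
# Sketch of a Live API session config enabling the preview features above.
# Field names are assumptions from the preview and may change.
live_config = {
    "response_modalities": ["AUDIO"],               # native audio output
    "enable_affective_dialog": True,                # Affective Dialogue
    "proactivity": {"proactive_audio": True},       # Proactive Audio
    "thinking_config": {"include_thoughts": True},  # Thinking in the Live API
}

# A config like this would be passed when connecting, e.g. (hypothetical):
#   async with client.aio.live.connect(model=MODEL, config=live_config) as session:
#       ...
```

Passing the config at connect time means the session-level behaviors (emotion awareness, proactive turn-taking, thinking) apply to the whole conversation rather than per message.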

We’re also releasing new previews for text-to-speech in 2.5 Pro and 2.5 Flash. These have first-of-their-kind support for multiple speakers, enabling text-to-speech with two voices via native audio output.

Like native audio dialogue, text-to-speech is expressive and can capture really subtle nuances, such as whispers. It works in over 24 languages and seamlessly switches between them.
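A two-voice request to this TTS preview could look roughly like the following config fragment. The speaker labels, voice names, and field layout here are illustrative assumptions, not a confirmed API shape:

```python
# Hypothetical request config for two-speaker text-to-speech.
# Speaker labels, voice names, and field layout are assumptions.
tts_config = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "multi_speaker_voice_config": {
            "speaker_voice_configs": [
                # Each entry maps a speaker label used in the prompt to a voice.
                {"speaker": "Host",
                 "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}},
                {"speaker": "Guest",
                 "voice_config": {"prebuilt_voice_config": {"voice_name": "Puck"}}},
            ]
        }
    },
}

# The prompt would then mark up turns with the same speaker labels:
prompt = (
    "TTS the following conversation:\n"
    "Host: Welcome back to the show.\n"
    "Guest: Glad to be here."
)
```

The key idea is that the model renders each labeled turn in the corresponding voice, producing a single audio stream with both speakers.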


