1.5 Flash, Gemma 2 and Project Astra

1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. That’s because it was trained by 1.5 Pro through a process called “distillation,” in which the most essential knowledge and skills of a larger model are transferred to a smaller, more efficient model.
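This post does not say how 1.5 Pro was distilled into 1.5 Flash. As a rough illustration of the general idea only, the sketch below uses the classic soft-target form of knowledge distillation: the larger "teacher" model's output distribution, softened by a temperature, becomes the training target for the smaller "student." The function names (`softmax`, `distillation_loss`, `distill_step`), the temperature of 2.0, the learning rate and the toy three-class logits are all hypothetical choices for illustration, not details of Google's training setup. To keep it self-contained, the "student" here is just a vector of logits updated by gradient descent rather than a real network.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


def distill_step(student_logits, teacher_logits, temperature=2.0, lr=0.5):
    """One gradient-descent step moving the student's logits toward the teacher's.

    The gradient of T^2 * KL(p_teacher || p_student) with respect to the
    student's logit z_i is T * (p_student_i - p_teacher_i).
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return [
        z - lr * temperature * (ps - pt)
        for z, ps, pt in zip(student_logits, p_s, p_t)
    ]


if __name__ == "__main__":
    # Hypothetical toy logits over three classes.
    teacher = [4.0, 1.0, 0.5]
    student = [0.5, 1.0, 4.0]  # starts out disagreeing with the teacher
    print("loss before:", distillation_loss(student, teacher))
    for _ in range(200):
        student = distill_step(student, teacher)
    print("loss after: ", distillation_loss(student, teacher))
```

The temperature matters because a softened distribution keeps information about how the teacher ranks the *wrong* answers, not just which answer it picks; that extra signal is a large part of what a smaller model can learn from a larger one.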

Read more about 1.5 Flash in our updated Gemini 1.5 technical report, on the Gemini technology page, and find out about 1.5 Flash’s availability and pricing.

Significantly improving 1.5 Pro

Over the last few months, we’ve significantly improved 1.5 Pro, our best model for general performance across a wide range of tasks.

Beyond extending its context window to 2 million tokens, we’ve enhanced its code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding through data and algorithmic advances. We see strong improvements on public and internal benchmarks for each of these tasks.

1.5 Pro can now follow increasingly complex and nuanced instructions, including ones that specify product-level behavior involving role, format and style. We’ve improved control over the model’s responses for specific use cases, like crafting the persona and response style of a chat agent or automating workflows through multiple function calls. And we’ve enabled users to steer model behavior by setting system instructions.

We added audio understanding in the Gemini API and Google AI Studio, so 1.5 Pro can now reason across image and audio for videos uploaded in Google AI Studio. And we’re now integrating 1.5 Pro into Google products, including Gemini Advanced and Workspace apps.

Read more about 1.5 Pro in our updated Gemini 1.5 technical report and on the Gemini technology page.

Gemini Nano understands multimodal inputs

Gemini Nano is expanding beyond text-only inputs to include images as well. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand the world the way people do: not just through text, but also through sight, sound and spoken language.

Read more about Gemini 1.0 Nano on Android.
