Updated production-ready Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more



Today, we’re releasing two updated production-ready Gemini models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, together with:

  • >50% reduced price on 1.5 Pro (both input and output for prompts <128K)
  • 2x higher rate limits on 1.5 Flash and ~3x higher on 1.5 Pro
  • 2x faster output and 3x lower latency
  • Updated default filter settings

These latest models build on our recent experimental model releases and include meaningful improvements to the Gemini 1.5 models released at Google I/O in May. Developers can access our latest models for free via Google AI Studio and the Gemini API. For larger organizations and Google Cloud customers, the models are also available on Vertex AI.
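As a minimal sketch of calling the updated models through the Gemini API (assuming a `GEMINI_API_KEY` environment variable and the v1beta REST `generateContent` endpoint; only the Python standard library is used):

```python
import json
import os
import urllib.request

MODEL = "gemini-1.5-flash-002"  # or "gemini-1.5-pro-002"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

# A standard generateContent payload: a list of content turns, each with text parts.
payload = {
    "contents": [
        {"parts": [{"text": "Summarize context caching in two sentences."}]}
    ]
}

def generate(body: dict) -> dict:
    """POST the request body to the Gemini API and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{URL}?key={os.environ['GEMINI_API_KEY']}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Only issue the network call when an API key is actually configured.
if os.environ.get("GEMINI_API_KEY"):
    print(generate(payload))
```

The same payload shape works for both -002 models; switching between Flash and Pro is just a change of the model name in the URL.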


Improved overall quality, with larger gains in math, long context, and vision

The Gemini 1.5 series consists of models designed for general performance across a wide range of text, code, and multimodal tasks. For example, Gemini models can be used to synthesize information from 1,000-page PDFs, answer questions about repos containing more than 10 thousand lines of code, take in hour-long videos and create useful content from them, and more.

With the latest updates, 1.5 Pro and Flash are now better, faster, and more cost-efficient to build with in production. We see a ~7% increase on MMLU-Pro, a more challenging version of the popular MMLU benchmark. On the MATH and HiddenMath (an internal holdout set of competition math problems) benchmarks, both models have made a considerable ~20% improvement. For vision and code use cases, both models also perform better (ranging from ~2-7%) across evals measuring visual understanding and Python code generation.

We also improved the overall helpfulness of model responses, while continuing to uphold our content safety policies and standards. This means less punting/fewer refusals and more helpful responses across many topics.

Both models now have a more concise style in response to developer feedback, which is intended to make these models easier to use and to reduce costs. For use cases like summarization, question answering, and extraction, the default output length of the updated models is ~5-20% shorter than previous models. For chat-based products where users might prefer longer responses by default, you can read our prompting strategies guide to learn more about how to make the models more verbose and conversational.
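For chat-style products, one way to steer the more concise -002 models back toward longer replies is a system instruction. A sketch of such a request body, using the `systemInstruction` and `generationConfig` fields of the v1beta `generateContent` request (the instruction text itself is just an illustrative example):

```python
# Request body asking the concise-by-default -002 models for fuller answers.
verbose_payload = {
    "systemInstruction": {
        "parts": [{"text": "Answer conversationally and in detail, with examples."}]
    },
    "contents": [
        {"parts": [{"text": "What is context caching?"}]}
    ],
    # Leave headroom for the longer responses the system instruction invites.
    "generationConfig": {"maxOutputTokens": 2048},
}
```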

For more details on migrating to the latest versions of Gemini 1.5 Pro and 1.5 Flash, check out the Gemini API models page.


Gemini 1.5 Pro

We continue to be blown away by the creative and useful applications of Gemini 1.5 Pro’s 2 million token long context window and multimodal capabilities. From video understanding to processing 1,000-page PDFs, there are so many new use cases still to be built. Today we’re announcing a 64% price reduction on input tokens, a 52% price reduction on output tokens, and a 64% price reduction on incremental cached tokens for our strongest 1.5 series model, Gemini 1.5 Pro, effective October 1st, 2024, on prompts less than 128K tokens. Coupled with context caching, this continues to drive the cost of building with Gemini down.

Increased rate limits

To make it even easier for developers to build with Gemini, we’re increasing the paid tier rate limits for 1.5 Flash to 2,000 RPM and increasing 1.5 Pro to 1,000 RPM, up from 1,000 and 360, respectively. In the coming weeks, we expect to continue to increase the Gemini API rate limits so developers can build more with Gemini.


2x faster output and 3x lower latency

Along with core improvements to our latest models, over the last few weeks we’ve driven down the latency of 1.5 Flash and significantly increased the output tokens per second, enabling new use cases with our most powerful models.

Updated filter settings

Since the first launch of Gemini in December 2023, building a safe and reliable model has been a key focus. With the latest versions of Gemini (-002 models), we’ve made improvements to the model’s ability to follow user instructions while balancing safety. We will continue to offer a suite of safety filters that developers may apply to Google’s models. For the models released today, the filters will not be applied by default, so that developers can determine the configuration best suited to their use case.
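Since the filters are no longer applied by default for the -002 models, developers who want the previous blocking behavior must opt back in. A sketch using the `safetySettings` field of a v1beta `generateContent` request, with category and threshold enum names as defined by the Gemini API:

```python
# Explicitly re-enable content blocking for the -002 models, where safety
# filters are off by default. Each entry pairs a harm category with a
# blocking threshold; both enum names come from the Gemini API.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]

request_body = {
    "contents": [{"parts": [{"text": "Hello"}]}],
    "safetySettings": safety_settings,
}
```

Omitting `safetySettings` entirely leaves the new default (no filters applied) in effect.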


Gemini 1.5 Flash-8B Experimental updates

We’re releasing a further improved version of the Gemini 1.5 model we announced in August, called “Gemini-1.5-Flash-8B-Exp-0924.” This improved version includes significant performance increases across both text and multimodal use cases. It is available now via Google AI Studio and the Gemini API.

The overwhelmingly positive feedback developers have shared about 1.5 Flash-8B has been incredible to see, and we’ll continue to shape our experimental-to-production release pipeline based on developer feedback.

We’re excited about these updates and can’t wait to see what you’ll build with the new Gemini models! And for Gemini Advanced users, you’ll soon be able to access a chat-optimized version of Gemini 1.5 Pro-002.


