Earlier this year, we announced that we're bringing computer use capabilities to developers via the Gemini API. Today, we're releasing the Gemini 2.5 Computer Use model, our new specialized model built on Gemini 2.5 Pro's visual understanding and reasoning capabilities that powers agents capable of interacting with user interfaces (UIs). It outperforms leading alternatives on multiple web and mobile control benchmarks, all with lower latency. Developers can access these capabilities via the Gemini API in Google AI Studio and Vertex AI.
While AI models can interface with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, for example, filling and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing and scrolling. The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a critical next step in building powerful, general-purpose agents.
How it works
The model's core capabilities are exposed through the new `computer_use` tool in the Gemini API and should be operated within a loop. Inputs to the tool are the user request, a screenshot of the environment, and a history of recent actions. The input can also specify whether to exclude functions from the full list of supported UI actions, or supply additional custom functions to include.
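To make the loop concrete, here is a minimal sketch in Python. It assumes the `google-genai` SDK; the exact `ComputerUse` tool configuration, the model name, and the environment helpers (`take_screenshot`, `execute_action`) are illustrative assumptions standing in for a real browser-automation backend, not a definitive implementation.

```python
# A minimal sketch of the computer-use agent loop, assuming the
# google-genai Python SDK. The tool configuration, model name, and the
# take_screenshot / execute_action helpers are illustrative placeholders
# for a real browser-automation backend (e.g., Playwright).
from google import genai
from google.genai import types

client = genai.Client()

# Assumed: enable the computer_use tool for a browser environment.
config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER))]
)

def take_screenshot() -> bytes:
    """Placeholder: capture the current viewport as PNG bytes."""
    raise NotImplementedError

def execute_action(function_call) -> None:
    """Placeholder: translate a model-proposed UI action
    (click, type, scroll, ...) into a real browser command."""
    raise NotImplementedError

# Initial turn: the user request plus a screenshot of the environment.
contents = [
    types.Content(role="user", parts=[
        types.Part(text="Find the cheapest direct flight on this page."),
        types.Part.from_bytes(data=take_screenshot(),
                              mime_type="image/png"),
    ])
]

while True:
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview",  # assumed model id
        contents=contents,
        config=config,
    )
    candidate = response.candidates[0]
    # Keep the model's proposed actions in context as the action history.
    contents.append(candidate.content)

    function_calls = [p.function_call for p in candidate.content.parts
                      if p.function_call]
    if not function_calls:
        break  # no further UI action proposed: the task is complete

    for fc in function_calls:
        execute_action(fc)
        # Report the outcome and a fresh screenshot so the model can
        # observe the effect of its action before choosing the next one.
        contents.append(types.Content(role="user", parts=[
            types.Part.from_function_response(
                name=fc.name,
                response={"status": "ok"},  # placeholder result payload
            ),
            types.Part.from_bytes(data=take_screenshot(),
                                  mime_type="image/png"),
        ]))
```

The key design point the sketch illustrates is that the client, not the model, executes each action: the model only proposes UI operations, and the loop feeds back a new screenshot after every step so the model can ground its next decision in the updated state.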
