Over the past decade, we've laid much of the foundation for the modern AI era, from pioneering the Transformer architecture on which all large language models are based, to developing agent systems that can learn and plan, like AlphaGo and AlphaZero.
We've applied these techniques to make breakthroughs in quantum computing, mathematics, life sciences and algorithmic discovery. And we continue to double down on the breadth and depth of our fundamental research, working to invent the next big breakthroughs necessary for artificial general intelligence (AGI).
That's why we're working to extend our best multimodal foundation model, Gemini 2.5 Pro, to become a "world model" that can make plans and imagine new experiences by understanding and simulating aspects of the world, just as the brain does.
We've been taking strides in this direction for a while, from our pioneering work training agents to master complex games like Go and StarCraft, to building Genie 2, which can generate 3D simulated environments that you can interact with from a single image prompt.
Already, we can see evidence of these capabilities emerging in Gemini's ability to use world knowledge and reasoning to represent and simulate natural environments, in Veo's deep understanding of intuitive physics, and in the way Gemini Robotics teaches robots to grasp, follow instructions and adjust on the fly.
Making Gemini a world model is a critical step in developing a new, more general and more useful kind of AI: a universal AI assistant. This is an AI that's intelligent, that understands the context you're in, and that can plan and take action on your behalf, across any device.
