Google’s Gemini 2.5 Tops Leaderboards

Good morning. It’s Wednesday, March 26th.

On this day in tech history: In 1976, the first 8-inch floppy disk drive, the Shugart SA-801, was introduced by Shugart Associates.

Google’s Gemini 2.5 Tops Leaderboards
OpenAI’s New Image Generator Solves Text in Images
4 New AI Tools
Latest AI Research Papers

You read. We listen. Tell us what you think by replying to this email.

Today’s trending AI news stories

Google’s Gemini 2.5 Tops Leaderboards, Supports 1M Token Inputs

Google has introduced Gemini 2.5, its most advanced reasoning model, pushing the boundaries of AI-driven problem-solving in math, coding, and multimodal evaluation. The first model in this new family, Gemini 2.5 Pro Experimental, is now available in Google AI Studio and through the Gemini Advanced subscription. Equipped with a 1 million-token context window (soon expanding to 2 million), the model handles vast datasets, technical documents, and full code repositories with improved reasoning capabilities. It leads key benchmarks, scoring 68.6% on Aider Polyglot for code editing and 18.8% on Humanity’s Last Exam, though it lags behind Claude 3.7 Sonnet on SWE-bench Verified for AI-assisted software development.

🚨 Gemini 2.5 Pro Exp dropped and it’s now #1 across SEAL leaderboards:

🥇 Humanity’s Last Exam
🥇 VISTA (multimodal)
🥇 (tie) Tool Use
🥇 (tie) MultiChallenge (multi-turn)
🥉 (tie) Enigma (puzzles)

Congrats to @demishassabis, @sundarpichai & team!

🔗 scale.com/leaderboard

– Alexandr Wang (@alexandr_wang)
5:45 pm • Mar 25, 2025

OpenAI’s New Image Generator Solves the ‘Text Problem’

OpenAI has integrated GPT-4o’s native image generation into ChatGPT, making it the default across free and paid tiers. Unlike previous DALL-E implementations, GPT-4o processes text and images together, improving spatial accuracy and object consistency. The model can handle up to 20 objects at once while maintaining their spatial relationships, making it more precise at rendering text and complex scenes.

Users can refine images through conversation, leveraging in-context learning to iteratively improve results. While the system offers greater creative flexibility than DALL-E 3, OpenAI still enforces restrictions on explicit content, deepfakes, and unauthorized likenesses. All generated images include C2PA metadata for transparency.

OpenAI has also refined ChatGPT’s Advanced Voice Mode, making conversations smoother by reducing interruptions. Free users now experience more natural dialogue, while paying subscribers gain enhanced voice interactions.

On the leadership front, OpenAI is restructuring, with CEO Sam Altman stepping back from day-to-day operations to focus on research and product strategy. COO Brad Lightcap will now oversee operations, partnerships, and international growth.

Meanwhile, OpenAI’s models are struggling with a new test of artificial general intelligence. The Arc Prize Foundation’s ARC-AGI-2 benchmark, designed to evaluate adaptive reasoning, has exposed significant gaps. OpenAI’s o1-pro and DeepSeek’s R1 barely scored above 1%, while human test groups averaged 60%. Even o3 (low), which previously dominated ARC-AGI-1, now manages just 4% accuracy, despite a staggering compute cost of $200 per task.

The Arc Prize Foundation has launched a contest challenging developers to hit 85% accuracy on ARC-AGI-2 for just $0.42 per task, marking a new frontier in AI’s pursuit of true reasoning capabilities.