Google To Launch Vision Model This Month

Good morning. It’s Wednesday, March fifth.

On this day in tech history: In 1924, the Computing-Tabulating-Recording Corporation (CTR) officially rebranded as International Business Machines Corporation (IBM).

Google’s Gemini w/ Vision Coming Soon
The Most Realistic AI Voice Yet
Altman Hints At Image Gen Upgrade
6 Recent AI Tools
Latest AI Research Papers

You read. We listen. Tell us what you think by replying to this email.

Today’s trending AI news stories

Google to launch Gemini with Vision in March for AI-powered live video evaluation

Google is rolling out live video evaluation and screen-sharing features for Gemini as part of the Google One AI Premium Plan. The update allows users to stream video from their smartphone cameras or share their screens for real-time AI-powered insights. Initially exclusive to Android devices, the features support multiple languages and enhance Gemini’s ability to interpret visual content.

This expansion aligns with Google’s broader vision for multimodal AI, leading up to “Project Astra,” an assistant designed to process text, video, and audio in real time. While Astra’s full rollout remains uncertain, these incremental updates suggest Google is steadily embedding multimodal AI into everyday interactions.

Google has also added lockscreen widgets to its Gemini AI assistant on iOS and iPadOS, enabling quick access to key features like text prompts, live conversations, voice commands, and image evaluation. With a full Siri overhaul still years away, Google is capitalizing on the gap. Read more.

Sesame’s AI Voice Demo Stuns With Realism

Sesame AI’s Conversational Speech Model (CSM) delivers strikingly human-like voices, mimicking breath sounds, chuckles, and self-corrections. Built on Meta’s Llama architecture, it processes text and audio in a single-stage transformer model, enhancing realism beyond traditional text-to-speech.

The brand new AI voice from Sesame really is a robust illustration of where AI goes.

That is all real-time, from my browser. Excellent use of disfluencies, pauses, even intakes of breath really make this look like a human, though bits of uncanniness remain, at the very least for now.

— Ethan Mollick (@emollick)
2:59 am • Mar 4, 2025

The demo, featuring voices “Miles” and “Maya,” has impressed users while raising concerns over emotional attachment and deepfake risks. Blind tests show CSM’s speech rivals human recordings, though real voices still hold an edge in context. Its ability to roleplay dynamic personalities, including aggressive tones, sets it apart from competitors.

Sesame plans to expand language support, scale its models, and open-source key components. Read more.

Altman Hints at Major Image Generation Upgrade

In a separate response, Altman hinted at significant improvements to ChatGPT’s image generation. When a user complained about declining quality, he replied that they’d soon be “wild with joy,” suggesting an upcoming upgrade. OpenAI has not provided a timeline for these enhancements. Read more.