Good morning. It’s Wednesday, March fifth.
On this day in tech history: In 1924, the Computing-Tabulating-Recording Corporation (CTR) officially rebranded as International Business Machines Corporation (IBM).
- Google’s Gemini w/ Vision Coming Soon
- The Most Realistic AI Voice Yet
- Altman Hints At Image Gen Upgrade
- 6 Recent AI Tools
- Latest AI Research Papers
You read. We listen. Tell us what you think by replying to this email.
Today’s trending AI news stories
Google to launch Gemini with Vision in March for AI-powered live video analysis
Google is rolling out live video analysis and screen-sharing features for Gemini as part of the Google One AI Premium Plan. The update allows users to stream video from their smartphone cameras or share their screens for real-time AI-powered insights. Initially exclusive to Android devices, the features support multiple languages and enhance Gemini’s ability to interpret visual content.
This expansion aligns with Google’s broader vision for multimodal AI, leading up to “Project Astra,” an assistant designed to process text, video, and audio in real time. While Astra’s full rollout remains uncertain, these incremental updates suggest Google is steadily embedding multimodal AI into everyday interactions.
Google has also added lock screen widgets for its Gemini AI assistant on iOS and iPadOS, enabling quick access to key features like text prompts, live conversations, voice commands, and image analysis. With a full Siri overhaul still years away, Google is capitalizing on the gap. Read more.
Sesame’s AI Voice Demo Stuns With Realism
Sesame AI’s Conversational Speech Model (CSM) delivers strikingly human-like voices, mimicking breath sounds, chuckles, and self-corrections. Built on Meta’s Llama architecture, it processes text and audio in a single-stage transformer model, enhancing realism beyond traditional text-to-speech.
The brand new AI voice from Sesame really is a robust illustration of where AI is going.
This is all real-time, from my browser. Excellent use of disfluencies, pauses, even intakes of breath really make this seem like a human, though bits of uncanniness remain, at least for now.
— Ethan Mollick (@emollick)
2:59 am • Mar 4, 2025
The demo, featuring voices “Miles” and “Maya,” has impressed users while raising concerns over emotional attachment and deepfake risks. Blind tests show CSM’s speech rivals human recordings, though real voices still hold an edge in context. Its ability to roleplay dynamic personalities, including aggressive tones, sets it apart from competitors.
Sesame plans to expand language support, scale its models, and open-source key components. Read more.
Altman Hints at Major Image Generation Upgrade
In a separate response, Altman hinted at significant improvements to ChatGPT’s image generation. When a user complained about declining quality, he replied that they’d soon be “wild with joy,” suggesting an upcoming upgrade. OpenAI has not provided a timeline for these enhancements. Read more.


6 recent AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.



Your feedback is invaluable. Reply to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!