Gemini 3 Leak?

-

Good morning. It’s Monday, October thirteenth.

On this present day in tech history: In 2010Google announced its acquisition of BlindType, a tiny startup working on machine-learning–based text-entry prediction for touchscreens. The tech became a part of what evolved into Gboard’s autocorrection and swipe-typing models. As a substitute of forcing users to tap perfectly on tiny keys, BlindType’s system guessed the intended letters based on patterns and context, even when the touches were way off.

You read. We listen. Tell us what you think that by replying to this email.

In partnership with Wall Street Prep

8 Weeks. Actionable AI Skills. MBA-Style Networking.

  • Construct AI confidence with role-specific use cases

  • Learn the way leaders are implementing AI strategies at top financial firms

  • Secure a long-lasting network that supports your profession growth

Earn your certificate from Columbia Business School Executive Education—program starts November 10.

Enroll by Oct. 13 to get $200 off tuition + use code AIBREAKFAST for a further $300 off.

Thanks for supporting our sponsors!

Today’s trending AI news stories

Gemini 3 leak gains steam with caveats, as DeepMind drops ‘Vibe Checker’

A leaked memo circling Reddit and 𝕏 now pegs Gemini 3 for an October 22 launch, bumping past the sooner October 9 rumor. The doc claims upgrades in multimodal reasoning, latency, inference cost, and even original music generation. Early testers say it’s already edging out Gemini 2.5 and Anthropic’s Sonnet 4.5 on coding and SVG work. The drop can also bundle Veo 3.1 and a Nano Banana variant of Gemini 3 Pro. On the interface side, Google is testing a “My Stuff” asset hub, browser-level Agent Mode, and a refreshed tackle “connected apps.” None of it’s confirmed, so take it with a grain of salt.

Meanwhile, DeepMind and a number of other U.S. researchers are pushing back on pass@k benchmarks that only confirm whether code runs and ignore what developers actually scrutinize: style, docstrings, API limits, and error handling. Their response is Vibe Checker, powered by VeriCode, a curated set of 30 rule types pulled from greater than 800 Ruff linter checks and paired with deterministic verifiers. In addition they turned BigCodeBench and LiveCodeBench into BigVibeBench and LiveVibeBench, now covering greater than 2,100 tasks.

Each methods test for functional correctness and instruction following. | Image: Zhong et al.

After they ran 31 leading models, pass@1 scores fell about 6 percent with only five added instructions. Once three or more constraints were introduced, not one of the models broke 50 percent. Comparing results with greater than 800,000 human rankings confirmed what no benchmark had quantified until now: accuracy plus instruction-following tracks real developer preference much better than any single metric in circulation.

Google can be touting a brand new flex: 1.3 quadrillion tokens processed per thirty days. The full is up 320 trillion since June, however the spike traces back to heavier models, not user growth. Gemini 2.5 Flash alone consumes 17 times more tokens per query and might cost 150 times more to run. Even a trivial prompt can spin into multiple reasoning passes, and multimodal inference inflates the count further.

The number signals infrastructure strain, not adoption. It also clashes with Google’s sustainability pitch of 0.24 watt-hours per Gemini request, a figure that only matches lightweight text use and ignores video workloads, agent chaining, and long-context reasoning. Impressive on a slide, less so on a utility bill. Read more.

OpenAI slammed on copyright, subpoenas, and bias testing concurrently

OpenAI is getting squeezed from multiple fronts. In Recent York, the corporate is staring down a possible multibillion-dollar copyright fight after authors and publishers uncovered internal emails about scrubbing a dataset full of pirated books. They’re now asking the court to force disclosure of OpenAI’s legal communications, arguing the corporate knew what it was doing and could have destroyed evidence. If a judge agrees, damages could explode past a billion dollars, especially after Anthropic already paid $1.5 billion to make an identical lawsuit go away. Insurers are reportedly balking at underwriting either company.

At the identical time, OpenAI is alienating the very policy advocates it claims to collaborate with. Encode and The Midas Project, each tiny nonprofits that backed California’s latest AI transparency law, SB 53, say OpenAI sent sheriffs to serve subpoenas demanding their private emails with lawmakers, journalists, students, and former OpenAI staff. The corporate insists it’s all a part of its lawsuit against Elon Musk and aimed toward sniffing out undisclosed backing. Each groups say they’ve never taken Musk’s money and examine the move as legal intimidation timed to ongoing reviews of OpenAI’s $500 billion reorganization.

To blunt growing distrust, the corporate can be trying to indicate progress on AI bias. In a brand new internal audit, OpenAI stress-tested GPT-4o, OpenAI o3, and the newer GPT-5 models with 500 prompts across hot-button political topics, from immigration to reproductive rights. One other model judged the responses using rules against emotive escalation, one-sided framing, personal opinions, and dismissive language.

OpenAI tested ChatGPT’s objectivity in responding to prompts about divisive topics from various political perspectives. Image screenshot: OpenAI

The corporate claims GPT-5 easy and GPT-5 considering cut biased replies by 30 percent in comparison with older models and were harder to push off balance with slanted prompts. Most failures still emerged under aggressively liberal framing. Read more.

Anthropic battles poisoned data, branding blitz, and legal bills

In collaboration with the UK AI Security Institute and the Alan Turing Institute, the corporate showed that just 250 poisoned documents, 0.00016 percent of a training corpus, can reliably backdoor large language models from 600 million to 13 billion parameters. Across 72 models, a trigger word, “SUDO,” caused the model to output gibberish. Fewer samples failed; more offered no additional effect, revealing a threshold effect moderately than proportional scaling. While low-risk, the outcomes underscore how even minimal data contamination can silently alter model behavior.

Concurrently, Anthropic is accelerating its consumer push. Its Recent York “Zero Slop Zone” pop-up, a screen-free newsstand offering coffee, books, and “considering” caps, drew 5,000 visitors and 10 million social impressions. Access required the Claude app, reinforcing product adoption. This anchors the multimillion-dollar “Keep Considering” campaign, spanning streaming, sports, and print media. Anthropic, now valued at $18.3 billion, projects $5 billion revenue in 2025, primarily from Claude Code, while launching its strongest code model yet, Claude 4.5 Sonnet.

Each Anthropic and OpenAI at the moment are confronting escalating legal and financial exposure. Insurers are shying away from AI-related coverage, forcing firms to think about using investor funds as self-insurance. OpenAI has $300 million in coverage, far in need of potential multibillion-dollar liabilities, while Anthropic has already tapped internal capital for a $1.5 billion settlement. These developments reveal a brand new reality. As AI scales, the industry faces inseparable technical, legal, and financial pressures that test each innovation and resilience. Read more.

arXiv is a free online library where researchers share pre-publication papers.

Your feedback is beneficial. Reply to this email and tell us how you think that we could add more value to this text.

All for reaching smart readers such as you? To grow to be an AI Breakfast sponsor, reply to this email or DM us on 𝕏!

ASK ANA

What are your thoughts on this topic?
Let us know in the comments below.

0 0 votes
Article Rating
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Share this article

Recent posts

0
Would love your thoughts, please comment.x
()
x