What’s Nano-banana?


In partnership with

Good morning. It’s Friday, August 22nd.

On this day in tech history: In 2019, a Russian humanoid robot named FEDOR (Final Experimental Demonstration Object Research) was launched to the International Space Station to autonomously operate and test remote manipulator tasks in orbit. An unusual yet meaningful precursor to today’s discussions around robotics, autonomy, and microgravity operations.

You read. We listen. Tell us what you think by replying to this email.

Turn AI Into Your Income Stream

The AI economy is booming, and smart entrepreneurs are already profiting. Subscribe to Minimum Ream and get fast access to 200+ proven strategies to monetize AI tools like ChatGPT, Midjourney, and more. From content creation to automation services, discover actionable ways to build your AI-powered income. No coding required, just practical strategies that work.

Today’s trending AI news stories

Google turns AI into infrastructure: Nano-banana, Search, and Pixel get system-level upgrades

Google is moving fast to make AI less of a bolt-on feature and more of the operating system for its ecosystem with three major product pushes this week: Flow, Search, and Gemini hardware.

In Flow, the nano-banana model takes text-to-image beyond novelty filters. With reference control, creators can upload a picture and generate stylistically consistent variations ready for direct use in video timelines. That eliminates the need to bounce through third-party tools. Google is pairing this with vertical aspect ratio support for TikTok and Shorts, prompt preambles that auto-expand simple prompts into cinematic or vlogging styles, and social hooks like QR sharing. Flow starts looking like a full-stack studio with built-in distribution.

In Search, AI Mode is now rolling out to 180+ countries and trades static “AI Overviews” for agentic behaviors that act on intent. For now, these task-execution features remain gated behind the Google AI Ultra subscription in the US.

The Pixel 10 launch shows Google wants Gemini embedded directly into devices. The Tensor G5 chip runs Gemini Nano entirely on-device, cutting latency and cloud dependency. That enables fast, context-sensitive features like Magic Cue, which pulls from Gmail and Calendar before a question is even typed, and live 11-language voice translation without a web connection. Pixel’s camera stack now includes “Camera Coach” for real-time framing guidance and 100× Super Res Zoom stitched by AI inference.

Even Google Photos is now conversational: users can simply ask for edits, from cleanup to creative background swaps, while C2PA Content Credentials, IPTC data, and SynthID tags signal when AI is involved.

Health is also a core theme. Fitbit has been rebuilt as a Gemini-powered coach that adapts to your recovery, sleep, or travel schedule. It doesn’t just spit out metrics – it explains, adjusts, and plans. Gemini Live expands conversational AI with visual annotations, while Gemini for Home replaces Google Assistant on Nest, a more serious living room play.

Google is collapsing workflows across media, search, and devices. Flow creates, Search executes, Pixel embeds – and Gemini ties all of it together. Read more.

ElevenLabs releases its v3 model with new expression controls and support for unlimited speakers

ElevenLabs has rolled out Eleven v3 (alpha), a significant upgrade to its text-to-speech API that makes AI voices more expressive and versatile. The model now supports unlimited speakers in dialogue mode, letting developers build multi-character conversations without workarounds. It also adds fine-grained audio controls for emotion, pitch, and style so voices can laugh, whisper, or convey subtle nuance naturally.

Eleven v3 covers over 70 languages, expanding global reach, and is accessible via a free API account, with some advanced features behind a paywall. Documentation provides full examples for integrating expressive speech into apps, media, or virtual assistants. Read more.
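For a sense of how this looks in practice, here is a minimal sketch of a call to ElevenLabs’ text-to-speech REST endpoint with inline audio tags steering delivery. The voice ID, the “eleven_v3” model identifier, and the specific tags are illustrative assumptions based on the announcement, not copied from the docs.

```python
# Minimal sketch: expressive speech via the ElevenLabs text-to-speech REST API.
# The voice ID, model name "eleven_v3", and the audio tags below are assumptions
# for illustration; check the official docs for exact identifiers.
import requests

API_KEY = "your-elevenlabs-api-key"   # free API account per the announcement
VOICE_ID = "your-voice-id"            # placeholder voice
URL = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

payload = {
    # Inline audio tags steer delivery (whispering, laughing, etc.)
    "text": "[whispers] I shouldn't be telling you this... [laughs] but the demo worked!",
    "model_id": "eleven_v3",          # assumed identifier for the v3 alpha model
}

response = requests.post(
    URL,
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()

# The endpoint returns raw audio bytes (MP3 by default).
with open("expressive_line.mp3", "wb") as f:
    f.write(response.content)
```

A dialogue with multiple characters would follow the same pattern, with each speaker’s lines rendered under its own voice and stitched together by the application.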

Chinese unicorn Z.ai unifies mobile and desktop automation with next-gen AI agents

Z.ai is taking agentic AI to the next level, merging mobile and desktop automation into a single ecosystem. Its smartphone agent, powered by GLM-Z1-Air and GLM-4-Air-0414, can autonomously handle complex multi-step tasks without forcing users to switch apps. The cloud-backed AutoGLM engine interprets natural-language instructions in real time, supporting “life assistant” and “office assistant” modes and giving users the ability to hand devices over to the agent for full workflow execution.

On the desktop, Z.ai’s ComputerRL framework bridges the gap between programmatic reasoning and human-centric interfaces. By combining API calls with direct GUI interactions, AutoGLM-OS-9B, built on GLM-4-9B-0414, achieves SoTA accuracy on the OSWorld benchmark, excelling in multi-step reasoning, tool use, and general-purpose automation. Z.ai is positioning itself as one of the top global competitors in human-aligned AI, with $1.4 billion in funding and an IPO targeted for 2026. Read more.
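As a rough illustration of the “natural-language instruction in, multi-step task out” pattern described above, here is a hedged sketch that sends a life-assistant style request to a GLM model through an OpenAI-compatible chat endpoint. The base URL, model name, and system prompt are assumptions for illustration; Z.ai’s actual agent stack (AutoGLM, ComputerRL) drives the device’s apps and GUI directly rather than just returning text.

```python
# Hedged sketch: sending a natural-language task to a GLM model through an
# OpenAI-compatible chat completions endpoint. The base URL and model name are
# assumptions for illustration; the real AutoGLM agent executes actions on the
# device itself rather than only returning a plan.
from openai import OpenAI

client = OpenAI(
    api_key="your-z-ai-api-key",
    base_url="https://api.z.ai/api/paas/v4",   # assumed OpenAI-compatible endpoint
)

task = "Book a table for two at an Italian restaurant near me tomorrow at 7pm."

response = client.chat.completions.create(
    model="glm-4-air",                          # assumed model identifier
    messages=[
        {"role": "system",
         "content": "You are a life-assistant agent. Break the user's request "
                    "into concrete app-level steps before acting."},
        {"role": "user", "content": task},
    ],
)

# In a real agent loop, each planned step would be dispatched to the phone's UI
# or an app API; here we simply print the model's plan.
print(response.choices[0].message.content)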

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Your feedback is priceless. Reply to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!
