OpenAI’s Latest Voice Mode

Good morning. It’s Friday, March 21st.

Today in tech history: On this day in 2006, Jack Dorsey sent the first tweet, marking the launch of Twitter.

OpenAI’s Latest Voice Mode
Oracle’s No-Code Agent Tool
Pika Labs’ AI Video Editing
4 Latest AI Tools
Latest AI Research Papers

You read. We listen. Tell us what you think by replying to this email.

Unlock the full potential of your workday with cutting-edge AI strategies and actionable insights that help you excel in the future of work. Download the free guide today!

Today’s trending AI news stories

OpenAI’s Latest AI Voice Model Turns Any Text App into a Voice-Powered AI in Seconds

OpenAI has launched three advanced voice AI models—gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts—engineered for high-fidelity transcription and customizable speech synthesis. Integrated into OpenAI’s API and accessible via OpenAI.fm, these models refine real-time transcription with a 2.46% English word error rate, enhanced noise cancellation, and semantic voice activity detection. Unlike Whisper, they don’t support speaker diarization but offer superior accuracy across 100+ languages.
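For readers unfamiliar with the metric, word error rate (WER) is commonly computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not OpenAI's evaluation code; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: words are compared after lowercasing and whitespace splitting;
    real benchmarks usually apply heavier text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

On this definition, a 2.46% WER corresponds to roughly two or three word errors per 100 reference words.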

Three new state-of-the-art audio models in the API:

🗣️ Two speech-to-text models—outperforming Whisper
💬 A new TTS model: you can instruct it *how* to speak

🤖 And the Agents SDK now supports audio, making it easy to build voice agents.

Try TTS now at .

— OpenAI Developers (@OpenAIDevs)
5:25 PM • Mar 20, 2025

With pricing set at $6 per million audio input tokens, OpenAI enters a competitive landscape dominated by ElevenLabs’ Scribe and Hume AI’s Octave TTS. Developers can embed voice functionality with minimal code via OpenAI’s Agents SDK.
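To get a feel for that price point, here is a rough back-of-the-envelope sketch. It uses only the $6 per million audio input tokens figure quoted above; the function name `audio_input_cost_usd` is hypothetical, and how many tokens a given length of audio produces is not stated here, so the token count is left as an input rather than guessed.

```python
# Price quoted in the story above: $6 per million audio input tokens.
PRICE_PER_MILLION_AUDIO_INPUT_TOKENS_USD = 6.00


def audio_input_cost_usd(audio_input_tokens: int) -> float:
    """Estimate the audio input cost in USD for a given token count.

    Hypothetical helper for illustration; output tokens and any other
    charges are not included.
    """
    if audio_input_tokens < 0:
        raise ValueError("token count cannot be negative")
    return audio_input_tokens / 1_000_000 * PRICE_PER_MILLION_AUDIO_INPUT_TOKENS_USD


if __name__ == "__main__":
    for tokens in (250_000, 1_000_000, 5_000_000):
        print(f"{tokens:>9,} tokens -> ${audio_input_cost_usd(tokens):.2f}")
```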

Some critics argue OpenAI is deprioritizing real-time conversational AI, while others say this trajectory suggests a bigger play, one that extends beyond transcription into full-spectrum multimodal intelligence. Read more.

Oracle Lets Customers Build AI Agents with No-Code Studio

Oracle just put enterprise AI on autopilot with AI Agent Studio, a no-cost tool that lets users craft and refine AI agents inside its Fusion Cloud Application Suite. Featuring drag-and-drop customization, API-level access, and a library of prebuilt templates, the platform keeps automation aligned with existing business logic. Users can tweak over 50 preconfigured agents, wire in third-party APIs, and swap between Llama, Cohere, OpenAI’s GPT, or other LLMs, all without starting from scratch.

The platform supports multi-agent orchestration, allowing agents to collaborate on tasks with checkpoints and approvals. While Fusion security policies extend to new agents, connecting to third-party APIs may require additional coding. Oracle leverages REST APIs for external integration, enhancing automation without disrupting existing systems. Read more.

Pika previews precision video editing—move objects without disrupting scenes

Pika has released a behind-the-scenes preview of its latest AI-powered video editing tool, which lets users control characters and objects inside a scene while keeping the rest of the footage untouched. This precision-editing capability opens new creative possibilities, offering greater control without the usual artifacts or distortions.

Behind the scenes sneak peek 👀
Manipulate any character or object in your video, while keeping the rest perfectly intact.

Become a Pika Creative Partner to get exclusive early access.

– Long (@pika_labs)
4:02 PM • Mar 20, 2025