Inside GPT-4o Voice


Good morning. It’s Wednesday, July 31st.

  • Meta’s SAM 2: Advanced model for image and video segmentation.

  • Nvidia’s Microservices: Expanded support for 3D and robotics.

  • OpenAI’s GPT-4o Long Output: New model with outputs of up to 64,000 tokens.

  • OpenAI’s Advanced Voice Mode: New voice feature for ChatGPT Plus with natural conversation capabilities.

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with CREATE

Build websites and AI apps in seconds for free with just English, no code needed!

Introducing Create: a new tool that turns your ideas into reality. Whether you’re a founder, PM, designer, marketer, or engineer, Create empowers you to build custom apps, MVPs, prototypes, designs, embeddable AI tools, and landing pages quickly and easily.

Why choose Create?

Effortless: Simply describe your vision in English (or any language), or paste a screenshot.
Easy Results: Watch Create build your project in real time.
AI-Powered: Leverage multiple foundation models like GPT-4, Claude 3.5 Sonnet, and more.
Full-stack: Add user accounts, databases, backend functions, and more.
Extensible: Access hundreds of built-in integrations, plus connect to any external API.
Runs on code: Enjoy fast, powerful performance with the option to edit the code directly.
Community: We’re growing fast, with 140k projects and 1k+ new projects per day.

Thanks for supporting our sponsors!

Today’s trending AI news stories

Meta’s new open-source model SAM 2 could be the “GPT-4 moment” for computer vision

Meta has launched SAM 2, a sophisticated foundation model for image and video segmentation, open-sourcing its model, code, and dataset. While its predecessor SAM was trained on 11 million images primarily for image segmentation, SAM 2 extends its capabilities to video segmentation.

It’s trained on the SA-V dataset, the largest video segmentation dataset available, comprising 50,900 videos and 642,600 mask annotations, totaling 35.5 million individual masks. This dataset was created using Meta’s “Data Engine,” which combines SAM models with human annotators to ensure fast and accurate labeling.

Architecturally, SAM 2 builds on a Transformer-based framework with a novel memory module that tracks objects across video frames, improving object tracking in longer sequences. SAM 2 achieves higher segmentation accuracy than previous methods with three times fewer interactions, and it runs six times faster than SAM on image segmentation. Although effective in a variety of conditions, SAM 2 still struggles to accurately track fine-grained details or multiple similar objects in motion. Read more.

OpenAI launches experimental GPT-4o Long Output model with 16x token capacity

OpenAI has introduced the GPT-4o Long Output model, which raises the output limit to 64,000 tokens, 16 times the previous model’s limit. The total context window remains 128,000 tokens, so users can supply up to 64,000 input tokens and still receive up to 64,000 tokens of output, addressing the demand for more detailed and extended responses. This is especially useful for applications that require comprehensive answers, such as code editing and writing improvement.

Priced at $6 per million input tokens and $18 per million output tokens, the model is slightly more expensive than standard GPT-4o but offers considerable value for its extended output. It is initially available to select partners for alpha testing while its effectiveness in real-world applications is evaluated. If the test is successful, OpenAI intends to expand access, potentially changing how developers use AI for complex problem-solving. Read more.
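For readers who want to check the arithmetic, here is a minimal sketch of the token budget and per-request cost using only the figures quoted above. The function names are our own, and the assumption that input and output share the 128,000-token window is our reading of the announcement, not something taken from OpenAI's API.

```python
# Figures quoted in the story above; everything else is illustrative.
CONTEXT_WINDOW = 128_000      # total tokens (input + output), assumed shared
MAX_OUTPUT = 64_000           # Long Output cap on generated tokens
INPUT_PRICE_PER_M = 6.00      # USD per million input tokens
OUTPUT_PRICE_PER_M = 18.00    # USD per million output tokens


def max_input_tokens(requested_output: int) -> int:
    """Input tokens that still fit if we reserve `requested_output` tokens."""
    if not 0 <= requested_output <= MAX_OUTPUT:
        raise ValueError("requested output exceeds the 64,000-token cap")
    return CONTEXT_WINDOW - requested_output


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the quoted alpha prices."""
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output exceeds the 64,000-token cap")
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request does not fit in the context window")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    print(max_input_tokens(MAX_OUTPUT))
    print(round(request_cost(64_000, 64_000), 3))
```

Under these assumptions, a request that uses the full 64,000 tokens on each side leaves no room to spare in the window, and most of its cost comes from the output side because output tokens are priced at three times the input rate.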

OpenAI begins alpha testing new AI voice feature for ChatGPT Plus

OpenAI has commenced alpha testing of its “Advanced Voice Mode” for ChatGPT Plus users. This feature, designed to facilitate more fluid and natural conversations, allows users to interrupt the AI at any time. The rollout is gradual, with initial access granted to a select group via email and in-app notifications, and a broader launch planned for the autumn.

The voice capabilities are powered by GPT-4o and have been tested across 45 languages with more than 100 external red teamers. To protect privacy, the mode uses four preset voices, and safeguards are in place to block outputs that deviate from those voices and to filter inappropriate content.

The feature’s introduction follows a delay from its initial June release resulting from security concerns and controversy over its voice resembling actress Scarlett Johansson. The insights gained from this phase will help refine the voice capabilities and address any concerns that arose from the initial announcement. Read more.

NYT slams OpenAI’s request for reporter notes as “unprecedented” and “harassing”: OpenAI is requesting internal documents, including research notes, from The New York Times as part of a copyright lawsuit. The Times decries the request as “unprecedented” and “harassing,” and sees it as an attempt to intimidate journalists and undermine intellectual property rights. OpenAI argues that these documents are crucial to assessing the validity of the Times’ copyright claims, which center on allegations that ChatGPT reproduced the newspaper’s content verbatim. The Times maintains that copyright status should be evaluated based on published works rather than private research. Read more.

Agents might be the next frontier of AI, and OpenDevin wants to open-source them: Developed by a consortium of academic and industry researchers, OpenDevin features a flexible architecture that includes an agent abstraction, an event stream for tracking actions and observations, and a runtime environment for executing tasks. The platform provides a secure sandbox for running code and integrates tools like bash shells, Jupyter notebooks, and web browsers, supporting complex software development and web-based tasks. OpenDevin includes pre-built agents such as CodeAct and a web browsing agent, which show competitive performance in initial benchmarks. It also supports the creation of “micro-agents” and allows collaboration among agents, such as task delegation. The AgentSkills library, which can be extended with new functionality, rounds out the platform’s capabilities. OpenDevin is community-driven, with its source code available on GitHub under the MIT license. Read more.
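To make the agent / event stream / runtime split a little more concrete, here is a toy sketch of that pattern. It is not OpenDevin's code or API: every class and function name below is hypothetical, and the "runtime" only pretends to execute commands rather than running them in a sandbox.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Action:
    """Something the agent wants done (hypothetical type)."""
    command: str


@dataclass
class Observation:
    """What came back after an action was carried out (hypothetical type)."""
    content: str


Event = Union[Action, Observation]


@dataclass
class EventStream:
    """Append-only log of actions and observations."""
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        self.events.append(event)


class EchoRuntime:
    """Stand-in for a sandboxed runtime: it only echoes the command."""

    def execute(self, action: Action) -> Observation:
        return Observation(content=f"ran: {action.command}")


class ScriptedAgent:
    """Toy agent that walks through a fixed plan, one action per step."""

    def __init__(self, plan: List[str]) -> None:
        self._plan = list(plan)

    def step(self, stream: EventStream) -> Optional[Action]:
        # Decide the next action from the history recorded in the stream.
        done = sum(isinstance(e, Action) for e in stream.events)
        return Action(self._plan[done]) if done < len(self._plan) else None


def run(agent: ScriptedAgent, runtime: EchoRuntime,
        stream: EventStream, max_steps: int = 10) -> EventStream:
    """Alternate agent decisions and runtime results until the agent stops."""
    for _ in range(max_steps):
        action = agent.step(stream)
        if action is None:
            break
        stream.add(action)
        stream.add(runtime.execute(action))
    return stream


if __name__ == "__main__":
    log = run(ScriptedAgent(["ls", "pwd"]), EchoRuntime(), EventStream())
    for event in log.events:
        print(event)
```

The point of the split is that the agent only reads from and writes to the event log, so a scripted agent like this one could in principle be swapped for a model-driven agent, or the echo runtime for a real sandbox, without touching the loop.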

Etcetera: Stories you may have missed

5 new AI-powered tools from around the web

Jamie provides human-like AI-powered meeting summaries across all platforms, supporting 15+ languages, ensuring a seamless and privacy-first experience.

Billy is an AI copilot for WordPress, generating blog content, coding custom widgets, and assisting with site evaluation and configuration.

Rodin Gen-1 uses Stable Diffusion and ControlNet to rapidly convert text or images into detailed, production-ready 3D models.

GitStart AI Ticket Studio generates precise engineering tickets by analyzing your codebase, reducing communication errors and improving workflow efficiency in software projects.

table AI offers an AI-first approach to personal CRM, centralizing and enriching network connections while integrating with multiple tools and platforms.

arXiv is a free online library where researchers share pre-publication papers.

Your feedback is valuable. Reply to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email!
