China unveils LLaVA-o1 to challenge OpenAI’s o1 model


Good morning. It’s Friday, November 29th.

You read. We listen. Tell us what you think by replying to this email.

Today’s trending AI news stories

Amazon Develops Olympus AI, Undercutting Dependence on Anthropic

Amazon is reportedly gearing up to showcase its Olympus AI model at AWS re:Invent. Designed as a multimodal large language model (LLM), Olympus can parse images, video, and text, enabling users to pinpoint, say, a pivotal basketball play with a straightforward prompt.

With its foray into generative AI, Amazon seems intent on lessening reliance on Anthropic’s Claude, following its substantial backing of the startup. This move signals Amazon’s recalibration within the AI arms race, where it’s often framed as playing catch-up to Google and Microsoft.

A key player in this effort is AWS’ Annapurna Labs in Austin, a hub for developing AI chips like Trainium and Graviton. By closely integrating hardware and software teams, the lab accelerates development and prototyping in a collaborative environment. Its work ranges from creating energy-efficient chips to refining full-stack server systems. Read more.

Google’s Latest AI Experiment Turns Chess into a Creative Playground

Google’s experimental arm, Google Labs, has launched GenChess, a web-based game integrating AI-driven image generation through Gemini Imagen 3. Players can customize their chess pieces by entering text prompts, choosing either a standard or an abstract design.

Once the set is generated, users can fine-tune individual pieces to their preference. After crafting their ideal set, players can compete against a bot across three difficulty levels. This project highlights the synergy between AI, design, and gaming.

Moreover, Google’s collaboration with FIDE introduces coding challenges for AI chess engines, and the upcoming Chess Gem feature will allow users to play against a Gemini language model, though access might be limited to Gemini Advanced subscribers. Read more.

Chinese researchers unveil LLaVA-o1 to challenge OpenAI’s o1 model

LLaVA-o1, developed by Chinese researchers, introduces a structured approach to vision-language models (VLMs) for improved multimodal reasoning, inspired by OpenAI’s o1 model. It uses a four-stage reasoning process, Summary, Caption, Reasoning, and Conclusion, handling each stage independently to ensure a logical flow.

This method ensures the model maintains control over its logical flow, sidestepping the common errors of earlier VLMs. LLaVA-o1 also debuts a “stage-level beam search,” refining inference-time scaling by generating multiple output candidates at each stage and selecting the best one.
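The idea can be sketched in a few lines: at each of the four stages, sample several candidates, keep only the top-scoring one, and let later stages condition on it. This is a minimal illustration, not the authors’ code; `generate_candidates` and `score` are hypothetical stand-ins for the model’s sampling and self-evaluation steps.

```python
# Minimal sketch of stage-level beam search as described for LLaVA-o1.
# The real system would call a vision-language model; here the generator
# and scorer are deterministic placeholders for illustration only.

STAGES = ["summary", "caption", "reasoning", "conclusion"]

def generate_candidates(context, stage, n=3):
    # Stand-in: a real VLM would sample n completions for this stage.
    return [f"<{stage} candidate {i} for: {context[:30]}>" for i in range(n)]

def score(candidate):
    # Stand-in: the model (or a verifier) would rate each candidate;
    # length serves as a deterministic placeholder here.
    return len(candidate)

def stage_level_beam_search(question, n_candidates=3):
    context = question
    chosen = {}
    for stage in STAGES:
        candidates = generate_candidates(context, stage, n_candidates)
        best = max(candidates, key=score)  # keep only the top candidate
        chosen[stage] = best
        context += " " + best              # later stages build on it
    return chosen

result = stage_level_beam_search("What play decided the game?")
print(list(result.keys()))  # → ['summary', 'caption', 'reasoning', 'conclusion']
```

The key difference from ordinary beam search is the pruning granularity: candidates are compared per reasoning stage rather than per token, which is what keeps the four-stage structure intact during inference.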

Trained on a curated dataset of 100,000 image-question pairs annotated by GPT-4o, it already outperforms both open-source and some closed-source models, showing a 6.9% increase in benchmark scores. The model’s success sets a new bar for multimodal reasoning, signaling a future where structured logic could redefine VLMs. Read more.

ElevenLabs Launches GenFM to Convert Text into AI-Generated Audio

ElevenLabs has upgraded its ElevenReader app, now integrating GenFM to generate personalized podcasts from a variety of text sources, including PDFs, articles, and ebooks. This feature, available on iOS, employs AI co-hosts in 32 languages to deliver dynamic, contextually relevant podcasts.

Utilizing ElevenLabs’ advanced AI audio models, GenFM curates detailed summaries, insightful book reviews, and study material explanations, letting users consume information while multitasking—ideal for commutes or workouts.

The app’s enhanced capabilities transform static text into engaging audio, supporting diverse learning and productivity needs. Android support for GenFM is forthcoming, further extending the app’s reach. Read more.

Tesla Optimus Gets a New Hand with 22 Degrees of Freedom

Tesla has upgraded its Optimus humanoid robot with a redesigned hand, now featuring 22 degrees of freedom plus an additional three in the forearm. The hand is coated with a soft, protective layer that preserves its tactile sensing abilities while enabling it to handle delicate objects with precision. All actuators are now embedded in the forearm, streamlining the design.

Tesla aims to complete the integration of tactile sensors, implement tendon-based control, and reduce the forearm’s weight by year-end. This enhanced hand design will be standard across all future Optimus robots. Read more.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Your feedback is valuable. Reply to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on X!

