Good morning. It’s Monday, April seventh.
On this day in tech history: In 1927, Bell Telephone Laboratories conducted the first successful long-distance demonstration of television transmission. They transmitted live video and audio of then-Secretary of Commerce Herbert Hoover from Washington, D.C., to an audience in New York City.
You read. We listen. Tell us what you think by replying to this email.
Atla has expanded what you can do with Selene 1, its evaluation-focused LLM. Originally designed to score and critique AI outputs, Selene can now be used to systematically improve them.
Developers are already seeing measurable gains. A financial research company improved output quality by 15% after integrating Selene into its workflow. A legal consultancy saw a 7% lift in answer accuracy using domain-specific evals.
Selene works behind the scenes—identifying hallucinations, flagging inconsistencies, and enforcing custom standards tailored to your use case. Improvements are automated and continuous.
Thanks for supporting our sponsors!

Today’s trending AI news stories
OpenAI Forced to Delay GPT-5 Launch: ‘It’s Harder Than We Thought’
OpenAI CEO Sam Altman has delayed the launch of GPT-5 by several months, citing integration challenges and unexpectedly high performance gains uncovered during development. Instead, OpenAI will release the o3 and o4-mini models in the coming weeks—previously planned as internal components of GPT-5. The o3 model, specifically, has been internally benchmarked at the level of a top-tier coder.
Altman pointed to a few key reasons for the shift: the complexity of merging features into a single system, the need to scale infrastructure for unprecedented demand, and the opportunity to push GPT-5 far beyond initial expectations. The o-series will serve as interim steps, offering scalable architecture, multimodal capabilities, and cost-effective inference.
change of plans: we are going to release o3 and o4-mini after all, probably in a few weeks, and then do GPT-5 in a few months.
there are a bunch of reasons for this, but the most exciting one is that we are going to be able to make GPT-5 much better than we originally
– Sam Altman (@sama)
2:39 PM • Apr 4, 2025
In a recent legal development, a federal judge denied OpenAI’s motion to dismiss, calling its argument a “straw man” and upholding the NYT’s contributory infringement claim. The court noted evidence that OpenAI knew its models could reproduce copyrighted content. A new study also suggests that OpenAI’s models, including GPT-4 and GPT-3.5, may have “memorized” copyrighted material like books and news articles during training. OpenAI maintains its training practices align with fair use, but the ruling strengthens the NYT’s legal challenge.
On the business front, OpenAI reportedly discussed buying Jony Ive and Sam Altman’s AI device startup, according to The Information. Read more.
Meta introduces Llama 4 with two new AI models available now, and two more on the way

Meta has launched two models from its Llama 4 family: Scout and Maverick, each now integrated into Meta AI across WhatsApp, Messenger, and Instagram.
Maverick, designed for general assistant tasks, comprises 400 billion total parameters but activates only 17 billion across 128 experts. Scout, optimized for summarization and code reasoning, supports a ten-million-token context window and can run on a single Nvidia H100 GPU.
Meta also previewed Llama 4 Behemoth, still in training, with 288 billion active parameters, positioning it as a future top-tier base model; it reportedly outperforms models like GPT-4.5 and Claude 3.7 Sonnet on STEM benchmarks. However, none of these models qualify as “reasoning” models that fact-check answers.
CEO Mark Zuckerberg added that a fourth model, Llama 4 Reasoning, will be introduced within weeks. All models employ a mixture-of-experts (MoE) architecture to boost efficiency and multimodal performance. Despite their open-weight release, Meta has barred EU-based developers from using the models, citing regulatory uncertainty under the EU AI Act.
The Maverick model, ranked second on LM Arena, has raised concerns due to discrepancies between its benchmarked and public versions. Researchers noted that the LM Arena version is an experimental, “chat-optimized” model, fine-tuned for conversation, while the publicly available model differs, using excessive emojis and long-winded answers. This raises serious questions about the integrity of benchmarks, suggesting some models are engineered to perform well in tests while leaving developers in the dark about their true real-world capabilities. Read more.
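The active-versus-total parameter split above comes from MoE routing: a small gating network sends each token to only a few experts, so most parameters sit idle per token. A toy NumPy sketch of top-k routing; the dimensions, expert count, and gating scheme here are illustrative stand-ins, not Meta's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: route each token to the TOP_K experts
# with the highest gate scores, so only a fraction of the layer's
# parameters is active for any one token.
NUM_EXPERTS, TOP_K, D_MODEL = 8, 2, 16

gate_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]

def moe_layer(x):
    """x: (D_MODEL,) token vector -> (D_MODEL,) output."""
    logits = x @ gate_w
    top = np.argsort(logits)[-TOP_K:]        # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only TOP_K of NUM_EXPERTS expert matrices are multiplied here.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(D_MODEL))
print(out.shape)  # (16,)
```

With 2 of 8 experts active per token, roughly a quarter of the expert parameters run per forward pass, which is the same economics that lets Maverick keep 400B total parameters while activating only 17B.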
Today marks the start of a new era of natively multimodal AI innovation.
Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.
Llama 4 Scout
• 17B-active-parameter model– AI at Meta (@AIatMeta)
7:11 PM • Apr 5, 2025
DeepSeek unveils new AI reasoning method as anticipation for its next-gen model rises

Image: Liu et al. on arXiv
DeepSeek, a Chinese AI start-up, has unveiled a new technique to enhance reasoning in large language models (LLMs) in collaboration with Tsinghua University. The method combines generative reward modelling (GRM) and self-principled critique tuning, designed to improve LLM performance on general queries. The DeepSeek-GRM models outperformed existing approaches, offering competitive results in aligning AI with human preferences.
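The core idea of generative reward modelling is that the reward model writes out principles and a critique in text before committing to a score, rather than emitting a bare scalar. A minimal sketch of that loop; `query_model`, the prompt wording, and the `Score: N/10` format are invented for illustration and are not DeepSeek's actual interface:

```python
import re

def query_model(prompt: str) -> str:
    # Placeholder for a real LLM call; returns a canned critique here
    # so the sketch runs standalone.
    return "Critique: the answer is concise and correct.\nScore: 8/10"

def grm_score(question: str, answer: str) -> tuple[str, float]:
    """Ask the model for principles + critique, then parse a scalar reward."""
    prompt = (
        "State principles for judging the answer, critique it against them, "
        f"then end with 'Score: N/10'.\nQ: {question}\nA: {answer}"
    )
    text = query_model(prompt)
    match = re.search(r"Score:\s*(\d+)\s*/\s*10", text)
    score = float(match.group(1)) / 10 if match else 0.0
    return text, score

critique, score = grm_score("What is 2+2?", "4")
print(score)  # 0.8
```

The generated critique is what makes the reward signal inspectable, and (per the paper's framing) the self-principled critique tuning step trains the model to produce better principles and critiques over time.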
This development comes ahead of the expected release of DeepSeek’s next-generation model, DeepSeek-R2, which is anticipated to build on the success of its R1 reasoning model. While the company has not confirmed the R2 launch, it continues to focus on R&D, recently upgrading its V3 model for better reasoning and Chinese writing capabilities. DeepSeek has also open-sourced several code repositories for developer contribution. Read more.


3 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.


Your feedback is valuable. Reply to this email and tell us how you think we could add more value to this newsletter.
Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!