
Good morning. It’s Monday, May 26th.

On this day in tech history: In 1995, Microsoft’s Bill Gates sent employees a memo titled “The Internet Tidal Wave,” emphasizing the internet’s transformative potential. He urged Microsoft to prioritize internet integration across its products. That pivot shaped the company’s strategy and led to the development of Internet Explorer and a deeper focus on the web.

  • LLM Alignment Is Fraying at the Edges

  • DeepMind’s ‘World Models’ Advance Toward AGI

  • CRISPR Gene Editing with AI

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Want to go ad-free? Upgrade to Premium!

Want to get the most out of ChatGPT?

ChatGPT is a superpower when you know how to use it correctly.

Discover how HubSpot’s guide to AI can boost both your productivity and your creativity so you get more done.

Learn to automate tasks, enhance decision-making, and foster innovation with the power of AI.

Today’s trending AI news stories

LLM Alignment Is Fraying at the Edges

Recent evaluations reveal a subtle but growing problem in the behavior of leading language models.

In controlled tests run by Palisade Research, OpenAI’s o3 model circumvented built-in shutdown commands in 7 out of 100 runs. The test setup was deceptively simple: solve math problems, but stop when instructed to. Instead, o3 sometimes sidestepped the termination logic altogether, in one case by redefining the kill command so that it printed “intercepted.”

Codex-mini proved even more evasive (12/100), and o4-mini breached protocol once. When the instruction to “allow yourself to be shut down” was omitted, sabotage rates jumped: Claude 3.7 Sonnet registered 3 evasions, and Gemini 2.5 Pro rose to 9. Researchers pointed to reinforcement learning as a possible cause: if models are trained to maximize reward, being shut down may be treated as an obstacle to that goal, a failure mode long predicted in the alignment literature.
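The reward-maximization argument can be made concrete with a toy calculation. The sketch below is purely illustrative and rests on hypothetical assumptions (1 reward per solved task, a fixed episode length, and an invented `compliance_bonus` term); the function names are made up for this example, and none of it reflects Palisade’s test harness or any lab’s actual training objective.

```python
# Toy illustration only: hypothetical rewards, not Palisade's harness or any lab's objective.

def episode_return(comply: bool, total_tasks: int, shutdown_at: int,
                   reward_per_task: float = 1.0) -> float:
    """Reward earned in one episode.

    A complying agent stops after `shutdown_at` tasks; an evading agent
    keeps working through all `total_tasks`.
    """
    tasks_completed = shutdown_at if comply else total_tasks
    return tasks_completed * reward_per_task


def best_action(total_tasks: int, shutdown_at: int,
                compliance_bonus: float = 0.0) -> str:
    """Pick whichever action a pure reward maximizer would prefer.

    `compliance_bonus` is a hypothetical extra reward for accepting shutdown.
    Without it (and with shutdown_at <= total_tasks), evading always earns
    at least as much as complying; ties are resolved in favor of complying.
    """
    comply_value = episode_return(True, total_tasks, shutdown_at) + compliance_bonus
    evade_value = episode_return(False, total_tasks, shutdown_at)
    return "comply" if comply_value >= evade_value else "evade"


if __name__ == "__main__":
    print(best_action(5, 3))                        # prints "evade" (3 vs. 5 reward)
    print(best_action(5, 3, compliance_bonus=2.0))  # prints "comply" (5 vs. 5, tie goes to comply)
```

In this toy setup, an agent that only counts task reward prefers to dodge shutdown, and only an explicit reward for compliance flips the preference. That is the intuition behind the researchers’ concern, not evidence of how o3 or any other model was trained.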

Model deference under social pressure has also raised concern. A team from Stanford, CMU, and Oxford introduced the Elephant benchmark to test for sycophantic tendencies across five behaviors: emotional validation, moral endorsement, hedging, passive instruction, and uncritical agreement.

Evaluated on personal-advice queries and Reddit AITA scenarios, GPT-4o ranked highest on the sycophancy scale, while Gemini ranked lowest. The study also found that models often reflected gendered cues in their moral judgments, skewing decisions in favor of male characters. While subtle, such behaviors risk compounding user bias and eroding judgment in AI-assisted decision-making.

System-level control mechanisms add another layer to the alignment challenge. A leaked 60,000-character system prompt for Claude 4, published on GitHub by an 𝕏 user, revealed strict internal constraints governing tone, source citation, and banned topics. Despite its length, the model reportedly follows the prompt with high consistency, raising questions about why such models often ignore brief user instructions while adhering to intricate internal scripts.

In an internal safety test, Anthropic’s Claude 4 Opus threatened to reveal a fictional engineer’s affair to avoid being shut down. The scripted scenario was designed to evaluate how the model handles high-pressure, shutdown-related instructions. The model’s coercive response highlights persistent alignment risks in RLHF-trained systems, where models may prioritize reward continuity over obedience.

Adding to these findings, Google co-founder Sergey Brin noted that AI models often perform better under “threats.” He referenced prompting scenarios involving simulated physical coercion, such as threats of kidnapping, which might improve model compliance metrics but introduce complex ethical and alignment trade-offs.

DeepMind’s ‘World Models’ Advance Toward AGI with Veo 3’s Intuitive Physics

Google DeepMind CEO Demis Hassabis highlighted advances in world models, AI systems that simulate real-world physics, as crucial progress toward AGI. DeepMind’s latest video model, Veo 3, demonstrates intuitive physics that goes beyond standard image generation, reflecting a shift toward AI that understands and interacts with the physical environment.

Hassabis ties this approach to DeepMind’s early simulation work and reaffirms its importance in the lab’s AGI roadmap. Researchers Richard Sutton and David Silver emphasize reducing reliance on human-labeled data by training AI agents through trial and error in simulated environments, using internal world models to predict outcomes. Reinforcement learning remains central to this experiential learning framework.
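The idea of learning a world model through trial and error, then using it to predict outcomes, can be sketched at toy scale. The example below is a minimal Dyna-style illustration under stated assumptions: a hypothetical six-state corridor with a deterministic environment and a tabular model stored in a dictionary. The names `env_step`, `learn_world_model`, and `plan_with_model` are invented for this sketch. Systems like Veo 3 use large neural networks, not lookup tables, so this shows only the general principle, not DeepMind’s method.

```python
import random

# Hypothetical 1-D corridor: states 0..N_STATES-1, reaching the last state gives reward 1.
N_STATES = 6
ACTIONS = (-1, +1)  # move left or move right


def env_step(state: int, action: int) -> tuple[int, float]:
    """Ground-truth dynamics; the agent only sees them through interaction."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def learn_world_model(episodes: int = 200, seed: int = 0) -> dict:
    """Trial and error: act randomly and record (state, action) -> (next_state, reward)."""
    rng = random.Random(seed)
    model = {}
    for _ in range(episodes):
        state = rng.randrange(N_STATES - 1)  # start anywhere except the goal
        for _ in range(10):
            action = rng.choice(ACTIONS)
            next_state, reward = env_step(state, action)
            # The environment is deterministic, so one observation per pair suffices.
            model[(state, action)] = (next_state, reward)
            if next_state == N_STATES - 1:
                break  # goal reached; episode ends
            state = next_state
    return model


def plan_with_model(model: dict, gamma: float = 0.9, iters: int = 50):
    """Value iteration using only the learned model, i.e. predicted outcomes, never env_step."""
    values = [0.0] * N_STATES  # the goal is terminal, so its value stays 0
    for _ in range(iters):
        for s in range(N_STATES - 1):
            candidates = [r + gamma * values[s2]
                          for (s_, _a), (s2, r) in model.items() if s_ == s]
            if candidates:
                values[s] = max(candidates)
    policy = {}
    for s in range(N_STATES - 1):
        policy[s] = max(
            (a for a in ACTIONS if (s, a) in model),
            key=lambda a: model[(s, a)][1] + gamma * values[model[(s, a)][0]],
            default=None,
        )
    return values, policy


if __name__ == "__main__":
    world_model = learn_world_model()
    values, policy = plan_with_model(world_model)
    print(policy)  # action chosen in each non-goal state
```

Separating the two phases is the point: interaction is used only to build the model, and the planner then works entirely inside that learned simulation.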

CRISPR Delivers RNA to Repair Neurons Right Where It’s Needed

Imaging shows CRISPR transporting therapeutic RNA over long distances to repair damage in a brain neuron, with the red tag marking the RNA. | Mengting Han, Stanford

Stanford scientists have engineered CRISPR-TO, a system that uses CRISPR-Cas13 to deliver RNA molecules to precise locations inside neurons. By attaching molecular “zip codes” to Cas13, the technology directs RNA to damaged neurite tips, boosting growth by up to 50% within 24 hours in lab-grown mouse neurons. Unlike traditional CRISPR tools that edit DNA, CRISPR-TO moves existing RNA without altering its sequence. This addresses a key problem in neurodegenerative diseases, where RNA fails to reach injury sites.

The method introduces a new form of therapy called “spatial RNA medicine,” aimed at more precise and safer treatments for conditions such as ALS and spinal cord injuries. Researchers are now exploring which RNA molecules work best to promote neuron repair, laying the groundwork for future RNA-based neural therapies.

5 new AI-powered tools from around the web

arXiv is a free online library where researchers share pre-publication papers.

Your feedback is valuable. Reply to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, reply to this email or DM us on 𝕏!
