ARC-AGI-3 resets frontier AI scoreboard

Good morning, AI enthusiasts. One of the AI industry’s favorite talking points, being on the doorstep of AGI, just ran into a test where the best models in the world can’t even score above 1%.

ARC-AGI-3 is a harder version of the benchmark that has become the go-to reality check for AGI claims, and with Gemini Pro leading the pack at just 0.37%, frontier models just got a brand-new challenge (to likely still crush in about six months).

In today’s AI rundown:

  • ARC’s latest AGI test stumps every frontier AI

  • Reddit’s AI bot crackdown skips the ID check

  • Create branded reaction GIFs for Slack

  • Google shrinks AI memory with zero accuracy loss

  • 4 new AI tools, community workflows, and more

LATEST DEVELOPMENTS

AI BENCHMARKS

Image source: ARC Prize Foundation

The Rundown: François Chollet’s ARC Prize Foundation just released ARC-AGI-3, the latest version of its interactive reasoning benchmark, where humans can solve 100% of tasks on the first try but AI models struggle, with top systems not even scoring 1%.

The details:

  • Labs spent heavily training models on earlier versions of the test, pushing ARC-AGI-2 scores from 3% to around 50% in under a year.

  • Agents face game-like scenarios with zero instructions, and must discover rules, form goals, and plan strategies entirely from scratch.

  • Google’s Gemini Pro scored the highest among frontier models at just 0.37%, followed by GPT 5.4 High (0.26%), Opus 4.6 (0.25%), and Grok-4.20 (0%).

  • A $1M prize backs the challenge, and cofounder Mike Knoop says frontier labs are paying much more attention to V3 than they did to earlier versions.

Why it matters: It’s always jarring to see the top models get reset below 1% on a brand-new ARC-AGI release, but if the older tests are any indicator, even more surprising will be how quickly frontier labs climb the ladder. Whether that reflects real reasoning or just more expensive brute-forcing is exactly what Chollet built V3 to find out.
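To make the "zero instructions" idea concrete, here is a toy sketch of an agent that has to discover what its actions do before it can reach a goal it was never told about. Everything here is hypothetical: `HiddenGame`, the action names "A" and "B", and `solve_by_exploration` are made up for illustration and are not the ARC-AGI-3 environment or API. Real ARC-AGI-3 tasks are far richer than a one-dimensional board.

```python
class HiddenGame:
    """Toy 1-D board (hypothetical, not ARC-AGI-3). The agent is not told
    what each action does or where the goal is."""

    def __init__(self, size=7, start=3, goal=6):
        self.size = size
        self.pos = start                      # the only thing the agent can observe
        self._goal = goal                     # hidden from the agent
        self._effects = {"A": -1, "B": +1}    # hidden from the agent

    def step(self, action):
        """Apply an action, clamp to the board, return (observation, solved)."""
        moved = self.pos + self._effects[action]
        self.pos = max(0, min(self.size - 1, moved))
        return self.pos, self.pos == self._goal


def solve_by_exploration(game, actions=("A", "B"), max_steps=100):
    """Repeat each action until it stops changing the observation,
    recording the first effect seen for every action along the way."""
    learned = {}
    steps = 0
    for action in actions:
        while steps < max_steps:
            before = game.pos
            after, solved = game.step(action)
            steps += 1
            learned.setdefault(action, after - before)
            if solved:
                return True, steps, learned
            if after == before:   # hit a wall, this action is exhausted
                break
    return False, steps, learned


solved, steps, learned = solve_by_exploration(HiddenGame())
print(solved, steps, learned)
```

The sweep works here only because the board is tiny. The benchmark is meant to be hard precisely because blind sweeping like this does not scale to richer games.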

TOGETHER WITH SLACK FROM SALESFORCE

The Rundown: Agentforce brings powerful AI agents directly into Slack, with no new logins or context switching. DM an agent, @mention it in a channel, or let it take action by pulling Salesforce insights, updating records, and creating canvases on the fly.

In this guide, you’ll learn how to:

  • Get started with agents right where your team already works

  • Take action faster by pulling insights, updating records, and more

  • Start in minutes with ready-made templates or build custom agents for any team

Read the full guide to get started with Agentforce in Slack.

REDDIT

Image source: Reddit

The Rundown: Reddit CEO Steve Huffman outlined a plan to separate humans from bots across the site, including labeling automated accounts, flagging suspicious users for verification, and letting sub-communities self-police without mass ID checks.

The details:

  • Accounts running automation in approved ways on the social platform will carry an [App] label, with suspicious behavior triggering human verification.

  • To verify proof of humanity, Reddit will offer passkeys or Sam Altman’s World ID scanner, with government IDs as a last resort, only where laws require it.

  • AI-written content isn’t being banned, with Huffman calling it ‘annoying’ but saying communities can set their own rules on AI-generated posts.

  • Rival platform Digg recently folded after being overrun with bots, and Cloudflare data shows automated traffic on pace to surpass humans by 2027.

Why it matters: The Dead Internet Theory was already here before the AI agent acceleration we’ve seen over the past six months. Now, it’s a reality every social media site is dealing with. While this feels a bit like a band-aid, it’s a small step toward the serious human-first solution every platform will need if it wants to stay usable for people.

AI TRAINING

The Rundown: In this guide, you’ll learn how to make custom, branded reaction GIFs for your company’s Slack using Higgsfield (an image and video generator). The trick is to generate the starting frame before you animate it.

Step-by-step:

  1. Go to Higgsfield image gen, decide on the GIF’s look, and enter the reaction’s visual style and text, like “ESPN themed reaction gif with words ‘SLOW DOWN’”

  2. If your brand is not widely recognizable, attach your logo or another brand reference image while generating the still

  3. Generate a few stills and pick the best one, then click the camera’s Animate button on that still so that it becomes the start frame in Higgsfield video

  4. Then, set the clip length to three seconds, turn off its audio, and prompt: “Reaction GIF”. Finally, download the MP4 and turn it into a GIF with any MP4-to-GIF site

Pro tip: If you make a whole batch of MP4s, ask Claude Code to convert them to GIFs in bulk on your desktop so you don’t have to use a converter site one file at a time.
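If you'd rather see what that bulk conversion might look like, here is a minimal sketch in Python. It assumes `ffmpeg` is installed and on your PATH. The function names, the 12 fps and 480px defaults, and the example folder are our own placeholders, not anything Higgsfield or Claude Code prescribes.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: Path, fps: int = 12, width: int = 480) -> list[str]:
    """Build an ffmpeg command that turns one MP4 into a looping GIF.

    The filter generates a color palette from the clip and applies it,
    which usually looks cleaner than ffmpeg's default GIF palette.
    """
    dst = src.with_suffix(".gif")
    filters = (
        f"fps={fps},scale={width}:-1:flags=lanczos,"
        "split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse"
    )
    return ["ffmpeg", "-y", "-i", str(src), "-vf", filters, "-loop", "0", str(dst)]


def convert_folder(folder: Path) -> list[Path]:
    """Convert every .mp4 in `folder` and return the GIF paths written."""
    written = []
    for src in sorted(folder.glob("*.mp4")):
        subprocess.run(build_ffmpeg_cmd(src), check=True)  # raises if ffmpeg fails
        written.append(src.with_suffix(".gif"))
    return written


# Example (assumed folder name):
# convert_folder(Path("~/Desktop/reaction-gifs").expanduser())
```

Lower `fps` or `width` if the GIFs come out too large to upload comfortably.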

PRESENTED BY TELY AI

The Rundown: Your buyers are asking AI questions, and AI is answering with your competitors, not you. Tely makes AI like ChatGPT, Google, and Claude recommend your business instead.

With Tely AI, you can:

  • Get recommended in ChatGPT, Google, Perplexity, and Claude in as little as 1 week

  • Fully hands-off: no writers, no agencies, no managing content

  • Costs less than hiring freelancers or maintaining a marketing team

  • Ideal for niche industries where expertise matters

GOOGLE

Image source: Google

The Rundown: Google Research introduced TurboQuant, an algorithm that compresses AI model memory over 6x without any retraining, while delivering up to 8x speed gains on Nvidia H100 chips and losing almost zero accuracy.

The details:

  • AI models keep a running log of every conversation, and as chats get longer, that storage balloons, which slows responses and drives up costs.

  • TurboQuant shrinks that storage by over 6x with zero accuracy loss, scoring perfectly on tests that bury a key detail in a large amount of text.

  • On Nvidia’s top server chips, it also sped up response processing up to 8x compared to standard methods, without adding any extra cost to run.

  • The paper, set to be presented at ICLR 2026 in April, also topped rival methods in vector search, the tech search engines use to match similar results quickly.

Why it matters: Despite being first published in April 2025, top AI memory firms felt the heat of the official release, with stocks dropping 3-5%. One compression paper won’t crater memory demand overnight, but the selloff shows Wall Street is pricing in a world where smarter software cuts into the premium AI memory commands.
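For a rough feel of why fewer bits per number matters, here is a small sketch of the general idea behind quantizing a model's running conversation memory (often called the KV cache). This is plain uniform quantization, not TurboQuant's algorithm, and the model dimensions in the example are assumed values picked for illustration.

```python
def quantize(values, bits):
    """Uniformly map floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits):
    """Approximate cache size: keys and values stored for every layer and token."""
    return 2 * layers * kv_heads * head_dim * tokens * bits / 8


# Assumed, illustrative model shape (not from the paper).
full = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, tokens=100_000, bits=16)
small = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, tokens=100_000, bits=4)
print(f"16-bit cache: {full / 1e9:.1f} GB, 4-bit cache: {small / 1e9:.1f} GB")

codes, lo, scale = quantize([-1.0, -0.25, 0.0, 0.5, 1.0], bits=4)
print(codes, dequantize(codes, lo, scale))
```

Going from 16 bits to 4 bits is a 4x cut in this sketch, at the price of rounding error in every stored value. The reported result goes past 6x while keeping accuracy, which a naive scheme like this one would not be expected to match.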

QUICK HITS

  • 🎶 Lyria 3 Pro – Google’s upgraded AI music model with longer track outputs

  • 🌐 MolmoWeb – Ai2’s open-source web browsing agent

  • 🎨 Uni-1 – Luma’s unified model that reasons and generates across text, images

  • ⚙️ Composer 2 – Cursor’s powerful, cost-effective coding model

Oracle Data Deep Dive NYC, April 10th: Hands-on AI labs and direct access to Oracle experts. Learn more and register for free.*

OpenAI is raising another $10B to push its record funding round past $120B, with Microsoft, a16z, and T. Rowe Price joining the round.

Google upgraded its music AI model to generate full 3-minute songs with intros, verses, and choruses, with Lyria 3 Pro rolling out in Gemini, Vertex AI, and Google Vids.

Bret Taylor’s Sierra introduced Ghostwriter, an AI agent that builds other AI agents — letting firms create customer support bots across voice, chat, and 30+ languages.

The U.S. Department of Labor launched “Make America AI-Ready,” a free 7-day AI literacy course delivered entirely over text message to promote AI upskilling.

COMMUNITY

Every newsletter, we showcase how a reader is using AI to work smarter, save time, or make life easier.

Today’s workflow comes from reader May F. in London, UK:

“I’m on maternity leave, but wanted to build up my AI knowledge, so I’ve used Claude Code to build a custom dashboard of the data I’m tracking – feed time, naps, etc. I now get an email each morning with a summary of the previous day, with coaching tailored to my baby’s current age and development.”

How do you use AI? Tell us here.

That’s it for today!

Before you go, we’d love to know what you thought of today’s newsletter to help us improve The Rundown experience for you.
  • ⭐️⭐️⭐️⭐️⭐️ Nailed it
  • ⭐️⭐️⭐️ Average
  • ⭐️ Fail

See you soon,
