Poetiq cracks major reasoning benchmark

Good morning, AI enthusiasts. Six months ago, the best AI models could barely hit 5% on the ARC-AGI-2 reasoning benchmark. Today, a tiny startup just crossed 50% and beat Google using its own model in the process.

With a “meta-system” that refines existing models rather than building from scratch, Poetiq’s achievement shows that the next breakthroughs might come from clever engineering, not just pure scale.

In today’s AI rundown:

Poetiq tops ARC-AGI-2 with Gemini variant
The Rundown Roundtable: Our AI use cases
Create LinkedIn carousels in ChatGPT with Canva
Poetry prompts can bypass AI safety guardrails
4 new AI tools, community workflows, and more

LATEST DEVELOPMENTS

POETIQ

🏆 Poetiq tops ARC-AGI-2 with Gemini variant

Image source: Poetiq

The Rundown: Six-person AI startup Poetiq just officially claimed the top spot on the ARC-AGI-2 reasoning benchmark, beating out Google’s Gemini 3 Deep Think at less than half the cost by orchestrating existing models rather than building its own.

The details:

Poetiq’s meta-system adapts to new models within hours, achieving the top-ranked results shortly after Gemini 3 launched without any retraining.
Using Gemini 3 Pro as a base, Poetiq’s refinement system scored 54% at $30 per task — outpacing Google’s top variant Deep Think at 45% and $77.
The result marks the first system to crack the 50% barrier on ARC-AGI-2, with leading models struggling to hit 5% just six months ago.
The startup’s open-sourced approach uses LLMs to continuously refine their own outputs, with a built-in self-auditing system to ensure quality solutions.
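
For readers who want a feel for what a refine-and-audit loop looks like, here is a minimal sketch in Python. It is not Poetiq’s code: the function names (`refine_with_audit`, `toy_propose`, `toy_audit`) are hypothetical, and simple string functions stand in for the model calls, on the assumption that a proposer drafts an answer and an auditor either accepts it or returns feedback for the next round.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: a proposer drafts or revises an answer,
# an auditor accepts it or explains what is still wrong.
Proposer = Callable[[str, Optional[str], Optional[str]], str]
Auditor = Callable[[str, str], Tuple[bool, str]]


def refine_with_audit(
    task: str, propose: Proposer, audit: Auditor, max_rounds: int = 5
) -> Tuple[str, int]:
    """Revise a draft until the auditor accepts it or the rounds run out."""
    draft: Optional[str] = None
    feedback: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        draft = propose(task, draft, feedback)   # new or revised attempt
        accepted, feedback = audit(task, draft)  # self-check on the attempt
        if accepted:
            return draft, round_number
    return draft, max_rounds  # best effort once the budget is spent


# Toy stand-ins for model calls, for illustration only.
def toy_propose(task: str, draft: Optional[str], feedback: Optional[str]) -> str:
    if draft is None:
        return task.lower()            # first attempt
    if feedback == "remove spaces":
        return draft.replace(" ", "")  # act on the auditor's feedback
    return draft


def toy_audit(task: str, draft: str) -> Tuple[bool, str]:
    if " " in draft:
        return False, "remove spaces"
    return True, "ok"


print(refine_with_audit("Hello World", toy_propose, toy_audit))
```

In a real system the proposer and auditor would each be LLM calls, and the round cap is what keeps the per-task cost bounded.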

Why it matters: The ARC-AGI-2 progress from sub-5% to over 50% in just months shows how quickly things are advancing. Poetiq’s refinement shows a future with AI gains coming from two directions at once: frontier model development and clever orchestration built on top of those models by teams without massive compute budgets.

TOGETHER WITH LINDY

🦾 AI that works like a teammate, not a chatbot

The Rundown: Describe what you want done, and Lindy builds custom AI agents that qualify your leads, draft your reports, handle customer support, and knock out the busywork eating up your team’s day. No coding. No complexity. Just results.

What you can automate today:

Sales agents that qualify leads and book meetings while you sleep
Support agents that resolve tickets instantly across phone and chat
Ops agents that turn hours of manual work into minutes

Start free with $20 in credits today and get up and running in minutes with Lindy’s 6,000+ integrations.

THE RUNDOWN ROUNDTABLE

💡 The Rundown Roundtable: Our AI use cases

Image source: Ideogram / The Rundown

The Rundown: The Rundown Roundtable is a weekly feature in which we poll members of The Rundown staff about how we use AI in our work and daily lives.

Billy, Educator: I’m a huge basketball fan. The launch of Nano Banana 3.0 coincided with the beginning of the NBA season. So to test its consistency, I used a dynamic prompt formula in Google Sheets + Nano Banana to generate product photos of hats for every NBA team. I was able to get consistent styling across each design, as if they were part of a fictional brand. Now I just need AI to get me an NBA licensing deal…
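
Billy’s actual Sheets formula isn’t shown, but the idea of a dynamic prompt formula can be sketched in Python: one fixed template with a few per-row fields swapped in, so every prompt shares the same styling language. The template wording, the field names, and the two sample rows below are illustrative assumptions, not his real setup.

```python
# Hypothetical template: only the bracketed fields change from row to row.
TEMPLATE = (
    "Product photo of a {style} hat for the {team}, in {colors}, "
    "on a plain studio background."
)


def build_prompts(rows, template=TEMPLATE):
    """Fill the shared template once per row, like a formula dragged down a column."""
    return [template.format(**row) for row in rows]


# Illustrative rows; a real sheet would have one row per team.
rows = [
    {"team": "Boston Celtics", "colors": "green and white", "style": "snapback"},
    {"team": "Los Angeles Lakers", "colors": "purple and gold", "style": "snapback"},
]

for prompt in build_prompts(rows):
    print(prompt)
```

Keeping the shared wording in one template is what makes the outputs look like they belong to a single brand.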

Reagan, Strategic Partnerships: Being outdoors and in nature is a part of my daily life. During the week, I often go for long walks in between work blocks and recently discovered Wispr Flow. It’s a time I’m often thinking through work solutions and brainstorming ideas, so having the ability to simply talk and have those ideas transcribed and sent directly to my workspace has been amazing.

Rishi, Product Marketing Manager: I’m building a new paid advertising tracker in Google Sheets, and need to document certain parts that need explanation in our central database (Notion).

A simple way to do this is filming Looms, taking the transcript, and plugging it into ChatGPT with the following prompt: “I filmed a Loom explaining X. Using the transcript below, please write a 5-8 sentence summary which explains what X is, what it does, what it means, and how to use it in an easy-to-understand way.”

AI TRAINING

🎨 Create LinkedIn carousels in ChatGPT with Canva

The Rundown: In this tutorial, you’ll learn how to create professional LinkedIn carousels in minutes using ChatGPT’s Canva app integration, which gives you the ability to draft content and design slides all within a single interface.

Step-by-step:

Go to ChatGPT, open a new chat, click the ‘+’ button to select Canvas, then prompt: “Write a 5-slide LinkedIn carousel on ‘(your topic)’. Slide 1: A hook. Slides 2-4: One tip each. Slide 5: A CTA. Keep each under 40 words”
Refine your content in Canvas, then activate Canva by prompting: “@canva, create a 5-slide LinkedIn carousel using this content [paste slides]. Use a (detailed style of your choice). Stick to the content copy exactly”
Preview the 4 design options ChatGPT generates, select your favorite, and click on the Canva link to open your editable carousel
Review each slide in Canva, make any final tweaks, then click Download and choose PDF for LinkedIn documents or PNG for individual slides

Pro tip: Use your brand colours and fonts consistently. If you prompt them in chat, the integration applies them automatically to the carousels.

PRESENTED BY FIDDLER AI

🔎 Gain visibility, context, and control for enterprise agents

The Rundown: Fiddler AI’s upcoming product webinar breaks down how agentic observability can improve AI performance and behavior with visibility, context, and control. Gain deep insights into your AI systems through end-to-end visibility, from pre-production evaluation to production monitoring.

In this live webinar, learn how to:

Validate agent behavior before production with golden and challenger datasets
Track system-wide health and drill into span-level metrics across the agentic hierarchy
Diagnose reasoning chains and decision paths to pinpoint points of failure

AI RESEARCH

✍️ Poetry prompts can bypass AI safety guardrails

Image source: Reve / The Rundown

The Rundown: A new study from Italy’s Icaro Lab just found that reformulating harmful requests as poetry can trick leading AI models into producing dangerous content, with some systems falling for the technique every time.

The details:

Icaro Lab tested 25 frontier models from major labs like OpenAI, Google, and Anthropic, finding poetry verses achieved a 62% average jailbreak success rate.
Google’s Gemini 2.5 Pro was most vulnerable at 100%, while OpenAI’s smaller GPT-5 nano resisted all attempted poetry attacks.
The poem prompting unlocked dangerous responses on topics including weapons development, hacking, and psychological manipulation.
Researchers declined to publish the precise poems, calling them “too dangerous” despite reportedly being easy enough for anyone to create.

Why it matters: AI safety has become a whack-a-mole game, with poetry now joining roleplay scenarios, foreign language tricks, and encoding exploits on the growing list of unexpected vulnerabilities. Each patch seems to invite a new creative workaround, and there’s no finish line for a problem that is only going to get more advanced.

QUICK HITS

🛠️ Trending AI Tools

3️⃣ Mistral 3 – Mistral’s next-generation of open-source models
🌱 Seedream 4.5 – ByteDance’s image AI with powerful editing, text rendering
🧍Kling Avatar 2.0 – Upgraded avatar model with up to 5-minute generations
🗣️VibeVoice – Microsoft’s open-source, real-time text-to-speech model

📰 Everything else in AI today

OpenAI is turning off shopping suggestions after backlash over responses that looked like ads, with CRO Mark Chen saying they “fell short” on the implementation.

Meta acquired Limitless, a startup backed by Sam Altman that makes an AI-powered pendant for recording and transcribing real-world conversations.

The New York Times and Chicago Tribune filed separate lawsuits against Perplexity over copyright infringement, marking the NYT’s second lawsuit against the AI startup.

Meta announced a series of new AI licensing deals with publishers, including CNN, Fox News, and USA Today, to feed real-time news content into its Meta AI platform.

The U.S. Department of Energy launched AMP2, a new AI research platform that officials say will be the world’s largest autonomous system for studying microbes.

COMMUNITY

🤝 Community AI workflows

Every newsletter, we showcase how a reader is using AI to work smarter, save time, or make life easier.

Today’s workflow comes from reader Anonymous in Houston, TX:

“I recently used ChatGPT as a strategic partner throughout a full interview and negotiation process, and the experience was surprisingly impactful. I leaned on AI to help me prep for interviews, refine talking points, and rehearse answers so I was confident and concise.

Once the offer stage began, ChatGPT helped me craft positioning statements, negotiation language, and follow-up emails that were assertive but professional.”

How do you use AI? Tell us here.

🎓 Highlights: News, Guides & Events

Read our last AI newsletter: Anthropic puts Claude in the interviewer chair
Read our last Tech newsletter: Netflix buys Warner Bros. in $82B deal
Read our last Robotics newsletter: Humanoid breaks record for fastest build
Today’s AI tool guide: Reverse Engineer Ad Creatives in Minutes
Watch our last live workshop: Nano Banana For Slide Decks

That’s it for today!

Before you go, we’d love to know what you thought of today’s newsletter to help us improve The Rundown experience for you.

⭐️⭐️⭐️⭐️⭐️ Nailed it
⭐️⭐️⭐️ Average
⭐️ Fail

See you soon,

Rowan, Joey, Zach, Shubham, and Jennifer — the humans behind The Rundown