OpenAI’s first AI agent arrives

Good morning, AI enthusiasts. 2025 has already been widely declared the year of AI agents, and OpenAI just officially joined the party.

The startup’s ‘Operator’ release takes us into a new realm of mainstream AI assistants that can navigate the web and take actions on their own. Our interactions with chatbots may never be the same.

Exclusive: I got early access to test Operator, and it didn’t disappoint. Check out my thread of demos here.

In today’s AI rundown:

OpenAI unveils its first autonomous web agent
Perplexity debuts new AI mobile assistant
How to prompt o1 models better
‘Humanity’s Last Exam’ scales up AI benchmark
4 new AI tools & 4 job opportunities

LATEST DEVELOPMENTS

OPENAI

🤖 OpenAI unveils its first autonomous web agent

Image source: OpenAI

The Rundown: OpenAI just launched Operator, an AI agent that can independently navigate web browsers to complete everyday tasks, marking the company’s first major step into autonomous AI assistants.

The details:

Operator uses a new Computer-Using Agent model that combines GPT-4o’s vision capabilities with advanced reasoning to interact naturally with websites.
OpenAI demoed the feature during a live stream, showcasing tasks like booking reservations, ordering groceries, and buying tickets to sporting events.
OpenAI has partnered with major platforms like DoorDash, Instacart, and Uber to ensure the agent works seamlessly while respecting platform guidelines.
Built-in safety features include user approval for purchases, automated threat detection, and “takeover mode” for sensitive info like passwords and payments.
The research preview is currently limited to U.S. Pro users, with plans to expand to Plus, Team, and Enterprise after more safety and reliability testing.

Why it matters: While we’ve seen agentic systems popping up more frequently, OpenAI’s long-awaited move is a major step towards changing the entire mindset of how we interact with AI. There may be rough edges at first, but Operator feels like the official start of a brand new agentic era.

TOGETHER WITH WORKOS

🔐 Protect your AI apps from bad actors

The Rundown: WorkOS Radar is a security solution that shields your AI platform from fake signups, throwaway emails, and brute force attempts — all powered by advanced device fingerprinting and real-time detection.

With WorkOS Radar, you can:

Rapidly detect and challenge unfamiliar and suspicious devices in real time
Stop free-tier abuse and fraudulent behavior with advanced detection
Customize threat responses to suit your app’s exact security needs

Start integrating Radar today.

PERPLEXITY

📱 Perplexity debuts new AI mobile assistant

Image source: Perplexity

The Rundown: Perplexity just unveiled Perplexity Assistant, a free, agent-like tool for Android that can control phone apps and perform complex tasks with multimodal and voice capabilities, directly challenging voice assistants like Google’s Gemini and Apple’s Siri.

The details:

The new assistant integrates with popular apps like Uber and OpenTable to perform actions directly through voice commands or gesture controls.
It maintains context throughout interactions, allowing users to progress from research to action, such as finding restaurants and then booking a table.
The system supports multimodal interactions through voice and camera, enabling users to get information about their surroundings or about what is on their screen.
Users can replace Google’s default assistant with Perplexity’s solution at no cost, with the feature only available on Android for now.

Why it matters: Operator isn’t the only agent in town today, with Perplexity evolving its platform from a search/answer engine into a full-blown digital assistant. The assistant space could become a new battleground for AI firms, not just tech giants, and this Perplexity launch looks a lot like what Apple’s ‘upgraded’ Siri should actually be.

AI TRAINING

🤖 How to prompt o1 models better

The Rundown: Using delimiters and structured prompts significantly improves AI model outputs by providing clear instructions and relevant context.

Step-by-step:

Structure prompts using XML tags.
Fill each section with relevant, specific information and define clear parameters for your desired output.
Test, ask follow-up questions, and optimize your results.
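
For a concrete picture, here is a minimal Python sketch of a tag-delimited prompt. The tag names (context, instructions, output_format), the helper build_prompt, and the sample text are all our own assumptions for illustration, not an official o1 format.

```python
from xml.sax.saxutils import escape


def build_prompt(sections: dict[str, str]) -> str:
    """Wrap each named section in a matching pair of XML-style tags."""
    parts = []
    for tag, text in sections.items():
        # Escape &, < and > so section text cannot close a tag early.
        parts.append(f"<{tag}>\n{escape(text.strip())}\n</{tag}>")
    return "\n\n".join(parts)


# Hypothetical tag names and content, chosen only for illustration.
prompt = build_prompt({
    "context": "We run a weekly newsletter about AI for a general audience.",
    "instructions": "Summarize the article below in three bullet points.",
    "output_format": "Plain text, one sentence per bullet, no jargon.",
})
print(prompt)
```

Keeping the sections in a dictionary makes it easy to swap one part, say the instructions, while leaving the rest of the structure untouched between tests.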

Pro tip: Save successful prompt structures as templates for consistent results across similar tasks. The Rundown University members can access our full workshop on effectively using ChatGPT-o1 to get the best results here.

PRESENTED BY INNOVATING WITH AI

🤝 Turn AI passion into a consulting career

The Rundown: Innovating with AI’s new program, AI Consultancy Project, transforms AI enthusiasts into skilled consultants, tapping into a market projected to reach $54.7B by 2032.

The 6-month program delivers:

Proven frameworks for client acquisition and service delivery
A step-by-step path to six-figure consulting income
Students who land their first AI client in as little as 3 days

Click here to request early access to The AI Consultancy Project.

SCALE AI & THE CENTER FOR AI SAFETY

🧐 ‘Humanity’s Last Exam’ scales up AI benchmark

Image source: Humanity’s Last Exam

The Rundown: The Center for AI Safety and Scale AI just introduced “Humanity’s Last Exam,” a new AI benchmark designed to be the final frontier for testing an LLM’s academic knowledge, as current AI systems become too strong for existing tests.

The details:

The benchmark consists of 3,000 expert-crafted questions across 100+ subjects, with contributors from over 500 institutions in 50 countries.
Current leading AI models show surprisingly low performance on HLE, with even top systems scoring under 10% accuracy.
Questions are in either exact-match or multiple-choice format, with 10% of the challenges incorporating multimodal evaluation of text and images.
A $500k prize pool incentivizes high-quality submissions, with top questions earning $5,000 each and co-authorship opportunities for contributors.

Why it matters: With top models routinely scoring above 90% on many of today’s key benchmarks, tests like HLE are a crucial way to keep scaling our ability to measure increasingly advanced AI systems. However, given the speed of progress, it likely won’t be long before we see some impressive results on these benchmarks.

QUICK HITS

🛠️ Trending AI Tools

🤖 Gemini 2.0 Flash Thinking Exp – Google’s powerful new reasoning model
⚙️ Trae – Adaptive AI IDE that helps you ship faster
💻 UI-TARS – Control your computer using natural language
🪄 Spell by Spline – Generate full 3D scenes or worlds from a single image

💼 AI Job Opportunities

🎯 The Rundown – Marketing/Media Buyer
👋 Lindy AI – Head of Community
📊 Snorkel – Data Annotator (Statistics Expert Contributor)
🗣️ UiPath – ASR Manager

📰 Everything else in AI today

Anthropic launched Citations, a new feature in the Claude API that enables automated source attribution and verification in responses for increased accuracy.

Google’s Imagen 3.0 debuted at No. 1 in the LM Text-to-Image Arena, giving the tech giant the top spots on both image and LLM leaderboards.

ByteDance is planning a $20B investment in AI infrastructure in 2025, half of which will be allocated to international data centers and partnerships with chip suppliers.

OpenAI CEO Sam Altman revealed that the upcoming o3-mini model upgrade will be available in the free tier of ChatGPT, with usage upgrades for Plus users.

Hugging Face unveiled SmolVLM 256M and 500M, hailed as the world’s smallest vision language models that maintain competitive performance against larger rivals.

LinkedIn is facing a new class-action lawsuit alleging that the company used the private messages of premium subscribers to train AI models.

COMMUNITY

🎥 Join our next live workshop

Join our next workshop today at 4 PM EST to learn how to use DeepSeek-R1 as a powerful, cost-effective alternative in your AI projects with Dr. Alvaro Cintas, The Rundown’s AI professor.

RSVP here. Not a member? Join The Rundown University on a 14-day free trial.

🤝 Share The Rundown, get rewards

We’ll always keep this newsletter 100% free. To support our work, consider sharing The Rundown with your friends, and we’ll send you more free goodies.

That’s it for today!

Before you go, we’d love to know what you thought of today’s newsletter to help us improve The Rundown experience for you.

⭐️⭐️⭐️⭐️⭐️ Nailed it
⭐️⭐️⭐️ Average
⭐️ Fail

See you soon,

Rowan, Joey, Zach, Alvaro, and Jason—The Rundown’s editorial team