Anthropic’s Claude Opus 4.5 is here: Cheaper AI, infinite chats, and coding skills that beat humans




Anthropic released its most capable artificial intelligence model yet on Monday, slashing prices by roughly two-thirds while claiming state-of-the-art performance on software engineering tasks — a strategic move that intensifies the AI startup's competition with deep-pocketed rivals OpenAI and Google.

The new model, Claude Opus 4.5, scored higher on Anthropic's most difficult internal engineering assessment than any human job candidate in the company's history, according to materials reviewed by VentureBeat. The result underscores both the rapidly advancing capabilities of AI systems and growing questions about how the technology will reshape white-collar professions.

The Amazon-backed company is pricing Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens — a dramatic reduction from the $15 and $75 rates for its predecessor, Claude Opus 4.1, released earlier this year. The move makes frontier AI capabilities accessible to a broader swath of developers and enterprises while putting pressure on competitors to match both performance and pricing.
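The scale of the price cut is easy to check from the rates quoted above. As a sketch, take a hypothetical workload of 10 million input and 2 million output tokens (the workload figures are illustrative, not from the article):

```python
# Compare API cost under the old Opus 4.1 rates and the new Opus 4.5 rates.
# Rates are dollars per million tokens, as quoted in the article.
def api_cost(input_tokens, output_tokens, in_rate, out_rate):
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

workload = (10_000_000, 2_000_000)  # illustrative monthly usage

old_cost = api_cost(*workload, in_rate=15, out_rate=75)  # Opus 4.1: $15 / $75
new_cost = api_cost(*workload, in_rate=5, out_rate=25)   # Opus 4.5: $5 / $25

print(old_cost, new_cost)       # 300.0 100.0
print(1 - new_cost / old_cost)  # ~0.667, roughly the two-thirds reduction claimed
```

For this workload the bill falls from $300 to $100, matching the "roughly two-thirds" cut described above.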

"We want to make sure this really works for people who want to work with these models," said Alex Albert, Anthropic's head of developer relations, in an exclusive interview with VentureBeat. "That's really our focus: How can we enable Claude to be better at helping you do the things that you don't necessarily want to do in your job?"

The announcement comes as Anthropic races to maintain its position in an increasingly crowded field. OpenAI recently released GPT-5.1 and a specialized coding model called Codex Max that can work autonomously for extended periods. Google unveiled Gemini 3 just last week, prompting concern even within OpenAI about the search giant's progress, according to a recent report from The Information.

Opus 4.5 demonstrates improved judgment on real-world tasks, developers say

Anthropic's internal testing revealed what the company describes as a qualitative leap in Claude Opus 4.5's reasoning capabilities. The model achieved 80.9% accuracy on SWE-bench Verified, a benchmark measuring real-world software engineering tasks, outperforming OpenAI's GPT-5.1-Codex-Max (77.9%), Anthropic's own Sonnet 4.5 (77.2%), and Google's Gemini 3 Pro (76.2%), according to the company's data. The result marks a notable advance over OpenAI's current state-of-the-art model, which was released just five days earlier.

But the technical benchmarks tell only part of the story. Albert said employee testers consistently reported that the model demonstrates improved judgment and intuition across diverse tasks — a shift he described as the model developing a sense of what matters in real-world contexts.

"The model just sort of gets it," Albert said. "It has just developed this kind of intuition and judgment on a lot of real-world things that feels qualitatively like a big jump up from past models."

He pointed to his own workflow as an example. Previously, Albert said, he would ask AI models to gather information but hesitated to trust their synthesis or prioritization. With Opus 4.5, he's delegating more complete tasks, connecting it to Slack and internal documents to produce coherent summaries that match his priorities.

Opus 4.5 outscores all human candidates on company's hardest engineering test

The model's performance on Anthropic's internal engineering assessment marks a notable milestone. The take-home exam, designed for prospective performance engineering candidates, is meant to evaluate technical ability and judgment under time pressure within a prescribed two-hour limit.

Using a technique called parallel test-time compute — which aggregates multiple attempts from the model and selects the best result — Opus 4.5 scored higher than any human candidate who has taken the test, according to the company. Without a time limit, the model matched the performance of the best human candidate on record when used within Claude Code, Anthropic's coding environment.
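The article names parallel test-time compute but doesn't specify Anthropic's implementation. The general best-of-n pattern (sample several independent attempts, score each, keep the top one) can be sketched as follows; the model and scorer here are stand-in stubs, not Anthropic's actual pipeline:

```python
import random

def solve_attempt(task, seed):
    """Stub for one independent model attempt at a task.
    In practice this would be a full model generation."""
    random.seed(seed)
    return {"answer": f"attempt-{seed}", "quality": random.random()}

def score(attempt):
    """Stub scorer; real systems use unit tests, verifiers, or reward models."""
    return attempt["quality"]

def best_of_n(task, n=8):
    # Run n attempts (in parallel in practice; sequentially here)
    # and keep only the top-scoring one.
    attempts = [solve_attempt(task, seed) for seed in range(n)]
    return max(attempts, key=score)

best = best_of_n("optimize this kernel", n=8)
print(best["answer"])
```

The key design point is that aggregation trades extra compute for reliability: any single attempt may fail, but the maximum over n scored attempts is much more likely to clear a high bar.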

The company acknowledged that the test doesn't measure other crucial professional skills such as collaboration, communication, or the instincts that develop over years of experience. Still, Anthropic said the result "raises questions about how AI will change engineering as a profession."

Albert emphasized the importance of the finding. "I think this is sort of a sign, perhaps, of what's to come around how useful these models can actually be in a work context and for our jobs," he said. "Of course, this was an engineering task, and I would say models are relatively ahead in engineering compared to other fields, but I think it's a really important signal to pay attention to."

Dramatic efficiency improvements cut token usage by up to 76% on key benchmarks

Beyond raw performance, Anthropic is betting that efficiency improvements will differentiate Claude Opus 4.5 in the market. The company says the model uses dramatically fewer tokens — the units of text that AI systems process — to achieve similar or better outcomes compared to its predecessors.

At a medium effort level, Opus 4.5 matches the previous Sonnet 4.5 model's best score on SWE-bench Verified while using 76% fewer output tokens, according to Anthropic. At the highest effort level, Opus 4.5 exceeds Sonnet 4.5's performance by 4.3 percentage points while still using 48% fewer tokens.

To give developers more control, Anthropic introduced an "effort parameter" that lets users adjust how much computational work the model applies to each task — balancing performance against latency and cost.
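The article doesn't show the actual request shape, so the snippet below is only a hedged sketch of what such a knob might look like in a Messages-style API call. The field name `effort`, its accepted values, and the model identifier are assumptions drawn from the description above, not Anthropic's documented API:

```python
# Hypothetical request payloads illustrating an "effort" knob.
# The "effort" field name, its values, and the model string are assumptions,
# not Anthropic's documented API surface.
def build_request(prompt, effort="medium"):
    assert effort in ("low", "medium", "high")
    return {
        "model": "claude-opus-4-5",
        "max_tokens": 2048,
        "effort": effort,  # trades answer quality against latency and cost
        "messages": [{"role": "user", "content": prompt}],
    }

cheap_fast = build_request("Summarize this changelog.", effort="low")
thorough = build_request("Refactor this module safely.", effort="high")
print(cheap_fast["effort"], thorough["effort"])
```

The point of such a parameter is that callers, not the provider, decide where each request sits on the cost-versus-quality curve.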

Enterprise customers provided early validation of the efficiency claims. "Opus 4.5 beats Sonnet 4.5 and the competition on our internal benchmarks, using fewer tokens to solve the same problems," said Michele Catasta, president of Replit, a cloud-based coding platform, in a statement to VentureBeat. "At scale, that efficiency compounds."

GitHub's chief product officer, Mario Rodriguez, said early testing shows Opus 4.5 "surpasses internal coding benchmarks while cutting token usage in half, and is particularly well-suited for tasks like code migration and code refactoring."

Early customers report AI agents that learn from experience and refine their own skills

One of the most striking capabilities demonstrated by early customers involves what Anthropic calls "self-improving agents" — AI systems that can refine their own performance through iterative learning.

Rakuten, the Japanese e-commerce and internet company, tested Claude Opus 4.5 on automating office tasks. "Our agents were able to autonomously refine their own capabilities — achieving peak performance in four iterations while other models couldn't match that quality after 10," said Yusuke Kaji, Rakuten's general manager of AI for business.

Albert explained that the model isn't updating its own weights — the fundamental parameters that define an AI system's behavior — but rather iteratively improving the tools and approaches it uses to solve problems. "It was iteratively refining a skill for a task and seeing that it's attempting to optimize the skill to get better performance so it could accomplish this task," he said.

The capability extends beyond coding. Albert said Anthropic has observed significant improvements in creating professional documents, spreadsheets, and presentations. "They're saying that this has been the biggest jump they've seen between model generations," Albert said. "So going even from Sonnet 4.5 to Opus 4.5, a larger jump than any two models back to back previously."

Fundamental Research Labs, a financial modeling firm, reported that "accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once seemed out of reach became achievable," according to co-founder Nico Christie.

New features target Excel users and Chrome workflows, and eliminate chat length limits

Alongside the model release, Anthropic rolled out a suite of product updates aimed at enterprise users. Claude for Excel became generally available for Max, Team, and Enterprise users with new support for pivot tables, charts, and file uploads. The Chrome browser extension is now available to all Max users.

Perhaps most importantly, Anthropic introduced "infinite chats" — a feature that eliminates context window limitations by automatically summarizing earlier parts of conversations as they grow longer. "Within Claude AI, within the product itself, you effectively get this kind of infinite context window because of the compaction, plus some memory things that we're doing," Albert explained.
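Anthropic hasn't detailed how its compaction works internally. A minimal sketch of the general idea (once the conversation outgrows a token budget, fold the oldest turns into a summary and keep recent turns verbatim) might look like this; the tokenizer and summarizer below are crude stubs, where a real system would call the model itself:

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: ~1 token per whitespace-separated word.
    return len(text.split())

def summarize(turns):
    # Stub summarizer; in a real system the model would write this summary.
    return "[summary of %d earlier turns]" % len(turns)

def compact(history, budget=50):
    """Keep the most recent turns verbatim; fold older ones into a summary."""
    total = sum(count_tokens(t) for t in history)
    while total > budget and len(history) > 1:
        # Replace the two oldest entries with one short summary line.
        merged = summarize(history[:2])
        history = [merged] + history[2:]
        total = sum(count_tokens(t) for t in history)
    return history

history = ["word " * 30, "word " * 30, "latest question?"]
print(compact(history, budget=50))
```

Because each pass shrinks the oldest material while the newest turns stay intact, the visible conversation can grow without bound even though the model only ever sees a bounded context.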

For developers, Anthropic released "programmatic tool calling," which allows Claude to write and execute code that invokes functions directly. Claude Code gained an updated "Plan Mode" and became available on desktop in research preview, enabling developers to run multiple AI agent sessions in parallel.
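Programmatic tool calling is only named here, not specified, but the contrast with conventional tool use can be sketched: instead of emitting one structured tool call per step and waiting for each result, the model writes a small program that invokes registered tools as ordinary functions. The tool registry and generated snippet below are a hypothetical illustration, not Anthropic's actual mechanism:

```python
# Hypothetical illustration of programmatic tool calling: the model emits code
# that calls tools as plain functions, so loops and arithmetic happen in one
# execution instead of several model round-trips.
TOOLS = {
    "get_price": lambda item: {"widget": 3.50}.get(item, 0.0),
    "get_stock": lambda item: {"widget": 12}.get(item, 0),
}

# Code the model might generate (two tool calls combined in one expression).
model_generated_code = """
total = get_price("widget") * get_stock("widget")
"""

def run_tool_program(code, tools):
    scope = dict(tools)  # expose each tool under its registered name
    exec(code, {"__builtins__": {}}, scope)  # run with no builtins exposed
    return scope["total"]

print(run_tool_program(model_generated_code, TOOLS))  # 42.0
```

Running the generated code in a stripped-down namespace is a common (though not sufficient on its own) way to limit what such a program can touch.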

Market heats up as OpenAI, Google race to match performance and pricing

Anthropic reached $2 billion in annualized revenue during the first quarter of 2025, more than doubling from $1 billion in the prior period. The number of customers spending more than $100,000 annually jumped eightfold year-over-year.

The rapid release of Opus 4.5 — just weeks after Haiku 4.5 in October and Sonnet 4.5 in September — reflects broader industry dynamics. OpenAI released multiple GPT-5 variants throughout 2025, including a specialized Codex Max model in November that can work autonomously for up to 24 hours. Google shipped Gemini 3 in mid-November after months of development.

Albert attributed Anthropic's accelerated pace partly to using Claude to speed its own development. "We're seeing a lot of assistance and speed-up from Claude itself, whether it's on the actual product-building side or on the model research side," he said.

The pricing reduction for Opus 4.5 could pressure margins while potentially expanding the addressable market. "I'm expecting to see a lot of startups start to incorporate this into their products much more and feature it prominently," Albert said.

Yet profitability remains elusive for leading AI labs as they invest heavily in computing infrastructure and research talent. The AI market is projected to top $1 trillion in revenue within a decade, but no single provider has established a dominant market position — even as models reach a threshold where they can meaningfully automate complex knowledge work.

Michael Truell, CEO of Cursor, an AI-powered code editor, called Opus 4.5 "a notable improvement over the prior Claude models within Cursor, with improved pricing and intelligence on difficult coding tasks." Scott Wu, CEO of Cognition, an AI coding startup, said the model delivers "stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions."

For enterprises and developers, the competition translates to rapidly improving capabilities at falling prices. But as AI performance on technical tasks approaches — and sometimes exceeds — human expert levels, the technology's impact on professional work becomes less theoretical.

When asked about the engineering exam results and what they signal about AI's trajectory, Albert was direct: "I think it's a really important signal to pay attention to."


