This is the most misunderstood graph in AI

That was certainly the case for Claude Opus 4.5, the latest version of Anthropic’s most powerful model, which was released in late November. In December, METR announced that Opus 4.5 appeared to be capable of independently completing a task that would have taken a human about five hours, a far larger improvement than even the exponential trend would have predicted. One Anthropic safety researcher tweeted that he would change the direction of his research in light of those results; another employee at the company simply wrote, “mom come pick me up i’m scared.”

Credit: METR.ORG

But the reality is more complicated than those dramatic responses would suggest. For one thing, METR’s estimates of the abilities of specific models come with substantial error bars. As METR explicitly stated on X, Opus 4.5 might be able to regularly complete only tasks that take humans about two hours, or it could succeed on tasks that take humans as long as 20 hours. Given the uncertainties intrinsic to the method, it was impossible to know for sure.

“There are a bunch of ways that people are reading too much into the graph,” says Sydney Von Arx, a member of METR’s technical staff.

More fundamentally, the METR plot doesn’t measure AI abilities writ large, nor does it claim to. To construct the graph, METR tests the models primarily on coding tasks, evaluating the difficulty of each by measuring or estimating how long it takes humans to complete it, a metric that not everyone accepts. Claude Opus 4.5 might be able to complete certain tasks that take humans five hours, but that doesn’t mean it’s anywhere near replacing a human employee.

METR was founded to evaluate the risks posed by frontier AI systems. Though it’s best known for the exponential trend plot, it has also worked with AI companies to evaluate their systems in greater detail and published several other independent research projects, including a widely covered July 2025 study suggesting that AI coding assistants might actually be slowing software engineers down.

But the exponential plot has made METR’s reputation, and the organization appears to have a complicated relationship with that graph’s often breathless reception. In January, Thomas Kwa, one of the lead authors on the paper that introduced it, wrote a blog post responding to some criticisms and making clear its limitations, and METR is currently working on a more extensive FAQ document. But Kwa isn’t optimistic that these efforts will meaningfully shift the discourse. “I think the hype machine will basically, whatever we do, just strip out all the caveats,” he says.

Nevertheless, the METR team does think that the plot has something meaningful to say about the trajectory of AI progress. “You should absolutely not tie your life to this graph,” says Von Arx. “But also,” she adds, “I bet that this trend is gonna hold.”
