AI won’t be coming for lawyers’ jobs anytime soon


But recent benchmarks aim to better measure the models’ ability to do legal work in the real world. The Professional Reasoning Benchmark, published by Scale AI in November, evaluated leading LLMs on legal and financial tasks designed by professionals in the field. The study found that the models have critical gaps in their reliability for professional adoption, with the best-performing model scoring only 37% on the most difficult legal problems, meaning it earned just over a third of the possible points on the evaluation criteria. The models frequently made inaccurate legal judgments, and even when they reached correct conclusions, they did so through incomplete or opaque reasoning.

“The tools really are not there to basically substitute [for] your lawyer,” says Afra Feyza Akyurek, the lead author of the paper. “Even though a lot of people think that LLMs have a grasp of the law, it’s still lagging behind.”

The paper builds on other benchmarks measuring the models’ performance on economically valuable work. The AI Productivity Index, published by the data firm Mercor in September and updated in December, found that the models have “substantial limitations” in performing legal work. The best-performing model scored 77.9% on legal tasks, meaning it satisfied roughly four out of five evaluation criteria. A model with such a score might generate substantial economic value in some industries, but in fields where errors are costly, it may not be useful at all, an earlier version of the study noted.

Professional benchmarks are a big step forward in evaluating LLMs’ real-world capabilities, but they may still not capture what lawyers actually do. “These questions, although tougher than those in past benchmarks, still don’t fully reflect the sorts of subjective, extremely difficult questions lawyers tackle in real life,” says Jon Choi, a law professor at Washington University School of Law, who coauthored a study on legal benchmarks in 2023.

Unlike math or coding, where LLMs have made significant progress, legal reasoning may be difficult for the models to learn. The law deals with messy real-world problems, riddled with ambiguity and subjectivity, that often have no right answer, says Choi. Making matters worse, a lot of legal work isn’t recorded in ways that can be used to train the models, he says. When it is, documents can span hundreds of pages, scattered across statutes, regulations, and court cases that exist in a complex hierarchy.

But a more fundamental limitation may be that LLMs are simply not trained to think like lawyers. “The reasoning models still don’t fully reason about problems like we humans do,” says Julian Nyarko, a law professor at Stanford Law School. The models may lack a mental model of the world (the ability to simulate a scenario and predict what will happen), and that capability could be at the heart of complex legal reasoning, he says. It’s possible that the current paradigm of LLMs trained on next-word prediction only gets us so far.
