OpenAI just released GPT-4.5 and says it’s its biggest and best chat model yet

Unlike reasoning models such as o1 and o3, which work through answers step by step, most large language models like GPT-4.5 spit out the first response they come up with. But GPT-4.5 is more general-purpose. Tested on SimpleQA, a kind of general-knowledge quiz developed by OpenAI last year that includes questions on topics from science and technology to TV shows and video games, GPT-4.5 scores 62.5% compared with 38.6% for GPT-4o and 15% for o3-mini.

What’s more, OpenAI claims that GPT-4.5 responds with far fewer made-up answers (known as hallucinations). On the same test, GPT-4.5 made up answers 37.1% of the time, compared with 59.8% for GPT-4o and 80.3% for o3-mini.

But SimpleQA is only one benchmark. On other tests, including MMLU, a more common benchmark for comparing large language models, GPT-4.5 beat OpenAI’s previous models by a smaller margin. And on standard science and math benchmarks, GPT-4.5 scores worse than o3-mini.

Turning on the charm

GPT-4.5’s special charm appears to be its conversational skills. Human testers employed by OpenAI say they preferred GPT-4.5 to GPT-4o for everyday queries, professional queries, and creative tasks, including coming up with poems. (Ryder says it is also great at old-school internet ASCII art.)

For instance, tell it that you’re going through a rough patch and GPT-4.5 might offer a few words of sympathy before saying: “Want to talk about what happened, or do you just need a distraction? I’m here either way.” GPT-4o is less good at reading social cues and might try to fix the problem whether you asked it to or not, hitting you with a bulleted list of ways to cheer yourself up.

And yet after years at the top, OpenAI faces a tough crowd. “The focus on emotional intelligence and creativity is cool for niche use cases like writing coaches and brainstorming buddies,” says Waseem Alshikh, cofounder and CTO of Writer, a startup that develops large language models for enterprise customers.

“But GPT-4.5 feels like a shiny new coat of paint on the same old car,” he says. “Throwing more compute and data at a model can make it sound smoother, but it’s not a game-changer.”

“The juice isn’t worth the squeeze when you consider the energy costs and the fact that most users won’t notice the difference in everyday use,” he says. “I’d rather see them pivot to efficiency or niche problem-solving than keep supersizing the same recipe.”