Whereas o1 was a serious technological advancement, GPT-5 is, above all else, a refined product. During a press briefing, Sam Altman compared GPT-5 to Apple’s Retina displays, and it’s an apt analogy, though perhaps not in the way he intended. Much like an unprecedentedly crisp screen, GPT-5 will offer a nicer, more seamless user experience. That’s not nothing, but it falls far short of the transformative AI future that Altman has spent much of the past year hyping. In the briefing, Altman called GPT-5 “a significant step along the path to AGI,” or artificial general intelligence, and perhaps he’s right. If so, it’s a very small step.
Take the demo of the model’s abilities that OpenAI showed off in advance of its release. Yann Dubois, a post-training lead at OpenAI, asked GPT-5 to design a web application that would help his partner learn French so that she could communicate more easily with his family. The model did an admirable job of following his instructions and created an appealing, user-friendly app. But when I gave GPT-4o a nearly identical prompt, it produced an app with the exact same functionality. The only difference was that it wasn’t as aesthetically pleasing.
Some of the other user-experience improvements are more substantial. Having the model, rather than the user, choose whether to apply reasoning to each query removes a major pain point, especially for users who don’t follow LLM advancements closely.
And, according to Altman, GPT-5 reasons much faster than the o-series models. The fact that OpenAI is releasing it to nonpaying users suggests that it’s also cheaper for the company to run. That’s a big deal: Running powerful models cheaply and quickly is a hard problem, and solving it is essential to reducing AI’s environmental impact.
OpenAI has also taken steps to mitigate hallucinations, which have been a persistent headache. OpenAI’s evaluations suggest that GPT-5 models are substantially less likely to make incorrect claims than their predecessors, o3 and GPT-4o. If that advancement holds up to scrutiny, it could help pave the way for more reliable and trustworthy agents. “Hallucination can cause real safety and security issues,” says Dawn Song, a professor of computer science at UC Berkeley. For example, an agent that hallucinates software packages could download malicious code to a user’s device.
GPT-5 has achieved state-of-the-art results on several benchmarks, including a test of agentic abilities and the coding evaluations SWE-Bench and Aider Polyglot. But according to Clémentine Fourrier, an AI researcher at the company Hugging Face, those evaluations are nearing saturation, which means that current models have achieved near-maximal performance.
“It’s basically like looking at the performance of a high schooler on middle-grade problems,” she says. “If the high schooler fails, it tells you something, but if it succeeds, it doesn’t tell you a lot.” Fourrier said she would be impressed if the system achieved a score of 80% or 85% on SWE-Bench, but it managed only 74.9%.
Ultimately, the headline message from OpenAI is that GPT-5 feels better to use. “The vibes of this model are really good, and I think that people are really going to feel that, especially average people who haven’t been spending their time thinking about models,” said Nick Turley, the head of ChatGPT.
Vibes alone, however, won’t bring about the automated future that Altman has promised. Reasoning felt like a major step forward on the path to AGI. We’re still waiting for the next one.