Sam Altman, CEO of OpenAI, mentioned the model name ‘o2’ on X (Twitter) for the first time. Although he quickly deleted the post, he left an important hint about the next model.
GIGAZINE reported on the 3rd (local time) that CEO Altman had posted, “o2 scored 105% on GPQA.” However, the post soon disappeared, and all that remains now is his explanation: “I made a mistake. I used the wrong account.”
The model name o2 has never been mentioned before. However, it appears to be a successor to the o1 model with enhanced reasoning performance.
In a Q&A session with Reddit users last week, CEO Altman said, “The company’s top priority is developing the o1 model and its successors.”
At the time, this was interpreted as a focus on developing the full version of o1, which had not yet been released. Judging by this post, however, development of the follow-up model may already be complete.
No specific details have been disclosed, but based on the GPQA score, the model is likely to deliver record-breaking performance.
GPQA is a benchmark for evaluating AI performance that consists of 448 multiple-choice questions written by experts in biology, physics, and chemistry. The questions are so difficult that an average person answers only 34% correctly even with Google search, and even doctoral degree holders and doctoral students score only 65%.
It is already widely used as a benchmark for high-performance AI models. ‘GPT-4o’ scored 53.6%, ‘Claude 3 Opus’ 50.4%, and ‘Llama 3 400B’ 48.0%.
A score of 105% is on an entirely different level from existing AI models. If o1 performs at the level of a graduate student, this would mean that o2 answers questions better than a doctoral degree holder.
Meanwhile, CEO Altman said last week, “We have no plans to release GPT-5 this year,” adding, “We will release several major models by the end of this year.”
The release date of o2 is unclear, but if its performance has already been verified on benchmarks, it seems likely to be among the models launched within the year.
Reporter Park Chan cpark@aitimes.com