
The rumors were true: OpenAI on Thursday announced the release of its latest frontier large language model (LLM) family, GPT-5.2.
It comes at a pivotal moment for the AI pioneer, which has faced intensifying pressure since rival Google's Gemini 3 LLM seized the top spot on major third-party performance leaderboards and a number of key benchmarks last month. OpenAI leaders stressed in a press briefing, however, that the timing of this release had been discussed and worked on well in advance of Gemini 3's launch.
OpenAI describes GPT-5.2 as its "most capable model series yet for professional knowledge work," aiming to reclaim the performance crown with significant gains in reasoning, coding, and agentic workflows.
"It's our most advanced frontier model and the strongest yet available in the market for professional use," Fidji Simo, OpenAI's CEO of Applications, said during a press briefing today. "We designed 5.2 to unlock much more economic value for people. It's better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools, and handling complex, multi-step projects."
GPT-5.2 features a massive 400,000-token context window, allowing it to ingest hundreds of documents or large code repositories at once, and a 128,000-token maximum output limit, enabling it to generate extensive reports or full applications in a single pass.
The model also carries a knowledge cutoff of August 31, 2025, keeping it current with relatively recent world events and technical documentation. It explicitly includes "reasoning token support," confirming the underlying architecture uses the chain-of-thought processing popularized by the "o1" series.
The 'Code Red' Reality Check
The release arrives following The Information's report of an emergency "Code Red" directive from CEO Sam Altman to OpenAI staff to improve ChatGPT, a move reportedly designed to mobilize resources after the "quality gap" exposed by Gemini 3. The Verge similarly reported on the timing of GPT-5.2's release ahead of the official announcement.
During the briefing, OpenAI executives acknowledged the directive but pushed back on the narrative that the model was rushed out solely to answer Google.
"It is important to note this has been in the works for many, many months," Simo told reporters. She clarified that while the "Code Red" helped focus the company, it wasn't the sole driver of the timeline.
"We announced this Code Red to really signal to the company that we want to marshal resources in one particular area… but that's not the reason it's coming out this week specifically."
Max Schwarzer, lead of OpenAI's post-training team, echoed this sentiment, dispelling the idea of a panic launch. "We've been planning for this release since a very long time ago… this specific week we talked about many months ago."
An OpenAI spokesperson further clarified that the "Code Red" call applied to ChatGPT as a product, not to underlying model development or the release of new models.
Under the Hood: Instant, Thinking, and Pro
OpenAI is segmenting the GPT-5.2 release into three distinct tiers inside ChatGPT, a strategy likely designed to balance the steep compute costs of "reasoning" models against user demand for speed:
- GPT-5.2 Instant: Optimized for speed and everyday tasks like writing, translation, and information seeking.
- GPT-5.2 Thinking: Designed for "complex, structured work" and long-running agents, this model leverages deeper reasoning chains to handle coding, math, and multi-step projects.
- GPT-5.2 Pro: The new heavyweight champion. OpenAI describes this as its "smartest and most trustworthy option," delivering the highest accuracy for difficult questions where quality outweighs latency.
For developers, the models are available immediately through the application programming interface (API) as gpt-5.2, gpt-5.2-chat-latest (Instant), and gpt-5.2-pro.
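For developers wanting to kick the tires, here is a minimal sketch of what a call might look like with the OpenAI Python SDK. It assumes the new models accept the same Responses API parameters as GPT-5.1; the model names are the ones OpenAI listed, while the prompt and token limit are purely illustrative.

```python
# Minimal sketch: calling the new GPT-5.2 models through the OpenAI Python SDK.
# Assumes the same Responses API surface as GPT-5.1; prompt and limits are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.2",                 # or "gpt-5.2-chat-latest" / "gpt-5.2-pro"
    input="Draft a one-page project status report from the notes below.",
    max_output_tokens=4096,          # the series supports up to 128,000 output tokens
)

print(response.output_text)
```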
The Numbers: Beating the Benchmarks
The GPT-5.2 release posts leading scores across most domains, specifically those focused on the "professional knowledge work" gap where competitors have recently gained ground.
OpenAI highlighted a new benchmark called GDPval, which measures performance on "well-specified knowledge work tasks" across 44 occupations.
"GPT-5.2 Thinking is now state-of-the-art on that benchmark… and beats or ties top industry professionals on 70.9% of well-specified professional tasks like spreadsheets, presentations, and document creation, according to expert human judges," Simo said.
In the critical arena of coding, OpenAI is claiming a decisive lead. Schwarzer noted that on SWE-bench Pro, a rigorous evaluation of real-world software engineering, GPT-5.2 Thinking sets a new state-of-the-art score of 55.6%.
He emphasized that this benchmark is "more contamination resistant, difficult, diverse, and industrially relevant than previous benchmarks like SWE-bench Verified."
Other key benchmark results include:
- GPQA Diamond (Science): GPT-5.2 Pro scored 93.2%, edging out GPT-5.2 Thinking (92.4%) and surpassing GPT-5.1 Thinking (88.1%).
- FrontierMath: On Tier 1-3 problems, GPT-5.2 Thinking solved 40.3%, a significant jump from the 31.0% achieved by its predecessor.
- ARC-AGI-1: GPT-5.2 Pro is reportedly the first model to cross the 90% threshold on this general reasoning benchmark, scoring 90.5%.
The Price of Intelligence
Performance comes at a premium. While ChatGPT subscription pricing remains unchanged for now, API costs for the new flagship models are steep compared with previous generations, reflecting the high compute demands of "thinking" mode. They are also at the upper end of API pricing across the industry.
- GPT-5.2 Thinking: Priced at $1.75 per 1 million input tokens and $14 per 1 million output tokens.
- GPT-5.2 Pro: Prices jump significantly to $21 per 1 million input tokens and $168 per 1 million output tokens.
GPT-5.2 Thinking is priced 40% higher in the API than the standard GPT-5.1 ($1.25/$10), signaling that OpenAI views the new reasoning capabilities as a tangible value-add rather than a mere efficiency update.
The high-end GPT-5.2 Pro follows the same pattern, costing 40% more than the previous GPT-5 Pro ($15/$120). While expensive, it still undercuts OpenAI's most specialized reasoning model, o1-pro, which remains the priciest offering on the menu at a staggering $150 per million input tokens and $600 per million output tokens.
OpenAI argues that despite the higher per-token cost, the model's "greater token efficiency" and ability to solve tasks in fewer turns make it economically viable for high-value enterprise workflows.
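To make the per-token rates concrete, here is a back-of-the-envelope comparison for a hypothetical job of 200,000 input tokens and 20,000 output tokens, using the prices quoted above (the workload size is an assumption chosen only for illustration):

```python
# Back-of-the-envelope API cost comparison at the quoted per-million-token rates.
# The 200K-input / 20K-output workload is hypothetical, chosen only to illustrate scale.
PRICES = {                # (input $/1M tokens, output $/1M tokens)
    "gpt-5.1":     (1.25, 10.00),
    "gpt-5.2":     (1.75, 14.00),
    "gpt-5.2-pro": (21.00, 168.00),
}

input_tokens, output_tokens = 200_000, 20_000

for model, (in_rate, out_rate) in PRICES.items():
    cost = (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate
    print(f"{model:>12}: ${cost:.2f}")

# gpt-5.1:     $0.45
# gpt-5.2:     $0.63   (40% more than GPT-5.1 at this mix)
# gpt-5.2-pro: $7.56
```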
Here's how it compares to current API pricing for other competing models across the LLM field:
| Model | Input (/1M) | Output (/1M) | Total Cost | Source |
| --- | --- | --- | --- | --- |
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 | |
| deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 | |
| deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 | |
| Qwen 3 Plus | $0.40 | $1.20 | $1.60 | |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | |
| Qwen-Max | $1.60 | $6.40 | $8.00 | |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | |
| Claude Opus 4.5 | $5.00 | $25.00 | $30.00 | |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 | |
Image Generation: Nothing New Yet… But 'More to Come'
During the briefing, VentureBeat asked the OpenAI participants whether the new release included any boost to image generation capabilities, noting the excitement around similar features in recent competitor launches like Google's Gemini 3 Image, aka Nano Banana Pro.
Unfortunately for those seeking to recreate that kind of text-and-information-heavy graphics and image editing capability, OpenAI executives clarified that GPT-5.2 ships with no image generation improvements over the prior GPT-5.1 and OpenAI's integrated DALL-E 3 and gpt-4o native image generation models.
"On image gen, nothing to announce today, but more to come," Simo said. She acknowledged the popularity of the feature, adding, "We know this is an important use case that people love, that we introduced [to] the market, and so definitely more to come there."
Aidan Clark, OpenAI's lead of training, also declined to comment on visual generation specifics, stating simply, "I can't really speak to image gen myself."
The 'Mega-Agent' Era
Beyond raw scores, OpenAI is positioning GPT-5.2 as the engine for a new generation of "long-running agents" capable of executing multi-step workflows without human hand-holding.
"Box found that 5.2 can extract information from long, complex documents about 40% faster, and also saw a 40% boost in reasoning accuracy for life sciences and healthcare," Simo said.
She also noted that Notion reported the model "outperforms 5.1 across every dimension… and it excels on the kind of really ambiguous, longer-running tasks that define real knowledge work."
Schwarzer added that coding startups like Augment Code found the model "delivered substantially stronger deep code capabilities than any prior model," which is why it was chosen to power their new code review agent.
Visual capabilities have also seen an upgrade.
OpenAI's release blog post shows an example where "a traveler reports a delayed flight, a missed connection, an overnight stay in New York, and a medical seating requirement."
The outcome? "GPT‑5.2 manages the entire chain of tasks—rebooking, special-assistance seating, and compensation—delivering a more complete final result than GPT‑5.1."
A new evaluation called ScreenSpot-Pro, which tests a model's ability to understand GUI screenshots, shows GPT-5.2 Thinking achieving 86.3% accuracy, compared with just 64.2% for GPT-5.1.
Science and Reliability
OpenAI leaders also stressed the model's utility for scientific research, trying to move the conversation beyond simple chatbots to research assistants.
Aidan Clark, lead of the training team, shared an example of a senior immunology researcher testing the model.
"They tested it by asking it to generate an important unanswered questions on the immune system," Clark said. "That immunology researcher reported that GPT-5.2 produced sharper questions and stronger explanations for why those questions… matter in comparison with any previous pro model.
"Reliability was one other key focus. Schwarzer claimed the brand new model "hallucinates substantially lower than GPT-5.1," noting that on a set of de-identified queries, "responses contained errors 38% less often."
The 'Vibe' Shift
Interestingly, OpenAI acknowledged that not every user will immediately prefer the new models.
When asked why legacy models like GPT-5.1 would remain available, Schwarzer admitted that "models change a little bit each time."
"Some users may find that they like the vibes of the previous model, although we think the latest one is across the board generally much better," Schwarzer said. He also noted that for some enterprise customers who have "really fine-tuned a prompt for a particular model," there might be "small regressions," necessitating access to the older versions.
Safety, 'Adult Mode,' and Future Roadmap
Addressing safety concerns, Simo confirmed that the company is preparing to roll out an "Adult Mode" in the first quarter of next year, following the implementation of a new age prediction system.
"We're in the process of improving that," Simo said regarding the age prediction technology. "We want to do that ahead of launching adult mode."
Looking further ahead, industry reports suggest OpenAI is working on a more fundamental architectural shift under the codename "Project Garlic," targeting a flagship release in early 2026.
While executives didn't comment on specific future roadmaps during the briefing, Simo remained optimistic about the economics of their current trajectory.
"If you look at historical trends, compute has increased about 3x yearly for the last three years," she explained. "Revenue has also increased at the same pace… creating this virtuous cycle."
Clark added that efficiency is improving rapidly: "The model we're releasing today achieves an even higher score [on ARC-AGI] with almost 400 times less cost and less compute associated with it" compared with models from a year ago.
GPT-5.2 Instant, Thinking, and Pro begin rolling out in ChatGPT today to paid users (Plus, Pro, Team, and Enterprise). The company notes the rollout will be gradual to maintain stability.
