OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

Still, 1,000 tokens per second is modest by Cerebras' standards. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI's own open-weight gpt-oss-120B model, suggesting that Codex-Spark's comparatively lower speed reflects the overhead of a larger or more complex model.

AI coding agents have had a breakout year, with tools like OpenAI's Codex and Anthropic's Claude Code reaching a new level of usefulness for rapidly building prototypes, interfaces, and boilerplate code. OpenAI, Google, and Anthropic have all been racing to ship more capable coding agents, and latency has become what separates the winners; a model that codes faster lets a developer iterate faster.

With fierce competition from Anthropic, OpenAI has been iterating on its Codex line at a rapid clip, releasing GPT-5.2 in December after CEO Sam Altman issued an internal "code red" memo about competitive pressure from Google, then shipping GPT-5.3-Codex just days ago.

Diversifying away from Nvidia

Spark's deeper hardware story may be more consequential than its benchmark scores. The model runs on Cerebras' Wafer Scale Engine 3, a chip the size of a dinner plate that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to come out of it.

OpenAI has spent the past year systematically reducing its dependence on Nvidia. The company signed a large multi-year deal with AMD in October 2025, struck a $38 billion cloud computing agreement with Amazon in November, and has been designing its own custom AI chip for eventual fabrication by TSMC.

Meanwhile, a planned $100 billion infrastructure deal with Nvidia has fizzled so far, though Nvidia has since committed to a $20 billion investment. Reuters reported that OpenAI grew dissatisfied with the speed of some Nvidia chips for inference tasks, which is exactly the kind of workload Codex-Spark was designed for.

Whichever chip is under the hood, speed matters, though it may come at the cost of accuracy. For developers who spend their days inside a code editor waiting for AI suggestions, 1,000 tokens per second may feel less like carefully piloting a jigsaw and more like running a rip saw. Just watch what you're cutting.
