
OpenAI’s Foundry will let customers buy dedicated compute to run its AI models


OpenAI is quietly launching a new developer platform that lets customers run the company's newer machine learning models, like GPT-3.5, on dedicated capacity. In screenshots of documentation posted to Twitter by users with early access, OpenAI describes the forthcoming offering, called Foundry, as "designed for cutting-edge customers running larger workloads."

“[Foundry allows] inference at scale with full control over the model configuration and performance profile,” the documentation reads.

If the screenshots are to be believed, Foundry, whenever it launches, will deliver a "static allocation" of compute capacity (possibly on Azure, OpenAI's preferred public cloud platform) dedicated to a single customer. Users will be able to monitor specific instances with the same tools and dashboards that OpenAI uses to build and optimize models. In addition, Foundry will provide some level of version control, letting customers decide whether or not to upgrade to newer model releases, as well as "more robust" fine-tuning for OpenAI's latest models.

Foundry will also offer service-level commitments, for instance for uptime and on-call engineering support. Rentals will be based on dedicated compute units with three-month or one-year commitments; running an individual model instance will require a specific number of compute units (see the chart below).

Instances won't be cheap. Running a lightweight version of GPT-3.5 will cost $78,000 for a three-month commitment or $264,000 over a one-year commitment. To put that into perspective, one of Nvidia's recent-generation supercomputers, the DGX Station, runs $149,000 per unit.
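A quick back-of-the-envelope comparison of those two figures (a sketch; the $78,000 and $264,000 numbers come from the leaked chart, while the per-month breakdown is our own arithmetic) shows the one-year commitment works out meaningfully cheaper per month:

```python
# Back-of-the-envelope comparison of Foundry's leaked commitment pricing
# for a lightweight GPT-3.5 instance. Dollar figures are from the reported chart.
THREE_MONTH_COST = 78_000   # USD for a three-month commitment
ONE_YEAR_COST = 264_000     # USD for a one-year commitment

monthly_short = THREE_MONTH_COST / 3    # effective monthly rate, short commitment
monthly_long = ONE_YEAR_COST / 12       # effective monthly rate, long commitment

# Annualizing the three-month rate shows the discount for committing longer.
annualized_short = monthly_short * 12
savings = annualized_short - ONE_YEAR_COST
discount = savings / annualized_short

print(f"3-month rate: ${monthly_short:,.0f}/mo; 1-year rate: ${monthly_long:,.0f}/mo")
print(f"Committing for a year saves ${savings:,.0f} (~{discount:.1%})")
```

In other words, a year of the lightweight GPT-3.5 instance costs about $22,000 per month versus $26,000 per month at the three-month rate, roughly a 15% discount for the longer commitment.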

Eagle-eyed Twitter and Reddit users spotted that one of the text-generating models listed in the instance pricing chart has a 32k max context window. (The context window refers to the text that the model considers before generating additional text; longer context windows let the model essentially "remember" more text.) GPT-3.5, OpenAI's latest text-generating model, has a 4k max context window, suggesting that this mysterious new model could be the long-awaited GPT-4, or a stepping stone toward it.
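To illustrate what a context window means in practice, here's a minimal sketch (with naive whitespace splitting standing in for a real tokenizer, and the 4,096/32,768 token limits as assumed round numbers) of how a fixed window bounds how much history a model can see:

```python
def fit_to_context(tokens, max_context):
    """Keep only the most recent tokens that fit in the model's context
    window; anything older is effectively 'forgotten' by the model."""
    return tokens[-max_context:] if len(tokens) > max_context else tokens

# Naive whitespace tokenization as a stand-in for a real tokenizer.
history = ("word " * 10_000).split()  # a long conversation: 10,000 tokens

visible_4k = fit_to_context(history, 4_096)    # a GPT-3.5-sized window
visible_32k = fit_to_context(history, 32_768)  # the leaked model's window

print(len(visible_4k))   # the older 5,904 tokens fall out of view
print(len(visible_32k))  # the entire history still fits
```

With the smaller window, nearly 60% of this hypothetical conversation is invisible to the model; with the 32k window, all of it remains in scope.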

OpenAI is under increasing pressure to turn a profit after a multibillion-dollar investment from Microsoft. The company reportedly expects to make $200 million in 2023, a pittance compared to the more than $1 billion that's been put toward the startup so far.

Compute costs are largely to blame. Training state-of-the-art AI models can cost upwards of millions of dollars, and running them generally isn't much cheaper. According to OpenAI co-founder and CEO Sam Altman, it costs a few cents per chat to run ChatGPT, OpenAI's viral chatbot; that's not an insignificant amount considering that ChatGPT had over a million users as of last December.
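As a rough illustration of how those pennies add up: Altman's "a few cents per chat" and the million-user figure come from the article, but the specific per-chat price and chats-per-day rate below are assumptions made only for the sake of the arithmetic.

```python
# Illustrative only: the 3-cent figure and one-chat-per-day rate are
# assumptions; only "a few cents per chat" and "over a million users"
# are reported.
COST_PER_CHAT = 0.03         # USD, reading "a few cents" as ~3 cents
USERS = 1_000_000            # "over a million users" as of last December
CHATS_PER_USER_PER_DAY = 1   # assumed, for illustration

daily_cost = COST_PER_CHAT * USERS * CHATS_PER_USER_PER_DAY
print(f"~${daily_cost:,.0f}/day, ~${daily_cost * 30:,.0f}/month")
```

Even under these conservative assumptions, serving costs land around $30,000 a day, which helps explain the appeal of selling dedicated capacity at Foundry's prices.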

In moves toward monetization, OpenAI recently launched ChatGPT Plus, a "pro" version of ChatGPT starting at $20 per month, and teamed up with Microsoft to develop Bing Chat, a controversial (to put it mildly) chatbot that's captured mainstream attention. According to Semafor and The Information, OpenAI plans to introduce a mobile ChatGPT app in the future and bring its AI language technology into Microsoft apps like Word, PowerPoint and Outlook.

Separately, OpenAI continues to make its tech available through Microsoft's Azure OpenAI Service, a business-focused model-serving platform, and to maintain Copilot, a premium code-generating service developed in partnership with GitHub.
