We’re thrilled to share that OVHcloud is now a supported Inference Provider on the Hugging Face Hub! OVHcloud joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub’s model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide variety of models with your preferred providers.
This launch makes it easier than ever to access popular open-weight models like gpt-oss, Qwen3, DeepSeek R1, and Llama — right from Hugging Face. You can browse OVHcloud’s org on the Hub at https://huggingface.co/ovhcloud and check out trending supported models at https://huggingface.co/models?inference_provider=ovhcloud&sort=trending.
OVHcloud AI Endpoints is a fully managed, serverless service that provides access to frontier AI models from leading research labs via simple API calls. The service offers competitive pay-per-token pricing starting at €0.04 per million tokens.
The service runs on secure infrastructure located in European data centers, ensuring data sovereignty and low latency for European users. The platform supports advanced features including structured outputs, function calling, and multimodal capabilities for both text and image processing.
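For instance, function calling works through the familiar OpenAI-style tools parameter. Here is a minimal sketch using the huggingface_hub Python client; the get_weather tool and its schema are hypothetical, and it assumes the model you target supports tool calls:

import os
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# Hypothetical tool definition, following the OpenAI function-calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decided to call the tool, the call shows up here
print(completion.choices[0].message.tool_calls)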
Built for production use, OVHcloud’s inference infrastructure delivers sub-200ms time to first token, making it ideal for interactive applications and agentic workflows. The service supports both text generation and embedding models. You can learn more about OVHcloud’s platform and infrastructure at https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog/.
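Embedding models are served through the feature extraction task. A minimal sketch with the huggingface_hub Python client; BAAI/bge-m3 is illustrative here, so check the supported-models list for what OVHcloud actually serves:

import os
from huggingface_hub import InferenceClient

# Select OVHcloud explicitly; the HF token routes billing through Hugging Face
client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["HF_TOKEN"],
)

# BAAI/bge-m3 is illustrative — check the supported-models list first
embedding = client.feature_extraction(
    "Paris is the capital of France.",
    model="BAAI/bge-m3",
)
print(embedding.shape)  # a 1-D vector whose size depends on the model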
Read more about how to use OVHcloud as an Inference Provider in its dedicated documentation page.
See the list of supported models here.
How it works
In the website UI
In your user account settings, you are able to:
- Set your own API keys for the providers you’ve signed up with. If no custom key is set, your requests will be routed through HF.
- Order providers by preference. This applies to the widget and code snippets on the model pages.
As mentioned, there are two modes when calling Inference Providers:
- Custom key (calls go directly to the inference provider, using your own API key for that provider)
- Routed by HF (in that case, you don’t need a token from the provider; charges are applied directly to your HF account rather than the provider’s account). Both modes are shown in the sketch below.
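In code, the two modes differ only in which key you hand to the client. A minimal sketch with the huggingface_hub Python client (the OVHCLOUD_API_KEY variable name is just an example):

import os
from huggingface_hub import InferenceClient

# Routed by HF: pass your HF token; usage is billed to your HF account
routed_client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["HF_TOKEN"],
)

# Custom key: pass your own OVHcloud AI Endpoints key; OVHcloud bills you directly
direct_client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["OVHCLOUD_API_KEY"],
)

# Both clients are then used identically, e.g. client.chat.completions.create(...)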
Model pages showcase third-party inference providers (those that are compatible with the current model, sorted by user preference).
From the client SDKs
From Python, using huggingface_hub
The following example shows how to use OpenAI’s gpt-oss-120b with OVHcloud as the inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own OVHcloud AI Endpoints API key if you have one.
Note: this requires using a recent version of huggingface_hub (>= 1.1.5).
import os
from huggingface_hub import InferenceClient

# Authenticate with your HF token for automatic routing through Hugging Face
client = InferenceClient(
    api_key=os.environ["HF_TOKEN"],
)

# The ":ovhcloud" suffix selects OVHcloud as the inference provider
completion = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)
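The same client also supports streaming responses, which suits interactive applications; a minimal variant of the call above, reusing the client from the previous snippet:

# Streaming variant: tokens are printed as they arrive
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")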
From JS using @huggingface/inference
import { InferenceClient } from "@huggingface/inference";

// Authenticate with your HF token for automatic routing through Hugging Face
const client = new InferenceClient(process.env.HF_TOKEN);

// The ":ovhcloud" suffix selects OVHcloud as the inference provider
const chatCompletion = await client.chatCompletion({
  model: "openai/gpt-oss-120b:ovhcloud",
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
});

console.log(chatCompletion.choices[0].message);
Billing
Here is how billing works:
- For direct requests, i.e. when you use a key from an inference provider, you are billed by the corresponding provider. For instance, if you use an OVHcloud API key, you are billed on your OVHcloud account.
- For routed requests, i.e. when you authenticate via the Hugging Face Hub, you only pay the standard provider API rates. There is no additional markup from us; we simply pass through the provider costs directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
Important Note ‼️ PRO users get $2 worth of Inference credits every month. You can use them across providers. 🔥
Subscribe to the Hugging Face PRO plan to get access to Inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.
We also provide free inference with a small quota for our signed-in free users, but please upgrade to PRO if you can!
Feedback and next steps
We’d like to get your feedback! Share your thoughts and/or comments here: https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49