Following our recent announcement on Inference Providers on the Hub, we’re thrilled to share that Fireworks.ai is now a supported Inference Provider on HF Hub!
Fireworks.ai delivers blazing-fast serverless inference directly on model pages, as well as throughout the entire HF ecosystem of libraries and tools, making it easier than ever to run inference on your favorite models.

Among others, you can now run serverless inference for the following models via Fireworks.ai:
and many more; you can find the full list here.
Light up your projects with Fireworks.ai today!
How it works
In the website UI
Search for all models supported by Fireworks on HF here.
From the client SDKs
From Python, using huggingface_hub
The following example shows how to use DeepSeek-R1 with Fireworks.ai as your inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own Fireworks.ai API key if you have one.
Install huggingface_hub from source:
pip install git+https://github.com/huggingface/huggingface_hub
Use the huggingface_hub Python library to call Fireworks.ai endpoints by defining the provider parameter.
from huggingface_hub import InferenceClient
client = InferenceClient(
provider="fireworks-ai",
api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)
messages = [
{
"role": "user",
"content": "What is the capital of France?"
}
]
completion = client.chat.completions.create(
model="deepseek-ai/DeepSeek-R1",
messages=messages,
max_tokens=500
)
print(completion.choices[0].message)
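The completion object returned by the client follows the OpenAI-compatible schema. As a minimal sketch of how to pull out the assistant's reply (using a hand-built sample response here rather than a live API call; the content string is illustrative, not real model output):

```python
# A sample response in the OpenAI-compatible shape returned by
# client.chat.completions.create; built by hand for illustration.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The capital of France is Paris.",
            }
        }
    ]
}

# The assistant's reply lives at choices[0].message.content.
reply = sample["choices"][0]["message"]["content"]
print(reply)
```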
From JS using @huggingface/inference
import { HfInference } from "@huggingface/inference";
const client = new HfInference("xxxxxxxxxxxxxxxxxxxxxxxx");
const chatCompletion = await client.chatCompletion({
model: "deepseek-ai/DeepSeek-R1",
messages: [
{
role: "user",
content: "How to make extremely spicy Mayonnaise?"
}
],
provider: "fireworks-ai",
max_tokens: 500
});
console.log(chatCompletion.choices[0].message);
From HTTP calls
Here’s how you can call Llama-3.3-70B-Instruct using Fireworks.ai as the inference provider via cURL.
curl 'https://router.huggingface.co/fireworks-ai/v1/chat/completions' \
    -H 'Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxx' \
    -H 'Content-Type: application/json' \
--data '{
"model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
"messages": [
{
"role": "user",
"content": "What is the meaning of life if you were a dog?"
}
],
"max_tokens": 500,
"stream": false
}'
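The same request body can be assembled programmatically. Here is a minimal sketch in Python that builds the JSON payload from the cURL example above (the `build_chat_request` helper is our own name, not part of any SDK; actually sending the request would additionally require a valid token and an HTTP client, so the sketch stops at serializing the body):

```python
import json

def build_chat_request(model, user_content, max_tokens=500, stream=False):
    """Build an OpenAI-compatible chat-completion request body,
    mirroring the --data payload of the cURL example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,
        "stream": stream,
    }

payload = build_chat_request(
    "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "What is the meaning of life if you were a dog?",
)

# Serialize to JSON exactly as it would be sent as the request body.
body = json.dumps(payload)
```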
Billing
For direct requests, i.e. when you use a Fireworks key, you are billed directly on your Fireworks account.
For routed requests, i.e. when you authenticate via the Hub, you only pay the standard Fireworks API rates. There is no additional markup from us; we just pass through the provider costs directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
Important Note ‼️ PRO users get $2 worth of Inference credits every month. You can use them across providers. 🔥
Subscribe to the Hugging Face PRO plan to get access to Inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.

