Following our recent announcement on Inference Providers on the Hub, we’re thrilled to share that Fireworks.ai is now a supported Inference Provider on HF Hub!
Fireworks.ai delivers blazing-fast serverless inference directly on model pages, as well as throughout the entire HF ecosystem of libraries and tools, making it easier than ever to run inference on your favorite models.

Among others, you can now run serverless inference for the following models via Fireworks.ai:
and many more; you can find the full list here.
Light up your projects with Fireworks.ai today!
How it works
In the website UI
Search for all models supported by Fireworks on HF here.
From the client SDKs
From Python, using huggingface_hub
The following example shows how to use DeepSeek-R1 with Fireworks.ai as your inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own Fireworks.ai API key if you have one.
Install huggingface_hub from source:
pip install git+https://github.com/huggingface/huggingface_hub
Use the huggingface_hub Python library to call Fireworks.ai endpoints by defining the provider parameter.
from huggingface_hub import InferenceClient
client = InferenceClient(
provider="fireworks-ai",
api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)
messages = [
{
"role": "user",
"content": "What is the capital of France?"
}
]
completion = client.chat.completions.create(
model="deepseek-ai/DeepSeek-R1",
messages=messages,
max_tokens=500
)
print(completion.choices[0].message)
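The completion object returned by the client follows the OpenAI-compatible schema. As a minimal sketch of how to pull out the assistant's reply (using a hand-built sample response here rather than a live API call; the content string is illustrative, not real model output):

```python
# A sample response in the OpenAI-compatible shape returned by
# client.chat.completions.create; built by hand for illustration.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The capital of France is Paris.",
            }
        }
    ]
}

# The assistant's reply lives at choices[0].message.content.
reply = sample["choices"][0]["message"]["content"]
print(reply)
```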
From JS using @huggingface/inference
import { HfInference } from "@huggingface/inference";
const client = new HfInference("xxxxxxxxxxxxxxxxxxxxxxxx");
const chatCompletion = await client.chatCompletion({
model: "deepseek-ai/DeepSeek-R1",
messages: [
{
role: "user",
content: "How to make extremely spicy Mayonnaise?"
}
],
provider: "fireworks-ai",
max_tokens: 500
});
console.log(chatCompletion.choices[0].message);
From HTTP calls
Here’s how you can call Llama-3.3-70B-Instruct using Fireworks.ai as the inference provider via cURL.
curl 'https://router.huggingface.co/fireworks-ai/v1/chat/completions' \
    -H 'Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxx' \
    -H 'Content-Type: application/json' \
--data '{
"model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
"messages": [
{
"role": "user",
"content": "What is the meaning of life if you were a dog?"
}
],
"max_tokens": 500,
"stream": false
}'
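The same request body can be assembled programmatically. Here is a minimal sketch in Python that builds the JSON payload from the cURL example above (the `build_chat_request` helper is our own name, not part of any SDK; actually sending the request would additionally require a valid token and an HTTP client, so the sketch stops at serializing the body):

```python
import json

def build_chat_request(model, user_content, max_tokens=500, stream=False):
    """Build an OpenAI-compatible chat-completion request body,
    mirroring the --data payload of the cURL example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,
        "stream": stream,
    }

payload = build_chat_request(
    "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "What is the meaning of life if you were a dog?",
)

# Serialize to JSON exactly as it would be sent as the request body.
body = json.dumps(payload)
```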
Billing
For direct requests, i.e. when you use a Fireworks key, you are billed directly on your Fireworks account.
For routed requests, i.e. when you authenticate via the Hub, you only pay the standard Fireworks API rates. There is no additional markup from us; we just pass through the provider costs directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
Important Note ‼️ PRO users get $2 worth of Inference credits every month. You can use them across providers. 🔥
Subscribe to the Hugging Face PRO plan to get access to Inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.

