We’re thrilled to share that Groq is now a supported Inference Provider on the Hugging Face Hub!
Groq joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub’s model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide range of models with your preferred providers.
Groq supports a wide range of text and conversational models, including the latest open-source models such as Meta’s Llama 4, Qwen’s QwQ-32B, and many more.
At the center of Groq’s technology is the Language Processing Unit (LPU™), a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications with a sequential component, such as Large Language Models (LLMs). LPUs are designed to overcome the limitations of GPUs for inference, offering significantly lower latency and higher throughput. This makes them ideal for real-time AI applications.
Groq offers fast AI inference for openly-available models. They provide an API that allows developers to easily integrate these models into their applications. It offers an on-demand, pay-as-you-go model for accessing a wide range of openly-available LLMs.
You can now use Groq’s Inference API as an Inference Provider on Hugging Face. We’re excited to see what you’ll build with this new provider.
Read more about how to use Groq as an Inference Provider in its dedicated documentation page.
See the list of supported models here.
How it works
In the website UI
- In your user account settings, you are able to:
- Set your own API keys for the providers you’ve signed up with. If no custom key is set, your requests will be routed through HF.
- Order providers by preference. This applies to the widget and code snippets in the model pages.

- As mentioned, there are two modes when calling Inference Providers:
- Custom key (calls go directly to the inference provider, using your own API key for the corresponding inference provider)
- Routed by HF (in that case, you don’t need a token from the provider, and the charges are applied directly to your HF account rather than the provider’s account)

- Model pages showcase third-party inference providers (those that are compatible with the current model, sorted by user preference)
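To make the two modes above concrete, here is a minimal, hypothetical sketch of what each one means at the HTTP level. The helper and the endpoint URLs are illustrative assumptions (Groq exposes an OpenAI-compatible API, and routed calls go through a Hugging Face endpoint), not official paths — refer to the dedicated documentation for the exact details.

```python
import json

# Assumed endpoints, for illustration only.
GROQ_DIRECT_URL = "https://api.groq.com/openai/v1/chat/completions"
HF_ROUTED_URL = "https://router.huggingface.co/groq/v1/chat/completions"

def build_request(model, messages, groq_api_key=None, hf_token=None):
    """Return (url, headers, body) for a chat completion request.

    Custom-key mode: a Groq API key sends the call directly to Groq.
    Routed mode: with only an HF token, the call goes through Hugging Face.
    """
    if groq_api_key:
        url, token = GROQ_DIRECT_URL, groq_api_key
    elif hf_token:
        url, token = HF_ROUTED_URL, hf_token
    else:
        raise ValueError("Provide either a Groq API key or a Hugging Face token")
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Routed mode: only an HF token is needed; billing lands on your HF account.
url, headers, body = build_request(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    [{"role": "user", "content": "What is the capital of France?"}],
    hf_token="hf_xxx",
)
```

Either way, the request payload is the same; only the endpoint and the credential differ, which is what determines who bills you.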

From the client SDKs
from Python, using huggingface_hub
The following example shows how to use Meta’s Llama 4 with Groq as the inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own Groq API key if you have one.
Install huggingface_hub from source (see instructions). Official support will be released soon in version v0.33.0.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=messages,
)

print(completion.choices[0].message)
from JS using @huggingface/inference
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const chatCompletion = await client.chatCompletion({
    model: "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages: [
        {
            role: "user",
            content: "What is the capital of France?",
        },
    ],
    provider: "groq",
});

console.log(chatCompletion.choices[0].message);
Billing
For direct requests, i.e. when you use a key from an inference provider, you are billed by the corresponding provider. For example, if you use a Groq API key you’re billed on your Groq account.
For routed requests, i.e. when you authenticate via the Hugging Face Hub, you’ll only pay the standard provider API rates. There’s no additional markup from us; we just pass through the provider costs directly. (In the future, we may establish revenue-sharing agreements with our provider partners.)
Important Note ‼️ PRO users get $2 worth of Inference credits every month. You can use them across providers. 🔥
Subscribe to the Hugging Face PRO plan to get access to Inference credits, ZeroGPU, Spaces Dev Mode, 20x higher limits, and more.
We also provide free inference with a small quota for our signed-in free users, but please upgrade to PRO if you can!
Feedback and next steps
We would love to get your feedback! Share your thoughts and/or comments here: https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49

