Introducing HUGS – Scale your AI with Open Models



September 2025 Update: We no longer offer HUGS model deployment containers.

To easily deploy optimized Hugging Face models in your infrastructure, check out the Dell Enterprise Hub and the Hugging Face Collection in Azure AI Foundry.

Today, we’re thrilled to announce the launch of Hugging Face Generative AI Services, a.k.a. HUGS: optimized, zero-configuration inference microservices designed to simplify and accelerate the development of AI applications with open models. Built on open-source Hugging Face technologies such as Text Generation Inference and Transformers, HUGS provides the best way to efficiently build and scale Generative AI applications in your own infrastructure. HUGS is optimized to run open models on a variety of hardware accelerators, including NVIDIA GPUs, AMD GPUs, and soon AWS Inferentia and Google TPUs.

HUGS Banner



Zero-Configuration Optimized Inference for Open Models

HUGS simplifies the optimized deployment of open models in your own infrastructure and on a wide variety of hardware. One key challenge developers and organizations face is the engineering complexity of optimizing inference workloads for LLMs on a particular GPU or AI accelerator. With HUGS, we enable maximum-throughput deployments for the most popular open LLMs with zero configuration required. Each deployment configuration offered by HUGS is fully tested and maintained to work out of the box.

HUGS model deployments provide an OpenAI-compatible API, making them a drop-in replacement for existing Generative AI applications built on top of model provider APIs. Just point your code at the HUGS deployment to power your applications with open models hosted in your own infrastructure.
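For example, an application built with the openai Python SDK only needs its base URL pointed at the HUGS deployment. A minimal sketch, assuming a deployment reachable at a placeholder URL:

from openai import OpenAI

# Point the standard OpenAI client at your HUGS deployment instead of api.openai.com.
# The base_url is a placeholder for your own endpoint; no real OpenAI API key is needed.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

response = client.chat.completions.create(
    model="tgi",  # each HUGS container serves a single model, so the name is a placeholder
    messages=[{"role": "user", "content": "What is Deep Learning?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)

The rest of the application code stays unchanged; only the client construction differs from a call to a hosted model provider.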



Why HUGS?

HUGS offers a simple way to build AI applications with open models hosted in your own infrastructure, with the following advantages:

  • In YOUR infrastructure: Deploy open models within your own secure environment. Keep your data and models off the internet!
  • Zero-configuration Deployment: HUGS reduces deployment time from weeks to minutes with zero-configuration setup, automatically optimizing the model and serving configuration for your NVIDIA GPU, AMD GPU, or AI accelerator.
  • Hardware-Optimized Inference: Built on Hugging Face’s Text Generation Inference (TGI), HUGS is optimized for peak performance across different hardware setups.
  • Hardware Flexibility: Run HUGS on a variety of accelerators, including NVIDIA GPUs and AMD GPUs, with support for AWS Inferentia and Google TPUs coming soon.
  • Model Flexibility: HUGS is compatible with a wide range of open-source models, ensuring flexibility and choice for your AI applications.
  • Industry Standard APIs: Deploy HUGS easily using Kubernetes, with endpoints compatible with the OpenAI API, minimizing code changes.
  • Enterprise Distribution: HUGS is an enterprise distribution of Hugging Face open-source technologies, offering long-term support, rigorous testing, and SOC2 compliance.
  • Enterprise Compliance: Minimizes compliance risk by including the necessary licenses and terms of service.

We provided early access to HUGS to select Enterprise Hub customers:

HUGS is a huge time saver for deploying ready-to-work models locally with good performance – before HUGS it would take us a week; now we can be done in less than an hour. For customers with sovereign AI requirements, this is a game changer! – Henri Jouhaud, CTO at Polyconseil

We tried HUGS to deploy Gemma 2 on GCP using an L4 GPU – we didn’t have to fiddle with libraries, versions, and parameters; it just worked out of the box. HUGS gives us confidence we can scale our internal usage of open models! – Ghislain Putois, Research Engineer at Orange



How it Works

Using HUGS is straightforward. Here’s how you can get started:

Note: You will need access to the appropriate subscription or marketplace offering, depending on your chosen deployment method.



Where to Find HUGS

HUGS is available through several channels:

  1. Cloud Service Provider (CSP) Marketplaces: You can find and deploy HUGS on Amazon Web Services (AWS) and Google Cloud Platform (GCP), with Microsoft Azure support coming soon.
  2. DigitalOcean: HUGS is natively available within DigitalOcean as a new 1-Click Models service, powered by Hugging Face HUGS and GPU Droplets.
  3. Enterprise Hub: If your organization is upgraded to Enterprise Hub, contact our Sales team to get access to HUGS.

For specific deployment instructions for each platform, please refer to the relevant documentation linked above.



Pricing

HUGS offers on-demand pricing based on the uptime of each container, except for deployments on DigitalOcean.

  • AWS Marketplace and Google Cloud Platform Marketplace: $1 per hour per container, with no minimum fee (compute usage billed separately by the CSP). On AWS, a 5-day free trial period lets you test HUGS for free.
  • DigitalOcean: 1-Click Models powered by Hugging Face HUGS are available at no additional cost on DigitalOcean – regular GPU Droplets compute costs apply.
  • Enterprise Hub: We provide custom HUGS access to Enterprise Hub organizations. Please contact our Sales team to learn more.



Running Inference

HUGS is based on Text Generation Inference (TGI), offering a seamless inference experience. For detailed instructions and examples, refer to the Run Inference on HUGS guide. HUGS leverages the OpenAI-compatible Messages API, allowing you to use familiar tools and libraries like cURL, the huggingface_hub SDK, and the openai SDK to send requests. For example, with the huggingface_hub SDK:

from huggingface_hub import InferenceClient

# URL of your HUGS deployment, e.g. from your marketplace console
ENDPOINT_URL = "REPLACE"

# HUGS does not require a real API key, but the client expects a value
client = InferenceClient(base_url=ENDPOINT_URL, api_key="-")

chat_completion = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is Deep Learning?"},
    ],
    temperature=0.7,
    top_p=0.95,
    max_tokens=128,
)

print(chat_completion.choices[0].message.content)
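You can also hit the OpenAI-compatible chat completions route directly, as you would with cURL. A minimal sketch using the requests library, assuming the same placeholder endpoint URL:

import requests

ENDPOINT_URL = "REPLACE"  # your HUGS deployment URL (placeholder)

# POST to the OpenAI-compatible chat completions route exposed by TGI
response = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    json={
        "model": "tgi",  # placeholder name; each container serves one model
        "messages": [{"role": "user", "content": "What is Deep Learning?"}],
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": 128,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])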



Supported Models and Hardware

HUGS supports a growing ecosystem of open models and hardware platforms. Refer to our Supported Models and Supported Hardware pages for the latest information.

We launch today with 13 popular open LLMs.

For a detailed view of the supported model and hardware combinations, check out the documentation.



Get Started with HUGS Today

HUGS makes it easy to harness the power of open models, with zero-configuration optimized inference in your own infrastructure. With HUGS, you can take control of your AI applications and easily transition proof-of-concept applications built with closed models to open models you host yourself.

Get started today and deploy HUGS on AWS, Google Cloud, or DigitalOcean!


