Getting Started with Hugging Face Inference Endpoints

Julien Simon


Training machine learning models has become quite easy, especially with the rise of pre-trained models and transfer learning. OK, sometimes it isn’t that easy, but at least training models will never break critical applications or make customers unhappy about your quality of service. Deploying models, however… Yes, we’ve all been there.

Deploying models in production usually requires jumping through a series of hoops: packaging your model in a container, provisioning the infrastructure, creating your prediction API, securing it, scaling it, monitoring it, and more. Let’s face it: building all this plumbing takes valuable time away from doing actual machine learning work. Unfortunately, it can also go awfully wrong.

We set out to fix this problem with the newly launched Hugging Face Inference Endpoints. In the spirit of making machine learning ever simpler without compromising on state-of-the-art quality, we’ve built a service that lets you deploy machine learning models directly from the Hugging Face hub to managed infrastructure on your favorite cloud in just a few clicks. Simple, secure, and scalable: you can have it all.

Let me show you how this works!



Deploying a model on Inference Endpoints

Looking at the list of tasks that Inference Endpoints supports, I decided to deploy a Swin image classification model that I recently fine-tuned with AutoTrain on the food101 dataset. If you’re curious about how I built this model, this video shows the entire process.
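If you want to sanity-check such a model locally before deploying it, a few lines with the transformers pipeline API are enough. Here’s a minimal sketch; the model ID is a placeholder for your own fine-tuned checkpoint.

from transformers import pipeline

# "MY_USERNAME/MY_SWIN_MODEL" is a placeholder: substitute your own
# fine-tuned checkpoint from the hub.
classifier = pipeline("image-classification", model="MY_USERNAME/MY_SWIN_MODEL")

# The pipeline accepts a local file path or an image URL.
print(classifier("food.jpg"))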

Starting from my model page, I click on Deploy and select Inference Endpoints.



This takes me directly to the endpoint creation page.



I decide to deploy the latest revision of my model on a single GPU instance, hosted on AWS in the eu-west-1 region. Optionally, I could set up autoscaling, and I could even deploy the model in a custom container.
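This whole flow can also be scripted. Here’s a hedged sketch using the create_inference_endpoint helper from a recent version of the huggingface_hub library; the endpoint name, model ID, and instance identifiers are placeholders to adapt to your own setup (valid instance values are listed in the UI).

from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-swin-endpoint",                 # placeholder endpoint name
    repository="MY_USERNAME/MY_MODEL",  # model to deploy from the hub
    framework="pytorch",
    task="image-classification",
    vendor="aws",                       # cloud provider
    region="eu-west-1",
    type="protected",                   # or "public" / "private"
    accelerator="gpu",
    instance_size="x1",                 # placeholder: check the UI for valid values
    instance_type="nvidia-t4",          # placeholder: check the UI for valid values
    min_replica=1,
    max_replica=1,                      # raise this to enable autoscaling
)

endpoint.wait()   # block until the endpoint is up and running
print(endpoint.url)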



Next, I need to decide who can access my endpoint. From least secure to most secure, the three options are:

  • Public: the endpoint runs in a public Hugging Face subnet, and anyone on the Internet can access it without any authentication. Think twice before selecting this!
  • Protected: the endpoint runs in a public Hugging Face subnet, and anyone on the Internet with the appropriate organization token can access it.
  • Private: the endpoint runs in a private Hugging Face subnet. It isn’t accessible on the Internet. It’s only available in your AWS account through a VPC Endpoint created with AWS PrivateLink. You can control which VPC and subnet(s) in your AWS account have access to the endpoint.

Let’s first deploy a protected endpoint, and then we’ll deploy a private one.



Deploying a Protected Inference Endpoint

I simply select Protected and click on Create Endpoint.



After a couple of minutes, the endpoint is up and running, and its URL is visible.



I can immediately test it by uploading an image in the inference widget.



Of course, I can also invoke the endpoint directly with a few lines of Python code, authenticating with my Hugging Face API token (you can find yours in your account settings on the hub).

import requests

# Replace with your own endpoint URL and Hugging Face API token.
API_URL = "https://oncm9ojdmjwesag2.eu-west-1.aws.endpoints.huggingface.cloud"

headers = {
  "Authorization": "Bearer MY_API_TOKEN",
  "Content-Type": "image/jpg"
}

def query(filename):
    # Send the raw image bytes to the endpoint and return the parsed JSON response.
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data)
    return response.json()

output = query("food.jpg")

As you’d expect, the predicted result is identical.

[{'score': 0.9998438358306885,    'label': 'hummus'},
 {'score': 6.674625183222815e-05, 'label': 'falafel'}, 
 {'score': 6.490697160188574e-06, 'label': 'escargots'}, 
 {'score': 5.776922080258373e-06, 'label': 'deviled_eggs'}, 
 {'score': 5.492902801051969e-06, 'label': 'shrimp_and_grits'}]

Moving to the Analytics tab, I can see endpoint metrics. Some of my requests failed because I deliberately omitted the Content-Type header.



For additional details, I can check the full logs in the Logs tab.

5c7fbb4485cd8w7 2022-10-10T08:19:04.915Z 2022-10-10 08:19:04,915 | INFO | POST / | Duration: 142.76 ms
5c7fbb4485cd8w7 2022-10-10T08:19:05.860Z 2022-10-10 08:19:05,860 | INFO | POST / | Duration: 148.06 ms
5c7fbb4485cd8w7 2022-10-10T09:21:39.251Z 2022-10-10 09:21:39,250 | ERROR | Content type "None" not supported. Supported content types are: application/json, text/csv, text/plain, image/png, image/jpeg, image/jpg, image/tiff, image/bmp, image/gif, image/webp, image/x-image, audio/x-flac, audio/flac, audio/mpeg, audio/wave, audio/wav, audio/x-wav, audio/ogg, audio/x-audio, audio/webm, audio/webm;codecs=opus
5c7fbb4485cd8w7 2022-10-10T09:21:44.114Z 2022-10-10 09:21:44,114 | ERROR | Content type "None" not supported. Supported content types are: application/json, text/csv, text/plain, image/png, image/jpeg, image/jpg, image/tiff, image/bmp, image/gif, image/webp, image/x-image, audio/x-flac, audio/flac, audio/mpeg, audio/wave, audio/wav, audio/x-wav, audio/ogg, audio/x-audio, audio/webm, audio/webm;codecs=opus

Now, let’s increase our security level and deploy a private endpoint.



Deploying a Private Inference Endpoint

Repeating the steps above, I select Private this time.

This opens a new box asking me for the identifier of the AWS account in which the endpoint will be visible. I enter the appropriate ID and click on Create Endpoint.

Not sure about your AWS account ID? Here’s an AWS CLI one-liner for you: aws sts get-caller-identity --query Account --output text



After a few minutes, the Inference Endpoints user interface displays the VPC service name. Mine is com.amazonaws.vpce.eu-west-1.vpce-svc-07a49a19a427abad7.

Next, I open the AWS console and go to the VPC Endpoints page. Then, I click on Create endpoint to create a VPC endpoint, which will enable my AWS account to access my Inference Endpoint through AWS PrivateLink.

In a nutshell, I need to fill in the VPC service name displayed above, select the VPC and subnet(s) allowed to access the endpoint, and attach an appropriate Security Group. Nothing scary: I just follow the steps listed in the Inference Endpoints documentation.
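If you’d rather script this step, the same VPC endpoint can be created with boto3. Here’s a sketch with placeholder resource IDs; only the service name comes from the console steps above.

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# The vpc-, subnet-, and sg- identifiers are placeholders for your own resources.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.vpce.eu-west-1.vpce-svc-07a49a19a427abad7",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
)
print(response["VpcEndpoint"]["VpcEndpointId"])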

Once I’ve created the VPC endpoint, my setup looks like this.



Returning to the Inference Endpoints user interface, the private endpoint is up and running a minute or two later. Let’s test it!

Launching an Amazon EC2 instance in one of the subnets allowed to access the VPC endpoint, I use the inference endpoint URL to predict my test image.

curl https://oncm9ojdmjwesag2.eu-west-1.aws.endpoints.huggingface.cloud \
-X POST --data-binary '@food.jpg' \
-H "Authorization: Bearer MY_API_TOKEN" \
-H "Content-Type: image/jpeg"

[{"score":0.9998466968536377,     "label":"hummus"},
 {"score":0.00006414744711946696, "label":"falafel"},
 {"score":6.4065129663504194e-6,  "label":"escargots"},
 {"score":5.819705165777123e-6,   "label":"deviled_eggs"},
 {"score":5.532585873879725e-6,   "label":"shrimp_and_grits"}]

That’s all there is to it. Once I’m done testing, I delete the endpoints that I’ve created to avoid unwanted charges. I also delete the VPC Endpoint in the AWS console.
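The cleanup can be scripted too. A minimal sketch, again assuming a recent huggingface_hub; the endpoint name and the vpce- ID are placeholders (the latter is returned when you create the VPC endpoint).

import boto3
from huggingface_hub import delete_inference_endpoint

# "my-swin-endpoint" and the vpce- ID below are placeholders.
delete_inference_endpoint("my-swin-endpoint")

ec2 = boto3.client("ec2", region_name="eu-west-1")
ec2.delete_vpc_endpoints(VpcEndpointIds=["vpce-0123456789abcdef0"])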

Hugging Face customers are already using Inference Endpoints. For instance, Phamily, the #1 in-house chronic care management & proactive care platform, told us that Inference Endpoints helps them simplify and speed up HIPAA-compliant Transformer deployments.



Now it’s your turn!

Thanks to Inference Endpoints, you can deploy production-grade, scalable, secure endpoints in minutes, in just a few clicks. Why don’t you give it a try?

We have plenty of ideas to make the service even better, and we’d love to hear your feedback in the Hugging Face forum.

Thanks for reading and have fun with Inference Endpoints!


