Deploy Hugging Face models easily with Amazon SageMaker


Earlier this year we announced a strategic collaboration with Amazon to make it easier for companies to use Hugging Face in Amazon SageMaker, and ship cutting-edge Machine Learning features faster. We introduced new Hugging Face Deep Learning Containers (DLCs) to train Hugging Face Transformer models in Amazon SageMaker.

Today, we’re excited to share a new inference solution with you that makes it easier than ever to deploy Hugging Face Transformers with Amazon SageMaker! With the new Hugging Face Inference DLCs, you can deploy your trained models for inference with just one more line of code, or select any of the 10,000+ publicly available models from the Model Hub, and deploy them with Amazon SageMaker.

Deploying models in SageMaker provides you with production-ready endpoints that scale easily within your AWS environment, with built-in monitoring and a ton of enterprise features. It has been an amazing collaboration and we hope you will benefit from it!

Here’s how to use the new SageMaker Hugging Face Inference Toolkit to deploy Transformers-based models:

from sagemaker.huggingface import HuggingFaceModel


huggingface_model = HuggingFaceModel(...).deploy()

That’s it! 🚀

To learn more about accessing and using the new Hugging Face DLCs with the Amazon SageMaker Python SDK, check out the guides and resources below.




Resources, Documentation & Samples 📄

Below you can find all the important resources for deploying your models to Amazon SageMaker.







SageMaker Hugging Face Inference Toolkit ⚙️

In addition to the Hugging Face Transformers-optimized Deep Learning Containers for inference, we have created a new Inference Toolkit for Amazon SageMaker. This new Inference Toolkit leverages the pipelines from the transformers library to allow zero-code deployments of models without writing any code for pre- or post-processing. In the “Getting Started” section below you will find two examples of how to deploy your models to Amazon SageMaker.

In addition to the zero-code deployment, the Inference Toolkit supports “bring your own code” methods, where you can override the default methods. You can learn more about “bring your own code” in the documentation here, or you can check out the sample notebook “deploy custom inference code to Amazon SageMaker”.



API – Inference Toolkit Description

Using the transformers pipelines, we designed an API which makes it easy for you to benefit from all pipelines features. The API has the same interface as the 🤗 Accelerated Inference API, meaning your inputs need to be defined in the inputs key, and if you want additional supported pipelines parameters you can add them in the parameters key. Below you can find examples for requests.


{
    "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}

{
    "inputs": {
        "question": "What is used for inference?",
        "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}

{
    "inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
    "parameters": {
        "candidate_labels": [
            "refund",
            "legal",
            "faq"
        ]
    }
}
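As a sketch of how these payloads map onto plain Python before being sent to an endpoint, the helper below builds and serializes the three request types shown above. The function name is illustrative, not part of the Inference Toolkit:

```python
import json

def build_request(inputs, parameters=None):
    """Build a request body in the Inference Toolkit's format:
    the payload goes under "inputs", and optional pipeline
    arguments go under "parameters"."""
    body = {"inputs": inputs}
    if parameters is not None:
        body["parameters"] = parameters
    return json.dumps(body)

# Text classification: a single string under "inputs".
classification = build_request(
    "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline.")

# Question answering: a dict with "question" and "context".
qa = build_request({"question": "What is used for inference?",
                    "context": "This model is used with sagemaker for inference."})

# Zero-shot classification: extra pipeline arguments under "parameters".
zero_shot = build_request(
    "I recently bought a device from your company and I would like to get reimbursed!",
    parameters={"candidate_labels": ["refund", "legal", "faq"]})
```

The serialized strings would then be sent as the request body with Content-Type application/json.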



Getting started 🧭

In this guide we will use the new Hugging Face Inference DLCs and the Amazon SageMaker Python SDK to deploy two transformer models for inference.

In the first example, we deploy for inference a Hugging Face Transformer model trained in Amazon SageMaker.

In the second example, we directly deploy one of the 10,000+ publicly available Hugging Face Transformers models from the Model Hub to Amazon SageMaker for inference.



Setting up the environment

We will use an Amazon SageMaker Notebook Instance for the example. You can learn here how to set up a Notebook Instance. To get started, jump into your Jupyter Notebook or JupyterLab and create a new Notebook with the conda_pytorch_p36 kernel.

Note: Using Jupyter is optional: we could also launch SageMaker API calls from anywhere we have an SDK installed, connectivity to the cloud, and appropriate permissions, such as a laptop, another IDE, or a task scheduler like Airflow or AWS Step Functions.

After that we can install the required dependencies.

pip install "sagemaker>=2.48.0" --upgrade

To deploy a model on SageMaker, we need to create a sagemaker Session and provide an IAM role with the right permissions. The get_execution_role method is provided by the SageMaker SDK as an optional convenience. You can also specify the role by writing the specific role ARN you want your endpoint to use. This IAM role will later be attached to the Endpoint, e.g. to download the model from Amazon S3.

import sagemaker

sess = sagemaker.Session()
role = sagemaker.get_execution_role()



Deploy a trained Hugging Face Transformer model to SageMaker for inference

There are two ways to deploy your SageMaker-trained Hugging Face model. You can either deploy it right after your training has finished, or you can deploy it later, using the model_data pointing to your saved model on Amazon S3. In addition to the two options below, you can also instantiate Hugging Face endpoints with lower-level SDKs such as boto3 and the AWS CLI, Terraform, and with CloudFormation templates.



Deploy the model directly after training with the Estimator class

If you deploy your model directly after training, you need to make sure that all required model artifacts are saved in your training script, including the tokenizer and the model. A benefit of deploying directly after training is that the SageMaker model container metadata will contain the source training job, providing lineage from training job to deployed model.

from sagemaker.huggingface import HuggingFace




huggingface_estimator = HuggingFace(....)


huggingface_estimator.fit(...)




predictor = huggingface_estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")


data = {
   "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}

predictor.predict(data)

After we run our request, we can delete the endpoint again with:


predictor.delete_endpoint()



Deploy the model from pre-trained checkpoints using the HuggingFaceModel class

If you have already trained your model and want to deploy it at some later time, you can use the model_data argument to specify the location of your tokenizer and model weights.
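The model.tar.gz referenced by model_data is expected to contain the model and tokenizer files at the archive root. A minimal sketch of packaging such an archive with Python's tarfile module; the file names below are the typical artifacts written by save_pretrained and are illustrative, your model may produce different ones:

```python
import os
import tarfile
import tempfile

# Typical artifacts produced by model.save_pretrained() and
# tokenizer.save_pretrained(). Here we create empty placeholder
# files purely to illustrate the archive layout.
artifacts = ["pytorch_model.bin", "config.json", "tokenizer_config.json", "vocab.txt"]

workdir = tempfile.mkdtemp()
for name in artifacts:
    open(os.path.join(workdir, name), "w").close()

archive_path = os.path.join(workdir, "model.tar.gz")
with tarfile.open(archive_path, "w:gz") as tar:
    for name in artifacts:
        # Add each file at the root of the archive, not under a subdirectory.
        tar.add(os.path.join(workdir, name), arcname=name)
```

The resulting archive would then be uploaded to Amazon S3 and its S3 URI passed as model_data.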

from sagemaker.huggingface.model import HuggingFaceModel


huggingface_model = HuggingFaceModel(
   model_data="s3://models/my-bert-model/model.tar.gz",  
   role=role, 
   transformers_version="4.6", 
   pytorch_version="1.7", 
)

predictor = huggingface_model.deploy(
   initial_instance_count=1, 
   instance_type="ml.m5.xlarge"
)


data = {
   "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}


predictor.predict(data)

After we run our request, we can delete the endpoint again with:


predictor.delete_endpoint()



Deploy one of the 10,000+ Hugging Face Transformers to Amazon SageMaker for Inference

To deploy a model directly from the Hugging Face Model Hub to Amazon SageMaker, we need to define two environment variables when creating the HuggingFaceModel. We need to define:

  • HF_MODEL_ID: defines the model id, which will be automatically loaded from huggingface.co/models when creating our SageMaker Endpoint. The 🤗 Hub provides 10,000+ models all available through this environment variable.
  • HF_TASK: defines the task for the used 🤗 Transformers pipeline. A full list of tasks can be found here.
from sagemaker.huggingface.model import HuggingFaceModel


hub = {
  'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad', 
  'HF_TASK':'question-answering' 
}


huggingface_model = HuggingFaceModel(
   env=hub, 
   role=role, 
   transformers_version="4.6", 
   pytorch_version="1.7", 
)


predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)


data = {
    "inputs": {
        "question": "What is used for inference?",
        "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}


predictor.predict(data)
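For the question-answering task, predictor.predict returns the pipeline's output, typically a dict with answer, score, start, and end fields. A small helper for picking out the answer from such a response; the response shape assumed here mirrors the transformers question-answering pipeline, and the helper itself is not part of the SDK:

```python
def extract_answer(response, min_score=0.0):
    """Return (answer, score) from a question-answering response,
    or (None, 0.0) if the score is below the threshold.
    Assumes the transformers QA pipeline output shape."""
    score = response.get("score", 0.0)
    if score < min_score:
        return None, 0.0
    return response.get("answer"), score

# Example with a response shaped like the QA pipeline output:
sample = {"score": 0.97, "start": 68, "end": 77, "answer": "sagemaker"}
answer, score = extract_answer(sample, min_score=0.5)
```

Thresholding on score is a simple way to discard low-confidence answers before using them downstream.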

After we run our request, we can delete the endpoint again with:


predictor.delete_endpoint()



FAQ 🎯

You can find the complete Frequently Asked Questions in the documentation.

Q: Which models can I deploy for Inference?

A: You can deploy:

  • any 🤗 Transformers model trained in Amazon SageMaker, or on other compatible platforms, that can accommodate the SageMaker Hosting design,
  • any of the 10,000+ publicly available Transformer models from the Hugging Face Model Hub, or
  • your private models hosted in your Hugging Face premium account!

Q: Which pipelines and tasks are supported by the Inference Toolkit?

A: The Inference Toolkit and DLC support any of the transformers pipelines. You can find the full list here.

Q: Do I have to use the transformers pipelines when hosting SageMaker endpoints?

A: No, you can also write your custom inference code to serve your own models and logic, documented here.

Q: Do I have to use the SageMaker Python SDK to use the Hugging Face Deep Learning Containers (DLCs)?

A: You can use the Hugging Face DLCs without the SageMaker Python SDK and deploy your models to SageMaker with other SDKs, such as the AWS CLI, boto3, or CloudFormation. The DLCs are also available through Amazon ECR and can be pulled and used in any environment of choice.

Q: Why should I use the Hugging Face Deep Learning Containers?

A: The DLCs are fully tested, maintained, optimized deep learning environments that require no installation, configuration, or maintenance. Specifically, our inference DLC comes with a pre-written serving stack, which drastically lowers the technical bar of DL serving.

Q: How is my data and code secured by Amazon SageMaker?

A: Amazon SageMaker provides numerous security mechanisms including encryption at rest and in transit, Virtual Private Cloud (VPC) connectivity, and Identity and Access Management (IAM). To learn more about security in the AWS cloud and with Amazon SageMaker, you can visit Security in Amazon SageMaker and AWS Cloud Security.

Q: Is this available in my region?

A: For a list of the supported regions, please visit the AWS region table for all AWS global infrastructure.

Q: Do you offer premium support or support SLAs for this solution?

A: AWS Technical Support tiers are available from AWS and cover development and production issues for AWS services – please refer to AWS Support for specifics and scope.

If you have questions which the Hugging Face community can help answer and/or benefit from, please post them in the Hugging Face forum.


If you need premium support from the Hugging Face team to accelerate your NLP roadmap, our Expert Acceleration Program offers direct guidance from our open-source, science, and ML Engineering teams.


