Deploy ML Models with AWS Lambda & Ephemeral Storage
Prerequisites
1. ML Model
2. Lambda Function
3. Docker Image
4. Infrastructure
Limitations and making the Solution scalable

Photo by Ryan Claus on Unsplash

Are you a machine learning engineer in need of a simple and potentially scalable way to deploy a large machine learning model? In this post I’ll present a relatively straightforward solution that leverages Lambda’s recently added ephemeral storage.

Rest assured that you won’t need to navigate the AWS console and manually click together all of the resources. Instead, we’ll use both the AWS command line interface (CLI) and a modicum of infrastructure-as-code via the AWS Cloud Development Kit (CDK). The latter makes it possible to define our services and their relationships without having to manually craft complex CloudFormation templates.

The infrastructure looks pretty modest:

Historically, Lambda hasn’t been the go-to option for MLOps because of its limited storage for large models. However, this constraint has since been lifted: Lambda now allows up to 10 GB of ephemeral storage (mounted at /tmp) that we can use to download and cache our model. Furthermore, we can easily expose the Lambda by attaching a Function URL that acts as an API.

The deployment comprises the following steps: preparing the ML model, writing the Lambda function, building the Docker image, and provisioning the infrastructure.

Let’s get hands-on with a deep learning toy example. We’ll build and deploy a named entity recognizer (NER) using Flair. Flair provides an easy interface for using state-of-the-art models from Hugging Face to solve a wide range of NLP tasks.

You should have the following tools and frameworks installed: Python (with pip), Docker, the AWS CLI, and the AWS CDK.

Setting Up the Environment

Clone the repository, install the dependencies:

git clone https://github.com/as-ideas/deep-lambda.git
cd deep-lambda
pip install -r requirements.txt

The repository structure is as follows:

deep-lambda/
|-- app.py
|-- tagger.py
|-- infrastructure/
|-- ...

We only need two Python files: tagger.py, which contains the deep learning code, and app.py, which defines the AWS Lambda function. The CDK code for our infrastructure is defined in infrastructure/lambda_stack.py.

First off, let’s write a simple NER tagger using Flair:

deep-lambda/tagger.py
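
The gist itself isn’t rendered on this page, so here is a minimal sketch of what tagger.py could look like, assuming Flair’s standard SequenceTagger API; the example sentence matches the output shown below.

# Minimal sketch of deep-lambda/tagger.py; the original gist is not shown on
# this page, so details may differ. Assumes Flair's SequenceTagger API.
from flair.data import Sentence
from flair.models import SequenceTagger

MODEL_PATH = "/tmp/my_ner_tagger.pt"


def tag_text(text: str, tagger: SequenceTagger):
    """Run NER on a single string and return the predicted spans."""
    sentence = Sentence(text)
    tagger.predict(sentence)
    return sentence.get_spans("ner")


if __name__ == "__main__":
    # Download the pre-trained NER model from Hugging Face ...
    tagger = SequenceTagger.load("ner")
    # ... cache it locally so it can be uploaded to S3 afterwards ...
    tagger.save(MODEL_PATH)
    # ... and run a quick sanity check.
    for span in tag_text("George Washington went to Washington.", tagger):
        print(span)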

Running the above code downloads a pre-trained NER model from Hugging Face, saves it to /tmp/my_ner_tagger.pt and outputs the following result:

python tagger.py

Span [1,2]: "George Washington" [− Labels: PER (0.9985)]
Span [5]: "Washington" [− Labels: LOC (0.9706)]

Easy enough. From now on, let’s pretend that the saved model is some customized NER model that we want to deploy (here you can read how to fine-tune a pre-trained model with Flair).

In order to deploy our model, it needs to reside in the AWS cloud. Therefore, we upload the model to an S3 bucket via the command line interface (replace the region with your own):

aws s3api create-bucket --bucket deep-lambda --region eu-central-1 --create-bucket-configuration LocationConstraint=eu-central-1
aws s3 cp /tmp/my_ner_tagger.pt s3://deep-lambda/ --region eu-central-1

In order to wire our code into an AWS Lambda function stack, we need to define a Lambda handler that serves as the entry point for the function. The Lambda handler is responsible for receiving an event and generating a corresponding response. Our tagging logic is invoked by the handler, which requires the model to be loaded beforehand. We solve this by downloading and caching the model at the module level, leveraging Lambda’s ephemeral storage at /tmp:

deep-lambda/app.py
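
Again, the gist isn’t rendered here, so the following is a minimal sketch of what app.py could look like; the bucket name and object key are assumptions based on the CLI commands above.

# Minimal sketch of deep-lambda/app.py; bucket name and object key are
# assumptions based on the CLI commands above, not the original code.
import os

import boto3
from flair.models import SequenceTagger

from tagger import tag_text

BUCKET = os.environ.get("MODEL_BUCKET", "deep-lambda")
MODEL_KEY = "my_ner_tagger.pt"
LOCAL_MODEL_PATH = "/tmp/my_ner_tagger.pt"

# Module-level download: runs once per execution environment (cold start)
# and caches the model on the ephemeral storage mounted at /tmp.
if not os.path.exists(LOCAL_MODEL_PATH):
    boto3.client("s3").download_file(BUCKET, MODEL_KEY, LOCAL_MODEL_PATH)

TAGGER = SequenceTagger.load(LOCAL_MODEL_PATH)


def lambda_handler(event, context):
    """Entry point: tag the plain-text request body and return the spans."""
    text = event.get("body", "")
    spans = tag_text(text, TAGGER)
    return {
        "statusCode": 200,
        "body": "\n".join(str(span) for span in spans),
    }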

Note that the model is downloaded to /tmp, which by default is mounted to the ephemeral storage, so it is possible to use models as large as several GB.

Our Lambda will need access to some dependencies (e.g. Flair) that we will bake into a Docker image.

A simple Dockerfile can be built from a Python Lambda base image:

FROM public.ecr.aws/lambda/python:3.8 as base

FROM base

COPY requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
COPY app.py "${LAMBDA_TASK_ROOT}"
COPY tagger.py "${LAMBDA_TASK_ROOT}"

ENV PYTHONPATH="${LAMBDA_TASK_ROOT}"

CMD ["app.lambda_handler"]

That’s pretty standard: we just need to make sure that all the files are copied correctly to the Lambda-native location LAMBDA_TASK_ROOT, which is the designated working directory. The path usually resolves to /var/task within the image.

Let’s build and tag the Docker image (replace the 12-digit account ID with your own AWS account ID):

docker build -t deep-lambda .
docker tag deep-lambda:latest 012345678901.dkr.ecr.eu-central-1.amazonaws.com/deep-lambda:latest

We now upload the image to the elastic container registry (ECR) via the command line interface:

aws ecr create-repository --repository-name deep-lambda
aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 012345678901.dkr.ecr.eu-central-1.amazonaws.com
docker push 012345678901.dkr.ecr.eu-central-1.amazonaws.com/deep-lambda:latest

That’s it. We successfully uploaded both our model and our code to the AWS ecosystem. Now it’s time to provision the cloud infrastructure via CDK!

Let’s write a simple CDK stack that defines the Lambda function with access to the model bucket:

infrastructure/lambda_stack.py
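
The stack file isn’t rendered on this page either; below is a rough sketch of how it could be defined with CDK v2. The repository and bucket names follow the CLI commands above, while the memory and timeout values are illustrative assumptions rather than the original settings.

# Rough sketch of infrastructure/lambda_stack.py (CDK v2); repository and
# bucket names follow the CLI commands above, sizing values are assumptions.
from aws_cdk import Duration, Size, Stack
from aws_cdk import aws_ecr as ecr
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_s3 as s3
from constructs import Construct


class DeepLambdaStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        repository = ecr.Repository.from_repository_name(
            self, "DeepLambdaRepo", "deep-lambda")
        model_bucket = s3.Bucket.from_bucket_name(
            self, "ModelBucket", "deep-lambda")

        function = _lambda.DockerImageFunction(
            self, "DeepLambdaFunction",
            code=_lambda.DockerImageCode.from_ecr(repository),
            memory_size=4096,                           # enough RAM for PyTorch
            ephemeral_storage_size=Size.gibibytes(10),  # /tmp space for the model
            timeout=Duration.minutes(5),
            environment={
                # let Transformers cache tokenizers on the ephemeral storage
                "PYTORCH_TRANSFORMERS_CACHE": "/tmp",
            },
        )

        # allow the Lambda to download the model from the S3 bucket
        model_bucket.grant_read(function)

        # expose the function via a public Function URL
        function.add_function_url(auth_type=_lambda.FunctionUrlAuthType.NONE)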

It’s important to provide enough memory_size for the Lambda as well as a large enough ephemeral_storage_size. Furthermore, we need to point the PYTORCH_TRANSFORMERS_CACHE directory to /tmp to allow the Transformers library to cache the model tokenizers on the ephemeral storage.

Now we’re ready to deploy our function:

cd infrastructure
pip install -r requirements.txt
cdk synth
cdk deploy deep-lambda-stack

That’s it, the Lambda should be up and running. Let’s quickly test it in the web console:

At the bottom right you can find the Function URL. Let’s use it to invoke our Lambda with a simple request via curl:

curl -X POST -H "Content-Type: text/plain" -d "I went to New York last Easter." https://rrpj3itxliq4754rbwscjg7p3i0geucq.lambda-url.eu-central-1.on.aws/

Span[3:5]: "New York" → LOC (0.9988)

Great! You can now fire more requests at the endpoint. Keep in mind that the initial request may take a bit longer due to the cold start, as the Lambda initializes and retrieves the model from the bucket. However, subsequent requests should execute swiftly.

Deploying a new model is as simple as replacing the current model in the S3 bucket and restarting the Lambda function, which can be done via the console or the CLI. Updating the code requires pushing a new image to ECR using the AWS commands from earlier and then deploying the updated image to the Lambda function, for example as sketched below.
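
For illustration, redeploying a freshly pushed image could look roughly like this with boto3; the function name is a placeholder, not the name used in the repository.

# Hypothetical redeployment snippet; the function name is a placeholder.
import boto3

lambda_client = boto3.client("lambda")

# Point the function at the freshly pushed container image.
lambda_client.update_function_code(
    FunctionName="deep-lambda-function",
    ImageUri="012345678901.dkr.ecr.eu-central-1.amazonaws.com/deep-lambda:latest",
)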

The presented solution is quite useful if you want to quickly deploy your model for a showcase or testing, but it might not be robust enough for a production system. I’ll address the biggest limitations below:

Limitation: Manual deployment, Solution: Add CI/CD Pipeline

Consider adding an AWS CodePipeline that automatically triggers on code changes, pushes new images to ECR and redeploys your Lambda function.

Limitation: Security, Solution: Add API Gateway

If you need to control your API exposure, e.g. restrict the maximum number of requests or block certain IP addresses, you may want to attach an API Gateway to your Lambda, as sketched below.
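
Such a gateway could be added to the CDK stack sketched above; the throttling values are illustrative, and the snippet assumes the function variable from that stack.

# Assumed extension of the CDK stack sketched above (not from the original
# article); belongs inside DeepLambdaStack.__init__ after `function` is defined.
from aws_cdk import aws_apigateway as apigw

api = apigw.LambdaRestApi(
    self, "DeepLambdaApi",
    handler=function,
    deploy_options=apigw.StageOptions(
        throttling_rate_limit=10,   # requests per second (illustrative)
        throttling_burst_limit=20,
    ),
)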

Limitation: Scalability, Solution: Add Queues

If you want to process large amounts of data, it is not advisable to open lengthy requests to the Lambda function. A more robust solution would be to connect the function to a command queue (SQS) and store the results in another queue or an S3 bucket. This approach is highly scalable and easy to monitor, making it a great choice for many ML use cases.
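
Here is a rough sketch of what such a queue-driven handler could look like; the result bucket name is an assumption, and the cached model is reused from the app.py sketch above.

# Rough sketch of a queue-driven variant (not from the original article):
# the Lambda is triggered by an SQS command queue and writes results to S3.
# RESULT_BUCKET is a hypothetical name.
import json

import boto3

from app import TAGGER           # reuse the model cached at module level
from tagger import tag_text

s3 = boto3.client("s3")
RESULT_BUCKET = "deep-lambda-results"


def lambda_handler(event, context):
    """Process a batch of SQS messages and store the tagged spans in S3."""
    for record in event["Records"]:
        spans = tag_text(record["body"], TAGGER)
        s3.put_object(
            Bucket=RESULT_BUCKET,
            Key=f"results/{record['messageId']}.json",
            Body=json.dumps([str(span) for span in spans]),
        )
    return {"statusCode": 200}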
