Faster Time to Market and Increased Efficiency
In the previous article, we learned how to use continuous integration to safely and efficiently merge a new machine-learning model into the main branch.
But once the model is in the main branch, how do we deploy it to production?
Relying on an engineer to deploy the model manually has some drawbacks, such as:
- Slowing down the release process
- Consuming valuable engineering time that could be used for other tasks
These problems become more pronounced if the model undergoes frequent updates.
Wouldn't it be nice if the model were deployed to production automatically every time a new version is pushed to the main branch? That's where continuous deployment comes in handy.
Continuous deployment (CD) is the practice of automatically deploying software changes to production after they pass a series of automated tests. In a machine-learning project, continuous deployment can offer several advantages:
- Faster time to market: Continuous deployment reduces the time needed to release new machine-learning models to production.
- Increased efficiency: Automating the deployment process reduces the resources required to deploy machine-learning models to production.
This article will show you how to create a CD pipeline for a machine-learning project.
Feel free to play with and fork the source code of this article here:
Before building the CD pipeline, let's identify its workflow:
- After a series of tests, a new machine-learning model is merged into the main branch
- A CD pipeline is triggered and the new model is deployed to production
To build the CD pipeline, we will perform the following steps:
- Save the model object and its metadata
- Serve the model locally
- Upload the model to remote storage
- Set up a platform to deploy your model
- Create a GitHub workflow to deploy models into production
Let's explore each of these steps in detail.
Save the model
We'll use MLEM, an open-source tool, to save and deploy the model.
To save an experiment's model with MLEM, call its save method.
from mlem.api import save

...
# instead of joblib.dump(model, "model/svm")
save(model, "model/svm", sample_data=X_train)
Running this script will create two files: a model file and a metadata file.
The metadata file captures various information about the model object, including:
- Model artifacts such as the model's size and hash value, which are useful for versioning
- Model methods such as predict and predict_proba
- Input data schema
- Python requirements used to train the model
artifacts:
data:
hash: ba0c50b412f6b5d5c5bd6c0ef163b1a1
size: 148163
uri: svm
call_orders:
predict:
- - model
- predict
object_type: model
processors:
model:
methods:
predict:
args:
- name: X
type_:
columns:
- ''
- fixed acidity
- volatile acidity
- citric acid
- residual sugar
- ...
dtypes:
- int64
- float64
- float64
- float64
- float64
- ...
index_cols:
- ''
type: dataframe
name: predict
returns:
dtype: int64
shape:
- null
type: ndarray
varkw: predict_params
type: sklearn_pipeline
requirements:
- module: numpy
version: 1.24.2
- module: pandas
version: 1.5.3
- module: sklearn
package_name: scikit-learn
version: 1.2.2
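The pinned requirements in this metadata are handy for checks outside of MLEM too. As a minimal sketch, here is how you might turn them into pip-style specifiers, e.g. to sanity-check a serving environment before deployment (the inline YAML excerpt stands in for reading model/svm.mlem from disk; PyYAML is assumed to be available):

```python
import yaml  # PyYAML, assumed available

# Excerpt of the requirements section from the .mlem metadata shown above;
# in practice you would yaml.safe_load(open("model/svm.mlem")) instead.
metadata = yaml.safe_load("""
requirements:
- module: numpy
  version: 1.24.2
- module: pandas
  version: 1.5.3
- module: sklearn
  package_name: scikit-learn
  version: 1.2.2
""")

# Build pip-style specifiers, preferring the package name (e.g. the
# sklearn module is installed as the scikit-learn package).
pins = [
    f"{r.get('package_name', r['module'])}=={r['version']}"
    for r in metadata["requirements"]
]
print(pins)  # ['numpy==1.24.2', 'pandas==1.5.3', 'scikit-learn==1.2.2']
```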
Serve the model locally
Let's try out the model by serving it locally. To launch a FastAPI model server, simply run:
mlem serve fastapi --model model/svm
Go to http://0.0.0.0:8080 to view the model. Click "Try it out" to try the model on a sample dataset.
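You can also query the running server programmatically. Below is a minimal sketch using only the standard library; the payload layout and the sample column values here are assumptions based on the dataframe schema in the model metadata, so check the server's interactive docs for the exact request format:

```python
import json
from urllib import request

# Hypothetical sample row; column names follow the wine-quality schema
# shown in the model metadata above (values are made up for illustration).
payload = json.dumps({
    "data": {
        "values": [
            {"fixed acidity": 7.4, "volatile acidity": 0.7,
             "citric acid": 0.0, "residual sugar": 1.9}
        ]
    }
}).encode()

req = request.Request(
    "http://0.0.0.0:8080/predict",
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Uncomment once `mlem serve fastapi --model model/svm` is running:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```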
Push the model to remote storage
By pushing the model to remote storage, we can store our models and data in a centralized location that can be accessed by the GitHub workflow.
We'll use DVC for model management because it offers the following advantages:
- Version control: DVC keeps track of changes to models and data over time, making it easy to revert to previous versions.
- Flexible storage: DVC can store models and data in various types of storage systems, such as Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage.
- Reproducibility: By versioning data and models, experiments can be easily reproduced with the exact same data and model versions.
To integrate DVC with MLEM, we can use a DVC pipeline. A DVC pipeline lets us specify the command, dependencies, and parameters needed to produce certain outputs, all declared in the dvc.yaml file.
stages:
train:
cmd: python src/train.py
deps:
- data/intermediate
- src/train.py
params:
- data
- model
- train
outs:
- model/svm
- model/svm.mlem:
cache: false
In the example above, we specify the outputs to be the files model/svm and model/svm.mlem under the outs field. Specifically:
- model/svm is cached, so it will be uploaded to the DVC remote storage but not committed to Git. This ensures that large binary files don't slow down the repository.
- model/svm.mlem is not cached, so it won't be uploaded to the DVC remote storage but will be committed to Git. This lets us track changes to the model while keeping the repository small.
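As a side note, when DVC produces a cached output it also writes that path into a .gitignore file next to it, so the binary never reaches Git by accident. After running the pipeline you should find something like this (exact contents may vary):

```
# model/.gitignore (generated by DVC)
/svm
```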
To run the pipeline, type the following command in your terminal:
$ dvc exp run
Running stage 'train':
> python src/train.py
Next, specify the remote storage location where the model will be uploaded in the .dvc/config file:
['remote "read"']
url = https://winequality-red.s3.amazonaws.com/
['remote "read-write"']
url = s3://your-s3-bucket/
To push the modified files to the remote storage location named "read-write", simply run:
dvc push -r read-write
Set up a platform to deploy your model
Next, let's pick a platform to deploy our model. MLEM supports deploying your model to the following platforms:
- Docker
- Heroku
- Fly.io
- Kubernetes
- Sagemaker
This project uses Fly.io as the deployment platform because it is easy and cheap to get started.
To create applications on Fly.io in a GitHub workflow, you'll need an access token. Here's how you can get one:
- Sign up for a Fly.io account (you'll need to provide a credit card, but you won't be charged until you exceed the free limits).
- Log in and click "Access Tokens" under the "Account" button in the top right corner.
- Create a new access token and copy it for later use.
Create a GitHub workflow
Now comes the exciting part: creating a GitHub workflow to deploy your model! If you aren't familiar with GitHub workflows, I recommend reading this article for a quick overview.
We'll create a workflow called publish-model in the file .github/workflows/publish.yaml.
Here’s what the file looks like:
name: publish-model
on:
push:
branches:
- main
paths:
- model/svm.mlem
jobs:
publish-model:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Environment setup
uses: actions/setup-python@v2
with:
python-version: 3.8
- name: Install dependencies
run: pip install -r requirements.txt
- name: Download model
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
run: dvc pull model/svm -r read-write
- name: Setup flyctl
uses: superfly/flyctl-actions/setup-flyctl@master
- name: Deploy model
env:
FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
run: mlem deployment run flyio svm-app --model model/svm
The on field specifies that the pipeline is triggered on a push to the main branch that modifies the model/svm.mlem file.
The publish-model job includes the following steps:
- Checking out the code
- Setting up the Python environment
- Installing dependencies
- Pulling the model from remote storage using DVC
- Setting up flyctl to use Fly.io
- Deploying the model to Fly.io
Note that for the job to work properly, it requires the following:
- AWS credentials to pull the model
- Fly.io’s access token to deploy the model
To store this sensitive information securely in our repository while still letting GitHub Actions access it, we will use encrypted secrets.
To create an encrypted secret, click "Settings" -> "Actions" -> "New repository secret."
That's it! Now let's try out this project and see if it works as expected.
Setup
To try out this project, start by creating a new repository using the project template.
Clone the new repository to your local machine:
git clone https://github.com/your-username/cicd-mlops-demo
Set up the environment:
# Go to the project directory
cd cicd-mlops-demo

# Create a new branch
git checkout -b experiment

# Install dependencies
pip install -r requirements.txt
Pull data from the remote storage location called "read":
dvc pull -r read
Create a new model
svm__kernel is a list of values used to test the kernel hyperparameter while tuning the SVM model. To generate a new model, add rbf to svm__kernel in the params.yaml file.
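After the edit, the relevant section of params.yaml might look something like this (a hypothetical sketch — the surrounding key names depend on the project template; only the svm__kernel list comes from the text):

```yaml
train:
  # hypothetical surrounding structure
  grid_params:
    svm__kernel: [linear, poly, rbf]  # rbf newly added
```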
Run a new experiment with the change:
dvc exp run
Push the modified model to the remote storage called "read-write":
dvc push -r read-write
Add, commit, and push the changes to the repository in the "experiment" branch:
git add .
git commit -m 'change svm kernel'
git push origin experiment
Create a pull request
Next, create a pull request by clicking the Contribute button.
After creating a pull request in the repository, a GitHub workflow will be triggered to run tests on the code and model.
Once all the tests have passed, click "Merge pull request."
Deploy the model
Once the changes are merged, a CD pipeline will be triggered to deploy the ML model.
To view the workflow run, click the workflow, then click the publish-model job.
Click the link under the "Deploy model" step to view the website to which the model is deployed.
Here's what the website looks like:
Congratulations! You've just learned how to create a CD pipeline to automate your machine-learning workflows. Combining CD with CI will allow your team to catch errors early, reduce costs, and shorten time-to-market.