Automate Machine Learning Deployment with GitHub Actions

Faster Time to Market and Increased Efficiency

In the previous article, we learned about using continuous integration to safely and efficiently merge a new machine-learning model into the main branch.

However, once the model is in the main branch, how do we deploy it into production?

Relying on an engineer to deploy the model can have some drawbacks, such as:

Slowing down the release process
Consuming precious engineering time that could be used for other tasks

These problems become more pronounced if the model undergoes frequent updates.

Wouldn’t it be nice if the model were automatically deployed into production whenever a new model is pushed to the main branch? That’s where continuous deployment comes in handy.

Continuous deployment (CD) is the practice of automatically deploying software changes to production after they pass a series of automated tests. In a machine learning project, continuous deployment can offer several advantages:

Faster time-to-market: Continuous deployment reduces the time needed to release new machine learning models to production.
Increased efficiency: Automating the deployment process reduces the resources required to deploy machine learning models to production.

This article will show you how to create a CD pipeline for a machine-learning project.

Feel free to play with and fork the source code of this article.

Before building a CD pipeline, let’s identify the workflow for the pipeline:

After a series of tests, a new machine-learning model is merged into the main branch
A CD pipeline is triggered and the new model is deployed into production

To build a CD pipeline, we will perform the following steps:

Save model object and model metadata
Serve the model locally
Upload the model to a remote storage
Set up a platform to deploy your model
Create a GitHub workflow to deploy models into production

Let’s explore each of these steps in detail.

Save model

We will use MLEM, an open-source tool, to save and deploy the model.

To save an experiment’s model using MLEM, begin by calling its save method.

from mlem.api import save

...
# instead of joblib.dump(model, "model/svm")
save(model, "model/svm", sample_data=X_train)

Full script.

Running this script will create two files: a model file and a metadata file.

The metadata file captures various information from a model object, including:

Model artifacts such as the model’s size and hash value, which are useful for versioning
Model methods such as predict and predict_proba
Input data schema
Python requirements used to train the model

artifacts:
  data:
    hash: ba0c50b412f6b5d5c5bd6c0ef163b1a1
    size: 148163
    uri: svm
call_orders:
  predict:
  - - model
    - predict
object_type: model
processors:
  model:
    methods:
      predict:
        args:
        - name: X
          type_:
            columns:
            - ''
            - fixed acidity
            - volatile acidity
            - citric acid
            - residual sugar
            - ...
            dtypes:
            - int64
            - float64
            - float64
            - float64
            - float64
            - ...
            index_cols:
            - ''
            type: dataframe
        name: predict
        returns:
          dtype: int64
          shape:
          - null
          type: ndarray
        varkw: predict_params
    type: sklearn_pipeline
requirements:
- module: numpy
  version: 1.24.2
- module: pandas
  version: 1.5.3
- module: sklearn
  package_name: scikit-learn
  version: 1.2.2

View the metadata file.
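To see why the recorded size and hash are useful for versioning, here is a minimal sketch that recomputes both values for a file and compares them with what the metadata says. The hash shown in the metadata is 32 hexadecimal characters, which matches the length of an MD5 digest; treating it as MD5 is an assumption, and the function names are hypothetical helpers, not part of MLEM.

```python
import hashlib
from pathlib import Path


def artifact_fingerprint(path):
    """Return the size in bytes and the MD5 hex digest of a file.

    Assumption: the metadata hash is an MD5 digest of the file contents.
    """
    data = Path(path).read_bytes()
    return {"size": len(data), "hash": hashlib.md5(data).hexdigest()}


def has_changed(path, recorded):
    """Compare a file on disk with the size and hash recorded in metadata."""
    current = artifact_fingerprint(path)
    return (
        current["size"] != recorded["size"]
        or current["hash"] != recorded["hash"]
    )
```

Calling has_changed("model/svm", {"size": 148163, "hash": "ba0c50b412f6b5d5c5bd6c0ef163b1a1"}) would then tell you whether the model file still matches the version described by the metadata.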

Serve the model locally

Let’s try out the model by serving it locally. To launch a FastAPI model server locally, simply run:

mlem serve fastapi --model model/svm

Go to http://0.0.0.0:8080 to view the model. Click “Try it out” to try out the model on a sample dataset.
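Besides the browser page, the served model can also be called from code. The sketch below only builds the HTTP request so it can be inspected without a running server. The /predict path and the {"X": {"values": [...]}} body shape are assumptions based on the method and argument names in the metadata file; check the interactive page served by the app for the exact schema. The sample row is hypothetical and incomplete.

```python
import json
import urllib.request


def build_predict_request(rows, base_url="http://0.0.0.0:8080"):
    """Build a POST request for the locally served model.

    Assumptions: the endpoint is /predict and the body wraps the rows as
    {"X": {"values": [...]}}, where X is the argument name in the metadata.
    """
    body = json.dumps({"X": {"values": rows}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical sample row; a real row needs every column in the schema.
sample = [{"": 0, "fixed acidity": 7.4, "volatile acidity": 0.7}]
request = build_predict_request(sample)

# Uncomment while `mlem serve` is running to send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```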

Push the model to a remote storage

By pushing the model to remote storage, we can store our models and data in a centralized location that can be accessed by the GitHub workflow.

We will use DVC for model management because it offers the following benefits:

Version control: DVC keeps track of changes to models and data over time, making it easy to revert to previous versions.
Storage: DVC can store models and data in different types of storage systems, such as Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage.
Reproducibility: By versioning data and models, experiments can be easily reproduced with the exact same data and model versions.

To integrate DVC with MLEM, we can use a DVC pipeline. With a DVC pipeline, we can specify the command, dependencies, and parameters needed to create certain outputs in the dvc.yaml file.

stages:
  train:
    cmd: python src/train.py
    deps:
      - data/intermediate
      - src/train.py
    params:
      - data
      - model
      - train
    outs:
      - model/svm
      - model/svm.mlem:
          cache: false

View the full file.

In the example above, we specify the outputs to be the files model/svm and model/svm.mlem under the outs field. Specifically,

model/svm is cached, so it will be uploaded to a DVC remote storage, but not committed to Git. This ensures that large binary files don’t slow down the repository.
model/svm.mlem is not cached, so it won’t be uploaded to a DVC remote storage but will be committed to Git. This allows us to track changes in the model while still keeping the repository’s size small.
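The rule described above can be sketched in a few lines of Python. This is only an illustration of how the cache flag decides where each output ends up, not how DVC is implemented; the outs list is written by hand as Python data rather than read from dvc.yaml, and split_outputs is a hypothetical helper.

```python
def split_outputs(outs):
    """Split a stage's outs into DVC-cached paths and Git-tracked paths."""
    cached, git_tracked = [], []
    for entry in outs:
        if isinstance(entry, str):
            # A bare path uses the default, which is to cache the output.
            cached.append(entry)
            continue
        # A mapping entry looks like {"model/svm.mlem": {"cache": False}}.
        for path, options in entry.items():
            if (options or {}).get("cache", True):
                cached.append(path)
            else:
                git_tracked.append(path)
    return cached, git_tracked


# The outs field of the train stage, written as Python data.
outs = ["model/svm", {"model/svm.mlem": {"cache": False}}]
cached, git_tracked = split_outputs(outs)
print("Pushed to DVC remote:", cached)
print("Committed to Git:", git_tracked)
```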

To run the pipeline, type the following command in your terminal:

$ dvc exp run
Running stage 'train':
> python src/train.py

Next, specify the remote storage location where the model will be uploaded in the file .dvc/config:

['remote "read"']
url = https://winequality-red.s3.amazonaws.com/
['remote "read-write"']
url = s3://your-s3-bucket/

To push the modified files to the remote storage location named “read-write”, simply run:

dvc push -r read-write
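If you want to check which remote names map to which URLs before pushing, the config shown above is simple enough to read with Python's standard configparser. This is a rough sketch for illustration only: DVC has its own config loader, and list_remotes is a hypothetical helper.

```python
import configparser

CONFIG_TEXT = """
['remote "read"']
url = https://winequality-red.s3.amazonaws.com/
['remote "read-write"']
url = s3://your-s3-bucket/
"""


def list_remotes(config_text):
    """Map each remote name to its URL."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    remotes = {}
    for section in parser.sections():
        # Section names keep their quotes, e.g. 'remote "read"'.
        cleaned = section.strip("'")
        if cleaned.startswith("remote "):
            name = cleaned[len("remote "):].strip('"')
            remotes[name] = parser[section]["url"]
    return remotes


remotes = list_remotes(CONFIG_TEXT)
for name, url in remotes.items():
    print(f"{name}: {url}")
```

The name passed to the -r option of dvc push or dvc pull is the key of this mapping.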

Set up a platform to deploy your model

Next, let’s choose a platform to deploy our model. MLEM supports deploying your model to the following platforms:

Docker
Heroku
Fly.io
Kubernetes
Sagemaker

This project chooses Fly.io as a deployment platform because it’s easy and cheap to get started with.

To create applications on Fly.io in a GitHub workflow, you’ll need an access token. Here’s how you can get one:

Sign up for a Fly.io account (you’ll need to provide a credit card, but they won’t charge you until you exceed free limits).
Log in and click “Access Tokens” under the “Account” button in the top right corner.
Create a new access token and copy it for later use.

Create a GitHub workflow

Now comes the exciting part: creating a GitHub workflow to deploy your model! If you are not familiar with GitHub workflows, I recommend reading this article for a quick overview.

We will create the workflow called publish-model in the file .github/workflows/publish.yaml:

Here’s what the file looks like:

name: publish-model

on:
  push:
    branches:
      - main
    paths:
      - model/svm.mlem

jobs:
  publish-model:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Environment setup
        uses: actions/setup-python@v2
        with:
          python-version: 3.8
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Download model
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: dvc pull model/svm -r read-write
      - name: Setup flyctl
        uses: superfly/flyctl-actions/setup-flyctl@master
      - name: Deploy model
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
        run: mlem deployment run flyio svm-app --model model/svm

The on field specifies that the pipeline is triggered on a push event to the main branch, and only when that push changes model/svm.mlem.
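To make the trigger condition concrete, here is a small Python sketch of the decision it encodes: the branch filter and the path filter must both match. It assumes the branch is named main and uses fnmatch as a rough stand-in for GitHub's own pattern matching, so it is an approximation rather than the exact rules. should_trigger is a hypothetical helper.

```python
from fnmatch import fnmatch


def should_trigger(branch, changed_files,
                   branches=("main",), paths=("model/svm.mlem",)):
    """Approximate the push trigger: the branch must match one of the
    branch patterns AND at least one changed file must match a path pattern."""
    branch_ok = any(fnmatch(branch, pattern) for pattern in branches)
    path_ok = any(
        fnmatch(changed, pattern)
        for changed in changed_files
        for pattern in paths
    )
    return branch_ok and path_ok


print(should_trigger("main", ["model/svm.mlem", "params.yaml"]))  # True
print(should_trigger("main", ["README.md"]))                      # False
print(should_trigger("experiment", ["model/svm.mlem"]))           # False
```

Because model/svm.mlem is the file committed to Git, a push that does not change the model metadata will not redeploy anything.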

The publish-model job includes the following steps:

Checking out the code
Setting up the Python environment
Installing dependencies
Pulling a model from a remote storage location using DVC
Setting up flyctl to use Fly.io
Deploying the model to Fly.io

Note that for the job to function properly, it requires the following:

AWS credentials to pull the model
Fly.io’s access token to deploy the model

To store this sensitive information securely in our repository and allow GitHub Actions to access it, we will use encrypted secrets.

To create encrypted secrets, click “Settings” -> “Actions” -> “New repository secret.”

That’s it! Now let’s try out this project and see if it works as expected.
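A missing secret usually surfaces as a confusing failure deep inside the pull or deploy step. As a hedged sketch, a small check like the one below could be run locally or as an extra workflow step that exposes all three variables. The variable names come from the workflow file; missing_secrets is a hypothetical helper, not part of any tool used here.

```python
import os

REQUIRED_SECRETS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "FLY_API_TOKEN",
)


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


# Example with a hand-written environment instead of the real one.
example_env = {"FLY_API_TOKEN": "example-token"}
print(missing_secrets(example_env))
```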

Setup

To try out this project, start by creating a new repository using the project template.

Clone the new repository to your local machine:

git clone https://github.com/your-username/cicd-mlops-demo

Set up the environment:

# Go to the project directory
cd cicd-mlops-demo

# Create a new branch
git checkout -b experiment

# Install dependencies
pip install -r requirements.txt

Pull data from the remote storage location called “read”:

dvc pull -r read

Create a new model

svm__kernel is a list of values used to test the kernel hyperparameter while tuning the SVM model. To generate a new model, add rbf to svm__kernel in the params.yaml file.

Run a new experiment with the change:
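To get a feel for what adding a value to svm__kernel does, the sketch below enumerates the candidate settings in a small grid before and after the change. The grids are hypothetical: the real values live in params.yaml, svm__C is included only to show how the combinations multiply, and the sketch assumes the tuning step tries every combination.

```python
from itertools import product


def expand_grid(grid):
    """Yield every hyperparameter combination in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical grids; the actual values are defined in params.yaml.
before = {"svm__kernel": ["linear"], "svm__C": [0.1, 1]}
after = {"svm__kernel": ["linear", "rbf"], "svm__C": [0.1, 1]}

print(len(list(expand_grid(before))))  # 2
print(len(list(expand_grid(after))))   # 4
```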

dvc exp run

Push the modified model to the remote storage called “read-write”:

dvc push -r read-write

Add, commit, and push changes to the repository in the “experiment” branch:

git add .
git commit -m 'change svm kernel'
git push origin experiment

Create a pull request

Next, create a pull request by clicking the Contribute button.

After creating a pull request in the repository, a GitHub workflow will be triggered to run tests on the code and model.

After all the tests have passed, click “Merge pull request.”

Deploy the model

Once the changes are merged, a CD pipeline will be triggered to deploy the ML model.

To view the workflow run, click the workflow, then click the publish-model job.

Click the link under the “Deploy model” step to view the website to which the model is deployed.

Here’s what the website looks like:

View the website.

Congratulations! You’ve just learned how to create a CD pipeline to automate your machine-learning workflows. Combining CD with CI will allow your company to catch errors early, reduce costs, and reduce time-to-market.
