Automate Machine Learning Deployment with GitHub Actions
What’s Continuous Deployment?
CD Pipeline Overview
Build a CD Pipeline
Try it Out

In the previous article, we learned how to use continuous integration to safely and efficiently merge a new machine-learning model into the main branch.

Image by Author

However, once the model is in the main branch, how do we deploy it into production?

Image by Author

Relying on an engineer to deploy the model manually can have some drawbacks, such as:

  • Slowing down the release process
  • Consuming valuable engineering time that could be used for other tasks

These problems become more pronounced if the model undergoes frequent updates.

Image by Author

Wouldn’t it be nice if the model were automatically deployed into production every time a new model is pushed to the main branch? That is where continuous deployment comes in handy.

Continuous deployment (CD) is the practice of automatically deploying software changes to production after they pass a series of automated tests. In a machine learning project, continuous deployment can offer several benefits:

  1. Faster time-to-market: Continuous deployment reduces the time needed to release new machine learning models to production.
  2. Lower costs: Automating the deployment process reduces the resources required to deploy machine learning models to production.

This article will show you how to create a CD pipeline for a machine-learning project.

Feel free to play with and fork the source code of this article here:

Before building a CD pipeline, let's identify the workflow for the pipeline:

  • After passing a series of tests, a new machine-learning model is merged into the main branch
  • A CD pipeline is triggered and the new model is deployed into production
Image by Author

To build a CD pipeline, we will perform the following steps:

  1. Save the model object and model metadata
  2. Serve the model locally
  3. Upload the model to remote storage
  4. Set up a platform to deploy the model
  5. Create a GitHub workflow to deploy models into production

Let's explore each of these steps in detail.

Save the model

We'll use MLEM, an open-source tool, to save and deploy the model.

To save an experiment's model using MLEM, simply call its save method.

from mlem.api import save

# instead of joblib.dump(model, "model/svm")
save(model, "model/svm", sample_data=X_train)

Full script.

Running this script will create two files: a model file and a metadata file.

Image by Author

The metadata file captures various information from a model object, including:

  • Model artifacts, such as the model's size and hash value, which are useful for versioning
  • Model methods such as predict and predict_proba
  • Input data schema
  • Python requirements used to train the model
artifacts:
  data:
    hash: ba0c50b412f6b5d5c5bd6c0ef163b1a1
    size: 148163
    uri: svm
call_orders:
  predict:
  - - model
    - predict
object_type: model
processors:
  model:
    methods:
      predict:
        args:
        - name: X
          type_:
            columns:
            - ''
            - fixed acidity
            - volatile acidity
            - citric acid
            - residual sugar
            - ...
            dtypes:
            - int64
            - float64
            - float64
            - float64
            - float64
            - ...
            index_cols:
            - ''
            type: dataframe
        name: predict
        returns:
          dtype: int64
          shape:
          - null
          type: ndarray
        varkw: predict_params
    type: sklearn_pipeline
requirements:
- module: numpy
  version: 1.24.2
- module: pandas
  version: 1.5.3
- module: sklearn
  package_name: scikit-learn
  version: 1.2.2
View the metadata file.

Serve the model locally

Let's try out the model by serving it locally. To launch a FastAPI model server locally, simply run:

mlem serve fastapi --model model/svm

Open the URL shown in your terminal to view the model's API docs. Click “Try it out” to test the model on a sample dataset.

Image by Author
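You can also query the server programmatically. Below is a minimal sketch, assuming the server from the previous step is listening on localhost port 8080 and exposes a /predict route; the sample column values and the exact payload schema are assumptions, so check the interactive docs for the full schema.

```python
import json

# One wine-quality record (column names follow the dataset used in this
# article; values here are made up for illustration).
sample = {
    "fixed acidity": 7.4,
    "volatile acidity": 0.7,
    "citric acid": 0.0,
    "residual sugar": 1.9,
}

# MLEM's FastAPI server expects the input DataFrame serialized as a
# list of records under "data" -> "values".
payload = {"data": {"values": [sample]}}
print(json.dumps(payload))

# To actually send the request (requires the `requests` package and a
# running server):
# import requests
# resp = requests.post("http://0.0.0.0:8080/predict", json=payload)
# print(resp.json())
```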

Push the model to remote storage

By pushing the model to remote storage, we can store our models and data in a centralized location that can be accessed by the GitHub workflow.

Image by Author

We'll use DVC for model management because it offers the following benefits:

  1. Version control: DVC keeps track of changes to models and data over time, making it easy to revert to previous versions.
  2. Storage flexibility: DVC can store models and data in various types of storage systems, such as Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage.
  3. Reproducibility: By versioning data and models, experiments can be easily reproduced with the exact same data and model versions.

To integrate DVC with MLEM, we can use a DVC pipeline. With a DVC pipeline, we specify the command, dependencies, and parameters needed to create certain outputs in the dvc.yaml file.

stages:
  train:
    cmd: python src/train.py
    deps:
    - data/intermediate
    - src/train.py
    params:
    - data
    - model
    - train
    outs:
    - model/svm
    - model/svm.mlem:
        cache: false

View the full file.

In the example above, we specify the outputs to be the files model/svm and model/svm.mlem under the outs field. Specifically,

  • The model/svm file is cached, so it will be uploaded to DVC remote storage but not committed to Git. This ensures that large binary files don't slow down the performance of the repository.
  • The model/svm.mlem file is not cached, so it won't be uploaded to DVC remote storage but will be committed to Git. This allows us to track changes in the model while still keeping the repository's size small.
Image by Author

To run the pipeline, type the following command in your terminal:

$ dvc exp run

Running stage 'train':
> python src/train.py

Next, specify the remote storage location where the model will be uploaded in the file .dvc/config:

['remote "read"']
url = https://winequality-red.s3.amazonaws.com/
['remote "read-write"']
url = s3://your-s3-bucket/

To push the modified files to the remote storage location named “read-write”, simply run:

dvc push -r read-write

Set up a platform to deploy your model

Next, let's choose a platform to deploy our model. MLEM supports deploying your model to the following platforms:

  • Docker
  • Heroku
  • Fly.io
  • Kubernetes
  • Sagemaker

This project uses Fly.io as the deployment platform because it is easy and cheap to get started.

To create applications on Fly.io in a GitHub workflow, you'll need an access token. Here's how you can get one:

  1. Sign up for a Fly.io account (you'll need to provide a credit card, but they won't charge you until you exceed free limits).
  2. Log in and click “Access Tokens” under the “Account” button in the top right corner.
  3. Create a new access token and copy it for later use.
Image by Author

Create a GitHub workflow

Now comes the exciting part: creating a GitHub workflow to deploy your model! If you aren't familiar with GitHub workflows, I recommend reading this article for a quick overview.

We'll create a workflow called publish-model in the file .github/workflows/publish.yaml:

Image by Author

Here’s what the file looks like:

name: publish-model

on:
  push:
    branches:
    - main
    paths:
    - model/svm.mlem

jobs:
  publish-model:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v2

    - name: Environment setup
      uses: actions/setup-python@v2
      with:
        python-version: 3.8

    - name: Install dependencies
      run: pip install -r requirements.txt

    - name: Download model
      run: dvc pull model/svm -r read-write

    - name: Setup flyctl
      uses: superfly/flyctl-actions/setup-flyctl@master

    - name: Deploy model
      run: mlem deployment run flyio svm-app --model model/svm

The on field specifies that the pipeline is triggered on a push event to the main branch.

The publish-model job includes the following steps:

  • Checking out the code
  • Setting up the Python environment
  • Installing dependencies
  • Pulling the model from remote storage using DVC
  • Setting up flyctl to use Fly.io
  • Deploying the model to Fly.io

Note that for the job to work properly, it requires the following:

  • AWS credentials to pull the model
  • Fly.io's access token to deploy the model

To ensure the secure storage of sensitive information in our repository, and to enable GitHub Actions to access it, we will use encrypted secrets.

To create encrypted secrets, click “Settings” -> “Actions” -> “New repository secret.”
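Once the secrets exist, workflow steps can read them through environment variables. Here's a sketch of how the “Download model” and “Deploy model” steps might reference them; the secret names below are assumptions, so use whatever names you chose when creating them:

```yaml
    - name: Download model
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      run: dvc pull model/svm -r read-write

    - name: Deploy model
      env:
        FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
      run: mlem deployment run flyio svm-app --model model/svm
```

GitHub redacts these values from workflow logs, so they never appear in plain text.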

Image by Author

That's it! Now let's try out this project and see if it works as expected.


Try it Out

To try out this project, start by creating a new repository using the project template.

Image by Author

Clone the new repository to your local machine:

git clone https://github.com/your-username/cicd-mlops-demo

Set up the environment:

# Go to the project directory
cd cicd-mlops-demo

# Create a new branch
git checkout -b experiment

# Install dependencies
pip install -r requirements.txt

Pull data from the remote storage location called “read”:

dvc pull -r read

Create a new model

svm__kernel is a list of values used to test the kernel hyperparameter while tuning the SVM model. To generate a new model, add rbf to svm__kernel in the params.yaml file.

Image by Author
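As a sketch, the relevant section of params.yaml might look like this after the edit; the pre-existing kernel values and surrounding layout are assumptions, and only the rbf line is the actual change:

```yaml
svm__kernel:
- linear
- poly
- rbf
```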

Run a new experiment with the change:

dvc exp run

Push the modified model to the remote storage called “read-write”:

dvc push -r read-write

Add, commit, and push the changes to the repository in the “experiment” branch:

git add .
git commit -m 'change svm kernel'
git push origin experiment

Create a pull request

Next, create a pull request by clicking the Contribute button.

Image by Author

After you create a pull request in the repository, a GitHub workflow will be triggered to run tests on the code and model.

After all the tests have passed, click “Merge pull request.”

Image by Author

Deploy the model

Once the changes are merged, a CD pipeline will be triggered to deploy the ML model.

To view the workflow run, click the workflow, then click the publish-model job.

Image by Author
Image by Author

Click the link under the “Deploy model” step to view the website to which the model is deployed.

Image by Author

Here's what the website looks like:

Image by Author

View the website.

Congratulations! You've just learned how to create a CD pipeline to automate your machine-learning workflows. Combining CD with CI allows your team to catch errors early, reduce costs, and shorten time-to-market.


