Setup Gitlab CI/CD for Machine Learning Project
Prerequisites
Setup Web App on Fly.io
Setup CI/CD
Conclusion

Before starting, please read up on the concepts behind CI/CD.

and yep, that is me writing a technical tutorial while getting ready to go to the beach

Let’s proceed:

  • Create Gitlab account at gitlab.com
  • Create fly.io account for web api deployment
  • Create repository on gitlab.com

run git clone git@gitlab.com:/iris-api.git

  • Build a simple RESTful API
This is the file tree for this project
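The original file-tree image may not render here; based on the files created below, the layout looks roughly like this:

```
iris-api/
├── .gitlab-ci.yml
├── Dockerfile
├── fly.toml
├── requirements.txt
├── src/
│   ├── main.py
│   ├── iris.py
│   └── distance.py
└── test/
    └── test_distance.py
```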

Create src/main.py

"""Iris Web API Service."""
from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np
from src import distance, iris

dataset = iris.get_iris_data()

app = FastAPI()

class Item(BaseModel):
"""Input class for predict endpoint.

Args:
BaseModel (BaseModle): Inherited from pydantic
"""

sepal_length: float
sepal_width: float
petal_length: float
petal_width: float

@app.get("/")
def homepage():
"""Homepage for the online.

Returns:
str: Homepage
"""
return "Homepage Iris Flower - tags 0.0.2"

@app.post("/predict/")
async def predict(item: Item):
"""Predict function for inference.

Args:
item (Item): dictionary of sepal dan petal data

Returns:
str: predict the goal
"""
sepal_length = item.sepal_length
sepal_width = item.sepal_width
petal_length = item.petal_length
petal_width = item.petal_width

data_input = np.array([[sepal_length, sepal_width, petal_length, petal_width]])

result = distance.calculate_manhattan(dataset, data_input)
return result
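Once the service is running (e.g. via uvicorn src.main:app locally, or deployed on fly.io), you can exercise the predict endpoint. A hypothetical request, assuming the server listens on localhost:8000:

```shell
curl -X POST http://localhost:8000/predict/ \
  -H "Content-Type: application/json" \
  -d '{"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4, "petal_width": 0.2}'
```

For this sample the nearest-neighbour lookup should answer with the setosa class, matching the unit test later in this tutorial.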

Create src/iris.py

"""Load iris dataset from scikit-learn."""
from sklearn import datasets

def get_iris_data():
"""Load iris dataset.

Returns:
set: consists of X, y, feature names, and target_names
"""
iris = datasets.load_iris()
x_data = iris.data
y_label = iris.goal
features_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
target_names = iris.target_names

return x_data, y_label, features_names, target_names

if __name__ == "__main__":
x_data, y_label, features, target_names = get_iris_data()

print("X", x_data)
print("y", y_label)
print("features", features)
print("target_names", target_names)

Create src/distance.py

"""Distance module for calculating distance between data input and dataset."""
import numpy as np

def calculate_manhattan(iris_data: np.ndarray, input_data: np.ndarray):
"""Calculate the gap between 2 vectors using manhattan distance.

Args:
dataset (np.ndarray): Iris dataset
input_data (np.ndarray): 1x4 matrix data input

Returns:
string: Return prediction
"""
x_data, y_label, _, target_names = iris_data

distance = np.sqrt(np.sum(np.abs(x_data - input_data), axis=1))
distance_index = np.argsort(distance)
y_pred = target_names[y_label[distance_index[0]]]

return y_pred

if __name__ == "__main__":
dataset = [
np.array([[4.9, 3.0, 1.4, 0.2], [4.9, 3.0, 1.4, 0.9]]),
[0, 0],
["sepal_length", "sepal_width", "petal_length", "petal_width"],
["setosa", "versicolor", "virginica"],
]
sample_data = np.array([[4.9, 3.0, 1.4, 0.2]])
print(calculate_manhattan(dataset, sample_data))
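To see the nearest-neighbour idea without scikit-learn or numpy, here is a minimal pure-Python sketch of the same logic (the training rows and sample values below are made up for illustration):

```python
def manhattan(a, b):
    """Manhattan (L1) distance: the sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict(x_data, y_label, target_names, sample):
    """Return the class name of the training row closest to the sample."""
    distances = [manhattan(row, sample) for row in x_data]
    nearest = distances.index(min(distances))
    return target_names[y_label[nearest]]


# Two hypothetical training rows: one setosa (label 0), one versicolor (label 1).
x_data = [[4.9, 3.0, 1.4, 0.2], [6.4, 3.2, 4.5, 1.5]]
y_label = [0, 1]
target_names = ["setosa", "versicolor", "virginica"]

print(predict(x_data, y_label, target_names, [5.0, 3.1, 1.5, 0.3]))  # -> setosa
```

The sample differs from the first row by 0.1 in each coordinate (distance 0.4) but by 5.7 from the second row, so the first row wins and its label maps to "setosa".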

Create test/test_distance.py

import numpy as np

from src.iris import get_iris_data
from src.distance import calculate_manhattan


def test_calculate_manhattan():
    dataset = get_iris_data()
    input_data = np.array([[4.9, 3.0, 1.4, 0.2]])
    result = calculate_manhattan(dataset, input_data)
    assert result == "setosa"

Create Dockerfile

FROM python:3.10

EXPOSE 8000

WORKDIR /app

COPY . .

RUN pip install -r requirements.txt

ENTRYPOINT ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
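Before wiring up CI, you can sanity-check the image locally (assuming Docker is installed; the image name iris-api here is arbitrary):

```shell
docker build -t iris-api .
docker run -p 8000:8000 iris-api
```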

Create requirements.txt

# python
pydoclint>=0.0.10
pylint>=2.17.0
black>=22.6.0
pydocstyle>=6.1.1
pytest>=7.1.2

# web app
fastapi>=0.98.0
uvicorn>=0.22.0

# models
numpy>=1.21.6
scikit-learn>=1.2.2

Create fly.toml

app = "iris-api-demo-stg"
primary_region = "sin"

[build]
dockerfile = "Dockerfile"

[http_service]
internal_port = 8000
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0

First things first, install flyctl on your computer by following this link: https://fly.io/docs/hands-on/install-flyctl/

Then, authenticate: flyctl auth login

Then create your personal access token here: https://fly.io/user/personal_access_tokens. Save the token somewhere safe; later we’ll add it to the GitLab CI/CD variables.

Now, you want to create two apps: staging and production.

Staging app

flyctl launch --auto-confirm --copy-config --dockerfile Dockerfile --name iris-api-demo-stg --now --org personal --region sin

Production app

flyctl launch --auto-confirm --copy-config --dockerfile Dockerfile --name iris-api-demo --now --org personal --region sin

Eventually, you’ll see something like this in your fly.io dashboard

Don’t forget to add the fly.io access token to the GitLab CI/CD variables for deployment purposes. Add a variable and name it FLY_TOKEN.

*drum-roll*

Now, let’s focus on the main content here: configuring the CI/CD pipeline.

  • Let’s create a new file named .gitlab-ci.yml and call this v1
image: python:latest

docker-build:
  stage: build
  script:
    - echo "Build Docker"

code-test:
  stage: test
  script:
    - echo "Run Code Test"

production:
  stage: deploy
  environment: production
  script:
    - echo "Deploy to fly.io"

This is a simple gitlab-ci file that runs on every push you make to the remote repo. When you push a change, 3 jobs will be triggered: docker-build, code-test, and production.

Let’s dive into how it works.

image: python:latest means that all these jobs run on top of the latest python Docker image, which you can find on Docker Hub.

docker-build is the name of the job. The job name can be anything, and you can create multiple jobs in a single .yml file.

stage means which stage this job falls into. There are 3 common stages in a CI/CD pipeline: build, test, and deploy.

environment specifies which environment this job runs in. GitLab keeps a list of deployments per environment, which lets you choose which commit to redeploy. This makes recovery easier if something goes south in the staging or production environment.

script lets you write shell commands to run inside the container. Think of it as a set of commands run in a terminal.

When you are done:

git add .gitlab-ci.yml

git commit -m "add gitlab-ci.yml"

git push

Then, navigate to the Pipelines tab

As you can see, there are 3 green check marks showing that the jobs ran successfully. If a job fails, the icon will be a red cross.

Pipeline detailed page

You have now created a simple pipeline.

Let’s create a pipeline that is typically used for ML API development.

image: python:latest

code-check:
  stage: build
  only:
    - merge_requests
  script:
    - echo "Build Docker"
    - pip install -r requirements.txt
    - pylint src --rcfile=.pylintrc
    - black src --check
    - pydocstyle src

code-test:
  stage: test
  only:
    - merge_requests
  script:
    - echo "Run Code Test"
    - pip install -r requirements.txt
    - pytest

staging:
  stage: deploy
  environment: staging
  only:
    - staging
  script:
    - echo "Deploy to fly.io in staging environment"
    - curl -L https://fly.io/install.sh | sh
    - /root/.fly/bin/flyctl deploy --app iris-api-demo-stg --access-token $FLY_TOKEN

production:
  stage: deploy
  environment: production
  only:
    - tags
  script:
    - echo "Deploy to fly.io in production environment"
    - curl -L https://fly.io/install.sh | sh
    - /root/.fly/bin/flyctl deploy --app iris-api-demo --access-token $FLY_TOKEN

We now have 4 jobs:

  • code-check this job runs code quality checks such as linting using pylint, formatting using black, and docstring checks using pydocstyle. This makes sure the written code follows the guidelines. This job will only run on merge requests; if you just push to a remote branch, it won’t trigger this job.
  • code-test then we have the code test; we already created a simple unit test above in test/test_distance.py. This makes sure the modules we created run as expected.
  • staging this job runs when a merge request has been approved and merged into the staging branch. It will be automatically deployed to fly.io using the staging application. This lets you do user acceptance testing.
  • production finally, we have the production job. Its purpose is similar to the staging one. This job is triggered when you create a tag in the repository.
Create a tag to deploy to the production web app

When you create a merge request and merge it into the staging branch, it will deploy to the staging app. If everything works as expected, you can proceed with a merge request to the main branch, then approve it. Once done, you can create a tag to deploy to the production web app.
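Creating and pushing a tag can be done from the command line. The tag name 0.0.2 below just mirrors the version string in the homepage response; any tag will trigger the production job:

```shell
git tag 0.0.2
git push origin 0.0.2
```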

That’s about it for setting up CI/CD on GitLab. This may seem simplified; I’ll create more complex pipelines that involve MLOps practices such as model tracking, data versioning, model registry, model monitoring, etc. Hit the follow button and please connect on LinkedIn at https://www.linkedin.com/in/chandraandreas/
