A Data Scientist’s Guide to Docker Containers


For an ML model to be useful, it must run somewhere. This somewhere is almost certainly not your local machine. A not-so-good model that runs in a production environment is more valuable than an ideal model that never leaves your local machine.

However, the production machine is often different from the one you developed the model on. So, you ship the model to the production machine, but somehow the model doesn’t work anymore. That’s weird, right? You tested everything on your local machine and it worked fine. You even wrote unit tests.

What happened? Almost certainly, the production machine differs from your local machine. Perhaps it doesn’t have all the needed dependencies installed to run your model. Perhaps the installed dependencies are on a different version. There can be many reasons for this.

How do you solve this problem? One approach might be to exactly replicate the production machine. But that can be very inflexible, as for every new production machine you would need to build a local replica.

A much nicer approach is to use Docker containers.

Docker is a tool that helps us to create, manage, and run code and applications in containers. A container is a small, isolated computing environment in which we can package an application with all its dependencies. In our case, our ML model with all the libraries it needs to run. With this, we don’t have to rely on what’s installed on the host machine. A Docker container enables us to separate applications from the underlying infrastructure.

For example, we package our ML model locally and push it to the cloud. With this, Docker helps us to ensure that our model can run anywhere and anytime. Using Docker has several benefits for us. It helps us to deliver new models faster, improve reproducibility, and make collaboration easier. All because we have exactly the same dependencies no matter where we run the container.

As Docker is widely used in the industry, Data Scientists need to be able to build and run containers using Docker. Hence, in this article, I’ll go through the basic concept of containers. I’ll show you everything you need to know about Docker to get started. Once we have covered the theory, I’ll show you how you can build and run your own Docker container.


What’s a container?

A container is a small, isolated environment in which everything is self-contained. The environment packages up all code and dependencies.

A container has five primary features.

  1. self-contained: A container isolates the application/software from its environment/infrastructure. Because of this isolation, we don’t have to rely on any pre-installed dependencies on the host machine. Everything we need is part of the container. This ensures that the application can always run, regardless of the infrastructure.
  2. isolated: The container has a minimal influence on the host and other containers and vice versa.
  3. independent: We can manage containers independently. Deleting a container doesn’t affect other containers.
  4. portable: As a container isolates the software from the hardware, we can run it seamlessly on any machine. With this, we can move it between machines without a problem.
  5. lightweight: Containers are lightweight as they share the host machine’s OS. As they don’t require their own OS, we don’t have to partition the hardware resources of the host machine.

This might sound similar to virtual machines, but there is one big difference. The difference is in how they use their host computer’s resources. Virtual machines are an abstraction of the physical hardware. They partition one server into multiple. Thus, a VM includes a full copy of the OS, which takes up extra space.

In contrast, containers are an abstraction at the application layer. All containers share the host’s OS but run in isolated processes. Because containers don’t contain an OS, they’re more efficient in using the underlying system and resources by reducing overhead.

Containers vs. Virtual Machines (Image by the author based on docker.com)

Now we know what containers are. Let’s get a high-level understanding of how Docker works. I’ll briefly introduce the technical terms that are used often.


What’s Docker?

To understand how Docker works, let’s take a brief look at its architecture.

Docker uses a client-server architecture containing three primary parts: A Docker client, a Docker daemon (server), and a Docker registry.

The Docker client is the primary way to interact with Docker through commands. We use the client to communicate through a REST API with as many Docker daemons as we want. Frequently used commands are docker run, docker build, docker pull, and docker push. I’ll explain later what they do.

The Docker daemon manages Docker objects, such as images and containers. The daemon listens for Docker API requests. Depending on the request, the daemon builds, runs, and distributes Docker containers. The Docker daemon and client can run on the same or on different systems.

The Docker registry is a centralized location that stores and manages Docker images. We can use it to share images and make them accessible to others.

Sounds a bit abstract? No worries, once we get started it will become more intuitive. But before that, let’s run through the steps needed to create a Docker container.

Docker Architecture (Image by the author based on docker.com)

What do we need to create a Docker container?

It is simple. We only need to do three steps:

  1. create a Dockerfile
  2. build a Docker Image from the Dockerfile
  3. run the Docker Image to create a Docker container

Let’s go step-by-step.

A Dockerfile is a text file that contains instructions on how to build a Docker Image. In the Dockerfile we define what the application looks like and its dependencies. We also state what process should run when launching the Docker container. The Dockerfile consists of layers, each representing a portion of the image’s file system. Each layer either adds, removes, or modifies the layer below it.

Based on the Dockerfile we create a Docker Image. The image is a read-only template with instructions to run a Docker container. Images are immutable. Once we create a Docker Image, we cannot modify it anymore. If we want to make changes, we can only add changes on top of existing images or create a new image. When we rebuild an image, Docker is clever enough to rebuild only the layers that have changed, reducing the build time.

A Docker Container is a runnable instance of a Docker Image. The container is defined by the image and any configuration options that we provide when creating or starting the container. When we remove a container, all changes to its internal state are also removed if they are not stored in persistent storage.


Using Docker: An example

With all the theory covered, let’s get our hands dirty and put everything together.

As an example, we’ll package a simple ML model with Flask in a Docker container. We can then run requests against the container and receive predictions in return. We’ll train a model locally and only load the artifacts of the trained model in the Docker container.

I’ll go through the general workflow needed to create and run a Docker container with your ML model. I’ll guide you through the following steps:

  1. build the model
  2. create a requirements.txt file containing all dependencies
  3. create the Dockerfile
  4. build the Docker Image
  5. run the container

Before we get started, we need to install Docker Desktop. We’ll use it to view and run our Docker containers later on.

1. Build a model

First, we’ll train a simple RandomForestClassifier on scikit-learn’s Iris dataset and then store the trained model.
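
This training step can be sketched as follows. A minimal version, with the caveat that the artifact file name model.joblib is an assumption, and that the article’s original code exports the model to ONNX, while this sketch stores it with joblib for simplicity:

```python
# Train a simple RandomForestClassifier on the Iris dataset and store the
# trained model artifact (joblib is used here instead of the article's ONNX
# export, as a simpler sketch).
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# load the Iris dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# train the classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")

# store the trained model; this artifact is what we later copy into the container
joblib.dump(model, "model.joblib")
```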

Second, we build a script that makes our model available through a REST API, using Flask. The script is simple and contains three main steps:

  1. extract and convert the data we want to pass into the model from the payload JSON
  2. load the model artifacts, create an onnx session, and run the model
  3. return the model’s predictions as JSON

I took most of the code from here and here and made only minor changes.
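
A minimal sketch of such a script follows. The endpoint name /invocations and port 8080 match the curl example later in the article; the "data" key in the payload, the file names, and the use of a joblib artifact instead of the article’s ONNX session are assumptions for illustration:

```python
# Minimal Flask app that serves predictions from a stored model artifact.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
_model = None


def get_model():
    # load the model artifact lazily, on the first request
    global _model
    if _model is None:
        _model = joblib.load("model.joblib")
    return _model


@app.route("/invocations", methods=["POST"])
def invocations():
    # 1. extract the features we want to pass into the model from the JSON payload
    payload = request.get_json()
    features = payload["data"]  # e.g. [[5.1, 3.5, 1.4, 0.2]]
    # 2. run the model
    predictions = get_model().predict(features).tolist()
    # 3. return the model's predictions as JSON
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    # bind to 0.0.0.0 so the server is reachable from outside the container
    app.run(host="0.0.0.0", port=8080)
```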

2. Create requirements

Once we have created the Python file we want to execute when the Docker container is running, we must create a requirements.txt file containing all dependencies. In our case, it looks like this:
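
The original file isn’t reproduced here; for the Flask + scikit-learn + ONNX setup described above, a plausible version (the exact package list is an assumption) would be:

```
flask
numpy
onnxruntime
```

Pinning exact versions (e.g. flask==3.0.3) makes builds more reproducible.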

3. Create Dockerfile

The last thing we need to prepare before being able to build a Docker Image and run a Docker container is to write a Dockerfile.

The Dockerfile contains all the instructions needed to build the Docker Image. The most common instructions are

  • FROM  — this specifies the base image that the build will extend.
  • WORKDIR  — this instruction specifies the “working directory”, i.e., the path in the image where files will be copied and commands will be executed.
  • COPY  — this instruction tells the builder to copy files from the host and put them into the container image.
  • RUN  — this instruction tells the builder to run the specified command.
  • ENV  — this instruction sets an environment variable that a running container will use.
  • EXPOSE  — this instruction sets the configuration on the image that indicates a port the image would like to expose.
  • USER  — this instruction sets the default user for all subsequent instructions.
  • CMD ["", ""] — this instruction sets the default command a container using this image will run.

With these, we can create the Dockerfile for our example. We need to follow these steps:

  1. Determine the base image
  2. Install the application dependencies
  3. Copy in any relevant source code and/or binaries
  4. Configure the final image

Let’s go through them step by step. Each of these steps results in a layer in the Docker Image.

First, we specify the base image that we then build upon. As we have written the example in Python, we’ll use a Python base image.

Second, we set the working directory into which we’ll copy all the files we need to be able to run our ML model.

Third, we refresh the package index files to ensure that we have the latest available information about packages and their versions.

Fourth, we copy in and install the application dependencies.

Fifth, we copy in the source code and all other files we need. Here, we also expose port 8080, which we’ll use for interacting with the ML model.

Sixth, we set a user, so that the container doesn’t run as the root user.

Seventh, we define that the example.py file will be executed when we run the Docker container. With this, we create the Flask server to run our requests against.
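
Putting these seven steps together, the Dockerfile could look like this. This is a sketch: the base image tag, the model artifact name, and the non-root user name are assumptions; only example.py is named in the article:

```dockerfile
# 1. base image
FROM python:3.11-slim

# 2. working directory
WORKDIR /app

# 3. refresh the package index and upgrade installed packages
RUN apt-get update && apt-get upgrade -y && rm -rf /var/lib/apt/lists/*

# 4. copy in and install the application dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# 5. copy in the source code and model artifacts, and expose port 8080
COPY example.py model.onnx ./
EXPOSE 8080

# 6. run as a non-root user
RUN useradd --create-home appuser
USER appuser

# 7. start the Flask server when the container launches
CMD ["python", "example.py"]
```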

Besides creating the Dockerfile, we can also create a .dockerignore file to improve the build speed. Similar to a .gitignore file, we can exclude directories from the build context.
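
A minimal .dockerignore might look like this (the entries are assumptions; list whatever your project doesn’t need inside the image):

```
.git
__pycache__
*.ipynb
venv
```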

If you want to know more, please visit docker.com.

4. Create Docker Image

Now we have created all the files we need to build the Docker Image.

To build the image, we first need to open Docker Desktop. You can check whether Docker Desktop is running by running docker ps in the command line. This command shows you all running containers.

To build a Docker Image, we need to be at the same level as our Dockerfile and requirements.txt file. We can then run docker build -t our_first_image . The -t flag sets the name of the image, i.e., our_first_image, and the . tells Docker to build from the current directory.

Once we have built the image, we can do several things. We can

  • view the image by running docker image ls
  • view the history of how the image was created by running docker image history
  • push the image to a registry by running docker push

5. Run Docker Container

Once we have built the Docker Image, we can run our ML model in a container.

For this, we only need to execute docker run -p 8080:8080 our_first_image in the command line. With -p 8080:8080 we connect the local port (8080) with the port in the container (8080).

If the Docker Image doesn’t expose a port, we could simply run docker run our_first_image. Instead of using the image name, we can also use the image ID.

Okay, once the container is running, let’s run a request against it. For this, we’ll send a payload to the endpoint by running curl -X POST http://localhost:8080/invocations -H "Content-Type: application/json" -d @path/to/sample_payload.json
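
The exact payload format depends on how the Flask script parses the JSON; for the Iris model, a sample_payload.json could look like this (the "data" key and the four feature values are assumptions):

```json
{
  "data": [[5.1, 3.5, 1.4, 0.2]]
}
```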


Conclusion

In this article, I showed you the basics of Docker containers: what they are and how to build them yourself. Although I only scratched the surface, it should be enough to get you started and able to package your next model. With this knowledge, you should be able to avoid the “it works on my machine” problems.

I hope you find this article useful and that it helps you become a better Data Scientist.

See you in my next article and/or leave a comment.
