Docker Containers As Machine Learning Environments
(Part Two)

Extend Official Docker Images to Create Custom Containers.

In part I of this guide, we discussed some concepts of virtualization and some use cases for containers. We demonstrated that with a few simple Docker CLI commands, we could pull an image down from Docker Hub and run a container. But what if that image does not have everything you need? In this guide, we will expand on what we learned with a more practical, real-world example.

The instructor and the grader

Suppose you are teaching a class in machine learning in Python. Your students will submit code to be run on a server for evaluation. Each submission will run through a series of tests that determine the grade. Let's call this server the grader.

As part of the course introduction, you write some guidance on setting up a Python environment. As part of the requirements, you state that submissions will be run against Python 3.9 with the following dependencies:

numpy==1.21.1
matplotlib==3.4.3
scikit-learn==0.24.2

Dependency management

You provide installation instructions for the dependencies using pip, Python's package manager. You note that a package manager alone is not a robust solution as the number of projects grows: each project may depend on a different set of dependencies with specific version numbers, so virtual environments become necessary. You provide introductions and resources for two existing solutions:

  • Conda - a popular package and environment manager
  • A container image with all packages and dependencies installed and configured

The Python docs provide guidance on creating virtual environments for managing dependencies in isolation, listing tools such as pipenv, virtualenv (or the built-in venv), conda, and hatch. Docker and Poetry are other options for managing environments that are not on that list.
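As a minimal sketch of the venv approach (the directory name .venv is just a convention, not required), per-project isolation looks like this:

```shell
# Create an isolated environment in ./.venv
python3 -m venv .venv

# The venv has its own interpreter and its own package set,
# separate from the system Python.
.venv/bin/python -m pip list

# sys.prefix now points inside .venv, not at the system install.
.venv/bin/python -c 'import sys; print(sys.prefix)'
```

From here, `. .venv/bin/activate` followed by `pip install -r requirements.txt` installs the pinned versions into the environment without touching any other project.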

Getting started building an image

The focus of this guide is building the aforementioned container image. Following best practices, we will base our image on the official Python container image. Let's get started by pulling down this base image.

> docker pull python:3.9-slim-buster

Python has a slim variant that includes only what is necessary to run Python applications — this will help keep our image size as small as possible.

Next, create a new directory. We will add two files to it: Dockerfile and requirements.txt.

> mkdir docker_images && cd docker_images && \
  touch Dockerfile requirements.txt

The Dockerfile

All that is needed to build a container image is to run docker build with a Dockerfile. The Dockerfile is a simple text file containing a sequential list of steps for building an image. A simple example would be:

FROM python:3.9-slim-buster

CMD [ "python" ]

The FROM instruction specifies the base image to extend. CMD specifies the command to run when the container starts. With the Dockerfile above, the container would run python with no arguments, which starts an interactive prompt.
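One detail worth noting: CMD accepts two forms, and the JSON-array syntax above is the exec form. A quick comparison:

```dockerfile
# Exec form (used above): python runs directly as the container's main process
CMD [ "python" ]

# Shell form: the command is wrapped in /bin/sh -c, so signals such as
# SIGTERM are delivered to the shell rather than to python itself
CMD python
```

The exec form is generally preferred for exactly this reason: the process you name receives signals directly and shuts down cleanly.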

That was a very simple example. Let's add a step that pip installs dependencies from the currently empty requirements.txt file.

Open requirements.txt and add the following:

numpy==1.21.1
matplotlib==3.4.3
scikit-learn==0.24.2

Next, copy the following into the empty Dockerfile.

FROM python:3.9-slim-buster

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

CMD [ "python" ]

First, the COPY instruction creates a step that copies requirements.txt into the container's filesystem. The RUN instruction then executes pip install with the requirements.txt file as an argument. This installs all of the dependencies and adds another layer to the container image.
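This ordering also pays off later. Docker caches each layer, so copying requirements.txt and installing dependencies before anything else means those layers are rebuilt only when requirements.txt changes. A sketch of how this helps once a project grows (the trailing COPY . . is hypothetical, assuming the image eventually includes source files):

```dockerfile
FROM python:3.9-slim-buster

# Changes rarely: these layers stay cached unless requirements.txt changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Changes often: placed last so code edits reuse the cached layers above
COPY . .

CMD [ "python" ]
```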

Building the image

Now that we have a Dockerfile that tells the builder what to do, we need to build the container image. Let's use the docker build command from the directory containing the Dockerfile and requirements.txt:

You may want to enable BuildKit to take advantage of several performance improvements. To enable it, export the environment variable: export DOCKER_BUILDKIT=1.

> docker build -t flanders/python3.9:v1 .

The -t option tells Docker to tag the image. An image is referenced as {repository}/{name}[:{tag}]; if the tag is omitted, Docker assumes latest.

Now check to see if the image is there:

> docker images

You should see your image in the list.
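The listing should include a row like the following (the placeholder values for IMAGE ID, CREATED, and SIZE will differ on your machine):

```
REPOSITORY           TAG   IMAGE ID     CREATED      SIZE
flanders/python3.9   v1    <image id>   <time ago>   <size>
```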

Running your new image

Now we have an image with Python 3.9 and the dependencies from requirements.txt. Let's run it to check those dependencies. This time, we will override the default CMD by appending arguments to the docker run command.

> docker run --rm flanders/python3.9:v1 \
  python -m pip list --not-required

This executes python -m pip list --not-required inside the container and exits when the process finishes. The output should look something like:

Package      Version
------------ -------
matplotlib   3.4.3
pip          21.2.2
scikit-learn 0.24.2
setuptools   57.4.0
wheel        0.36.2

Notice we do not need -it this time because we are not running an interactive process (e.g., bash or python).
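The grader could script the same check. The sketch below (the function name check_pins is illustrative, not part of the guide) compares pinned versions against what is actually installed in the environment:

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return {package: problem} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"expected {wanted}, found {installed}"
    return problems


if __name__ == "__main__":
    pins = {"numpy": "1.21.1", "matplotlib": "3.4.3", "scikit-learn": "0.24.2"}
    print(check_pins(pins) or "all pins satisfied")
```

Run inside the container (docker run --rm flanders/python3.9:v1 python check_pins.py, with the script copied in), an empty result confirms the submission environment matches the course requirements.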

On users and groups

It is a best practice to run as a non-root user when elevated privileges are not required. To do this, we will add a RUN instruction that creates a group and a user with a home directory. Our Dockerfile now looks like:

FROM python:3.9-slim-buster

# Create a "developers" group, then a user "joe" who belongs to it.
# -m creates /home/joe; --no-user-group skips the default per-user group;
# --no-log-init avoids a known issue with sparse lastlog files on large UIDs.
RUN groupadd developers && useradd --no-user-group --no-log-init -g developers -m joe

# Work out of joe's home directory and drop root privileges.
WORKDIR /home/joe
USER joe

# Without --chown, copied files would be owned by root.
COPY --chown=joe:developers requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

CMD [ "python" ]

After rebuilding the image with docker build, we can run a shell in the container and see that the user is joe:

> docker run -it --rm flanders/python3.9:v1 /bin/sh
$ whoami
joe
$ pwd
/home/joe