Docker Containers As Machine Learning Environments
(Part Two)
Extend Official Docker Images to Create Custom Containers.
In Part One of this guide, we discussed some concepts of virtualization and some use cases for containers. We demonstrated that, with a few simple Docker CLI commands, we could pull down an image from Docker Hub and run a container. But what if that container does not have everything you need? In this guide, we will expand on what we learned with a more practical, real-world example.
The instructor and the grader
Suppose you are teaching a class in machine learning in Python. Your students will submit code that will be run on a server for evaluation. Each submission will run through a series of tests that determine its grade. Let's call this server the grader.
As part of the course introduction, you write some guidance on setting up a Python environment. You state that submissions will be run against Python 3.9 with the following dependencies:
numpy==1.21.1
matplotlib==3.4.3
scikit-learn==0.24.2
Dependency management
You provide installation instructions for the dependencies using pip, a package manager. You note that a package manager alone is not a robust solution as the number of projects grows. Because each project may depend on a different set of dependencies with specific version numbers, virtual environments become necessary. You provide introductions and resources for two existing solutions:
- Conda - a popular package and environment manager
- A container image with all packages and dependencies installed and configured
The Python docs provide guidance on creating virtual environments for managing dependencies in isolation, listing tools such as pipenv, virtualenv (or venv), conda, and hatch. Other methods for managing environments that are not on that list include Docker and Poetry.
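For comparison with the container approach that follows, a minimal venv-plus-pip workflow might look like the sketch below (the directory name .venv is just illustrative):
> python3 -m venv .venv && . .venv/bin/activate && \
pip install numpy==1.21.1 matplotlib==3.4.3 scikit-learn==0.24.2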
Getting started building an image
The focus of this guide is on building the aforementioned container image. Following best practices, we will base our container image on the official Python container image. Let's get started by pulling down this base image.
> docker pull python:3.9-slim-buster
Python has a slim variant that includes only what is necessary to run Python applications, which will help keep our image size as small as possible.
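Once the pull finishes, you can confirm how much disk the image takes with docker images; the exact size will vary with the image digest on your host:
> docker images python:3.9-slim-buster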
Next, create a new directory. We will be adding two files to the directory: Dockerfile and requirements.txt.
> mkdir docker_images && cd docker_images && \
touch Dockerfile requirements.txt
The Dockerfile
All that is needed to build a container image is to run docker build with a Dockerfile. The Dockerfile is a simple text file containing a sequential list of steps to build an image. A simple example would be:
FROM python:3.9-slim-buster
CMD [ "python" ]
The FROM instruction specifies the base image to extend from. The CMD instruction specifies which command to run when the container starts. With the above Dockerfile, the container would run python with no arguments, which starts an interactive interpreter prompt.
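You can preview this behavior without building anything, because the official base image also defaults to running python; the --rm flag simply removes the container on exit:
> docker run -it --rm python:3.9-slim-buster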
That was a very simple example. Let's add a step to pip install dependencies from the currently empty requirements.txt file. Open requirements.txt and add the following:
numpy==1.21.1
matplotlib==3.4.3
scikit-learn==0.24.2
Next, we will copy the following into our empty Dockerfile.
FROM python:3.9-slim-buster
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
CMD [ "python" ]
First, we use the COPY instruction to create a step that copies requirements.txt over to the container's filesystem. The RUN instruction executes pip install with the requirements.txt file as an argument. This installs all of the dependencies and adds an additional layer onto the container image.
Building the image
Now that we have a Dockerfile which tells the runtime what to do, we need to build the container image. Let's use the docker build command from the same directory containing the Dockerfile and requirements.txt:
You may want to enable BuildKit to take advantage of several performance improvements. To enable it, export the following environment variable:
> export DOCKER_BUILDKIT=1
> docker build -t flanders/python3.9:v1 .
The -t option tells the runtime to tag the image. An image is referenced by {repository}/{name}[:{tag}].
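If you later want to refer to the same image by another name, docker tag adds an extra reference without copying anything; the latest tag here is just an example:
> docker tag flanders/python3.9:v1 flanders/python3.9:latest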
Now check to see if the image is there:
> docker images
You should see your image in the list.
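To see the layer that the pip install step added, docker history lists each build step along with the size it contributed:
> docker history flanders/python3.9:v1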
Running your new image
Now we have an image with Python 3.9 and the dependencies from the requirements.txt file. Let's run our image to check our dependencies. This time we will override the default CMD by appending arguments to the docker run command.
> docker run --rm flanders/python3.9:v1 \
python -m pip list --not-required
This executes python -m pip list --not-required inside the container and then exits when the process is finished. The output should look something like:
Package Version
------------ -------
matplotlib 3.4.3
pip 21.2.2
scikit-learn 0.24.2
setuptools 57.4.0
wheel 0.36.2
Notice we do not need -it this time because we are not running an interactive process (e.g., bash or python).
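If you do want an interactive session, add -it back and let the default CMD drop you into a Python prompt:
> docker run -it --rm flanders/python3.9:v1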
On users and groups
It is a best practice to use a non-root user if elevated privileges are not required. To do this, we will add a RUN instruction that adds a user and group and creates a home directory for that user. Our Dockerfile now looks like:
FROM python:3.9-slim-buster
RUN groupadd developers && useradd --no-user-group --no-log-init -g developers -m joe
WORKDIR /home/joe
USER joe
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
CMD [ "python" ]
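Because the Dockerfile changed, the image needs to be rebuilt before the new user takes effect; here we simply reuse the same tag:
> docker build -t flanders/python3.9:v1 .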
Running the container, we can now see that the user is joe:
> docker run -it --rm flanders/python3.9:v1 /bin/sh
$ whoami
joe
$ pwd
/home/joe
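As a final sanity check for the grader scenario, you can confirm that the pinned libraries import correctly for this non-root user; scikit-learn is used here as just one example:
> docker run --rm flanders/python3.9:v1 \
python -c "import sklearn; print(sklearn.__version__)"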