How to use GitHub Actions to build Docker images and push them to Docker Hub

Lasse Schmidt
Published in Analytics Vidhya
Mar 20, 2022

Photo by Roman Synkevych 🇺🇦 on Unsplash

While the field of Machine Learning is maturing, its operationalization side is still in its infancy. This short blog post is a small guide on how to use GitHub Actions to run continuous integration (CI) on your code and then automate building Docker images and pushing them to a container registry. In my case, I use Docker Hub for this.

First off, why should you care about this? In my experience, creating a prototype of a model that runs successfully takes far less time than putting that model into production. This is why Machine Learning projects should embrace Docker and use it to package scripts, which can then run anywhere Docker is installed. With the emergence of cloud computing, this will increasingly become the blueprint for deploying models, whether for batch prediction or on edge devices.

Second, many Machine Learning projects still lack a sound development setup. By this I mean that, more often than not, there are no automated linting and testing checks when you push code or merge pull requests. This blog post provides a really simple example project that covers all of the points I just mentioned.

Let’s start by logging in to Docker Hub and creating a repository. In my case, I created the “hello_world_test” repository:

Next, create a GitHub repository. In this repository, go to “Settings” and then to “Secrets” and create two secrets: one named “DOCKER_USER” and the other named “DOCKER_PASSWORD”. Set your Docker ID and your password as the values of these secrets.

As this should serve as a kind of blueprint for Python packages, let’s create a hello.py file and put it into the “src” folder:
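A minimal sketch of what such a hello.py could look like. The `add` function and its arguments are assumptions for illustration; they are chosen so that running the script prints 2, which matches the output reported at the end of this post.

```python
"""src/hello.py: a tiny script standing in for a real main.py."""


def add(x: int, y: int) -> int:
    """Return the sum of two integers (hypothetical example function)."""
    return x + y


if __name__ == "__main__":
    # Printed when the file is executed directly, e.g. by the container.
    print(add(1, 1))
```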

This is just a simple example. You can think of this file as your main.py, the file you want to execute. Let’s also write a simple test_hello.py file:
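A test file for this can be as small as the pytest-style sketch below. It assumes that hello.py exposes a hypothetical `add` function returning the sum of its arguments. In the repository the test would import that function; it is repeated here only so that the snippet runs on its own.

```python
"""test_hello.py: a minimal pytest-style test (sketch)."""


# Stand-in for the function under test so this snippet is self-contained.
# In the repository you would instead import it, e.g. `from hello import add`
# (the exact import path depends on how the project is laid out).
def add(x: int, y: int) -> int:
    return x + y


def test_add() -> None:
    # pytest collects functions named test_* and reports failing asserts.
    assert add(1, 1) == 2
    assert add(2, 3) == 5
```

Running `pytest` in the project folder would pick up this file automatically because its name starts with `test_`.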

As I mentioned at the beginning, I think many ML projects are missing proper testing. With GitHub Actions, we want all of this to run automatically whenever we push something to the repository. A great way of defining what should be run is a Makefile, which works like a recipe. This is what the Makefile looks like:

It’s really simple: we define three steps, namely installing, linting and testing. You run these steps with commands such as “make install” or “make lint”.

When it comes to our example project, we only need one more file, the Dockerfile:

Again, this is sufficient for most Python scripts to just work. What this Dockerfile does is the following:

  • it downloads a predefined Python image from Docker Hub, in this case with Python version 3.8
  • it adds VERSION as an argument, which we then use to automatically LABEL our image when we build it
  • it copies the requirements.txt file into the image, into a folder we call /src/
  • it sets the working directory within the image to that /src/ folder
  • it then pip installs the requirements into the image
  • it then copies everything from the src folder of our repository to the /src/ folder within the image
  • when we run the image, it executes the hello.py file
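The VERSION argument described above has to be supplied when the image is built. Here is a minimal sketch, assuming a hypothetical helper `docker_build_command` and the image name used in this post. `--build-arg` and `-t` are standard `docker build` options, while the helper itself is only an illustration.

```python
from typing import List


def docker_build_command(user: str, repo: str, version: str, context: str = ".") -> List[str]:
    """Assemble a `docker build` call that passes VERSION and tags the image."""
    image = f"{user}/{repo}:{version}"
    return [
        "docker", "build",
        "--build-arg", f"VERSION={version}",  # value for the VERSION argument of the Dockerfile
        "-t", image,                          # tag the image as user/repo:version
        context,                              # build context, here the repository root
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no Docker daemon is needed.
    print(" ".join(docker_build_command("lasse1990", "hello_world_test", "0.0.1")))
```

Returning the command as a list makes it easy to hand to `subprocess.run` later without worrying about shell quoting.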

And that’s it. When it comes to packaging, we’re already done. The last step is to use GitHub Actions to automate the complete process:

  • whenever we push, run linting, run testing
  • when these steps are successful, build the image from our Dockerfile
  • when this is successful, automatically push the image to our (Docker Hub) registry
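The gating logic of these three bullets can be sketched locally in Python: run each step in order and stop at the first failure. This is only an illustration, not the GitHub Actions workflow itself. The step list, the `make test` target name and the image tag are assumptions, and the `runner` parameter exists so that the logic can be tried without Docker installed.

```python
import subprocess
from typing import Callable, List, Sequence

Command = Sequence[str]

# Assumed steps: install, lint and test first, then build, then push.
IMAGE = "lasse1990/hello_world_test:0.0.1"
STEPS: List[Command] = [
    ["make", "install"],
    ["make", "lint"],
    ["make", "test"],  # target name is an assumption
    ["docker", "build", "-t", IMAGE, "."],
    ["docker", "push", IMAGE],
]


def run_command(cmd: Command) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_pipeline(
    steps: Sequence[Command],
    runner: Callable[[Command], int] = run_command,
) -> List[Command]:
    """Run steps in order and stop at the first non-zero exit code.

    Returns the list of steps that completed successfully.
    """
    completed: List[Command] = []
    for cmd in steps:
        if runner(cmd) != 0:
            break  # later steps (build, push) are skipped
        completed.append(cmd)
    return completed


if __name__ == "__main__":
    # Dry run with a fake runner that "fails" the test step; nothing is executed.
    def fake_runner(cmd: Command) -> int:
        return 1 if list(cmd) == ["make", "test"] else 0

    done = run_pipeline(STEPS, runner=fake_runner)
    print([" ".join(c) for c in done])
```

In the dry run the build and push commands never start, which is the behaviour we want from the automated workflow when linting or testing fails.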

How is this done? Within GitHub, go to the “Actions” tab and create a new workflow. This will automatically create a main.yml file for you in the .github/workflows folder of your repository. The workflow should look like this:

It is quite simple, yet incredibly powerful. Let me explain what happens here:

  • “on: [push]” will run this whenever we push something to our repository
  • the “build” job tells GitHub which steps should be executed. Where they run is defined via “runs-on”. I use “ubuntu-latest”, which already comes with Python and Docker, so I advise you to use it too. GitHub will create an instance for us on which the steps we define will then run
  • the first step is to log in to our Docker Hub registry. This is where the above-mentioned secrets come in. We expose them as environment variables so that the ubuntu-latest machine can read them. Then we use the Docker CLI to log in with our user name and password
  • we then define which Python version we want to run our script with. As I already mentioned, we want Python 3.8
  • next, our Makefile comes in handy. As we already defined what should be done when we call “make install” or “make lint”, we can simply call these targets here; they install the required packages (on the instance GitHub provides), run the linting and run the tests. Only when these succeed do we get to the next step
  • after these tests have passed, we will build our image
  • finally, we push our image to our registry

And that’s it. This is what the GitHub Actions run looks like:

And this is my repository on Docker Hub after the build job completed:

Any machine that has Docker installed can now pull this image from my Docker Hub repository and simply run:

docker run --rm -d lasse1990/hello_world_test:0.0.1

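And the output will be 2. Note that the -d flag runs the container in the background, so leave it out if you want to see the output directly in your terminal.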

You can find the repository here. As I said, almost all Python projects can make use of this simple technique. First, it makes your code more stable and more professional. Second, with the created image your script now runs on every machine that has Docker installed. If you push to a private registry, only users with access to that registry will be able to run your image. You can also take these images and deploy them on AWS, GCP or Azure.

I hope this small example clarified GitHub Actions a bit and also made it clearer why Docker and Docker images should be the way to go for your Machine Learning projects.

Lasse
