In this blog post I’ll walk through the creation of a Python Docker image based on the Distroless container published by Google, but with an up-to-date version of Python and operating system updates — unlike their experimental version. This image still has the same security and operational benefits — such as no shell or unnecessary OS libraries to reduce the security attack surface, as well as preserving the tiny image size.
If you don’t want the background and just want to see how I built it, you can head straight to the source code on Github. If you are interested in a bit more of the detail, then I’ve written up a more thorough post on my own blog.
Demystifying Distroless
A little bit like the term “serverless”, the term “distroless” is a trendy misnomer in my opinion. The Linux distribution is still in there, but what we really mean is that the image has been stripped down to just the bare minimum we can get away with — just enough to run your application. In particular, there’s no shell.
Why do we care about this? Well, it leads to a smaller attack surface. There are far fewer OS libraries and tools available to be exploited to help a baddie break into your running container, and even if they do get in (say, through your app itself), there are fewer handy bits of software lying around in the image to help them break out onto the host or discover more dangerous intel.
Whilst distributions like Alpine Linux are excellent at helping with this too, you still have a shell to (ab)use. A former colleague of mine has written a great article illustrating the sorts of differences you’d see as a would-be attacker on a container with and without a shell.
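The missing shell has a practical consequence for your own code too: on Linux, subprocess calls made with shell=True are run through /bin/sh, so they cannot work in an image that has no shell. Here is a minimal sketch of the alternative, passing the command as an argument list so that no shell is needed. The helper names shell_available and run_without_shell are my own for illustration and are not part of any image or library.

```python
import os
import subprocess
import sys


def shell_available(path="/bin/sh"):
    """Return True if an executable file exists at the given shell path."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def run_without_shell(args):
    """Run a command given as an argument list, so no shell is involved."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # In a distroless container this is expected to report False.
    print("shell present:", shell_available())
    # sys.executable is the running interpreter, which exists even without a shell.
    print(run_without_shell([sys.executable, "-c", "print('hello')"]), end="")
```

If your application or one of its dependencies relies on shell=True, that is worth finding out before you switch base images rather than after.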
The Python Predicament
This base image choice is made even more complex with Python. Whilst common languages like Java and Node have well-established Distroless variants that are updated frequently, the Python one continues to be marked as experimental and appears to change more rarely. It ships with whatever is in upstream Debian, leading to issues like this one.
To illustrate this, here’s a vulnerability scan from trivy, showing that the CVE mentioned in that Github issue is still present 3+ months later:
> trivy image -s=HIGH,CRITICAL gcr.io/distroless/python3:latest
2022-06-26T13:49:19.271+0100 INFO Detecting Debian vulnerabilities...
gcr.io/distroless/python3:latest (debian 11.3)
Total: 10 (HIGH: 7, CRITICAL: 3)
You might think, then, that with Python specifically it'd be better to use Alpine. Indeed, that's something I've done a fair few times myself. Whilst this does help, it creates new problems. Its different choice of C library can create horrendous build times when certain (common) packages are in use, because no pip wheels are available and pip is forced to build from source. grpcio and pandas are notable examples in my world.
This isn't the only issue you could face either: as this excellent blog post elaborates, you can also encounter subtle bugs and performance issues as a result of that choice.
I have a strong suspicion that the most common choice out there therefore becomes python:*-slim-bullseye. This image works very well, but does suffer from OS library vulnerabilities. Here's a scan of the latest 3.9 image at the time I was writing up my findings:
> trivy image -s=HIGH,CRITICAL python:3.9-slim-bullseye
2022-06-26T13:59:34.109+0100 INFO Detecting Debian vulnerabilities...
python:3.9-slim-bullseye (debian 11.3)
Total: 16 (HIGH: 13, CRITICAL: 3)
Even if you take the view that all of these vulnerabilities are not actually as severe as they’re being classified (Debian’s tracker is pretty good, imho), simply going through the process of reviewing the vulnerability, updating the image if possible, suppressing the vulnerability from your scanning tool if not a concern, and so on is all extremely toilsome work, especially at scale.
So, instead, let’s see how you can have your cake and eat it too (mostly), with your own distroless python image.
Building a Better Base
I initially began as probably most people would — taking a copy of the distroless repo itself and hacking about with it. The images are built using Google’s Bazel. I’d not used this before and to be honest I found the learning curve rather steep. A short while in I decided to pivot to technology I was more familiar with — Docker itself.
As the Python image we like and the Distroless base image are both based on Debian, we can "cheat" with a multi-stage build instead, avoiding a lot of that complexity.
I ended up with a Dockerfile structured a bit like this:
FROM FULLY_FEATURED_PYTHON_UPSTREAM_IMAGE AS base
### do nothing, if you like. Or do something. Up to you #casual
FROM GOOGLE_DISTROLESS_IMAGE
# copy the files we need from the python base into distroless
COPY --from=base /path/on/base /path/on/distroless
### do anything else you find useful here too. If you want to
# run python
ENTRYPOINT ["/usr/local/bin/python"]
The COPY --from is key, and in reality a little more complicated than I've made it look. We need to copy Python and its dependencies into distroless, as well as any useful compiled libraries that we'll need for other Python packages. For me, this involved some trial and error; my knowledge of Python's internals is not up to the task yet!
You can fake it by copying the whole of e.g. /lib and /usr just to see if it works, but this: a) results in a 300MB+ image, and b) kind of defeats the point of keeping the number of libs to a minimum! Google's Python image is 50.2MB at time of writing, and my image clocks in at 57.7MB.
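One way to shorten the trial and error is to ask a running interpreter which shared libraries it has actually loaded. The following is a rough sketch rather than the method I used for the published images: it reads /proc/self/maps, so it is Linux-only, and it only sees libraries loaded by the modules you import first. The function name shared_libraries and the script name list_libs.py are hypothetical.

```python
from __future__ import annotations

import sys
from pathlib import Path


def shared_libraries(maps_text: str) -> list[str]:
    """Return the sorted, de-duplicated .so paths found in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, permissions, offset, device, inode, optional pathname.
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and ".so" in Path(fields[5]).name:
            paths.add(fields[5])
    return sorted(paths)


if __name__ == "__main__":
    # Import whatever your application needs before inspecting the mappings,
    # for example: python list_libs.py ssl sqlite3
    for module_name in sys.argv[1:]:
        __import__(module_name)
    maps_file = Path("/proc/self/maps")
    if maps_file.exists():
        for path in shared_libraries(maps_file.read_text()):
            print(path)
```

Run inside the fully featured Python image, the printed paths give a starting list of candidates for COPY --from. Expect it to be incomplete, since anything loaded lazily later on will not show up.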
I also chose to use the C variant (cc) of Google's distroless rather than base, as so much of Python depends on it anyway. I pinched this idea from the Python variant that Google build.
The final version of my Dockerfile is therefore a bit more complicated than the above. It is in the source code on Github, and I elaborate on the detail behind most of its lines in my more detailed blog post.
It’s less deterministic than the Bazel approach, but for my needs it has been working reliably so far and made vulnerabilities much easier to manage.
Want to use the images? The distroless image itself can be found here:
docker pull al3xos/python-distroless:3.9-debian11
(there is also a 3.10 tag)
There is also a builder image, in case you find it useful (it is for installing your dependencies in an earlier build stage):
docker pull al3xos/python-builder:3.9-debian11
(again, 3.10 also available at time of writing)
Testing The Thing
Speaking of actually using the images — I also added some examples that serve as the smoke tests for the published images above.
They range from a simple hello-world to using common libraries I use (such as kubernetes-client and pandas), as well as running more complex frameworks like Flask and FastAPI. Again, more detail in my other post.
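To give a flavour of what such a smoke test can check, here is a minimal sketch. It is not one of the tests from my repository. It tries to import a list of modules and reports any that fail, which is usually the first symptom of a shared library that did not get copied into the image. The function name failed_imports is hypothetical.

```python
import importlib


def failed_imports(module_names):
    """Return a dict mapping each module that fails to import to its error message."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            # An ImportError here often points at a missing shared library.
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Inside the container you might call failed_imports(["ssl", "sqlite3", "zlib", "ctypes"]), since those standard-library modules each wrap a C library, and exit with a non-zero status if the returned dict is not empty. Add your own third-party packages to the list as needed.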
Summary
So there we have it — a Distroless Python image that uses Google’s distroless as a base, but layers in an up-to-date version of Python and its dependencies that are under your control, to tailor to your needs. Whilst this is something I’ve only recently put together, I’m now using it in a few places and it seems to be working well — whilst still delivering on the tiny image size (faster container startup matters!) and in particular leaner attack surface and less toil in managing OS vulnerabilities.
I hope you find some inspiration from this post to use it yourself or create your own equivalent 😄 🐍