BuildKit Dockerfile Frontend Caching

OK, where was I? The last blog post (BuildKit Dockerfile Frontend) scratched the surface by introducing here-docs in Dockerfiles. The original inspiration for the post (a series, as it turned out) was my attempt to cut the cristiangreco/docker-pdflatex image down by a GB or so.
I went from this:

COPY install.sh /install.sh
RUN sh /install.sh && rm /install.sh

to this:

RUN <<eot bash
  apt-get update
  apt-cache depends texlive-full \
  | grep "Depends:" \
  | grep -v "doc$" \
  | egrep -v "texlive-(games|music)" \
  | egrep -v "texlive-lang-(arabic|cjk|chinese|cyrillic|czechslovak|european|french|german|greek|italian|japanese|korean|other|polish|portuguese|spanish)$" \
  | cut -d ' ' -f 4 \
  | xargs apt-get install --no-install-recommends -y
  apt-get autoremove -y
eot
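
To see what the filter chain actually keeps, it can be run against a few hand-written lines instead of the real package index. This is a minimal sketch: the sample lines are made up for illustration (not a real capture of `apt-cache depends texlive-full`), the function names are hypothetical, and `grep -Ev` stands in for `egrep -v`, which behaves the same way.

```shell
# Hypothetical sample in the shape of `apt-cache depends` output (not a real capture).
sample_depends() {
  cat <<'EOF'
texlive-full
  Depends: texlive-base
  Depends: texlive-latex-extra
  Depends: texlive-latex-extra-doc
  Depends: texlive-games
  Depends: texlive-lang-german
  Depends: texlive-lang-english
  Recommends: tex-gyre
EOF
}

# Same filters as in the Dockerfile, minus the final `xargs apt-get install`.
filter_packages() {
  grep "Depends:" \
  | grep -v "doc$" \
  | grep -Ev "texlive-(games|music)" \
  | grep -Ev "texlive-lang-(arabic|cjk|chinese|cyrillic|czechslovak|european|french|german|greek|italian|japanese|korean|other|polish|portuguese|spanish)$" \
  | cut -d ' ' -f 4
}

# Print the package names that would be handed to apt-get install.
kept="$(sample_depends | filter_packages)"
echo "$kept"
```

With this sample, only texlive-base, texlive-latex-extra and texlive-lang-english survive: the doc package, the games package and the German language pack are filtered out, and the Recommends line never matches "Depends:".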

Iteration Speed

The problem with the above is that it takes forever to build the image. Each time I fiddled with the list of packages to exclude, the build downloaded the package lists and all the packages again (2 GB).

Thus, the iteration speed was terrible.

RUN mounts FTW

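That's where RUN mounts in the Dockerfile frontend speed things up. First, I need to tell apt not to throw away the packages it has already downloaded. The Debian base image ships an apt config file (docker-clean) that deletes them after every install, so that file has to go.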

RUN <<eot bash
  rm -f /etc/apt/apt.conf.d/docker-clean
  echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
        > /etc/apt/apt.conf.d/keep-cache
eot
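
A quick way to convince yourself that both halves of that step matter is a small check that can be pointed at any apt.conf.d directory. This is only a sketch: `apt_keeps_downloads` is a hypothetical helper, and the demo runs against a temporary directory standing in for /etc/apt/apt.conf.d so it works outside a container.

```shell
# Hypothetical helper: succeeds if the given apt.conf.d directory is set up
# to keep downloaded packages.
apt_keeps_downloads() {
  dir="$1"
  # The cleanup hook shipped with the base image must be gone.
  [ ! -e "$dir/docker-clean" ] || return 1
  # At least one config file must switch Keep-Downloaded-Packages on.
  grep -qs 'Keep-Downloaded-Packages "true"' "$dir"/*
}

# Demo against a scratch directory that mimics the base image.
demo="$(mktemp -d)"
touch "$demo/docker-clean"
apt_keeps_downloads "$demo" && before=yes || before=no

# The same two commands as in the RUN step above.
rm -f "$demo/docker-clean"
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > "$demo/keep-cache"
apt_keeps_downloads "$demo" && after=yes || after=no

echo "before: $before, after: $after"
```

Inside the image the same function could be called as `apt_keeps_downloads /etc/apt/apt.conf.d`.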

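Next, I'll add two cache mounts to keep apt's data around between builds: /var/cache/apt holds the downloaded packages and /var/lib/apt holds the package lists.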

RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt <<eot bash
  apt-get update
  apt-cache depends texlive-full \
  | grep "Depends:" \
  | grep -v "doc$" \
  | egrep -v "texlive-(games|music)" \
  | egrep -v "texlive-lang-(arabic|cjk|chinese|cyrillic|czechslovak|european|french|german|greek|italian|japanese|korean|other|polish|portuguese|spanish)$" \
  | cut -d ' ' -f 4 \
  | xargs apt-get install --no-install-recommends -y
  apt-get autoremove -y
eot

The complete Dockerfile:

# syntax = docker/dockerfile:1.4
FROM debian:bullseye-20230109-slim
RUN <<eot bash
      rm -f /etc/apt/apt.conf.d/docker-clean
      echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
        > /etc/apt/apt.conf.d/keep-cache
eot
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt <<eot bash
  apt-get update
  apt-cache depends texlive-full \
  | grep "Depends:" \
  | grep -v "doc$" \
  | egrep -v "texlive-(games|music)" \
  | egrep -v "texlive-lang-(arabic|cjk|chinese|cyrillic|czechslovak|european|french|german|greek|italian|japanese|korean|other|polish|portuguese|spanish)$" \
  | cut -d ' ' -f 4 \
  | xargs apt-get install --no-install-recommends -y
  apt-get autoremove -y
eot
VOLUME ["/sources"]
WORKDIR /sources

Awesome! Now I can rerun the build and the cached files stick around. No need to download 2 GB of packages each time I iterate. 🤯
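
To check that the mounts pay off, compare a cold build with a warm one. Editing the exclude list still invalidates the layer, so the RUN step runs again, but the contents of the cache mounts survive and apt finds its .deb files already in place. A rough sketch, assuming Docker with BuildKit enabled; `time_build` is a hypothetical helper and `pdflatex` is just a placeholder tag.

```shell
# Hypothetical helper: run a command quietly and report wall-clock seconds
# plus its exit status.
time_build() {
  start="$(date +%s)"
  "$@" >/dev/null 2>&1
  status=$?
  end="$(date +%s)"
  echo "$((end - start))s (exit $status)"
  return "$status"
}

# Intended use (not executed here, needs Docker and the Dockerfile above):
#   time_build docker build -t pdflatex .   # cold: fills the cache mounts
#   ...edit the exclude list in the Dockerfile...
#   time_build docker build -t pdflatex .   # warm: packages come from the cache
```

Whole seconds are plenty of resolution here, since the difference is between downloading 2 GB and not downloading it.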

Use with Go

This method also works wonders when you build Go images a lot: just put the Go module cache (/go/pkg/mod) into a cache mount and off you go.

RUN --mount=type=cache,target=/go/pkg/mod go mod tidy
ENV CGO_ENABLED=1 GOOS=linux
RUN --mount=type=cache,target=/go/pkg/mod <<eot bash
go build -o /usr/bin/test -a -ldflags '-extldflags "-static"' .
eot
RUN /usr/bin/test

The complete Dockerfile:

# syntax = docker/dockerfile:1.4
FROM golang
WORKDIR /go/src/test
COPY <<-"eot" /go/src/test/main.go
package main

import (
  "github.com/sirupsen/logrus"
)

func main() {
  logrus.Println("Hello, world!")
}

eot
RUN --mount=type=cache,target=/go/pkg/mod <<eot bash
go mod init
go mod tidy
eot
ENV CGO_ENABLED=1 GOOS=linux
RUN --mount=type=cache,target=/go/pkg/mod <<eot bash
go build -o /usr/bin/test -a -ldflags '-extldflags "-static"' .
eot
RUN /usr/bin/test
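
The mount above only covers downloaded modules. Go keeps compiled packages in a separate build cache (GOCACHE, which defaults to $HOME/.cache/go-build on Linux), so a second mount may help as well. Below is a small sketch that prints the RUN prefix for a list of cache directories. `cache_mount_flags` is a hypothetical helper, and /root/.cache/go-build assumes the build runs as root, as it does by default in the official golang image.

```shell
# Hypothetical helper: turn a list of directories into --mount=type=cache flags.
cache_mount_flags() {
  flags=""
  for dir in "$@"; do
    flags="$flags --mount=type=cache,target=$dir"
  done
  # Strip the leading space left by the loop.
  echo "${flags# }"
}

# Print a RUN line that caches both the module cache and the build cache.
run_line="RUN $(cache_mount_flags /go/pkg/mod /root/.cache/go-build) go build -o /usr/bin/test ."
echo "$run_line"
```

Note that the `-a` flag in the build command above forces all packages to be rebuilt, so the build cache mostly pays off once that flag is dropped.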

Why should HPC care?

I'm pretty sure my readers can see where we as the HPC community will benefit from this: containerized builds.
Using spack containerize is pretty cool already, but reusing the cache between builds is hard, because the generated Dockerfile looks totally different if you change the spack.env ever so slightly.

Using --mount=type=cache is going to be a fun optimization. Need to fiddle around with that before FOSDEM next week.
