January 27, 2023 • qnib • 3 min read
BuildKit Dockerfile Frontend Caching
100% human-written — no AI tools are used to write these posts.
Ok, where was I? In the last blog post (BuildKit Dockerfile Frontend) I scratched the surface by introducing here-docs in Dockerfiles. The original inspiration for the post (series, as it turned out) was my attempt to cut the cristiangreco/docker-pdflatex image down by a GB or so.
I went from this:
```bash
COPY install.sh /install.sh
RUN sh /install.sh && rm /install.sh
```
to this:
```bash
RUN <<eot bash
apt-get update
apt-cache depends texlive-full \
| grep "Depends:" \
| grep -v "doc$" \
| egrep -v "texlive-(games|music)" \
| egrep -v "texlive-lang-(arabic|cjk|chinese|cyrillic|czechslovak|european|french|german|greek|italian|japanese|korean|other|polish|portuguese|spanish)$" \
| cut -d ' ' -f 4 \
| xargs apt-get install --no-install-recommends -y
apt-get autoremove
eot
```
Iteration Speed
The problem with the above is that it takes forever to build the image. Each time I fiddled with the list of packages I wanted to exclude, the build downloaded the package list and all the packages again (2GB).
Thus, the iteration speed was terrible.
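One way to fiddle with the exclusion list without installing anything is to try the filter on its own first. A minimal sketch, assuming made-up sample input in the shape of `apt-cache depends` output; the function names `filter_texlive_deps` and `sample_deps` are hypothetical, the language list is shortened, and `grep -E` stands in for `egrep`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: the package filter from the RUN step above, reading from stdin.
# grep -E replaces egrep; the language list is shortened for the example.
filter_texlive_deps() {
  grep "Depends:" \
    | grep -v 'doc$' \
    | grep -Ev 'texlive-(games|music)' \
    | grep -Ev 'texlive-lang-(arabic|cjk|french|german|japanese)$' \
    | cut -d ' ' -f 4
}

# Made-up sample in the shape of `apt-cache depends texlive-full` output.
sample_deps() {
  cat <<'EOF'
texlive-full
  Depends: texlive-latex-extra
  Depends: texlive-latex-extra-doc
  Depends: texlive-games
  Depends: texlive-lang-german
  Depends: texlive-lang-english
  Recommends: lmodern
EOF
}

# Prints the packages that survive the filter, one per line.
sample_deps | filter_texlive_deps
```

With the sample above this prints texlive-latex-extra and texlive-lang-english. On a Debian box, `apt-cache depends texlive-full | filter_texlive_deps` should print the list that would be handed to `apt-get install`.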
RUN mounts FTW
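That’s where RUN mounts within the frontend speed things up. First, I need to instruct apt not to throw away the packages it has already downloaded. The Debian image ships a docker-clean apt config that deletes them after every install, so that file has to go.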
Snippet
```bash
RUN <<eot bash
rm -f /etc/apt/apt.conf.d/docker-clean
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
> /etc/apt/apt.conf.d/keep-cache
eot
```
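Next, I’ll add two cache mounts to keep apt’s caches around between builds: /var/cache/apt for the downloaded packages and /var/lib/apt for the package lists.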
Snippet
```bash
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt <<eot bash
apt-get update
apt-cache depends texlive-full \
| grep "Depends:" \
| grep -v "doc$" \
| egrep -v "texlive-(games|music)" \
| egrep -v "texlive-lang-(arabic|cjk|chinese|cyrillic|czechslovak|european|french|german|greek|italian|japanese|korean|other|polish|portuguese|spanish)$" \
| cut -d ' ' -f 4 \
| xargs apt-get install --no-install-recommends -y
apt-get autoremove
eot
```
Working Dockerfile
```bash
# syntax = docker/dockerfile:1.4
FROM debian:bullseye-20230109-slim
RUN <<eot bash
rm -f /etc/apt/apt.conf.d/docker-clean
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
> /etc/apt/apt.conf.d/keep-cache
eot
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt <<eot bash
apt-get update
apt-cache depends texlive-full \
| grep "Depends:" \
| grep -v "doc$" \
| egrep -v "texlive-(games|music)" \
| egrep -v "texlive-lang-(arabic|cjk|chinese|cyrillic|czechslovak|european|french|german|greek|italian|japanese|korean|other|polish|portuguese|spanish)$" \
| cut -d ' ' -f 4 \
| xargs apt-get install --no-install-recommends -y
apt-get autoremove
eot
VOLUME ["/sources"]
WORKDIR /sources
```
Awesome! Now I can run the build and maintain the cached files. No need to download 2GB of packages each time I iterate. 🤯
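One thing to watch when several builds run at the same time: apt expects exclusive access to its directories. A minimal sketch, assuming the `sharing=locked` option of cache mounts, which makes parallel builds that use the same cache wait for each other; texlive-latex-base is just a stand-in package to keep the example short:

```dockerfile
# syntax = docker/dockerfile:1.4
FROM debian:bullseye-20230109-slim
# Keep downloaded packages around, same as in the Dockerfile above.
RUN <<eot bash
rm -f /etc/apt/apt.conf.d/docker-clean
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
> /etc/apt/apt.conf.d/keep-cache
eot
# sharing=locked: only one build at a time gets to use each cache mount.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked --mount=type=cache,target=/var/lib/apt,sharing=locked <<eot bash
apt-get update
# Stand-in package (assumption), not the list from the post.
apt-get install --no-install-recommends -y texlive-latex-base
eot
```

Building it twice with `DOCKER_BUILDKIT=1 docker build .` and changing the package list in between should show the second build reusing the downloads instead of fetching them again.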
Use in GOLANG
This method also works wonders when you build GOLANG images a lot: just put the Go module cache (/go/pkg/mod) into a cache mount and off you go.
Snippet
```bash
RUN --mount=type=cache,target=/go/pkg/mod go mod tidy
ENV CGO_ENABLED=1 GOOS=linux
RUN --mount=type=cache,target=/go/pkg/mod <<eot bash
go build -o /usr/bin/test -a -ldflags '-extldflags "-static"' .
eot
RUN /usr/bin/test
```
Working Dockerfile
```bash
# syntax = docker/dockerfile:1.4
FROM golang
WORKDIR /go/src/test
COPY <<-"eot" /go/src/test/main.go
package main
import (
"github.com/sirupsen/logrus"
)
func main() {
logrus.Println("Hello, world!")
}
eot
RUN --mount=type=cache,target=/go/pkg/mod <<eot bash
go mod init
go mod tidy
eot
ENV CGO_ENABLED=1 GOOS=linux
RUN --mount=type=cache,target=/go/pkg/mod <<eot bash
go build -o /usr/bin/test -a -ldflags '-extldflags "-static"' .
eot
RUN /usr/bin/test
```
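The mounts above only cache the downloaded modules. The Go compiler also keeps a build cache, and the same trick should apply to it. A sketch, assuming the build runs as root in the golang image, so the build cache sits in /root/.cache/go-build (the default GOCACHE location there), and assuming the build context already contains a go.mod. I dropped `-a` here, because that flag forces a rebuild of all packages and would make the build cache pointless:

```dockerfile
# syntax = docker/dockerfile:1.4
FROM golang
WORKDIR /go/src/test
# Assumption: the build context holds main.go and go.mod.
COPY . .
ENV CGO_ENABLED=1 GOOS=linux
# Two cache mounts: one for downloaded modules, one for compiled packages.
RUN --mount=type=cache,target=/go/pkg/mod --mount=type=cache,target=/root/.cache/go-build <<eot bash
go build -o /usr/bin/test -ldflags '-extldflags "-static"' .
eot
RUN /usr/bin/test
```

With both mounts in place, a rebuild after a small source change should only recompile the packages that actually changed.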
Why should HPC care?
Pretty sure my readers are going to see where we as the HPC community will benefit by using this: containerized builds…
Using spack containerize is pretty cool already, but caching between builds is hard because the generated Dockerfile looks totally different between builds if you change the Spack environment (spack.yaml) ever so slightly.
Using --mount=type=cache is going to be a fun optimization. Need to fiddle around with that before FOSDEM next week.