Metahub: Dynamic Registry Proxy
I won't say "Long time, no post" - but...
As I had some time on my hands over the last couple of months, I was iterating on my idea of hardware optimization using manifest lists from the last post Match Node-Specific Needs Using Manifest Lists.
The gist is that hardware optimization with containers is a bit of a step back, as kernel virtualization (aka containers) promises to provide isolation on top of a Linux (or Windows) kernel without caring too much about the underlying host configuration.
Ok, hold your horses - let me explain. The OCI Image Spec specifies a platform object, best explained by the manifest-tool:
Here you go; I am on an ARM (variant v6 means 'ARM 32-bit, v6'). The OS is Windows and the image needs the Windows Nano server (os.feature: win32k). That's pretty much all the runtimes of today can handle.
And YES - it is an unlikely scenario.
Here's the kicker
If you are on amd64 (aka x86-64) you cannot specify different microarchitectures (like Broadwell, Skylake) and you cannot specify hardware that is outside of the kernel's control (GPUs anyone?).
As of today, people either compile everything directly on the host before running the container or mangle the name of the container to differentiate between different configs.
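To make the limitation a bit more tangible, here is a minimal Python sketch of a matcher that compares only OS, architecture and variant, which is roughly what is described above. The function name matches_platform, the dictionary layout and the two entries are assumptions for illustration, not code taken from any real runtime.

```python
from typing import Dict, List


def matches_platform(entry: Dict, host: Dict) -> bool:
    """Compare only os, architecture and variant (assumed runtime behaviour)."""
    platform = entry["platform"]
    return (
        platform.get("os") == host.get("os")
        and platform.get("architecture") == host.get("architecture")
        and platform.get("variant", "") == host.get("variant", "")
    )


# Hypothetical manifest entries that differ only in their features.
entries: List[Dict] = [
    {"tag": "generic", "platform": {"os": "linux", "architecture": "amd64"}},
    {
        "tag": "cpu-broadwell",
        "platform": {
            "os": "linux",
            "architecture": "amd64",
            "features": ["cpu:broadwell"],
        },
    },
]
host = {"os": "linux", "architecture": "amd64"}

# Both entries pass, because the features field is never looked at.
candidates = [e["tag"] for e in entries if matches_platform(e, host)]
print(candidates)
```

Both entries survive the filter, so a matcher like this has no way to prefer the Broadwell build over the generic one.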
Let me use a more likely scenario:
cpu:broadwell above as a hardware feature, but that is not recognised by runtimes unless you patch them.
A patch to enable dockerd to deal with it was introduced last time: moby/moby #38715.
But I do not care about that patch anymore...
What to do about it?
The OCI Image spec allows for (hardware) features; back in the last post I elaborated on how these features can be used to specify hardware configurations and let the runtime decide on what to download.
My understanding is that it was introduced to allow for CPU flags to be handled. The idea seems to have been to parse all CPU flags of the host and populate the hardware features. The problem was (IMHO) that for this to work one needs to convince image creators to compile a broad set of images so that they are available on DockerHub. For most workloads it might not even matter much (NGINX optimised?). Defining features for everyone to use is hard - I might even say it is impossible.
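As a rough illustration of the "parse all CPU flags" idea, the following Python sketch reads the flags line from /proc/cpuinfo-style text (the x86 Linux layout) and turns each flag into a feature string. The cpuflag: prefix and both function names are made up for this example; nothing in the spec or in any runtime defines them.

```python
from typing import List, Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flags of the first CPU found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flags_to_features(flags: Set[str]) -> List[str]:
    """Map flags to feature strings using a hypothetical 'cpuflag:' prefix."""
    return sorted(f"cpuflag:{flag}" for flag in flags)


# Shortened sample; on a real host you would read the file /proc/cpuinfo.
sample = "processor\t: 0\nmodel name\t: Example CPU\nflags\t\t: fpu sse2 avx avx2\n"
features = flags_to_features(parse_cpu_flags(sample))
print(features)
```

A real host reports well over a hundred flags, which hints at why agreeing on a feature vocabulary for everyone is so hard.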
Another hard issue is that it needs to be implemented in each runtime. I have a pending PR to the aforementioned opencontainers/image-spec #761 as a first step towards introducing it to the containerd project.
But after some more thought I figured: why not externalise the logic so that the runtime is not bothered?
IMHO one way forward might be to just put some logic in between the runtimes and the registries that does all the work.
With this, MetaHub acts as an intermediary: it takes in the request and queries the backend registry (e.g. hub.docker.com) to figure out what it has in stock.
Taking the qnib/bench manifest list as an example, one can query DockerHub by hand via manifest-tool:
$ manifest-tool inspect qnib/bench
Name: qnib/bench (Type: application/vnd.docker.distribution.manifest.list.v2+json)
Digest: sha256:7414c0ace0e1ae07744de83984ac60af982fb6846c0527acfc5c46401f0e333e
 * Contains 4 manifest references:
1 Mfst Type: application/vnd.docker.distribution.manifest.v2+json
1 Digest: sha256:b8914262d3517a80bccaf8e12c9e9fc5bca5ae804adf025ad175dea1106f0098
1 Mfst Length: 735
1 Platform:
1 - OS: linux
1 - OS Vers:
1 - OS Feat:
1 - Arch: amd64
1 - Variant:
1 - Feature:
1 # Layers: 2
    layer 1: digest = sha256:88286f41530e93dffd4b964e1db22ce4939fffa4a4c665dab8591fbab03d4926
    layer 2: digest = sha256:78beee8c3c4b9c7dc2eb37816f113d0b04df2f75fc7750a2df492702655be95e
2 Mfst Type: application/vnd.docker.distribution.manifest.v2+json
2 Digest: sha256:ff1738d040d2e7141a84c30cd780763544009dcf6758bed2a1d4f937eceae7aa
2 Mfst Length: 735
2 Platform:
2 - OS: linux
2 - OS Vers:
2 - OS Feat:
2 - Arch: amd64
2 - Variant:
2 - Feature: cpu:skylake
2 # Layers: 2
    layer 1: digest = sha256:88286f41530e93dffd4b964e1db22ce4939fffa4a4c665dab8591fbab03d4926
    layer 2: digest = sha256:de979171fd434c6be674c2bc3bbda6a0c18c4f900cd4aa2f71705bf5b32ecfb4
3 Mfst Type: application/vnd.docker.distribution.manifest.v2+json
3 Digest: sha256:e561d14bf57588b9d096b6bf09fce1ec7e86d3085d1dddfd1f376cce86f68ce6
3 Mfst Length: 735
3 Platform:
3 - OS: linux
3 - OS Vers:
3 - OS Feat:
3 - Arch: amd64
3 - Variant:
3 - Feature: cpu:broadwell
3 # Layers: 2
    layer 1: digest = sha256:88286f41530e93dffd4b964e1db22ce4939fffa4a4c665dab8591fbab03d4926
    layer 2: digest = sha256:8c1099e54f20552bce2ebe59624980a7153164caeed8591fef3519d340cab6c0
4 Mfst Type: application/vnd.docker.distribution.manifest.v2+json
4 Digest: sha256:3ff5a83983654319766e8f89a690b0ce28d03c9aa76623becfc2f46c922cf34f
4 Mfst Length: 735
4 Platform:
4 - OS: linux
4 - OS Vers:
4 - OS Feat:
4 - Arch: amd64
4 - Variant:
4 - Feature: cpu:broadwell,nvcap:5.2
4 # Layers: 2
    layer 1: digest = sha256:88286f41530e93dffd4b964e1db22ce4939fffa4a4c665dab8591fbab03d4926
    layer 2: digest = sha256:cd9e8d97d5988ba80906afebc0353f1c9bd08f9262eb85564402858d99582440
Thus, the image name does not represent a single image (manifest), but a list of images indexed by the platform object.
A bit more palatable is the source YAML file, which created this manifest list in the first place:
image: docker.io/qnib/bench
manifests:
  - image: docker.io/qnib/plain-featuretest:generic
    platform:
      architecture: amd64
      os: linux
  - image: docker.io/qnib/plain-featuretest:cpu-skylake
    platform:
      architecture: amd64
      os: linux
      features:
        - cpu:skylake
  - image: docker.io/qnib/plain-featuretest:cpu-broadwell
    platform:
      architecture: amd64
      os: linux
      features:
        - cpu:broadwell
  - image: docker.io/qnib/plain-featuretest:cpu_broadwell-nvcap_52
    platform:
      architecture: amd64
      os: linux
      features:
        - cpu:broadwell
        - nvcap:5.2
Within MetaHub (metahub.qnib.org) you can define machine-types to trim this manifest list down to only a single choice.
Just take a breath: depending on how the machines log in, the runtime will be presented with the right choice. :)
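How the trimming could work is sketched below in Python. This is not MetaHub's actual code. It assumes that a machine type is simply a set of features, and that the entry with the most features that are all available on the machine wins. The entries mirror the qnib/bench manifest list, while MACHINE_TYPES and select_manifest are names invented for this sketch.

```python
from typing import Dict, List, Optional, Set

# Entries mirror the qnib/bench manifest list from this post.
MANIFESTS: List[Dict] = [
    {"image": "qnib/plain-featuretest:generic", "features": []},
    {"image": "qnib/plain-featuretest:cpu-skylake", "features": ["cpu:skylake"]},
    {"image": "qnib/plain-featuretest:cpu-broadwell", "features": ["cpu:broadwell"]},
    {
        "image": "qnib/plain-featuretest:cpu_broadwell-nvcap_52",
        "features": ["cpu:broadwell", "nvcap:5.2"],
    },
]

# Assumed feature sets per machine type (type1: Broadwell, type2: Skylake).
MACHINE_TYPES: Dict[str, Set[str]] = {
    "type1": {"cpu:broadwell"},
    "type2": {"cpu:skylake"},
}


def select_manifest(manifests: List[Dict], machine: Set[str]) -> Optional[Dict]:
    """Pick the most specific entry whose features the machine fully provides."""
    usable = [m for m in manifests if set(m["features"]) <= machine]
    if not usable:
        return None
    return max(usable, key=lambda m: len(m["features"]))


for name, machine_features in MACHINE_TYPES.items():
    choice = select_manifest(MANIFESTS, machine_features)
    print(name, "->", choice["image"])
```

With this rule a machine without any known features still gets the generic image, since an empty feature list is a subset of everything.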
Enough talking now! Let us give it a spin...
Login as type1
Assuming the machine in question has a Broadwell CPU, we need to log in with machine type type1.
Downloading and running the image will provide us with a (fake) image that is 'optimised' for Broadwell CPUs.
$ docker run metahub.qnib.org/qnib/bench && docker run -ti --rm metahub.qnib.org/qnib/bench
Unable to find image 'metahub.qnib.org/qnib/bench:latest' locally
latest: Pulling from qnib/bench
Status: Downloaded newer image for metahub.qnib.org/qnib/bench:latest
>> This container is optimised for: cpu:broadwell
$ docker inspect metahub.qnib.org/qnib/bench |jq '.[].RepoDigests[]'
"metahub.qnib.org/qnib/bench@sha256:f972f05f4ff0c5df22c1f4fc339b14068b8cee96d0525f4fb13dbea84a900c89"
Login as different type
On a Skylake system, machine type type2 is your friend:
$ docker login -u qnib-type2 -p qnib-type2 metahub.qnib.org
Login Succeeded
$ docker run metahub.qnib.org/qnib/bench && docker run -ti --rm metahub.qnib.org/qnib/bench
Unable to find image 'metahub.qnib.org/qnib/bench:latest' locally
latest: Pulling from qnib/bench
Status: Downloaded newer image for metahub.qnib.org/qnib/bench:latest
>> This container is optimised for: cpu:skylake
$ docker inspect metahub.qnib.org/qnib/bench |jq '.[].RepoDigests[]'
"metahub.qnib.org/qnib/bench@sha256:c62049e0707d3461b0a5b69d0ebe322f16a17b4439b5d9844e24cf97d40faa64"
As the content store assumes that the image qnib/bench did not change, docker pull will trigger a request to the registry (MetaHub) to check for a different image.
As seen above, a new image is downloaded:
Status: Downloaded newer image...
The previously downloaded image will still be on the host, but without a name associated with it.
With MetaHub you can run a heterogeneous cluster of different host types and make sure that they are served an optimised image, while using the same image name in your scheduler.
This puts an end to creating images for all the clusters around. If you get a new host type, a new image for this host type has to be created - the old ones are untouched.
I just put the project on github.com/qnib/metahub. Please feel free to open issues, contribute or support it by adding a star. :)
As it is WIP it has some limitations.
- There is only one user (qnib). For starters that should be fine, but at the end of the day a more sophisticated (like anything else) user management and RBAC should be in place.
- metahub.qnib.org is running on a small VM somewhere on the internet, so bandwidth is a finite resource. Please run your own (see below).
- Machine-types on metahub.qnib.org are fixed for now. To have a fully working MetaHub - please run your own!
This project is just getting started. It is a gift that keeps giving:
- Adding CI/CD for images that are not yet there
- Reverse queries so that MetaHub tells you which hardware configuration you need to run an image
- Automated performance tests with different permutations of an application
Run your own
This may or may not work anymore as I started working on MetaHub in 2022.
You can run your own instance of MetaHub by firing up this:
If you want hard-coded machine types, run this:
$ docker service create --name metahub-static --publish 8081:80 \
    -e STATIC_MACHINES=true qnib/metahub:2019-06-12.1
In order to use it make sure that you use SSL (e.g. via nginx/traefik in front) or add the IP address to your