I won't say "Long time, no post" - but...

As I had some time at my hands the last couple of months, I was iterating on my idea on hardware optimization using manifest list from the last post Match Node-Specific Needs Using Manifest Lists.

ReCap

The gist is that hardware optimization with containers is a bit of a step back, as the kernel virtualization (aka containers) promises to provide isolation on-top of a Linux (or Windows) kernel without caring to much about the underlying host configuration.

Ok, hold your horses - let me explain. The OCI Image Spec specifies a platform object, best explained by the manifest-tool:

platform:
   architecture: arm
   variant: v6
   os: windows
   osFeatures:
    - win32k

Here you go; I am on an ARM (variant v6 means 'ARM 32-bit, v6'). The OS is Windows and the image needs the Windows Nano server (os.feature: win32k). That's pretty much all the runtimes of today can handle.
And YES - it is an unlikely scenario.

Here's the kicker

If you are on amd64 (aka x86-64) you can not specify different microarchitectures (like Broadwell, Skylake) and you can not specify hardware that is outside of the kernels control (GPUs anyone?).
As of today people either compile everything directly on the host before running the container or strangle the name of the container to differentiate between different configs.

Let me use a more likely scenario:

platform:
   architecture: amd64
   os: linux
   features:
    - cpu:broadwell

I added cpu:broadwell above as a hardware feature, but that is not recognise by runtimes unless you patch them. A patch to enable dockerd to deal with it was introduced last time: moby/moby #38715.

But I do not care about that patch anymore...

What to do about it?

The OCI Image spec allows for (hardware) features; back in the last post I elaborated on how this features can be used to specify hardware configurations and let the runtime decide on what to download.
My understanding is that it was introduced to allow for CPU flags to be handled. The idea seems to have been to parse all CPU flags of the host and populate the hardware features. The problem was (IMHO) that for this to work one need to convince image creators to compile a broad set of images so that it is available on DockerHub. For most of the workload it might not even matter much (NGINX optimised?) Defining features for everyone to use is hard - I might even say it is impossible.

Another hard issue is that it needs to be implemented in each runtime. I have a pending PR to the aforementioned opencontainers/image-spec #761 to make the first step to introduce it to the containerd project.

But after some more thought I figured that why not externalise the logic so that the runtime is not bothered?

Hello Metahub

IMHO one way forward might be to just put some logic in between the runtimes and the registries that does all the work.

With this MetaHub will act as an intermediary, take in the request and query the backend registry (e.g. hub.docker.com) to figure out what it has in stock. Using my qnib/bench manifest list as an example one can query DockerHub by hand via manifest-tool:

$ manifest-tool inspect qnib/bench
Name:   qnib/bench (Type: application/vnd.docker.distribution.manifest.list.v2+json)
Digest: sha256:7414c0ace0e1ae07744de83984ac60af982fb6846c0527acfc5c46401f0e333e
 * Contains 4 manifest references:
1    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
1       Digest: sha256:b8914262d3517a80bccaf8e12c9e9fc5bca5ae804adf025ad175dea1106f0098
1  Mfst Length: 735
1     Platform:
1           -      OS: linux
1           - OS Vers:
1           - OS Feat: []
1           -    Arch: amd64
1           - Variant:
1           - Feature:
1     # Layers: 2
         layer 1: digest = sha256:88286f41530e93dffd4b964e1db22ce4939fffa4a4c665dab8591fbab03d4926
         layer 2: digest = sha256:78beee8c3c4b9c7dc2eb37816f113d0b04df2f75fc7750a2df492702655be95e

2    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
2       Digest: sha256:ff1738d040d2e7141a84c30cd780763544009dcf6758bed2a1d4f937eceae7aa
2  Mfst Length: 735
2     Platform:
2           -      OS: linux
2           - OS Vers:
2           - OS Feat: []
2           -    Arch: amd64
2           - Variant:
2           - Feature: cpu:skylake
2     # Layers: 2
         layer 1: digest = sha256:88286f41530e93dffd4b964e1db22ce4939fffa4a4c665dab8591fbab03d4926
         layer 2: digest = sha256:de979171fd434c6be674c2bc3bbda6a0c18c4f900cd4aa2f71705bf5b32ecfb4

3    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
3       Digest: sha256:e561d14bf57588b9d096b6bf09fce1ec7e86d3085d1dddfd1f376cce86f68ce6
3  Mfst Length: 735
3     Platform:
3           -      OS: linux
3           - OS Vers:
3           - OS Feat: []
3           -    Arch: amd64
3           - Variant:
3           - Feature: cpu:broadwell
3     # Layers: 2
         layer 1: digest = sha256:88286f41530e93dffd4b964e1db22ce4939fffa4a4c665dab8591fbab03d4926
         layer 2: digest = sha256:8c1099e54f20552bce2ebe59624980a7153164caeed8591fef3519d340cab6c0

4    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
4       Digest: sha256:3ff5a83983654319766e8f89a690b0ce28d03c9aa76623becfc2f46c922cf34f
4  Mfst Length: 735
4     Platform:
4           -      OS: linux
4           - OS Vers:
4           - OS Feat: []
4           -    Arch: amd64
4           - Variant:
4           - Feature: cpu:broadwell,nvcap:5.2
4     # Layers: 2
         layer 1: digest = sha256:88286f41530e93dffd4b964e1db22ce4939fffa4a4c665dab8591fbab03d4926
         layer 2: digest = sha256:cd9e8d97d5988ba80906afebc0353f1c9bd08f9262eb85564402858d99582440

Thus, the image name does not represent a single image (manifest), but a list of images indexed by the platform object.

A bit more palatable is the source YAML file, which created this manifest list in the first place:

$cat manifest.yml
image: docker.io/qnib/bench
manifests:
  -
    image: docker.io/qnib/plain-featuretest:generic
    platform:
      architecture: amd64
      os: linux
  -
    image: docker.io/qnib/plain-featuretest:cpu-skylake
    platform:
      architecture: amd64
      os: linux
      features:
        - cpu:skylake
  -
    image: docker.io/qnib/plain-featuretest:cpu-broadwell
    platform:
      architecture: amd64
      os: linux
      features:
        - cpu:broadwell
  -
    image: docker.io/qnib/plain-featuretest:cpu_broadwell-nvcap_52
    platform:
      architecture: amd64
      os: linux
      features:
        - cpu:broadwell
        - nvcap:5.2

Within MetaHub (metahub.qnib.org) you can define machine-types to trim this manifest list down to only a single choice.

Just take a breath: Depending on how the machines log in the runtime will be presented with the right choice. :)

Show me!

Enough talking now! Let us give it a spin...

Login as type1

Assuming the machine in question has a Broadwell CPU we need to login with machine type type1.

$ docker login -u qnib-type1 -p qnib-type1 metahub.qnib.org
Login Succeeded

Downloading and running the image will provide us with a (fake) image that is 'optimised' for Broadwell CPUs.

$ docker run metahub.qnib.org/qnib/bench && docker run -ti --rm metahub.qnib.org/qnib/bench
Unable to find image 'metahub.qnib.org/qnib/bench:latest' locally
latest: Pulling from qnib/bench
Status: Downloaded newer image for metahub.qnib.org/qnib/bench:latest
>> This container is optimised for: cpu:broadwell
$ docker inspect  metahub.qnib.org/qnib/bench |jq '.[].RepoDigests[1]
"metahub.qnib.org/qnib/bench@sha256:f972f05f4ff0c5df22c1f4fc339b14068b8cee96d0525f4fb13dbea84a900c89"

Login as different type

On a Skylake system machine type type2 is your friend:

$ docker login -u qnib-type2 -p qnib-type2 metahub.qnib.org
Login Succeeded
$ docker run metahub.qnib.org/qnib/bench && docker run -ti --rm metahub.qnib.org/qnib/bench
Unable to find image 'metahub.qnib.org/qnib/bench:latest' locally
latest: Pulling from qnib/bench
Status: Downloaded newer image for metahub.qnib.org/qnib/bench:latest
>> This container is optimised for: cpu:skylake
$ docker inspect  metahub.qnib.org/qnib/bench |jq '.[].RepoDigests[1]
"metahub.qnib.org/qnib/bench@sha256:c62049e0707d3461b0a5b69d0ebe322f16a17b4439b5d9844e24cf97d40faa64"

As the content store assumes that the image qnib/bench did not change docker pull will trigger a request to the registry (MetaHub) to check for a different image.
As seen above a new image is downloaded: Status: Downloaded newer image...

The previously downloaded image will still be on the host, but without a name associated with it.

Wrap Up

With MetaHub you can run a heterogenous cluster of different host types and make sure that they are serve an optimised image, while using the same image name in your scheduler.
This puts an end to create images for all the clusters around. If you get a new host type, a new image for this host type has to be created - the old once are untouched.

OpenSource WIP

I just put the project on github.com/qnib/metahub. Please feel free to open issue, contribute or support it by adding a star. :)

As it is WIP it has some limitations.

  1. There is only one user (qnib). For starters that should be fine, but at the end of the day a more sophisticated (like anything else) user management and RBAC should be in place.
  2. metahub.qnib.org is running on a small VM somewhere in the internet. So bandwidth is a finite resource. Please run your own (see below).
  3. Machine-types on metahub.qnib.org are fixed for now. To have a fully working MetaHub - please run your own!

Future Ideas

This project just gets started. It is a gift that keeps giving:

  • Adding CI/CD for images that are not yet there
  • Reverse queries so that MetaHub tells you which hardware configuration you need to run an image
  • Automated performance tests with different permutation of an application
  • ...

Run your own

You can run your own instance of MetaHub by firing up this:

$ docker service create --name metahub --publish 8080:80 \
                        qnib/metahub:2019-06-12.1

If you want hard coded machine types run this:

$ docker service create --name metahub-static --publish 8081:80 \
                        -e STATIC_MACHINES=true qnib/metahub:2019-06-12.1

In order to use it make sure that you use SSL (e.g. via nginx/traefik in front) or add the IP address to your insecure-registries list.