The HPC Container Conformance Project

A lot of HPC sites have an unwritten understanding of how to use them:

You need a user account to log in, either on the submit host or when a job launches on the compute nodes.
Once logged in, your environment is set up with default apps; specific applications are available using module load from a central software share.
You might even be able to install new software in your home directory.

A central software share with curated sets of scientific applications and libraries is/was a powerful concept. But with containers this falls apart to some degree...

One way of dealing with containers in such an environment is to 'just' extend the module concept to include containers; shpc by Vanessa is one example. Tempting for some, I admit, but didn't we get into containers so that we can share them freely, with a promise of portability?

Problem

I get where the idea is coming from tho; running containers from different origins is painful, as all of them are likely constructed differently and it's hard to make them work out of the box.
Having different origins is not the root problem. The root problem is not knowing what to expect and how to tweak the container to get the most out of the executing environment. Thus, creating a process to 'onboard' containers to make them work on the host makes sense.

Within the HPC Container Advisory Council (a monthly call of container folks) we pushed a couple of times to converge towards a better place in portability while keeping performance. Back in September 2022 CJ from Nvidia proposed a new push: pick two applications and run through how they can be built and annotated so that they are able to run on as many sites as possible.

That's where the HPC Container Conformance project (HPC3 for short) comes in.

HPC Container Conformance

The HPC3 project tackles the two main problems with containers today:

Define the expected behaviour of HPC containers so that they can be swapped out with ease
Define a set of mandatory and optional annotations for container images so that end-users and SysAdmins get an idea of a. what the content of the container image is (SBOM) and b. how the container expects to be tweaked to get the most out of the underlying execution environment.

This post will just introduce the concept - I'll follow up with more posts about the specifics.

A nice paper to read in that context is Recommendations for the packaging and containerizing of bioinformatics software from 2019. It sets a baseline of what we want to achieve as an HPC community as well. We'll add annotations to deal with performance and portability.

Selected Applications

To get started without boiling the ocean we picked GROMACS (as an HPC app) and PyTorch (to represent AI/ML workloads) as guinea pig applications. The result of this first stab at the problem is supposed to be adopted by many more applications tho - but we need to get started somewhere...

Expected Container Behaviour

The first part should be rather uncontroversial: What constitutes a container used (primarily) in batch systems?

We expect the container to drop into a shell with the environment prepared to use the main application and tools provided
A container should only provide one version of the application and its dependencies - one module load view if you will
The ENTRYPOINT should be as small as possible with the least amount of runtime tweaks possible.
The CMD might print a help message on how to use the container.

This definition should allow us to use a container interactively (docker run) or within a batch system, and even swap containers out, as they all behave in the same way.

To debug a submit script you might want to mount the file and iterate by editing it and executing it within the container, like this:

Behaviour Example

docker run -ti --rm -v $(pwd):/data -w /data \
       quay.io/cqnib/gromacs/gcc-7.3.1/2021.5/tmpi:multi-arch
bash-4.2#

A while back I created a container with a smart entrypoint, which is annoying to debug as it expects the input file to process as an argument (CMD).

$ docker run --platform=linux/amd64 -ti --rm \
         quay.io/cqnib/gromacs-2021.5_gcc-7.3.1:x86_64_v2
[ERROR] Arguments: ''
[ERROR] This container expects the first argument to be the input file and the file to exists

Supercontainer Annotations

The uncharted-territory part of the project is to converge towards a common set of annotations for HPC containers.

Labels vs Annotations

Labels and annotations are often used interchangeably in the container context. I admit that is a natural tendency in practice, but let's make sure we all understand what is what here:

Labels are part of the Container Image (each and every image separately) as they are included in the image configuration object. Thus, when downloading ubuntu:22 for ARM and for AMD64 you might have different labels for each image.
Annotations on the other hand are part of the OCI standard for Manifests and Image Indexes and are independent of the configuration object of the container.

My hope in early 2023 is that we build up from the image and work our way up to the Image Index:

The config object of a container image is the source of truth for the image.
The OCI Manifest duplicates the labels of the image into annotations of the manifest.
The Image Index aggregates the labels from all images in some way: the lowest common denominator of sorts.
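
The build-up from config labels to Image Index annotations could be sketched roughly like this. It is only an illustration under assumptions: the function names are made up, "lowest common denominator" is read here as "keep the key/value pairs that are identical in every image", and the label values (including the mode string "architecture") are placeholders rather than agreed-upon HPC3 values.

```python
from typing import Dict, List

Labels = Dict[str, str]


def manifest_annotations(config_labels: Labels) -> Labels:
    """Duplicate the labels of one image config into manifest annotations."""
    return dict(config_labels)


def index_annotations(per_image: List[Labels]) -> Labels:
    """Aggregate per-image annotations for the Image Index.

    Assumption: only key/value pairs shared by all images survive.
    """
    if not per_image:
        return {}
    common = dict(per_image[0])
    for annotations in per_image[1:]:
        common = {k: v for k, v in common.items() if annotations.get(k) == v}
    return common


# Placeholder labels for the two images of a multi-arch build.
amd64_labels = {
    "org.supercontainers.hardware.cpu.optimized.mode": "genericMicro",
    "org.supercontainers.hardware.cpu.optimized.version": "x86_64_v4",
    "org.opencontainers.image.title": "gromacs",
}
arm64_labels = {
    "org.supercontainers.hardware.cpu.optimized.mode": "architecture",
    "org.supercontainers.hardware.cpu.optimized.version": "arm64/v8",
    "org.opencontainers.image.title": "gromacs",
}

index = index_annotations(
    [manifest_annotations(amd64_labels), manifest_annotations(arm64_labels)]
)
print(index)  # {'org.opencontainers.image.title': 'gromacs'}
```

With this reading, the architecture-specific keys drop out of the Image Index because their values differ per image. Whether the index should instead carry a list of all values is exactly the kind of question that still needs to be agreed upon.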

I admit that this is a bit confusing - I reckon it won't matter if we use labels and annotations as synonyms even though they are not, as long as the key/value pairs they include are agreed upon.

Speaking about hope - I am hopeful that the HPC community can agree on a certain set of annotations by the end of the year (SC23?).

As said above, I'll add more posts about HPC3 in the coming weeks - let me touch on some ideas so far.

Mandatory / Optional Annotations

When discussing annotations, we are going to have some that are mandatory - otherwise we won't consider the container image to be HPC Container Conformant.
One such annotation is going to state for which target the container is compiled and how specifically the target is chosen:

org.supercontainers.hardware.cpu.optimized.mode: Compiled for an architecture (arm64/v8, or x86_64), genericMicro (x86_64_v4), or a specific microarchitecture (skylake_avx512).
org.supercontainers.hardware.cpu.optimized.version: the specific value for the different modes (x86_64, x86_64_v4, skylake/zen3,skylake_avx512).
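
A check for the two keys above might look like the following. This is a minimal sketch, not a reference implementation: the function names are hypothetical, and it only tests that the keys are present and non-empty, since the set of allowed values is not settled yet.

```python
from typing import Dict, List

# The two annotation keys named above, treated as mandatory in this sketch.
MANDATORY_KEYS = (
    "org.supercontainers.hardware.cpu.optimized.mode",
    "org.supercontainers.hardware.cpu.optimized.version",
)


def missing_mandatory(annotations: Dict[str, str]) -> List[str]:
    """Return the mandatory keys that are absent or empty."""
    return [key for key in MANDATORY_KEYS if not annotations.get(key)]


def is_conformant(annotations: Dict[str, str]) -> bool:
    """True if every mandatory key carries a non-empty value."""
    return not missing_mandatory(annotations)


# Example annotations, using values mentioned in the list above.
complete = {
    "org.supercontainers.hardware.cpu.optimized.mode": "genericMicro",
    "org.supercontainers.hardware.cpu.optimized.version": "x86_64_v4",
}
incomplete = {
    "org.supercontainers.hardware.cpu.optimized.mode": "genericMicro",
}

print(is_conformant(complete))
print(missing_mandatory(incomplete))
```

Optional annotations would simply not appear in MANDATORY_KEYS, so an image without them still passes.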

Other annotations are obviously optional, like which CUDA ABI version is used - this is only interesting for containers using CUDA.

Presentation 2023/January

I reported the current state of affairs in December 2022 and mid January (Slides 'HPC_OCI_Conformance_v10.pdf'), with (hopefully) more discussion happening in the upcoming weeks and months.