ISC 2019 Workshop

The '5th Annual High Performance Container Workshop' was held as part of the International Supercomputing Conference in Frankfurt on June 20th from 9AM to 6PM at the Marriott Hotel.

Agenda

The first half of the day was spent introducing the speakers, providing an overview, and discussing topics that are not exclusively HPC-specific but are fundamentals that also matter in non-HPC use cases: Which runtime fits my use case? How do I build my container image? How do I distribute the artefacts? Depending on my use case, discipline, or vertical, what should I focus on and what is less important?

Segments

A complete Youtube playlist can be found here

Intro (09:00 - 10:00)

Youtube Video / Combined Slides

# Start Title Speaker Company Links
0 09:00 Welcome Christian Kniep QNIB Solutions Video/Slides
1 09:05 Intro UberCloud Burak Yenier UberCloud Video/Slides
2 09:10 Intro NVIDIA CJ Newburn NVIDIA Video/Slides
3 09:15 Intro Sylabs Michael Bauer Sylabs Video/Slides
4 09:20 Intro AWS Arthur Petitpierre AWS Video/Slides
5 09:25 Intro Mellanox Dror Goldenberg Mellanox Video/Slides
6 09:30 Intro RedHat Valentin Rothberg RedHat Video/Slides
7 09:35
Workshop Overview, Segments and Personas
Besides describing the workshop, 'Personas' are introduced; each will attend the panel discussions with a narrow view of a particular use case in mind (SME, Large/Small Academia & Research Sites, Ops, Infrastructure).
Christian Kniep QNIB Solutions Video/Slides

Runtime (10:00 - 11:00)

Youtube Video / Combined Slides

# Start Title Speaker Company Links
0 09:40 Introduction and Scope Christian Kniep QNIB Solutions Video/Slides
1 09:45 Current State of root-less dockerd Akihiro Suda NTT Video/Slides
2 09:50 The podman runtime Valentin Rothberg RedHat Video/Slides
3 10:00 The Singularity runtime Michael Bauer Sylabs Video/Slides
4 10:05 The SARUS runtime Lucas Benedicic CSCS Video/Slides
10 10:10 PANEL: Q&A Video
11:00 Coffee Break

Build (11:30 - 12:20)

Youtube Video / Combined Slides

# Start Title Speaker Company Links
0 11:30
Introduction and Scope
How we build (CI/CD vs interactive) and why we do so. Portability, reproducibility (CI/CD) vs optimization (interactive).
Christian Kniep QNIB Solutions Video/Slides
1 11:35 Rootless build with BuildKit Akihiro Suda NTT Video/Slides
2 11:40 Buildah, a tool that facilitates building OCI images Valentin Rothberg RedHat Video/Slides
3 11:45 Singularity build Michael Bauer Sylabs Video/Slides
4 11:50
Optimize for hardware again!
Because containers use the kernel as the abstraction layer, images need to be compatible with all target systems. As a result, hardware optimizations, which are key to performance, are hard to come by. This talk will explain how to craft Dockerfiles and build processes to allow for them again.
Christian Kniep QNIB Solutions Video/Slides
5 11:55 Tools: NVIDIA HPC Container Maker CJ Newburn NVIDIA Video/Slides
6 12:00 Build Tools like SPACK/EasyBuild Massimiliano Culpo EPFL Video/Slides
12:05 Panel: Q&A Video

Distribute (12:20 - 13:00)

Youtube Video / Combined Slides

# Start Title Speaker Company Links
0 12:20
Introduction and Scope
The audience should get the gist that distribution is meant to provide a scalable, reliable transport to ship the application. A challenge for the runtime is how to reuse images and container file systems within a clustered setting.
Christian Kniep QNIB Solutions Video/Slides
1 12:25
OCI Image Spec
Principles behind the OCI Image Spec and how it is leveraged.
Akihiro Suda NTT Video/Slides
2 12:30 Singularity Image Format Michael Bauer Sylabs Video/Slides
3 12:35 Skopeo Distribution Tool Valentin Rothberg RedHat Video/Slides
4 12:40 Hardware Optimized Images via MetaHub Registry Proxy Christian Kniep QNIB Solutions Video/Slides
12:45
PANEL: Q&A
When running an (OCI) image on a large number of nodes, each node downloads the image and creates a snapshot to start the container in.
HPC runtimes tend to create a snapshot that resides on a shared file-system. This slot will discuss the benefits and drawbacks.
Video
13:00 Lunch Break
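
The trade-off raised in the Distribute panel description (every node pulling its own copy versus one snapshot on a shared file system) can be illustrated with a back-of-the-envelope calculation. This is a minimal sketch and not taken from any of the talks: the function names, the image size and the node count are hypothetical, and it deliberately ignores layer caching, compression and the read load that a shared snapshot places on the file system at container start.

```python
def per_node_pull_bytes(image_bytes: int, nodes: int) -> int:
    """Total bytes transferred when every node downloads its own copy."""
    return image_bytes * nodes


def shared_snapshot_bytes(image_bytes: int) -> int:
    """Total bytes transferred when the image is pulled once and the
    snapshot is placed on a shared file system (assumption: one pull)."""
    return image_bytes


GIB = 1024 ** 3
image = 2 * GIB   # hypothetical 2 GiB image
nodes = 1000      # hypothetical cluster size

# Convert both totals to GiB for a readable comparison.
print(per_node_pull_bytes(image, nodes) // GIB, "GiB pulled with per-node snapshots")
print(shared_snapshot_bytes(image) // GIB, "GiB pulled with a shared snapshot")
```

The per-node model scales registry traffic linearly with the node count, while the shared model moves that pressure onto the shared file system when the containers start; which of the two is preferable depends on the site.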

Orchestration/Scheduling (14:00 - 15:15)

Youtube Video / Combined Slides

# Start Title Speaker Company Links
0 14:00 Introduction and Scope Christian Kniep QNIB Solutions Video/Slides
1 14:01
Simple Orchestration with SWARM
The simplest orchestrator out there is most likely SWARM. It has a simple model that explains what needs to be done to run containers in a clustered environment. SWARM can be seen as a simple example of scheduling with the developer in mind.
Abdulrahman Azab University of Oslo Video/Slides
2 14:05
Recap on Kubernetes
After a brief intro to orchestration via SWARM, this slot will explain how Kubernetes extends the model to provide a more resilient and extensible system.
Daniel Gruber UberCloud Video/Slides
3 14:15 Nextflow to model (bioinformatic) workloads Paolo Di Tommaso CRG Video/Slides
4 14:20
Lustre within Kubernetes
Extending the Kubernetes intro even further, Arthur will explain how AWS puts Lustre within Kubernetes and makes it scale.
Arthur Petitpierre AWS Video/Slides
5 14:23
Using K8s operators for containerized RDMA workloads
RDMA is a well-known high-performance networking interface for low-latency, low-overhead communication. RDMA-accelerated Kubernetes clusters are set up using the standard device plugin and CNI interfaces for InfiniBand or Ethernet. Compute nodes join the Kubernetes cluster dynamically. It is desirable to improve the user experience through automated configuration and deployment. In this talk we will discuss how Kubernetes operators help to automate, deploy and upgrade infrastructure software components for faster node availability.
Dror Goldenberg Mellanox Video/Slides
6 14:25 Slurm Operator for Kubernetes Michael Bauer Sylabs Video/Slides
7 14:30 AWS Batch Arthur Petitpierre AWS Video/Slides
14:35 PANEL: Q&A Video

Infrastructure (15:15 - 15:30)

Youtube Video / Combined Slides

# Start Title Speaker Company Links
0 15:15 Introduction and Scope Christian Kniep QNIB Solutions Video/Slides
1 15:16 OpenStack Update and Direction Martial Michel Data Machines Corp Video/Slides
2 15:20 Dynamic HPC in a cloud environment. Arthur Petitpierre AWS Video/Slides

HPC Specific / Distributed Workloads (15:30 - 16:00)

Youtube Video / Combined Slides

# Start Title Speaker Company Links
0 15:25 Introduction and Scope Christian Kniep QNIB Solutions Video/Slides
1 15:27 How AWS blends fast POSIX (Lustre) and object stores (S3) Arthur Petitpierre AWS Video/Slides
2 15:30 RDMA Device Isolation Dror Goldenberg Mellanox Video/Slides
3 15:35 PANEL: Q&A Video
16:00 Coffee Break

Use-Cases/Conclusions/Discussion (16:30 - 18:00)

Youtube Video / Combined Slides

# Start Title Speaker Company Links
1 16:30
RDMA-GPU use-case
Heterogeneous cluster architectures are being used for HPC, data science, scientific, ML/DL/AI and other applications. Such platforms rely on high-speed, low-latency, smart interconnects to work optimally. RDMA, along with GPUDirect, has become the de facto networking technology for accelerating CPU-to-CPU, CPU-to-GPU and GPU-to-GPU communication. Containerizing such applications poses challenges in configuring, deploying and orchestrating the system devices. In this session, we will discuss these challenges and how to enable containerized applications using GPUDirect and RDMA in a Kubernetes cluster.
Dror Goldenberg Mellanox Video/Slides
2 16:40 Mellanox Containerization Journey Dror Goldenberg Mellanox Video/Slides
3 16:55 Looking back on 5y of containerization Burak Yenier UberCloud Video/Slides
4 17:15 NERSC: Looking back Shane Canon NERSC Video/Slides
5 17:30 NVIDIA's journey with Containers CJ Newburn NVIDIA Video/Slides
17:50 PANEL: Q&A Video
18:00 Workshop Ending

Previous ISC Workshops

Abstract

Linux Containers continue to gain momentum within data centers all over the world. They are able to benefit legacy infrastructures by leveraging the lower overhead compared to traditional, hypervisor-based virtualization. But there is more to Linux Containers, which this workshop will explore. Their portability, reproducibility and distribution capabilities outclass all prior technologies and disrupt former monolithic architectures, due to sub-second life cycles and self-service provisioning.

This workshop will outline the current state of Linux Containers in HPC/AI, what challenges are hindering adoption in HPC/Big Data, and how containers can foster improvements when applied to the fields of HPC, Big Data and AI in the mid and long term. By dissecting the different layers within the container ecosystem (runtime, supervision, engine, orchestration, distribution, security, scalability), this workshop will provide a holistic, state-of-the-container overview, so that participants can make informed decisions on how to start, improve or continue their container adoption.