FFMK: A fast and fault-tolerant microkernel-based system for exascale computing

Carsten Weinhold*, Adam Lackorzynski, Jan Bierbaum, Martin Küttler, Maksym Planeta, Hannes Weisbach, Matthias Hille, Hermann Härtig, Alexander Margolin, Dror Sharf, Ely Levy, Pavel Gak, Amnon Barak, Masoud Gholami, Florian Schintke, Thorsten Schütt, Alexander Reinefeld, Matthias Lieber, Wolfgang E. Nagel

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

1 Scopus citation

Abstract

The FFMK project designs, builds, and evaluates a system-software architecture to address the challenges expected in exascale systems. In particular, these challenges include performance losses caused by the much larger impact of runtime variability within applications, hardware, and the operating system (OS), as well as increased vulnerability to failures. The FFMK OS platform is built upon a multi-kernel architecture, which combines the L4Re microkernel and a virtualized Linux kernel into a noise-free yet feature-rich execution environment. It further includes global, distributed platform management and system-level optimization services that transparently minimize checkpoint/restart overhead for applications. The project also researched algorithms that make collective operations fault tolerant in the presence of failing nodes. In this paper, we describe the basic components, algorithms, and services we developed in Phase 2 of the project.

Original language: English
Title of host publication: Lecture Notes in Computational Science and Engineering
Publisher: Springer
Pages: 483-516
Number of pages: 34
State: Published - 2020

Publication series

Name: Lecture Notes in Computational Science and Engineering
Volume: 136
ISSN (Print): 1439-7358
ISSN (Electronic): 2197-7100

Bibliographical note

Publisher Copyright:
© The Author(s) 2020.
