Abstract
In this paper we describe the hardware and application-inherent challenges that future exascale systems pose to high-performance computing (HPC) and propose a system architecture that addresses them. This architecture is based on proven building blocks and a few principles: (1) a fast light-weight kernel that is supported by a virtualized Linux for tasks that are not performance-critical, (2) decentralized load and health management using fault-tolerant, gossip-based information dissemination, (3) a maximally parallel checkpoint store for cheap checkpoint/restart in the presence of frequent component failures, and (4) a runtime that enables applications to interact with the underlying system platform through new interfaces. The paper discusses the vision behind FFMK, the proposed system, and the current state of its prototype implementation, which is based on a microkernel and an adapted MPI runtime.
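Principle (2) relies on gossip: each node repeatedly exchanges its view of the system with randomly chosen peers, so information spreads without a central coordinator and survives the loss of individual nodes. The following is only a minimal sketch of generic push-pull gossip, not FFMK's actual protocol. The `Node` class, `gossip_round`, `simulate`, the per-node load value, and the "newest timestamp wins" merge rule are all illustrative assumptions. Nodes marked as failed simply do not take part, and the remaining nodes still converge on a complete view of each other.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    load: float  # hypothetical per-node load metric being disseminated
    # view maps node_id -> (timestamp, load); the newest timestamp wins on merge
    view: dict = field(default_factory=dict)
    alive: bool = True

    def refresh_own_entry(self, now):
        # A node is the only authority on its own state, so it re-stamps it.
        self.view[self.node_id] = (now, self.load)

    def merge(self, other_view):
        # Keep whichever entry per node carries the more recent timestamp.
        for nid, (ts, load) in other_view.items():
            if nid not in self.view or self.view[nid][0] < ts:
                self.view[nid] = (ts, load)


def gossip_round(nodes, now, rng):
    alive = [n for n in nodes if n.alive]
    for node in alive:
        node.refresh_own_entry(now)
    for node in alive:
        peers = [p for p in alive if p is not node]
        if not peers:
            continue
        peer = rng.choice(peers)
        # Push-pull exchange: afterwards both sides hold the union of newest entries.
        node.merge(peer.view)
        peer.merge(node.view)


def converged(nodes):
    # Every live node knows about every other live node.
    alive_ids = {n.node_id for n in nodes if n.alive}
    return all(alive_ids.issubset(n.view) for n in nodes if n.alive)


def simulate(num_nodes=16, failed=(3, 7), seed=1, max_rounds=50):
    rng = random.Random(seed)
    nodes = [Node(i, load=rng.random()) for i in range(num_nodes)]
    for i in failed:
        nodes[i].alive = False  # crashed nodes never participate
    for rnd in range(1, max_rounds + 1):
        gossip_round(nodes, now=rnd, rng=rng)
        if converged(nodes):
            return rnd, nodes
    return None, nodes
```

Calling `simulate()` returns the round in which all live nodes first held complete views, together with the nodes themselves. Push-pull exchange is chosen here because it typically spreads information in a number of rounds that grows roughly logarithmically with the node count.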
| Original language | English |
|---|---|
| Title of host publication | Software for Exascale Computing - SPPEXA 2013-2015 |
| Editors | Wolfgang E. Nagel, Hans-Joachim Bungartz, Philipp Neumann |
| Publisher | Springer Verlag |
| Pages | 405-426 |
| Number of pages | 22 |
| ISBN (Print) | 9783319405261 |
| State | Published - 2016 |
| Event | International Conference on Software for Exascale Computing, SPPEXA 2015 - Munich, Germany; duration: 25 Jan 2016 → 27 Jan 2016 |
Publication series
| Name | Lecture Notes in Computational Science and Engineering |
|---|---|
| Volume | 113 |
| ISSN (Print) | 1439-7358 |
Conference
| Conference | International Conference on Software for Exascale Computing, SPPEXA 2015 |
|---|---|
| Country/Territory | Germany |
| City | Munich |
| Period | 25/01/16 → 27/01/16 |
Bibliographical note
Publisher Copyright: © Springer International Publishing Switzerland 2016.