The workload on parallel supercomputers: Modelling the characteristics of rigid jobs

Uri Lublin, Dror G. Feitelson*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

327 Scopus citations

Abstract

The analysis of workloads is important for understanding how systems are used. In addition, workload models are needed as input for evaluating new system designs and for comparing system designs. This is especially important in costly large-scale parallel systems. Luckily, workload data are available in the form of accounting logs. Using such logs from three different sites, we analyze and model the job-level workloads, with an emphasis on those aspects that are universal to all sites. As many distributions turn out to span a large range, we typically first apply a logarithmic transformation to the data and then fit them to a novel hyper-Gamma distribution or one of its special cases. This is a generalization of distributions proposed previously, and it yields good goodness-of-fit scores. The parameters of the distribution are found using the iterative EM algorithm. The results have been codified in a modeling program that creates a synthetic workload based on the analysis.
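A minimal Python sketch of the modelling idea described above, under stated assumptions: the hyper-Gamma distribution is taken here to be a two-component mixture p·Gamma(shape1, scale1) + (1 - p)·Gamma(shape2, scale2) applied to log-transformed values (natural log is assumed). All function names and parameter values are hypothetical and illustrative, not the fitted values or code from the paper. Because the Gamma shape parameter has no closed-form M-step, the sketch shows only the E-step and the mixing-weight update of EM, with the component parameters held fixed.

```python
import math
import random


def gamma_pdf(x, shape, scale):
    """Density of Gamma(shape, scale) at x; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def hyper_gamma_pdf(x, p, shape1, scale1, shape2, scale2):
    """Assumed hyper-Gamma form: p * Gamma1 + (1 - p) * Gamma2."""
    return (p * gamma_pdf(x, shape1, scale1)
            + (1.0 - p) * gamma_pdf(x, shape2, scale2))


def sample_hyper_gamma(rng, p, shape1, scale1, shape2, scale2):
    """Choose component 1 with probability p, then draw from that Gamma."""
    if rng.random() < p:
        return rng.gammavariate(shape1, scale1)
    return rng.gammavariate(shape2, scale2)


def sample_runtime_seconds(rng, p, shape1, scale1, shape2, scale2):
    """The mixture models log(runtime); exponentiate to return to seconds.
    Using the natural log here is an assumption of this sketch."""
    return math.exp(sample_hyper_gamma(rng, p, shape1, scale1, shape2, scale2))


def responsibilities(xs, p, shape1, scale1, shape2, scale2):
    """EM E-step: posterior probability that each value came from component 1."""
    out = []
    for x in xs:
        a = p * gamma_pdf(x, shape1, scale1)
        b = (1.0 - p) * gamma_pdf(x, shape2, scale2)
        out.append(a / (a + b) if a + b > 0.0 else 0.5)
    return out


def update_mixing_weight(xs, p, shape1, scale1, shape2, scale2):
    """EM M-step for p only (component parameters fixed): mean responsibility."""
    r = responsibilities(xs, p, shape1, scale1, shape2, scale2)
    return sum(r) / len(r)
```

For example, `sample_runtime_seconds(random.Random(1), 0.3, 4.0, 0.5, 40.0, 0.2)` draws one synthetic runtime. Repeated calls to `update_mixing_weight` on log-transformed runtimes move `p` toward its maximum-likelihood value for the fixed components. A full fit would also re-estimate each component's shape and scale, typically with a numerical weighted maximum-likelihood step inside each EM iteration.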

Original language: American English
Pages (from-to): 1105-1122
Number of pages: 18
Journal: Journal of Parallel and Distributed Computing
Volume: 63
Issue number: 11
State: Published - Nov 2003

Bibliographical note

Funding Information:
This research was supported in part by the Israel Science Foundation (Grant No. 219/99). The workload logs on which it is based are available online from the Parallel Workloads Archive [19]. The workload log from the SDSC Paragon was graciously provided by Reagan Moore and Allen Downey. The workload log from the KTH SP2 was graciously provided by Lars Malinowsky. The workload log from the LANL CM-5 was graciously provided by Curt Canada. Many thanks to them for making the data available and for their help with background information and interpretation. Thanks also to Prof. Ya'acov Ritov for statistics advice.

Keywords

  • Arrival pattern
  • Parallel jobs
  • Runtime distribution
  • Size distribution
  • Workload model
