Abstract
As parallel jobs grow in size and become finer in granularity, "system noise" is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP nodes run faster if one processor per node is intentionally left idle, thus separating "system noise" from the computation. Paying a cost in average processing speed at a node for the sake of eliminating occasional process delays is (unfortunately) beneficial, as such delays are enormously magnified when one late process holds up thousands of peers with which it synchronizes. We provide a probabilistic argument showing that, under certain conditions, the effect of such noise is linearly proportional to the size of the cluster (as is often empirically observed). We then identify a major source of noise to be the indirect overhead of periodic OS clock interrupts ("ticks"), which are used by all general-purpose OSs as a means of maintaining control. This is shown for various grain sizes, platforms, tick frequencies, and OSs. To eliminate such noise, we suggest replacing ticks with an alternative mechanism we call "smart timers". This turns out to also be in line with the needs of desktop and mobile computing, increasing the chances that the suggested change will be accepted.
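The probabilistic argument itself is not reproduced in this record. As a rough illustration of how linear scaling can arise, the sketch below assumes that each of `n_procs` synchronizing processes is delayed independently with a small probability `p` during a compute phase. The function names, the independence assumption, and the value of `p` are hypothetical and are not taken from the paper. Under that assumption, the chance that at least one process is late (and so holds up all its peers) is 1 - (1 - p)^n, which is close to n * p as long as n * p is much smaller than 1.

```python
def prob_any_delayed(n_procs: int, p: float) -> float:
    """Probability that at least one of n_procs processes is delayed,
    assuming independent delays with per-process probability p
    (an illustrative assumption, not the paper's stated model)."""
    return 1.0 - (1.0 - p) ** n_procs


def linear_approx(n_procs: int, p: float) -> float:
    """First-order approximation n * p, reasonable only while n * p << 1."""
    return n_procs * p


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-process, per-phase delay probability
    for n in (1, 10, 100, 1_000, 10_000):
        exact = prob_any_delayed(n, p)
        approx = linear_approx(n, p)
        print(f"n={n:>6}  exact={exact:.6e}  linear={approx:.6e}")
```

In this toy model, doubling the number of processes roughly doubles the probability that a phase is delayed, as long as the product of `n_procs` and `p` stays small; once that product approaches 1 the probability saturates and the linear approximation no longer holds.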
Original language | English |
---|---|
Pages | 303-312 |
Number of pages | 10 |
State | Published - 2005 |
Event | ICS05 - 19th ACM International Conference on Supercomputing - Cambridge, MA, United States; Duration: 20 Jun 2005 → 22 Jun 2005 |
Conference
Conference | ICS05 - 19th ACM International Conference on Supercomputing |
---|---|
Country/Territory | United States |
City | Cambridge, MA |
Period | 20 Jun 2005 → 22 Jun 2005 |
Keywords
- HPC
- Modeling system noise
- Operating systems
- Smart timers
- Synchronization
- Ticks
- Timer interrupts
- Timing services