TY - GEN
T1 - Instability in parallel job scheduling simulation
T2 - 20th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2006
AU - Tsafrir, Dan
AU - Feitelson, Dror G.
PY - 2006
Y1 - 2006
N2 - The performance of computer systems depends, among other things, on the workload. This motivates the use of real workloads (as recorded in activity logs) to drive simulations of new designs. Unfortunately, real workloads may contain various anomalies that contaminate the data. A previously unrecognized type of anomaly is workload flurries: rare surges of activity with a repetitive nature, caused by a single user, that dominate the workload for a relatively short period. We find that long workloads often include at least one such event. We show that in the context of parallel job scheduling these events can have a significant effect on performance evaluation results, e.g. a very small perturbation of the simulation conditions might lead to a large and disproportional change in the outcome. This instability is due to jobs in the flurry being effected in unison, a consequence of the flurry's repetitive nature. We therefore advocate that flurries be filtered out before the workload is used, in order to achieve stable and more reliable evaluation results (analogously to the removal of outliers in statistical analysis). At the same time, we note that more research is needed on the possible effects of flurries.
AB - The performance of computer systems depends, among other things, on the workload. This motivates the use of real workloads (as recorded in activity logs) to drive simulations of new designs. Unfortunately, real workloads may contain various anomalies that contaminate the data. A previously unrecognized type of anomaly is workload flurries: rare surges of activity with a repetitive nature, caused by a single user, that dominate the workload for a relatively short period. We find that long workloads often include at least one such event. We show that in the context of parallel job scheduling these events can have a significant effect on performance evaluation results, e.g. a very small perturbation of the simulation conditions might lead to a large and disproportional change in the outcome. This instability is due to jobs in the flurry being effected in unison, a consequence of the flurry's repetitive nature. We therefore advocate that flurries be filtered out before the workload is used, in order to achieve stable and more reliable evaluation results (analogously to the removal of outliers in statistical analysis). At the same time, we note that more research is needed on the possible effects of flurries.
UR - http://www.scopus.com/inward/record.url?scp=33847150570&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2006.1639311
DO - 10.1109/IPDPS.2006.1639311
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:33847150570
SN - 1424400546
SN - 9781424400546
T3 - 20th International Parallel and Distributed Processing Symposium, IPDPS 2006
BT - 20th International Parallel and Distributed Processing Symposium, IPDPS 2006
PB - IEEE Computer Society
Y2 - 25 April 2006 through 29 April 2006
ER -