TY - GEN
T1 - Improving and stabilizing parallel computer performance using adaptive backfilling
AU - Talby, David
AU - Feitelson, Dror G.
PY - 2005
Y1 - 2005
AB - The scheduler is a key component in determining the overall performance of a parallel computer, and as we show here, the schedulers in wide use today exhibit large unexplained gaps in performance during their operation. Also, different scheduling algorithms often vary in the gaps they show, suggesting that choosing the correct scheduler for each time frame can improve overall performance. We present two adaptive algorithms that achieve this: One chooses by recent past performance, and the other by the recent average degree of parallelism, which is shown to be correlated to algorithmic superiority. Simulation results for the algorithms on production workloads are analyzed, and illustrate unique features of the chaotic temporal structure of parallel workloads. We provide best parameter configurations for each algorithm, which both achieve average improvements of 10% in performance and 35% in stability for the tested workloads.
UR - http://www.scopus.com/inward/record.url?scp=33746272430&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2005.252
DO - 10.1109/IPDPS.2005.252
M3 - Conference contribution
AN - SCOPUS:33746272430
SN - 0769523129
SN - 9780769523125
T3 - Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005
SP - 84a
BT - Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005
T2 - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005
Y2 - 4 April 2005 through 8 April 2005
ER -