Backfilling with lookahead to optimize the packing of parallel jobs

Edi Shmueli, Dror G. Feitelson*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

71 Scopus citations


The utilization of parallel computers depends on how jobs are packed together: if the jobs are not packed tightly, resources are lost due to fragmentation. The problem is that the goal of high utilization may conflict with goals of fairness or even progress for all jobs. The common solution is to use backfilling, which combines a reservation for the first job in the interest of progress with packing of later jobs to fill in holes and increase utilization. However, backfilling considers the queued jobs one at a time, and thus might miss better packing opportunities. We propose the use of dynamic programming to find the best packing possible given the current composition of the queue, thus maximizing the utilization on every scheduling step. Simulations of this algorithm, called lookahead optimizing scheduler (LOS), using trace files from several IBM SP parallel systems, show that LOS indeed improves utilization, and thereby reduces the mean response time and mean slowdown of all jobs. Moreover, it is actually possible to limit the lookahead depth to about 50 jobs and still achieve essentially the same results. Finally, we experimented with selecting among alternative sets of jobs that achieve the same utilization. Surprising results indicate that choosing the set at the head of the queue does not necessarily guarantee best performance. Instead, repeatedly selecting the set with the maximal overall expected slowdown boosts performance when compared to all other alternatives checked.
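To illustrate the difference between considering queued jobs one at a time and searching for the best combination, here is a minimal sketch of a subset-sum style dynamic program over job sizes. It is not the LOS algorithm from the paper: it ignores job runtimes, the reservation for the first job, and the slowdown-based choice among equally good sets. The function names `greedy_packing` and `best_packing`, the `depth` parameter, and the sample queue are assumptions made for this illustration only.

```python
from typing import List, Optional, Tuple


def greedy_packing(free: int, sizes: List[int]) -> Tuple[int, List[int]]:
    """Scan the queue in order and start every job that still fits."""
    used, chosen = 0, []
    for i, size in enumerate(sizes):
        if used + size <= free:
            used += size
            chosen.append(i)
    return used, chosen


def best_packing(free: int, sizes: List[int], depth: int = 50) -> Tuple[int, List[int]]:
    """Pick the set of queued jobs that fills the most free processors.

    Only the first `depth` jobs are examined (a hypothetical lookahead limit).
    Returns (processors_used, indices_of_chosen_jobs).
    """
    window = sizes[:depth]
    # reach[c] holds one set of job indices using exactly c processors,
    # or None if no such set has been found yet.
    reach: List[Optional[List[int]]] = [None] * (free + 1)
    reach[0] = []
    for i, size in enumerate(window):
        # Walk capacities downward so each job is used at most once.
        for c in range(free, size - 1, -1):
            if reach[c] is None and reach[c - size] is not None:
                reach[c] = reach[c - size] + [i]
    used = max(c for c in range(free + 1) if reach[c] is not None)
    return used, reach[used]


if __name__ == "__main__":
    queue = [6, 3, 5, 5]  # processors requested by each queued job
    print(greedy_packing(10, queue))  # (9, [0, 1])
    print(best_packing(10, queue))    # (10, [2, 3])
```

With 10 free processors, the one-at-a-time scan starts the first two jobs and leaves one processor idle, while the dynamic program finds the pair that fills the machine completely. The table has `free + 1` entries and each examined job updates it once, so the cost grows with the product of the lookahead depth and the number of free processors.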

Original language: American English
Pages (from-to): 1090-1107
Number of pages: 18
Journal: Journal of Parallel and Distributed Computing
Issue number: 9
State: Published - Sep 2005


  • Backfilling
  • Optimal packing
  • Parallel job scheduling


