Implementing a blocked Aasen's algorithm with a dynamic scheduler on multicore architectures

Grey Ballard, Dulceneia Becker, James Demmel, Jack Dongarra, Alex Druinsky, Inon Peled, Oded Schwartz, Sivan Toledo, Ichitaro Yamazaki*

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

10 Scopus citations

Abstract

Factorization of a dense symmetric indefinite matrix is a key computational kernel in many scientific and engineering simulations. However, there is no scalable factorization algorithm that both takes advantage of the symmetry and guarantees numerical stability through pivoting. This is because such an algorithm exhibits many of the fundamental challenges in parallel programming, such as irregular data accesses and irregular task dependencies. In this paper, we address these challenges in a tiled implementation of a blocked Aasen's algorithm using a dynamic scheduler. To fully exploit the limited parallelism in this left-looking algorithm, we study several performance-enhancing techniques, e.g., parallel reduction to update a panel, tall-skinny LU factorization algorithms to factorize the panel, and a parallel implementation of symmetric pivoting. Our performance results on up to 48 AMD Opteron processors demonstrate that our implementation obtains speedups of up to 2.8 over MKL, while losing only one or two digits in the computed residual norms.
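
The abstract names the factorization but does not write it out. Aasen's algorithm computes P A P^T = L T L^T, with L unit lower triangular and T symmetric tridiagonal (banded in the blocked variant). As a minimal sketch of that target form only, the block below uses the simpler Parlett-Reid elimination with symmetric pivoting, which reaches the same kind of factors at a higher operation count. It is not the paper's blocked, tiled, or dynamically scheduled implementation, and the function name `parlett_reid` and its return layout are assumptions made for illustration.

```python
import numpy as np

def parlett_reid(A):
    """Illustrative Parlett-Reid reduction (NOT the paper's blocked Aasen).

    Returns (perm, L, T) with A[perm][:, perm] ~= L @ T @ L.T, where L is
    unit lower triangular and T is symmetric tridiagonal.
    """
    T = np.array(A, dtype=float)   # working copy, reduced to tridiagonal form
    n = T.shape[0]
    L = np.eye(n)
    perm = np.arange(n)
    for k in range(n - 2):
        # Symmetric pivoting: bring the largest entry below the diagonal
        # of column k into the subdiagonal position (k+1, k).
        p = k + 1 + int(np.argmax(np.abs(T[k + 1:, k])))
        if p != k + 1:
            T[[k + 1, p], :] = T[[p, k + 1], :]
            T[:, [k + 1, p]] = T[:, [p, k + 1]]
            # Swap the multipliers that were already stored in L.
            L[[k + 1, p], :k + 1] = L[[p, k + 1], :k + 1]
            perm[[k + 1, p]] = perm[[p, k + 1]]
        pivot = T[k + 1, k]
        if pivot == 0.0:
            continue  # column k is already zero below the subdiagonal
        # Multipliers that eliminate rows k+2.. of column k.
        l = T[k + 2:, k] / pivot
        L[k + 2:, k + 1] = l
        # Congruence update T <- M T M^T with M = I - l e_{k+1}^T.
        T[k + 2:, :] -= np.outer(l, T[k + 1, :])
        T[:, k + 2:] -= np.outer(T[:, k + 1], l)
    return perm, L, T
```

Because the pivot is the largest candidate in its column, every multiplier stored in L has magnitude at most one, which is the property that symmetric pivoting is meant to secure. A system A x = b would then be solved through two triangular solves with L and one tridiagonal (or banded) solve with T.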

Original language: English
Pages: 895-907
Number of pages: 13
State: Published - 2013
Externally published: Yes
Event: 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 - Boston, MA, United States
Duration: 20 May 2013 - 24 May 2013

Conference

Conference: 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013
Country/Territory: United States
City: Boston, MA
Period: 20/05/13 - 24/05/13
