TY - CONF
T1 - Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout
AU - Ballard, Grey
AU - Demmel, James
AU - Lipshitz, Benjamin
AU - Schwartz, Oded
AU - Toledo, Sivan
PY - 2013
Y1 - 2013
AB - High performance for numerical linear algebra often comes at the expense of stability. Computing the LU decomposition of a matrix via Gaussian Elimination can be organized so that the computation involves regular and efficient data access. However, maintaining numerical stability via partial pivoting involves row interchanges that lead to inefficient data access patterns. To optimize communication efficiency throughout the memory hierarchy we confront two seemingly contradictory requirements: partial pivoting is efficient with column-major layout, whereas a block-recursive layout is optimal for the rest of the computation. We resolve this by introducing a shape morphing procedure that dynamically matches the layout to the computation throughout the algorithm, and show that Gaussian Elimination with partial pivoting can be performed in a communication efficient and cache-oblivious way. Our technique extends to QR decomposition, where computing Householder vectors prefers a different data layout than the rest of the computation.
KW - Cache oblivious algorithms
KW - Communication-avoiding algorithms
KW - Matrix data layouts
KW - Matrix factorization
UR - http://www.scopus.com/inward/record.url?scp=84883500668&partnerID=8YFLogxK
U2 - 10.1145/2486159.2486198
DO - 10.1145/2486159.2486198
M3 - Conference contribution
AN - SCOPUS:84883500668
SN - 9781450315722
T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures
SP - 232
EP - 240
BT - SPAA 2013 - Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures
PB - Association for Computing Machinery
T2 - 25th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2013
Y2 - 23 July 2013 through 25 July 2013
ER -