TY - GEN
T1 - Communication-avoiding parallel Strassen
T2 - 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
AU - Lipshitz, Benjamin
AU - Ballard, Grey
AU - Demmel, James
AU - Schwartz, Oded
PY - 2012
Y1 - 2012
N2 - Matrix multiplication is a fundamental kernel of many high performance and scientific computing applications. Most parallel implementations use classical O(n3) matrix multiplication, even though there exist algorithms with lower arithmetic complexity. We recently presented a new Communication-Avoiding Parallel Strassen algorithm (CAPS), based on Strassen's fast matrix multiplication, that minimizes communication (SPAA'12). It communicates asymptotically less than all classical and all previous Strassen-based algorithms, and it attains theoretical lower bounds. In this paper we show that CAPS is also faster in practice. We benchmark and compare its performance to previous algorithms on Hopper (Cray XE6), Intrepid (IBM BG/P), and Franklin (Cray XT4). We demonstrate significant speedups over previous algorithms both for large matrices and for small matrices on large numbers of processors. We model and analyze the performance of CAPS and predict its performance on future exascale platforms.
AB - Matrix multiplication is a fundamental kernel of many high performance and scientific computing applications. Most parallel implementations use classical O(n3) matrix multiplication, even though there exist algorithms with lower arithmetic complexity. We recently presented a new Communication-Avoiding Parallel Strassen algorithm (CAPS), based on Strassen's fast matrix multiplication, that minimizes communication (SPAA'12). It communicates asymptotically less than all classical and all previous Strassen-based algorithms, and it attains theoretical lower bounds. In this paper we show that CAPS is also faster in practice. We benchmark and compare its performance to previous algorithms on Hopper (Cray XE6), Intrepid (IBM BG/P), and Franklin (Cray XT4). We demonstrate significant speedups over previous algorithms both for large matrices and for small matrices on large numbers of processors. We model and analyze the performance of CAPS and predict its performance on future exascale platforms.
UR - http://www.scopus.com/inward/record.url?scp=84877716093&partnerID=8YFLogxK
U2 - 10.1109/SC.2012.33
DO - 10.1109/SC.2012.33
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:84877716093
SN - 9781467308069
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
Y2 - 10 November 2012 through 16 November 2012
ER -