TY - GEN
T1 - Beating MKL and ScaLAPACK at rectangular matrix multiplication using the BFS/DFS approach
AU - Demmel, James
AU - Eliahu, David
AU - Fox, Armando
AU - Kamil, Shoaib
AU - Lipshitz, Benjamin
AU - Schwartz, Oded
AU - Spillinger, Omer
PY - 2012
Y1 - 2012
N2 - We present CARMA, the first implementation of a communication-avoiding parallel rectangular matrix multiplication algorithm, attaining significant speedups over both MKL and ScaLAPACK. Combining the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (SPAA '12) with the dimension splitting technique of Frigo, Leiserson, Prokop and Ramachandran (FOCS '99), CARMA is communication-optimal, cache- and network-oblivious, and simple to implement (60 lines of code for the shared-memory version). Since CARMA minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well on both shared- and distributed-memory machines.
AB - We present CARMA, the first implementation of a communication-avoiding parallel rectangular matrix multiplication algorithm, attaining significant speedups over both MKL and ScaLAPACK. Combining the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (SPAA '12) with the dimension splitting technique of Frigo, Leiserson, Prokop and Ramachandran (FOCS '99), CARMA is communication-optimal, cache- and network-oblivious, and simple to implement (60 lines of code for the shared-memory version). Since CARMA minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well on both shared- and distributed-memory machines.
UR - http://www.scopus.com/inward/record.url?scp=84876576999&partnerID=8YFLogxK
U2 - 10.1109/SC.Companion.2012.195
DO - 10.1109/SC.Companion.2012.195
M3 - Conference contribution
AN - SCOPUS:84876576999
SN - 9780769549569
T3 - Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
SP - 1370
BT - Proceedings - 2012 SC Companion
T2 - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
Y2 - 10 November 2012 through 16 November 2012
ER -