Beating MKL and ScaLAPACK at rectangular matrix multiplication using the BFS/DFS approach

James Demmel*, David Eliahu, Armando Fox, Shoaib Kamil, Benjamin Lipshitz, Oded Schwartz, Omer Spillinger

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

We present CARMA, the first implementation of a communication-avoiding parallel rectangular matrix multiplication algorithm, which attains significant speedups over both MKL and ScaLAPACK. By combining the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (SPAA '12) with the dimension-splitting technique of Frigo, Leiserson, Prokop and Ramachandran (FOCS '99), CARMA is communication-optimal, cache- and network-oblivious, and simple to implement (60 lines of code for the shared-memory version). Because CARMA minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well on both shared- and distributed-memory machines.
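The abstract describes the recursion only at a high level. The following is a minimal sketch of the dimension-splitting idea it builds on, assuming a pure-Python, shared-memory setting: at each step, split the largest of the three dimensions m, k, n in half. Splitting m or n yields two subproblems that write disjoint parts of C and can run concurrently (a BFS-like step); splitting k yields two subproblems that add into the same block of C, so this sketch runs them one after the other (a DFS-like step). This is not the authors' CARMA code: the names `recursive_matmul`, `BASE_VOLUME` and `PARALLEL_DEPTH` are hypothetical, and the sequential handling of the k split is a simplification that may differ from the paper.

```python
import threading
from typing import List

Matrix = List[List[float]]

BASE_VOLUME = 64     # hypothetical cutoff: subproblems with m*k*n <= this use the naive kernel
PARALLEL_DEPTH = 2   # hypothetical: recursion levels at which m/n splits run in parallel


def _naive(A: Matrix, B: Matrix, C: Matrix, i0, i1, k0, k1, j0, j1) -> None:
    # C[i0:i1, j0:j1] += A[i0:i1, k0:k1] @ B[k0:k1, j0:j1]
    for i in range(i0, i1):
        row_c = C[i]
        for k in range(k0, k1):
            a = A[i][k]
            row_b = B[k]
            for j in range(j0, j1):
                row_c[j] += a * row_b[j]


def _recurse(A: Matrix, B: Matrix, C: Matrix, i0, i1, k0, k1, j0, j1, depth: int) -> None:
    m, k, n = i1 - i0, k1 - k0, j1 - j0
    if m * k * n <= BASE_VOLUME:
        _naive(A, B, C, i0, i1, k0, k1, j0, j1)
        return
    if k >= m and k >= n:
        # Inner dimension is largest: both halves accumulate into the same
        # block of C, so run them sequentially (DFS-like step).
        mid = k0 + k // 2
        _recurse(A, B, C, i0, i1, k0, mid, j0, j1, depth + 1)
        _recurse(A, B, C, i0, i1, mid, k1, j0, j1, depth + 1)
        return
    if m >= n:
        mid = i0 + m // 2  # split the rows of A and C
        halves = [(i0, mid, k0, k1, j0, j1), (mid, i1, k0, k1, j0, j1)]
    else:
        mid = j0 + n // 2  # split the columns of B and C
        halves = [(i0, i1, k0, k1, j0, mid), (i0, i1, k0, k1, mid, j1)]
    if depth < PARALLEL_DEPTH:
        # Halves write disjoint parts of C, so they may run concurrently (BFS-like step).
        t = threading.Thread(target=_recurse, args=(A, B, C, *halves[0], depth + 1))
        t.start()
        _recurse(A, B, C, *halves[1], depth + 1)
        t.join()
    else:
        for h in halves:
            _recurse(A, B, C, *h, depth + 1)


def recursive_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Return A @ B for list-of-lists matrices by recursive largest-dimension splitting."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[0.0] * n for _ in range(m)]
    _recurse(A, B, C, 0, m, 0, k, 0, n, 0)
    return C
```

For example, `recursive_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` returns `[[58.0, 64.0], [139.0, 154.0]]`. Because of Python's global interpreter lock, the threads here only illustrate which subproblems are independent; a practical implementation would call a tuned kernel at the leaves and use real parallel workers.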

Original language: English
Title of host publication: Proceedings - 2012 SC Companion
Subtitle of host publication: High Performance Computing, Networking Storage and Analysis, SCC 2012
Pages: 1370
Number of pages: 1
State: Published - 2012
Event: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012 - Salt Lake City, UT, United States
Duration: 10 Nov 2012 to 16 Nov 2012

Publication series

Name: Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Conference

Conference: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
Country/Territory: United States
City: Salt Lake City, UT
Period: 10/11/12 to 16/11/12

