Communication-optimal parallel recursive rectangular matrix multiplication

James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Benjamin Lipshitz, Oded Schwartz, Omer Spillinger

Research output: Contribution to conferencePaperpeer-review

91 Scopus citations

Abstract

Communication-optimal algorithms are known for square matrix multiplication. Here, we obtain the first communication-optimal algorithm for all dimensions of rectangular matrices. Combining the dimension-splitting technique of Frigo, Leiserson, Prokop and Ramachandran (1999) with the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (2012) allows for a communication-optimal as well as cache- and network-oblivious algorithm. Moreover, the implementation is simple: approximately 50 lines of code for the shared-memory version. Since the new algorithm minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well in practice on both shared- and distributed-memory machines. We show significant speedups over existing parallel linear algebra libraries both on a 32-core shared-memory machine and on a distributed-memory supercomputer.

Original languageEnglish
Pages261-272
Number of pages12
DOIs
StatePublished - 2013
Externally publishedYes
Event27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 - Boston, MA, United States
Duration: 20 May 201324 May 2013

Conference

Conference27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013
Country/TerritoryUnited States
CityBoston, MA
Period20/05/1324/05/13

Keywords

  • linear algebra
  • matrix multiplication
  • ommunication-avoiding algorithms

Fingerprint

Dive into the research topics of 'Communication-optimal parallel recursive rectangular matrix multiplication'. Together they form a unique fingerprint.

Cite this