Communication-avoiding parallel Strassen: Implementation and performance

Benjamin Lipshitz, Grey Ballard, James Demmel, Oded Schwartz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

48 Scopus citations

Abstract

Matrix multiplication is a fundamental kernel of many high performance and scientific computing applications. Most parallel implementations use classical O(n^3) matrix multiplication, even though there exist algorithms with lower arithmetic complexity. We recently presented a new Communication-Avoiding Parallel Strassen algorithm (CAPS), based on Strassen's fast matrix multiplication, that minimizes communication (SPAA'12). It communicates asymptotically less than all classical and all previous Strassen-based algorithms, and it attains theoretical lower bounds. In this paper we show that CAPS is also faster in practice. We benchmark and compare its performance to previous algorithms on Hopper (Cray XE6), Intrepid (IBM BG/P), and Franklin (Cray XT4). We demonstrate significant speedups over previous algorithms both for large matrices and for small matrices on large numbers of processors. We model and analyze the performance of CAPS and predict its performance on future exascale platforms.
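
For orientation, the sequential recursion underlying Strassen's fast matrix multiplication can be sketched as below. This is a minimal illustration, not the authors' CAPS implementation: it has no parallelism or communication scheduling, and the helper names (`add`, `sub`, `classical`, `split`, `strassen`) and the `cutoff` parameter are assumptions made for the example. It shows where the lower arithmetic complexity comes from: each level performs 7 recursive half-size products instead of the 8 a classical blocked multiply needs, giving O(n^(log2 7)), roughly O(n^2.81), arithmetic operations.

```python
def add(A, B):
    # Elementwise sum of two equally sized matrices (lists of rows).
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    # Elementwise difference of two equally sized matrices.
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def classical(A, B):
    # Classical O(n^3) multiplication, used as the recursion base case.
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def split(M):
    # Split a square matrix of even dimension into four quadrants.
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    # Multiply square matrices A and B of equal dimension.
    # Falls back to the classical method for small or odd dimensions
    # (the cutoff value here is an illustrative assumption).
    n = len(A)
    if n <= cutoff or n % 2:
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    # Recombine the products into the four quadrants of C.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Calling `strassen(A, B)` on two square integer matrices of equal dimension should return the same result as `classical(A, B)`; a production code would choose the cutoff empirically and use an optimized kernel for the base case.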

Original language: English
Title of host publication: 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
State: Published - 2012
Externally published: Yes
Event: 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012 - Salt Lake City, UT, United States
Duration: 10 Nov 2012 to 16 Nov 2012

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
Country/Territory: United States
City: Salt Lake City, UT
Period: 10/11/12 to 16/11/12
