Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication

Ariful Azad, Grey Ballard, Aydin Buluç, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, Samuel Williams

Research output: Contribution to journal › Article › peer-review

55 Scopus citations

Abstract

Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
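To make the idea behind the 3D formulation more concrete, here is a minimal serial sketch in Python; it is not the authors' implementation. It assumes a hypothetical dict-of-dicts sparse format (row -> {column: value}) and uses Gustavson's row-wise algorithm for the local multiply. The inner dimension k is split into c contiguous layers, each layer multiplies only its slice of A's columns and B's rows, and the c partial products are then summed, standing in for the reduction across layers. The names spgemm_gustavson, add_sparse, and layered_spgemm are illustrative only, and the sketch omits MPI communication, the 2D process grid within each layer, and multithreading.

```python
from collections import defaultdict

# Hypothetical sparse format for illustration: {row: {col: value}}.


def spgemm_gustavson(A, B, k_range=None):
    """Row-wise (Gustavson) SpGEMM: C[i, :] = sum over k of A[i, k] * B[k, :].
    If k_range is given, only inner indices k in that range contribute."""
    C = {}
    for i, row in A.items():
        acc = defaultdict(float)  # sparse accumulator for row i of C
        for k, a_ik in row.items():
            if k_range is not None and k not in k_range:
                continue  # this inner index belongs to another layer
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)
    return C


def add_sparse(X, Y):
    """Elementwise sum of two sparse matrices (models the cross-layer reduction)."""
    Z = {i: dict(r) for i, r in X.items()}
    for i, row in Y.items():
        zrow = Z.setdefault(i, {})
        for j, v in row.items():
            zrow[j] = zrow.get(j, 0.0) + v
    return Z


def layered_spgemm(A, B, n_inner, c):
    """Split the inner dimension [0, n_inner) into c contiguous layers,
    multiply each slice independently, then sum the c partial products."""
    bounds = [n_inner * l // c for l in range(c + 1)]
    C = {}
    for l in range(c):
        partial = spgemm_gustavson(A, B, range(bounds[l], bounds[l + 1]))
        C = add_sparse(C, partial)
    return C


# Small usage example: A is 2x3, B is 3x2.
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
print(layered_spgemm(A, B, n_inner=3, c=2))  # {0: {1: 16.0}, 1: {0: 15.0}}
```

Because each layer's partial product depends only on its own slice of A and B, the layers could in principle be computed concurrently; that independence is what a 3D algorithm exploits, at the cost of a final reduction. In this sketch the layers simply run one after another.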

Original language: American English
Pages (from-to): C624-C651
Journal: SIAM Journal on Scientific Computing
Volume: 38
Issue number: 6
State: Published - 2016

Bibliographical note

Funding Information:
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under contract DE-AC02-05CH11231. This research was supported in part by an appointment to the Sandia National Laboratories Truman Fellowship in National Security Science and Engineering, sponsored by Sandia Corporation (a wholly owned subsidiary of Lockheed Martin Corporation) as Operator of Sandia National Laboratories under its U.S. Department of Energy Contract DE-AC04-94AL85000. The research of some of the authors was supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under award DE-SC0010200, by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, X-Stack program under awards DE-SC0008699, DE-SC0008700, and AC02-05CH11231, and by DARPA award HR0011-12-2-0016, with contributions from Intel, Oracle, and MathWorks. Research is supported by grants 1878/14 and 1901/14 from the Israel Science Foundation (founded by the Israel Academy of Sciences and Humanities) and grant 3-10891 from the Ministry of Science and Technology, Israel. Research is also supported by the Einstein Foundation and the Minerva Foundation. This work was supported by the HUJI Cyber Security Research Center in conjunction with the Israel National Cyber Bureau in the Prime Minister's Office. This paper is supported by the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI). This research was supported by a grant from the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231, and resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC05-00OR22725.

Publisher Copyright:
© 2016 Society for Industrial and Applied Mathematics.

Keywords

  • 2.5D algorithms
  • 2D decomposition
  • 3D algorithms
  • Graph algorithms
  • Multithreading
  • Numerical linear algebra
  • Parallel computing
  • SpGEMM
  • Sparse matrix-matrix multiplication
