Fault-Tolerant Parallel Integer Multiplication

Roy Nissim, Oded Schwartz, Yuval Spiizer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Exascale machines have a small mean time between failures, necessitating fault tolerance. Out-of-the-box fault-tolerant solutions, such as checkpoint-restart and replication, apply to any algorithm but incur significant overhead costs. Long integer multiplication is a fundamental kernel in numerical linear algebra and cryptography. The naïve schoolbook multiplication algorithm runs in $\Theta(n^2)$, while Toom-Cook algorithms run in $\Theta(n^{\log_\kappa(2\kappa-1)})$ for $2 \le \kappa$. We obtain the first efficient fault-tolerant parallel Toom-Cook algorithm. While asymptotically faster FFT-based algorithms exist, Toom-Cook algorithms are often favored in practice on small scale and on supercomputers. Our algorithm enables fault tolerance with negligible overhead costs. Compared to existing, general-purpose, fault-tolerant solutions, our algorithm reduces the arithmetic and communication (bandwidth) overhead costs by a factor of $\Theta\left(\frac{P}{2\kappa-1}\right)$ (where $P$ is the number of processors). To this end, we adapt the fault-tolerant BFS-DFS method of Birnbaum et al. (2020) for fast matrix multiplication and combine it with a coding strategy tailored for Toom-Cook. This eliminates the need for recomputations, resulting in a much faster algorithm.
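For readers unfamiliar with the running-time bound quoted in the abstract, the following is a minimal sequential sketch, not the authors' parallel or fault-tolerant algorithm. It assumes the simplest case, κ = 2 (Karatsuba), where each recursion level performs 2κ − 1 = 3 half-size multiplications. The function names `toom_cook_exponent` and `karatsuba` and the `cutoff_bits` parameter are hypothetical and chosen for illustration only.

```python
import math


def toom_cook_exponent(k: int) -> float:
    """Return log_k(2k - 1), the exponent in the Toom-Cook-k running time."""
    return math.log(2 * k - 1, k)


def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Multiply non-negative integers with Toom-2 (Karatsuba) recursion.

    Illustrative only: sequential, with no parallelism or fault tolerance.
    """
    # Base case: small operands use the built-in multiplication.
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y

    # Split each operand into a high and a low part of `half` bits.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of four (2k - 1 = 3 for k = 2).
    low = karatsuba(x_lo, y_lo, cutoff_bits)
    high = karatsuba(x_hi, y_hi, cutoff_bits)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - low - high

    # Recombine: x * y = high * 2^(2*half) + mid * 2^half + low.
    return (high << (2 * half)) + (mid << half) + low
```

Under these assumptions, `toom_cook_exponent(2)` is log₂ 3 ≈ 1.585, and the exponent decreases as κ grows (≈ 1.465 for κ = 3), which is the sense in which larger κ is asymptotically faster than the quadratic schoolbook method.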

Original language: English
Title of host publication: SPAA 2024 - Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures
Publisher: Association for Computing Machinery
Pages: 207-218
Number of pages: 12
ISBN (Electronic): 9798400704161
State: Published - 17 Jun 2024
Event: 36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024 - Nantes, France
Duration: 17 Jun 2024 – 21 Jun 2024

Publication series

Name: Annual ACM Symposium on Parallelism in Algorithms and Architectures
ISSN (Print): 1548-6109

Conference

Conference: 36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024
Country/Territory: France
City: Nantes
Period: 17/06/24 – 21/06/24

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Keywords

  • fault tolerance
  • i/o complexity
  • long integer multiplication
  • parallel computing
  • toom-cook
