
Minimizing Processor Count for Fault Tolerant Toom-Cook Algorithms

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Long integer multiplication is a fundamental kernel in various scientific areas, including numerical linear algebra, cryptography, and quantum computing. Toom-Cook-k algorithms run in Θ(n^(log_k(2k-1))) time and are often favored in practice over the Θ(n^2) schoolbook algorithm. Faults are a major bottleneck in large-scale computing. The growing size of machines and decreasing operating voltages have led to a reduction in the mean time between failures, with modern exascale systems experiencing an error per second. While standard fault-tolerant solutions, such as checkpoint-restart and replication, are straightforward to implement, they incur significant overhead and limit overall system utilization. Algorithm-based fault-tolerant solutions offer a more efficient alternative by leveraging the algorithm's structure, for instance, by incorporating erasure codes into the algorithm. Nissim, Schwartz, and Spiizer (2024) proposed an algorithm-based fault-tolerant solution for the parallel Toom-Cook algorithm. Their solution incurs minor arithmetic and communication cost overheads, but requires a considerable number of additional processors. We extend their methodology and introduce a fault-tolerant parallel Toom-Cook-k algorithm that significantly reduces the number of additional required processors, from (2k-1)·f to f, where f is the number of tolerable simultaneous faults. Our error-correcting technique integrates Toom-Cook's inherent fault-tolerance properties and combines them with the fault-tolerant BFS-DFS method of Birnbaum et al. (2020). Our solution preserves similarly low arithmetic and communication overheads, representing a substantial improvement in the efficiency and practicality of fault-tolerant Toom-Cook algorithms.
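The "inherent fault-tolerance" mentioned in the abstract can be illustrated with a small sequential sketch. This is not the algorithm from the paper: it has no parallelism and no BFS-DFS scheduling, and it does not model the processor counts discussed above. It only shows the general idea that evaluating the operand polynomials at f extra points makes the pointwise products behave like an erasure code, so any 2k-1 surviving products suffice for interpolation. The function names, the choice of evaluation points 0, 1, 2, ..., the `limb_bits` parameter, and the `lost` parameter (which simulates faulty evaluations) are all assumptions made for this illustration.

```python
from fractions import Fraction


def split_limbs(n, k, base):
    """Split non-negative n into exactly k limbs in the given base, lowest first."""
    limbs = []
    for _ in range(k):
        limbs.append(n % base)
        n //= base
    if n != 0:
        raise ValueError("operand does not fit in k limbs")
    return limbs


def evaluate(coeffs, x):
    """Horner evaluation; coeffs are ordered lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def interpolate(points, values):
    """Exact Lagrange interpolation; returns coefficients, lowest degree first."""
    m = len(points)
    coeffs = [Fraction(0)] * m
    for i in range(m):
        basis = [Fraction(1)]   # numerator of the i-th Lagrange basis polynomial
        denom = Fraction(1)
        for j in range(m):
            if j == i:
                continue
            # Multiply the basis polynomial by (x - points[j]).
            grown = [Fraction(0)] * (len(basis) + 1)
            for d, c in enumerate(basis):
                grown[d] -= c * points[j]
                grown[d + 1] += c
            basis = grown
            denom *= points[i] - points[j]
        for d, c in enumerate(basis):
            coeffs[d] += values[i] * c / denom
    return coeffs


def toom_cook_with_erasures(a, b, k=3, f=2, limb_bits=16, lost=()):
    """Multiply a*b via Toom-Cook-k evaluation with f redundant points.

    `lost` lists evaluation points whose products are treated as erased
    (hypothetical stand-in for faulty processors).
    """
    base = 1 << limb_bits
    pa = split_limbs(a, k, base)
    pb = split_limbs(b, k, base)
    needed = 2 * k - 1                    # the product polynomial has 2k-1 coefficients
    points = list(range(needed + f))      # f extra points provide the redundancy
    products = {x: evaluate(pa, x) * evaluate(pb, x) for x in points}
    surviving = [x for x in points if x not in lost]
    if len(surviving) < needed:
        raise ValueError("more than f erasures: cannot interpolate")
    use = surviving[:needed]
    coeffs = interpolate(use, [products[x] for x in use])
    # Recompose the integer product by evaluating the product polynomial at the base.
    return sum(int(c) * base ** d for d, c in enumerate(coeffs))
```

For example, with k = 3 and f = 2 the sketch evaluates at seven points, and `toom_cook_with_erasures(a, b, lost=(1, 4))` still returns `a * b` because five products survive. Exact rational arithmetic is used here only to keep the interpolation simple; a practical implementation would use a precomputed integer interpolation sequence instead.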

Original language: English
Title of host publication: SPAA 2025 - Proceedings of the 2025 37th ACM Symposium on Parallelism in Algorithms and Architectures
Publisher: Association for Computing Machinery
Pages: 514-524
Number of pages: 11
ISBN (Electronic): 9798400712586
DOIs
State: Published - 16 Jul 2025
Event: 37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2025 - Portland, United States
Duration: 28 Jul 2025 to 1 Aug 2025

Publication series

Name: Annual ACM Symposium on Parallelism in Algorithms and Architectures
ISSN (Print): 1548-6109

Conference

Conference: 37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2025
Country/Territory: United States
City: Portland
Period: 28/07/25 to 1/08/25

Bibliographical note

Publisher Copyright:
© 2025 Association for Computing Machinery. All rights reserved.

Keywords

  • Fault Tolerance
  • I/O Complexity
  • Long Integer Multiplication
  • Parallel Computing
  • Toom-Cook
