Abstract
Long integer multiplication is a fundamental kernel in various scientific areas, including numerical linear algebra, cryptography, and quantum computing. Toom-Cook-k algorithms run in $\Theta(n^{\log_k(2k-1)})$ time and are often favored in practice over the $\Theta(n^2)$ schoolbook algorithm. Faults are a major bottleneck in large-scale computing. The growing size of machines and decreasing operating voltages have led to a reduction in the mean time between failures, with modern exascale systems experiencing one error per second. While standard fault-tolerant solutions, such as checkpoint-restart and replication, are straightforward to implement, they incur significant overhead and limit overall system utilization. Algorithm-based fault-tolerant solutions offer a more efficient alternative by leveraging the algorithm's structure, for instance, by incorporating erasure codes into the algorithm. Nissim, Schwartz, and Spiizer (2024) proposed an algorithm-based fault-tolerant solution for the parallel Toom-Cook algorithm. Their solution incurs minor arithmetic and communication overheads but requires a considerable number of additional processors. We extend their methodology and introduce a fault-tolerant parallel Toom-Cook-k algorithm that significantly reduces the number of additional processors required, from $(2k-1) \cdot f$ to $f$, where $f$ is the number of tolerable simultaneous faults. Our error-correcting technique builds on Toom-Cook's inherent fault-tolerance properties and combines them with the fault-tolerant BFS-DFS method of Birnbaum et al. (2020). Our solution preserves similarly low arithmetic and communication overheads, representing a substantial improvement in the efficiency and practicality of fault-tolerant Toom-Cook algorithms.
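For context, one Toom-Cook-k step splits each operand into k limbs, evaluates the two limb polynomials at 2k−1 points, multiplies pointwise, and interpolates; the recurrence T(n) = (2k−1)T(n/k) + O(n) gives the exponent log_k(2k−1). The "inherent fault tolerance" mentioned above can be illustrated by evaluating at 2k−1+f points instead: the product polynomial has degree 2k−2, so any 2k−1 surviving pointwise products determine it, much like an erasure code. The sketch below is a minimal, single-level (non-recursive, sequential) illustration under that assumption only. It is not the authors' algorithm: it omits the parallel distribution, the BFS-DFS scheduling of Birnbaum et al., and recursion, and the function names (`ft_toom_multiply`, `evaluation_points`, etc.), the choice of points 0, ±1, ±2, …, and the 16-bit limb size are all hypothetical.

```python
from fractions import Fraction


def split_limbs(n, k, base):
    """Split non-negative n into k limbs in the given base, least significant first."""
    limbs = []
    for _ in range(k):
        n, r = divmod(n, base)
        limbs.append(r)
    assert n == 0, "operand does not fit in k limbs"
    return limbs


def evaluation_points(count):
    """Hypothetical choice of distinct points: 0, 1, -1, 2, -2, ..."""
    pts = [0]
    i = 1
    while len(pts) < count:
        pts.append(i)
        if len(pts) < count:
            pts.append(-i)
        i += 1
    return pts


def evaluate(coeffs, x):
    """Horner evaluation; coefficients are lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def interpolate(points, values):
    """Exact Gauss-Jordan solve of the Vandermonde system; lowest degree first."""
    m = len(points)
    rows = [[Fraction(p) ** j for j in range(m)] + [Fraction(v)]
            for p, v in zip(points, values)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def ft_toom_multiply(a, b, k, f, faulty=(), limb_bits=16):
    """One Toom-Cook-k step with f redundant evaluation points (hypothetical sketch).

    `faulty` holds indices of pointwise products that are "lost"; up to f
    losses are tolerated because any 2k-1 products suffice to interpolate.
    """
    base = 1 << limb_bits
    A = split_limbs(a, k, base)
    B = split_limbs(b, k, base)
    pts = evaluation_points(2 * k - 1 + f)
    # Each pointwise product stands in for the work of one (simulated) processor.
    products = {i: evaluate(A, p) * evaluate(B, p)
                for i, p in enumerate(pts) if i not in faulty}
    if len(products) < 2 * k - 1:
        raise RuntimeError("more than f faults: cannot interpolate")
    survivors = sorted(products)[: 2 * k - 1]
    C = interpolate([pts[i] for i in survivors], [products[i] for i in survivors])
    # Recompose: coefficients may exceed the base; Python ints absorb the carries.
    result = 0
    for j, c in enumerate(C):
        assert c.denominator == 1
        result += int(c) * base ** j
    return result
```

For example, `ft_toom_multiply(123456789012345, 98765432109876, k=3, f=2, faulty={0, 3})` still returns the exact product, because 5 = 2k−1 of the 7 pointwise products survive. Note that adding points increases their magnitude and hence the size of the pointwise products, which is one reason practical implementations choose evaluation points carefully.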
| Original language | English |
|---|---|
| Title of host publication | SPAA 2025 - Proceedings of the 2025 37th ACM Symposium on Parallelism in Algorithms and Architectures |
| Publisher | Association for Computing Machinery |
| Pages | 514-524 |
| Number of pages | 11 |
| ISBN (Electronic) | 9798400712586 |
| DOIs | |
| State | Published - 16 Jul 2025 |
| Event | 37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2025 - Portland, United States; Duration: 28 Jul 2025 → 1 Aug 2025 |
Publication series
| Name | Annual ACM Symposium on Parallelism in Algorithms and Architectures |
|---|---|
| ISSN (Print) | 1548-6109 |
Conference
| Conference | 37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2025 |
|---|---|
| Country/Territory | United States |
| City | Portland |
| Period | 28/07/25 → 1/08/25 |
Bibliographical note
Publisher Copyright: © 2025 Association for Computing Machinery. All rights reserved.
Keywords
- Fault Tolerance
- I/O Complexity
- Long Integer Multiplication
- Parallel Computing
- Toom-Cook
Fingerprint
Dive into the research topics of 'Minimizing Processor Count for Fault Tolerant Toom-Cook Algorithms'. Together they form a unique fingerprint.