## Abstract

Exascale machines have a small mean time between failures, necessitating fault tolerance. Out-of-the-box fault-tolerant solutions, such as checkpoint-restart and replication, apply to any algorithm but incur significant overhead costs. Long integer multiplication is a fundamental kernel in numerical linear algebra and cryptography. The naïve, schoolbook multiplication algorithm runs in $\Theta(n^2)$, while Toom-Cook algorithms run in $\Theta(n^{\log_\kappa(2\kappa-1)})$ for $2 \le \kappa$. We obtain the first efficient fault-tolerant parallel Toom-Cook algorithm. While asymptotically faster FFT-based algorithms exist, Toom-Cook algorithms are often favored in practice, both at small scale and on supercomputers. Our algorithm enables fault tolerance with negligible overhead costs. Compared to existing general-purpose fault-tolerant solutions, our algorithm reduces the arithmetic and communication (bandwidth) overhead costs by a factor of $\Theta(P/(2\kappa-1))$, where $P$ is the number of processors. To this end, we adapt the fault-tolerant BFS-DFS method of Birnbaum et al. (2020) for fast matrix multiplication and combine it with a coding strategy tailored for Toom-Cook. This eliminates the need for recomputations, resulting in a much faster algorithm.
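To make the running-time comparison in the abstract concrete, the following is a minimal sequential sketch of the smallest Toom-Cook case, κ = 2 (commonly known as Karatsuba multiplication), together with a helper that evaluates the exponent log_κ(2κ-1). It illustrates only the base algorithm; it is not the fault-tolerant parallel algorithm of the paper. The function names `toom_cook_exponent` and `karatsuba` and the `cutoff_bits` parameter are assumptions made for this illustration.

```python
import math


def toom_cook_exponent(k: int) -> float:
    """Return log_k(2k - 1), the exponent in the Toom-Cook-k running time."""
    return math.log(2 * k - 1, k)


def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Multiply non-negative integers with Toom-Cook for k = 2 (Karatsuba).

    cutoff_bits is a hypothetical tuning knob: at or below this size the
    sketch falls back to Python's built-in multiplication.
    """
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y

    # Split each operand into a low and a high part of `half` bits.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_lo, x_hi = x & mask, x >> half
    y_lo, y_hi = y & mask, y >> half

    # Three recursive products instead of the four used by the schoolbook split.
    low = karatsuba(x_lo, y_lo, cutoff_bits)
    high = karatsuba(x_hi, y_hi, cutoff_bits)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - low - high

    # Recombine: x * y = high * 2^(2*half) + mid * 2^half + low.
    return (high << (2 * half)) + (mid << half) + low
```

For κ = 2 the exponent is log_2 3 ≈ 1.585 and for κ = 3 it is log_3 5 ≈ 1.465, both below the schoolbook exponent of 2. Each recursion level produces 2κ-1 subproblems, which is the quantity that appears in the overhead factor stated in the abstract.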

Original language | English |
---|---|
Title of host publication | SPAA 2024 - Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures |
Publisher | Association for Computing Machinery |
Pages | 207-218 |
Number of pages | 12 |
ISBN (Electronic) | 9798400704161 |
State | Published - 17 Jun 2024 |
Event | 36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024 - Nantes, France. Duration: 17 Jun 2024 → 21 Jun 2024 |

### Publication series

Name | Annual ACM Symposium on Parallelism in Algorithms and Architectures |
---|---|
ISSN (Print) | 1548-6109 |

### Conference

Conference | 36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024 |
---|---|
Country/Territory | France |
City | Nantes |
Period | 17/06/24 → 21/06/24 |

### Bibliographical note

Publisher Copyright: © 2024 Owner/Author.

## Keywords

- fault tolerance
- I/O complexity
- long integer multiplication
- parallel computing
- Toom-Cook