Accelerating Distributed Matrix Multiplication with 4-Dimensional Polynomial Codes

Roy Nissim*, Oded Schwartz*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

A single straggler worker may delay an entire distributed system. The state-of-the-art strategies for mitigating delays in large-scale distributed matrix multiplication are polynomial-based coded computations, such as Polynomial Codes and Entangled Polynomial Codes. While these strategies handle stragglers efficiently, they discard the partial computations performed by stragglers and are therefore sub-optimal. Here, we present the Multi Entangled Polynomial Codes, a straggler mitigation strategy that utilizes the computations performed by all workers and significantly reduces the running time. Furthermore, it allows the final output to be decoded before any worker completes its tasks, thereby breaking the lower bound of Yu, Maddah-Ali, and Avestimehr (2020). Previous studies that utilize the partial computations performed by stragglers require large Maximum Distance Separable (MDS) codes, resulting in high overhead costs. In contrast, our strategy requires short codes comparable to those of Entangled Polynomial Codes. Thus, we preserve efficient encoding and decoding complexity and reduce the arithmetic overhead of previous solutions by a factor of (Formula presented), where N is the matrix dimension and W is the number of workers. We provide experimental results on an Amazon EC2 cluster that demonstrate up to 15% speedup over previous strategies. Moreover, we show that our strategy is optimal up to a factor of (1 + o(1)).
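As background for the abstract, the following is a minimal sketch of the baseline Polynomial Code of Yu, Maddah-Ali, and Avestimehr, not the Multi Entangled Polynomial Codes presented in this paper. A is split into m row blocks and B into n column blocks; the worker assigned point x computes Ã(x)B̃(x), where Ã(x) = Σ A_i x^i and B̃(x) = Σ B_j x^(jm). Because the product's mn coefficients A_i B_j sit at distinct degrees i + jm, any mn finished workers suffice to interpolate C = AB, and the remaining (straggling) workers' partial work is simply discarded. Exact rational arithmetic via `fractions.Fraction` is an assumption made for readability (practical schemes typically work over a finite field or with floating point), and all function names here are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain triple-loop matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lin_comb(mats, coeffs):
    """Return sum_t coeffs[t] * mats[t] for equally shaped matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(c * M[r][s] for M, c in zip(mats, coeffs)) for s in range(cols)]
            for r in range(rows)]

def invert(M):
    """Gauss-Jordan inverse over exact rationals (used on a Vandermonde matrix)."""
    size = len(M)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)] for i, row in enumerate(M)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def split(A, B, m, n):
    """Split A into m row blocks and B into n column blocks."""
    ra, cb = len(A) // m, len(B[0]) // n
    A_blocks = [A[i * ra:(i + 1) * ra] for i in range(m)]
    B_blocks = [[row[j * cb:(j + 1) * cb] for row in B] for j in range(n)]
    return A_blocks, B_blocks

def worker(A_blocks, B_blocks, m, x):
    """Task of the worker assigned evaluation point x: compute A~(x) B~(x)."""
    A_enc = lin_comb(A_blocks, [Fraction(x) ** i for i in range(len(A_blocks))])
    B_enc = lin_comb(B_blocks, [Fraction(x) ** (j * m) for j in range(len(B_blocks))])
    return matmul(A_enc, B_enc)

def decode(results, m, n):
    """Recover C = A B from any m*n worker results given as {x: A~(x) B~(x)}."""
    points = list(results)[: m * n]
    V = [[Fraction(x) ** d for d in range(m * n)] for x in points]
    V_inv = invert(V)
    outputs = [results[x] for x in points]
    # Solve V c = y entrywise; coef[i + j*m] equals the block product A_i B_j.
    coef = [lin_comb(outputs, V_inv[d]) for d in range(m * n)]
    C = []
    for i in range(m):
        for r in range(len(coef[0])):
            C.append([v for j in range(n) for v in coef[i + j * m][r]])
    return C

# Usage: 4x2 times 2x4 with m = n = 2, so any 4 of 5 workers suffice.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0, 2, 1], [0, 1, 1, 3]]
m, n = 2, 2
A_blocks, B_blocks = split(A, B, m, n)
# Five workers at x = 1..5; the worker at x = 3 straggles and is ignored.
results = {x: worker(A_blocks, B_blocks, m, x) for x in [1, 2, 4, 5]}
C = decode(results, m, n)
print(C == matmul(A, B))  # True
```

The sketch makes the abstract's criticism concrete: the straggler at x = 3 may have nearly finished its product, yet none of that work is used, since each worker returns a single indivisible evaluation.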

Original language: English
Title of host publication: SIAM Conference on Applied and Computational Discrete Algorithms, ACDA 2023
Editors: Jonathan Berry, David Shmoys, Lenore Cowen, Uwe Naumann
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 134-146
Number of pages: 13
ISBN (Electronic): 9781713899631
State: Published - 2023
Event: 2nd SIAM Conference on Applied and Computational Discrete Algorithms, ACDA 2023 - Seattle, United States
Duration: 31 May 2023 to 2 Jun 2023

Publication series

Name: SIAM Conference on Applied and Computational Discrete Algorithms, ACDA 2023

Conference

Conference: 2nd SIAM Conference on Applied and Computational Discrete Algorithms, ACDA 2023
Country/Territory: United States
City: Seattle
Period: 31/05/23 to 2/06/23

Bibliographical note

Publisher Copyright:
© 2023 Copyright for this paper is retained by the authors.
