## Abstract

Communication, i.e., moving data between levels of a memory hierarchy or between processors over a network, is much more expensive (in time or energy) than arithmetic. There has thus been a recent focus on designing algorithms that minimize communication and, when possible, attain lower bounds on the total number of reads and writes. However, most previous work does not distinguish between the costs of reads and writes. Writes can be much more expensive than reads in some current and emerging storage devices such as nonvolatile memories. This motivates us to ask whether there are lower bounds on the number of writes that certain algorithms must perform, and whether these bounds are asymptotically smaller than bounds on the sum of reads and writes together. When these smaller lower bounds exist, we then ask when they are attainable; we call algorithms that attain them "write-avoiding" (WA), to distinguish them from "communication-avoiding" (CA) algorithms, which only minimize the sum of reads and writes. We examine a number of cases in linear algebra and direct N-body methods to determine whether known CA algorithms are also WA (some are and some are not). We also identify classes of algorithms, including Strassen's matrix multiplication, Cooley-Tukey FFT, and cache-oblivious algorithms for classical linear algebra, where a WA algorithm cannot exist: the number of writes is unavoidably within a constant factor of the total number of reads and writes. We explore the interaction of WA algorithms with cache replacement policies and argue that the Least Recently Used (LRU) policy works well with the WA algorithms in this paper. We provide empirical hardware counter measurements from Intel's Nehalem-EX microarchitecture to validate our theory. In the parallel case, for classical linear algebra, we show that it is impossible to attain lower bounds both on interprocessor communication and on writes to local memory, but either one is attainable by itself. Finally, we discuss WA algorithms for sparse iterative linear algebra.
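Classical blocked matrix multiplication in a two-level memory model is one way to see the CA/WA distinction. The Python sketch below is illustrative only and is not the paper's algorithm or measurement code. The counter class, the function names, explicit programmer-controlled loads and stores, and a fast memory holding three b×b blocks (M ≥ 3b²) are all hypothetical simplifications. Both loop orders move Θ(n³/b) words in total, which is Θ(n³/√M) when b is chosen near √(M/3). Keeping each block of C resident across the inner k loop writes every output entry to slow memory once (n² writes). Putting k outermost instead re-writes every C block n/b times (n³/b writes).

```python
from dataclasses import dataclass


@dataclass
class SlowMemoryCounter:
    """Counts words moved between a hypothetical slow memory and fast memory."""
    reads: int = 0
    writes: int = 0


def load_block(X, i0, j0, b, ctr):
    """Copy a b-by-b block of X into fast memory, counting b*b reads."""
    ctr.reads += b * b
    return [row[j0:j0 + b] for row in X[i0:i0 + b]]


def store_block(X, blk, i0, j0, b, ctr):
    """Write a b-by-b block from fast memory back to X, counting b*b writes."""
    ctr.writes += b * b
    for r in range(b):
        X[i0 + r][j0:j0 + b] = blk[r]


def multiply_into(Cb, Ab, Bb, b):
    """Cb += Ab @ Bb, entirely in fast memory (no slow-memory traffic)."""
    for i in range(b):
        for k in range(b):
            a = Ab[i][k]
            for j in range(b):
                Cb[i][j] += a * Bb[k][j]


def blocked_matmul(A, B, n, b, c_stationary=True):
    """Blocked C = A @ B on n-by-n lists; n must be a multiple of b.

    c_stationary=True keeps each C block resident for the whole k loop,
    so each C entry is written to slow memory exactly once.
    c_stationary=False puts k outermost, so each C block is re-read and
    re-written once per k step.
    """
    C = [[0] * n for _ in range(n)]
    ctr = SlowMemoryCounter()
    nb = n // b
    if c_stationary:
        for ib in range(nb):
            for jb in range(nb):
                Cb = load_block(C, ib * b, jb * b, b, ctr)
                for kb in range(nb):
                    Ab = load_block(A, ib * b, kb * b, b, ctr)
                    Bb = load_block(B, kb * b, jb * b, b, ctr)
                    multiply_into(Cb, Ab, Bb, b)
                store_block(C, Cb, ib * b, jb * b, b, ctr)
    else:
        for kb in range(nb):
            for ib in range(nb):
                for jb in range(nb):
                    Cb = load_block(C, ib * b, jb * b, b, ctr)
                    Ab = load_block(A, ib * b, kb * b, b, ctr)
                    Bb = load_block(B, kb * b, jb * b, b, ctr)
                    multiply_into(Cb, Ab, Bb, b)
                    store_block(C, Cb, ib * b, jb * b, b, ctr)
    return C, ctr


if __name__ == "__main__":
    n, b = 8, 2
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[(i * j) % 5 for j in range(n)] for i in range(n)]
    _, wa = blocked_matmul(A, B, n, b, c_stationary=True)
    _, ko = blocked_matmul(A, B, n, b, c_stationary=False)
    print(wa)  # SlowMemoryCounter(reads=576, writes=64)
    print(ko)  # SlowMemoryCounter(reads=768, writes=256)
```

Real caches do not expose explicit stores. The abstract's discussion of LRU concerns whether a hardware replacement policy preserves this low write count, which this explicit-control sketch does not model.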

Original language | American English |
---|---|
Title of host publication | Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 648-658 |
Number of pages | 11 |
ISBN (Electronic) | 9781509021406 |
DOIs | |
State | Published - 18 Jul 2016 |
Event | 30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016, Chicago, United States (23 May 2016 → 27 May 2016) |

### Publication series

Name | Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016 |
---|---|

### Conference

Conference | 30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016 |
---|---|
Country/Territory | United States |
City | Chicago |
Period | 23/05/16 → 27/05/16 |

### Bibliographical note

Publisher Copyright: © 2016 IEEE.

## Keywords

- Communication avoiding algorithms
- Krylov subspace methods
- Linear algebra
- Lower bounds
- N-body methods
- Non-volatile memories
- Write complexity