## Abstract

Algorithms and implementations for computing the sign function of a triangular matrix are fundamental building blocks for computing the sign of arbitrary square real or complex matrices. We present novel recursive and cache-efficient algorithms that are based on Higham's stabilized specialization of Parlett's substitution algorithm for computing the sign of a triangular matrix. We show that the new recursive algorithms are asymptotically optimal in terms of the number of cache misses that they generate. One algorithm that we present performs more arithmetic than the nonrecursive version, but this allows it to benefit from calling highly optimized matrix multiplication routines; the other performs the same number of operations as the nonrecursive version, using custom computational kernels instead. We present implementations of both, as well as a cache-efficient implementation of a block version of Parlett's algorithm. Our experiments demonstrate that the blocked and recursive versions are much faster than the previous algorithms and that the inertia strongly influences their relative performance, as predicted by our analysis.
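For orientation, the nonrecursive baseline that the abstract refers to can be sketched entry by entry. This is a minimal illustration and not the authors' code: it assumes the standard formulation in which an off-diagonal entry of U = sign(T) is obtained from U² = I when the two diagonal signs agree, and from the commutation relation UT = TU when they differ. The function name `sign_triangular` and the list-of-lists representation are hypothetical choices made for this example; the recursive and blocked algorithms of the paper are not reproduced here.

```python
def sign_triangular(T):
    """Sign of an upper triangular matrix T, given as a list of lists.

    Illustrative sketch only: plain entrywise substitution, with no
    blocking, recursion, or cache-aware ordering.
    """
    n = len(T)
    U = [[0.0] * n for _ in range(n)]

    # Diagonal: sign of the real part of each eigenvalue of T.
    for i in range(n):
        re = complex(T[i][i]).real
        if re == 0:
            raise ValueError("sign is undefined for eigenvalues on the imaginary axis")
        U[i][i] = 1.0 if re > 0 else -1.0

    # Off-diagonal entries, column by column, moving up from the diagonal.
    for j in range(1, n):
        for i in range(j - 1, -1, -1):
            d = U[i][i] + U[j][j]
            if d != 0:
                # Equal diagonal signs: use the (i, j) entry of U*U = I.
                s = sum(U[i][k] * U[k][j] for k in range(i + 1, j))
                U[i][j] = -s / d
            else:
                # Opposite diagonal signs: use the (i, j) entry of U*T = T*U.
                s = sum(U[i][k] * T[k][j] - T[i][k] * U[k][j]
                        for k in range(i + 1, j))
                U[i][j] = (T[i][j] * (U[i][i] - U[j][j]) + s) / (T[i][i] - T[j][j])
    return U
```

For example, `sign_triangular([[1, 2], [0, -3]])` returns `[[1.0, 1.0], [0.0, -1.0]]`, which squares to the identity and commutes with the input. Which branch runs for a given entry depends only on the signs on the diagonal, which is presumably one way the inertia of the matrix affects the amount and kind of work performed.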

Original language | American English |
---|---|
Article number | e2139 |
Journal | Numerical Linear Algebra with Applications |
Volume | 25 |
Issue number | 2 |
State | Published - Mar 2018 |

### Bibliographical note

Publisher Copyright: © 2017 John Wiley & Sons, Ltd.

## Keywords

- blocked matrix algorithms
- cache-efficient algorithms
- communication-efficient algorithms
- matrix functions
- partitioned matrix algorithms