Abstract
Deep learning architectures have enabled unprecedented advances in a wide range of artificial intelligence applications. The empirical success of these architectures has raised fundamental questions about their operation that stand at the forefront of modern theoretical machine learning research. Related theoretical efforts can be broadly divided into (i) explaining the observed success of deep learning architectures and (ii) harnessing these insights to improve their operation. In this chapter, we outline a tensor analysis-based contribution to understanding and improving the expressivity of prominent deep learning architecture classes. We detail a successful proof methodology, based on analyzing grid tensors of the functions realized by deep learning architectures, which has been applied to convolutional, recurrent, and self-attention networks. The rank of an architecture’s grid tensor is used to bound the input dependencies that the architecture can model and to establish the superiority of one architectural configuration over another. We demonstrate how this methodology has advanced the understanding of these architectures’ operation and consequently led to their practical improvement.
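To make the grid-tensor idea concrete, the following minimal sketch (not taken from the chapter) builds the grid tensor of a toy multivariate function over a set of template points and computes the rank of its matricization with respect to a chosen partition of the inputs; this rank lower-bounds the separation rank of the function with respect to that partition. The function `f`, the template points, and the partition used here are illustrative assumptions, not the specific networks analyzed in the chapter.

```python
import numpy as np
from itertools import product

# Illustrative sketch: grid tensor of a toy function and the rank of its
# matricization. All concrete choices below (f, templates, partition) are
# arbitrary and made purely for demonstration.

def f(x):
    # A toy function of N scalar inputs whose pairwise interactions create
    # dependencies across any partition of the variables.
    return np.prod([np.sin(x[i] + x[(i + 1) % len(x)]) for i in range(len(x))])

N = 4                                   # number of input variables
templates = np.linspace(-1.0, 1.0, 5)   # M = 5 template points per variable
M = len(templates)

# Grid tensor: A[d1, ..., dN] = f(z_{d1}, ..., z_{dN}), where z_d are templates.
A = np.empty((M,) * N)
for idx in product(range(M), repeat=N):
    A[idx] = f(templates[list(idx)])

# Matricize with respect to the partition {x1, x2} vs {x3, x4}: rows indexed
# by (d1, d2), columns by (d3, d4). The rank of this matrix lower-bounds the
# separation rank of f with respect to that partition.
A_mat = A.reshape(M * M, M * M)
print("matricization rank:", np.linalg.matrix_rank(A_mat))
```

In the chapter's setting, the analogous quantity is computed for the grid tensors of functions realized by convolutional, recurrent, and self-attention networks, where lower bounds on this rank translate into statements about which input dependencies a given architectural configuration can model.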
Original language | English |
---|---|
Title of host publication | Tensors for Data Processing |
Subtitle of host publication | Theory, Methods, and Applications |
Publisher | Elsevier |
Pages | 215-248 |
Number of pages | 34 |
ISBN (Electronic) | 9780128244470 |
ISBN (Print) | 9780323859653 |
DOIs | |
State | Published - 1 Jan 2021 |
Bibliographical note
Publisher Copyright: © 2022 Elsevier Inc. All rights reserved.
Keywords
- Convolutional networks
- Deep learning
- Depth efficiency
- Expressivity
- Grid tensors
- Recurrent networks
- Self-attention networks
- Separation rank