Benefits of depth for long-term memory of recurrent networks

Yoav Levine, Or Sharir, Amnon Shashua

Research output: Contribution to conference, Paper, peer-review

7 Scopus citations


The key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks which involve sequential data is their ever-improving ability to model intricate long-term temporal dependencies. However, a well-established measure of RNNs’ long-term memory capacity is lacking, and thus formal understanding of their ability to correlate data throughout time is limited. Though depth efficiency in convolutional networks is well established by now, it does not suffice to account for the success of deep RNNs on inputs of varying lengths, and the need to address their ‘time-series expressive power’ arises. In this paper, we analyze the effect of depth on the ability of recurrent networks to express correlations ranging over long time-scales. To meet the above need, we introduce a measure of the information flow across time that can be supported by the network, referred to as the Start-End separation rank. Essentially, this measure reflects the distance of the function realized by the recurrent network from a function that models no interaction whatsoever between the beginning and end of the input sequence. We prove that deep recurrent networks support Start-End separation ranks which are exponentially higher than those supported by their shallow counterparts. Moreover, we show that the ability of deep recurrent networks to correlate different parts of the input sequence increases exponentially as the input sequence extends, while that of vanilla shallow recurrent networks does not adapt to the sequence length at all. Thus, we establish that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and provide an exemplar of quantifying this key attribute which may be readily extended to other RNN architectures of interest, e.g., variants of LSTM networks.
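To make the measure concrete, here is a minimal NumPy sketch. It assumes the usual definition of a separation rank: the smallest K such that y(x_1, ..., x_T) = sum_{k=1..K} g_k(x_1, ..., x_{T/2}) g'_k(x_{T/2+1}, ..., x_T). It also assumes, for illustration only, that each input is drawn from a finite alphabet of size M. Under that assumption the function is just a tensor with one axis per time step, and its Start-End separation rank is the ordinary matrix rank of that tensor reshaped into (Start inputs) x (End inputs). The helper name `start_end_rank` is hypothetical.

```python
import numpy as np

def start_end_rank(A: np.ndarray) -> int:
    """Start-End separation rank of a function given by its value tensor.

    A has one axis per time step: A[i1, ..., iT] = y(x_{i1}, ..., x_{iT}).
    Rows index the first T/2 inputs ("Start"), columns the last T/2 ("End").
    """
    T = A.ndim
    assert T % 2 == 0, "assumes an even sequence length"
    rows = int(np.prod(A.shape[: T // 2]))
    return int(np.linalg.matrix_rank(A.reshape(rows, -1)))

M = 3  # alphabet size (illustrative assumption)
rng = np.random.default_rng(0)

# No Start-End interaction: y = a(x1) a(x2) a(x3) a(x4) factorizes, so rank 1.
a = rng.standard_normal(M)
separable = np.einsum("i,j,k,l->ijkl", a, a, a, a)

# A generic function of 4 inputs couples Start and End as much as possible.
generic = rng.standard_normal((M,) * 4)

print(start_end_rank(separable))  # 1
print(start_end_rank(generic))    # 9 (= M**2, the maximum) for generic entries
```

A rank of 1 is exactly the "no interaction whatsoever" case described above, and larger ranks indicate more correlation between the beginning and end of the sequence.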
We obtain our results by considering a class of recurrent networks referred to as Recurrent Arithmetic Circuits (RACs), which merge the hidden state with the input via the Multiplicative Integration operation.
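The sketch below shows how a Recurrent Arithmetic Circuit with Multiplicative Integration could be evaluated and how its Start-End separation rank could be measured numerically. It rests on several assumptions not stated here: the hidden update is taken as h_t = (W_H h_{t-1}) * (W_I x_t) (element-wise product, no nonlinearity), each deeper layer reads the previous layer's hidden state in place of x_t, initial hidden states are all ones, inputs are one-hot vectors fed directly, and the output is a single linear readout of the top layer's last state. The functions `rac_output`, `grid_tensor`, and `start_end_rank` are hypothetical names chosen for illustration.

```python
import itertools
import numpy as np

def rac_output(x_seq, layers, w_out):
    """Scalar output of a (possibly deep) RAC on one input sequence.

    layers: list of (W_I, W_H) pairs; layer 0 reads the input x_t and
            layer l > 0 reads layer l-1's hidden state at the same step.
    """
    R = layers[0][1].shape[0]
    h = [np.ones(R) for _ in layers]  # assumption: all-ones initial states
    for x in x_seq:
        inp = x
        for l, (W_I, W_H) in enumerate(layers):
            # Multiplicative Integration: (W_H h_{t-1}) * (W_I input), no nonlinearity
            h[l] = (W_H @ h[l]) * (W_I @ inp)
            inp = h[l]
    return float(w_out @ h[-1])

def grid_tensor(layers, w_out, M, T):
    """Evaluate the network on every length-T sequence of one-hot inputs."""
    eye = np.eye(M)
    A = np.empty((M,) * T)
    for idx in itertools.product(range(M), repeat=T):
        A[idx] = rac_output([eye[i] for i in idx], layers, w_out)
    return A

def start_end_rank(A):
    """Rank of A reshaped to (first T/2 inputs) x (last T/2 inputs)."""
    rows = int(np.prod(A.shape[: A.ndim // 2]))
    return int(np.linalg.matrix_rank(A.reshape(rows, -1)))

M, R, T = 3, 2, 6  # alphabet size, hidden width, sequence length (illustrative)
rng = np.random.default_rng(0)

def random_layer(in_dim):
    return rng.standard_normal((R, in_dim)), rng.standard_normal((R, R))

shallow = [random_layer(M)]
deep = [random_layer(M), random_layer(R)]
w_out = rng.standard_normal(R)

shallow_rank = start_end_rank(grid_tensor(shallow, w_out, M, T))
deep_rank = start_end_rank(grid_tensor(deep, w_out, M, T))
print(shallow_rank)  # never exceeds R = 2
print(deep_rank)
```

The shallow bound follows from the update rule. Each step is linear in h_{t-1}, so the output splits into R terms. In each term, one coordinate of h_{T/2} (a function of the Start inputs only) multiplies a function of the End inputs only. This holds no matter how long T is. In the two-layer network, the top layer multiplies together factors that each depend on the first layer's midpoint state, so this argument no longer caps the rank at R, and the measured rank can exceed it.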

Original language: American English
State: Published - 2018
Event: 6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada
Duration: 30 Apr 2018 to 3 May 2018


Conference: 6th International Conference on Learning Representations, ICLR 2018

Bibliographical note

Publisher Copyright:
© 6th International Conference on Learning Representations, ICLR 2018 - Workshop Track Proceedings. All rights reserved.

