TY - JOUR
T1 - SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation
AU - Tan, Ke
AU - Xu, Buye
AU - Kumar, Anurag
AU - Nachmani, Eliya
AU - Adi, Yossi
N1 - Publisher Copyright: © 1994-2012 IEEE.
PY - 2021
Y1 - 2021
AB - Most existing deep learning based binaural speaker separation systems focus on producing a monaural estimate for each of the target speakers, and thus do not preserve the interaural cues, which are crucial for human listeners to perform sound localization and lateralization. In this study, we address talker-independent binaural speaker separation with interaural cues preserved in the estimated binaural signals. Specifically, we extend a newly-developed gated recurrent neural network for monaural separation by additionally incorporating self-attention mechanisms and dense connectivity. We develop an end-to-end multiple-input multiple-output system, which directly maps from the binaural waveform of the mixture to those of the speech signals. The experimental results show that our proposed approach achieves significantly better separation performance than a recent binaural separation approach. In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.
KW - Binaural speaker separation
KW - interaural cue preservation
KW - self-attention
KW - time-domain
UR - http://www.scopus.com/inward/record.url?scp=85097942142&partnerID=8YFLogxK
U2 - 10.1109/LSP.2020.3043977
DO - 10.1109/LSP.2020.3043977
M3 - Article
AN - SCOPUS:85097942142
SN - 1070-9908
VL - 28
SP - 26
EP - 30
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
M1 - 9292089
ER -