Single channel voice separation for unknown number of speakers under reverberant and noisy settings

Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

22 Scopus citations

Abstract

We present a unified network for voice separation of an unknown number of speakers. The proposed approach is composed of several separation heads optimized together with a speaker classification branch. The separation is carried out in the time domain, together with parameter sharing between all separation heads. The classification branch estimates the number of speakers while each head is specialized in separating a different number of speakers. We evaluate the proposed model under both clean and noisy reverberant settings. Results suggest that the proposed approach is superior to the baseline model by a significant margin. Additionally, we present a new noisy and reverberant dataset of up to five different speakers speaking simultaneously.

Original languageEnglish
Title of host publicationICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages3730-3734
Number of pages5
ISBN (Electronic)9781728176055
DOIs
StatePublished - 2021
Externally publishedYes
Event2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 6 Jun 202111 Jun 2021

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2021-June
ISSN (Print)1520-6149

Conference

Conference2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Country/TerritoryCanada
CityVirtual, Toronto
Period6/06/2111/06/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE

Keywords

  • Source separation
  • Speaker classification
  • Speech processing

Fingerprint

Dive into the research topics of 'Single channel voice separation for unknown number of speakers under reverberant and noisy settings'. Together they form a unique fingerprint.

Cite this