Real time speech enhancement in the waveform domain

Alexandre Défossez, Gabriel Synnaeve, Yossi Adi

Research output: Contribution to journal › Conference article › peer-review

130 Scopus citations

Abstract

We present a causal speech enhancement model that operates on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, that further improve the model's performance and generalization ability. We evaluate on several standard benchmarks using both objective metrics and human judgements. The proposed model matches the state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.
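The abstract states that the model is optimized in both the time and frequency domains using multiple loss functions, without giving the exact terms. As a rough illustration of what such a combined objective can look like, the sketch below adds an L1 loss on the waveform to two spectrogram-based terms (spectral convergence and log-magnitude L1) at a single STFT resolution. The function names, the FFT size, the hop length, and the equal weighting of the terms are all assumptions made for this example; it is not the authors' implementation.

```python
import numpy as np


def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (no padding).

    n_fft and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft keeps the n_fft // 2 + 1 non-negative frequency bins per frame.
    return np.abs(np.fft.rfft(frames, axis=-1))


def time_frequency_loss(estimate, target, n_fft=512, hop=128, eps=1e-8):
    """Hypothetical combined loss: waveform L1 plus two STFT-magnitude terms."""
    # Time-domain term: mean absolute error between the two waveforms.
    time_l1 = np.mean(np.abs(estimate - target))

    est_mag = stft_magnitude(estimate, n_fft, hop)
    tgt_mag = stft_magnitude(target, n_fft, hop)

    # Frequency-domain term 1: relative Frobenius-norm error of the magnitudes.
    spectral_convergence = np.linalg.norm(tgt_mag - est_mag) / (
        np.linalg.norm(tgt_mag) + eps
    )
    # Frequency-domain term 2: mean absolute error of the log magnitudes.
    log_magnitude_l1 = np.mean(
        np.abs(np.log(tgt_mag + eps) - np.log(est_mag + eps))
    )

    # Equal weighting is an assumption made for this sketch.
    return time_l1 + spectral_convergence + log_magnitude_l1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0  # one second at an assumed 16 kHz rate
    clean = np.sin(2 * np.pi * 440.0 * t)
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    print(time_frequency_loss(clean, clean))  # identical signals give 0.0
    print(time_frequency_loss(noisy, clean))  # positive for a noisy estimate
```

In a training setup, a loss of this kind would typically be written in a differentiable framework and evaluated at several STFT resolutions; the NumPy version here only shows how the time-domain and frequency-domain terms combine.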

Original language: American English
Pages (from-to): 3291-3295
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
State: Published - 2020
Externally published: Yes
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 - 29 Oct 2020

Bibliographical note

Publisher Copyright:
© 2020 ISCA

Keywords

  • Neural networks
  • Raw waveform
  • Speech denoising
  • Speech enhancement
