Convolutional neural networks contain strong priors for generating natural-looking images. These priors enable image denoising, super-resolution, and inpainting in an unsupervised manner. Previous attempts to demonstrate similar ideas in audio, namely deep audio priors, (i) use hand-picked architectures such as harmonic convolutions, (ii) only work with spectrogram input, and (iii) have been used mostly for eliminating Gaussian noise. In this work we show that existing state-of-the-art (SOTA) architectures for audio source separation contain deep priors even when working with the raw waveform. Deep priors can be discovered by training a neural network to generate a single corrupted signal when given white noise as input. A network with relevant deep priors is likely to generate a cleaner version of the signal before converging on the corrupted signal. We demonstrate this restoration effect with several corruptions: background noise, reverberation, and a gap in the signal (audio inpainting).
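The fitting procedure described in the abstract (a fixed white-noise input, a single corrupted target, and intermediate outputs inspected before convergence) can be sketched roughly as follows. This is only a toy illustration under loud assumptions: the "network" is a single learned linear filter rather than a source separation architecture, so it shows the shape of the loop and not the restoration effect itself. The function name `fit_to_corrupted` and all parameters (`taps`, `steps`, `lr`, `snapshot_every`) are hypothetical and do not come from the paper.

```python
import math
import random


def fit_to_corrupted(corrupted, taps=16, steps=200, lr=0.05,
                     snapshot_every=50, seed=0):
    """Fit a toy one-layer linear filter so that filtering fixed white noise
    reproduces one corrupted signal. Returns per-step losses and snapshots
    of the intermediate outputs."""
    rng = random.Random(seed)
    n = len(corrupted)
    # Fixed white-noise input: drawn once and never updated during training.
    z = [rng.gauss(0.0, 1.0) for _ in range(n + taps - 1)]
    # Filter weights (the only trainable parameters in this toy stand-in).
    w = [0.0] * taps
    losses, snapshots = [], []
    for step in range(steps):
        # Forward pass: correlate the noise with the current filter.
        out = [sum(w[k] * z[i + k] for k in range(taps)) for i in range(n)]
        err = [o - c for o, c in zip(out, corrupted)]
        losses.append(sum(e * e for e in err) / n)
        # Keep intermediate outputs; in a deep-prior setting these are the
        # candidates that may be cleaner than the final fit.
        if step % snapshot_every == 0:
            snapshots.append((step, out))
        # Gradient of the mean squared error with respect to each tap.
        grad = [2.0 / n * sum(err[i] * z[i + k] for i in range(n))
                for k in range(taps)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return losses, snapshots


# Hypothetical test signal: a sine wave with additive Gaussian noise.
noise_rng = random.Random(1)
clean = [math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
corrupted = [c + noise_rng.gauss(0.0, 0.3) for c in clean]

losses, snapshots = fit_to_corrupted(corrupted)
print(f"loss at step 0: {losses[0]:.4f}, at final step: {losses[-1]:.4f}")
print("snapshot steps:", [step for step, _ in snapshots])
```

In a realistic setting the generator would be a deep waveform model, and which intermediate snapshot to keep is a separate design decision that this sketch does not address.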
|Original language||American English|
|Number of pages||5|
|Journal||Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH|
|State||Published - 2022|
|Event||23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of|
Duration: 18 Sep 2022 → 22 Sep 2022
Bibliographical note
Funding Information:
Acknowledgements: This research was supported by grants from the Israel Science Foundation, the DFG, and the Crown Family Foundation.
Copyright © 2022 ISCA.
- audio denoising
- audio inpainting
- deep priors