Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a small, carefully crafted noise to original speaker examples, such that the original and perturbed waveforms are almost indistinguishable to a human listener. Yet, the generated waveforms, which sound like speaker A, can be used to fool the system into accepting them as if they were uttered by speaker B. We present white-box attacks on a deep end-to-end network trained on either YOHO or NTIMIT. We also present two black-box attacks. In the first, we generate adversarial examples with a system trained on NTIMIT and perform the attack on a system trained on YOHO. In the second, we generate the adversarial examples with a system trained using Mel-spectrum features and perform the attack on a system trained using MFCCs. Our results show that one can significantly decrease the accuracy of a target system even when the adversarial examples are generated with a different system, potentially using different features.
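The paper itself does not include code; as a minimal sketch of the kind of white-box attack described, the snippet below applies an FGSM-style perturbation directly to a raw waveform. The names `model`, `target_label`, and `epsilon` are assumptions for illustration: `model` stands for a differentiable end-to-end verification network that outputs a score for the claimed speaker, not the authors' actual system.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_waveform(model, waveform, target_label, epsilon=1e-3):
    """Hypothetical FGSM-style perturbation of a raw waveform.

    model: assumed differentiable verification network mapping a waveform
           to a logit for the claimed speaker identity.
    waveform: 1-D float tensor with samples in [-1, 1].
    target_label: tensor of 1.0 to push the score toward "accept".
    epsilon: perturbation magnitude, kept small to stay nearly inaudible.
    """
    waveform = waveform.clone().detach().requires_grad_(True)
    score = model(waveform)
    # Loss measuring how far the model is from accepting the claimed speaker.
    loss = F.binary_cross_entropy_with_logits(score, target_label)
    loss.backward()
    # Step against the gradient sign to increase the acceptance score.
    adversarial = waveform - epsilon * waveform.grad.sign()
    return adversarial.clamp(-1.0, 1.0).detach()
```

Under this sketch, the black-box variants reported in the paper correspond to generating `adversarial` with one model (e.g., trained on NTIMIT or on Mel-spectrum features) and feeding it to a different target model.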
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Published: 10 Sep 2018
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018, Calgary, Canada, 15 Apr 2018 → 20 Apr 2018
Publication series: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Bibliographical note: Publisher Copyright © 2018 IEEE.
- Adversarial examples
- Automatic speaker verification