Abstract
We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks. We propose a neural architecture that is composed of two modules trained jointly: a recurrent neural network (RNN) module and a structured prediction model. The RNN outputs are considered as feature functions to the structured model. The overall model is trained with a structured loss function which can be designed to the given segmentation task. We demonstrate the effectiveness of our method by applying it to two simple tasks commonly used in phonetic studies: word segmentation and voice onset time segmentation. Results suggest the proposed model is superior to previous methods, obtaining state-of-the-art results on the tested datasets.
Original language | English |
---|---|
Title of host publication | 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 2422-2426 |
Number of pages | 5 |
ISBN (Electronic) | 9781509041176 |
DOIs | |
State | Published - 16 Jun 2017 |
Externally published | Yes |
Event | 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States Duration: 5 Mar 2017 → 9 Mar 2017 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
ISSN (Print) | 1520-6149 |
Conference
Conference | 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 |
---|---|
Country/Territory | United States |
City | New Orleans |
Period | 5/03/17 → 9/03/17 |
Bibliographical note
Publisher Copyright:© 2017 IEEE.
Keywords
- Sequence segmentation
- recurrent neural networks (RNNs)
- structured prediction
- voice onset time
- word segmentation