Abstract
Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far, such studies have been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for accurate, automatic measurement of vowel duration, where the input to the algorithm is a speech segment containing one vowel preceded and followed by consonants (CVC). Our algorithm is based on a deep neural network trained at the frame level on manually annotated data from a phonetic study. Specifically, we evaluate two deep-network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), and compare their accuracy to that of an HMM-based forced aligner. Results suggest that the CNN outperforms the DBN and that the CNN and the HMM-based forced aligner perform comparably, but neither yielded the same predictions as models fit to manually annotated data.
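As an illustration of how frame-level predictions could be turned into a duration, the following is a minimal sketch and not the method reported in the paper. It assumes a classifier that emits one vowel probability per frame, and it takes the longest contiguous run of frames above a threshold as the vowel. The function name, the 10 ms frame shift, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from typing import Sequence, Tuple


def vowel_duration_from_frames(
    probs: Sequence[float],
    frame_shift_ms: float = 10.0,  # assumed frame shift, not taken from the paper
    threshold: float = 0.5,        # assumed decision threshold
) -> Tuple[float, float, float]:
    """Return (onset_ms, offset_ms, duration_ms) of the longest vowel run.

    A frame counts as "vowel" when its probability is >= threshold.
    If no frame qualifies, all three values are 0.0.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, p in enumerate(probs):
        if p >= threshold:
            if run_len == 0:
                run_start = i  # a new run of vowel frames begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a non-vowel frame ends the current run
    onset = best_start * frame_shift_ms
    offset = (best_start + best_len) * frame_shift_ms
    return onset, offset, offset - onset


# Hypothetical per-frame vowel probabilities for one CVC segment.
example_probs = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.6, 0.1]
onset_ms, offset_ms, duration_ms = vowel_duration_from_frames(example_probs)
```

Choosing the longest run rather than the first one discards isolated frames that cross the threshold, which suits an input that is known to contain exactly one vowel.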
Original language | English |
---|---|
Title of host publication | 2015 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2015 |
Editors | Deniz Erdogmus, Serdar Kozat, Jan Larsen, Murat Akcakaya |
Publisher | IEEE Computer Society |
ISBN (Electronic) | 9781467374545 |
State | Published - 10 Nov 2015 |
Externally published | Yes |
Event | 25th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2015 - Boston, United States; Duration: 17 Sep 2015 → 20 Sep 2015 |
Publication series
Name | IEEE International Workshop on Machine Learning for Signal Processing, MLSP |
---|---|
Volume | 2015-November |
ISSN (Print) | 2161-0363 |
ISSN (Electronic) | 2161-0371 |
Conference
Conference | 25th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2015 |
---|---|
Country/Territory | United States |
City | Boston |
Period | 17/09/15 → 20/09/15 |
Bibliographical note
Publisher Copyright: © 2015 IEEE.
Keywords
- Forced alignment
- convolution neural networks
- deep belief networks
- hidden Markov models
- vowel duration measurement