Abstract
Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far, such studies have been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for the automatic, accurate measurement of vowel duration, where the input to the algorithm is a speech segment containing one vowel preceded and followed by consonants (CVC). Our algorithm is based on a deep neural network trained at the frame level on manually annotated data from a phonetic study. Specifically, we try two deep-network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), and compare their accuracy to that of an HMM-based forced aligner. Results suggest that the CNN is better than the DBN, and that the CNN and the HMM-based forced aligner are comparable in their results, but neither of them yielded the same predictions as models fit to manually annotated data.
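As an illustration of how frame-level output could be turned into a duration measurement, the following is a minimal sketch and not the paper's method: the abstract does not say how frame predictions are decoded. It assumes a hypothetical list of per-frame vowel probabilities for one CVC segment, a 0.5 decision threshold, and a 5 ms frame shift, and it takes the longest contiguous run of vowel frames as the vowel.

```python
from typing import List, Tuple


def longest_vowel_run(frame_probs: List[float], threshold: float = 0.5) -> Tuple[int, int]:
    """Return (start, end) frame indices, end exclusive, of the longest
    contiguous run of frames with vowel probability >= threshold.
    Returns (0, 0) when no frame qualifies."""
    best_start, best_end = 0, 0
    run_start = None
    for i, p in enumerate(frame_probs):
        if p >= threshold:
            if run_start is None:
                run_start = i  # a new vowel run begins here
        elif run_start is not None:
            # The run just ended at frame i; keep it if it is the longest so far.
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = None
    # Handle a run that extends to the last frame.
    if run_start is not None and len(frame_probs) - run_start > best_end - best_start:
        best_start, best_end = run_start, len(frame_probs)
    return best_start, best_end


def vowel_duration_ms(frame_probs: List[float],
                      frame_shift_ms: float = 5.0,   # assumed frame shift
                      threshold: float = 0.5) -> float:  # assumed threshold
    """Convert the longest vowel run into a duration in milliseconds."""
    start, end = longest_vowel_run(frame_probs, threshold)
    return (end - start) * frame_shift_ms


# Hypothetical classifier output for an 8-frame segment.
probs = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.6, 0.1]
print(vowel_duration_ms(probs))  # 15.0
```

Taking the longest run discards isolated frames misclassified as vowel, which matches the assumption that each input segment contains exactly one vowel.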
| Original language | English |
|---|---|
| Title of host publication | 2015 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2015 |
| Editors | Deniz Erdogmus, Serdar Kozat, Jan Larsen, Murat Akcakaya |
| Publisher | IEEE Computer Society |
| ISBN (Electronic) | 9781467374545 |
| State | Published - 10 Nov 2015 |
| Externally published | Yes |
| Event | 25th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2015 - Boston, United States; Duration: 17 Sep 2015 → 20 Sep 2015 |
Publication series
| Name | IEEE International Workshop on Machine Learning for Signal Processing, MLSP |
|---|---|
| Volume | 2015-November |
| ISSN (Print) | 2161-0363 |
| ISSN (Electronic) | 2161-0371 |
Conference
| Conference | 25th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2015 |
|---|---|
| Country/Territory | United States |
| City | Boston |
| Period | 17/09/15 → 20/09/15 |
Bibliographical note
Publisher Copyright: © 2015 IEEE.
Keywords
- Forced alignment
- convolutional neural networks
- deep belief networks
- hidden Markov models
- vowel duration measurement