Determinants of phonetic word duration in ten language documentation corpora: Word frequency, complexity, position, and part of speech

Jan Strunk, Frank Seifart*, Swintha Danielsen, Iren Hartmann, Brigitte Pakendorf, Søren Wichmann, Alena Witzlack-Makarevich, Balthasar Bickel

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


This paper explores the application of quantitative methods to study the effect of several factors (word frequency, complexity, position, and part of speech) on phonetic word duration in ten languages. Data on most of these languages were collected in fieldwork aimed at documenting spontaneous speech in languages that are mostly endangered, for multiple purposes, including the preservation of cultural heritage and community work. Here we show the feasibility of studying processes of online acceleration and deceleration of speech across languages using such data, which have not been considered for this purpose before. Our results show that it is possible to detect a consistent effect whereby more frequent words are articulated faster, even in the relatively small language documentation corpora used here. We also show that nouns tend to be pronounced more slowly than verbs when controlling for other factors. Comparison of the effects of these and other factors shows that some of them are difficult to capture with the current data and methods, including potential effects of cross-linguistic differences in morphological complexity. In general, this paper argues for widening the cross-linguistic scope of phonetic and psycholinguistic research by including the wealth of language documentation data that has recently become available.
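The phrase "when controlling for other factors" can be illustrated with a toy regression. The sketch below is not the authors' model (this record does not describe their statistical analyses); it assumes simulated, hypothetical word tokens in which log duration depends linearly on log frequency and on a noun/verb indicator, with nouns made less frequent than verbs so that the two factors are confounded. All function names, coefficients, and data are invented for illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b for a small square system by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so that row operations update a and b together.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y_i for row, y_i in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def simulate_tokens(n, seed=1):
    """Hypothetical word tokens (NOT corpus data): [intercept, log_freq, is_noun] and log duration."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        is_noun = 1.0 if rng.random() < 0.5 else 0.0
        # Assumption: nouns are less frequent on average, confounding frequency with part of speech.
        log_freq = rng.uniform(0.0, 8.0) - 1.5 * is_noun
        # Assumed true effects: frequent words are shorter; nouns are slightly longer.
        log_dur = 6.0 - 0.10 * log_freq + 0.10 * is_noun + rng.gauss(0.0, 0.2)
        rows.append([1.0, log_freq, is_noun])
        y.append(log_dur)
    return rows, y


def mean(xs):
    return sum(xs) / len(xs)


rows, y = simulate_tokens(2000)

# Naive comparison: raw mean log-duration difference between nouns and verbs.
naive = mean([d for r, d in zip(rows, y) if r[2] == 1.0]) - mean(
    [d for r, d in zip(rows, y) if r[2] == 0.0]
)

# Controlled comparison: noun offset estimated with log frequency in the model.
intercept, b_freq, b_noun = ols(rows, y)

print(f"naive noun-verb difference: {naive:.3f}")
print(f"log-frequency slope:        {b_freq:.3f}")
print(f"noun offset (controlled):   {b_noun:.3f}")
```

Because nouns are less frequent in the simulated data, the naive noun-verb difference absorbs part of the frequency effect, while the regression recovers something close to the smaller noun offset built into the simulation. Real analyses of documentation corpora would also have to account for speakers, texts, and repeated word types, which this sketch ignores.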

Original language: American English
Pages (from-to): 423-461
Number of pages: 39
Journal: Language Documentation and Conservation
State: Published - 2020

Bibliographical note

Funding Information:
FS and JS wrote the paper, with input and additions from all authors; JS carried out the statistical analyses; all authors collected and annotated data (see Table 1 and Section 2.1 for details). The research of FS and JS was supported by a grant from the Volkswagen Foundation’s Dokumentation Bedrohter Sprachen (DoBeS) program (89 550). FS and BP are grateful to the LABEX ASLAN (ANR-10-LABX-0081) of Université de Lyon for its financial support within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) of the French government operated by the National Research Agency (ANR). SW’s research was supported by JPICH/NWO and a subsidy of the Russian Government to support the Programme of Competitive Development of Kazan Federal University.

The Nǁng corpus was collected between 2007 and 2011 in the Northern Cape province of South Africa. Nǁng belongs to the ǃUi branch of the Tuu family. It was once spoken over a wide area of South Africa’s Gordonia district. As of 2020, Nǁng is a moribund language spoken by three elderly speakers. The collected corpus contains recordings from eight speakers. The data were collected primarily by Tom Güldemann, Martina Ernszt, Sven Siegmund, and Alena Witzlack-Makarevich for the language documentation project “A text documentation of Nǀuu”, funded by the Endangered Language Documentation Programme (ELDP). The corpus contains personal and traditional narratives, discussions of day-to-day issues, and procedural texts. For the present paper, Alena Witzlack-Makarevich selected a subset of the data and extended the project’s annotations.

Publisher Copyright:
© 2020

