TY - JOUR
T1 - The learnability consequences of Zipfian distributions in language
AU - Lavi-Rotbain, Ori
AU - Arnon, Inbal
N1 - Publisher Copyright:
© 2022
PY - 2022/6
Y1 - 2022/6
AB - While the languages of the world differ in many respects, they share certain commonalities, which can provide insight into our shared cognition. Here, we explore the learnability consequences of one of the striking commonalities between languages. Across languages, word frequencies follow a Zipfian distribution, showing a power-law relation between a word's frequency and its rank. While the source of such distributions in language has been studied extensively, less work has explored their learnability consequences for language learners. We propose that the greater predictability of words in this distribution (relative to less skewed distributions) can facilitate word segmentation, a crucial aspect of early language acquisition. To explore this, we quantify word predictability using unigram entropy, assess it across languages using naturalistic corpora of child-directed speech, and then ask whether similar unigram predictability facilitates word segmentation in the lab. We find similar unigram entropy in child-directed speech across 15 languages. We then use an auditory word segmentation task to show that the unigram predictability levels found in natural language are uniquely facilitative for word segmentation for both children and adults. These findings illustrate the facilitative impact of skewed input distributions on learning and raise questions about the possible role of cognitive pressures in the prevalence of Zipfian distributions in language.
KW - Distributional learning
KW - Information theory
KW - Language acquisition
KW - Word segmentation
KW - Zipf's law
UR - http://www.scopus.com/inward/record.url?scp=85123859736&partnerID=8YFLogxK
DO - 10.1016/j.cognition.2022.105038
M3 - Article
C2 - 35123219
AN - SCOPUS:85123859736
SN - 0010-0277
VL - 223
JO - Cognition
JF - Cognition
M1 - 105038
ER -
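
For readers who want to see the two quantities the abstract relies on, the sketch below is a minimal illustration, not the authors' analysis code: under Zipf's law, the word at frequency rank r has probability proportional to 1/r^a (a = 1 here), and unigram entropy is H = -sum(p * log2 p). The vocabulary size, exponent, and function names are arbitrary assumptions chosen for illustration.

# Illustrative sketch (assumed parameters, not the paper's code):
# unigram entropy of a Zipfian vs. a uniform word-frequency distribution.
import math

def zipf_probs(vocab_size: int, exponent: float = 1.0) -> list[float]:
    """Probability of the word at rank r is proportional to 1 / r**exponent."""
    weights = [1.0 / (r ** exponent) for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def unigram_entropy(probs: list[float]) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

V = 1000  # arbitrary vocabulary size, for illustration only
zipf = zipf_probs(V)
uniform = [1.0 / V] * V

print(f"Zipfian entropy: {unigram_entropy(zipf):.2f} bits")   # lower: words more predictable
print(f"Uniform entropy: {unigram_entropy(uniform):.2f} bits")  # upper bound: log2(V) = 9.97 bits

The gap between the two printed values shows why a skewed (Zipfian) distribution makes individual words more predictable than a uniform one over the same vocabulary, which is the predictability advantage the abstract argues facilitates word segmentation.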