Abstract
It is generally accepted that proficient reading requires the assimilation of myriad statistical regularities present in the writing system, including in particular the correspondences between words' orthographic and phonological forms. There is considerably less agreement, however, as to how to quantify these regularities. Here we present a comprehensive approach for this quantification using tools from information theory. We start by providing a glossary of the relevant information-theoretic metrics, with simplified examples showing their potential in assessing orthographic-phonological regularities. We specifically highlight the flexibility of our approach in quantifying information under different contexts (i.e., context-independent and context-dependent readings) and in different types of mappings (e.g., orthography-to-phonology and phonology-to-orthography). Then, we use these information-theoretic measures to assess real-world orthographic-phonological regularities of 10,093 monosyllabic English words and examine whether these measures predict inter-item variability in accuracy and response times using available large-scale datasets of naming and lexical decision tasks. Together, the analyses demonstrate how information-theoretic measures can be used to quantify orthographic-phonological correspondences, and show that they capture variance in reading performance that is not accounted for by existing measures. We discuss the similarities and differences between the current framework and previous approaches as well as future directions towards understanding how the statistical regularities embedded in a writing system impact reading and reading acquisition.
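As a rough illustration of the measures named in the abstract (surprisal, entropy, and information gain), the following sketch applies the standard Shannon definitions to a toy grapheme-to-phoneme mapping. The counts, the phoneme labels, and the names `EA_ANY_CONTEXT` and `EA_BEFORE_D` are hypothetical and are not taken from the paper's corpus; the paper's exact formulations may differ from this simplified version.

```python
import math


def distribution(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


def surprisal(p):
    """Surprisal (in bits) of an outcome with probability p."""
    return -math.log2(p)


def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


# Hypothetical counts (assumption, not real data): how often the grapheme
# "ea" maps to each of two phoneme labels in a made-up word list.
EA_ANY_CONTEXT = {"i": 2, "E": 2}  # context-independent reading
EA_BEFORE_D = {"E": 4}             # same grapheme in one toy context

h_unconditional = entropy(distribution(EA_ANY_CONTEXT))
h_conditional = entropy(distribution(EA_BEFORE_D))

# One simple notion of information gain: the reduction in entropy
# obtained by taking the context into account.
information_gain = h_unconditional - h_conditional

print("unconditional entropy (bits):", h_unconditional)
print("conditional entropy (bits):", h_conditional)
print("information gain (bits):", information_gain)
```

In this toy case the grapheme is maximally ambiguous between two pronunciations without context and fully predictable within the chosen context, so context removes all of the uncertainty. The same functions could be applied in the phonology-to-orthography direction by counting spellings per phoneme instead.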
Original language | American English |
---|---|
Pages (from-to) | 1292-1312 |
Number of pages | 21 |
Journal | Behavior Research Methods |
Volume | 52 |
Issue number | 3 |
State | Published - 1 Jun 2020 |
Externally published | Yes |
Bibliographical note
Funding Information: This work was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Awards P01HD070837, P20HD091013, P01HD001994, and 5R37HD090153-02. Noam Siegelman is a Rothschild Yad-Hanadiv post-doctoral fellow. We wish to thank Mark van den Bunt for his helpful comments.
The full list of the 10,093 monosyllabic words, their GPC coding, and their information-theoretic measures of orthographic-phonological regularities (entropy, surprisal, and information gain; unconditional and conditional) is available as Supplementary Material at https://osf.io/kfme8/. The full phonological corpus is also available at https://phinder.devinkearns.org.
Publisher Copyright:
© 2020, The Psychonomic Society, Inc.
Keywords
- information theory
- orthography-to-phonology transparency
- print-speech correspondences
- reading
- word recognition