Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars

Jakob Lesage, Hannah J. Haynie, Hedvig Skirgård, Tobias Weber, Alena Witzlack-Makarevich

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Typological databases can contain a wealth of information beyond the collection of linguistic properties across languages. This paper shows how information often overlooked in typological databases can inform the research community about the state of description of the world's languages. We illustrate this using Grambank, a morphosyntactic typological database covering 2,467 language varieties and based on 3,951 grammatical descriptions. We classify and quantify the comments that accompany coded values in Grambank. We then aggregate these comments and the coded values to derive a level of description for 17 grammatical domains that Grambank covers (negation, adnominal modification, participant marking, tense, aspect, etc.). We show that the description level of grammatical domains varies across space and time. Information about gaps and uncertainties in the descriptive knowledge of grammatical domains within and across languages is essential for a correct analysis of data in typological databases and for the study of grammatical diversity more generally. When collected in a database, such information feeds into disciplines that focus on primary data collection, such as grammaticography and language documentation.

Original language: English
Title of host publication: 2022 Language Resources and Evaluation Conference, LREC 2022
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis
Publisher: European Language Resources Association (ELRA)
Pages: 2884-2890
Number of pages: 7
ISBN (Electronic): 9791095546726
State: Published - 2022
Event: 13th International Conference on Language Resources and Evaluation Conference, LREC 2022 - Marseille, France
Duration: 20 Jun 2022 → 25 Jun 2022

Publication series

Name: 2022 Language Resources and Evaluation Conference, LREC 2022

Conference

Conference: 13th International Conference on Language Resources and Evaluation Conference, LREC 2022
Country/Territory: France
City: Marseille
Period: 20/06/22 → 25/06/22

Bibliographical note

Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.

Keywords

  • Less-Resourced/Endangered Languages
  • Linked Data
  • Typological Databases
