Abstract
The data-driven investigation of the extent to which lexicons of different languages align has mostly fallen into one of two categories: colexification-based and distributional. The two approaches are grounded in distinct methodologies, operate on different assumptions, and are used in diverse ways. This raises two important questions: (a) are there settings in which the predictions of the two approaches can be directly compared, and if so, (b) how similar are those predictions, and what determines the similarity? We offer novel operationalizations of the two approaches that allow for their direct comparison, and conduct a comprehensive analysis on a diverse set of 16 languages. Our analysis is carried out at different levels of granularity. At the word level, the two methods yield different results across the board. Intriguingly, however, at the level of semantic domains (e.g., kinship, quantity), the two methods show considerable convergence in their predictions. Our findings also indicate that distributional methods likely capture a more fine-grained alignment than their colexification-based counterparts, and may thus be better suited to settings where fewer languages are evaluated.
Original language | English |
---|---|
Title of host publication | CoNLL 2024 - 28th Conference on Computational Natural Language Learning, Proceedings of the Conference |
Editors | Libby Barak, Malihe Alikhani |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 327-341 |
Number of pages | 15 |
ISBN (Electronic) | 9798891761780 |
State | Published - 2024 |
Event | 28th Conference on Computational Natural Language Learning, CoNLL 2024 - Miami, United States; Duration: 15 Nov 2024 → 16 Nov 2024 |
Publication series
Name | CoNLL 2024 - 28th Conference on Computational Natural Language Learning, Proceedings of the Conference |
---|---|
Conference
Conference | 28th Conference on Computational Natural Language Learning, CoNLL 2024 |
---|---|
Country/Territory | United States |
City | Miami |
Period | 15/11/24 → 16/11/24 |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.