We present a large multilingual study of how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity and the use of numerals. We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora comprising 600k images and 3M captions. We study the relation between visual input and linguistic choices by training classifiers to predict, from raw images, the probability of expressing a property, and find evidence across languages supporting the claim that linguistic properties are constrained by visual context. We complement this investigation with a corpus study, taking numerals as a test case: using existing annotations (the number or type of objects), we investigate how different visual conditions affect the use of numeral expressions in captions, and show that similar patterns emerge across languages. Our methods and findings both confirm and extend existing research in the cognitive literature. We also discuss possible applications to language generation. We make our codebase publicly available.
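The probing setup the abstract describes (a classifier that predicts, from an image, whether a caption will express a given linguistic property) can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's actual pipeline: the "features" are random stand-ins for the output of a vision encoder, the binary label simulates whether a native-speaker caption expresses the property (e.g., a numeral), and the model is a plain logistic regression trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for image features; in the paper's setup these
# would be derived from raw images, not sampled at random.
X = rng.normal(size=(200, 16))

# Simulated binary label: does the caption express the property?
# Here generated from a noisy linear function of the features.
true_w = rng.normal(size=16)
y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression: estimate the probability that a caption for this
# image expresses the property, via batch gradient descent.
w = np.zeros(16)
for _ in range(500):
    p = sigmoid(X @ w)
    w -= 0.1 * (X.T @ (p - y)) / len(y)

acc = ((sigmoid(X @ w) > 0.5) == y).mean()
```

If the classifier recovers the property from the visual features well above chance, that is the kind of evidence the study uses to argue that visual context constrains the linguistic choice.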
|Original language|American English|
|Title of host publication|EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023|
|Publisher|Association for Computational Linguistics (ACL)|
|Number of pages|15|
|State|Published - 2023|
|Event|17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Dubrovnik, Croatia|
|Duration|2 May 2023 → 6 May 2023|
Bibliographical note (funding information):
We would like to thank the anonymous reviewers for their helpful comments and feedback. We would also like to thank Rotem Dror, Sharon Goldwater and Grzegorz Chrupała for consulting, and the native speakers who consulted on and validated our annotation tool: Assaf Porat, Kozue Watanabe, Arie Cattan, and Yilin Geng. This work was supported in part by the Israel Science Foundation (grant no. 2424/21), the Israeli Ministry of Science and Technology (grant no. 2336), and by the HUJI-UoM joint PhD program.
© 2023 Association for Computational Linguistics.