Skip to main navigation Skip to search Skip to main content

Cross-Lingual and Cross-Cultural Variation in Image Descriptions

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

Do speakers of different languages talk differently about what they see? Behavioural and cognitive studies report cultural effects on perception; however, these are mostly limited in scope and hard to replicate. In this work, we conduct the first large-scale empirical study of cross-lingual variation in image descriptions. Using a multimodal dataset with 31 languages and images from diverse locations, we develop a method to accurately identify entities mentioned in captions and present in the images, then measure how they vary across languages. Our analysis reveals that pairs of languages that are geographically or genetically closer tend to mention the same entities more frequently. We also identify entity categories whose saliency is universally high (such as animate beings), low (clothing accessories) or displaying high variance across languages (landscape). In a case study, we measure the differences in a specific language pair (e.g., Japanese mentions clothing far more frequently than English). Furthermore, our method corroborates previous small-scale studies, including 1) Rosch et al. (1976)'s theory of basic-level categories, demonstrating a preference for entities that are neither too generic nor too specific, and 2) Miyamoto et al. (2006)'s hypothesis that environments afford patterns of perception, such as entity counts. Overall, our work reveals the presence of both universal and culture-specific patterns in entity mentions.

Original languageEnglish
Title of host publicationLong Papers
EditorsLuis Chiruzzo, Alan Ritter, Lu Wang
PublisherAssociation for Computational Linguistics (ACL)
Pages9453-9465
Number of pages13
ISBN (Electronic)9798891761896
DOIs
StatePublished - 2025
Event2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025 - Hybrid, Albuquerque, United States
Duration: 29 Apr 20254 May 2025

Publication series

NameProceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025
Volume1

Conference

Conference2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025
Country/TerritoryUnited States
CityHybrid, Albuquerque
Period29/04/254/05/25

Bibliographical note

Publisher Copyright:
© 2025 Association for Computational Linguistics.

Fingerprint

Dive into the research topics of 'Cross-Lingual and Cross-Cultural Variation in Image Descriptions'. Together they form a unique fingerprint.

Cite this