MuLER: Detailed and Scalable Reference-based Evaluation

Taelin Karidi, Gal Patel, Leshem Choshen, Omri Abend

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We propose a novel methodology (namely, MuLER) that transforms any reference-based evaluation metric for text generation, such as machine translation (MT), into a fine-grained analysis tool. Given a system and a metric, MuLER quantifies how much the chosen metric penalizes specific error types (e.g., errors in translating names of locations). MuLER thus enables a detailed error analysis, which can lead to targeted improvement efforts for specific phenomena. We perform experiments in both synthetic and naturalistic settings to support MuLER's validity and showcase its usability in MT evaluation and in other tasks, such as summarization. Analyzing all submissions to WMT in 2014–2020, we find consistent trends. For example, nouns and verbs are among the most frequent POS tags, yet they are among the hardest to translate. Performance on most POS tags improves with overall system performance, but a few are not thus correlated (their identity changes from language to language). Preliminary experiments with summarization reveal similar trends.

Original language: American English
Title of host publication: CoNLL 2023 - 27th Conference on Computational Natural Language Learning, Proceedings
Editors: Jing Jiang, David Reitter, Shumin Deng
Publisher: Association for Computational Linguistics (ACL)
Pages: 436-455
Number of pages: 20
ISBN (Electronic): 9798891760394
State: Published - 2023
Event: 27th Conference on Computational Natural Language Learning, CoNLL 2023 - Singapore, Singapore
Duration: 6 Dec 2023 to 7 Dec 2023

Publication series

Name: CoNLL 2023 - 27th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 27th Conference on Computational Natural Language Learning, CoNLL 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 to 7/12/23

Bibliographical note

Publisher Copyright:
© 2023 CoNLL 2023 - 27th Conference on Computational Natural Language Learning, Proceedings. All rights reserved.
