HUME: Human UCCA-based evaluation of machine translation

Alexandra Birch, Omri Abend, Ondrej Bojar, Barry Haddow

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Human evaluation of machine translation normally uses sentence-level measures such as relative ranking or adequacy scales. However, these provide no insight into possible errors, and do not scale well with sentence length. We argue for a semantics-based evaluation, which captures what meaning components are retained in the MT output, thus providing a more fine-grained analysis of translation quality, and enabling the construction and tuning of semantics-based MT. We present a novel human semantic evaluation measure, Human UCCA-based MT Evaluation (HUME), building on the UCCA semantic representation scheme. HUME covers a wider range of semantic phenomena than previous methods and does not rely on semantic annotation of the potentially garbled MT output. We experiment with four language pairs, demonstrating HUME's broad applicability, and report good inter-annotator agreement rates and correlation with human adequacy scores.
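As a rough illustration of the kind of aggregation the abstract describes, the sketch below scores a translation by averaging per-unit human judgments over the UCCA graph of the source sentence. This is a minimal sketch under assumptions, not the paper's exact procedure: the `UCCAUnit` class, the boolean `judged_ok` field, and the uniform-average scoring rule are hypothetical simplifications of the annotation and aggregation scheme.

```python
# Hypothetical sketch of HUME-style scoring: annotators judge whether each
# semantic unit of the source sentence's UCCA graph is preserved in the MT
# output, and the sentence score aggregates those judgments. The data model
# and scoring rule here are illustrative simplifications, not the authors'.
from dataclasses import dataclass, field


@dataclass
class UCCAUnit:
    """A semantic unit in the source sentence's UCCA graph."""
    unit_id: str
    category: str  # e.g. "Scene", "Participant", "Process"
    judged_ok: bool  # annotator judgment: is this unit's meaning preserved?
    children: list["UCCAUnit"] = field(default_factory=list)


def iter_units(root: UCCAUnit):
    """Yield every unit in the graph, pre-order."""
    yield root
    for child in root.children:
        yield from iter_units(child)


def hume_style_score(root: UCCAUnit) -> float:
    """Fraction of semantic units judged preserved in the MT output
    (a simplified stand-in for the paper's aggregation)."""
    units = list(iter_units(root))
    return sum(u.judged_ok for u in units) / len(units)


# Toy example: the MT output drops the Participant but keeps the Process.
scene = UCCAUnit("0", "Scene", True, [
    UCCAUnit("1", "Participant", False),  # e.g. "the cat" lost in translation
    UCCAUnit("2", "Process", True),       # e.g. "slept" preserved
])
print(f"HUME-style score: {hume_style_score(scene):.2f}")  # 0.67
```

Because each judgment attaches to a specific semantic unit, a low score localizes exactly which meaning components were lost, which is the fine-grained error analysis the sentence-level adequacy scales cannot provide.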

Original language: English
Title of host publication: EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 1264-1274
Number of pages: 11
ISBN (Electronic): 9781945626258
DOIs
State: Published - 2016
Event: 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016 - Austin, United States
Duration: 1 Nov 2016 – 5 Nov 2016

Publication series

Name: EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016
Country/Territory: United States
City: Austin
Period: 1/11/16 – 5/11/16

Bibliographical note

Publisher Copyright:
© 2016 Association for Computational Linguistics
