Q²: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering.

Or Honovich, Leshem Choshen, Roee Aharoni, Ella Neeman, Idan Szpektor, Omri Abend

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

24 Scopus citations

Abstract

Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the knowledge they rely on, making them unreliable and limiting their applicability. Inspired by recent work on evaluating factual consistency in abstractive summarization, we propose an automatic evaluation metric for factual consistency in knowledge-grounded dialogue using automatic question generation and question answering. Our metric, denoted Q², compares answer spans using natural language inference (NLI), instead of token-based matching as done in previous work. To foster proper evaluation, we curate a novel dataset of dialogue system outputs for the Wizard-of-Wikipedia dataset, manually annotated for factual consistency. We perform a thorough meta-evaluation of Q² against other metrics using this dataset and two others, where it consistently shows higher correlation with human judgements.
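
The pipeline the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the function name consistency_score, the three pluggable components, the averaging of per-question agreement, and the score of 0.0 when no questions are generated are all assumptions made for illustration. In the metric as described, question generation, question answering, and the NLI-based span comparison would each be a trained model; here they are toy stand-ins so the example runs on its own.

```python
from typing import Callable, List, Optional, Tuple

# (question, answer span taken from the system response) -- hypothetical type
QAPair = Tuple[str, str]


def consistency_score(
    response: str,
    knowledge: str,
    generate_qa_pairs: Callable[[str], List[QAPair]],
    answer_question: Callable[[str, str], Optional[str]],
    spans_agree: Callable[[str, str, str], bool],
) -> float:
    """Fraction of generated questions whose response answer agrees with
    the answer found in the grounding knowledge (illustrative only)."""
    pairs = generate_qa_pairs(response)
    if not pairs:
        return 0.0  # assumption: nothing to check counts as a score of 0
    agreeing = 0
    for question, response_answer in pairs:
        # Answer the same question again, this time from the knowledge text.
        knowledge_answer = answer_question(question, knowledge)
        if knowledge_answer is not None and spans_agree(
            question, response_answer, knowledge_answer
        ):
            agreeing += 1
    return agreeing / len(pairs)


# --- Toy stand-ins (hypothetical; real components would be neural models) ---
def toy_generate_qa_pairs(response: str) -> List[QAPair]:
    # A real question generator would derive these from the response.
    return [("Where was it held?", "Punta Cana"),
            ("Which year was it held?", "2020")]


def toy_answer_question(question: str, knowledge: str) -> Optional[str]:
    # A real QA model would read the knowledge text; this is a lookup table.
    answers = {"Where was it held?": "punta cana",
               "Which year was it held?": "2021"}
    return answers.get(question)


def toy_spans_agree(question: str, a: str, b: str) -> bool:
    # Stand-in for the NLI comparison: case-insensitive exact match.
    return a.strip().lower() == b.strip().lower()


score = consistency_score(
    response="It was held in Punta Cana in 2020.",
    knowledge="The conference took place in Punta Cana in 2021.",
    generate_qa_pairs=toy_generate_qa_pairs,
    answer_question=toy_answer_question,
    spans_agree=toy_spans_agree,
)
print(score)
```

Passing the comparison step in as a function keeps the point of the abstract visible: swapping the toy exact-match check for an NLI model changes only spans_agree, while the question generation and answering steps stay the same.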
Original language: English
Title of host publication: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Place of Publication: Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics (ACL)
Pages: 7856-7870
Number of pages: 15
ISBN (Electronic): 9781955917094
State: Published - Nov 2021
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
Duration: 7 Nov 2021 to 11 Nov 2021

Publication series

Name: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/Territory: Dominican Republic
City: Virtual, Punta Cana
Period: 7/11/21 to 11/11/21

Keywords

  • natural language inference
  • NLI
  • Natural Language Processing
