K-QA: A Real-World Medical Q&A Benchmark

Itay Manes, Naama Ronn, David Cohen, Ran Ilan Ber, Zehavi Horowitz-Kugler, Gabriel Stanovsky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Ensuring the accuracy of responses provided by large language models (LLMs) is crucial, particularly in clinical settings where incorrect information may directly impact patient health. To address this challenge, we construct K-QA, a dataset containing 1,212 patient questions originating from real-world conversations held on K Health (an AI-driven clinical platform). We employ a panel of in-house physicians to answer a subset of K-QA and manually decompose their answers into self-contained statements. Additionally, we formulate two NLI-based evaluation metrics approximating recall and precision: (1) comprehensiveness, measuring the percentage of essential clinical information in the generated answer, and (2) hallucination rate, measuring the number of statements from the physician-curated response that are contradicted by the LLM answer. Finally, we use K-QA along with these metrics to evaluate several state-of-the-art models, as well as the effect of in-context learning and of medically-oriented augmented retrieval schemes developed by the authors. Our findings indicate that in-context learning improves the comprehensiveness of the models, and augmented retrieval is effective in reducing hallucinations. We will make K-QA available to the community to spur research into medically accurate NLP applications.
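The two metrics can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the NLI judge is abstracted as a hypothetical callable `nli(premise, hypothesis)` returning "entailment", "neutral", or "contradiction" (in practice an NLI model or an LLM prompt would fill this role), and the split of physician statements into "essential" and "other" lists, the function names, and the example statements are all made up for illustration.

```python
from typing import Callable, Sequence

# Hypothetical NLI judge: (premise, hypothesis) -> "entailment" | "neutral" | "contradiction".
# In practice this would be backed by an NLI model or an LLM prompt; here it is just a callable.
NLIFn = Callable[[str, str], str]


def comprehensiveness(answer: str, essential: Sequence[str], nli: NLIFn) -> float:
    """Recall-like metric: percentage of essential physician statements entailed by the answer."""
    if not essential:
        raise ValueError("need at least one essential statement")
    entailed = sum(1 for s in essential if nli(answer, s) == "entailment")
    return 100.0 * entailed / len(essential)


def hallucination_count(answer: str, statements: Sequence[str], nli: NLIFn) -> int:
    """Precision-like metric: number of physician statements contradicted by the answer."""
    return sum(1 for s in statements if nli(answer, s) == "contradiction")


# Toy example with made-up statements and hard-coded judgments (hypothetical, not from K-QA).
answer = "Take ibuprofen for the pain, and start antibiotics for the cold."
essential = ["Ibuprofen can relieve the pain", "Antibiotics do not treat a viral cold"]
other = ["Rest and fluids are recommended"]
judgments = {
    "Ibuprofen can relieve the pain": "entailment",
    "Antibiotics do not treat a viral cold": "contradiction",
    "Rest and fluids are recommended": "neutral",
}


def fake_nli(premise: str, hypothesis: str) -> str:
    # Ignores the premise and looks up a fixed label; stands in for a real NLI model.
    return judgments[hypothesis]


comp = comprehensiveness(answer, essential, fake_nli)  # 1 of 2 essential statements entailed
hall = hallucination_count(answer, essential + other, fake_nli)  # 1 statement contradicted
print(comp, hall)  # 50.0 1
```

Treating the model answer as the premise and each physician statement as the hypothesis is what makes comprehensiveness behave like recall over the essential statements, while the hallucination count checks every physician statement for a contradiction, mirroring the precision-like role described in the abstract.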

Original language: English
Title of host publication: BioNLP 2024 - 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, Proceedings of the Workshop and Shared Tasks
Editors: Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Publisher: Association for Computational Linguistics (ACL)
Pages: 277-294
Number of pages: 18
ISBN (Electronic): 9798891761308
State: Published - 2024
Event: 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, BioNLP 2024 - Bangkok, Thailand
Duration: 16 Aug 2024 → …

Publication series

Name: BioNLP 2024 - 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, Proceedings of the Workshop and Shared Tasks

Conference

Conference: 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, BioNLP 2024
Country/Territory: Thailand
City: Bangkok
Period: 16/08/24 → …

Bibliographical note

Publisher Copyright:
©2024 Association for Computational Linguistics.

