Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking

Ronen Tamari*, Kyle Richardson*, Noam Kahlon, Aviad Sar-Shalom, Nelson F. Liu, Reut Tsarfaty*, Dafna Shahaf

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

While neural language models often perform surprisingly well on natural language understanding (NLU) tasks, their strengths and limitations remain poorly understood. Controlled synthetic tasks are thus an increasingly important resource for diagnosing model behavior. In this work we focus on story understanding, a core competency for NLU systems. However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation. We develop Dyna-bAbI, a dynamic framework providing fine-grained control over task generation in bAbI. We demonstrate our ideas by constructing three new tasks requiring compositional generalization, an important evaluation setting absent from the original benchmark. We tested both special-purpose models developed for bAbI as well as state-of-the-art pre-trained methods, and found that while both approaches solve the original tasks (>99% accuracy), neither approach succeeded in the compositional generalization setting, indicating the limitations of the original training data. We explored ways to augment the original data, and found that though diversifying training data was far more useful than simply increasing dataset size, it was still insufficient for driving robust compositional generalization (with <70% accuracy for complex compositions). Our results underscore the importance of highly controllable task generators for creating robust NLU systems through a virtuous cycle of model and data development.

Original language: American English
Title of host publication: *SEM 2022 - 11th Joint Conference on Lexical and Computational Semantics, Proceedings of the Conference
Editors: Vivi Nastase, Ellie Pavlick, Mohammad Taher Pilehvar, Jose Camacho-Collados, Alessandro Raganato
Publisher: Association for Computational Linguistics (ACL)
Pages: 101-122
Number of pages: 22
ISBN (Electronic): 9781955917988
State: Published - 2022
Event: 11th Joint Conference on Lexical and Computational Semantics, *SEM 2022 - Hybrid conference, Seattle, United States
Duration: 14 Jul 2022 → 15 Jul 2022
Conference number: 11
https://aclanthology.org/volumes/2022.starsem-1/

Publication series

Name: *SEM 2022 - 11th Joint Conference on Lexical and Computational Semantics, Proceedings of the Conference

Conference

Conference: 11th Joint Conference on Lexical and Computational Semantics, *SEM 2022
Abbreviated title: *SEM 2022
Country/Territory: United States
City: Seattle
Period: 14/07/22 → 15/07/22
Internet address

Bibliographical note

Funding Information:
We thank the Aristo team at the Allen Institute for AI for valuable support and feedback. Ronen Tamari was supported by the Center for Interdisciplinary Data-science Research at HUJI. This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant no. 852686, SIAM, Shahaf). Part of this research is also supported by the European Research Council, ERC-StG grant no. 677352 (Tsarfaty), which we gratefully acknowledge.

Funding Information:
Work was supported by the Center for Interdisciplinary Data-science Research (CIDR) at HUJI. This work was also supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant no. 852686, SIAM) and NSF-BSF grant no. 2017741 (Shahaf). Part of this research is also supported by the European Research Council, ERC-StG grant no. 677352 (Tsarfaty).

Publisher Copyright:
© 2022 Association for Computational Linguistics.
