Sample selection for statistical parsers: Cognitively driven algorithms and evaluation measures

Roi Reichart*, Ari Rappoport

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

8 Scopus citations

Abstract

Creating large amounts of manually annotated training data for statistical parsers imposes heavy cognitive load on the human annotator and is thus costly and error prone. It is hence of high importance to decrease the human efforts involved in creating training data without harming parser performance. For constituency parsers, these efforts are traditionally evaluated using the total number of constituents (TC) measure, assuming uniform cost for each annotated item. In this paper, we introduce novel measures that quantify aspects of the cognitive efforts of the human annotator that are not reflected by the TC measure, and show that they are well established in the psycholinguistic literature. We present a novel parameter based sample selection approach for creating good samples in terms of these measures. We describe methods for global optimisation of lexical parameters of the sample based on a novel optimisation problem, the constrained multiset multicover problem, and for cluster-based sampling according to syntactic parameters. Our methods outperform previously suggested methods in terms of the new measures, while maintaining similar TC performance.

Original languageAmerican English
Title of host publicationCoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Pages3-11
Number of pages9
StatePublished - 2009
Event13th Conference on Computational Natural Language Learning, CoNLL 2009 - Boulder, CO, United States
Duration: 4 Jun 20095 Jun 2009

Publication series

NameCoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning

Conference

Conference13th Conference on Computational Natural Language Learning, CoNLL 2009
Country/TerritoryUnited States
CityBoulder, CO
Period4/06/095/06/09

Fingerprint

Dive into the research topics of 'Sample selection for statistical parsers: Cognitively driven algorithms and evaluation measures'. Together they form a unique fingerprint.

Cite this