Sample Selection for Statistical Parsers: Cognitively Driven Algorithms and Evaluation Measures

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Creating large amounts of manually annotated training data for statistical parsers imposes a heavy cognitive load on the human annotator and is thus costly and error-prone. It is hence of high importance to reduce the human effort involved in creating training data without harming parser performance. For constituency parsers, this effort is traditionally evaluated using the total number of constituents (TC) measure, which assumes a uniform cost for each annotated item. In this paper, we introduce novel measures that quantify aspects of the annotator's cognitive effort that the TC measure does not reflect, and show that they are well established in the psycholinguistic literature. We present a novel parameter-based sample selection approach for creating samples that score well on these measures. We describe methods for global optimisation of the sample's lexical parameters, based on a novel optimisation problem, the constrained multiset multicover problem, and for cluster-based sampling according to syntactic parameters. Our methods outperform previously suggested methods on the new measures while maintaining similar TC performance.
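The TC measure mentioned above scores a sample by how many constituents an annotator would have to bracket, charging every constituent the same cost. A minimal Python sketch of that arithmetic follows, assuming Penn-Treebank-style bracketed trees. The helper names (`parse_bracketed`, `count_constituents`, `total_constituents`) are hypothetical, and excluding part-of-speech preterminals and an unlabeled wrapper root from the count is an assumption of this sketch, not a detail given in the abstract.

```python
from typing import List, Tuple, Union

# A node is (label, children); a leaf is a plain word string.
Node = Tuple[str, list]
Child = Union[Node, str]


def parse_bracketed(s: str) -> Node:
    """Parse one Penn-Treebank-style bracketed tree into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        if tokens[pos] == "(":
            label = ""  # PTB files often wrap each tree in an unlabeled root
        else:
            label = tokens[pos]
            pos += 1
        children: List[Child] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()


def count_constituents(node: Node) -> int:
    """Count labeled constituents, skipping preterminals (assumption of this sketch)."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return 0  # a preterminal such as (NN parser)
    own = 1 if label else 0  # the unlabeled wrapper root is not counted
    return own + sum(count_constituents(c) for c in children if not isinstance(c, str))


def total_constituents(sample: List[str]) -> int:
    """TC of a sample: every constituent of every tree costs one unit."""
    return sum(count_constituents(parse_bracketed(t)) for t in sample)
```

For example, the trees `(S (NP (DT The) (NN parser)) (VP (VBZ works)))` and `(S (NP (PRP It)) (VP (VBD ran) (ADVP (RB fast))))` contribute 3 and 4 constituents, so a sample containing both has a TC of 7. Because every constituent costs one unit, two samples with equal TC look equally expensive even if one is much harder to annotate, which is the gap the new measures target.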
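The abstract names the constrained multiset multicover problem but does not define it. As background only, the Python sketch below shows the plain, unconstrained multiset multicover problem solved with the standard greedy heuristic: each candidate sentence is treated as a multiset of word tokens, and each target word must be covered a required number of times. The function name, the unit cost per sentence, and the word-demand framing are hypothetical choices for illustration; this is not the paper's constrained formulation or its optimisation method.

```python
from collections import Counter
from typing import Dict, List


def greedy_multiset_multicover(candidates: List[List[str]], demand: Dict[str, int]) -> List[int]:
    """Greedy heuristic for the unconstrained multiset multicover problem.

    candidates: each candidate is a multiset of items (here, a sentence's word tokens).
    demand: how many times each item must be covered in total.
    Each candidate may be chosen at most once and has unit cost (assumptions of this sketch).
    Returns the indices of the chosen candidates, in selection order.
    """
    residual = {item: d for item, d in demand.items() if d > 0}
    counts = [Counter(c) for c in candidates]
    available = set(range(len(candidates)))
    chosen: List[int] = []

    while residual:
        # Gain = how much of the still-unmet demand this candidate would satisfy.
        def gain(i: int) -> int:
            return sum(min(counts[i][item], need) for item, need in residual.items())

        # Iterate in index order so ties go to the lowest index.
        best = max(sorted(available), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("demand cannot be met by the remaining candidates")
        chosen.append(best)
        available.remove(best)
        # Subtract the chosen candidate's contribution from the residual demand.
        for item in list(residual):
            residual[item] -= min(counts[best][item], residual[item])
            if residual[item] == 0:
                del residual[item]
    return chosen
```

With candidates `[["the", "dog", "the"], ["a", "cat"], ["the", "cat", "cat"]]` and demand `{"the": 2, "cat": 2}`, the sketch first picks sentence 2 (gain 3) and then sentence 0 (remaining gain 1), returning `[2, 0]`. Greedy selection by residual gain is the usual approximation for covering problems of this kind; a constrained variant would add further restrictions on which selections are admissible.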

Original language: English
Title of host publication: CoNLL 2009 - Proceedings of the 13th Conference on Computational Natural Language Learning
Editors: Suzanne Stevenson, Xavier Carreras
Publisher: Association for Computational Linguistics (ACL)
Pages: 3-11
Number of pages: 9
ISBN (Electronic): 9781932432299
State: Published - 2009
Event: 13th Conference on Computational Natural Language Learning, CoNLL 2009 in conjunction with NAACL HLT - Boulder, United States
Duration: 4 Jun 2009 → 5 Jun 2009

Publication series

Name: CoNLL 2009 - Proceedings of the 13th Conference on Computational Natural Language Learning

Conference

Conference: 13th Conference on Computational Natural Language Learning, CoNLL 2009 in conjunction with NAACL HLT
Country/Territory: United States
City: Boulder
Period: 4/06/09 → 5/06/09

Bibliographical note

Publisher Copyright:
© 2009 Association for Computational Linguistics.
