An ensemble method for selection of high quality parses

Roi Reichart*, Ari Rappoport

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

While the average performance of statistical parsers gradually improves, they still produce rather low-quality annotations for many sentences. The number of such sentences grows when the training and test data are taken from different domains, which is the case for major web applications such as information retrieval and question answering. In this paper we present a Sample Ensemble Parse Assessment (SEPA) algorithm for detecting parse quality. We use a function of the agreement among several copies of a parser, each of which is trained on a different sample from the training data, to assess parse quality. We experimented with both generative and reranking parsers (Collins's parser and Charniak and Johnson's reranking parser, respectively). We show superior results over several baselines, both when the training and test data are from the same domain and when they are from different domains. For a test setting used by previous work, we show an error reduction of 31%, as opposed to the 20% reported there.
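The assessment idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each parse is reduced to a set of labeled brackets `(label, start, end)`, that the agreement function is the mean pairwise labeled F-score among the parses produced by the ensemble copies, and that "high quality" means the top-ranked fraction of sentences. The names `sample_training_sets`, `bracket_f1`, `sepa_score`, and `select_high_quality` are hypothetical; training and running the parsers themselves is left out.

```python
import random
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# A constituent is (label, start, end); a parse is the set of its labeled brackets.
Span = Tuple[str, int, int]
Parse = FrozenSet[Span]


def sample_training_sets(corpus: list, n_copies: int, sample_size: int,
                         seed: int = 0) -> List[list]:
    """Draw one training sample per parser copy (hypothetical helper).

    ASSUMPTION: each sample is drawn without replacement from the training
    corpus; the paper's exact sampling scheme may differ.
    """
    rng = random.Random(seed)
    return [rng.sample(corpus, sample_size) for _ in range(n_copies)]


def bracket_f1(a: Parse, b: Parse) -> float:
    """Labeled bracketing F-score between two parses of the same sentence."""
    if not a and not b:
        return 1.0
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(a)
    recall = matched / len(b)
    return 2 * precision * recall / (precision + recall)


def sepa_score(parses: Sequence[Parse]) -> float:
    """Agreement among the parses that the ensemble copies give one sentence.

    ASSUMPTION: agreement is the mean pairwise F-score; the paper's exact
    agreement function may differ.
    """
    pairs = list(combinations(parses, 2))
    if not pairs:
        return 1.0
    return sum(bracket_f1(a, b) for a, b in pairs) / len(pairs)


def select_high_quality(scores: List[float], fraction: float) -> List[int]:
    """Indices of the top `fraction` of sentences, ranked by agreement score."""
    k = int(round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Under these assumptions, a sentence on which all copies return the same brackets scores 1.0, and sentences whose parses diverge across copies sink in the ranking. Using pairwise agreement rather than agreement with a single reference parse keeps the score symmetric in the copies.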

Original language: English
Title of host publication: ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
Pages: 408-415
Number of pages: 8
State: Published - 2007
Event: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007 - Prague, Czech Republic
Duration: 23 Jun 2007 to 30 Jun 2007

Publication series

Name: ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics

Conference

Conference: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007
Country/Territory: Czech Republic
City: Prague
Period: 23/06/07 to 30/06/07
