TY - GEN
T1 - An ensemble method for selection of high quality parses
AU - Reichart, Roi
AU - Rappoport, Ari
PY - 2007
Y1 - 2007
AB - While the average performance of statistical parsers gradually improves, they still assign annotations of rather low quality to many sentences. The number of such sentences grows when the training and test data are taken from different domains, which is the case for major web applications such as information retrieval and question answering. In this paper we present a Sample Ensemble Parse Assessment (SEPA) algorithm for detecting parse quality. To assess parse quality, we use a function of the agreement among several copies of a parser, each of which is trained on a different sample from the training data. We experimented with both generative and reranking parsers (Collins, and Charniak and Johnson, respectively). We show superior results over several baselines, both when the training and test data are from the same domain and when they are from different domains. For a test setting used by previous work, we show an error reduction of 31% as opposed to their 20%.
UR - http://www.scopus.com/inward/record.url?scp=80053398112&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80053398112
SN - 9781932432862
T3 - ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
SP - 408
EP - 415
BT - ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
T2 - 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007
Y2 - 23 June 2007 through 30 June 2007
ER -