TY - GEN
T1 - Self-training for enhancement and domain adaptation of statistical parsers trained on small datasets
AU - Reichart, Roi
AU - Rappoport, Ari
PY - 2007
Y1 - 2007
AB - Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use self-training in order to improve the quality of a parser and to adapt it to a different domain, using only small amounts of manually annotated seed data. We report significant improvement both when the seed and test data are in the same domain and in the out-of-domain adaptation scenario. In particular, we achieve a 50% reduction in annotation cost for the in-domain case, yielding an improvement of 66% over previous work, and a 20-33% reduction for the domain adaptation case. This is the first time that self-training with small labeled datasets has been applied successfully to these tasks. We were also able to formulate a characterization of when self-training is valuable.
UR - http://www.scopus.com/inward/record.url?scp=84860518415&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84860518415
SN - 9781932432862
T3 - ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
SP - 616
EP - 623
BT - ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
T2 - 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007
Y2 - 23 June 2007 through 30 June 2007
ER -