Improved fully unsupervised parsing with zoomed learning

Roi Reichart*, Ari Rappoport

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs (Ti, Si) of T and S such that, when the unsupervised parser is trained on a training subset Ti, its results on the paired test subset Si are better than when it is trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study our algorithm's effect on the leading algorithm for the task of fully unsupervised parsing (Seginer, 2007) in three English domains (WSJ, BROWN and GENIA) and show that it improves the parser F-score by up to 4.47%.
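The success condition stated in the abstract can be illustrated with a small sketch. It only checks whether training on a subset Ti beats training on all of T when evaluated on the paired Si; it does not show how the pairs are chosen, which the abstract does not specify. The names `compare_zoomed`, `train` and `score` are hypothetical, and the toy "parser" (a majority-label predictor scored by accuracy rather than F-score) is a stand-in, not Seginer's parser.

```python
from collections import Counter

def train(sentences):
    # Toy stand-in for an unsupervised parser: remember the majority item.
    return Counter(sentences).most_common(1)[0][0]

def score(model, test_sentences):
    # Toy stand-in for F-score: fraction of test items the model matches.
    return sum(s == model for s in test_sentences) / len(test_sentences)

def compare_zoomed(full_train, pairs):
    """For each (T_i, S_i) pair, return (baseline score, zoomed score) on S_i."""
    baseline = train(full_train)          # model trained on the entire set T
    results = []
    for t_i, s_i in pairs:
        zoomed = train(t_i)               # model trained on the subset T_i only
        results.append((score(baseline, s_i), score(zoomed, s_i)))
    return results

# Hypothetical data: T is dominated by "a", but the paired subsets are mostly "b".
T = ["a", "a", "a", "b", "b"]
pairs = [(["b", "b"], ["b", "b", "a"])]
results = compare_zoomed(T, pairs)
improved = [zoomed > base for base, zoomed in results]
```

In this toy case the zoomed model scores higher on its paired subset than the baseline does, which is the situation the abstract describes as a successful application.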

Original language: American English
Title of host publication: EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 684-693
Number of pages: 10
State: Published - 2010
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2010 - Cambridge, MA, United States
Duration: 9 Oct 2010 → 11 Oct 2010

Publication series

Name: EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: Conference on Empirical Methods in Natural Language Processing, EMNLP 2010
Country/Territory: United States
City: Cambridge, MA
Period: 9/10/10 → 11/10/10
