Automatic selection of high quality parses created by a fully unsupervised parser

Roi Reichart*, Ari Rappoport

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

11 Scopus citations

Abstract

The average results obtained by unsupervised statistical parsers have greatly improved in the last few years, but on many individual sentences their output is still of rather low quality. At the same time, the output of such parsers is becoming valuable for various applications, and it is radically less expensive to create than manually annotated training data. Hence, automatically selecting high-quality parses created by unsupervised parsers is an important problem. In this paper we present PUPA, a POS-based Unsupervised Parse Assessment algorithm. The algorithm assesses the quality of a parse tree using POS sequence statistics collected from a batch of parsed sentences. We evaluate the algorithm using an unsupervised POS tagger and an unsupervised parser, selecting high-quality parsed sentences from English (WSJ) and German (NEGRA) corpora. We show that PUPA outperforms the leading previous parse assessment algorithm for supervised parsers, as well as a strong unsupervised baseline. Consequently, PUPA makes it possible to obtain high-quality parses without any human involvement.
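
The abstract says only that PUPA scores a parse tree using POS sequence statistics gathered from a batch of parsed sentences; it does not give the scoring function. The following Python sketch is a hypothetical, simplified illustration of that general idea, not PUPA itself. It assumes each parse is given as a POS tag sequence plus a list of constituent spans, counts how often each constituent POS sequence occurs across the batch, and ranks trees by the average relative frequency of their constituents' POS sequences. The names `collect_stats`, `score_tree` and `select_best` are invented for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical representation: a constituent is a half-open span [start, end)
# over token positions; a parse is (POS tags, list of constituent spans).
Span = Tuple[int, int]


def constituent_pos_sequences(tags: Sequence[str], spans: List[Span]) -> List[Tuple[str, ...]]:
    """Return the POS sequence covered by each multi-token constituent."""
    return [tuple(tags[s:e]) for s, e in spans if e - s > 1]


def collect_stats(batch) -> Counter:
    """Count how often each POS sequence is bracketed as a constituent in the batch."""
    counts: Counter = Counter()
    for tags, spans in batch:
        counts.update(constituent_pos_sequences(tags, spans))
    return counts


def score_tree(tags: Sequence[str], spans: List[Span], counts: Counter) -> float:
    """Hypothetical score: mean relative batch frequency of the tree's constituent POS sequences."""
    seqs = constituent_pos_sequences(tags, spans)
    if not seqs:
        return 0.0
    total = sum(counts.values())
    return sum(counts[s] / total for s in seqs) / len(seqs)


def select_best(batch, k: int) -> List[int]:
    """Return indices of the k highest-scoring parses (ties keep batch order)."""
    counts = collect_stats(batch)
    ranked = sorted(range(len(batch)),
                    key=lambda i: score_tree(*batch[i], counts),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy batch: two parses bracket (DT NN), one brackets (NN VBD).
    batch = [
        (["DT", "NN", "VBD"], [(0, 2), (0, 3)]),
        (["DT", "NN", "VBD"], [(0, 2), (0, 3)]),
        (["DT", "NN", "VBD"], [(1, 3), (0, 3)]),
    ]
    print(select_best(batch, 2))  # → [0, 1]
```

Ranking by frequency rewards trees whose brackets agree with structures the parser produces consistently across the batch; the statistics and scoring used by PUPA itself may differ in details the abstract does not describe.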

Original language: English
Title of host publication: CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Publisher: Association for Computational Linguistics (ACL)
Pages: 156-164
Number of pages: 9
ISBN (Print): 1932432299, 9781932432299
State: Published - 2009
Event: 13th Conference on Computational Natural Language Learning, CoNLL 2009 - Boulder, CO, United States
Duration: 4 Jun 2009 to 5 Jun 2009

Publication series

Name: CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning

Conference

Conference: 13th Conference on Computational Natural Language Learning, CoNLL 2009
Country/Territory: United States
City: Boulder, CO
Period: 4/06/09 to 5/06/09
