Unsupervised induction of labeled parse trees by clustering with syntactic features

Roi Reichart*, Ari Rappoport

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

6 Scopus citations

Abstract

We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is done from raw text using an unsupervised incremental parser. Initial labeling is done using a merging model that aims at minimizing the grammar description length. Finally, labels are clustered to a desired number of labels using syntactic features extracted from the initially labeled trees. The algorithm obtains 59% labeled f-score on the WSJ10 corpus, as compared to 35% in previous work, and substantial error reduction over a random baseline. We report results for English, German and Chinese corpora, using two label mapping methods and two label set sizes.
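As a rough illustration of the metric quoted in the abstract, a labeled F-score can be computed by comparing labeled constituents in predicted trees against those in gold trees. The sketch below is a minimal version under stated assumptions, not the authors' evaluation code: constituents are represented as hypothetical `(label, start, end)` tuples, induced labels are assumed to have already been mapped to gold labels, and the function name `labeled_f_score` is invented for this example. The exact protocol used in the paper (which brackets are counted, and how the two label mapping methods work) is not described in this record.

```python
from collections import Counter


def labeled_f_score(gold, predicted):
    """Harmonic mean of labeled precision and recall.

    Both arguments are iterables of (label, start, end) tuples.
    A predicted constituent counts as correct only if its label
    and its span both match a gold constituent.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: each gold constituent can be matched at most once.
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy sentence of five tokens (not data from the paper).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
score = labeled_f_score(gold, pred)
print(f"labeled F-score: {score:.2f}")
```

In this toy case three of the four predicted constituents match gold in both span and label, so precision and recall are each 3/4; the fourth has the right span but the wrong label and therefore does not count.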

Original language: English
Title of host publication: Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 721-728
Number of pages: 8
ISBN (Print): 9781905593446
State: Published - 2008
Event: 22nd International Conference on Computational Linguistics, Coling 2008 - Manchester, United Kingdom
Duration: 18 Aug 2008 to 22 Aug 2008

Publication series

Name: Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
Volume: 1

Conference

Conference: 22nd International Conference on Computational Linguistics, Coling 2008
Country/Territory: United Kingdom
City: Manchester
Period: 18/08/08 to 22/08/08
