Improved unsupervised POS induction using intrinsic clustering quality and a Zipfian constraint

Roi Reichart*, Raanan Fattal, Ari Rappoport

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

Modern unsupervised POS taggers usually apply an optimization procedure to a non-convex function, and they tend to converge to local maxima that are sensitive to starting conditions. The quality of the tagging induced by such algorithms is therefore highly variable, and researchers report average results over several random initializations. Consequently, an application that uses a single induced tagging is not guaranteed to obtain the quality reported for the algorithm. In this paper we address this issue using an unsupervised test for intrinsic clustering quality. We run a base tagger with different random initializations and select the best tagging using the quality test. As the base tagger, we modify a leading unsupervised POS tagger (Clark, 2003) to constrain the distributions of word types across clusters to be Zipfian, which allows us to use a perplexity-based quality test. We show that the correlation between our quality test and gold-standard-based tagging quality measures is high. On most evaluation measures, our results are better than all results reported in the literature for this task, and they are always better than Clark's average results.
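The abstract does not give the exact form of the quality test, so the following is only a minimal sketch of the "run several random initializations, keep the run the quality test prefers" idea. It assumes a class-based bigram perplexity, p(w_i | w_{i-1}) = p(c_i | c_{i-1}) p(w_i | c_i), as a stand-in unsupervised quality score (lower is better), and it uses random word-to-cluster assignments as a hypothetical stand-in for the base tagger. The names `class_bigram_perplexity`, `random_tagging` and `select_best` are illustrative, not the authors' code, and the Zipfian constraint is not modeled here.

```python
import math
import random
from collections import Counter


def class_bigram_perplexity(corpus, tagging):
    """Perplexity of a hard-clustering class bigram model (stand-in quality score).

    p(w_i | w_{i-1}) = p(c_i | c_{i-1}) * p(w_i | c_i), with maximum-likelihood
    estimates from the same corpus. Each word type maps to exactly one cluster.
    """
    classes = [tagging[w] for w in corpus]
    word_counts = Counter(corpus)
    class_counts = Counter(classes)           # denominators for emissions
    prev_counts = Counter(classes[:-1])       # denominators for transitions
    class_bigrams = Counter(zip(classes, classes[1:]))
    log_prob = 0.0
    for i in range(1, len(corpus)):
        prev_c, c, w = classes[i - 1], classes[i], corpus[i]
        p_trans = class_bigrams[(prev_c, c)] / prev_counts[prev_c]
        p_emit = word_counts[w] / class_counts[c]
        log_prob += math.log(p_trans * p_emit)
    return math.exp(-log_prob / (len(corpus) - 1))


def random_tagging(vocab, k, seed):
    """Hypothetical stand-in for one run of a base tagger from a random start."""
    rng = random.Random(seed)
    return {w: rng.randrange(k) for w in vocab}


def select_best(corpus, taggings):
    """Score every candidate tagging and return the one with lowest perplexity."""
    scores = [class_bigram_perplexity(corpus, t) for t in taggings]
    best = min(range(len(taggings)), key=scores.__getitem__)
    return taggings[best], scores


if __name__ == "__main__":
    corpus = "the dog runs a cat sleeps the cat runs a dog sleeps".split()
    vocab = sorted(set(corpus))
    runs = [random_tagging(vocab, k=3, seed=s) for s in range(10)]
    best, scores = select_best(corpus, runs)
    print("best run:", best)
    print("scores:", [round(s, 3) for s in scores])
```

On this toy corpus the intuitive determiner/noun/verb clustering scores a perplexity of 2.0, while putting every word in a single cluster scores 6.0, so the selection step prefers the linguistically sensible tagging. Whether such a score correlates with gold-standard accuracy on real data is exactly what the paper reports testing; the sketch makes no claim about that.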

Original language: American English
Title of host publication: CoNLL 2010 - Fourteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 57-66
Number of pages: 10
ISBN (Print): 9781932432831
State: Published - 2010
Event: 14th Conference on Computational Natural Language Learning, CoNLL 2010 - Uppsala, Sweden
Duration: 15 Jul 2010 to 16 Jul 2010

Publication series

Name: CoNLL 2010 - Fourteenth Conference on Computational Natural Language Learning, Proceedings of the Conference

Conference

Conference: 14th Conference on Computational Natural Language Learning, CoNLL 2010
Country/Territory: Sweden
City: Uppsala
Period: 15/07/10 to 16/07/10
