TY - GEN
T1 - Improved unsupervised POS induction using intrinsic clustering quality and a Zipfian constraint
AU - Reichart, Roi
AU - Fattal, Raanan
AU - Rappoport, Ari
PY - 2010
Y1 - 2010
N2 - Modern unsupervised POS taggers usually apply an optimization procedure to a non-convex function, and tend to converge to local maxima that are sensitive to starting conditions. The quality of the tagging induced by such algorithms is thus highly variable, and researchers report average results over several random initializations. Consequently, applications are not guaranteed to use an induced tagging of the quality reported for the algorithm. In this paper we address this issue using an unsupervised test for intrinsic clustering quality. We run a base tagger with different random initializations, and select the best tagging using the quality test. As a base tagger, we modify a leading unsupervised POS tagger (Clark, 2003) to constrain the distributions of word types across clusters to be Zipfian, allowing us to utilize a perplexity-based quality test. We show that the correlation between our quality test and gold standard-based tagging quality measures is high. Our results are better in most evaluation measures than all results reported in the literature for this task, and are always better than the Clark average results.
UR - http://www.scopus.com/inward/record.url?scp=80053248813&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80053248813
SN - 9781932432831
T3 - CoNLL 2010 - Fourteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
SP - 57
EP - 66
BT - CoNLL 2010 - Fourteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 14th Conference on Computational Natural Language Learning, CoNLL 2010
Y2 - 15 July 2010 through 16 July 2010
ER -