Abstract
We present a novel fully unsupervised algorithm for POS induction from plain text, motivated by the cognitive notion of prototypes. The algorithm first identifies landmark clusters of words, serving as the cores of the induced POS categories. The rest of the words are subsequently mapped to these clusters. We utilize morphological and distributional representations computed in a fully unsupervised manner. We evaluate our algorithm on English and German, achieving the best reported results for this task.
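The two-stage idea in the abstract (first form landmark clusters, then map the remaining words onto them) can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes that landmarks are simply the frequent words, that the distributional representation is a count of immediate left and right neighbours, and that clustering is a greedy cosine-threshold pass. It omits the morphological representation entirely. The names `induce_categories`, `min_count`, and `threshold` are hypothetical.

```python
from collections import Counter
import math


def context_vectors(tokens):
    """Count each word type's immediate left and right neighbours."""
    vectors = {}
    for i, word in enumerate(tokens):
        vec = vectors.setdefault(word, Counter())
        if i > 0:
            vec[("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            vec[("R", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def induce_categories(tokens, min_count=2, threshold=0.5):
    """Toy two-stage induction (assumed scheme, not the paper's method)."""
    freq = Counter(tokens)
    vectors = context_vectors(tokens)
    centroids = []  # one summed context vector per landmark cluster
    labels = {}     # word type -> cluster index

    def nearest(vec):
        if not centroids:
            return None, 0.0
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores[best]

    # Stage 1: frequent words form the landmark clusters.
    for word in [w for w, n in freq.most_common() if n >= min_count]:
        best, score = nearest(vectors[word])
        if best is None or score < threshold:
            centroids.append(Counter(vectors[word]))
            labels[word] = len(centroids) - 1
        else:
            centroids[best].update(vectors[word])
            labels[word] = best

    # Stage 2: every remaining word is mapped to its nearest landmark cluster.
    for word in freq:
        if word not in labels:
            labels[word], _ = nearest(vectors[word])
    return labels
```

Calling `induce_categories(text.split())` on a tokenized corpus returns a dictionary from word types to cluster indices. The indices are arbitrary labels and carry no linguistic meaning by themselves.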
Original language | American English |
---|---|
Title of host publication | ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Conference Proceedings |
Editors | Jan Hajic, Sandra Carberry, Stephen Clark |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1298-1307 |
Number of pages | 10 |
ISBN (Electronic) | 1932432663, 9781932432664 |
State | Published - 2010 |
Externally published | Yes |
Event | 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden. Duration: 11 Jul 2010 → 16 Jul 2010 |
Publication series
Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
---|---|
Volume | 2010-July |
ISSN (Print) | 0736-587X |
Conference
Conference | 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 |
---|---|
Country/Territory | Sweden |
City | Uppsala |
Period | 11/07/10 → 16/07/10 |
Bibliographical note
Funding Information: Omri Abend is grateful to the Azrieli Foundation for the award of an Azrieli Fellowship.
Publisher Copyright: © 2010 Association for Computational Linguistics.