Superior and efficient fully unsupervised pattern-based concept acquisition using an unsupervised parser

Dmitry Davidov*, Roi Reichart, Ari Rappoport

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Sets of lexical items sharing a significant aspect of their meaning (concepts) are fundamental for linguistics and NLP. Unsupervised concept acquisition algorithms have been shown to produce good results, and are preferable over manual preparation of concept resources, which is labor intensive, error prone and somewhat arbitrary. Some existing concept mining methods utilize supervised language-specific modules such as POS taggers and computationally intensive parsers. In this paper we present an efficient fully unsupervised concept acquisition algorithm that uses syntactic information obtained from a fully unsupervised parser. Our algorithm incorporates the bracketings induced by the parser into the meta-patterns used by a symmetric patterns and graph-based concept discovery algorithm. We evaluate our algorithm on very large corpora in English and Russian, using both human judgments and WordNet-based evaluation. Using similar settings as the leading fully unsupervised previous work, we show a significant improvement in concept quality and in the extraction of multiword expressions. Our method is the first to use fully unsupervised parsing for unsupervised concept discovery, and requires no language-specific tools or pattern/word seeds.

Original language: American English
Title of host publication: CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Publisher: Association for Computational Linguistics (ACL)
Pages: 48-56
Number of pages: 9
ISBN (Print): 1932432299, 9781932432299
State: Published - 2009
Event: 13th Conference on Computational Natural Language Learning, CoNLL 2009 - Boulder, CO, United States
Duration: 4 Jun 2009 to 5 Jun 2009

Publication series

Name: CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning

Conference

Conference: 13th Conference on Computational Natural Language Learning, CoNLL 2009
Country/Territory: United States
City: Boulder, CO
Period: 4/06/09 to 5/06/09
