TY - GEN
T1 - Superior and efficient fully unsupervised pattern-based concept acquisition using an unsupervised parser
AU - Davidov, Dmitry
AU - Reichart, Roi
AU - Rappoport, Ari
PY - 2009
Y1 - 2009
N2 - Sets of lexical items sharing a significant aspect of their meaning (concepts) are fundamental for linguistics and NLP. Unsupervised concept acquisition algorithms have been shown to produce good results, and are preferable over manual preparation of concept resources, which is labor intensive, error prone and somewhat arbitrary. Some existing concept mining methods utilize supervised language-specific modules such as POS taggers and computationally intensive parsers. In this paper we present an efficient fully unsupervised concept acquisition algorithm that uses syntactic information obtained from a fully unsupervised parser. Our algorithm incorporates the bracketings induced by the parser into the meta-patterns used by a symmetric patterns and graph-based concept discovery algorithm. We evaluate our algorithm on very large corpora in English and Russian, using both human judgments and WordNet-based evaluation. Using similar settings as the leading fully unsupervised previous work, we show a significant improvement in concept quality and in the extraction of multiword expressions. Our method is the first to use fully unsupervised parsing for unsupervised concept discovery, and requires no language-specific tools or pattern/word seeds.
AB - Sets of lexical items sharing a significant aspect of their meaning (concepts) are fundamental for linguistics and NLP. Unsupervised concept acquisition algorithms have been shown to produce good results, and are preferable over manual preparation of concept resources, which is labor intensive, error prone and somewhat arbitrary. Some existing concept mining methods utilize supervised language-specific modules such as POS taggers and computationally intensive parsers. In this paper we present an efficient fully unsupervised concept acquisition algorithm that uses syntactic information obtained from a fully unsupervised parser. Our algorithm incorporates the bracketings induced by the parser into the meta-patterns used by a symmetric patterns and graph-based concept discovery algorithm. We evaluate our algorithm on very large corpora in English and Russian, using both human judgments and WordNet-based evaluation. Using similar settings as the leading fully unsupervised previous work, we show a significant improvement in concept quality and in the extraction of multiword expressions. Our method is the first to use fully unsupervised parsing for unsupervised concept discovery, and requires no language-specific tools or pattern/word seeds.
UR - http://www.scopus.com/inward/record.url?scp=84862283510&partnerID=8YFLogxK
U2 - 10.3115/1596374.1596386
DO - 10.3115/1596374.1596386
M3 - Conference contribution
AN - SCOPUS:84862283510
SN - 1932432299
SN - 9781932432299
T3 - CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning
SP - 48
EP - 56
BT - CoNLL 2009 - Proceedings of the Thirteenth Conference on Computational Natural Language Learning
PB - Association for Computational Linguistics (ACL)
T2 - 13th Conference on Computational Natural Language Learning, CoNLL 2009
Y2 - 4 June 2009 through 5 June 2009
ER -