Symmetric pattern based word embeddings for improved word similarity prediction

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

97 Scopus citations

Abstract

We present a novel word level vector representation based on symmetric patterns (SPs). For this aim we automatically acquire SPs (e.g., “X and Y”) from a large corpus of plain text, and generate vectors where each coordinate represents the co-occurrence in SPs of the represented word with another word of the vocabulary. Our representation has three advantages over existing alternatives: First, being based on symmetric word relationships, it is highly suitable for word similarity prediction. Particularly, on the SimLex999 word similarity dataset, our model achieves a Spearman’s ρ score of 0.517, compared to 0.462 of the state-of-the-art word2vec model. Interestingly, our model performs exceptionally well on verbs, outperforming state-of-the-art baselines by 20.2–41.5%. Second, pattern features can be adapted to the needs of a target NLP application. For example, we show that we can easily control whether the embeddings derived from SPs deem antonym pairs (e.g. (big,small)) as similar or dissimilar, an important distinction for tasks such as word classification and sentiment analysis. Finally, we show that a simple combination of the word similarity scores generated by our method and by word2vec results in a superior predictive power over that of each individual model, scoring as high as 0.563 in Spearman’s ρ on SimLex999. This emphasizes the differences between the signals captured by each of the models.
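The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two hard-coded patterns ("X and Y", "X or Y") in place of automatically acquired SPs, raw co-occurrence counts in place of whatever weighting the paper applies, and cosine similarity as the comparison measure. The names `build_sp_vectors` and `cosine` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: two fixed connectives stand in for automatically acquired SPs.
CONNECTIVES = {"and", "or"}


def build_sp_vectors(sentences):
    """Map each word to a sparse vector of SP co-occurrence counts."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i in range(1, len(tokens) - 1):
            if tokens[i] in CONNECTIVES:
                x, y = tokens[i - 1], tokens[i + 1]
                # The pattern is symmetric, so count both directions.
                vectors[x][y] += 1
                vectors[y][x] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    corpus = ["cats and dogs", "cats and birds", "dogs and birds"]
    vecs = build_sp_vectors(corpus)
    print(cosine(vecs["cats"], vecs["dogs"]))
```

Each coordinate of a word's vector counts how often that word appears with another vocabulary word inside a pattern instance, so two words score as similar when they share pattern neighbours.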

Original language: English
Title of host publication: CoNLL 2015 - 19th Conference on Computational Natural Language Learning, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 258-267
Number of pages: 10
ISBN (Electronic): 9781941643778
State: Published - 2015
Event: 19th Conference on Computational Natural Language Learning, CoNLL 2015 - Beijing, China
Duration: 30 Jul 2015 to 31 Jul 2015

Publication series

Name: CoNLL 2015 - 19th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 19th Conference on Computational Natural Language Learning, CoNLL 2015
Country/Territory: China
City: Beijing
Period: 30/07/15 to 31/07/15

Bibliographical note

Publisher Copyright:
© 2015 Association for Computational Linguistics.
