Multi-Word Expression identification using sentence surface features

Ram Boukobza*, Ari Rappoport

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

19 Scopus citations

Abstract

Much NLP research on Multi-Word Expressions (MWEs) focuses on the discovery of new expressions, as opposed to the identification in texts of known expressions. However, MWE identification is not trivial because many expressions allow variation in form and differ in the range of variations they allow. We show that simple rule-based baselines do not perform identification satisfactorily, and present a supervised learning method for identification that uses sentence surface features based on expressions' canonical form. To evaluate the method, we have annotated 3350 sentences from the British National Corpus, containing potential uses of 24 verbal MWEs. The method achieves an F-score of 94.86%, compared with 80.70% for the leading rule-based baseline. Our method is easily applicable to any expression type. Experiments in previous research have been limited to the compositional/non-compositional distinction, while we also test on sentences in which the words comprising the MWE appear but not as an expression.
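To illustrate why a simple rule falls short, here is a minimal sketch of one possible rule-based matcher: it flags a sentence when the expression's words occur in canonical order with a bounded gap between them. This is an assumption for illustration, not necessarily one of the baselines evaluated in the paper. The function name `matches_in_order`, the `max_gap` parameter, the expression "spill the beans" and the example sentences are all hypothetical and are not taken from the paper's 24 expressions or its annotated data.

```python
import re


def matches_in_order(sentence, mwe_words, max_gap=3):
    """Hypothetical rule: True if mwe_words occur in order in the sentence,
    with at most max_gap other tokens between consecutive MWE words.
    Exact word forms are compared; no lemmatization is attempted."""
    tokens = re.findall(r"[a-z']+", sentence.lower())

    def search(start, k, limit):
        # All MWE words have been matched.
        if k == len(mwe_words):
            return True
        # The first word may appear anywhere; later words must fall in the gap window.
        end = len(tokens) if limit is None else min(len(tokens), start + limit + 1)
        for i in range(start, end):
            if tokens[i] == mwe_words[k] and search(i + 1, k + 1, max_gap):
                return True
        return False

    return search(0, 0, None)


mwe = ["spilled", "the", "beans"]
idiomatic = matches_in_order("He spilled the beans about the merger.", mwe)
literal = matches_in_order("She spilled the beans on the kitchen floor.", mwe)
passive = matches_in_order("The beans were finally spilled.", mwe)
```

Under this rule the idiomatic and the literal sentence are both flagged, while the passive variant is missed. These are the two failure types the abstract points to: words that appear but not as an expression, and legitimate variation in form.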

Original language: English
Pages: 468-477
Number of pages: 10
State: Published - 2009
Event: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009 - Singapore, Singapore
Duration: 6 Aug 2009 to 7 Aug 2009

Conference

Conference: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009
Country/Territory: Singapore
City: Singapore
Period: 6/08/09 to 7/08/09
