Unsupervised concept discovery in Hebrew using simple unsupervised word prefix segmentation for Hebrew and Arabic

Elad Dinur, Dmitry Davidov, Ari Rappoport

Research output: Contribution to conference › Paper › peer-review

Abstract

Fully unsupervised pattern-based methods for discovering word categories have proven useful in several languages. Most of these methods rely on function words appearing as separate text units. However, in morphologically rich languages, in particular Semitic languages such as Hebrew and Arabic, the equivalents of such function words are usually written as morphemes attached as prefixes to other words. As a result, word-based pattern discovery methods miss them, so many useful patterns go undetected and performance deteriorates drastically. To enable high-quality lexical category acquisition, we propose a simple unsupervised word segmentation algorithm that separates these morphemes. We study the performance of the algorithm for Hebrew and Arabic, and show that it improves a state-of-the-art unsupervised concept acquisition algorithm in Hebrew.

Original language: American English
Pages: 36-44
Number of pages: 9
State: Published - 2009
Externally published: Yes
Event: EACL 2009 Workshop on Computational Approaches to Semitic Languages, SEMITIC@EACL 2009 - Athens, Greece
Duration: 31 Mar 2009 → …

Conference

Conference: EACL 2009 Workshop on Computational Approaches to Semitic Languages, SEMITIC@EACL 2009
Country/Territory: Greece
City: Athens
Period: 31/03/09 → …

Bibliographical note

Publisher Copyright:
© 2009 Association for Computational Linguistics.
