A diverse Dirichlet process ensemble for unsupervised induction of syntactic categories

Roi Reichart*, Gal Elidan, Ari Rappoport

*Corresponding author for this work

Research output: Contribution to conference > Paper > peer-review


Abstract

We address the problem of unsupervised tagging of phrase structure trees with phrase categories (parse tree nonterminals). Motivated by the inability of a range of direct clustering approaches to improve over the current leading algorithm, we propose a mixture-of-experts approach. In particular, we tackle the difficult challenge of producing a diverse collection of useful tagging experts, which can then be aggregated into a final high-quality tagging. To do so, we use the particular properties of the Dirichlet Process mixture model. We evaluate on English, German and Chinese corpora and demonstrate a substantial and consistent improvement in overall performance over previous work, and we provide empirical justification for our algorithmic choices.

Original language: English
Pages: 2307-2324
Number of pages: 18
State: Published - 2012
Event: 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India
Duration: 8 Dec 2012 to 15 Dec 2012

Conference

Conference: 24th International Conference on Computational Linguistics, COLING 2012
Country/Territory: India
City: Mumbai
Period: 8/12/12 to 15/12/12

Keywords

  • Dirichlet process
  • Ensemble learning
  • Grammar induction
  • Non terminals
  • Unsupervised parsing
