Abstract
We address the problem of unsupervised tagging of phrase structure trees with phrase categories (parse tree nonterminals). Motivated by the inability of a range of direct clustering approaches to improve over the current leading algorithm, we propose a mixture-of-experts approach. In particular, we tackle the difficult challenge of producing a diverse collection of useful tagging experts, which can then be aggregated into a final high-quality tagging. To do so, we use the particular properties of the Dirichlet Process mixture model. We evaluate on English, German and Chinese corpora and demonstrate a substantial and consistent improvement in overall performance over previous work, and we provide empirical justification for our algorithmic choices.
Original language | English |
---|---|
Pages | 2307-2324 |
Number of pages | 18 |
State | Published - 2012 |
Event | 24th International Conference on Computational Linguistics, COLING 2012, Mumbai, India (8 Dec 2012 → 15 Dec 2012) |
Conference
Conference | 24th International Conference on Computational Linguistics, COLING 2012 |
---|---|
Country/Territory | India |
City | Mumbai |
Period | 8 Dec 2012 → 15 Dec 2012 |
Keywords
- Dirichlet process
- Ensemble learning
- Grammar induction
- Nonterminals
- Unsupervised parsing