Abstract
We present a method for identifying corresponding themes across several corpora that focus on related but distinct domains. We approach this task by simultaneously clustering keyword sets extracted from the analyzed corpora. Our algorithm extends the information-bottleneck soft clustering method to a setting that involves several datasets. Experiments with topical corpora reveal similar aspects of three distinct religions. We evaluate the results by comparing them to clusters constructed manually by an expert.
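
For readers unfamiliar with the base technique, the following is a minimal sketch of the standard single-dataset iterative information-bottleneck updates that the abstract says the algorithm extends. It is not the paper's multi-corpus algorithm, which the abstract does not specify. The function names (`kl_divergence`, `ib_soft_cluster`), the parameters `beta`, `n_iter`, and `seed`, and the toy joint distribution are all illustrative assumptions and do not come from the paper.

```python
import math
import random


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is floored at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def ib_soft_cluster(joint, n_clusters, beta, n_iter=200, seed=0):
    """Standard iterative information-bottleneck updates (single dataset).

    joint[x][y] is p(x, y); it must sum to 1 and every row must be non-empty.
    Returns the soft assignments p(t | x), one row per element x.
    """
    rng = random.Random(seed)
    n_x, n_y = len(joint), len(joint[0])
    p_x = [sum(row) for row in joint]
    p_y_x = [[v / p_x[x] for v in joint[x]] for x in range(n_x)]

    # Random soft initialisation of p(t | x).
    p_t_x = []
    for _ in range(n_x):
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(w)
        p_t_x.append([v / total for v in w])

    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t | x), floored so the log below is defined.
        p_t = [max(sum(p_x[x] * p_t_x[x][t] for x in range(n_x)), 1e-300)
               for t in range(n_clusters)]
        # p(y | t) = sum_x p(x) p(t | x) p(y | x) / p(t)
        p_y_t = [[sum(p_x[x] * p_t_x[x][t] * p_y_x[x][y] for x in range(n_x)) / p_t[t]
                  for y in range(n_y)]
                 for t in range(n_clusters)]
        # p(t | x) is proportional to p(t) * exp(-beta * KL(p(y | x) || p(y | t))),
        # computed in log space and normalised per x.
        updated = []
        for x in range(n_x):
            logits = [math.log(p_t[t]) - beta * kl_divergence(p_y_x[x], p_y_t[t])
                      for t in range(n_clusters)]
            top = max(logits)
            w = [math.exp(v - top) for v in logits]
            total = sum(w)
            updated.append([v / total for v in w])
        p_t_x = updated
    return p_t_x


# Hypothetical toy data: four "keywords" (rows) over four context features
# (columns). Rows 0-1 and rows 2-3 have similar feature profiles.
toy_conditionals = [
    [0.45, 0.45, 0.05, 0.05],
    [0.40, 0.50, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.45],
    [0.05, 0.05, 0.50, 0.40],
]
toy_joint = [[0.25 * v for v in row] for row in toy_conditionals]
soft = ib_soft_cluster(toy_joint, n_clusters=2, beta=10.0)
hard = [row.index(max(row)) for row in soft]
```

Here `beta` trades compression against preserved information: small values collapse all elements into one undifferentiated cluster, while larger values push the soft assignments toward a hard partition. The updates can settle in a local optimum, so in practice several random initialisations would likely be compared.
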
Original language | English |
---|---|
Journal | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
State | Published - 2002 |
Event | 6th Conference on Natural Language Learning, CoNLL 2002 - Taipei, Taiwan, Province of China |
Duration | 24 Aug 2002 → 1 Sep 2002 |
Bibliographical note
Publisher Copyright: © 2002 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All Rights Reserved.