Abstract
This paper proposes a new paradigm and a computational framework for revealing equivalencies (analogies) between substructures of distinct composite systems that are initially represented by unstructured data sets. For this purpose, we introduce and investigate a variant of traditional data clustering, termed coupled clustering, which outputs a configuration of corresponding subsets of two such representative sets. We apply our method to synthetic as well as textual data. Its performance in detecting topical correspondences between textual corpora is evaluated through comparison with human performance.
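To make the notion of "corresponding subsets of two sets" concrete, here is a minimal Python sketch. It is not the optimization procedure used in the paper. It rests on several assumptions. The input is a matrix `S` of cross-set similarities, where `S[a, b]` scores element `a` of a first set A against element `b` of a second set B. The number `k` of coupled pairs is fixed in advance. The objective is a simple hypothetical one: each element joins the pair whose opposite side it is, on average, most similar to. The function name `coupled_clustering` and its parameters are illustrative only.

```python
import numpy as np


def coupled_clustering(S, k, max_iter=100):
    """Hypothetical sketch: split A (rows of S) and B (columns of S) into k
    coupled pairs (A_j, B_j) using only the cross-set similarities S[a, b].

    Returns label arrays la, lb: la[a] == j and lb[b] == j mean that a and b
    belong to the j-th coupled pair. This is a local search, not the paper's
    method, and it may stop at a poor (even degenerate) configuration.
    """
    S = np.asarray(S, dtype=float)
    n_a, n_b = S.shape
    lb = np.arange(n_b) % k          # deterministic round-robin start for B
    la = np.zeros(n_a, dtype=int)    # placeholder; overwritten on first pass

    def assign(sim, other_labels):
        # sim[i, m] is the similarity of item i (this side) to item m (other
        # side); score each pair j by the mean similarity to its other side.
        scores = np.full((sim.shape[0], k), -np.inf)
        for j in range(k):
            members = other_labels == j
            if members.any():        # an empty side cannot attract items
                scores[:, j] = sim[:, members].mean(axis=1)
        return scores.argmax(axis=1)

    for _ in range(max_iter):
        new_la = assign(S, lb)        # update A given the current B clusters
        new_lb = assign(S.T, new_la)  # update B given the updated A clusters
        if np.array_equal(new_la, la) and np.array_equal(new_lb, lb):
            break                     # no element moved: a fixed point
        la, lb = new_la, new_lb
    return la, lb


# Toy example: a0, a1 resemble b0-b2, while a2, a3 resemble b3-b5.
S = np.zeros((4, 6))
S[:2, :3] = 1.0
S[2:, 3:] = 1.0
la, lb = coupled_clustering(S, k=2)
print(la, lb)  # -> [0 0 1 1] [0 0 0 1 1 1]
```

The two sides are updated alternately, in the spirit of k-means reassignment. For that reason the outcome depends on the starting partition, and a different start can produce a different configuration.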
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 747-780 |
| Number of pages | 34 |
| Journal | Journal of Machine Learning Research |
| Volume | 3 |
| Issue number | 4-5 |
| State | Published, 15 May 2003 |
Keywords
- Clustering
- Data mining in texts
- Natural language processing
- Structure mapping
- Unsupervised learning