Clustering for unsupervised relation identification

Benjamin Rosenfeld*, Ronen Feldman

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

59 Scopus citations

Abstract

Unsupervised Relation Identification is the task of automatically discovering interesting relations between entities in a large text corpus. Relations are identified by clustering frequently co-occurring pairs of entities so that pairs occurring in similar contexts end up in the same cluster. In this paper we compare several clustering setups, some novel and others previously tried. The setups include feature extraction and selection methods and clustering algorithms. To carry out the comparison, we develop a clustering evaluation metric specifically adapted to the relation identification task. Our experiments demonstrate that single-linkage hierarchical clustering with a novel threshold selection technique significantly outperforms the other tested clustering algorithms. The experiments also indicate that successful relation identification requires rich, complex features of two kinds: features that test both relation slots together ("relation features"), and features that test only one slot each ("entity features"). We have found that using both kinds of features with the best of the algorithms produces very high-precision results, significantly improving on previous work.
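The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the entity pairs, the context features, the cosine similarity measure, and the fixed threshold of 0.5 are all assumptions made for illustration. In particular, the paper's threshold selection technique is not described in the abstract and is not reproduced here, and the toy features below are context-style features only. The sketch relies on the fact that cutting a single-linkage hierarchy at a similarity threshold gives the connected components of the graph that links any two items whose similarity reaches that threshold.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def single_linkage(contexts, threshold):
    """Cut a single-linkage hierarchy at a fixed similarity threshold.

    Implemented as connected components (union-find) over the graph that
    links two entity pairs when their similarity is at least `threshold`.
    """
    parent = {pair: pair for pair in contexts}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(contexts, 2):
        if cosine(contexts[p], contexts[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for pair in contexts:
        clusters.setdefault(find(pair), set()).add(pair)
    return list(clusters.values())


# Hypothetical toy data: each entity pair maps to counts of the
# context patterns it was seen with. None of this comes from the paper.
contexts = {
    ("Oracle", "PeopleSoft"): Counter({"acquired": 3, "bought": 1}),
    ("Google", "YouTube"): Counter({"acquired": 2, "deal": 1}),
    ("Bill Gates", "Microsoft"): Counter({"chairman of": 2, "founder of": 1}),
    ("Steve Jobs", "Apple"): Counter({"chairman of": 1, "founder of": 2}),
}

# The threshold value is an arbitrary assumption for this toy example.
clusters = single_linkage(contexts, threshold=0.5)
```

On this toy input the two acquisition-style pairs share one cluster and the two founder-style pairs share another, because pairs within each group have overlapping context features while pairs across groups have none.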

Original language: American English
Title of host publication: CIKM 2007 - Proceedings of the 16th ACM Conference on Information and Knowledge Management
Pages: 411-418
Number of pages: 8
State: Published - 2007
Event: 16th ACM Conference on Information and Knowledge Management, CIKM 2007 - Lisboa, Portugal
Duration: 6 Nov 2007 → 9 Nov 2007

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 16th ACM Conference on Information and Knowledge Management, CIKM 2007
Country/Territory: Portugal
City: Lisboa
Period: 6/11/07 → 9/11/07

Keywords

  • Clustering
  • Information extraction
  • Relation learning
  • Unsupervised relation identification
