Latent topic models for hypertext

Amit Gruber*, Michal Rosen-Zvi, Yair Weiss

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

41 Scopus citations

Abstract

Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Web, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in the same way they model words: the document-link co-occurrence matrix is modeled exactly as the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, the probability of a link from a word w to a document d depends directly on how frequently the topic of w occurs in d, as well as on the in-degree of d. We show how to perform EM learning on this model efficiently. Because links are not modeled as analogous to words, the model uses far fewer free parameters and obtains better link prediction results.
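To make the link-generation idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes the abstract's description can be read as p(d | z) proportional to lambda_d * theta[d][z], where theta[d][z] is the weight of topic z in document d and lambda_d is a per-document weight reflecting in-degree. The names `link_target_distribution`, `extra_link_parameters`, `theta`, and `in_degree_weight` are hypothetical, and the parameter counts are rough illustrative figures, not numbers reported in the paper.

```python
from typing import Dict, List, Sequence


def link_target_distribution(
    theta: Sequence[Sequence[float]],
    in_degree_weight: Sequence[float],
    topic: int,
) -> List[float]:
    """Distribution over target documents for a link emitted by a word
    whose latent topic is `topic`.

    Assumed (illustrative) form: p(d | z) is proportional to
    in_degree_weight[d] * theta[d][z], so a document is a likely target
    when topic z is frequent in it and it tends to attract many links.
    """
    scores = [w * row[topic] for row, w in zip(theta, in_degree_weight)]
    total = sum(scores)
    if total <= 0:
        raise ValueError("no document has positive weight for this topic")
    return [s / total for s in scores]


def extra_link_parameters(num_docs: int, num_topics: int) -> Dict[str, int]:
    """Hypothetical rough count of link-specific free parameters."""
    return {
        # Links treated like words: one multinomial over D targets per topic.
        "links_as_words": num_topics * (num_docs - 1),
        # Links via topic mixtures: reuse theta, add one weight per document.
        "topic_times_indegree": num_docs,
    }
```

For example, with `theta = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]`, `in_degree_weight = [1.0, 1.0, 2.0]`, and topic 0, the scores are 0.9, 0.2, and 1.0, giving probabilities of roughly 0.43, 0.10, and 0.48: the third document wins despite a moderate topic share because of its larger in-degree weight. Under the same assumptions, `extra_link_parameters(1000, 50)` contrasts 49,950 link parameters for the links-as-words design with about 1,000 for the topic-times-in-degree design, which illustrates why the abstract claims far fewer free parameters.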

Original language: English
Title of host publication: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008
Pages: 230-239
Number of pages: 10
State: Published, 2008
Event: 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008, Helsinki, Finland
Duration: 9 Jul 2008 to 12 Jul 2008

Publication series

Name: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008

Conference

Conference: 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008
Country/Territory: Finland
City: Helsinki
Period: 9/07/08 to 12/07/08
