TY - GEN
T1 - Latent topic models for hypertext
AU - Gruber, Amit
AU - Rosen-Zvi, Michal
AU - Weiss, Yair
PY - 2008
Y1 - 2008
N2 - Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Internet, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in an analogous fashion to how they model words - the document-link co-occurrence matrix is modeled in the same way that the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, links from a word w to a document d depend directly on how frequent the topic of w is in d, in addition to the in-degree of d. We show how to perform EM learning on this model efficiently. By not modeling links as analogous to words, we end up using far fewer free parameters and obtain better link prediction results.
AB - Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Internet, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in an analogous fashion to how they model words - the document-link co-occurrence matrix is modeled in the same way that the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, links from a word w to a document d depend directly on how frequent the topic of w is in d, in addition to the in-degree of d. We show how to perform EM learning on this model efficiently. By not modeling links as analogous to words, we end up using far fewer free parameters and obtain better link prediction results.
UR - http://www.scopus.com/inward/record.url?scp=77956221485&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:77956221485
SN - 0974903949
SN - 9780974903941
T3 - Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008
SP - 230
EP - 239
BT - Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008
T2 - 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008
Y2 - 9 July 2008 through 12 July 2008
ER -