TY - JOUR
T1 - Connecting two (or less) dots
T2 - Discovering structure in news articles
AU - Shahaf, Dafna
AU - Guestrin, Carlos
PY - 2012/2
Y1 - 2012/2
N2 - Finding information is becoming a major part of our daily life. Entire sectors, from Web users to scientists and intelligence analysts, are increasingly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this article, we investigate methods for automatically connecting the dots, providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the health care debate (2009). We formalize the characteristics of a good chain and provide a fast search-driven algorithm to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. We also provide a method to handle partially-specified endpoints, for users who do not know both ends of a story. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate that the objective we propose captures the users' intuitive notion of coherence, and that our algorithm effectively helps users understand the news.
AB - Finding information is becoming a major part of our daily life. Entire sectors, from Web users to scientists and intelligence analysts, are increasingly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this article, we investigate methods for automatically connecting the dots, providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the health care debate (2009). We formalize the characteristics of a good chain and provide a fast search-driven algorithm to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. We also provide a method to handle partially-specified endpoints, for users who do not know both ends of a story. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate that the objective we propose captures the users' intuitive notion of coherence, and that our algorithm effectively helps users understand the news.
KW - Coherence
UR - http://www.scopus.com/inward/record.url?scp=84857764188&partnerID=8YFLogxK
U2 - 10.1145/2086737.2086744
DO - 10.1145/2086737.2086744
M3 - Article
AN - SCOPUS:84857764188
SN - 1556-4681
VL - 5
JO - ACM Transactions on Knowledge Discovery from Data
JF - ACM Transactions on Knowledge Discovery from Data
IS - 4
M1 - 24
ER -