TY - GEN
T1 - Scalable multi stage clustering of tagged micro-messages
AU - Tsur, Oren
AU - Littman, Adi
AU - Rappoport, Ari
PY - 2012
Y1 - 2012
N2 - The growing popularity of microblogging, backed by services like Twitter, Facebook, Google+ and LinkedIn, raises the challenge of clustering short and extremely sparse documents. In this work we propose SMSC, a scalable, accurate and efficient multi-stage clustering algorithm. Our algorithm leverages users' practice of adding tags to some messages by bootstrapping over virtual non-sparse documents. We experiment on a large corpus of tweets from Twitter, and evaluate results against a gold-standard classification validated by seven clustering evaluation measures (information theoretic, paired and greedy). Results show that the algorithm presented is both accurate and efficient, significantly outperforming other algorithms. Under reasonable practical assumptions, our algorithm scales up sublinearly in time.
AB - The growing popularity of microblogging, backed by services like Twitter, Facebook, Google+ and LinkedIn, raises the challenge of clustering short and extremely sparse documents. In this work we propose SMSC, a scalable, accurate and efficient multi-stage clustering algorithm. Our algorithm leverages users' practice of adding tags to some messages by bootstrapping over virtual non-sparse documents. We experiment on a large corpus of tweets from Twitter, and evaluate results against a gold-standard classification validated by seven clustering evaluation measures (information theoretic, paired and greedy). Results show that the algorithm presented is both accurate and efficient, significantly outperforming other algorithms. Under reasonable practical assumptions, our algorithm scales up sublinearly in time.
KW - Clustering
KW - Hashtags
KW - Microblogging
KW - Scalability
KW - Short documents
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=84861035731&partnerID=8YFLogxK
U2 - 10.1145/2187980.2188157
DO - 10.1145/2187980.2188157
M3 - Conference contribution
AN - SCOPUS:84861035731
SN - 9781450312301
T3 - WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion
SP - 621
EP - 622
BT - WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion
T2 - 21st Annual Conference on World Wide Web, WWW'12
Y2 - 16 April 2012 through 20 April 2012
ER -