TY - GEN
T1 - The effectiveness of Lloyd-type methods for the k-means problem
AU - Ostrovsky, Rafail
AU - Rabani, Yuval
AU - Schulman, Leonard J.
AU - Swamy, Chaitanya
PY - 2006
Y1 - 2006
N2 - We investigate variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
AB - We investigate variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
UR - http://www.scopus.com/inward/record.url?scp=35348899361&partnerID=8YFLogxK
U2 - 10.1109/FOCS.2006.75
DO - 10.1109/FOCS.2006.75
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:35348899361
SN - 0769527205
SN - 9780769527208
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 165
EP - 174
BT - 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006
T2 - 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006
Y2 - 21 October 2006 through 24 October 2006
ER -