TY - JOUR
T1 - The effectiveness of Lloyd-type methods for the k-means problem
AU - Ostrovsky, Rafail
AU - Rabani, Yuval
AU - Schulman, Leonard J.
AU - Swamy, Chaitanya
PY - 2012/12
Y1 - 2012/12
N2 - We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
AB - We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
KW - Approximation algorithms
KW - Randomized algorithms
UR - http://www.scopus.com/inward/record.url?scp=84872469320&partnerID=8YFLogxK
U2 - 10.1145/2395116.2395117
DO - 10.1145/2395116.2395117
M3 - Article
AN - SCOPUS:84872469320
SN - 0004-5411
VL - 59
JO - Journal of the ACM
JF - Journal of the ACM
IS - 6
M1 - 28
ER -