The effectiveness of Lloyd-type methods for the k-means problem

Rafail Ostrovsky*, Yuval Rabani, Leonard J. Schulman, Chaitanya Swamy

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

153 Scopus citations

Abstract

We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
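The abstract names a probabilistic seeding step followed by a Lloyd-type iteration but does not spell out the procedure. As a rough illustration only, the sketch below seeds centers by distance-squared weighted sampling (in the style of k-means++) and then runs plain Lloyd iterations. That choice of seeding is an assumption and may differ from the authors' process; the names `d2_seed`, `lloyd`, and `cost` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_seed(points, k, rng):
    """Assumed seeding: first center uniform at random, each later center
    sampled with probability proportional to its squared distance to the
    nearest center chosen so far."""
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
        # Keep, for each point, the squared distance to its nearest center.
        d2 = [min(d, sq_dist(p, centers[-1])) for d, p in zip(d2, points)]
    return centers


def lloyd(points, centers, iters=20):
    """Lloyd iteration: assign points to the nearest center, then move each
    center to the mean of its cluster. Empty clusters keep their old center."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so the iteration has converged
        centers = new_centers
    return centers


def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

A typical call would be `lloyd(points, d2_seed(points, k, random.Random(0)))` with `points` as a list of equal-length tuples. Comparing `cost` before and after the Lloyd step shows how much the iteration improves on the seeds alone.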

Original language: English
Article number: 28
Journal: Journal of the ACM
Volume: 59
Issue number: 6
State: Published - Dec 2012

Keywords

  • Approximation algorithms
  • Randomized algorithms
