The effectiveness of Lloyd-type methods for the k-means problem

Rafail Ostrovsky*, Yuval Rabani, Leonard J. Schulman, Chaitanya Swamy

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

205 Scopus citations

Abstract

We investigate variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
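
To make the idea of a seeded Lloyd-type iteration concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not specify the seeding process, so the sketch assumes a generic distance-weighted seeding (each new center is sampled with probability proportional to its squared distance from the nearest center already chosen), followed by the standard Lloyd steps of assigning points to their nearest center and recomputing centroids. The names `seed_centers`, `lloyd`, and `cost` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def seed_centers(points, k, rng):
    """Pick k starting centers by distance-weighted sampling.

    Assumption for illustration only: this is a generic squared-distance
    seeding, not necessarily the procedure analyzed in the paper.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, k, iters=20, seed=0):
    """Run Lloyd's heuristic from the seeded centers until it stabilizes."""
    rng = random.Random(seed)
    centers = seed_centers(points, k, rng)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the centroid of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (10.0,), (11.0,)]
    found = lloyd(data, k=2)
    print(sorted(found), cost(data, found))
```

On this toy input of two well-separated pairs, every possible choice of two distinct seeds leads the iteration to the centers 0.5 and 10.5, which loosely illustrates why a good starting configuration matters most when the data are well clusterable.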

Original language: English
Title of host publication: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006
Pages: 165-174
Number of pages: 10
State: Published - 2006
Externally published: Yes
Event: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006 - Berkeley, CA, United States
Duration: 21 Oct 2006 to 24 Oct 2006

Publication series

Name: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
ISSN (Print): 0272-5428

Conference

Conference: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006
Country/Territory: United States
City: Berkeley, CA
Period: 21/10/06 to 24/10/06
