TY - JOUR
T1 - Polynomial-time approximation schemes for geometric Min-Sum median clustering
AU - Ostrovsky, Rafail
AU - Rabani, Yuval
PY - 2002
Y1 - 2002
AB - The Johnson-Lindenstrauss lemma states that n points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n)-dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings. More specifically, we address the following clustering problem. Given n points in a larger set (e.g., ℝ^d) endowed with a distance function (e.g., L2 distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center," so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for k = 2. We give polynomial-time approximation schemes for this problem in several settings, including the binary cube {0, 1}^d with Hamming distance, and ℝ^d with L1 distance, with L2 distance, or with the square of L2 distance. In all these settings, the best previous results were constant-factor approximation guarantees. We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed-dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input.
UR - http://www.scopus.com/inward/record.url?scp=0037709221&partnerID=8YFLogxK
U2 - 10.1145/506147.506149
DO - 10.1145/506147.506149
M3 - Article
AN - SCOPUS:0037709221
SN - 0004-5411
VL - 49
SP - 139
EP - 156
JO - Journal of the ACM
JF - Journal of the ACM
IS - 2
ER -