TY - CONF
T1 - Learning Distance Functions using Equivalence Relations
AU - Bar-Hillel, Aharon
AU - Hertz, Tomer
AU - Shental, Noam
AU - Weinshall, Daphna
PY - 2003
Y1 - 2003
N2 - We address the problem of learning distance metrics using side information in the form of groups of "similar" points. We propose to use the RCA algorithm, which is a simple and efficient algorithm for learning a full-rank Mahalanobis metric (Shental et al., 2002). We first show that RCA obtains the solution to an interesting optimization problem, founded on an information-theoretic basis. If the Mahalanobis matrix is allowed to be singular, we show that Fisher's linear discriminant followed by RCA is the optimal dimensionality reduction algorithm under the same criterion. We then show how this optimization problem is related to the criterion optimized by another recent algorithm for metric learning (Xing et al., 2002), which uses the same kind of side information. We empirically demonstrate that learning a distance metric using the RCA algorithm significantly improves clustering performance, similarly to the alternative algorithm. Since the RCA algorithm is much more efficient and cost-effective than the alternative, as it only uses closed-form expressions of the data, it seems like a preferable choice for the learning of full-rank Mahalanobis distances.
KW - Clustering
KW - Feature selection
KW - Learning from partial knowledge
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=1942517347&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:1942517347
SN - 1577351894
T3 - Proceedings, Twentieth International Conference on Machine Learning
SP - 11
EP - 18
BT - Proceedings, Twentieth International Conference on Machine Learning
A2 - Fawcett, T.
A2 - Mishra, N.
T2 - Proceedings, Twentieth International Conference on Machine Learning
Y2 - 21 August 2003 through 24 August 2003
ER -