Abstract
Summary: We introduce a novel unsupervised approach for the organization and visualization of multidimensional data. At the heart of the method is a presentation of the full pairwise distance matrix of the data points, viewed in pseudocolor. The ordering of points is iteratively permuted in search of a linear ordering, which can be used to study embedded shapes. Several examples indicate how the shapes of certain structures in the data (elongated, circular and compact) manifest themselves visually in our permuted distance matrix. It is important to identify the elongated objects since they are often associated with a set of hidden variables, underlying continuous variation in the data. The problem of determining an optimal linear ordering is shown to be NP-Complete, and therefore an iterative search algorithm with O(n³) step-complexity is suggested. By using Sorting Points Into Neighborhoods (SPIN) to analyze colon cancer expression data, we were able to address the serious problem of sample heterogeneity, which hinders identification of metastasis-related genes in our data. Our methodology brings to light the continuous variation of heterogeneity, starting with homogeneous tumor samples and gradually increasing the amount of another tissue. Ordering the samples according to their degree of contamination by unrelated tissue allows the separation of genes associated with irrelevant contamination from those related to cancer progression.
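The idea of permuting a pairwise distance matrix so that an elongated structure becomes visible can be illustrated with a small sketch. The code below is not the SPIN algorithm from the paper. It substitutes a simple greedy nearest-neighbour chain as a stand-in ordering heuristic, and the toy data, the function names (`distance_matrix`, `greedy_order`, `permute`, `path_length`) and the path-length score are all assumptions made for illustration only.

```python
import math
import random


def distance_matrix(points):
    """Full pairwise Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def greedy_order(dist):
    """Greedy nearest-neighbour chain (a stand-in heuristic, not SPIN itself)."""
    n = len(dist)
    # Start at one end of the most distant pair, a likely tip of an elongated shape.
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Append the unvisited point closest to the current end of the chain.
        nxt = min(remaining, key=lambda j: dist[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def permute(dist, order):
    """Apply the same permutation to the rows and columns of the matrix."""
    return [[dist[i][j] for j in order] for i in order]


def path_length(dist, order):
    """Sum of distances between consecutive points in an ordering."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


# Hypothetical toy data: points on a straight segment, given in shuffled order.
rng = random.Random(0)
ts = [i / 19 for i in range(20)]
rng.shuffle(ts)
points = [(t, 2.0 * t) for t in ts]

D = distance_matrix(points)
order = greedy_order(D)
D_sorted = permute(D, order)
```

For this collinear toy set, the entries of `D_sorted` grow in proportion to the distance from the diagonal, which is the kind of banded pattern an elongated structure would be expected to produce in a pseudocolor view. A greedy chain is cheap but can fail on noisy or branching data, which is one reason an iterative search over permutations is a more natural fit for real expression data.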
Original language | English |
---|---|
Pages (from-to) | 2301-2308 |
Number of pages | 8 |
Journal | Bioinformatics |
Volume | 21 |
Issue number | 10 |
State | Published - 15 May 2005 |
Externally published | Yes |
Bibliographical note
Funding Information: This work was supported by the NIH under grant #5 P01 CA 65930-06. We thank P.B. Paty and W.L. Gerald for preparation of the colon cancer samples and acknowledge the use of the Gene Expression Core Facility of the Cancer Institute of New Jersey. We acknowledge the partial support by an EC Research Training Network (STIPCO), by the Ridgefield Foundation and by EC FP6 funding. This publication reflects the author’s views and not necessarily those of the EC. The Community is not liable for any use that may be made of the information contained herein. We thank U. Feige, I. Kanter, A. Natanzon, Y. Pilpel and R. Raz for useful discussions and their comments.