Sorting points into neighborhoods (SPIN): Data analysis and visualization by ordering distance matrices

D. Tsafrir*, I. Tsafrir, L. Ein-Dor, O. Zuk, D. A. Notterman, E. Domany

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

120 Scopus citations

Abstract

Summary: We introduce a novel unsupervised approach for the organization and visualization of multidimensional data. At the heart of the method is a presentation of the full pairwise distance matrix of the data points, viewed in pseudocolor. The ordering of points is iteratively permuted in search of a linear ordering, which can be used to study embedded shapes. Several examples indicate how the shapes of certain structures in the data (elongated, circular and compact) manifest themselves visually in our permuted distance matrix. It is important to identify the elongated objects, since they are often associated with a set of hidden variables underlying continuous variation in the data. The problem of determining an optimal linear ordering is shown to be NP-complete, and therefore an iterative search algorithm with O(n^3) step complexity is suggested. By using sorting points into neighborhoods (SPIN) to analyze colon cancer expression data, we were able to address the serious problem of sample heterogeneity, which hinders identification of metastasis-related genes in our data. Our methodology brings to light the continuous variation of heterogeneity: starting with homogeneous tumor samples, the amount of another tissue gradually increases. Ordering the samples according to their degree of contamination by unrelated tissue allows the separation of genes associated with irrelevant contamination from those related to cancer progression.
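The iterative search over orderings can be illustrated with a minimal Python sketch. It assumes NumPy and SciPy, an energy that sums the permuted distances weighted by a Gaussian-shaped matrix W (so small distances are rewarded near the diagonal), a hypothetical width parameter `sigma`, and an update step that solves a linear assignment problem. The names `neighborhood_weights`, `energy` and `spin_like_order` are hypothetical, and this is a sketch under those assumptions, not the authors' exact SPIN implementation. Each iteration costs O(n^3): one matrix product plus one assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def neighborhood_weights(n, sigma):
    """Assumed weight matrix W[k, l] = exp(-(k - l)^2 / sigma).

    Large weights near the diagonal reward placing close points at nearby positions.
    """
    idx = np.arange(n)
    return np.exp(-((idx[:, None] - idx[None, :]) ** 2) / sigma)


def energy(D, order, W):
    """E(order) = sum over k, l of D[order[k], order[l]] * W[k, l]."""
    return float(np.sum(D[np.ix_(order, order)] * W))


def spin_like_order(D, sigma=None, max_iter=100):
    """Iteratively permute the points, starting from the input order.

    Returns (best_order, best_energy); best_order[k] is the point at position k.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if sigma is None:
        sigma = float(n)  # hypothetical default width, not taken from the paper
    W = neighborhood_weights(n, sigma)
    order = np.arange(n)
    best_order, best_E = order.copy(), energy(D, order, W)
    seen = {tuple(order.tolist())}
    for _ in range(max_iter):
        # C[i, k]: cost of putting point i at position k while the other index
        # of the energy keeps the current order (one O(n^3) matrix product).
        C = D[:, order] @ W.T
        # Linear assignment (O(n^3)): one point per position, minimal total cost.
        points, positions = linear_sum_assignment(C)
        new_order = np.empty(n, dtype=int)
        new_order[positions] = points
        key = tuple(new_order.tolist())
        if key in seen:  # fixed point or cycle: stop searching
            break
        seen.add(key)
        order = new_order
        E = energy(D, order, W)
        # The update is not guaranteed to lower E, so keep the best order seen.
        if E < best_E:
            best_order, best_E = order.copy(), E
    return best_order, best_E


if __name__ == "__main__":
    # Points on a line, presented in shuffled order.
    rng = np.random.default_rng(1)
    x = rng.permutation(20).astype(float)
    D = np.abs(x[:, None] - x[None, :])
    order, E = spin_like_order(D)
    print(x[order])  # coordinates in the recovered ordering
    print(E)
```

Plotting `D[np.ix_(order, order)]` as an image gives the kind of permuted, pseudocolor distance matrix the abstract describes; the weight shape and `sigma` are illustrative choices and control how local the resulting neighborhoods are.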

Original language: English
Pages (from-to): 2301-2308
Number of pages: 8
Journal: Bioinformatics
Volume: 21
Issue number: 10
State: Published - 15 May 2005
Externally published: Yes

Bibliographical note

Funding Information:
This work was supported by the NIH under grant #5 P01 CA 65930-06. We thank P.B. Paty and W.L. Gerald for preparation of the colon cancer samples and acknowledge the use of the Gene Expression Core Facility of the Cancer Institute of New Jersey. We acknowledge the partial support by an EC Research Training Network (STIPCO), by the Ridgefield Foundation and by EC FP6 funding. This publication reflects the author’s views and not necessarily those of the EC. The Community is not liable for any use that may be made of the information contained herein. We thank U. Feige, I. Kanter, A. Natanzon, Y. Pilpel and R. Raz for useful discussions and their comments.

