Class discovery in gene expression data

A. Ben-Dor, N. Friedman*, Z. Yakhini

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

83 Scopus citations


Recent studies (Alizadeh et al., [1]; Bittner et al., [5]; Golub et al., [11]) demonstrate the discovery of putative disease subtypes from gene expression data. The underlying computational problem is to partition the set of sample tissues into statistically meaningful classes. In this paper we present a novel approach to class discovery and develop automatic analysis methods. Our approach is based on statistically scoring candidate partitions according to the overabundance of genes that separate the different classes. Indeed, in biological datasets, an overabundance of genes separating known classes is typically observed. We measure overabundance against a stochastic null model. This allows for highlighting subtle, yet meaningful, partitions that are supported on a small subset of the genes. Using simulated annealing, we explore the space of all possible partitions of the set of samples, seeking partitions with statistically significant overabundance of differentially expressed genes. We demonstrate the performance of our methods on synthetic data, where we recover planted partitions. Finally, we turn to tumor expression datasets, and show that we find several highly pronounced partitions.
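The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-gene separation test (a Welch-style t statistic with a fixed cutoff), the binomial null with a fixed per-gene rate `p_null`, the restriction to two classes, and all function names and parameter values are assumptions chosen only to keep the example short.

```python
import math
import random


def count_separating(expr, labels, threshold=4.0, min_size=2):
    """Count genes whose |Welch t| between the two classes exceeds threshold.

    expr is a list of genes, each a list of expression values per sample.
    The t statistic and cutoff are illustrative stand-ins for a gene score.
    """
    if min(labels.count(0), labels.count(1)) < min_size:
        return 0  # degenerate partitions get the lowest score
    count = 0
    for gene in expr:
        a = [v for v, lab in zip(gene, labels) if lab == 0]
        b = [v for v, lab in zip(gene, labels) if lab == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
        vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
        se = math.sqrt(va / len(a) + vb / len(b))
        if (se == 0 and ma != mb) or (se > 0 and abs(ma - mb) / se > threshold):
            count += 1
    return count


def surprise(count, n_genes, p_null):
    """-log10 of P(Binomial(n_genes, p_null) >= count).

    Assumes each gene separates a random partition independently with
    probability p_null; this is a simplified, hypothetical null model.
    """
    tail = sum(
        math.comb(n_genes, k) * p_null**k * (1 - p_null) ** (n_genes - k)
        for k in range(count, n_genes + 1)
    )
    return -math.log10(max(tail, 1e-300))


def anneal(expr, n_samples, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Simulated annealing over binary partitions using single-sample flips."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    score = count_separating(expr, labels)
    best, best_score = labels[:], score
    temp = t0
    for _ in range(steps):
        i = rng.randrange(n_samples)
        labels[i] ^= 1  # propose moving one sample to the other class
        new = count_separating(expr, labels)
        if new >= score or rng.random() < math.exp((new - score) / temp):
            score = new  # accept the move
            if score > best_score:
                best, best_score = labels[:], score
        else:
            labels[i] ^= 1  # reject: undo the flip
        temp *= cooling
    return best, best_score
```

A call such as `anneal(expr, n_samples)` returns the best partition visited and its count of separating genes; `surprise(count, len(expr), p_null)` then turns that count into a significance-style score under the assumed null. Because the count changes abruptly between neighboring partitions, recovery of a planted partition is not guaranteed by this sketch and depends on the cutoff, the cooling schedule, and the data.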

Original language: American English
Number of pages: 8
State: Published - 2001
Event: 5th Annual International Conference on Computational Biology - Montreal, Que., Canada
Duration: 22 May 2001 to 26 May 2001


Conference: 5th Annual International Conference on Computational Biology
City: Montreal, Que.


