TY - JOUR
T1 - Rich probabilistic models for gene expression
AU - Segal, Eran
AU - Taskar, Ben
AU - Gasch, Audrey
AU - Friedman, Nir
AU - Koller, Daphne
PY - 2001
Y1 - 2001
AB - Clustering is commonly used for analyzing gene expression data. Despite their successes, clustering methods suffer from a number of limitations. First, these methods reveal similarities that exist over all of the measurements, while obscuring relationships that exist over only a subset of the data. Second, clustering methods cannot readily incorporate additional types of information, such as clinical data or known attributes of genes. To circumvent these shortcomings, we propose the use of a single coherent probabilistic model that encompasses much of the rich structure in the genomic expression data, while incorporating additional information such as experiment type, putative binding sites, or functional information. We show how this model can be learned from the data, allowing us to discover patterns in the data and dependencies between the gene expression patterns and additional attributes. The learned model reveals context-specific relationships that exist only over a subset of the experiments in the dataset. We demonstrate the power of our approach on synthetic data and on two real-world gene expression data sets for yeast. For example, we demonstrate a novel functionality that falls naturally out of our framework: predicting the "cluster" of the array resulting from a gene mutation based only on the gene's expression pattern in the context of other mutations.
UR - http://www.scopus.com/inward/record.url?scp=0035237805&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/17.suppl_1.S243
DO - 10.1093/bioinformatics/17.suppl_1.S243
M3 - Article
C2 - 11473015
AN - SCOPUS:0035237805
SN - 1367-4803
VL - 17
SP - S243-S252
JO - Bioinformatics
JF - Bioinformatics
IS - SUPPL. 1
ER -