Abstract
Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine two sets of gene expression data measured across sets of tumor and normal clinical samples. One set consists of 2,000 genes, measured in 62 epithelial colon samples. The second consists of ≈100,000 clones, measured in 32 ovarian samples (unpublished, extension of the data set described in [26]). We examine the use of scoring methods that measure how well individual gene expression levels separate tumors from normals. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of leave-one-out cross-validation (LOOCV) experiments on the two data sets, employing SVM, AdaBoost and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias. We demonstrate a success rate of at least 90% in tumor vs. normal classification, using sets of selected genes, both with and without members related to cellular contamination. These results are insensitive to the exact selection mechanism, over a certain range.
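The workflow summarized in the abstract (score individual genes by how well they separate tumors from normals, keep the top-scoring genes, then estimate classification accuracy with LOOCV) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the score is a generic signal-to-noise style ratio rather than the scoring methods used in the paper, and a nearest-centroid rule stands in for SVM, AdaBoost and the clustering-based classifier. Function names such as `separation_score` and `loocv_accuracy` are hypothetical.

```python
import random
import statistics


def separation_score(values, labels):
    """Assumed score: |mean_tumor - mean_normal| / (sd_tumor + sd_normal)."""
    tumor = [v for v, y in zip(values, labels) if y == 1]
    normal = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.pstdev(tumor) + statistics.pstdev(normal) + 1e-9
    return abs(statistics.fmean(tumor) - statistics.fmean(normal)) / spread


def select_genes(X, y, k):
    """Return the indices of the k genes with the highest separation score."""
    n_genes = len(X[0])
    scores = [separation_score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x, genes):
    """Stand-in classifier: assign x to the class with the closest centroid."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, yy in zip(X_train, y_train) if yy == label]
        centroid = [statistics.fmean([r[g] for r in rows]) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k):
    """Leave-one-out cross-validation; genes are re-selected inside each fold."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        # Selecting genes on the training fold only keeps the held-out
        # sample from influencing which genes are chosen.
        genes = select_genes(X_train, y_train, k)
        if nearest_centroid_predict(X_train, y_train, X[i], genes) == y[i]:
            correct += 1
    return correct / len(X)


def make_toy_data(n_samples=20, n_genes=50, n_informative=5, shift=3.0, seed=0):
    """Synthetic data: the first n_informative genes are shifted in 'tumors'."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 1 = tumor, 0 = normal (toy labels)
        row = [
            rng.gauss(shift * label if g < n_informative else 0.0, 1.0)
            for g in range(n_genes)
        ]
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
accuracy = loocv_accuracy(X, y, k=5)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

Re-running gene selection inside every fold is a deliberate choice in this sketch: selecting genes once on all samples would let the held-out sample leak into the feature set and tend to inflate the estimated success rate.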
Original language | English |
---|---|
Pages | 54-64 |
Number of pages | 11 |
State | Published - 2000 |
Externally published | Yes |
Event | RECOMB 2000: 4th Annual International Conference on Computational Molecular Biology, Tokyo, Japan. Duration: 8 Apr 2000 → 11 Apr 2000 |