TY - GEN
T1 - Clustering algorithms optimizer
T2 - 3rd International Symposium Bioinformatics Research and Applications, ISBRA 2007
AU - Varshavsky, Roy
AU - Horn, David
AU - Linial, Michal
PY - 2007
Y1 - 2007
AB - Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) they rely on predetermined parameter settings, such as a priori knowledge of the number of clusters; (ii) they involve nondeterministic procedures that yield inconsistent outcomes. Thus, a framework that addresses these shortcomings is desirable. We provide a data-driven framework that includes two interrelated steps. The first is SVD-based dimension reduction and the second is automated tuning of the algorithm's parameter(s). The dimension reduction step is efficiently adjusted for very large datasets. The optimal parameter setting is identified according to an internal evaluation criterion, the Bayesian Information Criterion (BIC). This framework can incorporate most clustering algorithms and improve their performance. In this study we illustrate the effectiveness of this platform by incorporating the standard K-Means and the Quantum Clustering algorithms. The implementations are applied to several gene-expression benchmarks with significant success.
KW - Bayesian Information Criterion (BIC)
KW - Optimal K-Means (OKM)
KW - Optimal Quantum Clustering (OQC)
KW - Principal Component Analysis (PCA)
KW - Quantum Clustering (QC)
KW - Singular Value Decomposition (SVD)
UR - http://www.scopus.com/inward/record.url?scp=34547518297&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-72031-7_8
DO - 10.1007/978-3-540-72031-7_8
M3 - Conference contribution
AN - SCOPUS:34547518297
SN - 3540720308
SN - 9783540720300
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 85
EP - 96
BT - Bioinformatics Research and Applications - Third International Symposium, ISBRA 2007, Proceedings
PB - Springer Verlag
Y2 - 7 May 2007 through 10 May 2007
ER -