TY - JOUR
T1 - Efficient calculation of interval scores for DNA copy number data analysis
AU - Lipson, Doron
AU - Aumann, Yonatan
AU - Ben-Dor, Amir
AU - Linial, Nathan
AU - Yakhini, Zohar
PY - 2005
Y1 - 2005
N2 - Background. DNA amplifications and deletions characterize cancer genomes and are often related to disease evolution. Microarray-based techniques for measuring these DNA copy-number changes use fluorescence ratios at arrayed DNA elements (BACs, cDNA or oligonucleotides) to provide signals at high resolution, in terms of genomic locations. These data are then further analyzed to map aberrations and boundaries and identify biologically significant structures. Methods. We develop a statistical framework that enables the casting of several DNA copy number data analysis questions as optimization problems over real-valued vectors of signals. The simplest form of the optimization problem seeks to maximize φ(I) = (∑_{i∈I} v_i)/√|I| over all subintervals I in the input vector. We present and prove a linear-time approximation scheme for this problem, namely a process with time complexity O(nε⁻²) that outputs an interval for which φ(I) is at least Opt/α(ε), where Opt is the actual optimum and α(ε) → 1 as ε → 0. We further develop practical implementations that improve the performance of the naive quadratic approach by orders of magnitude. We discuss properties of optimal intervals and how they apply to the algorithm performance. Examples. We benchmark our algorithms on synthetic as well as publicly available DNA copy number data. We demonstrate the use of these methods for identifying aberrations in single samples as well as common alterations in fixed sets and subsets of breast cancer samples.
AB - Background. DNA amplifications and deletions characterize cancer genomes and are often related to disease evolution. Microarray-based techniques for measuring these DNA copy-number changes use fluorescence ratios at arrayed DNA elements (BACs, cDNA or oligonucleotides) to provide signals at high resolution, in terms of genomic locations. These data are then further analyzed to map aberrations and boundaries and identify biologically significant structures. Methods. We develop a statistical framework that enables the casting of several DNA copy number data analysis questions as optimization problems over real-valued vectors of signals. The simplest form of the optimization problem seeks to maximize φ(I) = (∑_{i∈I} v_i)/√|I| over all subintervals I in the input vector. We present and prove a linear-time approximation scheme for this problem, namely a process with time complexity O(nε⁻²) that outputs an interval for which φ(I) is at least Opt/α(ε), where Opt is the actual optimum and α(ε) → 1 as ε → 0. We further develop practical implementations that improve the performance of the naive quadratic approach by orders of magnitude. We discuss properties of optimal intervals and how they apply to the algorithm performance. Examples. We benchmark our algorithms on synthetic as well as publicly available DNA copy number data. We demonstrate the use of these methods for identifying aberrations in single samples as well as common alterations in fixed sets and subsets of breast cancer samples.
UR - http://www.scopus.com/inward/record.url?scp=26444600827&partnerID=8YFLogxK
U2 - 10.1007/11415770_6
DO - 10.1007/11415770_6
M3 - Conference article
AN - SCOPUS:26444600827
SN - 0302-9743
VL - 3500
SP - 83
EP - 100
JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 9th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2005
Y2 - 14 May 2005 through 18 May 2005
ER -