Efficient calculation of interval scores for DNA copy number data analysis

Doron Lipson*, Yonatan Aumann, Amir Ben-Dor, Nathan Linial, Zohar Yakhini

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

134 Scopus citations

Abstract

DNA amplifications and deletions characterize cancer genomes and are often related to disease evolution. Microarray-based techniques for measuring these DNA copy-number changes use fluorescence ratios at arrayed DNA elements (BACs, cDNA, or oligonucleotides) to provide signals at high resolution in terms of genomic location. These data are then further analyzed to map aberrations and boundaries and to identify biologically significant structures. We develop a statistical framework that enables the casting of several DNA copy number data analysis questions as optimization problems over real-valued vectors of signals. The simplest form of the optimization problem seeks to maximize φ(I) = (Σ_{i∈I} v_i)/√|I| over all subintervals I of the input vector. We present and prove a linear time approximation scheme for this problem, namely, a process with time complexity O(nε⁻²) that outputs an interval for which φ(I) is at least Opt/α(ε), where Opt is the actual optimum and α(ε) → 1 as ε → 0. We further develop practical implementations that improve the performance of the naive quadratic approach by orders of magnitude. We discuss properties of optimal intervals and how they affect the algorithm's performance. We benchmark our algorithms on synthetic as well as publicly available DNA copy number data. We demonstrate the use of these methods for identifying aberrations in single samples as well as common alterations in fixed sets and subsets of breast cancer samples.
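For orientation, the score φ(I) and the naive quadratic search that the abstract uses as its baseline can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' linear time approximation scheme or their optimized implementations. The function names `interval_score` and `best_interval_naive`, the half-open indexing, and the use of prefix sums are assumptions made for the example.

```python
import math
from itertools import accumulate


def interval_score(prefix, i, j):
    """Score of the half-open interval [i, j): sum(v[i:j]) / sqrt(j - i).

    `prefix` holds prefix sums of the signal vector, with prefix[0] == 0.
    """
    return (prefix[j] - prefix[i]) / math.sqrt(j - i)


def best_interval_naive(v):
    """Return (score, i, j) maximizing the score over all non-empty v[i:j].

    Exhaustive O(n^2) search over all subintervals (hypothetical helper,
    shown only as the quadratic baseline).
    """
    if not v:
        raise ValueError("input vector must be non-empty")
    prefix = [0.0, *accumulate(v)]  # prefix[k] = v[0] + ... + v[k-1]
    best = (-math.inf, 0, 0)
    n = len(v)
    for i in range(n):
        for j in range(i + 1, n + 1):
            s = interval_score(prefix, i, j)
            if s > best[0]:
                best = (s, i, j)
    return best
```

Calling `best_interval_naive(signal)` on a list of log-ratio values returns the best score together with the start and end indices of the interval. Dividing the sum by √|I| rather than by |I| means that a long run of moderately elevated signals can outscore a single extreme probe.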

Original language: English
Pages (from-to): 215-228
Number of pages: 14
Journal: Journal of Computational Biology
Volume: 13
Issue number: 2
State: Published - Mar 2006

Keywords

  • Approximation
  • CGH
  • Cancer
  • Microarray analysis
  • Optimization
