TY - GEN
T1 - Accurate profiling of microbial communities from massively parallel sequencing using convex optimization
AU - Zuk, Or
AU - Amir, Amnon
AU - Zeisel, Amit
AU - Shamir, Ohad
AU - Shental, Noam
PY - 2013
Y1 - 2013
N2 - We describe the Microbial Community Reconstruction (MCR) Problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper-bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with ∼10^6 species). We show using numerical simulations that for realistic scenarios, where the microbial communities are sparse, our algorithm gives solutions with high accuracy, both in terms of obtaining accurate frequency, and in terms of species phylogenetic resolution.
AB - We describe the Microbial Community Reconstruction (MCR) Problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper-bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with ∼10^6 species). We show using numerical simulations that for realistic scenarios, where the microbial communities are sparse, our algorithm gives solutions with high accuracy, both in terms of obtaining accurate frequency, and in terms of species phylogenetic resolution.
KW - Convex optimization
KW - Massively Parallel Sequencing
KW - Microbial Community Reconstruction
KW - Short reads
UR - http://www.scopus.com/inward/record.url?scp=84890412464&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-02432-5_31
DO - 10.1007/978-3-319-02432-5_31
M3 - Conference contribution
AN - SCOPUS:84890412464
SN - 9783319024318
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 279
EP - 297
BT - String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Proceedings
PB - Springer Verlag
T2 - 20th International Symposium on String Processing and Information Retrieval, SPIRE 2013
Y2 - 7 October 2013 through 9 October 2013
ER -