Abstract
Bacteria are the unseen majority on our planet, comprising millions of species and most of the living protoplasm. We propose a novel approach for reconstructing the composition of an unknown mixture of bacteria from a single Sanger-sequencing reaction of the mixture. Our method is based on compressive sensing theory, which deals with the reconstruction of a sparse signal from a small number of measurements. Utilizing the fact that in many cases each bacterial community comprises a small subset of all known bacterial species, we show the feasibility of this approach for determining the composition of a bacterial mixture. Using simulations, we show that sequencing a few hundred base pairs of the 16S rRNA gene may provide enough information to reconstruct mixtures containing tens of species, out of tens of thousands, even in the presence of realistic measurement noise. Finally, we show promising initial results when applying our method to the reconstruction of a toy experimental mixture of five species. Our approach may offer a simple and efficient way to identify bacterial species compositions in biological samples. All supplementary data and the MATLAB code are available at www.broadinstitute.org/~orzuk/publications/BCS/.
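The sparse-reconstruction idea in the abstract can be illustrated with a small Python sketch. It is not the authors' MATLAB code and makes several simplifying assumptions: the Sanger trace is idealized as a noiseless four-channel signal whose value at each position is the total frequency of the species carrying that base, random sequences stand in for a 16S rRNA database, and orthogonal matching pursuit (a greedy stand-in chosen for brevity) replaces whatever compressive-sensing solver the paper uses. All function names here are hypothetical.

```python
import random

BASES = "ACGT"

def mixture_trace(seqs, freqs):
    """Idealized trace: y[4*i + b] = total frequency of species with base b at position i."""
    y = [0.0] * (4 * len(seqs[0]))
    for seq, f in zip(seqs, freqs):
        for i, base in enumerate(seq):
            y[4 * i + BASES.index(base)] += f
    return y

def column_dot(seq, vec):
    """Dot product of a species' 0/1 indicator column with vec."""
    return sum(vec[4 * i + BASES.index(base)] for i, base in enumerate(seq))

def matches(seq_a, seq_b):
    """Number of positions with the same base (dot product of two indicator columns)."""
    return sum(a == b for a, b in zip(seq_a, seq_b))

def solve(gram, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [[float(v) for v in row] + [float(rhs[k])] for k, row in enumerate(gram)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def omp(database, y, max_species, tol=1e-9):
    """Orthogonal matching pursuit: greedily pick species, refit by least squares."""
    support, coeffs = [], []
    residual = y[:]
    for _ in range(max_species):
        if sum(r * r for r in residual) < tol:
            break  # the trace is already explained by the chosen species
        candidates = [j for j in range(len(database)) if j not in support]
        best = max(candidates, key=lambda j: column_dot(database[j], residual))
        support.append(best)
        # Least-squares refit on the current support via the normal equations.
        gram = [[matches(database[a], database[b]) for b in support] for a in support]
        rhs = [column_dot(database[a], y) for a in support]
        coeffs = solve(gram, rhs)
        residual = y[:]
        for j, c in zip(support, coeffs):
            for i, base in enumerate(database[j]):
                residual[4 * i + BASES.index(base)] -= c
    return dict(zip(support, coeffs))

# Toy setup (assumed sizes): 500 random "species", 60 sequenced positions,
# and a mixture of three species with known frequencies.
rng = random.Random(0)
database = ["".join(rng.choice(BASES) for _ in range(60)) for _ in range(500)]
truth = {17: 0.5, 230: 0.3, 404: 0.2}
y = mixture_trace([database[j] for j in truth], list(truth.values()))
estimate = omp(database, y, max_species=5)
print(sorted(estimate.items()))
```

In this toy setting there are 240 measurements for 500 unknown frequencies, so the system is underdetermined, and recovery relies on the mixture being sparse. Real traces would also involve measurement noise and related rather than random sequences, which this sketch ignores.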
Original language | English |
---|---|
Pages (from-to) | 1723-1741 |
Number of pages | 19 |
Journal | Journal of Computational Biology |
Volume | 18 |
Issue number | 11 |
State | Published - 1 Nov 2011 |
Externally published | Yes |
Keywords
- algorithms
- genomics
- machine learning
- sequence analysis
- sequences
- statistics