The emergence of massively parallel sequencing technology has revolutionized microbial profiling, allowing the unprecedented comparison of microbial diversity across time and space in a wide range of host-associated and environmental ecosystems. Although the high-throughput nature of such methods enables the detection of low-frequency bacteria, these advances come at the cost of sequencing read length, limiting the phylogenetic resolution achievable with current methods. Here, we present a generic approach for integrating short reads from large genomic regions, thus enabling phylogenetic resolution far exceeding that of current methods. The approach is based on a mapping to a statistical model that is later solved as a constrained optimization problem. We demonstrate the utility of this method by analyzing human saliva and Drosophila samples, using Illumina single-end sequencing of a 750 bp amplicon of the 16S rRNA gene. Phylogenetic resolution is significantly extended while the number of falsely detected bacteria is reduced, as compared with standard single-region Roche 454 pyrosequencing. Our approach can be seamlessly applied to simultaneous sequencing of multiple genes, providing a higher-resolution view of the composition and activity of complex microbial communities.
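As an illustration of what solving such a mapping as a constrained optimization problem can look like, the following minimal sketch estimates the relative frequencies of candidate bacteria from short-read frequencies. It is not the solver used in this work: it assumes a toy read-by-bacterium matrix `A` whose columns sum to one, made-up observed read fractions `y`, and a simple expectation-maximization style multiplicative update that keeps the estimated frequencies non-negative and summing to one. The function name `estimate_frequencies` and all numbers are hypothetical.

```python
def estimate_frequencies(A, y, iters=500):
    """Estimate mixture proportions x with x >= 0 and sum(x) == 1.

    A[i][j]: assumed probability that a read from bacterium j is short read i
             (each column sums to 1).
    y[i]:    observed fraction of short read i in the sample (sums to 1).
    Uses an EM-style multiplicative update, an illustrative stand-in for a
    constrained solver, not the method of the paper.
    """
    n_reads, n_bact = len(A), len(A[0])
    x = [1.0 / n_bact] * n_bact  # start from a uniform community
    for _ in range(iters):
        # Read frequencies predicted by the current community estimate.
        predicted = [sum(A[i][j] * x[j] for j in range(n_bact))
                     for i in range(n_reads)]
        # Reweight each bacterium by how well it explains the observed reads.
        x = [x[j] * sum(A[i][j] * y[i] / predicted[i]
                        for i in range(n_reads) if predicted[i] > 0)
             for j in range(n_bact)]
    return x


# Toy data (hypothetical): three candidate bacteria, four distinct short reads.
# Neighbouring bacteria share one read, so no single read identifies them all.
A = [
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5],
]
# Read fractions generated by a community of 60% bacterium 0, 40% bacterium 1.
y = [0.3, 0.5, 0.2, 0.0]

freqs = estimate_frequencies(A, y)
print([round(f, 3) for f in freqs])
```

In this toy setting the third bacterium shares a read with the second, yet its estimated frequency is driven towards zero because its other read is never observed. This is the sense in which combining reads across a region can reduce false detections.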
Bibliographical note
Funding Information:
The Open University of Israel grant [IDD-12/02 to N.S.]; NIH [P50 GM068763 to P.J.T.]. Funding for open access charge: The Open University of Israel internal grant.