TY - JOUR
T1 - Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing
AU - Yassour, Moran
AU - Kaplan, Tommy
AU - Fraser, Hunter B.
AU - Levin, Joshua Z.
AU - Pfiffner, Jenna
AU - Adiconis, Xian
AU - Schroth, Gary
AU - Luo, Shujun
AU - Khrebtukova, Irina
AU - Gnirke, Andreas
AU - Nusbaum, Chad
AU - Thompson, Dawn Anne
AU - Friedman, Nir
AU - Regev, Aviv
PY - 2009/3/3
Y1 - 2009/3/3
N2 - Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5′ and 3′ UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.
KW - Computational biology
KW - Next generation sequencing
KW - RNAseq
KW - Saccharomyces cerevisiae
KW - Transcriptome profiling
UR - http://www.scopus.com/inward/record.url?scp=62549126083&partnerID=8YFLogxK
U2 - 10.1073/pnas.0812841106
DO - 10.1073/pnas.0812841106
M3 - Article
C2 - 19208812
AN - SCOPUS:62549126083
SN - 0027-8424
VL - 106
SP - 3264
EP - 3269
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 9
ER -