TY - JOUR
T1 - Annotation of the Drosophila melanogaster euchromatic genome
T2 - a systematic review
AU - Misra, Sima
AU - Crosby, Madeline A.
AU - Mungall, Christopher J.
AU - Matthews, Beverley B.
AU - Campbell, Kathryn S.
AU - Hradecky, Pavel
AU - Huang, Yanmei
AU - Kaminker, Joshua S.
AU - Millburn, Gillian H.
AU - Prochnik, Simon E.
AU - Smith, Christopher D.
AU - Tupy, Jonathan L.
AU - Whitfield, Eleanor J.
AU - Bayraktaroglu, Leyla
AU - Berman, Benjamin P.
AU - Bettencourt, Brian R.
AU - Celniker, Susan E.
AU - de Grey, Aubrey D. N. J.
AU - Drysdale, Rachel A.
AU - Harris, Nomi L.
AU - Richter, John
AU - Russo, Susan
AU - Schroeder, Andrew J.
AU - Shu, Sheng Qiang
AU - Stapleton, Mark
AU - Yamada, Chihiro
AU - Ashburner, Michael
AU - Gelbart, William M.
AU - Rubin, Gerald M.
AU - Lewis, Suzanna E.
N1 - Publisher Copyright: © 2002, Misra et al., licensee BioMed Central Ltd.
PY - 2002/12
Y1 - 2002/12
N2 - Background: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences. Results: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes. Conclusions: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
KW - Additional Data File
KW - Alternative Transcript
KW - Nest Gene
KW - Splice Junction
KW - cDNA Data
UR - http://www.scopus.com/inward/record.url?scp=0002636459&partnerID=8YFLogxK
U2 - 10.1186/gb-2002-3-12-research0083
DO - 10.1186/gb-2002-3-12-research0083
M3 - Article
C2 - 12537572
AN - SCOPUS:0002636459
SN - 1474-7596
VL - 3
JO - Genome Biology
JF - Genome Biology
IS - 12
M1 - research0083.1
ER -