TY - JOUR
T1 - Deeplasmid: Deep learning accurately separates plasmids from bacterial chromosomes
AU - Andreopoulos, William B.
AU - Geller, Alexander M.
AU - Lucke, Miriam
AU - Balewski, Jan
AU - Clum, Alicia
AU - Ivanova, Natalia N.
AU - Levy, Asaf
N1 - Publisher Copyright: © 2022 The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2022/2/22
Y1 - 2022/2/22
AB - Plasmids are mobile genetic elements that play a key role in microbial ecology and evolution by mediating horizontal transfer of important genes, such as antimicrobial resistance genes. Many microbial genomes have been sequenced by short read sequencers and have resulted in a mix of contigs that derive from plasmids or chromosomes. New tools that accurately identify plasmids are needed to elucidate new plasmid-borne genes of high biological importance. We have developed Deeplasmid, a deep learning tool for distinguishing plasmids from bacterial chromosomes based on the DNA sequence and its encoded biological data. It requires as input only assembled sequences generated by any sequencing platform and assembly algorithm and its runtime scales linearly with the number of assembled sequences. Deeplasmid achieves an AUC-ROC of over 89%, and it was more accurate than five other plasmid classification methods. Finally, as a proof of concept, we used Deeplasmid to predict new plasmids in the fish pathogen Yersinia ruckeri ATCC 29473 that has no annotated plasmids. Deeplasmid predicted with high reliability that a long assembled contig is part of a plasmid. Using long read sequencing we indeed validated the existence of a 102 kb long plasmid, demonstrating Deeplasmid's ability to detect novel plasmids.
UR - http://www.scopus.com/inward/record.url?scp=85125009137&partnerID=8YFLogxK
U2 - 10.1093/nar/gkab1115
DO - 10.1093/nar/gkab1115
M3 - Article
C2 - 34871418
AN - SCOPUS:85125009137
SN - 0305-1048
VL - 50
SP - E17
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 3
ER -