The little known universe of short proteins in insects: A machine learning approach

Dan Ofer, Nadav Rappoport, Michal Linial

Research output: Chapter in Book/Report/Conference proceeding › Chapter


Modern genomics and proteomics technologies are turning out immense quantities of sequenced proteins. The only feasible way to assign functions to this flood of sequences is by applying state-of-the-art computational methods for automated functional annotation. We illustrate the significance of machine learning tools in identifying and annotating short bioactive proteins and peptides from insect genomes. Over 500,000 full-length proteins from insects are currently archived in databases, of which ~15% are short proteins. Currently, most short sequences remain uncharacterized. We developed a platform to systematically identify the functional class of short toxin-like peptides in metazoa. We present data from eight representative genomes (140,000 proteins) that cover the main phylogenetic branches of Hexapoda. The platform is a trained machine-predictor that successfully identified ~800 toxin-like candidates, 250 of them predicted with high confidence. These proteins' functions include ion channel inhibition, protease inhibitors, antimicrobial peptides, and components of the innate immune system. Our systematic approach can be expanded to new genomes and other biological classes of proteins. Using similar methodologies, we illustrate the success of identifying overlooked neuropeptide precursors. The systematic discovery of insect neuropeptides and short toxin-like proteins allows developing new strategies for pest control and manipulating insects' behavior. The overlooked secreted short peptides are discussed with respect to their evolution and potential applications in biotechnology.
Original language: English
Title of host publication: Short Views on Insect Genomics and Proteomics
Subtitle of host publication: Insect Genomics
Editors: Chandrasekar Raman, Marian R. Goldsmith, Tolulope A. Agunbiade
Place of Publication: Cham
Publisher: Springer International Publishing AG
Number of pages: 26
ISBN (Electronic): 978-3-319-24235-4
ISBN (Print): 978-3-319-24233-0, 978-3-319-37165-8
State: Published - 2015

Publication series

Name: Entomology in Focus (ENFO)
ISSN (Print): 2405-853X
ISSN (Electronic): 2405-8548

