TY - JOUR
T1 - ProtoBee
T2 - Hierarchical classification and annotation of the honey bee proteome
AU - Kaplan, Noam
AU - Linial, Michal
PY - 2006
Y1 - 2006
N2 - The recently sequenced genome of the honey bee (Apis mellifera) has produced 10,157 predicted protein sequences, calling for a computational effort to extract biological insights from them. We have applied an unsupervised hierarchical protein-clustering method, which was previously used in the ProtoNet system, to nearly 200,000 proteins consisting of the predicted honey bee proteins, the SWISS-PROT protein database, and the complete set of proteins of the mouse (Mus musculus) and the fruit fly (Drosophila melanogaster). The hierarchy produced by this method has been entitled ProtoBee. In ProtoBee, the proteins are hierarchically organized into 18,936 separate tree hierarchies, each representing a protein functional family. By using the mouse and Drosophila complete proteomes as reference, we are able to highlight functional groups of putative gene-loss events, putative novel proteins of unique functionality, and bee-specific paralogs. We have studied some of the ProtoBee findings and suggest their biological relevance. Examples include novel opsin genes and intriguing nuclear matches of mitochondrial genes. The organization of bee sequences into functional clusters suggests a natural way of automatically inferring functional annotation. Following this notion, we were able to assign functional annotation to about 70% of the sequences. ProtoBee is available at www.protobee.cs.huji. ac.il.
AB - The recently sequenced genome of the honey bee (Apis mellifera) has produced 10,157 predicted protein sequences, calling for a computational effort to extract biological insights from them. We have applied an unsupervised hierarchical protein-clustering method, which was previously used in the ProtoNet system, to nearly 200,000 proteins consisting of the predicted honey bee proteins, the SWISS-PROT protein database, and the complete set of proteins of the mouse (Mus musculus) and the fruit fly (Drosophila melanogaster). The hierarchy produced by this method has been entitled ProtoBee. In ProtoBee, the proteins are hierarchically organized into 18,936 separate tree hierarchies, each representing a protein functional family. By using the mouse and Drosophila complete proteomes as reference, we are able to highlight functional groups of putative gene-loss events, putative novel proteins of unique functionality, and bee-specific paralogs. We have studied some of the ProtoBee findings and suggest their biological relevance. Examples include novel opsin genes and intriguing nuclear matches of mitochondrial genes. The organization of bee sequences into functional clusters suggests a natural way of automatically inferring functional annotation. Following this notion, we were able to assign functional annotation to about 70% of the sequences. ProtoBee is available at www.protobee.cs.huji. ac.il.
UR - http://www.scopus.com/inward/record.url?scp=33750526885&partnerID=8YFLogxK
U2 - 10.1101/gr.4916306
DO - 10.1101/gr.4916306
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
C2 - 17065614
AN - SCOPUS:33750526885
SN - 1088-9051
VL - 16
SP - 1431
EP - 1438
JO - Genome Research
JF - Genome Research
IS - 11
ER -