Fishing with (Proto) net - A principled approach to protein target selection

Michal Linial*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Structural genomics strives to represent the entire protein space. The first step towards this goal is to rationally select proteins whose structures have not been determined but that are likely to represent an as yet unknown structural superfamily or fold. Once such a structure is solved, it can be used as a template for modelling homologous proteins, which helps to unveil the structural diversity of the protein space. Currently, no reliable method for accurate 3D structure prediction is available when no sequence or structural homologue is available. Here we present a systematic methodology for selecting target proteins that are likely to belong to a new, as yet unknown superfamily or fold. Our method takes advantage of the global classification of the sequence space provided by ProtoNet-3D, a hierarchical agglomerative clustering of the proteins of interest (the proteins in Swiss-Prot) together with all solved structures (taken from the PDB). By navigating the scaffold of ProtoNet-3D, we obtain a prioritized list of proteins that are not yet structurally solved, along with the probability that each protein belongs to a new superfamily or fold. The sorted list has been validated against real structural data that were not available when the predictions were made. The practical application of our computational-statistical method to determining novel superfamilies for structural genomics projects is also discussed.
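
As a rough illustration only, the idea of navigating a cluster hierarchy to prioritize unsolved proteins can be sketched in Python. The abstract does not describe how ProtoNet-3D is traversed or how the probabilities are computed, so everything here is an assumption: the `Node` class, the functions `steps_to_solved` and `rank_targets`, the toy tree, and the scoring rule (the number of merge levels a protein must climb before its cluster contains any solved structure) are hypothetical stand-ins, not the published method.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A cluster in a toy hierarchy; a node without children is a protein."""
    name: str
    solved: bool = False  # meaningful for leaves only: a structure exists
    children: list["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        # Link each child back to this cluster so leaves can walk upward.
        for child in self.children:
            child.parent = self


def leaves(node):
    """Return all proteins (leaf nodes) under a cluster."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def steps_to_solved(leaf):
    """Count upward steps until the enclosing cluster holds a solved protein.

    Returns None if no cluster on the path to the root contains one.
    """
    steps, node = 0, leaf
    while node is not None:
        if any(member.solved for member in leaves(node)):
            return steps
        node, steps = node.parent, steps + 1
    return None


def rank_targets(root):
    """Rank unsolved proteins, most isolated from solved structures first."""
    scored = []
    for leaf in leaves(root):
        if leaf.solved:
            continue
        steps = steps_to_solved(leaf)
        scored.append((leaf.name, float("inf") if steps is None else steps))
    # Higher score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy hierarchy: only P1 has a solved structure.
root = Node("root", children=[
    Node("A", children=[Node("P1", solved=True), Node("P2")]),
    Node("B", children=[
        Node("C", children=[Node("P3"), Node("P4")]),
        Node("P5"),
    ]),
])

ranking = rank_targets(root)
for name, score in ranking:
    print(name, score)
```

In this toy tree, P3 and P4 rank first because they share no cluster with a solved structure below the root, whereas P2 merges with the solved protein P1 at the first level. A real analysis would replace the step count with a statistically calibrated score.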

Original language: English
Pages (from-to): 542-548
Number of pages: 7
Journal: Comparative and Functional Genomics
Volume: 4
Issue number: 5
State: Published - Oct 2003

Keywords

  • 3D structure
  • Algorithm
  • Clustering
  • Hierarchical classification
  • Protein families
  • SCOP
