Predicting which proteins have new, currently unknown structural folds is a major problem in the study of protein structure. To address this problem, we studied the location of all proteins with solved structures within the map of all known protein sequences provided by ProtoMap. The mutual distances among solved structures in this map are used to derive a probabilistic model, from which we estimate the probability that an unsolved protein has a new fold. The probabilities were based on data from SCOP release 1.37, and the results were evaluated against the more recent SCOP pre-release 1.41. Our predicted probabilities that unsolved proteins have a new fold correlate very well with the proportion of new folds among recently released structures. Thus, information about the structure of proteins can be inferred from a global relational view of protein sequences. Finally, the same procedure was applied to estimate probabilities on the basis of SCOP 1.41. A list of the highest-scoring proteins is provided: about 80 non-membranous proteins that belong to clusters with more than 5 proteins and have the highest probability of having a new fold. Rational selection of these targets for 3D structure determination is expected to accelerate the pace of new fold discovery.
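The general idea of turning distances among solved structures into a probability estimate can be illustrated with a minimal sketch. This is not the model used in the paper, whose details are not given in the abstract. It assumes a simple empirical scheme: pairs of solved proteins are binned by their map distance, and the fraction of pairs with different folds in each bin serves as the estimate for an unsolved protein at that distance from its nearest solved neighbor. The function names, the bin edges, and the toy data are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def fit_new_fold_model(pairs, bin_edges):
    """Estimate P(different fold | distance bin) from solved-protein pairs.

    pairs: iterable of (distance, same_fold) tuples, where same_fold is a bool.
    bin_edges: sorted list of distance thresholds (hypothetical choice).
    Returns one probability per bin, or None for a bin with no data.
    """
    total = defaultdict(int)
    different = defaultdict(int)
    for distance, same_fold in pairs:
        b = bisect_right(bin_edges, distance)  # index of the distance bin
        total[b] += 1
        if not same_fold:
            different[b] += 1
    n_bins = len(bin_edges) + 1
    return [different[b] / total[b] if total[b] else None for b in range(n_bins)]


def new_fold_probability(model, bin_edges, distance_to_nearest_solved):
    """Look up the estimate for an unsolved protein (assumed lookup rule)."""
    return model[bisect_right(bin_edges, distance_to_nearest_solved)]


# Synthetic toy data for illustration only; not taken from SCOP or ProtoMap.
toy_pairs = [
    (1, True), (1, True),
    (2, True), (2, False),
    (3, False), (3, False),
    (5, False),
]
toy_bin_edges = [1.5, 2.5, 4.0]

toy_model = fit_new_fold_model(toy_pairs, toy_bin_edges)
toy_estimate = new_fold_probability(toy_model, toy_bin_edges, 2.0)
```

In this sketch, larger distances from any solved structure map to higher estimated probabilities of a new fold, which is the qualitative behavior the abstract describes. A realistic model would need far more data and a principled treatment of empty or sparse bins.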
|Published - 2000
|RECOMB 2000: 4th Annual International Conference on Computational Molecular Biology - Tokyo, Japan
Duration: 8 Apr 2000 → 11 Apr 2000