Abstract
Many proteins have small-molecule binding pockets that are not easily detectable in their ligand-free structures. These cryptic sites require a conformational change to become apparent; a cryptic site can therefore be defined as a site that forms a pocket in a holo structure but not in the apo structure. Because many proteins appear to lack druggable pockets, understanding and accurately identifying cryptic sites could expand the set of drug targets. Previously, cryptic sites were identified experimentally by fragment-based ligand discovery and computationally by long molecular dynamics simulations and fragment docking. Here, we begin by constructing a set of structurally defined apo-holo pairs with cryptic sites. Next, we comprehensively characterize the cryptic sites in terms of their sequence, structure, and dynamics attributes. We find that cryptic sites tend to be as conserved in evolution as traditional binding pockets but are less hydrophobic and more flexible. Relying on this characterization, we use machine learning to predict cryptic sites with relatively high accuracy (on our benchmark, the true positive and false positive rates are 73% and 29%, respectively). We then predict cryptic sites in the entire structurally characterized human proteome (11,201 structures, covering 23% of all residues in the proteome). Our resulting predictor, CryptoSite, increases the size of the potentially "druggable" human proteome from ∼40% to ∼78% of disease-associated proteins. Finally, to demonstrate the utility of our approach in practice, we experimentally validate a cryptic site in protein tyrosine phosphatase 1B using a covalent ligand and NMR spectroscopy. The CryptoSite Web server is available at http://salilab.org/cryptosite.
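The operational definition of a cryptic site and the two benchmark metrics quoted above can be made concrete with a small Python sketch. It assumes that some pocket-detection step has already reduced each apo and holo structure to a yes/no call per candidate site; the `SitePair` record, `is_cryptic`, and `rates` are hypothetical names chosen for illustration and are not part of the CryptoSite server or its code. The sketch only shows how the cryptic label follows from the apo/holo definition and how true and false positive rates are computed from a classifier's binary calls.

```python
from dataclasses import dataclass


@dataclass
class SitePair:
    """One candidate site seen in an apo/holo structure pair (hypothetical record)."""
    name: str
    pocket_in_apo: bool   # assumed output of a pocket-detection step on the apo structure
    pocket_in_holo: bool  # same detection step applied to the holo structure


def is_cryptic(site: SitePair) -> bool:
    # Definition from the abstract: a pocket in the holo structure
    # but not in the apo structure.
    return site.pocket_in_holo and not site.pocket_in_apo


def rates(labels: list[bool], predictions: list[bool]) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) for binary predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # cryptic, predicted cryptic
    fn = sum(1 for y, p in pairs if y and not p)      # cryptic, missed
    fp = sum(1 for y, p in pairs if not y and p)      # not cryptic, wrongly flagged
    tn = sum(1 for y, p in pairs if not y and not p)  # not cryptic, correctly rejected
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Toy, hypothetical inputs; not data from the paper.
    sites = [
        SitePair("a", pocket_in_apo=False, pocket_in_holo=True),   # cryptic
        SitePair("b", pocket_in_apo=True, pocket_in_holo=True),    # pocket already in apo
        SitePair("c", pocket_in_apo=False, pocket_in_holo=False),  # no pocket at all
        SitePair("d", pocket_in_apo=False, pocket_in_holo=True),   # cryptic
    ]
    labels = [is_cryptic(s) for s in sites]
    predictions = [True, True, False, False]  # hypothetical classifier calls
    print(rates(labels, predictions))  # (0.5, 0.5)
```

Read this way, a true positive rate of 73% means that about 73% of the benchmark's cryptic-site cases were predicted as cryptic, while a false positive rate of 29% means that about 29% of its non-cryptic cases were also flagged.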
Original language | English |
---|---|
Pages (from-to) | 709-719 |
Number of pages | 11 |
Journal | Journal of Molecular Biology |
Volume | 428 |
Issue number | 4 |
DOIs | |
State | Published - 22 Feb 2016 |
Externally published | Yes |
Bibliographical note
Funding Information: The authors thank Hao Fan, Marcus Fischer, Nir London, Avner Schlessinger, and other members of the Sali laboratory for their comments and feedback. P.C. is supported by a Howard Hughes Predoctoral Fellowship; T.J.R. is supported by a predoctoral fellowship from the National Institutes of Health (F31 CA180378) and the Krevans Fellowship; L.B. is supported by the Bayer Science and Education Foundation; D.A.K. is supported by an A. P. Giannini Foundation Postdoctoral Research Fellowship; R.A.W. is supported by a National Science Foundation Graduate Research Fellowship; J.S.F. is supported by the National Institutes of Health (DP5 OD009180, R21 GM110580, and P30 DK063720) and the National Science Foundation (STC-1231306); A.S. is supported by the National Institutes of Health (R01 GM083960, U54 RR022220, U54 GM094662, P01 AI091575, and U01 GM098256).
Keywords
- cryptic binding sites
- machine learning
- protein dynamics
- undruggable proteins