TY - JOUR
T1 - Design of multispecific protein sequences using probabilistic graphical modeling
AU - Fromer, Menachem
AU - Yanover, Chen
AU - Linial, Michal
PY - 2010
Y1 - 2010
AB - In nature, proteins partake in numerous protein-protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to efficiently and accurately perform multispecific protein design. This framework utilizes recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. Furthermore, it is also geared to specifically yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high-throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and the synthesis of novel proteins with specific functionalities. We prove the utility of such multispecific design techniques in better recovering amino acid sequence diversities similar to those resulting from millions of years of evolution. We then compare the approaches of prediction of low energy ensembles and of amino acid profiles and demonstrate their complementarity in providing more robust predictions for protein design.
KW - Approximate inference
KW - Belief propagation
KW - Maximum-a-posteriori estimation
KW - Multistate protein design
KW - Probabilistic graphical models
KW - Structural promiscuity
KW - Structural sequence space
KW - Structural specificity
UR - http://www.scopus.com/inward/record.url?scp=77949894732&partnerID=8YFLogxK
U2 - 10.1002/prot.22575
DO - 10.1002/prot.22575
M3 - Article
C2 - 19842166
AN - SCOPUS:77949894732
SN - 0887-3585
VL - 78
SP - 530
EP - 547
JO - Proteins: Structure, Function and Bioinformatics
JF - Proteins: Structure, Function and Bioinformatics
IS - 3
ER -