ProBASS—a language model with sequence and structural features for predicting the effect of mutations on binding affinity

Sagara N.S. Gurusinghe, Yibing Wu, William DeGrado*, Julia M. Shifman*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: Protein–protein interactions (PPIs) govern virtually all cellular processes, and a single mutation within a PPI can significantly impact protein function, potentially leading to disease. While numerous approaches have been developed to predict changes in the free energy of binding due to mutations (ΔΔGbind), most lack precision. Recently, protein language models (PLMs) have shown powerful predictive capabilities by leveraging both sequence and structural data from protein complexes, yet they have not been optimized specifically for ΔΔGbind prediction.

Results: We developed ProBASS (Protein Binding Affinity from Structure and Sequence), an approach that predicts the effects of mutations on ΔΔGbind using two of the most advanced PLMs, ESM2 and ESM-IF1, which incorporate sequence and structural features, respectively. We first generated embeddings for each PPI mutant from the two PLMs and then fine-tuned ProBASS by training on a large dataset of experimental ΔΔGbind values. When training and testing were performed on the same PPI, ProBASS achieved correlations with experimental ΔΔGbind values of 0.83 ± 0.05 for single mutations and 0.69 ± 0.04 for double mutations. When evaluated on a dataset of 2,325 single mutations across 131 PPIs, ProBASS reached a correlation of 0.81 ± 0.02, substantially outperforming other PLMs in predictive accuracy. Our results demonstrate that refining pre-trained PLMs with extensive ΔΔGbind datasets spanning multiple PPIs is a successful approach for creating a precise and broadly applicable ΔΔGbind prediction model, facilitating future protein engineering and design studies. ProBASS’s accuracy could be improved further through additional training as more experimental data become available.
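The abstract does not specify ProBASS's architecture, so the following is only a minimal, hypothetical sketch of the general pipeline it describes: build one feature vector per mutant from a sequence embedding and a structure embedding, fit a regressor to experimental ΔΔGbind values, and score held-out predictions by correlation. Several assumptions are made that are not from the source: real ESM2 and ESM-IF1 embeddings are replaced by small random toy vectors, the two embeddings are simply concatenated, a plain linear regressor trained by gradient descent stands in for the fine-tuned model, and "correlation" is taken to mean Pearson's r. All function names (`mutant_features`, `fit_linear_head`, `predict`, `pearson`, `run_toy_demo`) are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple


def mutant_features(seq_emb: Sequence[float], struct_emb: Sequence[float]) -> List[float]:
    # Assumption: per-mutant sequence (ESM2-like) and structure (ESM-IF1-like)
    # embeddings are combined by plain concatenation.
    return list(seq_emb) + list(struct_emb)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Linear prediction of ddG_bind from one feature vector.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_linear_head(X: List[List[float]], y: List[float],
                    lr: float = 0.05, epochs: int = 2000) -> Tuple[List[float], float]:
    # Stand-in regressor (not the authors' model): linear weights fit by
    # full-batch gradient descent on mean squared error.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += 2.0 * err * xi[j] / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    # Pearson correlation between predicted and experimental values.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def run_toy_demo(seed: int = 0) -> float:
    # Synthetic data only: 40 toy "mutants" whose ddG is a noisy linear
    # function of their features; train on 30, evaluate on the other 10.
    rng = random.Random(seed)
    true_w = [1.0, -0.5, 0.8, 0.0, 0.3, -1.2, 0.6, 0.4]
    X, y = [], []
    for _ in range(40):
        seq_emb = [rng.uniform(-1, 1) for _ in range(4)]     # toy sequence embedding
        struct_emb = [rng.uniform(-1, 1) for _ in range(4)]  # toy structure embedding
        x = mutant_features(seq_emb, struct_emb)
        X.append(x)
        y.append(sum(tw * xi for tw, xi in zip(true_w, x)) + rng.gauss(0, 0.1))
    w, b = fit_linear_head(X[:30], y[:30])
    preds = [predict(w, b, x) for x in X[30:]]
    return pearson(preds, y[30:])


print(f"Toy held-out Pearson r = {run_toy_demo():.3f}")
```

Holding out mutants that the regressor never saw mirrors, in miniature, the distinction the abstract draws between training and testing on the same PPI and evaluating across many PPIs; a faithful reproduction would need the actual embeddings, data splits, and model described in the full paper.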

Original language: English
Article number: btaf270
Journal: Bioinformatics
Volume: 41
Issue number: 5
State: Published, 1 May 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025. Published by Oxford University Press.
