TY - JOUR
T1 - MutagenPred-GCNNs
T2 - A Graph Convolutional Neural Network-Based Classification Model for Mutagenicity Prediction with Data-Driven Molecular Fingerprints
AU - Li, Shimeng
AU - Zhang, Li
AU - Feng, Huawei
AU - Meng, Jinhui
AU - Xie, Di
AU - Yi, Liwei
AU - Arkin, Isaiah T.
AU - Liu, Hongsheng
N1 - Publisher Copyright: © 2021, International Association of Scientists in the Interdisciplinary Areas.
PY - 2021/3
Y1 - 2021/3
N2 - An important task in the early stage of drug discovery is the identification of mutagenic compounds. Mutagenicity prediction models that can interpret relationships between toxicological endpoints and compound structures are especially favorable. In this research, we used an advanced graph convolutional neural network (GCNN) architecture to learn molecular representations and developed predictive models based on these representations. The predictive model based on features extracted by GCNNs can not only predict the mutagenicity of compounds but also identify the structural alerts in compounds. In fivefold cross-validation and external validation, the highest areas under the curve were 0.8782 and 0.8382, respectively; the highest accuracies (Q) were 80.98% and 76.63%, respectively; the highest sensitivities were 83.27% and 78.92%, respectively; and the highest specificities were 78.83% and 76.32%, respectively. The model also identified several toxicophores, such as aromatic nitro groups, three-membered heterocycles, quinones, and nitrogen and sulfur mustards. These results indicate that GCNNs can learn the features of mutagens effectively. In summary, we developed a mutagenicity classification model with high predictive performance and interpretability based on a data-driven molecular representation trained through GCNNs.
KW - Deep learning
KW - Graph convolutional networks
KW - Mutagenicity prediction
UR - http://www.scopus.com/inward/record.url?scp=85099852608&partnerID=8YFLogxK
U2 - 10.1007/s12539-020-00407-2
DO - 10.1007/s12539-020-00407-2
M3 - Article
C2 - 33506363
AN - SCOPUS:85099852608
SN - 1913-2751
VL - 13
SP - 25
EP - 33
JO - Interdisciplinary Sciences - Computational Life Sciences
JF - Interdisciplinary Sciences - Computational Life Sciences
IS - 1
ER -