TY - JOUR
T1 - Predicting the reproductive toxicity of chemicals using ensemble learning methods and molecular fingerprints
AU - Feng, Huawei
AU - Zhang, Li
AU - Li, Shimeng
AU - Liu, Lili
AU - Yang, Tianzhou
AU - Yang, Pengyu
AU - Zhao, Jian
AU - Arkin, Isaiah Tuvia
AU - Liu, Hongsheng
N1 - Publisher Copyright: © 2021 Elsevier B.V.
PY - 2021/4/1
Y1 - 2021/4/1
N2 - Reproductive toxicity endpoints are a significant safety concern in the assessment of the adverse effects of chemicals in drug discovery. Computational models that can accurately predict a chemical's toxic potential are increasingly pursued to replace traditional animal experiments. Thus, ensemble learning models were built to predict the reproductive toxicity of compounds. Our ensemble models were developed using support vector machine, random forest, and extreme gradient boosting methods and 9 molecular fingerprints calculated for a dataset containing 1823 chemicals. The best prediction performance was achieved by the Ensemble-Top12 model, with an accuracy (ACC) of 86.33 %, a sensitivity (SEN) of 82.02 %, a specificity (SPE) of 90.19 %, and an area under the receiver operating characteristic curve (AUC) of 0.937 in 5-fold cross-validation and ACC, SEN, SPE, and AUC values of 84.38 %, 86.90 %, 90.67 %, and 0.920, respectively, in external validation. We also defined the applicability domain (AD) of the ensemble model by calculating the Tanimoto distance of the training set. Compared with models in the existing literature, our ensemble model achieves relatively high ACC, SPE, and AUC values. We also identified several fingerprint features related to chemical reproductive toxicity. Considering the performance of the model, we recommend using the Ensemble-Top12 model to predict reproductive toxicity in early drug development.
AB - Reproductive toxicity endpoints are a significant safety concern in the assessment of the adverse effects of chemicals in drug discovery. Computational models that can accurately predict a chemical's toxic potential are increasingly pursued to replace traditional animal experiments. Thus, ensemble learning models were built to predict the reproductive toxicity of compounds. Our ensemble models were developed using support vector machine, random forest, and extreme gradient boosting methods and 9 molecular fingerprints calculated for a dataset containing 1823 chemicals. The best prediction performance was achieved by the Ensemble-Top12 model, with an accuracy (ACC) of 86.33 %, a sensitivity (SEN) of 82.02 %, a specificity (SPE) of 90.19 %, and an area under the receiver operating characteristic curve (AUC) of 0.937 in 5-fold cross-validation and ACC, SEN, SPE, and AUC values of 84.38 %, 86.90 %, 90.67 %, and 0.920, respectively, in external validation. We also defined the applicability domain (AD) of the ensemble model by calculating the Tanimoto distance of the training set. Compared with models in the existing literature, our ensemble model achieves relatively high ACC, SPE, and AUC values. We also identified several fingerprint features related to chemical reproductive toxicity. Considering the performance of the model, we recommend using the Ensemble-Top12 model to predict reproductive toxicity in early drug development.
KW - Ensemble
KW - Machine learning
KW - Molecular fingerprint
KW - Prediction models
KW - Reproductive toxicity
UR - http://www.scopus.com/inward/record.url?scp=85099034638&partnerID=8YFLogxK
U2 - 10.1016/j.toxlet.2021.01.002
DO - 10.1016/j.toxlet.2021.01.002
M3 - Article
C2 - 33421549
AN - SCOPUS:85099034638
SN - 0378-4274
VL - 340
SP - 4
EP - 14
JO - Toxicology Letters
JF - Toxicology Letters
ER -