
Estimating the odds ratio from the output scores of machine learning models: possibilities and limitations

Research output: Contribution to journal › Article › peer-review

Abstract

Estimation of exposure–response association is central to epidemiologic research. Although the advantages of machine learning (ML) techniques for modeling complex relationships are well recognized, their use in epidemiologic studies is limited, mainly because they do not provide direct estimates of associations, such as odds ratios (ORs). We suggest eight hybrid estimators of the OR that are functions of the output from a classifier, or its probability-calibrated form, multiplied by an adjustment factor that is ‘borrowed’ from logistic regression (LR). We also suggest two estimators based on partial dependence functions. We applied these estimators to output from LR, random forest (RF) and gradient boosting (GB) models for investigating associations between (1) temperature and respiratory or cardiovascular admissions and (2) prenatal exposure to temperature and overweight among infants. Most (87%) of the estimates produced by GB were within the LR 95% CI, but for RF the results were mixed: 0%, 60% and 13% of the estimates were within this CI for the Respiratory, Cardiovascular and Infants data, respectively. Additionally, GB-based CIs for the uncalibrated estimates were narrower by 13–59% compared to the LR CIs. These findings may enhance the integration between ML and epidemiologic research by providing interpretable results.
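
As a rough illustration of the partial-dependence idea mentioned in the abstract, the sketch below averages a classifier's predicted probabilities with the exposure fixed at a reference value and at a comparison value, then takes the ratio of the resulting odds. This is not the paper's estimator: the abstract gives no formulas, so the function names (`partial_dependence`, `pd_odds_ratio`), the stand-in model `toy_model`, its coefficients, and the data rows are all hypothetical and chosen only to make the example runnable.

```python
import math


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def partial_dependence(predict_proba, rows, exposure_index, value):
    """Mean predicted probability over `rows` with the exposure fixed at `value`."""
    total = 0.0
    for row in rows:
        modified = list(row)
        modified[exposure_index] = value  # overwrite only the exposure column
        total += predict_proba(modified)
    return total / len(rows)


def pd_odds_ratio(predict_proba, rows, exposure_index, reference, comparison):
    """Odds ratio comparing two exposure levels via partial dependence."""
    p_ref = partial_dependence(predict_proba, rows, exposure_index, reference)
    p_cmp = partial_dependence(predict_proba, rows, exposure_index, comparison)
    return odds(p_cmp) / odds(p_ref)


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier's probability output."""
    temperature, age = row
    z = -3.0 + 0.05 * temperature + 0.02 * age  # assumed coefficients
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical (temperature, age) observations.
rows = [(20.0, 30.0), (25.0, 45.0), (30.0, 60.0), (35.0, 75.0)]

or_estimate = pd_odds_ratio(toy_model, rows, exposure_index=0,
                            reference=20.0, comparison=30.0)
print(round(or_estimate, 3))
```

Because the probabilities are averaged before the odds are formed, this yields a marginal OR, which in general need not equal the conditional OR that a logistic regression coefficient describes.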

Original language: English
Article number: 8922
Journal: Scientific Reports
Volume: 16
Issue number: 1
State: Published - Dec 2026

Bibliographical note

Publisher Copyright:
© The Author(s) 2026.

Keywords

  • Artificial intelligence (AI)
  • Calibration
  • Environmental epidemiology
  • ML interpretability
  • Temperature
