Abstract
Estimation of exposure–response association is central to epidemiologic research. Although the advantages of machine learning (ML) techniques for modeling complex relationships are well recognized, their use in epidemiologic studies is limited, mainly because they do not provide direct estimates of associations, such as odds ratios (ORs). We suggest eight hybrid estimators of the OR that are functions of the output from a classifier, or of its probability-calibrated form, multiplied by an adjustment factor that is ‘borrowed’ from logistic regression (LR). We also suggest two estimators based on partial dependence functions. We applied these estimators to output from LR, random forest (RF) and gradient boosting (GB) models for investigating associations between (1) temperature and respiratory or cardiovascular admissions and (2) prenatal exposure to temperature and overweight among infants. Most (87%) of the estimates produced by GB were within the LR 95% CI, but for RF the results were mixed: 0%, 60% and 13% of the estimates were within this CI for the Respiratory, Cardiovascular and Infants data, respectively. Additionally, GB-based CIs for the uncalibrated estimates were 13–59% narrower than the LR CIs. These findings may enhance the integration between ML and epidemiologic research by providing interpretable results.
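The abstract does not give the formulas for its estimators, so the following is only a minimal sketch of one generic way an OR can be formed from partial dependence, not the authors' method. It averages a classifier's predicted probabilities with the exposure fixed at two values and takes the ratio of the resulting odds. The function name `partial_dependence_or`, the toy model `toy_model`, and the simulated data are all hypothetical and introduced here for illustration.

```python
import numpy as np


def partial_dependence_or(predict_proba, X, col, x0, x1):
    """Odds ratio for exposure value x1 versus x0 via partial dependence.

    predict_proba: callable mapping an (n, p) array to n probabilities
                   (hypothetical interface, assumed for this sketch).
    X:             covariate matrix; column `col` holds the exposure.
    """
    X0 = X.copy()
    X1 = X.copy()
    X0[:, col] = x0  # everyone set to the reference exposure
    X1[:, col] = x1  # everyone set to the contrast exposure

    # Partial dependence: mean predicted probability at each exposure value.
    p0 = predict_proba(X0).mean()
    p1 = predict_proba(X1).mean()

    # Ratio of the odds implied by the two averaged probabilities.
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


def toy_model(X):
    """Stand-in classifier: a logistic curve in column 0 only (assumed)."""
    return 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * X[:, 0])))


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # simulated covariates, not study data
or_pd = partial_dependence_or(toy_model, X, col=0, x0=0.0, x1=1.0)
```

Because the toy model is itself logistic in the exposure and ignores the other column, the sketch recovers the OR implied by its coefficient, exp(0.5). With a fitted RF or GB model, `predict_proba` would instead wrap that model's predicted probability for the positive class.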
| Original language | English |
|---|---|
| Article number | 8922 |
| Journal | Scientific Reports |
| Volume | 16 |
| Issue number | 1 |
| DOIs | |
| State | Published - Dec 2026 |
Bibliographical note
Publisher Copyright: © The Author(s) 2026.
Keywords
- Artificial intelligence (AI)
- Calibration
- Environmental epidemiology
- ML interpretability
- Temperature
Fingerprint
Dive into the research topics of 'Estimating the odds ratio from the output scores of machine learning models: possibilities and limitations'. Together they form a unique fingerprint.