Automated annotation of disease subtypes

Dan Ofer, Michal Linial*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Distinguishing diseases into distinct subtypes is crucial for study and effective treatment strategies. The Open Targets Platform (OT) integrates biomedical, genetic, and biochemical datasets to empower disease ontologies, classifications, and potential gene targets. Nevertheless, many disease annotations are incomplete, requiring laborious expert medical input. This challenge is especially pronounced for rare and orphan diseases, where resources are scarce.

Methods: We present a machine learning approach to identifying diseases with potential subtypes, using the approximately 23,000 diseases documented in OT. We derive novel features for predicting diseases with subtypes using direct evidence. Machine learning models were applied to analyze feature importance and evaluate predictive performance for discovering both known and novel disease subtypes.

Results: Our model achieves a high (89.4%) ROC AUC (Area Under the Receiver Operating Characteristic Curve) in identifying known disease subtypes. We integrated pre-trained deep-learning language models and showed their benefits. Moreover, we identify 515 disease candidates predicted to possess previously unannotated subtypes.

Conclusions: Our models can partition diseases into distinct subtypes. This methodology enables a robust, scalable approach for improving knowledge-based annotations and a comprehensive assessment of disease ontology tiers. Our candidates are attractive targets for further study and personalized medicine, potentially aiding in the unveiling of new therapeutic indications for sought-after targets.
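For readers unfamiliar with the ROC AUC metric reported in the Results, it can be read as the probability that a randomly chosen positive example (here, a disease with known subtypes) receives a higher model score than a randomly chosen negative one. The sketch below is only an illustration of that definition in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical, and nothing here reproduces the authors' features, models, or data.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: fraction of (positive, negative) pairs in which
    the positive example is scored higher; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = disease has annotated subtypes, 0 = it does not.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy ROC AUC = {toy_auc:.3f}")  # 8 of 9 pairs are ordered correctly
```

On this toy input, 8 of the 9 positive/negative pairs are ranked correctly, giving an AUC of about 0.889. The quadratic pairwise loop is chosen for readability; a rank-based computation would be preferable at the scale of the roughly 23,000 diseases mentioned in the abstract.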

Original language: American English
Article number: 104650
Journal: Journal of Biomedical Informatics
Volume: 154
State: Published - Jun 2024

Bibliographical note

Publisher Copyright:
© 2024

Keywords

  • Disease ontology
  • Disease subtypes
  • Explainability
  • Machine learning
  • Medical language models
  • Ontology completion
  • Open Targets
  • Orphanet
  • Personalized medicine
