Early Detection of Anorexia in Blog Posts Written in English

  • Yaakov HaCohen-Kerner*
  • Natan Manor
  • Michael Goldmeier
  • Eytan Bachar
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This study concentrates on identifying girls with anorexia nervosa through English social media text analysis. A dataset was created comprising 100 blog posts authored by females who have anorexia and another 100 posts written by females likely without an eating disorder. A psychology professor who is an international expert on anorexia confirmed the collected posts. We perform an in-depth series of experiments that utilize multiple sets of textual features; different text classification models, including 5 machine learning techniques; 10 basic text preprocessing methods; 2 feature filtering methods; and parameter optimization procedures. The best accuracy result of 91.73% was obtained by the random forest machine learning method using a combination of 16 feature sets derived by a heuristic process of combining feature sets and parameter tuning. This result is 4.48 percentage points higher than the baseline (87.25%). Among the 16 feature sets, 10 are content-based, containing features that, to one degree or another, describe anorexic girls. A relatively high number of feature sets (6 out of 16) are style-based, while two are sentiment-based. A notable recurring observation across various classification studies, including the present study, is that traditional machine learning techniques tend to outperform deep learning methods. We also present a comparison of the results and findings of this study in English and those of a similar study performed by us using a dataset in Hebrew.

Original language: English
Article number: 20
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 20
Issue number: 2
State: Published - Feb 2026

Bibliographical note

Publisher Copyright:
© 2026 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Keywords

  • Mental disorders
  • Natural language processing
  • Supervised machine learning
  • Text analysis
  • Text processing
