Abstract
This study focuses on identifying girls with anorexia nervosa through the analysis of English social media text. We created a dataset comprising 100 blog posts authored by females who have anorexia and another 100 posts written by females likely without an eating disorder. A psychology professor who is an international expert on anorexia validated the collected posts. We perform an extensive series of experiments using multiple sets of textual features; several text classification models, including 5 machine learning techniques; 10 basic text preprocessing methods; 2 feature filtering methods; and parameter optimization procedures. The best accuracy, 91.73%, was obtained by the random forest machine learning method using a combination of 16 feature sets derived by a heuristic process of combining feature sets and tuning parameters. This result is 4.48 percentage points higher than the baseline (87.25%). Among the 16 feature sets, 10 are content-based, containing features that, to one degree or another, describe anorexic girls. A relatively high number of feature sets (6 out of 16) are style-based, while two are sentiment-based. A notable recurring observation across various classification studies, including the present study, is that traditional machine learning techniques tend to outperform deep learning methods. We also compare the results and findings of this English-language study with those of a similar study we performed on a Hebrew dataset.
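The gap between the two accuracy figures quoted above can be read either as an absolute difference (in percentage points) or as a relative improvement over the baseline. The following minimal Python sketch computes both from the two numbers in the abstract; the function name `accuracy_gain` is a hypothetical helper introduced here for illustration and is not part of the study.

```python
from typing import Tuple


def accuracy_gain(best: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Both inputs are accuracies expressed in percent, e.g. 91.73.
    """
    absolute = best - baseline               # difference in percentage points
    relative = absolute / baseline * 100.0   # improvement relative to the baseline
    return absolute, relative


# Accuracy figures quoted in the abstract, both in percent.
absolute, relative = accuracy_gain(best=91.73, baseline=87.25)
print(f"absolute gain: {absolute:.2f} percentage points")
print(f"relative gain: {relative:.2f}%")
```

The absolute difference reproduces the 4.48 figure reported in the abstract; the relative figure is a derived quantity shown only for comparison.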
| Original language | English |
|---|---|
| Article number | 20 |
| Journal | ACM Transactions on Knowledge Discovery from Data |
| Volume | 20 |
| Issue number | 2 |
| State | Published - Feb 2026 |
Bibliographical note
Publisher Copyright: © 2026 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Keywords
- Mental disorders
- Natural language processing
- Supervised machine learning
- Text analysis
- Text processing
Fingerprint
Dive into the research topics of 'Early Detection of Anorexia in Blog Posts Written in English'. Together they form a unique fingerprint.