Detection of Anorexic Girls-In Blog Posts Written in Hebrew Using a Combined Heuristic AI and NLP Method

Yaakov Hacohen-Kerner*, Natan Manor, Michael Goldmeier, Eytan Bachar

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

2 Scopus citations

Abstract

In this study, we aim to detect girls who are suspected of being anorexic from social media texts written in Hebrew. We constructed a dataset containing 100 blog posts written by females who are probably anorexic and 100 blog posts written by females who are likely to be non-anorexic. The construction of this dataset was supervised and approved by an international expert on anorexia. We tested several text classification (TC) methods, using various feature sets (content-based and style-based), five machine learning (ML) methods, three RNN models, four BERT models, three basic preprocessing methods, three feature filtering methods, and parameter tuning. Several insights emerged. A set of 50 word n-grams (mostly word unigrams) provided by an expert proved to be a good basic detector. A heuristic process based on the random forest ML method overcame a combinatorial explosion and led to a significant improvement over a baseline result at a level of p = 0.01. Applying an iterative process that tests combinations of 'k out of n′', where n′ < n (n is the number of feature sets), led to a result of 90.63%, using a combination of 300 features from ten feature sets.
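The abstract does not spell out how the 'k out of n′' search is carried out, so the following Python sketch is only one possible reading, not the authors' procedure. It assumes that each feature set already has an individual score, that the n′ best-scoring sets form a shortlist, and that every non-empty combination of shortlisted sets is then evaluated. The names `best_k_of_n_prime`, `single_scores`, `evaluate`, and `count_evaluations` are hypothetical and do not come from the paper.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple


def best_k_of_n_prime(
    single_scores: Dict[str, float],
    evaluate: Callable[[FrozenSet[str]], float],
    n_prime: int,
) -> Tuple[FrozenSet[str], float]:
    """Return the best-scoring combination drawn from a shortlist of n_prime sets.

    single_scores maps each feature-set name to its individual score.
    evaluate scores a combination of feature-set names (for example, the
    cross-validated accuracy of a classifier trained on their union).
    """
    if not 1 <= n_prime < len(single_scores):
        raise ValueError("n_prime must satisfy 1 <= n_prime < n")

    # Assumed ranking rule: keep the n' feature sets with the best individual scores.
    shortlist = sorted(single_scores, key=single_scores.get, reverse=True)[:n_prime]

    best_combo: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    # Try every combination of k feature sets out of the shortlist, k = 1..n'.
    for k in range(1, n_prime + 1):
        for combo in combinations(shortlist, k):
            candidate = frozenset(combo)
            score = evaluate(candidate)
            if score > best_score:
                best_combo, best_score = candidate, score
    return best_combo, best_score


def count_evaluations(n_prime: int) -> int:
    """Number of non-empty combinations of n_prime feature sets."""
    return 2 ** n_prime - 1
```

Restricting the search to a shortlist is what keeps it tractable in this sketch: the number of candidate combinations grows as 2^n′ − 1 rather than 2^n − 1. In practice, `evaluate` would wrap whatever classifier and validation scheme is in use.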

Original language: English
Pages (from-to): 34800-34814
Number of pages: 15
Journal: IEEE Access
Volume: 10
State: Published - 2022

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • Mental disorders
  • natural language processing
  • supervised machine learning
  • text analysis
  • text classification
  • text processing
