Classification of Forum Questions About Depression with Machine Learning


Anja Stiplovšek Terglav, University of Ljubljana



The strong link between language and thought processes means that mental health problems affect the ways in which we express ourselves. Depression, one of the most disabling mental illnesses, can alter mood, cognition and linguistic style, which makes language a potential diagnostic and therapeutic tool.

Absolutist thinking, characteristic of depression, is conveyed by words and phrases that denote totality [1][2]. Individuals affected by depression more frequently use first-person singular pronouns, negative emotion words and verbs in the past tense [2]. These linguistic markers have been shown to be significant for diagnosing depression [3].

Individuals who suffer from depression often use social media platforms to discuss their problems and to seek information and help. Machine learning models have the potential to detect depression from the large volume of text generated online.


For classification, we will use questions posted on the Slovenian youth forum “To sem jaz”, which have already been labelled by mental health experts. These experts continuously monitor incoming questions and offer advice.

We want to design a model that will help identify the questions that need immediate attention (i.e. questions about depression, suicide and self-harm).


The questions will be filtered by public accessibility, length and label. We will use the CLASSLA pipeline to process non-standard Slovenian: to split the text into tokens, reduce words to their base forms (lemmas) and tag their parts of speech (e.g. noun, verb). We will then build features such as bag-of-words, n-grams and TF-IDF and use them to train several statistical models (e.g. support vector machines, naïve Bayes), the types of model most frequently used in studies of similar problems.
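The feature-and-classifier step can be illustrated with a minimal sketch in plain Python. It is not the study's implementation: the example sentences, the labels "urgent" and "other", the whitespace tokenizer, and all function and class names are invented for illustration, and the smoothed IDF formula is only one common variant. In practice the input would be CLASSLA lemmas rather than raw whitespace tokens, and a library implementation would normally be preferred.

```python
import math
from collections import Counter, defaultdict

def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def bag_of_words(text, n_max=2):
    """Lowercase, split on whitespace, and count n-gram occurrences."""
    return Counter(ngrams(text.lower().split(), n_max))

def tfidf(bags):
    """Weight each count by a smoothed inverse document frequency."""
    n_docs = len(bags)
    df = Counter(term for bag in bags for term in bag)  # document frequency
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in bag.items()} for bag in bags]

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, bags, labels):
        self.vocab = {t for bag in bags for t in bag}
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for bag, label in zip(bags, labels):
            self.counts[label].update(bag)
        return self

    def predict(self, bag):
        def log_score(label):
            total = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log(self.prior[label] / sum(self.prior.values()))
            for term, count in bag.items():
                if term in self.vocab:  # ignore terms never seen in training
                    smoothed = (self.counts[label][term] + 1) / total
                    score += count * math.log(smoothed)
            return score
        return max(self.prior, key=log_score)

# Invented toy data, NOT taken from the forum.
texts = ["i always feel completely hopeless",
         "nothing ever gets better for me",
         "how do i prepare for exams",
         "what should i study at university"]
labels = ["urgent", "urgent", "other", "other"]

bags = [bag_of_words(t) for t in texts]
weighted = tfidf(bags)                     # TF-IDF view of the same features
model = NaiveBayes().fit(bags, labels)     # trained on raw n-gram counts
pred = model.predict(bag_of_words("i feel hopeless"))
print(pred)
```

Here the classifier is trained on raw counts, while `tfidf` shows how the same bags can be re-weighted so that terms occurring in few documents count for more than terms occurring in many.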

We will choose our final model based on how well its predictions can be explained and interpreted, on its evaluation metrics, and on how it compares with the results of similar studies.
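The abstract does not name specific evaluation metrics. As an assumption, the sketch below computes precision, recall and F1 for the class that needs immediate attention, which are common choices when missing an urgent question is costly. The labels "urgent" and "other", the function name and the example predictions are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive="urgent"):
    """Compute precision, recall and F1 for one class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Invented labels and predictions, for illustration only.
y_true = ["urgent", "urgent", "other", "other", "urgent"]
y_pred = ["urgent", "other", "other", "urgent", "urgent"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For this task recall on the urgent class is arguably the most important of the three, since a low value means that questions needing immediate attention are being missed.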


Our final model will be able to classify questions about depression, which will help the forum's editorial board intervene early and quickly for those who need it most.


[1] M. Al-Mosaiwi and T. Johnstone, “In an Absolute State: Elevated Use of Absolutist Words Is a Marker Specific to Anxiety, Depression, and Suicidal Ideation”, Clinical Psychological Science, vol. 6, no. 4, pp. 529–542, 2018.

[2] M. Trotzek, S. Koitka and C. M. Friedrich, “Utilizing Neural Networks and Linguistic Metadata for Early Detection of Depression Indications in Text Sequences”, IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 3, pp. 588–601, 2020.

[3] D. Smirnova, “Language phenomenon in the diagnostic criteria of mild depression”, European Neuropsychopharmacology, vol. 23, no. 2, pp. 354–355, 2013.