Detecting Suicide-Related Content from Bereaved Individuals on Twitter – A Machine Learning Approach


  • Anja Huber, University of Vienna



Research suggests that suicide rates are associated with what and how individuals, news agencies, and non-governmental organizations write about suicide on social media. The underlying idea of research in this field is that the right tone and content when writing about suicide may positively influence how people cope with suicidal thoughts and may help prevent their individual situation and suicidal thoughts from worsening. To analyze these effects, it is necessary to understand which types of social media content are associated with an increased or decreased suicide rate. A computational social science lab in Austria established a novel approach employing a machine learning algorithm that categorizes suicide-related tweets in an automated and efficient way [1]. The current algorithm distinguishes six suicide-related tweet categories. For this project, two previously identified categories of interest were selected, and a new algorithm was trained to classify posts in these categories. The new categories are meant to assign tweets from bereaved individuals (individuals who have lost a close relation to suicide) to the ‘bereaved coping’ group if written in a positive tone indicating successful coping, or to the ‘bereaved negative’ group if written in a negative tone indicating suffering.


The analyzed data came from U.S. Twitter users and was posted between 2013 and 2020. In a first step, a high-level keyword search was applied to a large-scale dataset of Twitter posts to identify tweets that were presumably posted by bereaved individuals. The result was a dataset of approximately 400 data points of bereaved stories, along with a newly identified third tweet category, ‘neutral bereaved’. We then used descriptive methods, comprising word clouds and word frequency analysis, to better understand how the three bereaved categories might be distinguished.
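The two preprocessing steps described above can be sketched as follows. This is a minimal illustration only: the keyword patterns and example tweets below are hypothetical, as the study’s actual keyword list and data are not reproduced here.

```python
from collections import Counter
import re

# Hypothetical bereavement keyword patterns for the high-level search;
# the study's actual keyword list is not published in this abstract.
BEREAVED_PATTERN = re.compile(
    r"\blost (my|our) \w+ to suicide\b|\bdied by suicide\b",
    re.IGNORECASE,
)

def matches_bereaved_keywords(text: str) -> bool:
    """Return True if the tweet matches any bereavement keyword pattern."""
    return bool(BEREAVED_PATTERN.search(text))

def word_frequencies(tweets):
    """Count lowercased word tokens across a list of tweets,
    as input for word clouds or frequency tables."""
    counts = Counter()
    for text in tweets:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Illustrative example tweets (invented for this sketch).
tweets = [
    "I lost my father to suicide when I was young, but I'm healing now",
    "My friend died by suicide and the pain never goes away",
    "Check out my new playlist!",
]
bereaved = [t for t in tweets if matches_bereaved_keywords(t)]
freqs = word_frequencies(bereaved)
```

In practice, the resulting frequency counts per category can be rendered as word clouds or compared directly to look for vocabulary that separates the categories.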


The word frequency statistics, as well as the rather poor performance of the trained classifier, show that all three categories are very similar and difficult to distinguish from one another, both for humans and for the machine learning model. The latter could not achieve sufficient predictive accuracy to be used in practice. For this reason, we recommend training an algorithm solely to distinguish bereaved tweets from other suicide-related tweets, rather than performing a within-group classification. Such an algorithm could then be used to classify large numbers of tweets in large-scale studies of the association between social media content and suicide rates.
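The recommended binary task (bereaved vs. other suicide-related tweets) could be approached with a standard short-text classification baseline. The sketch below uses TF-IDF features with logistic regression as one common choice; the study’s actual model architecture is not specified here, and the training examples are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy illustrative tweets; the real training data is not reproduced here.
texts = [
    "I lost my sister to suicide and I miss her so much",
    "Grieving my best friend who died by suicide last spring",
    "Suicide prevention hotline numbers everyone should know",
    "New study reports national suicide statistics for 2019",
]
# 1 = bereaved individual, 0 = other suicide-related content
labels = [1, 1, 0, 0]

# A linear classifier over TF-IDF word/bigram features is a simple,
# widely used baseline for short-text classification.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

pred = clf.predict(["Still mourning my brother we lost to suicide"])
```

With realistic data volumes, performance would be evaluated on a held-out set before applying the classifier to large-scale tweet collections.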


[1] H. Metzler, H. Baginski, T. Niederkrotenthaler, and D. Garcia, “Detecting potentially harmful and protective suicide-related content on Twitter: A machine learning approach,” arXiv:2112.04796 [cs], Dec. 2021, Accessed: Dec. 10, 2021.