Subjective or Objective? NLP-Based Model to Evaluate Emotional Tone Used in Mass Media

Authors

  • Dario Lamaj Comenius University Bratislava

Abstract

With the ever-growing influence of news on our lives, it has become more difficult than ever to distinguish between biased and unbiased information. The main responsibility of reporters is to present information as objectively as possible, although in practice this is often not the case. Natural Language Processing (NLP) has proven to be a powerful tool for analysing language and its nuances. We therefore implemented an NLP-based model that classifies the emotional tone of news articles as subjective or objective, which in turn might help the community develop a better understanding of media bias and manipulation.

We fine-tuned a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model on the Subjectivity (SUBJ) dataset, which is available through the Natural Language Toolkit (NLTK). The fine-tuned model achieved a test accuracy of 98% on the SUBJ dataset.
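The data preparation and evaluation around such a fine-tuning run can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the helper names (`build_examples`, `split_examples`, `accuracy`), the 80/20 split, and the fixed seed are hypothetical, and the fine-tuning step itself is left out. Only the NLTK `subjectivity` corpus reader and its `subj`/`obj` categories come from the library.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]  # (sentence, label): 1 = subjective, 0 = objective


def build_examples(subj_sents: Sequence[Sequence[str]],
                   obj_sents: Sequence[Sequence[str]]) -> List[Example]:
    """Join tokenised sentences into strings and attach binary labels."""
    examples = [(" ".join(tokens), 1) for tokens in subj_sents]
    examples += [(" ".join(tokens), 0) for tokens in obj_sents]
    return examples


def split_examples(examples: Sequence[Example], test_fraction: float = 0.2,
                   seed: int = 13) -> Tuple[List[Example], List[Example]]:
    """Shuffle deterministically, then split into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def load_subj_from_nltk() -> List[Example]:
    """Load SUBJ through NLTK (assumes nltk is installed; downloads once)."""
    import nltk
    from nltk.corpus import subjectivity

    nltk.download("subjectivity", quiet=True)
    return build_examples(subjectivity.sents(categories="subj"),
                          subjectivity.sents(categories="obj"))
```

In this sketch the held-out portion returned by `split_examples` is what a reported test accuracy would be computed on, by passing the gold labels and the model's predictions to `accuracy`.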

However powerful these tools may be, their biggest drawback is limited usability in other languages, for which pre-trained models are few or non-existent. According to Reporters Without Borders (RSF), Albania ranked 96th in press freedom in 2023 [2], the lowest ranking among the Western Balkan countries. Annotated subjectivity datasets in Albanian are scarce because little research has been done on opinion mining in the language. To overcome this challenge, we employed a machine translation approach similar to that of Přibáň & Steinberger [1]. Concretely, we used the "Helsinki-NLP/opus-mt-en-sq" model to translate the SUBJ dataset from English into Albanian, giving us an Albanian version of the dataset annotated with subjectivity labels.
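A label-preserving translation step of this kind might look like the sketch below. It assumes the Hugging Face `transformers` library (with `torch` and `sentencepiece`) for loading the MarianMT checkpoint; the function names `translate_labeled` and `make_marian_translator`, the batch size, and the truncation setting are hypothetical choices for illustration, not details taken from the study.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # (sentence, subjectivity label)
Translator = Callable[[List[str]], List[str]]


def translate_labeled(examples: Sequence[Example], translate: Translator,
                      batch_size: int = 16) -> List[Example]:
    """Translate sentences batch by batch, keeping each original label."""
    translated: List[Example] = []
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        outputs = translate([text for text, _ in batch])
        if len(outputs) != len(batch):
            raise ValueError("translator must return one output per input")
        translated.extend((out, label)
                          for out, (_, label) in zip(outputs, batch))
    return translated


def make_marian_translator(
        model_name: str = "Helsinki-NLP/opus-mt-en-sq") -> Translator:
    """Wrap a MarianMT checkpoint as a list-in, list-out translator."""
    # Imported lazily so the helper above works without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(texts: List[str]) -> List[str]:
        inputs = tokenizer(texts, return_tensors="pt",
                           padding=True, truncation=True)
        generated = model.generate(**inputs)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate
```

Passing the translator in as a callable keeps the label bookkeeping separate from the model, so the same loop can be checked with a stand-in function before the checkpoint is downloaded.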

In addition, we created a second dataset to analyse subjectivity in news articles. To construct it, we scraped one of the most popular news websites in Albania, "Top Channel". The dataset comprises unique identifiers (ID), article titles, and the respective date and time of publication on the website.
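The record layout described above (ID, title, publication date and time) could be represented as in the following sketch. The class name `Article`, the field names, the CSV column headers, and the de-duplication rule are all assumptions made for illustration; the scraping logic itself is omitted because the structure of the website's pages is not described here.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Article:
    article_id: str      # unique identifier (ID)
    title: str           # article title
    published: datetime  # date and time of publication


def deduplicate(articles: Iterable[Article]) -> List[Article]:
    """Keep the first article seen for each ID, preserving input order."""
    seen = set()
    unique: List[Article] = []
    for article in articles:
        if article.article_id not in seen:
            seen.add(article.article_id)
            unique.append(article)
    return unique


def to_csv(articles: Iterable[Article]) -> str:
    """Serialise articles as CSV text with ISO 8601 timestamps."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "title", "published"])
    for article in articles:
        writer.writerow([article.article_id, article.title,
                         article.published.isoformat()])
    return buffer.getvalue()
```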

Overall, our NLP-based approach provides an effective and efficient method for analysing the emotional tone of news articles. We believe this research can contribute to the development of more accurate and unbiased news media, which is essential for maintaining informed and democratic societies.

References

[1] P. Přibáň and J. Steinberger, "Czech Dataset for Cross-lingual Subjectivity Classification," in Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France: European Language Resources Association, 2022, pp. 1381-1391.

[2] Reporters Without Borders, "Albania," https://rsf.org/en/country/albania (accessed May 10, 2023).

[3] H. Huo and M. Iwaihara, "Utilizing BERT pretrained models with various fine-tune methods for subjectivity detection," in Web and Big Data: 4th International Joint Conference, APWeb-WAIM 2020, Tianjin, China, September 18-20, Proceedings, Part II, Springer International Publishing, 2020, pp. 270-284.

Published

2023-06-05