Predicting Coreferences with Large Language Models


  • Marko Kunavar, University of Ljubljana


The aim of our project is to build a machine learning model that accurately resolves coreferences in Slovenian texts, using tasks from the Winograd Schema Challenge (WSC) and WinoWhy. The WSC was presented in 2012 by Hector Levesque as an improved alternative to the Turing test of 1950. It consists of a set of tasks in which a predictive model must identify the correct referent of an ambiguous pronoun as reliably as a human. The challenge is made up of pairs of sentences that differ by only one word; changing that word changes which entity the pronoun refers to. As a result, the model cannot infer the referent from sentence structure alone. A model that solves the WSC could therefore be argued to be capable, to some extent, of processing natural language meaning and of common-sense reasoning. WinoWhy is a set of tasks that, in addition to identifying the correct referent, require the solver to give a meaningful explanation for its decision, which is intended to provide insight into the model's common-sense reasoning abilities [1].

Explaining the decisions made by machine learning models is also important from ethical and legal perspectives: with the development of generative language models and artificial intelligence, the need for their regulation and control is growing. Increasing the transparency of AI models can increase trust in them and thus accelerate their development. Because deep neural networks are so complex, we have little insight into how information is processed within them or why a network produces a particular output. With WinoWhy and similar tests, we are also exploring whether a network can explain its own decisions.

Our project is in its early stages. We are currently preparing a Slovenian training set consisting of WSC and WinoWhy tasks and their solutions, on which our language model will later be trained to predict referents and to give reasons for its decisions. The tasks have been translated into Slovenian from an existing collection [2], and we will add explanations to them.
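To make the structure of such a training set concrete, the following is a minimal sketch of how one schema pair with explanations might be represented. It is an illustration under stated assumptions, not the format used in the project: the class WinogradExample, its field names, and the helper differing_words are hypothetical, the explanation texts are invented for the example, and the well-known English trophy/suitcase schema stands in for a Slovenian item.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class WinogradExample:
    """One task: a sentence, an ambiguous pronoun, and two candidate referents.

    Hypothetical record layout, assumed only for illustration.
    """
    sentence: str
    pronoun: str
    candidates: Tuple[str, str]
    answer: int                        # index into `candidates`
    explanation: Optional[str] = None  # WinoWhy-style reason, if available

    @property
    def referent(self) -> str:
        return self.candidates[self.answer]


def differing_words(a: str, b: str) -> List[Tuple[str, str]]:
    """Return the word pairs at which two equally long sentences differ."""
    words_a = [w.strip(".,;:!?") for w in a.split()]
    words_b = [w.strip(".,;:!?") for w in b.split()]
    if len(words_a) != len(words_b):
        raise ValueError("sentences must have the same number of words")
    return [(x, y) for x, y in zip(words_a, words_b) if x != y]


# The two halves of one schema: a single word changes the referent of "it".
big = WinogradExample(
    sentence="The trophy doesn't fit in the brown suitcase because it is too big.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answer=0,
    explanation="An object that is too big cannot fit into a container.",
)
small = WinogradExample(
    sentence="The trophy doesn't fit in the brown suitcase because it is too small.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answer=1,
    explanation="A container that is too small cannot hold the object.",
)

print(differing_words(big.sentence, small.sentence))
print(big.referent, "|", small.referent)
```

A check such as differing_words could be run over translated pairs to flag items where translation changed more than the intended word, although inflection in Slovenian may legitimately alter additional words.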


[1] H. Zhang, X. Zhao, and Y. Song, “WinoWhy: A deep diagnosis of essential commonsense knowledge for answering Winograd Schema Challenge,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020. doi:10.18653/v1/2020.acl-main.508

[2] E. Davis, L. Morgenstern, and C. Ortiz, “The Winograd Schema Challenge,” Collection of Winograd schemas (accessed May 10, 2023).