LLMs: Training Set of Explanations for Coreference Resolution Task


  • Lea Košmrlj University of Ljubljana
  • Tamara Podolski University of Ljubljana
  • Matic Šardi University of Ljubljana
  • Aleš Žagar University of Ljubljana



The Winograd Schema Challenge (WSC) [1] is a coreference resolution task in natural language that was designed as a test of commonsense reasoning in machine learning models and proposed as an alternative to the Turing test. It cannot be solved from word distributions alone; it requires both semantic and contextual understanding of the text. This makes it challenging for large language models (LLMs) and a crucial evaluation task for assessing their natural language understanding.

An example of a WSC sentence: “The trophy doesn’t fit in the brown suitcase because it’s too small” [1]. What does “it” refer to? Resolving the coreference (here, to the suitcase) requires knowledge about the physical characteristics of objects and how they are used.


Two interconnected lines of work have been developed to tackle the WSC task: approaches that incorporate additional knowledge into neural models, and explainable artificial intelligence approaches. Injecting external knowledge can facilitate task-solving, while classification-based explanations allow models to specify which knowledge they employ, thereby improving LLM transparency.

A Slovene version of the WSC task is needed for testing the natural language understanding capabilities of Slovene models, since Slovene is a less-resourced language [2].


We are building a dataset of 735 Slovene WSC examples, each to be annotated with the type of knowledge required to solve it and accompanied by a semantic explanation justifying the chosen knowledge type. We have developed an ontology of background knowledge, which includes the laws of nature, spatio-temporal relations, biology, human psychology, cultural customs, and social norms, among others. Textual explanations (e.g. “Suitcases can only hold items smaller than themselves.”) provide the reasoning behind a solution, thus contributing to a better understanding of a model’s predictions and improving methods for model training and evaluation.
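As an illustration of what one annotated record might look like, the following minimal Python sketch pairs a WSC sentence with knowledge labels and a textual explanation. The class name, the field names, and the flat set of labels are assumptions made for this example, not the project's actual schema; the label assigned to the trophy sentence is likewise only illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Knowledge categories named in the text. The real ontology is multi-level
# and contains more entries, so this flat set is only an illustration.
KNOWLEDGE_TYPES = {
    "laws of nature",
    "spatio-temporal relations",
    "biology",
    "human psychology",
    "cultural customs",
    "social norms",
}


@dataclass
class AnnotatedWSCExample:
    """One WSC item with knowledge labels and an explanation (hypothetical schema)."""

    sentence: str
    pronoun: str
    candidates: Tuple[str, str]
    answer: str
    knowledge_types: List[str] = field(default_factory=list)
    explanation: str = ""

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record looks well-formed."""
        problems = []
        if self.answer not in self.candidates:
            problems.append("answer is not one of the candidates")
        unknown = [k for k in self.knowledge_types if k not in KNOWLEDGE_TYPES]
        if unknown:
            problems.append(f"unknown knowledge types: {unknown}")
        if not self.explanation.strip():
            problems.append("missing explanation")
        return problems


example = AnnotatedWSCExample(
    sentence="The trophy doesn't fit in the brown suitcase because it's too small.",
    pronoun="it",
    candidates=("the trophy", "the brown suitcase"),
    answer="the brown suitcase",
    knowledge_types=["spatio-temporal relations"],  # illustrative label choice
    explanation="Suitcases can only hold items smaller than themselves.",
)
print(example.validate())
```

A check such as `validate` could be run over all records before manual labeling to catch labels that fall outside the agreed ontology.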


The main milestones of the project so far have been developing a classification scheme of the knowledge needed to explain WSC solutions, designing prompts, and using LLMs such as GPT-4 to generate textual explanations. Manual labeling of the examples in the dataset is yet to follow.
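The prompts used in the project are not reproduced here. Purely as a sketch of how a prompt for machine-generated explanations could be assembled, the hypothetical function below combines a solved WSC sentence with a list of knowledge categories. The wording of the template, the function name, and its parameters are all assumptions; no LLM is called.

```python
from typing import Sequence


def build_explanation_prompt(
    sentence: str,
    pronoun: str,
    answer: str,
    knowledge_types: Sequence[str],
) -> str:
    """Assemble a plain-text prompt asking for a knowledge label and an explanation.

    The template wording is an assumption, not the project's actual prompt.
    """
    options = "\n".join(f"- {k}" for k in knowledge_types)
    return (
        f"Sentence: {sentence}\n"
        f"The pronoun '{pronoun}' refers to: {answer}\n\n"
        "Choose the type of background knowledge needed to resolve the pronoun "
        "from the following list:\n"
        f"{options}\n\n"
        "Then explain in one sentence why this knowledge leads to the answer."
    )


prompt = build_explanation_prompt(
    sentence="The trophy doesn't fit in the brown suitcase because it's too small.",
    pronoun="it",
    answer="the brown suitcase",
    knowledge_types=["laws of nature", "spatio-temporal relations", "social norms"],
)
print(prompt)
```

The resulting string would then be sent to whichever model is used, and its output compared against the manual labels once they exist.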


The project aims to extend the Slovene version of the WSC task so that it can be used to train LLMs, to enable testing of the advanced natural language understanding abilities of models in Slovene, and to improve their explainability. This effort also aims to promote understanding of and trust in AI systems, which are pivotal for their transparent, reliable, and ethical development. The proposed multi-level classification of explanations will be a novelty that we intend to present in a high-quality publication. The upgraded WSC dataset will be freely available in the CLARIN.SI repository.


[1] H. Levesque, E. Davis, and L. Morgenstern, “The Winograd schema challenge,” in Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning, Rome, Italy, 2012, pp. 552–561.

[2] M. Ulčar and M. Robnik-Šikonja, “Sequence-to-sequence pretraining for a less-resourced Slovenian language,” Frontiers in Artificial Intelligence, vol. 6, Mar. 2023. doi:10.3389/frai.2023.932519.