User-Oriented Grammar Error Explanation for Slovenian with Large Language Models

Authors

  • Nika Marija Rojc, University of Ljubljana

Abstract

Grammar error correction for Slovenian remains underexplored in natural language processing, especially compared with high-resource languages. Existing systems often rely on detailed linguistic frameworks that, while comprehensive, are highly technical and difficult for students and educators to interpret. In contrast, recent advances in large language models (LLMs) have shown that neural approaches can outperform traditional rule-based systems in grammar correction, particularly when tailored to specific linguistic contexts [1].

Our research focuses on adapting LLMs to improve grammar correction for Slovenian learners, not only by providing accurate corrections but also by generating simplified, pedagogically meaningful explanations.

The study will use the Šolar corpus [2] as the primary data source and follow a three-phase methodology: (1) development of a user-oriented ontology of linguistic errors by mapping the existing complex taxonomy onto a more accessible structure suitable for learners; (2) adaptation of a language model to detect common error types and generate student-friendly feedback and explanations; and (3) evaluation of the system’s effectiveness by comparing model-generated corrections and explanations with those provided by teachers.
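As a rough illustration of phase (1), the mapping from a fine-grained taxonomy to learner-facing categories could be sketched as a simple lookup with a fallback. The label strings and category names below are hypothetical placeholders, not the actual Šolar annotation codes, and the real ontology may well need a richer structure than a flat dictionary.

```python
from collections import Counter

# Hypothetical fine-grained labels and learner-facing categories.
# These are illustrative placeholders, NOT the actual Šolar annotation codes.
FINE_TO_LEARNER = {
    "morphology/case/noun": "case endings",
    "morphology/case/adjective": "case endings",
    "morphology/number/dual": "dual forms",
    "syntax/agreement/subject-verb": "agreement",
    "syntax/agreement/adjective-noun": "agreement",
    "orthography/comma": "punctuation",
}


def to_learner_category(fine_label: str) -> str:
    """Map a fine-grained label to a learner-facing category, falling back to 'other'."""
    return FINE_TO_LEARNER.get(fine_label, "other")


def summarize(fine_labels):
    """Count how often each learner-facing category occurs in a list of fine-grained labels."""
    return Counter(to_learner_category(label) for label in fine_labels)
```

Collapsing several technical labels into one category keeps the feedback vocabulary small for students, while the fallback makes unmapped labels visible so the mapping can be extended.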

Slovenian presents particular challenges because of its morphological richness, especially in dual forms, agreement, and the case system. These features complicate both error classification and model training. To mitigate the scarcity of annotated training data, we may also explore the generation of synthetic learner data by systematically introducing typical grammatical errors into otherwise correct Slovenian sentences.
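The idea of synthetic learner data can be sketched with a minimal rule-based corruptor. The substitution rules, the function name `inject_errors`, and the label "dual forms" are assumptions made for illustration only; the example swaps dual verb forms for plural ones, and a realistic pipeline would likely need morphological analysis rather than whitespace tokenization and a fixed word list.

```python
import random

# Hypothetical substitution rules: correct form -> (erroneous form, learner-facing label).
# Each rule replaces a dual verb form with its plural counterpart.
RULES = {
    "igrata": ("igrajo", "dual forms"),
    "sta": ("so", "dual forms"),
}


def inject_errors(sentence, rules=RULES, p=1.0, rng=None):
    """Return (corrupted_sentence, labels), corrupting each matching token with probability p."""
    rng = rng or random.Random(0)  # fixed seed so runs are reproducible
    tokens, labels = [], []
    for token in sentence.split():  # naive whitespace tokenization (assumption)
        if token in rules and rng.random() < p:
            wrong, label = rules[token]
            tokens.append(wrong)
            labels.append(label)
        else:
            tokens.append(token)
    return " ".join(tokens), labels


corrupted, labels = inject_errors("Otroka se igrata na dvorišču.")
print(corrupted)  # Otroka se igrajo na dvorišču.
print(labels)     # ['dual forms']
```

Returning the labels alongside the corrupted sentence yields aligned (erroneous sentence, correct sentence, error type) triples, which is the shape of data needed for both correction and explanation.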

Preliminary work highlights two key challenges. First, the lack of standardized benchmarks for Slovenian grammar correction complicates systematic evaluation. Second, the morphological complexity of Slovenian may require novel modeling strategies that go beyond simple fine-tuning of existing LLMs.

The expected outcomes of this research include insights that could benefit the development of language technology for Slovenian and other morphologically rich, low-resource languages, supporting more effective language learning and teaching.

References

[1] Y. Wang, Y. Wang, J. Liu, and Z. Liu, “A comprehensive survey of grammar error correction,” arXiv:2005.06600 [cs.CL], May 2020. doi: 10.48550/arXiv.2005.06600

[2] Š. Arhar Holdt, P. Lavrič, R. Roblek, and T. Goli, “Kategorizacija učiteljskih popravkov: Smernice za označevanje korpusa Šolar,” ver. 1.1, Aug. 12, 2022. [Online]. 

Published

2025-06-10