Evaluation of a Riddle-Style Dataset Machine-Translated into Slovenian – RiddleSense


Alja Šmauc, University of Ljubljana



This project evaluates the quality and usefulness of the RiddleSense dataset (Lin et al., 2022) after machine translation into Slovenian. RiddleSense is a recent multiple-choice question answering task built on a dataset of 5700 riddle-style questions with answers. Answering them requires complex commonsense reasoning about creative and counterfactual questions (Lin et al., 2022). Because the dataset exists only in English, machine translation is one way to make it available in other languages. We analyse a small test set to assess whether a machine-translated version of this type of dataset is usable.


We wrote the translation script in Python, using the PyCharm integrated development environment. We used the pandas library for data manipulation and analysis, since it lets us process the input data, which is in .jsonl format. The script loads the dataset, translates the plain text with the DeepL neural machine translation service, and saves the translated data to disk in the same format and structure as the input. Finally, we evaluate a smaller test set by the percentage of examples that a human can solve.
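The load, translate and save steps described above can be sketched as follows. This is a minimal illustration, not the script used in the project: it uses the standard json module instead of pandas so that it runs without extra dependencies, and it assumes each record stores the question text under "question" → "stem" and the answer options under "question" → "choices" → "text". Those field names, the function names and the file paths are all assumptions. The translation backend is passed in as a plain function, so DeepL or any other service can be plugged in.

```python
import json
from typing import Callable


def translate_record(record: dict, translate: Callable[[str], str]) -> dict:
    """Return a copy of one record with its question stem and choice texts translated.

    Assumes the (hypothetical) layout
    {"question": {"stem": str, "choices": [{"label": str, "text": str}, ...]}, ...};
    every other field, such as the id or the answer key, is copied unchanged.
    """
    question = record["question"]
    return {
        **record,
        "question": {
            **question,
            "stem": translate(question["stem"]),
            "choices": [
                {**choice, "text": translate(choice["text"])}
                for choice in question["choices"]
            ],
        },
    }


def translate_jsonl(src_path: str, dst_path: str, translate: Callable[[str], str]) -> int:
    """Translate a .jsonl file record by record, keeping one JSON object per line.

    Returns the number of records written.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            translated = translate_record(json.loads(line), translate)
            # ensure_ascii=False keeps Slovenian characters such as č, š, ž readable
            dst.write(json.dumps(translated, ensure_ascii=False) + "\n")
            written += 1
    return written


# Possible DeepL backend (assumes the official `deepl` package and a valid key):
#   import deepl
#   translator = deepl.Translator("YOUR_AUTH_KEY")
#   def translate(text):
#       return translator.translate_text(text, source_lang="EN", target_lang="SL").text
#   translate_jsonl("riddlesense_en.jsonl", "riddlesense_sl.jsonl", translate)
```

Keeping the translator as a parameter also makes it easy to test the file handling with a dummy function before spending any translation quota.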


Inspecting the translations, we found that some words are repeated, omitted, left untranslated, or translated out of context, and some sentences contain grammatical errors, so the meaning can drift from the original. In some examples that originally relied on a metaphor, the metaphor is harder to detect in Slovenian because the translation is too literal and the meaning is no longer clear. Out of 20 translated examples, 14 (70%) can be easily solved by a native speaker of Slovenian. Based on these results, we expect the translated dataset to be harder to answer correctly than the original version. One example of a translation that is more difficult to solve is ''Ko konj poboža mačko, les začne peti'' (''When the horse strokes the cat, the wood begins to sing''). The answer is ''violina'' (violin), but it is hard to reach because few people know how violins, horses and cats are connected, and in Slovenian the riddle becomes even harder. An example of an incorrectly translated word is ''matches'', which has two meanings in English and was translated with the wrong one. In further research, our translated dataset can be used to evaluate the reasoning of large pre-trained language models. These models have achieved the most promising results in the field of natural language processing, coming close to human performance.
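The solvability figure for the test set is a simple proportion. As a small sketch, assuming each translated example is recorded as True when a native speaker could solve it and False otherwise (the function name and this representation are illustrative, not taken from the project script):

```python
def solvability_percentage(judgements):
    """Percentage of examples judged solvable by a human annotator.

    `judgements` is a sequence of booleans, one per evaluated example.
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    # True counts as 1 and False as 0, so sum() gives the number of solved examples.
    return 100 * sum(judgements) / len(judgements)


# Counts reported for the test set: 14 of 20 examples solvable.
judgements = [True] * 14 + [False] * 6
print(f"{solvability_percentage(judgements):.0f}% of the examples were solvable")
```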


[1] B. Lin, Z. Wu, Y. Yang, D. Lee and X. Ren, "RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge", ACL Anthology, 2022. [Online]. Available: https://aclanthology.org/2021.findings-acl.131. [Accessed: 9 May 2022].