Identifying Motifs and Types in Folktales and Fairy Tales using Large Language Models
Abstract
Introduction
This research project explores the social and political roles that fairy tales and folktales play across different societies. The vast expanse of folktales across cultures presents an insurmountable hurdle for case-by-case analysis, so the project aims to train and employ large language models to process and analyse large datasets of narrative texts efficiently. The goal is to provide new insights into how these stories contribute to societal structures and how fairy tales function not only as entertainment but also as tools for nation-building, community bonding, discipline, and education [1]. By combining a comparative examination of tale variants with structural and discursive analyses, we aim to uncover both the commonalities and the unique aspects of these narratives. Such an approach underscores the importance of recognizing fairy tales and folktales as dynamic reflections of the communities that create and share them, rather than as mere relics of tradition.
Research goal and methods
The main objective is to devise an artificial intelligence (AI)-based methodology capable of conducting a comprehensive and efficient analysis of a vast corpus of folktales and fairy tales, focusing primarily on Slovenian folklore and its intercultural ties. To this end, we have established a unified and accessible digital database of these narratives, with folktales classified according to their motifs. This collection will serve as the basis for training large language models to accurately detect motifs from various motif indexes. The proposed approach employs two distinct strategies for adapting models, chosen according to how each model is made available. The first strategy targets publicly available open-source models such as Llama 2. These models can be fine-tuned directly, since their parameters are open to modification and optimization for our needs. The second strategy targets proprietary models such as GPT-4, for which direct fine-tuning is not feasible; instead, we will use in-context learning, chain-of-thought reasoning, and evolutionary computation. We will evaluate all models quantitatively through cross-validation and qualitatively through human error analysis. Additionally, we will design a user-friendly, open-source web application that allows folklore researchers and the public to detect types and motifs in texts.
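The in-context learning strategy and the cross-validated evaluation described above can be illustrated with a minimal sketch. It is not the project's implementation: the `Tale` record, the prompt wording, the interleaved fold assignment, the choice of micro-averaged F1 as the metric, and the `predict` callable (a stand-in for whatever model call is eventually used) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator, Sequence


@dataclass(frozen=True)
class Tale:
    """Hypothetical record: a tale's text plus its gold motif labels."""
    text: str
    motifs: frozenset[str]


def build_few_shot_prompt(examples: Sequence[Tale], query_text: str) -> str:
    """Assemble an in-context learning prompt from labelled example tales."""
    parts = ["Identify the motifs present in each folktale."]
    for ex in examples:
        parts.append(f"Tale: {ex.text}\nMotifs: {', '.join(sorted(ex.motifs))}")
    # The query tale is left unlabelled so the model completes the motif list.
    parts.append(f"Tale: {query_text}\nMotifs:")
    return "\n\n".join(parts)


def k_fold_splits(n_items: int, k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; item i is held out in fold i % k."""
    for fold in range(k):
        train = [i for i in range(n_items) if i % k != fold]
        test = [i for i in range(n_items) if i % k == fold]
        yield train, test


def micro_f1(gold: Sequence[frozenset[str]], pred: Sequence[frozenset[str]]) -> float:
    """Micro-averaged F1 over multi-label motif predictions."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(
    tales: Sequence[Tale],
    predict: Callable[[str], frozenset[str]],  # assumed wrapper around a model
    k: int = 5,
    shots: int = 3,
) -> float:
    """Mean micro-F1 across k folds, drawing few-shot examples from the training fold."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(tales), k):
        examples = [tales[i] for i in train_idx[:shots]]
        gold = [tales[i].motifs for i in test_idx]
        pred = [predict(build_few_shot_prompt(examples, tales[i].text)) for i in test_idx]
        scores.append(micro_f1(gold, pred))
    return sum(scores) / len(scores)
```

In this sketch the few-shot examples are always taken from the training fold, so a tale under evaluation never appears in its own prompt. A real setup would likely select examples more carefully (for instance by similarity to the query tale) and parse the model's free-text answer into motif labels inside `predict`.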
State of the project
The project is still in its initial phase: a training dataset of folktales, classified according to their motifs and types, has been established. By developing and refining our AI-based methodologies, we aim to contribute to the understanding of the dynamic roles that fairy tales and folktales play in shaping and reflecting societal values and structures. In particular, the methodology should allow us to see how similar stories have evolved differently in diverse cultural settings, and how specific elements are altered while the core motifs are preserved across languages [2].
References
[1] J. Zipes, The Irresistible Fairy Tale: The Cultural and Social History of a Genre. Princeton, NJ and Oxford: Princeton University Press, 2012.
[2] J. Abello, P. M. Broadwell, T. R. Tangherlini, and H. Zhang, “Disentangling the folklore hairball,” Fabula, vol. 64, no. 1–2, pp. 64–91, Jul. 2023. doi:10.1515/fabula-2023-0004