Development and Validation of a Paradigm for Eliciting Confabulation in Humans and LLMs
Abstract
Confabulation refers to the involuntary generation of false or distorted content perceived as memory, typically occurring when cognitive control and memory validation mechanisms fail. In humans, it is linked to deficits in monitoring and verification during memory retrieval [1]. In large language models (LLMs), similar confabulations emerge when models generate coherent but inaccurate responses in the absence of sufficient training data or contextual understanding [2]. Whereas hallucinations are sensory experiences occurring without external stimuli, confabulations are not sensory: they involve the creation of false memories, often arising from disruptions in cognitive processing or in the validation of recalled information.
This thesis proposes the development of a novel experimental paradigm designed to elicit confabulations in humans and to engage the cognitive processes responsible for suppressing them. Inspired by the Deese–Roediger–McDermott (DRM) paradigm [3], commonly used to study false memories through short-term associative word lists, the new design will move beyond such associations to probe long-term memory and metacognitive evaluation. Participants will be presented with ambiguous or misleading retrieval cues that require active verification and trigger processes such as source monitoring and confidence-based response selection. By identifying the conditions under which confabulations arise, this research aims to enhance the design of AI systems that better mimic human memory processes and provide insights into the cognitive mechanisms that protect against memory distortion.
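To make the intended trial structure concrete, the following is a minimal sketch of how a DRM-style item could be extended with a misleading retrieval cue. All stimuli, field names, and the scoring rule are illustrative assumptions, not the finalized design.

```python
from dataclasses import dataclass

@dataclass
class ConfabulationTrial:
    """One trial: an associative study list plus a misleading retrieval cue.

    In the classic DRM paradigm, every studied word is an associate of a
    non-presented "critical lure"; endorsing the lure at test counts as a
    false memory. The ambiguous cue additionally engages source monitoring
    and long-term memory verification.
    """
    study_list: list[str]   # associates converging on the critical lure
    critical_lure: str      # never presented; recalling it is a confabulation
    retrieval_cue: str      # ambiguous or misleading cue shown at test

    def is_confabulation(self, response: str) -> bool:
        # Minimal scoring rule: the response names the non-presented lure.
        # A stricter scheme could count any non-studied item as an error.
        return response.strip().lower() == self.critical_lure.lower()

# Illustrative item (placeholder stimuli, not validated association norms):
trial = ConfabulationTrial(
    study_list=["bed", "rest", "awake", "tired", "dream", "snooze", "pillow"],
    critical_lure="sleep",
    retrieval_cue="Which word on the list named a nightly state?",
)

print(trial.is_confabulation("sleep"))   # True: lure endorsed, never studied
print(trial.is_confabulation("pillow"))  # False: studied item recalled
```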
The validity of the new paradigm will be tested in an empirical study with human participants. Metrics such as reaction times, accuracy, and cognitive load will be collected. Parallel tasks will be administered to LLMs using tailored prompts to probe analogous weaknesses. This dual-track approach enables direct comparison between biological and artificial systems in terms of error emergence, verification mechanisms, and susceptibility to confabulation.
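As a sketch of how the LLM track could be administered, the code below reuses the ConfabulationTrial class above and assumes a hypothetical query_model function standing in for whichever chat-completion client the study adopts; the prompt wording and logged fields are illustrative.

```python
import csv
import time

def query_model(prompt: str) -> str:
    """Hypothetical placeholder for an LLM API call; substitute the
    actual client used in the study."""
    raise NotImplementedError

def run_llm_track(trials, out_path="llm_responses.csv"):
    """Administer matched trials to an LLM and log the results.

    Call latency is recorded as a loose analogue of human reaction time;
    it conflates network and inference time, so it is kept only for
    completeness alongside the accuracy-based measures.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["cue", "response", "latency_s", "confabulated"])
        for trial in trials:
            prompt = (
                "Earlier you studied this word list: "
                + ", ".join(trial.study_list)
                + ". " + trial.retrieval_cue + " Answer with a single word."
            )
            start = time.perf_counter()
            response = query_model(prompt)
            latency = time.perf_counter() - start
            writer.writerow([trial.retrieval_cue, response,
                             f"{latency:.3f}",
                             trial.is_confabulation(response)])
```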
The project integrates insights from cognitive psychology, AI research, and neuroscience. While human memory retrieval involves context-sensitive mechanisms that allow for flexible validation and correction of recalled content, current LLMs operate without such mechanisms, relying instead on patterns learned from textual data. This fundamental difference may contribute to their increased vulnerability to semantic errors, particularly in ambiguous or weakly specified contexts. Based on this, we hypothesize that LLMs may exhibit a higher tendency to confabulate when presented with less specific or more ambiguous prompts.
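One way to operationalize this hypothesis is to administer the same items under prompts of varying specificity and compare confabulation rates across conditions. The sketch below uses Fisher's exact test as one reasonable choice and fabricated toy data for illustration; it is not the study's committed analysis plan.

```python
from scipy.stats import fisher_exact

def compare_conditions(ambiguous, specific):
    """Compare confabulation rates between ambiguous and specific prompts.

    Each argument is a list of booleans (True = confabulated response).
    Returns the per-condition rates and the two-sided p-value.
    """
    table = [
        [sum(ambiguous), len(ambiguous) - sum(ambiguous)],
        [sum(specific), len(specific) - sum(specific)],
    ]
    _, p_value = fisher_exact(table)  # two-sided by default
    return (sum(ambiguous) / len(ambiguous),
            sum(specific) / len(specific),
            p_value)

# Toy data for illustration only (not empirical results):
rate_ambiguous, rate_specific, p = compare_conditions(
    ambiguous=[True, True, False, True, True, False, True, False],
    specific=[False, True, False, False, False, True, False, False],
)
print(f"ambiguous: {rate_ambiguous:.2f}, "
      f"specific: {rate_specific:.2f}, p = {p:.3f}")
```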
The resulting paradigm and comparative findings will contribute both a methodological tool for future research on memory distortions and a conceptual framework for evaluating cognitive alignment between humans and generative AI systems.
References
[1] A. Gilboa and M. Moscovitch, "The Cognitive Neuroscience of Confabulation," Annual Review of Psychology, vol. 73, pp. 69–94, 2022.
[2] A. L. Smith, F. Greaves, and T. Panch, "Hallucination or confabulation? Neuroanatomy as metaphor in large language models," PLOS Digital Health, vol. 2, no. 11, e0000388, 2023.
[3] H. L. Roediger and K. B. McDermott, "Creating false memories: Remembering words not presented in lists," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 21, no. 4, pp. 803–814, 1995.
License
Copyright (c) 2025 Matic Šardi, Grega Repovš

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.