Enhancing Statistical Learning and Cross-Modal Object Representation in Virtual Reality

Authors

  • Jasper McLaughlin, University of Vienna

Abstract

Understanding how humans perceive and mentally represent objects is central to cognitive science, as object representation forms the basis for meaningful interactions with our environment. Historically, research on object representation has primarily emphasized segmentation based on explicit visual and haptic boundary cues. In contrast, Lengyel et al. [1] recently proposed that object representations can also emerge purely from statistical regularities, without explicit boundary cues. Using a visuo-haptic statistical learning paradigm, they demonstrated that humans can segment abstract shapes into object-like representations based solely on statistical information within a single sensory modality, and that these representations generalize across modalities. However, these findings were obtained in laboratory experiments conducted on two-dimensional interfaces. This project investigates whether the immersive, embodied nature of Virtual Reality (VR) enhances unimodal statistical learning and cross-modal object representations.

Methodology

Approximately 30 participants with VR headsets will participate remotely via the Ouvrai framework [2]. The study comprises two sub-experiments assessing both directions of visuo-haptic statistical learning. Each sub-experiment features a unimodal statistical exposure phase followed by one familiarity test per modality. In the visual experiment, participants observe three shape pairs that consistently co-occur within a 3×3 grid. To enhance engagement, five grid cells are initially covered, and participants shoot at the cells where they expect shapes to be hidden. Participants subsequently complete a visual test and a haptic test, in which they use the VR controllers to pull shapes apart without explicit pairing cues. In the haptic experiment, participants initially explore true and pseudo pairs by pulling shapes apart, receiving haptic feedback when a pair breaks. Critically, the force required to separate two shapes depends on whether a true pair must be broken.  A minimal sketch of this pair-dependent breakage logic is given below. Participants subsequently complete a second haptic test and a visual test.
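The following sketch illustrates how such pair-dependent breakage could be implemented; all names, thresholds, and units are illustrative assumptions and do not reflect the actual Ouvrai experiment code.

```typescript
// Sketch of pair-dependent breakage logic (hypothetical names and force units).
// A true pair only separates once the pulling force exceeds a higher threshold
// than shapes that merely happen to sit next to each other (pseudo pairs).

type ShapeId = string;

interface PairRule {
  a: ShapeId;
  b: ShapeId;
  isTruePair: boolean; // true pairs come from the statistical inventory
}

// Assumed force thresholds in arbitrary controller units.
const TRUE_PAIR_THRESHOLD = 8.0;
const PSEUDO_PAIR_THRESHOLD = 3.0;

function requiredBreakForce(rule: PairRule): number {
  return rule.isTruePair ? TRUE_PAIR_THRESHOLD : PSEUDO_PAIR_THRESHOLD;
}

// Called every frame with the current pulling force, e.g. derived from the
// distance between the two VR controllers. Returns true once the pair breaks;
// haptic feedback (controller vibration) would be triggered at that moment.
function updatePull(rule: PairRule, pullingForce: number): boolean {
  return pullingForce >= requiredBreakForce(rule);
}
```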

Outcomes measured include shooting accuracy and the percentage of correct responses in the visual familiarity tests. During the haptic familiarity tests, we measure the generated pulling force and its correlation with the force required to break shape pairs. Additionally, detailed movement data will be recorded continuously, in the spirit of continuous psychophysics [3].
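As a sketch of how these outcome measures could be computed, assuming hypothetical variable names rather than the actual analysis pipeline:

```typescript
// Proportion of shots that landed on a covered cell that actually hid a shape.
function shootingAccuracy(hits: boolean[]): number {
  return hits.filter(Boolean).length / hits.length;
}

// Pearson correlation between the peak pulling force generated on each trial
// and the force that was required to break that trial's shape pair.
function pearson(x: number[], y: number[]): number {
  const mean = (v: number[]) => v.reduce((s, a) => s + a, 0) / v.length;
  const mx = mean(x);
  const my = mean(y);
  let num = 0, sx = 0, sy = 0;
  for (let i = 0; i < x.length; i++) {
    num += (x[i] - mx) * (y[i] - my);
    sx += (x[i] - mx) ** 2;
    sy += (y[i] - my) ** 2;
  }
  return num / Math.sqrt(sx * sy);
}

// Illustrative data: a positive correlation would indicate that participants
// scale their pulling force to the learned pair structure.
const generatedForce = [4.1, 7.8, 3.5, 8.2];
const requiredForce = [3.0, 8.0, 3.0, 8.0];
console.log(pearson(generatedForce, requiredForce).toFixed(2));
```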

Expected Results and Discussion

We anticipate enhanced unimodal statistical learning and stronger cross-modal transfer effects compared to the findings of Lengyel et al. [1]. Specifically, we expect higher accuracy in the visual tests and stronger correlations between generated pulling force and required breakage force in the haptic tests. Limitations include the requirement that participants own VR equipment and inherent differences between controller-based haptic feedback and real-world interactions. Despite these limitations, this project demonstrates a novel way to investigate object representation and yields rich behavioral data, laying a foundation for powerful computational models of human perception.

References

[1] G. Lengyel et al., “Unimodal statistical learning produces multimodal object-like representations,” eLife, vol. 8, May 2019. doi: 10.7554/elife.43942.

[2] E. Cesanek, S. Shivkumar, J. N. Ingram, and D. M. Wolpert, “Ouvrai: Opening access to remote VR studies of human behavioral neuroscience,” bioRxiv (Cold Spring Harbor Laboratory), May 2023. doi: 10.1101/2023.05.23.542017.

[3] D. Straub and C. A. Rothkopf, “Putting perception into action with inverse optimal control for continuous psychophysics,” eLife, vol. 11, Sep. 2022. doi: 10.7554/elife.76635.

Published

2025-06-10