Modeling of Early Development of Cooperation in a Robotic Simulator

Authors

  • Dario Lamaj Comenius University Bratislava

Abstract

The ability to cooperate is a defining feature of human cognition. Even before developing the ability to speak, infants show behaviours such as joint attention, turn taking, and helping, which are the first steps towards complex social interaction. Understanding how these abilities develop has been a key concern of cognitive science as well as robotics and artificial intelligence. Early cooperation between caregivers and infants emerges well before language or advanced theory-of-mind abilities. Such collaborative behaviour (turn taking, response contingency, and intention recognition) instead rests on simple sequence-learning and predictive-processing mechanisms [1], [2]. In this thesis we will test the hypothesis that a robotic system can acquire and deploy these skills to engage in joint tasks solely by observing human demonstrations. Our work will adopt a grounded, sensorimotor approach inspired by infant learning. In a simulated block world, a robot watches a human stack coloured blocks and builds internal models of the demonstrated sequences. These models serve two purposes: recognition and prediction, and collaborative completion. The robot is expected not only to learn these sequences but to act collaboratively by continuing a task initiated by a human partner.
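The core idea of collaborative completion can be sketched in a few lines. The following is an illustrative toy, not the thesis implementation: demonstrations are stored as ordered lists of symbolic (action, colour) steps, and an observed partial sequence is matched against them to predict the remaining steps (the task names and token format are our own assumptions).

```python
# Toy sketch of collaborative completion (illustrative; the thesis itself
# learns these sequences from visual observation rather than storing them).

# Learned task sequences, each an ordered list of (action, colour) steps.
LEARNED = {
    "tower_rgb": [("place", "red"), ("place", "green"), ("place", "blue")],
    "tower_bgr": [("place", "blue"), ("place", "green"), ("place", "red")],
}

def complete(observed):
    """Return, for every learned sequence that begins with the observed
    prefix, the steps still needed to finish that task."""
    return {
        name: seq[len(observed):]
        for name, seq in LEARNED.items()
        if seq[:len(observed)] == observed
    }

# The human has placed a red block, then paused:
print(complete([("place", "red")]))
# only tower_rgb matches, so the robot should place green, then blue
```

When exactly one learned sequence matches the observed prefix, the intended outcome is unambiguous and the robot can take over; with several matches it must wait for more evidence, which is where the predictive models described below come in.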

Our modular cognitive architecture integrates deep-learning visual models for object and action recognition, mirror-neuron-inspired mappings that translate perceived human actions into the robot's motor commands, reinforcement-learning visuomotor policies for precise block manipulation, and RNN-based sequence encoders that underpin the predictive and completion capabilities. The robot first learns to recognise objects and actions in its environment. It then observes repeated demonstrations of tasks and is trained to predict subsequent steps. Through this, it develops internal representations that enable it to detect incomplete sequences. In cooperative scenarios, the human begins a task and pauses, and the robot must infer the intended outcome and complete the task appropriately.
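To make the RNN-based sequence encoding concrete, here is a minimal Elman-style recurrent step in NumPy. This is our own simplified assumption about the mechanism, not the trained model from the thesis: the hidden state summarises the observed action prefix, and a softmax head scores candidate next actions. The weights are random and untrained, so only the shapes and the probabilistic structure of the output are meaningful.

```python
# Elman-style recurrent encoder sketch (untrained; weights are random).
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN = 6, 16          # 6 discrete action tokens, 16 hidden units

W_xh = rng.normal(0, 0.1, (HIDDEN, VOCAB))   # input -> hidden
W_hh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # hidden -> hidden (recurrence)
W_hy = rng.normal(0, 0.1, (VOCAB, HIDDEN))   # hidden -> next-token logits

def one_hot(i):
    v = np.zeros(VOCAB)
    v[i] = 1.0
    return v

def encode(prefix):
    """Run the RNN over a token prefix; the final hidden state is the
    internal representation of the partially observed sequence."""
    h = np.zeros(HIDDEN)
    for tok in prefix:
        h = np.tanh(W_xh @ one_hot(tok) + W_hh @ h)
    return h

def next_token_probs(prefix):
    """Softmax distribution over candidate next actions given the prefix."""
    logits = W_hy @ encode(prefix)
    e = np.exp(logits - logits.max())
    return e / e.sum()

p = next_token_probs([0, 2, 1])   # e.g. place-red, place-blue, place-green
print(p.shape, round(float(p.sum()), 6))  # (6,) 1.0
```

After training on repeated demonstrations, a sharply peaked distribution over the next action signals a recognised, incomplete sequence; when the human pauses at that point, the robot can execute the predicted remainder.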

We expect the results to demonstrate that the robot can learn to reproduce observed action sequences and can take over incomplete tasks initiated by a human. Importantly, response-contingent behaviour, in which the robot acts when the human pauses, should emerge from predictive sequence learning alone. These findings would support the central hypothesis: early cooperative behaviours can arise from general-purpose cognitive mechanisms without explicit mental-state modelling. This research contributes both to our understanding of early social cognition and to the design of more intuitive human-robot interaction systems. It suggests that relatively simple learning architectures, grounded in perception and action, are sufficient to enable basic collaboration. Future work will expand the system’s cognitive capabilities and validate the model in a physical setting.

References

[1] C. Heyes, "Who knows? Metacognitive social learning strategies," Trends in Cognitive Sciences, vol. 20, no. 3, pp. 204–213, 2016. doi: 10.1016/j.tics.2015.12.007.

[2] M. Gratier, E. Devouche, B. Guellai, R. Infanti, E. Yilmaz, and E. Parlato-Oliveira, "Early development of turn-taking in vocal interaction between mothers and infants," Front. Psychol., vol. 6, art. 1167, 2015. doi: 10.3389/fpsyg.2015.01167.

Published

2025-06-10