Robotic Action Planning via Self-Supervised Trajectory Modeling
Abstract
Introduction
In robotics, action planning is the widely studied problem of generating a sequence of actions that leads a robotic agent from an initial state to a desired final state, thereby completing some manipulation task. While this problem has been extensively studied within the reinforcement learning paradigm, recent advances (e.g., [1]) demonstrate that planning can also be performed by supervised sequence modeling: a trajectory (a sequence of intermediate states and the actions transitioning between them) leading to task completion can be generated by a sequence model trained on previously observed trajectories. Such a fully supervised approach, however, has several limitations: it requires a large number of observed trajectories to train effectively, and the model does not learn from whether its trajectories actually reach the goal; it merely reproduces the observed ones.
In the present research, we build on this approach and propose a cognitively inspired self-supervised training scheme for a trajectory model (TM) that performs simple action planning for a robotic arm, addressing the drawbacks mentioned above.
Methods
In our scheme, the TM is implemented as a recurrent neural network that receives an initial and a final state as input and outputs a sequence of intermediate states leading to the final state and thus to task completion. Subsequently, we leverage a pre-trained inverse model (IM) and forward model (FM) [2] to post-process and evaluate this preliminary trajectory.
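The abstract does not specify the TM's architecture beyond it being recurrent. The following is a minimal sketch, assuming a GRU backbone, joint-space states of a fixed dimension, and a fixed planning horizon; all of these choices are illustrative assumptions rather than details from the original work.

```python
import torch
import torch.nn as nn

class TrajectoryModel(nn.Module):
    """Sketch of a TM: maps (initial state, goal state) to a state sequence."""

    def __init__(self, state_dim: int = 7, hidden_dim: int = 128, horizon: int = 20):
        super().__init__()
        self.horizon = horizon
        # Encode the (initial state, goal state) pair into the initial hidden state.
        self.encoder = nn.Linear(2 * state_dim, hidden_dim)
        self.rnn = nn.GRUCell(state_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, state_dim)

    def forward(self, s_init: torch.Tensor, s_goal: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.encoder(torch.cat([s_init, s_goal], dim=-1)))
        s = s_init
        states = []
        # Unroll the RNN autoregressively, feeding each predicted state back in.
        for _ in range(self.horizon):
            h = self.rnn(s, h)
            s = self.decoder(h)
            states.append(s)
        return torch.stack(states, dim=1)  # (batch, horizon, state_dim)
```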
As implementations of their cognitive analogs [3], the pre-trained FM and IM capture knowledge about the environment in which the task is executed, since they were pre-trained on numerous observations of the environment's behavior. The FM and IM then process a generated preliminary trajectory and produce a rectified version that conforms to their expectation of realistic environment behavior. If the preliminary trajectory is unrealistic, its rectified version will differ significantly, and this difference can therefore be used as an error signal to guide the TM's learning.
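To illustrate how the rectified trajectory could provide a training signal, the sketch below shows one possible self-supervised update step. The way the IM and FM are interleaved, and the use of a mean-squared error between the preliminary and rectified trajectories, are assumptions for the sake of the example rather than details given in the abstract.

```python
import torch

def training_step(tm, im, fm, s_init, s_goal, optimizer):
    """One assumed self-supervised update of the TM using frozen IM and FM.

    im(s_prev, s_next) -> action expected to cause the transition (assumed signature)
    fm(s_prev, action) -> state the environment would realistically reach (assumed signature)
    """
    preliminary = tm(s_init, s_goal)                  # (batch, horizon, state_dim)

    # Rectify the preliminary trajectory with the frozen IM and FM (no gradients).
    with torch.no_grad():
        rectified = []
        s_prev = s_init
        for t in range(preliminary.shape[1]):
            a_t = im(s_prev, preliminary[:, t])       # inferred transition action
            s_rect = fm(s_prev, a_t)                  # realistically reachable next state
            rectified.append(s_rect)
            s_prev = s_rect
        rectified = torch.stack(rectified, dim=1)

    # The difference between preliminary and rectified trajectories is the error signal.
    loss = torch.nn.functional.mse_loss(preliminary, rectified)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```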
Preliminary Results
The feasibility of the proposed system was tested on a simple task: planning the movement of the robotic arm's end-effector from one point in space to a randomly sampled final point. The results indicate that the trained TM can produce entirely novel trajectories that accurately reach the desired final state in most cases, avoiding the limitations mentioned in the Introduction. This suggests that the presented self-supervised method is feasible for simple action planning and that its extension to more complex manipulation problems warrants further research.
References
[1] M. Janner, Q. Li, and S. Levine, “Offline reinforcement learning as one big sequence modeling problem,” in Advances in Neural Information Processing Systems, vol. 34, pp. 1273–1286, 2021.
[2] M. Cibula, M. Kerzel, and I. Farkaš, “Learning low-level causal relations using a simulated robotic arm,” in Artificial Neural Networks and Machine Learning – ICANN 2024, 2024, pp. 285–298. doi: 10.1007/978-3-031-72359-9_21.
[3] D. M. Wolpert and M. Kawato, “Multiple paired forward and inverse models for motor control,” Neural Networks, vol. 11, no. 7–8, pp. 1317–1329, 1998. doi: 10.1016/s0893-6080(98)00066-5.
License
Copyright (c) 2025 Miroslav Cibula

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.