Learned Helplessness Simulation Using Reinforcement Learning Algorithms
According to the WHO, about 3.8% of the world’s population is affected by major depressive disorder (MDD); however, the mechanism underlying this serious mental illness remains unclear. A significant body of evidence points to impaired decision-making in people and animals suffering from depression. That is, the inability to select actions that maximize reward and minimize punishment leads to a loss of behavioural control and a decrease in responsiveness to reward. This, in turn, may result in learned helplessness, which is one of the key concepts in MDD. Neuroscientific experiments have identified the monoamine structures that might play a key role in modulating action selection and learning in the basal ganglia, the major brain regions involved in reinforcement learning. It has been suggested that disturbance of these structures may induce depression. Computational models are built to investigate how neuromodulators might contribute to conditions such as learned helplessness. One of the most established approaches is Reinforcement Learning (RL), which provides a framework in which goal-directed behaviours can be understood. The goal of this project is to simulate an RL agent’s behaviour exhibiting learned helplessness by manipulating the parameters that represent four major neurotransmitters.
Methods
The simulation is based on classical experiments in which rats are exposed to inescapable shocks. In our environment, however, the rat’s goal is to traverse a maze and reach the final state, which yields the maximum reward. While progressing through the maze, the rat is exposed to a series of inescapable shocks whose frequency and magnitude are parameters of the environment. We use the computational theory that proposes a role for neuromodulators as metaparameters in RL algorithms. Dopamine activity represents the temporal difference (TD) error, the signal used to learn predictions of long-term future reward and to select actions. Serotonin controls how far into the future the agent looks when predicting reward (the discount factor). Noradrenaline controls the randomness of action selection (the inverse temperature), and acetylcholine modulates the balance between the storage and update of memory of state and action values (the learning rate).
Expected Results
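The setup described in Methods can be sketched in code. The following is only an illustration under stated assumptions, not the project's implementation: it assumes a tabular Q-learning agent with softmax action selection, a hypothetical one-dimensional corridor standing in for the maze, and a hypothetical "stay" action standing in for the agent not acting. The names `run_agent`, `softmax_choice`, `shock_prob` and `shock_size`, and the stay-fraction measure, are introduced here for illustration. The mapping of the learning rate `alpha`, inverse temperature `beta` and discount factor `gamma` to acetylcholine, noradrenaline and serotonin follows the metaparameter theory cited in Methods.

```python
import math
import random


def softmax_choice(q_values, beta, rng):
    """Sample an action index with probability proportional to exp(beta * Q)."""
    q_max = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights)[0]


def run_agent(alpha=0.1, beta=5.0, gamma=0.9, shock_prob=0.2, shock_size=1.0,
              n_states=6, episodes=300, max_steps=50, seed=0):
    """Train a tabular Q-learning agent in a corridor with random shocks.

    Assumed (hypothetical) setup: states 0..n_states-1 lie on a line, the
    last state is the goal, and the actions are 0 = stay, 1 = move forward.
    Returns the mean fraction of 'stay' choices over the last 50 episodes,
    used here as a crude, assumed proxy for helpless behaviour.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    stay_fractions = []
    for _ in range(episodes):
        state, stays, steps = 0, 0, 0
        for _ in range(max_steps):
            # Noradrenaline analogue: beta sets the randomness of the choice.
            action = softmax_choice(q[state], beta, rng)
            next_state = min(state + action, goal)
            done = next_state == goal
            reward = 1.0 if done else 0.0  # reward only at the final state
            # Inescapable shock: delivered regardless of the chosen action.
            if rng.random() < shock_prob:
                reward -= shock_size
            # Serotonin analogue: gamma discounts the predicted future value.
            bootstrap = 0.0 if done else gamma * max(q[next_state])
            # Dopamine analogue: the temporal difference (TD) error.
            td_error = reward + bootstrap - q[state][action]
            # Acetylcholine analogue: alpha balances old memory and update.
            q[state][action] += alpha * td_error
            stays += 1 if action == 0 else 0
            steps += 1
            state = next_state
            if done:
                break
        stay_fractions.append(stays / steps)
    recent = stay_fractions[-50:]
    return sum(recent) / len(recent)
```

Comparing, for example, `run_agent(gamma=0.95)` with `run_agent(gamma=0.3)`, or sweeping `shock_prob` and `shock_size`, is one possible way to probe how each metaparameter affects the stay fraction; no particular outcome is implied by this sketch.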
By manipulating the metaparameters, we expect the agent to exhibit learned helplessness: instead of escaping from the punishment, the agent chooses not to select any actions. We also aim to analyse how the neuromodulators induce learned helplessness.
References
K. Doya, “Metalearning and neuromodulation”, Neural Networks, vol. 15, no. 4-6, pp. 495–506, 2002.
Y. Niv, “Reinforcement learning in the brain”, Journal of Mathematical Psychology, vol. 53, no. 3, pp. 139–154, 2009.
Q. Huys, J. Vogelstein and P. Dayan, “Psychiatry: Insights into depression through normative decision-making models”, Advances in Neural Information Processing Systems, vol. 21, 2008.