Project

Title

State representation in reward based learning -- from spiking neuron models to psychophysics

English title: State representation in reward based learning -- from spiking neuron models to psychophysics
Applicant: Gerstner Wulfram
Number: 122697
Funding scheme: Sinergia
Research institution: EPFL - IC - ISIM - LCN
Institution of higher education: EPF Lausanne - EPFL
Disciplines: Information Sciences
Start/End: 01.01.2009 - 31.08.2012
Approved amount: CHF 1'481'614.00

Keywords (11)

computational neuroscience, spiking neurons, learning, reinforcement learning, decision making, perceptual learning, psychophysics, memory, behavior, neurons, synaptic plasticity

Lay Summary (English)

Humans and animals learn by changing the strength of connections between neurons. Suppose we have to learn to navigate through a complex maze, such as the labyrinths often found in the gardens of old castles. To do so, we need to learn that we must turn left at the first bifurcation, right at the second bifurcation, sharply left at the crossing, and so on.
Suppose now that different locations are represented by different 'place' neurons, and that turning left and turning right are represented by two other populations of neurons.
If we strengthen the connections between the neurons coding for the first bifurcation and those coding for a left turn, then the combination 'turn left at the first bifurcation' becomes more likely.
In this project, we use models to study whether there is an optimal way of changing the connections after a successful action, and how this could be implemented in biologically plausible neuronal models.
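To make the idea concrete, here is a minimal Python sketch. It is an illustration, not the project's model. It assumes a hypothetical maze with three binary decision points, one 'place' neuron per decision point, two action neurons (left and right), a softmax choice between the actions, and a reward of 1 only when every turn is correct. The names CORRECT_TURNS, choose, run_trial and train, and the parameters eta and beta, are illustrative assumptions. After each successful run, the connection from every visited place neuron to the action neuron that fired is strengthened, which is the update described above.

```python
import numpy as np

# Hypothetical maze: correct turn at each decision point (0 = left, 1 = right).
# Left at the first bifurcation, right at the second, left at the crossing.
CORRECT_TURNS = [0, 1, 0]


def choose(weights, place, rng, beta=1.0):
    """Softmax choice between the two action populations, given one active place neuron."""
    drive = beta * weights[place]
    p = np.exp(drive - drive.max())  # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(2, p=p))


def run_trial(weights, rng):
    """Walk through the maze once; reward is 1 only if every turn was correct."""
    actions = [choose(weights, place, rng) for place in range(len(CORRECT_TURNS))]
    reward = 1.0 if actions == CORRECT_TURNS else 0.0
    return actions, reward


def train(n_trials=2000, eta=0.1, seed=0):
    """Learn place-to-action weights with a simple reward-modulated Hebbian rule."""
    rng = np.random.default_rng(seed)
    weights = np.zeros((len(CORRECT_TURNS), 2))  # rows: place neurons, columns: action neurons
    for _ in range(n_trials):
        actions, reward = run_trial(weights, rng)
        for place, action in enumerate(actions):
            # pre (active place neuron) * post (chosen action neuron) * reward
            weights[place, action] += eta * reward
    return weights
```

Calling train() returns the place-by-action weight matrix, and np.argmax(weights, axis=1) recovers the correct turn at each decision point. This rule only ever strengthens connections and ignores how much reward was expected, so it is a deliberate simplification. Publications listed below, such as Fremaux et al. (2010) on reward-modulated spike-timing-dependent plasticity, examine the functional requirements that such rules must satisfy in spiking neurons.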


Publications

Aberg KC, Herzog MH (2012), Different types of feedback change decision criterion and sensitivity differently in perceptual learning, in Journal of Vision, 12(3), 1-11.
Tartaglia EM, Bamert L, Herzog MH, Mast FW (2012), Perceptual learning of motion discrimination by mental imagery, in Journal of Vision, 12(6), 1-10.
Kompella VR, Luciw M, Schmidhuber J (2012), Incremental Slow Feature Analysis: Adaptive Low-Complexity Slow Feature Updating from High-Dimensional Input Streams, in Neural Computation, 24(11), 2994-3024.
Senn W, Friedrich J (2012), Spike-based Decision Learning of Nash Equilibria in Two-Player Games, in PLoS Computational Biology, 8(9), e1002691.
Castro JE, Diessler S, Varea E, Marquez C, Larsen MH, Cordero MI, Sandi C (2012), Personality traits in rats predict vulnerability and resilience to developing stress-induced depression-like behaviors, HPA axis hyper-reactivity and brain changes in pERK1/2 activity, in Psychoneuroendocrinology, 37(8), 1209-1223.
Herzog MH, Aberg KC, Fremaux N, Gerstner W, Sprekeler H (2012), Perceptual learning, roving and the unsupervised bias, in Vision Research, 61, 95-99.
Aberg KC, Herzog MH (2012), About similar characteristics of visual perceptual learning and LTP, in Vision Research, 61, 100-106.
Bisaz R, Sandi C (2012), Vulnerability of conditional NCAM-deficient mice to develop stress-induced behavioral alterations, in Stress: The International Journal on the Biology of Stress, 15(2), 195-206.
Ruter J, Marcille N, Sprekeler H, Gerstner W, Herzog MH (2012), Paradoxical Evidence Integration in Rapid Decision Processes, in PLoS Computational Biology, 8(2), 1-10.
Schiess M, Urbanczik R, Senn W (2012), Gradient estimation in dendritic reinforcement learning, in The Journal of Mathematical Neuroscience, 2(2), 1-19.
Cuccu G, Luciw M, Schmidhuber J, Gomez F (2011), Intrinsically Motivated NeuroEvolution for Vision-Based Reinforcement Learning, in 2011 IEEE International Conference on Development and Learning (ICDL), 1-7.
Jimenez Rezende D, Wierstra D, Gerstner W (2011), Variational Learning for Recurrent Spiking Networks, in NIPS 2011 Proceedings, 1-9.
Timmer M, Cordero MI, Sevelinges Y, Sandi C (2011), Evidence for a Role of Oxytocin Receptors in the Long-Term Establishment of Dominance Hierarchies, in Neuropsychopharmacology, 36(11), 2349-2356.
van der Kooij MA, Sandi C (2011), Social memories in rodents: Methods, mechanisms and modulation by stress, in Neuroscience and Biobehavioral Reviews, 36(7), 1762-1773.
Kraev I, Henneberger C, Rossetti C, Conboy L, Kohler LB, Fantin M, Jennings A, Venero C, Popov V, Rusakov D, Stewart MG, Bock E, Berezin V, Sandi C (2011), A Peptide Mimetic Targeting Trans-Homophilic NCAM Binding Sites Promotes Spatial Learning and Neural Plasticity in the Hippocampus, in PLoS ONE, 6(8), 1-13.
Luksys G, Sandi C (2011), Neural mechanisms and computations underlying stress effects on learning and memory, in Current Opinion in Neurobiology, 21(3), 502-508.
Friedrich J, Urbanczik R, Senn W (2011), Spatio-Temporal Credit Assignment in Neuronal Population Learning, in PLoS Computational Biology, 7(6), 1-13.
Sandi C (2011), Glucocorticoids act on glutamatergic pathways to affect memory processes, in Trends in Neurosciences, 34(4), 165-176.
Wiskott L, Berkes P, Franzius M, Sprekeler H, Wilbert N (2011), Slow Feature Analysis, in Scholarpedia, 6(4), 5282.
Toledo-Rodriguez M, Sandi C (2011), Stress during Adolescence Increases Novelty Seeking and Risk-Taking Behavior in Male and Female Rats, in Frontiers in Behavioral Neuroscience, 5(17), 1-10.
Aberg KC, Herzog MH (2010), Does Perceptual Learning Suffer from Retrograde Interference?, in PLoS ONE, 5(12), 1-6.
Fremaux N, Sprekeler H, Gerstner W (2010), Functional Requirements for Reward-Modulated Spike-Timing-Dependent Plasticity, in Journal of Neuroscience, 30(40), 13326-13337.
Salehi B, Cordero MI, Sandi C (2010), Learning under stress: the inverted-U-shape function revisited, in Learning & Memory, 17(10), 522-530.
Friedrich J, Urbanczik R, Senn W (2010), Learning Spike-Based Population Codes by Reward and Population Feedback, in Neural Computation, 22(7), 1698-1717.
Vasilaki E, Fremaux N, Urbanczik R, Senn W, Gerstner W (2009), Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail, in PLoS Computational Biology, 5(12), 1-17.
Sprekeler H, Hennequin G, Gerstner W (2009), Code-Specific Policy-Gradient Rules for Spiking Neurons, in Advances in Neural Information Processing Systems, 22, 1741-1749.
Aberg KC, Herzog MH (2009), Interleaving bisection stimuli - randomly or in sequence - does not disrupt perceptual learning, it just makes it more difficult, in Vision Research, 49(21), 2591-2598.
Luksys G, Gerstner W, Sandi C (2009), Stress, genotype and norepinephrine in the prediction of mouse behavior using reinforcement learning, in Nature Neuroscience, 12(9), 1180.
Tartaglia E, Aberg KC, Herzog MH (2009), Modeling perceptual learning: Why mice do not play backgammon, in Learning & Perception, 1(1), 155-163.
Urbanczik R, Senn W (2009), Reinforcement learning in populations of spiking neurons, in Nature Neuroscience, 12(3), 250-252.
Kneissler J, Urbanczik R, Senn W (accepted), Code-specific synaptic plasticity improves learning, in Journal of Neuroscience.
Clarke J, Friedrich A, Senn W, Tartaglia E, Mechesotti S, Herzog M (accepted), Human learning in non-Markovian decision making, in PLoS Computational Biology.

Associated projects

Number | Title | Start | Funding scheme
108102 | The role of the neural cell adhesion molecule in stress-induced cognitive and neural disturbances | 01.04.2005 | Project funding (Div. I-III)
113364 | Theory and Practice of Reinforcement Learning | 01.02.2007 | Project funding (Div. I-III)
147636 | Learning from delayed and sparse feedback | 01.12.2013 | Sinergia
135710 | Stress and the Social Brain: The role of neuropeptides and synapse-specific neuroplasticity molecules | 01.04.2011 | Project funding (Div. I-III)
133094 | Dendritic pointers and time multiplexing as cortical binding mechanisms | 01.05.2011 | Project funding (Div. I-III)
114404 | Top-down and bottom-up processes in perceptual learning | 01.01.2007 | ProDoc (research module)
145004 | In vivo fast-scan cyclic voltammetry detection of neurotransmitters: A focus on dopamine | 01.01.2013 | R'EQUIP
133853 | A phenogenomic approach to identify novel determinants of mitochondrial function | 01.10.2011 | R'EQUIP
117975 | Coding Characteristics of Neuron Models | 01.10.2007 | Project funding (Div. I-III)

Abstract

Reward-based learning encompasses a broad class of machine-learning algorithms that optimize the behavior of an agent (e.g., a real or simulated robot) so as to maximize the total expected reward. These algorithms describe learning in machines that is reminiscent of learning in animals and humans, as studied in animal behavior (e.g., conditioning) and in human psychophysics. Learning in humans and animals, in turn, is thought to be related to changes in the synaptic connections between neurons in the brain. Hence the question arises whether models of synaptic plasticity at the level of spiking neurons can be connected to formal 'reinforcement' learning models in machine learning, as well as to human psychophysics and animal behavior. This project combines the expertise of two computational neuroscience laboratories (EPFL-LCN/Wulfram Gerstner and Univ. Berne/Walter Senn), both of which have previously worked on spike-based models of synaptic plasticity, with that of three further groups: the Schmidhuber group at IDSIA (Lugano), which has a long-standing track record in formal models of reinforcement learning; the psychophysics laboratory of Michael Herzog (EPFL-LPSY), which has a long tradition in human vision and perceptual learning; and the rodent behavior laboratory of Carmen Sandi (EPFL-BMI).
