Ventral Tegmental Area — Reward and Punishment

Scientific Understanding of Consciousness
Consciousness as an Emergent Property of Thalamocortical Activity

Nature 482, 85–88 (02 February 2012)

Neuron-type-specific signals for reward and punishment in the ventral tegmental area

Jeremiah Y. Cohen, Sebastian Haesler, Naoshige Uchida, Linh Vong & Bradford B. Lowell

Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA

Division of Endocrinology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts 02215, USA

[paraphrase]

Dopamine has a central role in motivation and reward. Dopaminergic neurons in the ventral tegmental area (VTA) signal the discrepancy between expected and actual rewards (i.e., reward prediction error); how they compute such signals is unknown. We recorded the activity of VTA neurons while mice associated different odour cues with appetitive and aversive outcomes. We found three types of neuron based on responses to odours and outcomes: approximately half of the neurons (type I, 52%) showed phasic excitation after reward-predicting odours and rewards in a manner consistent with reward prediction error coding; the other half of neurons showed persistent activity during the delay between odour and outcome that was modulated positively (type II, 31%) or negatively (type III, 18%) by the value of outcomes. Whereas the activity of type I neurons was sensitive to actual outcomes (i.e., when the reward was delivered as expected compared to when it was unexpectedly omitted), the activity of type II and type III neurons was determined predominantly by reward-predicting odours. We ‘tagged’ dopaminergic and GABAergic neurons with the light-sensitive protein channelrhodopsin-2 and identified them based on their responses to optical stimulation while recording. All identified dopaminergic neurons were of type I and all GABAergic neurons were of type II. These results show that VTA GABAergic neurons signal expected reward, a key variable for dopaminergic neurons to calculate reward prediction error.

Dopaminergic neurons fire phasically (100–500 ms) after unpredicted rewards or cues that predict reward. Their response to reward is reduced when a reward is fully predicted. Furthermore, their activity is suppressed when a predicted reward is omitted. From these observations, previous studies hypothesized that dopaminergic neurons signal discrepancies between expected and actual rewards (i.e., they compute reward prediction error (RPE)), but how dopaminergic neurons compute RPE is unknown.
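
The three observations above are what a simple subtraction of expected from actual reward would produce. The following is a minimal sketch of that idea, assuming scalar reward values in arbitrary units; the function name and the numbers are illustrative and do not come from the paper.

```python
def reward_prediction_error(actual_reward: float, expected_reward: float) -> float:
    """Return the reward prediction error: actual minus expected reward."""
    return actual_reward - expected_reward


# The three cases described in the text (arbitrary reward units).
unpredicted = reward_prediction_error(actual_reward=1.0, expected_reward=0.0)
fully_predicted = reward_prediction_error(actual_reward=1.0, expected_reward=1.0)
omitted = reward_prediction_error(actual_reward=0.0, expected_reward=1.0)

print(unpredicted, fully_predicted, omitted)  # 1.0 0.0 -1.0
```

A positive value corresponds to phasic excitation after an unpredicted reward, zero to the reduced response when the reward is fully predicted, and a negative value to suppression when a predicted reward is omitted.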

Dopaminergic neurons make up about 55–65% of VTA neurons; the rest are mostly GABAergic inhibitory neurons. Many addictive drugs inhibit VTA GABAergic neurons, which increases dopamine release (called disinhibition), a potential mechanism for the reinforcing effects of these drugs. Although VTA GABAergic neurons are known to inhibit dopaminergic neurons in vitro, little is known about their role in normal reward processing. One obstacle has been the difficulty of identifying different neuron types with extracellular recording techniques. Conventionally, spike waveforms and other firing properties have been used to identify presumed dopaminergic and GABAergic neurons, but this approach has been questioned recently. We thus aimed to observe how dopaminergic and GABAergic neurons process information about rewards and punishments.

We classically conditioned mice with different odour cues that predicted appetitive or aversive outcomes. The possible outcomes were big reward, small reward, nothing, or punishment (a puff of air delivered to the animal’s face). Each behavioural trial began with a conditioned stimulus (CS; an odour, 1 s), followed by a 1-s delay and an unconditioned stimulus (US; the outcome). Within the first two behavioural sessions, mice began licking towards the water-delivery tube in the delay before rewards arrived, indicating that they quickly learned the CS–US associations. The lick rate was significantly higher preceding big rewards than small ones (paired t-tests between lick rates for big versus small rewards for each session, P < 0.05 for each mouse).
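
The lick-rate comparison can be illustrated with a paired t statistic. This is a sketch only: the lick rates are invented, the pairing of one big-reward and one small-reward value per session is an assumption about how the comparison was set up, and the function name is hypothetical. The sketch stops at the t statistic and its degrees of freedom; a P value would additionally require Student's t distribution.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on two samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical anticipatory lick rates (licks/s), one pair per session.
licks_big = [6.1, 5.8, 6.5, 7.0, 6.2]
licks_small = [3.9, 4.2, 4.0, 4.8, 4.1]

t, df = paired_t_statistic(licks_big, licks_small)
print(f"t = {t:.2f}, df = {df}")
```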

We also found many neurons with firing patterns distinct from typical dopaminergic neurons. These neurons showed persistent excitation during the delay before rewards, in response to reward-predicting odours.

To classify these response profiles, we used principal component analysis (PCA) followed by unsupervised, hierarchical clustering. This yielded three clusters of neurons that were separated according to (1) the magnitude of activity during the delay between CS and US, and (2) the magnitude of responses to the CS or US.
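
The classification step can be sketched as follows, assuming NumPy and SciPy are available. Everything about the data is hypothetical: the response templates, neuron counts, and noise level are made up for illustration, and the choice of three principal components and Ward linkage is an assumption, since the text does not state which settings were used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
n_bins = 40  # time bins spanning CS, delay and US (hypothetical)


def synthetic_profiles(template, n_neurons):
    """Return noisy copies of one response template (made-up data)."""
    return template + 0.1 * rng.standard_normal((n_neurons, n_bins))


# Three invented templates loosely mimicking the response types in the text.
phasic = np.zeros(n_bins)
phasic[10:13] = 1.0          # brief response to the CS
phasic[30:33] = 1.0          # brief response to the US
sustained_up = np.zeros(n_bins)
sustained_up[10:30] = 0.6    # persistent excitation during the delay
sustained_down = np.zeros(n_bins)
sustained_down[10:30] = -0.6  # persistent suppression during the delay

profiles = np.vstack([
    synthetic_profiles(phasic, 20),
    synthetic_profiles(sustained_up, 12),
    synthetic_profiles(sustained_down, 8),
])

# PCA via SVD of the mean-centred data; keep the first three components.
centred = profiles - profiles.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:3].T

# Unsupervised agglomerative clustering, cut into at most three clusters.
tree = linkage(scores, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")

print(np.bincount(labels)[1:])  # cluster sizes; label order is arbitrary
```

Reducing each response profile to a few component scores before clustering keeps the distance computation from being dominated by bin-by-bin noise.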

To identify dopaminergic neurons, we expressed channelrhodopsin-2 (ChR2), a light-gated cation channel, in dopaminergic neurons.
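
Identification by optical stimulation rests on the idea that a ChR2-expressing neuron should fire reliably and at short latency after each light pulse. The sketch below illustrates one way to score this. The 5 ms window, the 80% reliability threshold, the function names, and the spike times are all assumptions chosen for illustration; the text does not spell out the criteria actually used.

```python
from bisect import bisect_right


def light_response(spike_times, pulse_times, window=0.005):
    """Return (fraction of pulses followed by a spike within `window`, mean latency)."""
    if not pulse_times:
        raise ValueError("at least one light pulse is required")
    spikes = sorted(spike_times)
    latencies = []
    for pulse in pulse_times:
        i = bisect_right(spikes, pulse)  # index of first spike after pulse onset
        if i < len(spikes) and spikes[i] - pulse <= window:
            latencies.append(spikes[i] - pulse)
    fraction = len(latencies) / len(pulse_times)
    mean_latency = sum(latencies) / len(latencies) if latencies else None
    return fraction, mean_latency


def is_tagged(spike_times, pulse_times, window=0.005, min_fraction=0.8):
    """Hypothetical criterion: reliable short-latency firing after light pulses."""
    fraction, _ = light_response(spike_times, pulse_times, window)
    return fraction >= min_fraction


# Made-up spike and pulse times in seconds.
pulses = [1.0, 2.0, 3.0, 4.0, 5.0]
responsive_unit = [0.5, 1.003, 2.0025, 2.6, 3.004, 4.003, 5.002]
unresponsive_unit = [0.4, 1.2, 2.7, 3.05, 4.6]

print(is_tagged(responsive_unit, pulses))    # True
print(is_tagged(unresponsive_unit, pulses))  # False
```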

Our data set of identified dopaminergic neurons allows us to characterize their diversity. We observed that some were excited by reward, some were excited by a reward-predicting CS, and some were excited by both.

VTA GABAergic neurons form synapses preferentially onto dendrites of dopaminergic neurons, whereas other inhibitory inputs form synapses onto their somata. Dendritic inhibition is thought to be weaker than somatic ‘shunting’ inhibition but appears well suited for deriving graded outputs by ‘arithmetically’ combining excitatory and inhibitory inputs.

A major effect of drugs of addiction is inhibition of VTA GABAergic neurons. Our results suggest that VTA GABAergic neurons are involved in the computation of RPE. Inhibition of GABAergic neurons by addictive drugs could therefore lead to sustained RPE even after the learned effects of drug intake are well established, resulting in sustained reinforcement of drug taking. Understanding local circuits in VTA in the context of learning theory may thus provide crucial insights into normal as well as abnormal functions of reward circuits.

[end of paraphrase]
