Dopamine Ramps in Striatum Motivate Distant Rewards

Nature 500, 575–579 (29 August 2013)

Prolonged dopamine signalling in striatum signals proximity and value of distant rewards

McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Mark W. Howe, Patrick L. Tierney & Ann M. Graybiel

Department of Psychiatry and Behavioral Sciences, and Department of Pharmacology, University of Washington, Seattle, Washington 98195, USA

Stefan G. Sandberg & Paul E. M. Phillips

[paraphrase]

Predictions about future rewarding events have a powerful influence on behaviour. The phasic spike activity of dopamine-containing neurons, and corresponding dopamine transients in the striatum, are thought to underlie these predictions, encoding positive and negative reward prediction errors. However, many behaviours are directed towards distant goals, for which transient signals may fail to provide sustained drive. Here we report an extended mode of reward-predictive dopamine signalling in the striatum that emerged as rats moved towards distant goals. These dopamine signals, which were detected with fast-scan cyclic voltammetry (FSCV), gradually increased or—in rare instances—decreased as the animals navigated mazes to reach remote rewards, rather than having phasic or steady tonic profiles. These dopamine increases (ramps) scaled flexibly with both the distance and size of the rewards. During learning, these dopamine signals showed spatial preferences for goals in different locations and readily changed in magnitude to reflect changing values of the distant rewards. Such prolonged dopamine signalling could provide sustained motivational drive, a control mechanism that may be important for normal behaviour and that can be impaired in a range of neurologic and neuropsychiatric disorders.

The spike activity patterns of midbrain dopamine-containing neurons signal unexpected and salient cues and outcomes, and the dynamics of these phasic neural signals have been found to follow closely the principles of reinforcement learning theory. In accordance with this view, selective genetic manipulation of the phasic firing of dopamine neurons alters some forms of learning and cue-guided movements. Episodes of transient dopamine release in the ventral striatum have been detected with FSCV, and these also occur in response to primary rewards and, after learning, to cues predicting upcoming rewards. Thus, dopamine transients in the striatum share many features of the phasic spike activity of midbrain dopamine neurons.

Classic studies of such dopamine transients have focused on Pavlovian and instrumental lever-press tasks, in which rewards were within arm’s reach. However, in many real-life situations, animals must move over large distances to reach their goals. These behaviours require ongoing motivational levels to be adjusted flexibly according to changing environmental conditions. The importance of such control of ongoing motivation is reflected in the severe impairments suffered in dopamine deficiency disorders, including Parkinson’s disease. In addition, in pioneering experimental studies, dopamine signalling has been implicated in controlling levels of effort, vigour and motivation during the pursuit of goals in maze tasks. It has been unclear how phasic dopamine signalling alone could account for persistent motivational states. We adapted chronic FSCV to enable prolonged measurement of real-time striatal dopamine release as animals learned to navigate towards spatially distant rewards.

We measured dopamine levels in the dorsolateral striatum (DLS) and ventromedial striatum (VMS) as rats navigated mazes of different sizes and shapes to retrieve rewards. The rats were trained first on an associative T-maze task to run and to turn right or left as instructed by tones to receive a chocolate milk reward at the indicated end-arms (n = 9). Unexpectedly, instead of mainly finding isolated dopamine transients at the initial cue or at goal-reaching, we primarily found gradual increases in the dopamine signals that began at the onset of the trial and ended after goal-reaching. These ‘ramping’ dopamine responses, identified in session averages by linear regression (Pearson’s R > 0.5, P < 0.01), were most common in the VMS (75% of sessions) but were also present at DLS recording sites (42% of sessions).
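The ramp criterion described above (a linear fit of the session-averaged signal against time, with a correlation threshold and a significance threshold) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names `pearson_r` and `is_ramp` are hypothetical, only increasing ramps are tested, and a seeded permutation test stands in for the regression P value so that the example needs nothing beyond the Python standard library.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def is_ramp(times, signal, r_min=0.5, alpha=0.01, n_perm=2000, seed=0):
    """Classify a session-averaged trace as an increasing ramp.

    Returns (is_ramp, r, p). The thresholds mirror the criterion quoted in
    the text (R > 0.5, P < 0.01); the permutation test is an assumption
    made here in place of the parametric regression P value.
    """
    r = pearson_r(times, signal)
    rng = random.Random(seed)          # fixed seed keeps the result reproducible
    shuffled = list(signal)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break any relation between time and signal
        if abs(pearson_r(times, shuffled)) >= abs(r):
            hits += 1
    p = (hits + 1) / (n_perm + 1)      # two-sided permutation p-value
    return (r > r_min and p < alpha), r, p


# Synthetic traces (not data from the paper): a steady rise and a brief
# transient at trial onset.
times = [0.1 * i for i in range(50)]
ramp_trace = [0.02 * t for t in times]
transient_trace = [1.0 if i < 3 else 0.0 for i in range(50)]

print(is_ramp(times, ramp_trace)[0])
print(is_ramp(times, transient_trace)[0])
```

With these synthetic inputs the steadily rising trace is classified as a ramp, while the brief transient at trial onset is not.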

Classic studies of dopamine neuron firing and striatal dopamine release have largely focused on transient responses associated with unpredicted rewards and reward-predictive cues. Here we demonstrate that, in addition to such transient dopamine responses, prolonged dopamine release in the striatum can occur, changing slowly as animals approach distant rewards during spatial navigation. These dopamine signals seem to represent the relative spatial proximity of valued goals, perhaps reflecting reward expectation. It remains unclear whether these signals represent goal proximity on the basis of environmental cues, effort, or internally scaled estimates of distance. However, the brain possesses mechanisms for representing both allocentric spatial context and relative distance from landmarks, which could, in principle, be integrated with dopaminergic signalling to produce such extended dopamine signals.

Transient dopaminergic responses to learned reward-predictive cues have been proposed to initiate motivated behaviours, but with this mode of signalling alone, it is difficult to account for how dopamine acts to maintain and direct motivational resources during prolonged behaviours. The ramping dopamine signals that we describe here, providing continuous estimates of how close rewards are to being reached, and weighted by the relative values of the rewards when options are available, seem ideally suited to maintain and direct such extended energy and motivation.

[end of paraphrase]

Nature 500, 533–535 (29 August 2013)

Dopamine ramps up

Yael Niv

Princeton Neuroscience Institute and the Psychology Department, Princeton University, Princeton, New Jersey 08544, USA.

[paraphrase]

Dopamine is a molecule that is broadcast throughout the brain and is involved in processes ranging from decision-making to schizophrenia, as well as most forms of addiction. The authors measured levels of dopamine in the striatum of rats while the animals ran through mazes for food rewards. The striatum is the area that contains the highest dopamine concentration in the brain. It is involved in action selection at all levels, from choosing which limb to move to selecting a goal to work towards. In a series of elegant experiments, Howe et al. established that dopamine concentration gradually ramps up as rats run towards a reward, and that the slope of the ramps relates to the amount of anticipated reward and the effort required to obtain it.

Why are these dopamine ramps so relevant? Dopamine-secreting (dopaminergic) neurons are special because they are thought to fire in unison, broadcasting a single all-important message widely. As such, and because dopamine is implicated in a bewildering number of disorders, neuroscientists have long been keen to understand the dopamine signal in the intact brain and how it can be restored when things go wrong. These efforts have led to a well-established theory, which may unfortunately be at odds with these dopamine ramps.

The central theory of dopaminergic firing comes from theoretical neuroscientists who noticed that the firing patterns of dopaminergic neurons bear an eerie resemblance to a key component in computational algorithms of reinforcement learning called a reward prediction error. Reward prediction errors quantify 'surprise' — the difference between the rewards we expect and those we get in reality.

Imagine you drink coffee routinely. One day, while shopping, you find coffee beans you particularly like, which were not previously available in your town. According to the theory, and verified experimentally in laboratory settings, your dopaminergic neurons will fire to signal a positive reward prediction error due to the increase in your future expected (coffee) rewards. Such bursts of dopamine release affect learning in the striatum, strengthening actions that preceded a positive prediction error: you will now be more likely to return to this shop. The flip side, a negative prediction error, occurs when a reward is below expectation, for instance if you sip your coffee and find that the milk has gone sour.

To the brain, new information that causes you to change your predictions and an actual reward that is at odds with your predictions are equivalent. In both cases, bursts or pauses in dopaminergic firing will notify the brain of the prediction error. Consequently, through dopamine-dependent learning, future predictions will become more accurate, and actions that led to better-than-expected outcomes will become more common.
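The prediction-error idea in the preceding paragraphs can be written as a one-line update rule. The sketch below uses a simple delta rule (in the style of Rescorla-Wagner learning) rather than a full temporal-difference model; `update_expectation`, the learning rate, and the reward values are illustrative assumptions, not quantities from the commentary.

```python
def update_expectation(expected, received, learning_rate=0.1):
    """Return (new_expectation, prediction_error) after one outcome.

    prediction_error > 0 means the outcome was better than expected,
    prediction_error < 0 means it was worse.
    """
    prediction_error = received - expected
    new_expectation = expected + learning_rate * prediction_error
    return new_expectation, prediction_error


# Sour-milk case: a reward of 1.0 was expected, nothing was received,
# so the prediction error is negative.
_, error = update_expectation(expected=1.0, received=0.0)
print(error)

# Repeated delivery of the same reward: the expectation climbs towards
# the reward and the prediction error shrinks towards zero.
expectation = 0.0
for trial in range(50):
    expectation, error = update_expectation(expectation, received=1.0)
print(round(expectation, 3), round(error, 3))
```

The second loop illustrates the point made above: as learning proceeds, predictions become more accurate and the error signal fades.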

The prediction-error theory is compelling because it is normative — it explains the role of dopamine within a prescriptive framework of how one should adapt behaviour to earn more rewards. However, it also has shortcomings. For instance, it fails to address dopamine's effects on action vigour: Parkinson's disease, caused by the death of dopaminergic neurons, results in slowing down of all actions, and the nickname for amphetamine (which mimics elevated dopamine levels) is 'speed'.

Luckily, a straightforward extension of the theory fills this gap, suggesting that the background concentration of dopamine (termed tonic dopamine, to differentiate it from the phasic bursts and pauses that signal prediction errors) indicates the overall rate of rewards. Thus, tonic levels of dopamine quantify the cost of time, and should affect how much time we spend on each action. According to this theory, phasic and tonic dopaminergic signalling convey distinct but related information, affecting learning and vigour, respectively.
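One way to make the 'cost of time' idea concrete is a toy vigour model. The sketch assumes, beyond what the text states, that acting with latency tau incurs an effort cost of `vigour_cost / tau` and an opportunity cost of `reward_rate * tau`, where `reward_rate` stands for the average reward rate that tonic dopamine is proposed to report. Under that assumption the cost-minimising latency is sqrt(vigour_cost / reward_rate), so richer environments favour faster responding. All names and numbers are hypothetical.

```python
import math


def total_cost(latency, vigour_cost, reward_rate):
    """Effort cost of acting quickly plus the reward forgone while waiting.

    The hyperbolic effort term is an assumption made for illustration.
    """
    return vigour_cost / latency + reward_rate * latency


def optimal_latency(vigour_cost, reward_rate):
    """Latency that minimises total_cost under the assumed cost terms."""
    if vigour_cost <= 0 or reward_rate <= 0:
        raise ValueError("vigour_cost and reward_rate must be positive")
    # Setting d/d(latency) of total_cost to zero gives
    # latency = sqrt(vigour_cost / reward_rate).
    return math.sqrt(vigour_cost / reward_rate)


# A higher average reward rate (the proposed tonic dopamine signal)
# shortens the preferred latency, i.e. increases vigour.
for rate in (0.5, 1.0, 4.0):
    print(rate, optimal_latency(vigour_cost=1.0, reward_rate=rate))
```

In this toy setting, raising the average reward rate shortens the preferred latency, which matches the qualitative claim that higher tonic dopamine goes with more vigorous action.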

[end of paraphrase]
