Scientific Understanding of Consciousness
Orbitofrontal Cortex Supports Behavior and Learning Using Inferred But Not Cached Values
Science 16 November 2012: Vol. 338 no. 6109 pp. 953-956
Orbitofrontal Cortex Supports Behavior and Learning Using Inferred But Not Cached Values
Joshua L. Jones, Guillem R. Esber, Michael A. McDannald, Aaron J. Gruber, Alex Hernandez, Aaron Mirenzi, Geoffrey Schoenbaum
(1) Department of Anatomy and Neurobiology, University of Maryland School of Medicine, 20 Penn Street, Baltimore, MD 21201, USA.
(2) Behavioral Neurophysiology Research Section, Cellular Neurobiology Research Branch, National Institute on Drug Abuse Intramural Research Program, 251 Bayview Boulevard, Baltimore, MD 21201, USA.
(3) Department of Neuroscience, University of Lethbridge, Lethbridge, Alberta T1K 3M4, Canada.

[paraphrase]

Computational and learning theory models propose that behavioral control reflects value that is both cached (computed and stored during previous experience) and inferred (estimated on the fly on the basis of knowledge of the causal structure of the environment). The latter is thought to depend on the orbitofrontal cortex. Yet some accounts propose that the orbitofrontal cortex contributes to behavior by signaling “economic” value, regardless of the associative basis of the information. We found that the orbitofrontal cortex is critical for both value-based behavior and learning when value must be inferred but not when a cached value is sufficient. The orbitofrontal cortex is thus fundamental for accessing model-based representations of the environment to compute value rather than for signaling value per se.

Computational and learning theory accounts have converged on the idea that reward-related behavioral control reflects two types of information. The first is derived from habits, policies, or cached values. These terms reflect underlying associative structures that incorporate a precomputed value stored during previous experience with the relevant cues.
Behaviors based on this sort of information are fast and efficient but do not take into account changes in the value of the expected reward. This type of information contrasts with the second category, referred to as goal-directed or model-based, in which the value is inferred from knowledge of the associative structure of the environment, including how to obtain the expected reward, its unique form and features, and its current value. The associative model is stored, but a precomputed value is not. Rather, the value is computed or inferred on the fly when it is needed. Although behavior based on inferred value is slower, it can be more adaptive and flexible.

The calculation of economic value is often assigned to the orbitofrontal cortex (OFC), a prefrontal area heavily implicated in value-guided behavior. Yet behavioral studies across species implicate this region broadly, not in value-guided decisions per se, but rather in behaviors that require a new value to be estimated after little or no direct experience. Further, OFC involvement in a behavior often depends on whether learning is required, even when that learning does not involve changes in value. These data seem to require the OFC to perform one function, anticipating outcomes, in some settings and another, calculating economic value, in others. However, an alternative hypothesis is that the OFC performs the same function in all settings and specifically contributes to value-guided behavior and learning when value must be inferred or derived from model-based representations. We tested this hypothesis in rats using sensory preconditioning and blocking.

The findings demonstrate that the OFC is involved in value-based behavior when the value must be inferred from an associative model of the task but not when the same behavior can be based on a value cached or stored in cues during past experience.
The OFC may only be necessary for economic decision-making insofar as the value required reflects inferences or judgments. Data implicating the OFC in the expression of transitive inference or willingness to pay may reflect such a function, because, in each setting, the revealed preferences are expressed after little or no experience with the imagined outcomes. Limited experience, a defining feature of economic decision-making, would minimize the contribution of cached values and so bias subjects to rely on model-based information for the values underlying their choices.

[end of paraphrase]
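
The cached/inferred distinction in the paraphrase above can be made concrete with a toy simulation. The Python sketch below is not the authors' model or analysis. It assumes a simplified sensory-preconditioning schedule (cue A followed by cue B with no reward, then cue B followed by reward), and the cue names, trial counts, learning rate, and function names are all invented for illustration. A cached learner stores one number per cue, while a model-based learner stores the cue-to-cue links and computes the value of A only when asked.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate; an arbitrary illustrative choice

# Each trial is (sequence of cues, reward delivered after the last cue).
# Hypothetical schedule, loosely following a sensory-preconditioning design.
PHASE_1 = [(("A", "B"), 0.0)] * 10  # preconditioning: A then B, no reward
PHASE_2 = [(("B",), 1.0)] * 10      # conditioning: B then reward


def learn_cached(trials, alpha=ALPHA):
    """Cached values: one stored number per cue, updated by a TD(0)-style rule."""
    value = defaultdict(float)
    for cues, reward in trials:
        for i, cue in enumerate(cues):
            is_last = i == len(cues) - 1
            # Target is the reward after the last cue, else the next cue's stored value.
            target = reward if is_last else value[cues[i + 1]]
            value[cue] += alpha * (target - value[cue])
    return value


def learn_model(trials, alpha=ALPHA):
    """Associative model: which cue follows which, and what each final cue yields."""
    successor = {}
    outcome = defaultdict(float)
    for cues, reward in trials:
        for i, cue in enumerate(cues):
            if i < len(cues) - 1:
                successor[cue] = cues[i + 1]
            else:
                outcome[cue] += alpha * (reward - outcome[cue])
    return successor, outcome


def infer_value(cue, successor, outcome):
    """Inferred value: follow the learned cue chain at test time, then read the outcome."""
    seen = set()
    while cue in successor and cue not in seen:
        seen.add(cue)
        cue = successor[cue]
    return outcome[cue]


trials = PHASE_1 + PHASE_2
cached = learn_cached(trials)
successor, outcome = learn_model(trials)
inferred_a = infer_value("A", successor, outcome)

print("cached value of A:  ", cached["A"])
print("inferred value of A:", inferred_a)
```

In this sketch, cue A is never presented alongside reward, so its cached value stays at zero, whereas the value inferred through the stored A-to-B link is close to the reward magnitude. This mirrors the kind of test in which a cached value is insufficient and value has to be inferred from an associative model.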