Prefrontal Cortex Human Reasoning

Scientific Understanding of Consciousness
Consciousness as an Emergent Property of Thalamocortical Activity

Science 27 June 2014: Vol. 344 no. 6191 pp. 1481-1486

Foundations of human reasoning in the prefrontal cortex

Maël Donoso, et al.

INSERM, Laboratoire de Neurosciences Cognitives (U960), 29 rue d'Ulm, 75005 Paris, France.

Département d'Etudes Cognitives (DEC), Ecole Normale Supérieure, 29 rue d'Ulm, 75005 Paris, France.

Centre de Neuro-imagerie de Recherche (CENIR), Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France.

Brown University, Providence, RI 02912, USA.

[paraphrase]

The prefrontal cortex (PFC) subserves reasoning in the service of adaptive behavior. Little is known, however, about the architecture of reasoning processes in the PFC. Using computational modeling and neuroimaging, we show here that the human PFC has two concurrent inferential tracks:

(i) one from ventromedial to dorsomedial PFC regions that makes probabilistic inferences about the reliability of the ongoing behavioral strategy and arbitrates between adjusting this strategy versus exploring new ones from long-term memory,

and (ii) another from polar to lateral PFC regions that makes probabilistic inferences about the reliability of two or three alternative strategies and arbitrates between exploring new strategies versus exploiting these alternative ones.

The two tracks interact and, along with the striatum, realize hypothesis testing for accepting versus rejecting newly created strategies.

Human reasoning subserves adaptive behavior and has evolved facing the uncertainty of everyday environments. In such situations, probabilistic inferential processes (i.e., Bayesian inferences) make optimal use of available information for making decisions. Human reasoning involves Bayesian inferences accounting for human responses that often deviate from formal logic. Bayesian inferences also operate in the prefrontal cortex (PFC) and guide behavioral choices. Everyday environments, however, are changing and open-ended, so that the range of uncertain situations and associated behavioral strategies (i.e., internal maps linking stimuli, actions, and expected outcomes) becomes potentially infinite. In such environments, probabilistic inferences involve Dirichlet process mixtures and rapidly yield intractable computations. This computational complexity problem constitutes a fundamental constraint on the evolution of higher cognitive functions and raises the issue of the actual nature of inferential processes implemented in the PFC.

A model of reasoning processes in the human PFC

To address this issue, we proposed a model that describes human reasoning, as it guides behavior, as a computationally tractable, online algorithm approximating Dirichlet process mixtures. The algorithm combines forward Bayesian inferences operating over a few concurrent behavioral strategies stored in long-term memory with hypothesis testing for possibly updating this inferential buffer with new strategies formed from long-term memory. The algorithm notably serves to arbitrate between (i) staying with the ongoing behavioral strategy and possibly learning external contingencies, (ii) switching to other learned strategies, and (iii) forming new behavioral strategies.

For integrating online Bayesian inferences and hypothesis testing, the algorithm’s key feature is inferring the absolute reliability of every monitored strategy: namely, the posterior probability that the current situation matches the situation the strategy has learned, given both action outcomes (and possibly contextual cues) and the possibility that no match occurs with any monitored strategy. To estimate these probabilities, the model assumes that, in the latter case, action outcomes expected from the monitored strategies are equiprobable. Thus, every monitored strategy may appear as being either reliable (i.e., more likely matching than not matching the current situation) or unreliable (the converse). When a strategy is reliable, the others are necessarily unreliable, so that the algorithm is in an exploitation state. The reliable strategy is the actor, namely, the unique strategy for selecting and learning the actions that maximize rewards (typically through reinforcement learning), whereas the other monitored strategies are treated as counterfactual. When all monitored strategies become unreliable, the algorithm switches into an exploration state corresponding to hypothesis testing: A new strategy is formed as a weighted mixture of strategies stored in long-term memory, then probed and monitored as actor. Because the new strategy is a priori unreliable, this probe actor learns, so that the algorithm may subsequently return to the exploitation state in two ways. Either one counterfactual strategy becomes reliable while the probe actor remains unreliable: The former is then retrieved as actor, and the latter is rejected (disbanded). Or the probe actor becomes reliable while counterfactual strategies remain unreliable: The probe actor is then confirmed. It remains the actor, the new strategy is consolidated into long-term memory, and the repertoire of stored strategies is expanded. If the inferential buffer has reached its capacity limit, the counterfactual strategy least recently used as actor is then discarded from the buffer (but remains stored in long-term memory).
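The notion of absolute reliability described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the prior values, and the choice to model the "no match" case as a uniform likelihood over n_outcomes possible outcomes are all assumptions made for illustration.

```python
def absolute_reliabilities(priors, likelihoods, prior_none, n_outcomes=2):
    """Posterior probability that each monitored strategy matches the situation.

    priors      -- prior reliability of each monitored strategy (assumed values)
    likelihoods -- probability each strategy assigned to the observed outcome
    prior_none  -- prior probability that no monitored strategy matches
    n_outcomes  -- number of possible outcomes; under "no match" they are
                   treated as equiprobable (an assumption of this sketch)
    """
    none_likelihood = 1.0 / n_outcomes
    joint = [p * l for p, l in zip(priors, likelihoods)]
    joint_none = prior_none * none_likelihood
    total = sum(joint) + joint_none
    return [j / total for j in joint], joint_none / total


def algorithm_state(reliabilities):
    """Exploitation if one strategy is more likely matching than not."""
    return "exploitation" if max(reliabilities) > 0.5 else "exploration"


# Two monitored strategies; the first predicted the observed outcome well.
posteriors, none_posterior = absolute_reliabilities(
    priors=[0.6, 0.2], likelihoods=[0.9, 0.1], prior_none=0.2
)
print(posteriors, none_posterior, algorithm_state(posteriors))
```

Because the posteriors over all strategies and the "no match" hypothesis sum to one, at most one strategy can exceed 0.5, which is why a reliable strategy makes all others unreliable in this sketch.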

Consistent with the capacity limit of human working memory, human decisions are best predicted when the inferential buffer is limited to two or three concurrent counterfactual strategies (8). We then hypothesized that the human PFC implements this algorithm. We expected anterior PFC regions to form the inferential buffer and more posterior PFC regions in association with basal ganglia to drive actor learning, selection, and creation on the basis of hypothesis testing. The model predicts that anterior PFC regions concurrently infer the absolute reliability of actor and counterfactual strategies that the algorithm builds online. More posterior PFC regions then detect when, in the inferential buffer, actor strategies become unreliable for creating probe actors, as well as when counterfactual strategies become reliable for retrieving them as actor (and possibly rejecting probe actors). In basal ganglia, the ventral striatum subserves reinforcement learning and is predicted to detect when, in the inferential buffer, probe actors become reliable for confirming them in long-term memory.
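The bookkeeping of a capacity-limited inferential buffer can be sketched as follows. This is a hypothetical illustration in Python, not code from the study: the class name, method names, and string labels for strategies are invented, and the buffer is assumed to hold at most `capacity` counterfactual strategies once a probe has been resolved.

```python
class InferentialBuffer:
    """Toy buffer holding one actor and a few counterfactual strategies."""

    def __init__(self, actor, capacity=3):
        self.capacity = capacity          # max counterfactuals (assumed 2 or 3)
        self.actor = actor
        self.counterfactuals = []         # least recently used as actor first
        self.long_term_memory = {actor}   # repertoire of stored strategies

    def start_probe(self, probe):
        """Exploration: the former actor becomes counterfactual, the probe acts."""
        self.counterfactuals.append(self.actor)
        self.actor = probe

    def confirm_probe(self):
        """Probe became reliable: consolidate it and evict if over capacity."""
        self.long_term_memory.add(self.actor)
        while len(self.counterfactuals) > self.capacity:
            # Discarded from the buffer only; it stays in long-term memory.
            self.counterfactuals.pop(0)

    def reject_probe(self, retrieved):
        """A counterfactual became reliable: disband the probe, retrieve it."""
        self.counterfactuals.remove(retrieved)
        self.actor = retrieved


buf = InferentialBuffer("A", capacity=2)
for new_strategy in ("B", "C", "D"):
    buf.start_probe(new_strategy)
    buf.confirm_probe()
print(buf.actor, buf.counterfactuals, sorted(buf.long_term_memory))
```

In this run the third confirmation pushes the buffer over its limit, so strategy "A" leaves the buffer while remaining in the long-term repertoire.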

To test these predictions, we used functional magnetic resonance imaging (fMRI) and scanned 40 healthy participants, while they were responding to successively presented digits and searching for three-digit combinations by trial and error.

Prefrontal foundations of human reasoning

The predicted algorithmic transitions associated with hypothesis testing and accounting for participants’ behavior occurred within the frontal lobes in the expected PFC and striatal regions. Moreover, the anterior PFC encoded the predicted absolute reliability signals associated with the concurrent behavioral strategies the algorithm creates, learns, tests, and retrieves for driving action. These results support the hypothesis that the proposed algorithm describes the PFC reasoning processes that guide adaptive behavior. Accordingly, the frontal lobes implement two concurrent inferential tracks. First, a medial track comprising the vmPFC-pgACC, dACC, and ventral striatum makes inferences about the actor strategy that, through reinforcement learning, selects and learns the actions maximizing reward. Whereas the vmPFC-pgACC infers the actor’s absolute reliability, the dACC detects when it becomes unreliable for triggering exploration (i.e., the formation of a new strategy from long-term memory to serve as actor). The ventral striatum then detects when this new actor strategy becomes reliable, which terminates exploration and confirms it in long-term memory. Second, a lateral track comprising the FPC and mid-LPC makes inferences about two or three alternative strategies stored in long-term memory. Whereas the FPC concurrently infers the absolute reliability of these counterfactual strategies from action outcomes, the mid-LPC detects when one becomes reliable for retrieving it as actor.

This medial-lateral segregation stems from the model's core notion of absolute reliability, which leads to distinguishing between switching away from ongoing behavior (the actor becomes unreliable) versus switching to another behavioral strategy stored in long-term memory (one counterfactual strategy becomes reliable). In this protocol, the two events never coincided; coincidence would have required alternating between only two recurrent situations associated with two distinct strategies (actor unreliability then implies the reliability of the alternative strategy). The dACC thus triggers switching away from ongoing behavior with the formation of new behavioral strategies, whereas the mid-LPC enables the switch to counterfactual strategies. The model may thus explain dACC activations observed in detecting unexpected action outcomes, switching to exploratory behaviors, and starting new behavioral tasks, as well as LPC activations in retrieving task sets. Consistent with the model prediction, moreover, the dACC and mid-LPC coactivate when participants switch back and forth between only two alternative behaviors.
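Why the two events coincide only in the two-situation case can be made explicit with a short worked relation. The symbols are assumptions introduced here for illustration: \lambda_a is the actor's absolute reliability, \lambda_i that of counterfactual strategy i, and \lambda_0 the posterior probability that no monitored strategy matches.

```latex
% Absolute reliabilities and the "no match" posterior sum to one:
\lambda_a + \sum_{i} \lambda_i + \lambda_0 = 1 .

% With a single counterfactual strategy c and only two recurrent
% situations (so that \lambda_0 \approx 0):
\lambda_a + \lambda_c \approx 1
\quad\Longrightarrow\quad
\lambda_a < \tfrac{1}{2} \;\Leftrightarrow\; \lambda_c > \tfrac{1}{2} .
```

Under these assumptions, the actor becoming unreliable and the alternative becoming reliable are the same event. When \lambda_0 is not negligible or several counterfactual strategies are monitored, the actor can drop below 1/2 without any counterfactual strategy rising above it.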

The model further indicates that the coupling between the medial and lateral tracks realizes hypothesis testing bearing upon new behavioral strategies created from long-term memory. Serving as probe actors initially set as unreliable, newly created strategies are disbanded when the mid-LPC detects that one counterfactual strategy has become reliable for retrieving it as actor. However, the ventral striatum adjusts probe actors to external contingencies through reinforcement learning and detects when probe actors eventually become reliable. In that event, the ventral striatum confirms probe actors in long-term memory as additional, subsequently recoverable, strategies. The interplay between the dACC, mid-LPC, and ventral striatum thus controls switches in and out of exploration periods corresponding to hypothesis testing of newly created strategies. Accordingly, every decision to create new strategies may be subsequently revised according to new information, which is critical in optimal adaptive processes operating in open-ended environments for dealing with the intrinsic nonparametric nature of strategy creation.

Hypothesis testing derives from inferences about the absolute reliability of the actor and of two or three counterfactual strategies, which involved the vmPFC-pgACC and FPC, respectively. The dissociation supports the distinction between the notions of actor and counterfactual strategy and accords with the vmPFC-pgACC and FPC involvement in monitoring ongoing and unchosen courses of action, respectively. A strategy's absolute reliability measures the extent to which the strategy is applicable to the current situation (i.e., the extent to which current external contingencies and those learned by the strategy result from the same latent cause). The vmPFC-pgACC thus infers the extent to which the latent cause determining current action outcomes remains unchanged. The FPC infers the extent to which the latter result from two or three previously identified latent causes. Latent causes are abstract constructs resulting from hypothesis testing implemented through the interplay between the dACC, mid-LPC, and ventral striatum. Latent causes organize long-term memory as a repertoire of behavioral strategies treated as separable entities. By detecting the reliability or unreliability of monitored strategies, the dACC, mid-LPC, and ventral striatum then appear to implement true or false exclusive judgments about possible causes of observed contingencies for selecting appropriate behavioral strategies. The model thus describes how the PFC forms a unified inferential system subserving reasoning in the service of adaptive behavior. Among the prefrontal regions, the FPC is likely specific to humans, which suggests that the ability to jointly infer multiple possible causes of observed contingencies and, consequently, to test new causal hypotheses emerging from long-term memory is unique to humans.

[end of paraphrase]
