Bayesian Models in the Mind

Science 11 March 2011: Vol. 331 no. 6022 pp. 1279-1285

How to Grow a Mind: Statistics, Structure, and Abstraction

Joshua B. Tenenbaum¹, Charles Kemp², Thomas L. Griffiths³, and Noah D. Goodman⁴

¹Department of Brain and Cognitive Sciences, Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.

²Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

³Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA.

⁴Department of Psychology, Stanford University, Stanford, CA 94305, USA.

(paraphrase)

In coming to understand the world—in learning concepts, acquiring language, and grasping causal relations—our minds make inferences that appear to go far beyond the data available. How do we do it? This review describes recent approaches to reverse-engineering human learning and cognitive development and, in parallel, engineering more humanlike machine learning systems. Computational models that perform probabilistic inference over hierarchies of flexibly structured representations can address some of the deepest questions about the nature and origins of human thought: How does abstract knowledge guide learning and reasoning from sparse data? What forms does our knowledge take, across different domains and tasks? And how is that abstract knowledge itself acquired?

The Challenge: How Does the Mind Get So Much from So Little?

For scientists studying how humans come to understand their world, the central challenge is this: How do our minds get so much from so little? We build rich causal models, make strong generalizations, and construct powerful abstractions, whereas the input data are sparse, noisy, and ambiguous—in every way far too limited. A massive mismatch looms between the information coming in through our senses and the outputs of cognition.

Consider the situation of a child learning the meanings of words. Any parent knows, and scientists have confirmed, that typical 2-year-olds can learn how to use a new word such as “horse” or “hairbrush” from seeing just a few examples. We know they grasp the meaning, not just the sound, because they generalize: They use the word appropriately (if not always perfectly) in new situations. Viewed as a computation on sensory input data, this is a remarkable feat. Within the infinite landscape of all possible objects, there is an infinite but still highly constrained subset that can be called “horses” and another for “hairbrushes.” How does a child grasp the boundaries of these subsets from seeing just one or a few examples of each? Adults face the challenge of learning entirely novel object concepts less often, but they can be just as good at it.

Cognitive Development

Generalization from sparse data is central in learning many aspects of language, such as syntactic constructions or morphological rules. It presents most starkly in causal learning: Every statistics class teaches that correlation does not imply causation, yet children routinely infer causal links from just a handful of events, far too small a sample to compute even a reliable correlation! Perhaps the deepest accomplishment of cognitive development is the construction of larger-scale systems of knowledge: intuitive theories of physics, psychology, or biology or rule systems for social structure or moral judgment. Building these systems takes years, much longer than learning a single new word or concept, but on this scale too the final product of learning far outstrips the data observed.

Philosophers have inquired into these puzzles for over two thousand years, most famously as “the problem of induction,” from Plato and Aristotle through Hume, Whewell, and Mill to Carnap, Quine, Goodman, and others in the 20th century. Only recently have these questions become accessible to science and engineering by viewing inductive learning as a species of computational problems and the human mind as a natural computer evolved for solving them.

The proposed solutions are, in broad strokes, just what philosophers since Plato have suggested. If the mind goes beyond the data given, another source of information must make up the difference. Some more abstract background knowledge must generate and delimit the hypotheses learners consider, or meaningful generalization would be impossible. Psychologists and linguists speak of “constraints”; machine learning and artificial intelligence researchers, “inductive bias”; statisticians, “priors.”

This article reviews recent models of human learning and cognitive development arising at the intersection of these fields. What has come to be known as the “Bayesian” or “probabilistic” approach to reverse-engineering the mind has been heavily influenced by the engineering successes of Bayesian artificial intelligence and machine learning over the past two decades and, in return, has begun to inspire more powerful and more humanlike approaches to machine learning.

Broad Application of Bayesian Principles

Over the past decade, many aspects of higher-level cognition have been illuminated by the mathematics of Bayesian statistics: our sense of similarity, representativeness, and randomness; coincidences as a cue to hidden causes; judgments of causal strength and evidential support; diagnostic and conditional reasoning; and predictions about the future of everyday events.

Bayesian Inference in Human Cognition

The claim that human minds learn and reason according to Bayesian principles is not a claim that the mind can implement any Bayesian inference. Only those inductive computations that the mind is designed to perform well, where biology has had time and cause to engineer effective and efficient mechanisms, are likely to be understood in Bayesian terms. In addition to the general cognitive abilities just mentioned, Bayesian analyses have shed light on many specific cognitive capacities and modules that result from rapid, reliable, unconscious processing, including perception, language, memory, and sensorimotor systems. In contrast, in tasks that require explicit conscious manipulations of probabilities as numerical quantities—a recent cultural invention that few people become fluent with, and only then after sophisticated training—judgments can be notoriously biased away from Bayesian norms.

At heart, Bayes’s rule is simply a tool for answering the first of the questions posed at the outset: How does abstract knowledge guide inference from incomplete data? Abstract knowledge is encoded in a probabilistic generative model, a kind of mental model that describes the causal processes in the world giving rise to the learner’s observations as well as unobserved or latent variables that support effective prediction and action if the learner can infer their hidden state. Generative models must be probabilistic to handle the learner’s uncertainty about the true states of latent variables and the true causal processes at work. A generative model is abstract in two senses: It describes not only the specific situation at hand, but also a broader class of situations over which learning should generalize, and it captures in parsimonious form the essential world structure that causes learners’ observations and makes generalization possible.
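
As a rough illustration of what "generative model with a latent variable" can mean, the sketch below samples a hidden cause and then noisy observations that depend on it. The function name `sample_world` and every probability in it are hypothetical values chosen for this example; none of them come from the article.

```python
import random

# Hypothetical parameters, chosen only for illustration.
P_CAUSE = 0.3               # prior probability that the hidden cause is present
P_EFFECT_IF_CAUSE = 0.8     # chance of observing the effect when the cause is present
P_EFFECT_IF_NO_CAUSE = 0.1  # chance of observing the effect otherwise


def sample_world(n_obs, rng):
    """Sample a latent state, then n_obs observations generated from it."""
    # Latent variable: the learner never sees this directly.
    cause_present = rng.random() < P_CAUSE
    # The latent state determines how likely each observation is.
    p_effect = P_EFFECT_IF_CAUSE if cause_present else P_EFFECT_IF_NO_CAUSE
    # Observed data: noisy consequences of the latent state.
    observations = [rng.random() < p_effect for _ in range(n_obs)]
    return cause_present, observations


rng = random.Random(0)  # seeded so the run is repeatable
latent, data = sample_world(5, rng)
print("hidden cause present:", latent)
print("observed effects:", data)
```

The learner's problem runs in the opposite direction: given only `data`, infer the value of `latent`.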

Bayesian inference gives a rational framework for updating beliefs about latent variables in generative models given observed data. Background knowledge is encoded through a constrained space of hypotheses H about possible values for the latent variables, candidate world structures that could explain the observed data. Finer-grained knowledge comes in the “prior probability” P(h), the learner’s degree of belief in a specific hypothesis h prior to (or independent of) the observations. Bayes’s rule updates priors to “posterior probabilities” P(h|d) conditional on the observed data.

The posterior probability is proportional to the product of the prior probability and the likelihood P(d|h), measuring how expected the data are under hypothesis h, relative to all other hypotheses h′ in H.
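
Written out for a finite hypothesis space, the rule described above is P(h|d) = P(d|h) P(h) / Σ P(d|h′) P(h′), with the sum taken over every h′ in H. The following is a minimal sketch of that update. The function name `posterior`, the hypothesis labels, and the numbers are all assumptions made for illustration.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes's rule over a finite hypothesis space H.

    prior[h] is P(h); likelihood[h] is P(d|h) for the observed data d.
    """
    # Unnormalized score for each hypothesis h: P(d|h) * P(h).
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # Normalizer: the sum of P(d|h') * P(h') over all h' in H.
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("the data have zero probability under every hypothesis")
    return {h: score / evidence for h, score in unnormalized.items()}


# Hypothetical values: three candidate hypotheses and one observed data set.
prior = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
likelihood = {"h1": 0.1, "h2": 0.5, "h3": 0.9}

post = posterior(prior, likelihood)
for h, p in post.items():
    print(f"P({h}|d) = {p:.2f}")
```

With these numbers the posterior comes out at roughly 0.20, 0.50, and 0.30: h1 starts with the highest prior but predicts the data poorly, so it is overtaken by hypotheses under which the data were more expected.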

Conclusions

We have outlined an approach to understanding cognition and its origins in terms of Bayesian inference over richly structured, hierarchical generative models. Although we are far from a complete understanding of how human minds work and develop, the Bayesian approach brings us closer in several ways. First is the promise of a unifying mathematical language for framing cognition as the solution to inductive problems and building principled quantitative models of thought with a minimum of free parameters and ad hoc assumptions. Deeper is a framework for understanding why the mind works the way it does, in terms of rational inference adapted to the structure of real-world environments, and what the mind knows about the world, in terms of abstract schemas and intuitive theories revealed only indirectly through how they constrain generalizations.

Most importantly, the Bayesian approach lets us move beyond classic either-or dichotomies that have long shaped and limited debates in cognitive science. Powerful abstractions can be learned surprisingly quickly, together with or prior to learning the more concrete knowledge they constrain. Structured symbolic representations need not be rigid, static, hard-wired, or brittle. Embedded in a probabilistic framework, they can grow dynamically and robustly in response to the sparse, noisy data of experience.

(end of paraphrase)