Scientific Understanding of Consciousness
Visual Cortex, Invariant Object Representation
Science 12 September 2008: Vol. 321, no. 5895, pp. 1502–1507
Unsupervised Natural Experience Rapidly Alters Invariant Object Representation in Visual Cortex
Nuo Li and James J. DiCarlo
McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA
Object recognition is challenging because each object produces myriad retinal images. Responses of neurons from the inferior temporal cortex (IT) are selective to different objects, yet tolerant ("invariant") to changes in object position, scale, and pose. How does the brain construct this neuronal tolerance? We report a form of neuronal learning that suggests the underlying solution. Targeted alteration of the natural temporal contiguity of visual experience caused specific changes in IT position tolerance. This unsupervised temporal slowness learning (UTL) was substantial, increased with experience, and was significant in single IT neurons after just 1 hour. Together with previous theoretical work and human object perception experiments, we speculate that UTL may reflect the mechanism by which the visual stream builds and maintains tolerant object representations.
When presented with a visual image, primates can rapidly (<200 ms) recognize objects despite large variations in object position, scale, and pose. This ability likely derives from the responses of neurons at high levels of the primate ventral visual stream. But how are these powerful "invariant" neuronal object representations built by the visual system? On the basis of theoretical and behavioral work, one possibility is that tolerance ("invariance") is learned from the temporal contiguity of object features during natural visual experience, potentially in an unsupervised manner. Specifically, during natural visual experience, objects tend to remain present for seconds or longer, while object motion or viewer motion (e.g., eye movements) tends to cause rapid changes in the retinal image cast by each object over shorter time intervals (hundreds of ms). The ventral visual stream could construct a tolerant object representation by taking advantage of this natural tendency for temporally contiguous retinal images to belong to the same object. If this hypothesis is correct, it might be possible to uncover a neuronal signature of the underlying learning by using targeted alteration of those spatiotemporal statistics.
Our results show that targeted alteration of natural, unsupervised visual experience changes the position tolerance of IT neurons as predicted by the hypothesis that the brain uses a temporal contiguity learning strategy to build that tolerance in the first place. Several computational models show how such strategies can build tolerance, and such models can be implemented by means of Hebbian-like learning rules that are consistent with spike-timing–dependent plasticity. One can imagine IT neurons using almost temporally coincident activity to learn which sets of their afferents correspond to features of the same object at different positions. The time course and task independence of UTL are consistent with synaptic plasticity, but our data do not constrain the locus of plasticity, and changes at multiple levels of the ventral visual stream are likely.
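To make the idea of a Hebbian-like temporal contiguity rule concrete, here is a minimal sketch of a trace-style learning rule, in the spirit of Földiák's trace rule. It is an illustration under stated assumptions, not the authors' model or analysis: the single linear unit, the four one-hot "afferents" (objects A and B at positions 1 and 2), the function name train_trace, and the parameters alpha and eta are all hypothetical choices made for this example. The unit starts out responding only to object A at position 1. Under normal experience (A at position 1 followed by A at position 2) it should come to respond to A at position 2 as well; under altered ("swap") experience (A at position 1 followed by B at position 2) it should instead come to respond to B at position 2.

```python
# Toy trace-rule sketch (illustrative assumptions throughout; not the paper's model).
# Afferents are one-hot codes for (object, position): A1, A2, B1, B2.
A1 = [1.0, 0.0, 0.0, 0.0]
A2 = [0.0, 1.0, 0.0, 0.0]
B1 = [0.0, 0.0, 1.0, 0.0]
B2 = [0.0, 0.0, 0.0, 1.0]


def train_trace(weights, sequences, alpha=0.1, eta=0.5, epochs=50):
    """Train one linear unit with a trace-style Hebbian rule.

    trace_t = eta * trace_{t-1} + (1 - eta) * y_t
    w      += alpha * trace_t * (x_t - w)
    The trace carries recent activity forward in time, so an input that
    follows a strongly driving input is itself strengthened.
    """
    w = list(weights)
    for _ in range(epochs):
        for seq in sequences:
            trace = 0.0  # assumption: the trace resets between exposure episodes
            for x in seq:
                y = sum(wi * xi for wi, xi in zip(w, x))  # linear response
                trace = eta * trace + (1.0 - eta) * y
                w = [wi + alpha * trace * (xi - wi) for wi, xi in zip(w, x)]
    return w


# The unit initially responds only to object A at position 1.
initial_w = [1.0, 0.0, 0.0, 0.0]

# Normal experience: the same object appears at both positions in succession.
normal_w = train_trace(initial_w, [[A1, A2], [B1, B2]])

# Altered ("swap") experience: the object identity changes at position 2.
swap_w = train_trace(initial_w, [[A1, B2], [B1, A2]])

print("normal exposure weights [A1, A2, B1, B2]:", normal_w)
print("swap exposure weights   [A1, A2, B1, B2]:", swap_w)
```

In this toy setting the only difference between the two runs is which image follows A at position 1 in time, which is the kind of temporal contiguity manipulation described above. Resetting the trace between sequences is a simplifying assumption; a persistent trace would also link the end of one episode to the start of the next.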
We do not yet know if UTL reflects mechanisms that are necessary for building tolerant representations. But these same experience manipulations change the position tolerance of human object perception, producing a tendency to, for example, perceive one object to be the same identity as another object across a swap position. Moreover, given that the animals had a lifetime of visual experience to potentially build their IT position tolerance, the strength of UTL is substantial (5 spikes/s change per hour): just 1 hour of UTL is comparable to attentional effect sizes and is more than double that observed in previous IT learning studies over much longer training intervals. We do not yet know how far we can extend this learning, but just 2 hours of (highly targeted) unsupervised experience begins to reverse the object preferences of IT neurons. This discovery reemphasizes the importance of plasticity in vision by showing that it extends to a bedrock property of the adult ventral visual stream: position-tolerant object selectivity.
(end of paraphrase)