Visual Cortex, Invariant Object Representation

Scientific Understanding of Consciousness
Consciousness as an Emergent Property of Thalamocortical Activity

Science 12 September 2008: Vol. 321. no. 5895, pp. 1502 – 1507

Unsupervised Natural Experience Rapidly Alters Invariant Object Representation in Visual Cortex

Nuo Li and James J. DiCarlo

McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA

(paraphrase)

Object recognition is challenging because each object produces myriad retinal images. Responses of neurons from the inferior temporal cortex (IT) are selective to different objects, yet tolerant ("invariant") to changes in object position, scale, and pose. How does the brain construct this neuronal tolerance? We report a form of neuronal learning that suggests the underlying solution. Targeted alteration of the natural temporal contiguity of visual experience caused specific changes in IT position tolerance. This unsupervised temporal slowness learning (UTL) was substantial, increased with experience, and was significant in single IT neurons after just 1 hour. Taken together with previous theoretical work and human object perception experiments, these results lead us to speculate that UTL may reflect the mechanism by which the visual stream builds and maintains tolerant object representations.

When presented with a visual image, primates can rapidly (<200 ms) recognize objects despite large variations in object position, scale, and pose. This ability likely derives from the responses of neurons at high levels of the primate ventral visual stream. But how are these powerful "invariant" neuronal object representations built by the visual system? On the basis of theoretical and behavioral work, one possibility is that tolerance ("invariance") is learned from the temporal contiguity of object features during natural visual experience, potentially in an unsupervised manner. Specifically, during natural visual experience, objects tend to remain present for seconds or longer, while object motion or viewer motion (e.g., eye movements) tends to cause rapid changes in the retinal image cast by each object over shorter time intervals (hundreds of ms). The ventral visual stream could construct a tolerant object representation by taking advantage of this natural tendency for temporally contiguous retinal images to belong to the same object. If this hypothesis is correct, it might be possible to uncover a neuronal signature of the underlying learning by using targeted alteration of those spatiotemporal statistics.
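The temporal-contiguity hypothesis can be made concrete with a toy model. The sketch below is a single-unit version of one classic temporal-contiguity learning rule, Földiák's (1991) trace rule; it is not the model or the data of Li and DiCarlo. The one-hot input coding, the linear response, the learning rate `alpha`, the trace constant `eta`, the trace reset between episodes, and the episode structure are all assumptions chosen for illustration.

```python
# Toy single-unit trace-rule model (hypothetical sketch, not the authors' model).
# Assumptions: each retinal image is a one-hot input indexed by (object,
# position); the response is linear; the trace resets between episodes.

OBJECTS = ("A", "B")
POSITIONS = (0, 1, 2)
INDEX = {key: i for i, key in enumerate((o, p) for o in OBJECTS for p in POSITIONS)}


def one_hot(obj, pos):
    """Retinal image of `obj` at `pos`, coded as a one-hot vector."""
    x = [0.0] * len(INDEX)
    x[INDEX[(obj, pos)]] = 1.0
    return x


def respond(w, x):
    """Linear response of the unit: weighted sum of its inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_trace_rule(w, episodes, alpha=0.1, eta=0.5):
    """Trace rule: trace = (1 - eta) * trace + eta * y;
    dw_j = alpha * trace * (x_j - w_j)."""
    w = list(w)
    for episode in episodes:
        trace = 0.0  # assumption: a new object starts with no residual trace
        for obj, pos in episode:
            x = one_hot(obj, pos)
            y = respond(w, x)
            trace = (1 - eta) * trace + eta * y  # decaying memory of recent activity
            w = [wj + alpha * trace * (xj - wj) for wj, xj in zip(w, x)]
    return w


# Hypothetical starting point: the unit responds only to object A at position 0.
w0 = [0.0] * len(INDEX)
w0[INDEX[("A", 0)]] = 1.0

# Natural-like viewing: each object stays in view for a whole episode while
# its retinal position changes (e.g., across eye movements).
episodes = [[("A", p) for p in POSITIONS], [("B", p) for p in POSITIONS]] * 200
w = train_trace_rule(w0, episodes)

for p in POSITIONS:
    print(f"position {p}: A -> {respond(w, one_hot('A', p)):.3f}, "
          f"B -> {respond(w, one_hot('B', p)):.3f}")
```

Because object A stays in view while its position changes, the trace left by the response at position 0 carries over and strengthens A's weights at positions 1 and 2, so the unit becomes tolerant to A's position. Object B is never paired with an active trace and stays at zero. With one-hot inputs this rule also keeps the sum of the weights at 1, so tolerance is gained by redistributing the weights rather than by growing them.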

Our results show that targeted alteration of natural, unsupervised visual experience changes the position tolerance of IT neurons as predicted by the hypothesis that the brain uses a temporal contiguity learning strategy to build that tolerance in the first place. Several computational models show how such strategies can build tolerance, and such models can be implemented by means of Hebbian-like learning rules that are consistent with spike-timing–dependent plasticity. One can imagine IT neurons using almost temporally coincident activity to learn which sets of their afferents correspond to features of the same object at different positions. The time course and task independence of UTL are consistent with synaptic plasticity, but our data do not constrain the locus of plasticity, and changes at multiple levels of the ventral visual stream are likely.
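The position-specific effect of altered temporal contiguity can likewise be illustrated with a toy temporal slowness rule. In this hypothetical sketch the unit's response to each one-hot image is a single weight, the weights for images at the center of gaze are held fixed, and each exposure pairs a peripheral image with the image that follows it at the center. The peripheral weight is nudged toward the response to that next image, a gradient-descent step on ½(y_after − y_before)². The assumption that at a "swap" position an object's peripheral image is followed by the other object, while identity is preserved at a "non-swap" position, is a simplified stand-in for the experimental manipulation. The starting weights and the learning rate `alpha` are also illustrative.

```python
# Toy "swap exposure" model of temporal slowness learning (hypothetical sketch).
# Assumptions: response to a one-hot image = one weight; center-of-gaze weights
# are fixed; only peripheral weights learn.

POSITIONS = ("center", "swap", "nonswap")


def initial_weights():
    """Hypothetical starting unit that prefers object P over N at every position."""
    w = {}
    for pos in POSITIONS:
        w[("P", pos)] = 1.0
        w[("N", pos)] = 0.2
    return w


# Each exposure: (peripheral image, image that follows it at the center of gaze).
NORMAL = [(("P", "nonswap"), ("P", "center")), (("N", "nonswap"), ("N", "center"))]
SWAPPED = [(("P", "swap"), ("N", "center")), (("N", "swap"), ("P", "center"))]


def expose(w, pairs, alpha=0.05):
    """Slowness update: move the earlier response toward the later one, i.e. a
    gradient-descent step on 0.5 * (y_after - y_before) ** 2 w.r.t. w[before]."""
    for before, after in pairs:
        w[before] += alpha * (w[after] - w[before])


def selectivity(w, pos):
    """Object preference (response to P minus response to N) at one position."""
    return w[("P", pos)] - w[("N", pos)]


w = initial_weights()
history = []  # history[k] = (k, swap selectivity, non-swap selectivity) after k exposures
for k in range(31):
    history.append((k, selectivity(w, "swap"), selectivity(w, "nonswap")))
    expose(w, NORMAL + SWAPPED)

for k, s_swap, s_nonswap in history[::10]:
    print(f"exposures={k:2d}  swap={s_swap:+.3f}  non-swap={s_nonswap:+.3f}")
```

Selectivity at the non-swap position stays at 0.8, while selectivity at the swap position shrinks with every exposure and, with these illustrative settings, changes sign after 14 exposures. This is only a qualitative analogue of a position-specific change in tolerance; the numbers carry no quantitative meaning.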

We do not yet know if UTL reflects mechanisms that are necessary for building tolerant representations. But these same experience manipulations change the position tolerance of human object perception, producing a tendency to, for example, perceive one object as having the same identity as another object across a swap position. Moreover, given that the animals had a lifetime of visual experience to potentially build their IT position tolerance, the strength of UTL is substantial (a change of 5 spikes/s per hour): just 1 hour of UTL is comparable to attentional effect sizes and is more than double that observed in previous IT learning studies over much longer training intervals. We do not yet know how far we can extend this learning, but just 2 hours of (highly targeted) unsupervised experience begins to reverse the object preferences of IT neurons. This discovery reemphasizes the importance of plasticity in vision by showing that it extends to a bedrock property of the adult ventral visual stream: position-tolerant object selectivity.

(end of paraphrase)
