Each object can cast an infinite number of different images on the retina, and understanding how the brain tolerates this image variation is the key to solving object recognition. The responses of neurons at the top of the ventral visual stream (inferior temporal cortex, IT) exhibit tolerant object selectivity. How do IT neurons attain this tolerance (“invariance”)?
One powerful idea is that the temporal contiguity of natural visual experience can instruct tolerance (Foldiak 1991): because objects remain present for many seconds, whereas object or viewer motion causes changes in each object’s retinal image over shorter time intervals, the ventral stream could construct tolerance by learning to associate neuronal representations that occur close together in time. We recently found a neuronal signature of such learning in IT: temporally contiguous experience with different object images at different retinal positions can robustly reshape (“break”) IT position tolerance, producing a tendency to confuse the identities of temporally coupled objects across their manipulated positions (Li & DiCarlo 2008). A similar manipulation can induce the same pattern of confusion in the position tolerance of human object perception (Cox et al. 2005).
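The temporal contiguity idea can be illustrated with a minimal sketch of a trace learning rule in the spirit of Foldiak (1991). This is a toy model under stated assumptions, not the analysis or model used in the work described here: the single model unit, the one-hot image coding, the initial weights, and the parameter values `alpha` and `delta` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical image indices for one model unit (illustrative only).
P_SMALL, P_LARGE, N_SMALL, N_LARGE = range(4)

def trace_rule_exposure(w, sequence, n_exposures, alpha=0.05, delta=0.5):
    """Apply a trace rule to one unit with one-hot image inputs.

    w           : initial weights, one per image (response = weight of shown image)
    sequence    : image indices shown in temporal order within one exposure
    n_exposures : number of times the sequence is repeated
    alpha       : learning rate (assumed value)
    delta       : trace update rate (assumed value)
    """
    w = np.asarray(w, dtype=float).copy()  # leave the caller's array untouched
    for _ in range(n_exposures):
        trace = 0.0  # assumption: the activity trace resets between exposures
        for idx in sequence:
            x = np.zeros_like(w)
            x[idx] = 1.0
            y = float(w @ x)                         # unit's response to this image
            trace = (1 - delta) * trace + delta * y  # temporally smoothed activity
            w += alpha * trace * (x - w)             # trace-gated weight update
    return w

# The unit initially prefers object P over object N at both sizes.
w_before = np.array([1.0, 1.0, 0.2, 0.2])

# Altered exposure: a small image of P is consistently followed by a large image of N.
w_after = trace_rule_exposure(w_before, [P_SMALL, N_LARGE], n_exposures=20)

# Selectivity (P minus N) at the large size, before and after exposure.
sel_large_before = w_before[P_LARGE] - w_before[N_LARGE]
sel_large_after = w_after[P_LARGE] - w_after[N_LARGE]
print(sel_large_before, sel_large_after)
```

In this toy setting, the lingering trace from the response to the first image strengthens the weight of the image that follows it, so the unit's selectivity between the two objects shrinks at the size where their images were temporally coupled.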
Does this IT neuronal learning reflect a canonical unsupervised learning algorithm that the ventral stream relies on to achieve tolerance to all types of image variation (e.g., object size and pose changes)? To begin to answer this question, here we extend our previous position tolerance paradigm to object size changes. Non-human primates were exposed to an unsupervised, altered visual world in which we temporally coupled the experience of two object images of different sizes at each animal’s center of gaze. For example, a small image of one object (P, the neuronally preferred object) was consistently followed by a large image of a second object (N), rendering the small image of P temporally contiguous with the large image of N.
We found that this unsupervised experience manipulation robustly reshapes IT size tolerance over a period of hours. Specifically, and unlike what we observed for experienced controls, neuronal selectivity (P-N) changed across the manipulated objects and their manipulated sizes, producing a tendency to confuse those object identities across those sizes. This change in size tolerance grew gradually stronger with increasing experience, and the rate of learning was similar to that in our previous work on position tolerance. We speculate that these converging results reflect an underlying canonical learning mechanism by which the ventral visual system acquires and maintains its tolerant object representations.