Does the visual system use natural experience to construct size invariant object representations?

Title: Does the visual system use natural experience to construct size invariant object representations?
Publication Type: Conference Proceedings
Year of Publication: 2010
Authors: Li, N; DiCarlo, JJ
Conference Name: Computation and Systems Neuroscience (COSYNE)
Date Published: 02/2010
Conference Location: Salt Lake City, Utah, USA

Object recognition is challenging because each object produces myriad retinal images. Responses of neurons at the top of the ventral visual stream (inferior temporal cortex, IT) exhibit object selectivity that is largely unaffected by these image changes. How do IT neurons attain this tolerance ("invariance")? One powerful idea is that the temporal contiguity of natural visual experience can instruct tolerance (e.g. Foldiak, Neural Computation, 1991): because objects remain present for many seconds, whereas object or viewer motion changes each object’s retinal image over much shorter time intervals, the ventral stream could construct tolerance by learning to associate neuronal representations that occur closely in time. We recently found a neuronal signature of such learning in IT: temporally contiguous experience with different object images at different retinal positions can robustly reshape ("break") IT position tolerance, producing a tendency for IT neurons to confuse the identities of those temporally coupled objects across their manipulated positions (Li & DiCarlo, Science, 2008). A similar manipulation can induce the same pattern of confusion in the position tolerance of human object perception (Cox, Meier, Oertelt & DiCarlo, Nat Neurosci, 2005). Does this IT neuronal learning reflect a canonical unsupervised learning algorithm that the ventral stream relies on to achieve tolerance to all types of image variation (e.g. changes in object size and pose)? To begin to answer this question, we here extend our position tolerance paradigm to object size changes. Non-human primates received unsupervised exposure to an altered visual world in which we temporally coupled the experience of two object images of different sizes at each animal’s center of gaze: for example, a small image of one object (P, the neuronally preferred object) was consistently followed by a large image of a second object (N), rendering the small image of P temporally contiguous with the large image of N.
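The temporal-contiguity idea cited above (Foldiak, Neural Computation, 1991) can be caricatured with a trace-based Hebbian rule: a unit's weight update is gated by a decaying trace of its recent activity, so inputs that occur closely in time (here, toy stand-ins for the small image of P and the large image of N) come to drive the unit similarly. This is a minimal illustrative sketch with assumed parameters and explicit weight normalization for stability, not the authors' model:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8                 # toy "retinal" input dimensionality (assumed)
alpha = 0.05          # learning rate (assumed)
delta = 0.5           # trace decay: mixes current response with recent past

# Toy input patterns standing in for the temporally coupled pair of images:
# e.g. a small image of object P consistently followed by a large image of N.
small_P = rng.random(n)
large_N = rng.random(n)

w = rng.random(n)
w /= np.linalg.norm(w)

trace = 0.0
for _ in range(100):
    for x in (small_P, large_N):               # temporally contiguous sequence
        y = float(w @ x)                       # unit's instantaneous response
        trace = (1 - delta) * trace + delta * y  # decaying trace of activity
        w += alpha * trace * x                 # Hebbian update gated by the trace
        w /= np.linalg.norm(w)                 # normalization keeps w bounded

# Because the trace bridges the two stimuli, updates during each image carry a
# contribution from the other, associating them onto one weight vector.
resp_P = float(w @ small_P)
resp_N = float(w @ large_N)
print(resp_P, resp_N)
```

The trace term is what distinguishes this from plain Hebbian learning: it is the mechanism by which temporally adjacent (but retinally different) images become linked.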
We made IT neuronal selectivity measurements before and after the animals received ~2 hours of experience in the unsupervised, altered visual world. Consistent with our results on position tolerance, we found that this size experience manipulation robustly reshapes IT size tolerance over a period of hours. Specifically, unlike for experienced control objects, neuronal selectivity (P − N) changed across the manipulated objects at their manipulated sizes, producing a tendency to confuse those object identities across those sizes. This change in size tolerance was specific to the manipulated objects, grew gradually stronger with increasing experience, and proceeded at a rate similar to that of position tolerance learning (~5 spikes/s per hour of exposure). Finally, in a separate experiment, we examined how the temporal direction of the experience affects the learning: do temporally-early images teach temporally-later ones, or vice versa? We found greater learning for the temporally-later images, suggesting a Hebbian-like learning mechanism (e.g. Sprekeler & Gerstner, COSYNE, 2009; Wallis & Rolls, Prog Neurobiol, 1997). We speculate that these converging results on IT position and size tolerance plasticity reflect an underlying unsupervised cortical learning mechanism by which the ventral visual stream acquires and maintains its tolerant object representations.
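The reported asymmetry (greater learning for temporally-later images) is the direction one would expect from a pre-before-post Hebbian rule, in which a decaying trace of the earlier input drives plasticity when the later stimulus activates a unit. A deliberately simplified, hypothetical sketch (one-hot inputs, two initially selective units, assumed parameters):

```python
import numpy as np

n = 6
alpha, delta = 0.2, 0.5
small_P = np.eye(n)[0]        # stand-in input pattern for the small image of P
large_N = np.eye(n)[1]        # stand-in input pattern for the large image of N

w_P = small_P.copy()          # unit initially selective for P
w_N = large_N.copy()          # unit initially selective for N

x_trace = np.zeros(n)
for x in (small_P, large_N):  # one temporally contiguous pair, P then N
    x_trace = (1 - delta) * x_trace + delta * x
    # pre-trace * post-response Hebbian update for both units
    w_P += alpha * float(w_P @ x) * x_trace
    w_N += alpha * float(w_N @ x) * x_trace

# The unit active at the later time (N) acquires a response component to the
# earlier image (P) via the lingering input trace; the earlier unit (P) gains
# nothing toward N, because N's input had not yet appeared when P was active.
print(float(w_N @ small_P), float(w_P @ large_N))
```

Under this toy rule, plasticity flows forward in time (earlier images teach later ones), matching the asymmetry observed in the experiment; it is offered only as an illustration of the Hebbian-like interpretation, not as the authors' model.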