|Title||Comparing novel object learning in humans, models, and monkeys|
|Publication Type||Journal Article|
|Year of Publication||2019|
|Authors||Lee, MJ, DiCarlo, JJ|
|Journal||Journal of Vision|
Humans readily learn to identify novel objects, and it has been hypothesized that plasticity in visual cortex supports this behavior. Contributing to this view are reports of experience-driven changes in the properties of neurons at many levels of visual cortex, from V1 to inferotemporal cortex (IT). Here, we ask whether object learning might instead be explained by a simple model in which a static set of IT-like visual features is followed by a perceptron learner. Specifically, we measured human (268 subjects; 170,000+ trials) and nonhuman primate (NHP; 2 subjects; 300,000+ trials) behavior across a battery of 29 visuomotor association tasks, each of which required the subject to learn to discriminate between a pair of synthetically generated, never-before-seen 3D objects (58 distinct objects). Objects were rendered at varying scales, positions, and rotations; superimposed on naturalistic backgrounds; and presented for 200 ms. We then approximated the visual system’s IT response to each image using models of ventral stream processing (i.e., specific deep neural networks trained on ImageNet categorization), and we applied a reward-based perceptron learner to the static set of features produced at the penultimate layer of each model. We report that our model is sufficient to explain both human and NHP rates of learning on these tasks. Additionally, we show that humans, NHPs, and this model share the same pattern of performance over objects, but that NHPs reach criterion performance ~10× as slowly as humans (human t = 139, NHP t = 1149), suggesting that humans have learning mechanisms similar to, but more rapid than, those of their NHP cousins in this domain. Taken together, these results suggest the possibility that object learning is mediated by plasticity in a small population of “readout” neurons that learn and execute weighted sums of activity across an upstream sensory population representation (IT) that is largely stable.
|Short Title||Journal of Vision|
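The model described in the abstract, a fixed feature set followed by a perceptron readout that learns from reward, can be illustrated with a minimal sketch. This is not the authors' implementation: the features here are synthetic Gaussian vectors standing in for penultimate-layer network activations, the update is the classic mistake-driven perceptron rule (in a two-choice task, an unrewarded trial reveals the correct answer), and the names `make_toy_features` and `simulate_learning` and all parameter values are hypothetical.

```python
import numpy as np


def make_toy_features(n_trials=400, n_features=50, separation=4.0, seed=0):
    """Synthetic stand-in for static 'IT-like' features (an assumption).

    Each trial shows one of two objects (label -1 or +1). The two objects
    differ by a fixed offset along one random direction in feature space;
    everything else is unit-variance Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1, 1], size=n_trials)
    direction = rng.standard_normal(n_features)
    direction /= np.linalg.norm(direction)
    noise = rng.standard_normal((n_trials, n_features))
    features = noise + 0.5 * separation * labels[:, None] * direction
    return features, labels


def simulate_learning(features, labels, lr=1.0):
    """Run a perceptron readout trial by trial over fixed features.

    The features are never modified; only the readout weights w change.
    Returns a 0/1 array of per-trial outcomes and the final weights.
    """
    n_trials, n_features = features.shape
    w = np.zeros(n_features)
    correct = np.zeros(n_trials, dtype=int)
    for t in range(n_trials):
        x, y = features[t], labels[t]
        choice = 1 if w @ x >= 0 else -1   # weighted sum, then a binary choice
        rewarded = choice == y             # reward only for the correct choice
        correct[t] = int(rewarded)
        if not rewarded:                   # update only after an unrewarded trial
            w += lr * y * x
    return correct, w


features, labels = make_toy_features()
correct, w = simulate_learning(features, labels)
print("accuracy, first 50 trials:", correct[:50].mean())
print("accuracy, last 100 trials:", correct[-100:].mean())
```

Averaging `correct` over a sliding window of trials gives a learning curve; in this toy setting the curve's rate depends on the assumed `separation`, feature dimensionality, and `lr`, none of which are taken from the study.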