A unified neuronal population code fully explains human object recognition.

Title: A unified neuronal population code fully explains human object recognition.
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Majaj, NJ, Hong, H, Solomon, EA, DiCarlo, JJ
Conference Name: Computation and Systems Neuroscience (COSYNE)
Date Published: 02/2012
Conference Location: Salt Lake City, Utah, USA

Our goal is to understand the neuronal mechanisms that underlie human visual object recognition (OR). While previous work has argued for qualitative links between neuronal responses in the ventral visual stream and human shape judgements, no study has asked which, if any, neuronal responses are quantitatively sufficient to explain broad-domain human OR performance. The shift from qualitative to quantitative hypotheses requires a framework that links neuronal responses to behavior (a “unified code”). Here we ask: is there a common neuronal basis (e.g., in IT cortex) and a simple (e.g., linear) transformation that will predict all of human OR performance? We first defined OR operationally by obtaining human psychophysical measurements using images that explore shape similarity and identity-preserving image variation, resulting in OR benchmarks that span a range of difficulty. Using the same visual images, we measured neuronal responses in V4 and IT in two monkeys. We implemented 14 unified codes based on those neuronal data and computed cross-validated neuronal discriminability indices (d’s) to compare with the human d’s. The dynamic range across those d’s sets a high bar for when a putative code is sufficient to explain behavior: it is not enough for a code to perform well (high d’) or to match a single d’. Instead, a sufficient unified code must also emergently predict the entire pattern of behavior over all tasks. Remarkably, we found a few unified IT-based codes that meet this high bar. Interestingly, many other IT codes, and all V4 codes, are insufficient. While humans outperform computer vision systems on many of our OR tasks, their abilities reliably depend on the images tested. These dependencies in human performance are fully explained by a simple, unified reading of monkey ventral stream neurons, a feat unmatched by any computer vision system we tested.
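To make the linking framework concrete, the sketch below illustrates the general recipe the abstract describes: a simple linear readout of a neuronal population, cross-validated, and scored as a discriminability index (d′). Everything here is an assumption for illustration only: the data are simulated (not the study's recordings), and the nearest-centroid classifier stands in for whatever linear transformation a given unified code actually uses.

```python
# Hypothetical sketch of a cross-validated linear readout scored as d'.
# All data are simulated; this is NOT the study's code or data.
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

n_units, n_trials = 100, 200
# Simulated population responses for two objects (labels 0 and 1),
# with a modest mean separation along a random signal axis.
signal = rng.normal(size=n_units)
X = rng.normal(size=(n_trials, n_units))
y = np.repeat([0, 1], n_trials // 2)
X[y == 1] += 0.3 * signal

def dprime(hit_rate, fa_rate, eps=1e-3):
    """d' = Z(hit rate) - Z(false-alarm rate), rates clipped away from 0/1."""
    z = NormalDist().inv_cdf
    return z(min(max(hit_rate, eps), 1 - eps)) - z(min(max(fa_rate, eps), 1 - eps))

def crossval_dprime(X, y, folds=2):
    """Cross-validated d' for a nearest-centroid (linear) readout."""
    idx = rng.permutation(len(y))
    preds, labels = [], []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        mu0 = X[train][y[train] == 0].mean(axis=0)
        mu1 = X[train][y[train] == 1].mean(axis=0)
        w = mu1 - mu0                      # linear readout direction
        b = (mu0 + mu1) @ w / 2            # midpoint decision threshold
        preds.extend((X[test] @ w > b).astype(int))
        labels.extend(y[test])
    preds, labels = np.array(preds), np.array(labels)
    hit_rate = (preds[labels == 1] == 1).mean()
    fa_rate = (preds[labels == 0] == 1).mean()
    return dprime(hit_rate, fa_rate)

d = crossval_dprime(X, y)
print(f"cross-validated d' = {d:.2f}")
```

In the study's terms, one such neuronal d′ would be computed per task and per candidate code, and the resulting pattern of d′s across all tasks, not any single value, would be compared against the human d′s.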