The research goal of our laboratory is to understand the mechanisms underlying visual object recognition. Specifically, we seek to understand how the brain transforms sensory input from an initial representation (essentially a photograph on the retina) into a new, remarkably powerful form of representation – one that can support our seemingly effortless ability to solve the computationally difficult problem of object recognition. We are particularly focused on patterns of neuronal activity in the highest levels of the ventral visual stream (primate inferior temporal cortex, IT) that likely directly underlie recognition. At these high levels, individual neurons can be highly selective for object identity, even though each object’s image on the retinal surface is highly variable – for example, due to changes in object position, distance, pose, lighting, and background clutter. Understanding how transformations carried out along the ventral visual processing stream create such neuronal responses is the key to understanding visual recognition.

To approach these very difficult problems, the work of our laboratory is directed along three main lines: 1) characterize the computational usefulness of patterns of IT neuronal activity for supporting immediate visual object recognition; 2) test and develop computational theories of how visual input is transformed along the ventral processing stream from a pixel-wise representation to a powerful representation in IT; and 3) understand the spatial organization of this representation. Our primary research approaches are neurophysiology in awake, behaving non-human primates; functional brain imaging (fMRI); human psychophysics; and computational modeling. Across all of these endeavors, we aim to develop innovative methods and tools to facilitate this work in our laboratory and others. Our approaches are often synergistic with those of other MIT laboratories, and this has greatly enhanced our progress.

The “Invariance” Problem

We are focused on the central problem of visual object recognition, which is called the “invariance” problem. The problem arises because each object category can present an essentially infinite number of images to us – due to changes in object position, distance, pose, lighting, background, deformation, and exemplar variation. Yet somehow the brain is able to determine that all of these images still contain the same object category (e.g. all contain a “car” in the examples below).

[Figure: The “Invariance” Problem (example images that all contain a “car”)]

The ability to solve the “invariance” problem is what separates humans from machines [1], [2], and its solution in the primate brain results from very clever algorithms that are likely fundamental to the way our cortex processes sensory information.
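To make the difficulty concrete, the following is a minimal toy sketch, not a method used by the laboratory. It assumes a 32×32 blank image containing a single bright square as a stand-in "object", and compares images as flat pixel vectors using cosine similarity. The helper names `make_image` and `pixel_similarity` are hypothetical and chosen only for illustration. The sketch shows that simply shifting the same object a few pixels makes the pixel-wise representation look partly or entirely different.

```python
import math

def make_image(size=32, top=4, left=4, side=8):
    """Toy image (assumption): blank background with one bright square."""
    return [[1.0 if (top <= r < top + side and left <= c < left + side) else 0.0
             for c in range(size)]
            for r in range(size)]

def pixel_similarity(a, b):
    """Cosine similarity between two images treated as flat pixel vectors."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    norm_a = math.sqrt(sum(x * x for x in fa))
    norm_b = math.sqrt(sum(y * y for y in fb))
    return dot / (norm_a * norm_b)

original = make_image(left=4)
half_shift = make_image(left=8)    # same object, shifted 4 pixels right
full_shift = make_image(left=14)   # same object, shifted 10 pixels right

same = pixel_similarity(original, original)        # identical images
partial = pixel_similarity(original, half_shift)   # squares overlap by half
disjoint = pixel_similarity(original, full_shift)  # squares do not overlap

print(same, partial, disjoint)
```

In this toy setting the similarity falls from 1 for identical images to 0 once the shifted square no longer overlaps the original, even though the object itself is unchanged. A representation that supports recognition has to treat all of these images as the same object, and position is only one of the sources of variation listed above.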

Elucidating Neuronal Object Codes

How well does the ventral visual stream solve object representation?

How much improvement does each processing stage add to that performance?

The Quest for Underlying Algorithms

Which computational ideas are at work in the ventral visual stream?

Building and testing computational algorithms that implement those ideas at scale

This work is being carried out in collaboration with The Visual Neuroscience Group, led by Dr. David Cox, at Harvard’s Rowland Institute.

The Circuits that Implement those Algorithms

Is the ventral stream divided into different spatial regions to support different tasks?

Do such divisions allow targeted interaction with perception?

Do such divisions allow deeper insight into the underlying algorithms at work?