Title: Correlation-based spatial layout of deep neural network features generates ventral stream topography
Publication Type: Conference Proceedings
Year of Publication: 2020
Authors: Margalit, E, Lee, H, Marques, T, DiCarlo, JJ, Yamins, DLK
Conference Name: Computation and Systems Neuroscience (COSYNE)
Conference Location: Denver, CO
The primate visual system is organized into functional maps, including pinwheel-like arrangements of orientation-tuned neurons in primary visual cortex (V1) and patches of category-selective neurons in higher visual cortex. Recent work has demonstrated that deep convolutional neural networks (DCNNs) trained for object recognition are good descriptors of neural representations throughout the ventral pathway, with early, intermediate, and late cortical brain areas best predicted by corresponding layers of the DCNN. Despite this success, DCNNs have no inherent spatial layout for features at a given retinotopic location, and thus make no predictions regarding many of the characteristic topographic phenomena observed in the brain beyond retinotopy itself, e.g., pinwheels and patches. Cortical map formation has been modeled using self-organizing maps that leverage principles of wiring-length minimization and local correlations of unit responses to produce topographic structure. However, these methods rely on simplified feature parameterizations that limit their ability to accommodate more realistic descriptions of neuron response properties, especially in higher visual areas. Here, we augment DCNNs by assigning model units spatial positions in a 2D “cortical sheet” and introduce a novel algorithm to arrange units so that local response correlations are maximized. Applying this algorithm to a categorization-optimized DCNN, we find that layouts generated from earlier layers recapitulate core features of V1 orientation, spatial frequency, and color preference maps, while those generated from later layers naturally exhibit category-selective clusters.
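The abstract does not specify the layout algorithm itself. As a minimal illustrative sketch only, and not the authors' method, the idea of arranging units on a 2D sheet so that nearby units have correlated responses can be approximated by greedy position swaps that increase a proximity-weighted correlation score. All function names, the Gaussian proximity weighting, and the swap heuristic below are assumptions for illustration.

```python
import numpy as np

def local_correlation_score(pos, corr, sigma=1.0):
    """Mean pairwise response correlation, weighted by spatial proximity
    (Gaussian falloff with distance on the sheet). Hypothetical objective."""
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)          # ignore self-correlations
    return (w * corr).sum() / w.sum()

def layout_by_swaps(responses, grid=8, n_iter=2000, seed=0):
    """Assign units to grid positions, then greedily swap two units whenever
    the swap does not lower the local-correlation score.
    Illustrative sketch, not the algorithm from the paper."""
    rng = np.random.default_rng(seed)
    n = responses.shape[0]             # responses: units x stimuli
    assert grid * grid >= n, "grid must have room for every unit"
    corr = np.corrcoef(responses)      # pairwise response correlations
    xs, ys = np.meshgrid(np.arange(grid), np.arange(grid))
    pos = np.stack([xs.ravel(), ys.ravel()], 1)[:n].astype(float)
    score = local_correlation_score(pos, corr)
    for _ in range(n_iter):
        i, j = rng.choice(n, size=2, replace=False)
        pos[[i, j]] = pos[[j, i]]      # propose swapping two units
        new = local_correlation_score(pos, corr)
        if new >= score:
            score = new                # keep the improving swap
        else:
            pos[[i, j]] = pos[[j, i]]  # revert
    return pos, score
```

Because only non-decreasing swaps are kept, the score improves monotonically; the self-organizing-map methods mentioned above pursue a similar goal with a different update rule.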
Because this wide range of apparently disparate phenomenology is produced by the same underlying principle, our results suggest that the functional architecture of the visual system can be explained by two fundamental constraints: the need to perform visual tasks and the pressure to minimize biophysical costs such as wiring length. Our framework for spatially mapping DCNNs integrates biophysical and representational phenomenology, allowing a more unified understanding of the visual system’s functional architecture.