Object recognition is challenging because each object produces myriad retinal images. Responses of neurons from the inferior temporal cortex (IT) are selective to different objects, yet tolerant ("invariant") to changes in object position, scale, and pose. How does the brain construct this neuronal tolerance? We report a form of neuronal learning that suggests the underlying solution. Targeted alteration of the natural temporal contiguity of visual experience caused specific changes in IT position tolerance. This unsupervised temporal slowness learning (UTL) was substantial, increased with experience, and was significant in single IT neurons after just 1 hour. Together with previous theoretical work and human object perception experiments, we speculate that UTL may reflect the mechanism by which the visual stream builds and maintains tolerant object representations.