Like humans, adult non-human primates can learn to categorize visual objects. Prior work shows that neurons in the inferior temporal (IT) cortex, which is critical for object recognition, modestly increase their selectivity to objects from learned categories. Do these neural changes underlie the behavioral performance gains (“learning”), and if so, how? While the field now has relatively accurate models of image-driven IT responses, we still lack a comparable computational understanding of adult IT plasticity. To address this, we measured neural activity across the IT cortex in two groups of monkeys: one group (“naive”) was trained only to fixate passively on images; the other group (“trained”) also learned to discriminate object categories. First, consistent with previous studies, we observed a significant increase (63%) in object-category selectivity in the IT responses of trained compared to naive monkeys. Next, this selectivity increase led to a 37% more categorical representation at the population level, as assessed by a representational dissimilarity matrix (RDM) analysis, and to a 19% improvement in the accuracy of linear decoding of the learned object categories from IT population activity. Lastly, these changes in trained responses also improved the predictions of image-level behavioral error patterns. How do these observed changes in IT lead to improvements in behavior? We present a systems-level perspective by casting the monkey’s category training as an extension of contemporary artificial neural networks (ANNs). Interestingly, we observed that for various finetuned ANNs (with different architectures, pre-training objectives, and category learning schemes), the untrained IT-matched ANN layer showed macaque-IT-like increases in category information after training. Akin to IT, specific ANN-IT representations were also more predictive of monkey behavior after training.
In sum, we provide empirical evidence of moderate, behaviorally relevant plasticity in adult IT upon category learning and introduce a computational framework to simulate these changes, enabling us to formulate testable hypotheses about the representational reconfigurations induced by category learning.
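To make the population-level RDM comparison concrete, the following is a minimal sketch on synthetic data, not the analysis pipeline used in this study. It assumes an RDM built from 1 minus the Pearson correlation between image-evoked population vectors, and a hypothetical "categoricality index" defined as mean between-category minus mean within-category dissimilarity. The function names, the simulated populations, and the gain values are all illustrative assumptions.

```python
import numpy as np


def compute_rdm(responses):
    """Return an (n_images, n_images) RDM of 1 - Pearson r.

    responses: array of shape (n_images, n_units), one population
    response vector per image.
    """
    return 1.0 - np.corrcoef(responses)


def categoricality_index(rdm, labels):
    """Mean between-category minus mean within-category dissimilarity.

    This index is an assumption made for illustration; larger values
    mean that images cluster more strongly by category.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    # Use each image pair once and exclude the diagonal.
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)
    within = rdm[same & upper].mean()
    between = rdm[~same & upper].mean()
    return between - within


def simulate_population(gain, labels, n_units, rng):
    """Synthetic responses: a scaled category prototype plus unit-variance noise."""
    n_categories = labels.max() + 1
    prototypes = rng.standard_normal((n_categories, n_units))
    noise = rng.standard_normal((labels.size, n_units))
    return gain * prototypes[labels] + noise


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 10)  # 4 categories, 10 images each

# Hypothetical gains: a weak category signal stands in for the naive
# population, a stronger one for the trained population.
naive = simulate_population(0.2, labels, n_units=100, rng=rng)
trained = simulate_population(1.0, labels, n_units=100, rng=rng)

idx_naive = categoricality_index(compute_rdm(naive), labels)
idx_trained = categoricality_index(compute_rdm(trained), labels)
print(f"naive index: {idx_naive:.3f}, trained index: {idx_trained:.3f}")
```

In this toy setting the population with the stronger category signal yields the larger index, which mirrors the direction of the effect reported above; the magnitudes carry no empirical meaning.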