A potentially organizing goal of the brain and cognitive sciences is to accurately explain domains of human intelligence as executable, neurally mechanistic models. Years of research have led to models that capture experimental results in individual behavioral tasks and individual brain regions. We here advocate for taking the next step: integrating experimental results from many laboratories into suites of benchmarks that, when considered together, push mechanistic models toward explaining entire domains of intelligence, such as vision, language, and motor control. Given recent successes of neurally mechanistic models and the surging availability of neural, anatomical, and behavioral data, we believe that now is the time to create integrative benchmarking platforms that incentivize ambitious, unified models. This perspective discusses the advantages and the challenges of this approach and proposes specific steps to achieve this goal in the domain of visual intelligence with the case study of an integrative benchmarking platform called Brain-Score.