Establishing Good Benchmarks and Baselines for Face Recognition

Title: Establishing Good Benchmarks and Baselines for Face Recognition
Publication Type: Conference Paper
Year of Publication: 2008
Authors: Pinto, N.; DiCarlo, J. J.; Cox, D. D.
Conference Name: European Conference on Computer Vision (ECCV), Faces in 'Real-Life' Images Workshop
Date Published: October 2008
Conference Location: Marseille, France

Progress in face recognition relies critically on the creation of test sets against which the performance of various approaches can be evaluated. A good set must capture the essential elements of what makes the problem hard, while conforming to practical scale limitations. However, these goals are often deceptively difficult to achieve. In the related area of object recognition, Pinto et al. [2] demonstrated the potential dangers of using a large, uncontrolled natural image set. They showed that an extremely rudimentary vision system (inspired by the early stages of visual processing in the brain) performed on par with many state-of-the-art vision systems on the popular Caltech101 object set [3]. At the same time, this rudimentary system was easily defeated by an ostensibly "simpler" synthetic recognition test designed to better span the range of real-world variation in object pose, position, scale, etc. These results suggested that image sets that look "natural" to human observers may nonetheless fail to properly embody the problem of interest, and that care must be taken to establish baselines against which performance can be judged. Here, we repeat this approach for the "Labeled Faces in the Wild" (LFW) dataset [1] and for a collection of standard face recognition tests. The goal of the present work is not to compete in the LFW challenge per se, but to provide a baseline against which the performance of other systems can be judged. In particular, we found that our rudimentary "baseline" vision system achieved 68% correct on the LFW challenge, substantially higher than the 50% "pure chance" baseline. We argue that this value may serve as a more useful baseline against which to evaluate absolute performance, and that the LFW set, while perhaps not perfect, represents an improvement over other standard face sets.
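To make the comparison between 68% correct and chance concrete, the sketch below computes the chance level for a same/different pair-matching task and a normal-approximation z-score for an observed accuracy against it. It is an illustration under stated assumptions, not the authors' analysis. It assumes the standard LFW "View 2" protocol of 10 folds, each with 300 matched and 300 mismatched pairs (6000 pairs total), and it treats every pair decision as an independent Bernoulli trial. The function names `chance_accuracy`, `z_vs_chance`, and `one_sided_p` are hypothetical helpers introduced here.

```python
import math


def chance_accuracy(n_same: int, n_diff: int) -> float:
    """Best accuracy obtainable without looking at the images:
    always answer with the more frequent label (hypothetical helper)."""
    return max(n_same, n_diff) / (n_same + n_diff)


def z_vs_chance(acc: float, chance: float, n_pairs: int) -> float:
    """Normal-approximation z-score of an observed accuracy against chance.
    Assumes the n_pairs decisions are independent Bernoulli trials."""
    se = math.sqrt(chance * (1.0 - chance) / n_pairs)
    return (acc - chance) / se


def one_sided_p(z: float) -> float:
    """Upper-tail probability of a standard normal, P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Assumption: LFW View 2, 10 folds x (300 same + 300 different) pairs.
n_same, n_diff = 3000, 3000
chance = chance_accuracy(n_same, n_diff)  # 0.5 for a balanced pair set
z = z_vs_chance(0.68, chance, n_same + n_diff)
print(f"chance = {chance:.2f}, z = {z:.1f}, p = {one_sided_p(z):.1e}")
```

Because the pair set is balanced, chance is exactly 50%, and a 68% result over thousands of pairs lies far outside what guessing would produce. The independence assumption is a simplification. Fold-to-fold variability (for example, the standard error of the mean accuracy across the 10 folds) is the more conventional way to report uncertainty on LFW.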