So, has it gotten any better?

Not much, according to our latest experiment.

Curious whether, and how quickly, face recognition is improving, Comparitech decided to conduct a similar study almost two years later. We also added UK politicians into the mix, for a total of 1,959 lawmakers.

Results

We split the results between US and UK politicians. But before we discuss results, let’s first review the fulcrum on which all of these tests pivot: confidence thresholds.

Confidence thresholds

The ACLU used Rekognition’s default settings, which set the confidence threshold at 80 percent.

Raising the confidence threshold inevitably leads to fewer false positives (incorrectly matching two photos of different people), but also more false negatives (failure to match two photos of the same person). Unfortunately, we can’t measure the latter in this experiment. More on that later.
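For context, here is a minimal sketch of how such a search can be run against Rekognition with Python and boto3. We haven't published our exact harness here; the collection ID and file path below are hypothetical, and the sketch assumes a mugshot collection has already been built with index_faces. The FaceMatchThreshold parameter is where the confidence threshold discussed above is set.

```python
import boto3

# A minimal sketch, assuming a collection named "mugshots" has already
# been created and populated via index_faces. The collection ID and
# file path are hypothetical.
rekognition = boto3.client("rekognition")

with open("politician_headshot.jpg", "rb") as f:
    image_bytes = f.read()

response = rekognition.search_faces_by_image(
    CollectionId="mugshots",
    Image={"Bytes": image_bytes},
    FaceMatchThreshold=80,  # Rekognition's default, as used by the ACLU
    MaxFaces=5,
)

# Each match carries a similarity score. Raising FaceMatchThreshold
# simply filters out matches below the cutoff before they are returned,
# trading false positives for false negatives.
for match in response["FaceMatches"]:
    face = match["Face"]
    print(face.get("ExternalImageId", face["FaceId"]), match["Similarity"])
```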

US

The US data set consisted of photos of 430 Representatives and 100 Senators.

At an 80 percent confidence threshold, Rekognition incorrectly matched an average of 32 US Congresspersons to mugshots in the arrest database. That's four more than in the ACLU's experiment two years earlier.

UK

Our UK data set consisted of 1,429 politicians: 632 Members of Parliament and 797 Members of the House of Lords. We matched them against the same arrest photos as the US politicians.

At an 80 percent confidence threshold, Rekognition incorrectly matched an average of 73 UK politicians to mugshots in the arrest database.

The false positive rate was lower for UK politicians (5 percent) than for US ones (13 percent), which might suggest UK politicians look substantially different from their US counterparts, at least according to Rekognition.

When we raised the confidence threshold to 95 percent, there were no incorrect matches.

Racial bias

Prior research, including the ACLU's original experiment, has found that face recognition misidentifies people of color at disproportionately high rates. Our results support this finding. Of the 12 politicians misidentified at a confidence threshold of 90 percent or higher, six were not white (as shown in the image at the top of this article). That means half of the misidentified people were people of color, even though non-white members make up only about one-fifth of the US Congress and one-tenth of the UK parliament.

Methodology

We used publicly available photos of 430 US Representatives, 100 US Senators, 632 members of UK Parliament, and 797 members of the House of Lords.

In some instances, a single politician was misidentified more than once, against multiple mugshots. We counted each such case as a single false positive, as sketched below.
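Here is a short Python sketch of that counting rule. The match records are hypothetical; the point is that we deduplicate on the politician, not on the (politician, mugshot) pair.

```python
# A minimal sketch of the counting rule: a politician matched against
# several different mugshots still counts as one false positive.
# All data below is hypothetical.
matches = [
    {"politician": "Politician A", "mugshot": "mugshot_0412", "confidence": 81.2},
    {"politician": "Politician A", "mugshot": "mugshot_2099", "confidence": 84.7},
    {"politician": "Politician B", "mugshot": "mugshot_1337", "confidence": 80.3},
]

# Deduplicate on the politician, not the (politician, mugshot) pair.
false_positives = {m["politician"] for m in matches}
print(len(false_positives))  # 2, not 3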

This spreadsheet contains all of the politicians who were matched at or above 70 percent confidence, their photos, and the confidence at which Rekognition matched them.

Why you shouldn’t trust face recognition accuracy statistics

Be skeptical any time a company invested in face recognition peddles metrics about how well it works. The statistics are often opaque and sometimes downright misleading.

Here’s an example of how statistics about face recognition accuracy can be twisted. In the UK, the Met police force claimed its face recognition technology only makes a mistake in one of every 1,000 cases. They reached this number by dividing the number of incorrect matches by the total number of people whose faces were scanned. This inflates the accuracy rating by including true negatives—the vast majority of images that were not matched at all.

In contrast, independent researchers at the University of Essex found the technology had an error rate of 81 percent when they divided the number of incorrect matches by the total number of reported matches. The University’s report is much more in line with how most people would reasonably judge accuracy, disregarding true negatives and focusing on the rate at which reported matches are correct.

A later report found the Met police used live face recognition to scan 8,600 people’s faces without consent in London. The results were in line with the University of Essex’s findings: one correct match leading to an arrest, and seven false positives.
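To make the contrast concrete, here's the arithmetic in a short Python snippet using the figures above (8,600 faces scanned, one correct match, seven false positives); only the variable names are ours.

```python
# Using the figures reported above: 8,600 faces scanned,
# 1 correct match and 7 false positives (8 reported matches total).
scanned = 8_600
true_positives = 1
false_positives = 7
reported_matches = true_positives + false_positives

# The Met's framing: errors divided by everyone scanned,
# including all the true negatives.
met_error_rate = false_positives / scanned             # ~0.08%, "roughly 1 in 1,000"

# The University of Essex's framing: errors divided by reported matches.
essex_error_rate = false_positives / reported_matches  # 0.875, i.e. 87.5%

print(f"{met_error_rate:.2%} vs {essex_error_rate:.0%}")
```

Same underlying events, two wildly different headline numbers; the only thing that changed is the denominator.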

False negatives

Even more seldom reported is the rate of false negatives: two images of the same person that should have been matched, but weren't. As a hypothetical example, a face recognition-equipped camera at an airport might fail to trigger an alert upon seeing a person it should have recognized. Another form of false negative is failing to detect that a face exists in an image at all.

To measure the rate of false negatives, we would have had to populate our mugshot database with real, but not identical, photos of the politicians. Because our aim was to recreate the ACLU's test, this was beyond the scope of our experiment.
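For illustration only, here is how that calculation would look had we seeded the collection that way. Every count below is made up; only the 1,959 total comes from our data set.

```python
# A hypothetical sketch: suppose we indexed a second, different photo of
# each politician into the mugshot collection and re-ran the search.
seeded_pairs = 1_959       # one planted photo per politician in our data set
matched_correctly = 1_700  # hypothetical: pairs Rekognition linked
false_negatives = seeded_pairs - matched_correctly

false_negative_rate = false_negatives / seeded_pairs
print(f"{false_negative_rate:.1%}")  # 13.2% under these made-up numbers
```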

Real world use cases

Let's also consider what we're comparing: two sets of headshots. One contains police mugshots and the other official portraits, but both offer clear views of each person's face at eye level, facing the camera.

Real-world use cases are much different. Take CCTV surveillance, for example: police want to scan faces at an intersection and match them against a criminal mugshot database. Here are just a few factors that further muddy claims of how well face recognition performs in such a setting:

  • How far away is the camera from the subject?
  • At what angle is the camera pointed at the subject?
  • What direction is the subject facing?
  • Is the subject obscured by other humans, objects, or weather?
  • Is the subject wearing makeup, a hat, or glasses, or have they recently shaved?
  • How good is the camera and lens? Is it clean?
  • How fast is the subject moving? Are they blurry?

All of these factors and more affect face recognition accuracy and performance. Even the most advanced face recognition software available can’t make up for poor quality or obscured images.

Putting too much faith in face recognition can lead to false arrests. In April 2019, for example, a student sued Apple after the company’s face recognition software falsely linked him to thefts at several Apple stores, leading to his arrest.