
Notes On AI Accuracy (Facial Recognition)

Here’s a tweet (addressed to Judea Pearl): “If you could get rid of all the spurious non causal correlations in a machine-learning model, you would be left with only the invariant ones which would in turn allow you to understand causality relationships.”

Let’s rephrase that: If you could identify only causal connections in a model, you could understand cause.

True. But it’s of the same value as saying, “If only we could create a model that predicts perfectly, we’d have a model that makes perfect predictions.”

Identifying which connections are in the causal path of some observable, and which connections are merely spurious, is, I think, what AI researchers would admit is their holy grail.

They’ll never find it.

At least, not in any general sense.

The first problem is that the notions of cause many are using aren’t right. The second is that grasping a causal power is an activity of the intellect, and machines don’t have intellects.

Oh, sure, they’ll be able to program machines to automate tasks that people first know how to do, and where the causal powers of things are at least roughly understood. License plate readers do a reasonable job, as does, to a lesser extent, facial recognition. A lot of human minds, or rules derived from human reasoning, go into these algorithms. Nothing wrong with that. Indeed, it’s just the right thing. But it’s not AI discovering the cause, it’s non-artificial intelligence.

There are myriad causes of a pixel firing a certain strength on a CCD. And for its neighbor pixels firing or remaining quiescent. One of the causes is the shape of the letters in the reflected light off a plate or face. Others are dirt, rain scatter, etc., etc., etc. You can’t know with certainty you have only the reflection isolated, and the other stuff eliminated. You’ll be left with a model—a predictive model, thank the Lord—in the end.

Well, it’s the same in any model. The hope is that only the right “stuff” is measured to predict the observable. But if we knew we only had the right stuff, then we’d know the cause of the observable, and then we don’t have AI, we have physics. In other words, AI is just statistical modeling. But predictive statistical modeling, which is good.

Facial recognition is big. I’m not up on facial recognition tests in real environments; that is, I don’t know how accurate they are outside laboratory conditions. The picture heading this post gives some indication (from here; only a database test, I believe). There are reports like this:

The closely watched NIST results released last November concluded that the entire industry has improved not just incrementally, but “massively.” It showed that at least 28 developers’ algorithms now outperform the most accurate algorithm from late 2013, and just 0.2 percent of all searches by all algorithms tested failed in 2018, compared with a 4 percent failure rate in 2014 and 5 percent rate in 2010.

That 0.2% is fantastic, of course. That sounds like a planned database test, and not a field test. Any algorithm ought to do well in database tests, or it’s not worth talking about in public.

A field test is harder. Say, point a camera at the airport security line and see how many bad guys on the wanted list are discovered. Accuracy can’t be assessed unless you have actors playing the roles, since in a real line you won’t know who you missed; only false positives are recorded. You’re back in dirty license plate territory. Seems in real tests the accuracy is closer to coin flipping.

Here, too, the idea of skill is paramount. Probably not a lot of bad guys go through any given line. Suppose it’s 1 in 1,000. So if you guessed “Not a bad guy” for every single person, you’d be right 999 times out of 1,000, an accuracy of 99.9%. Wonderful!

No, the model—and it is a model—stinks. Any fancy-dancy AI algorithm must beat 99.9% or it’s useless. If it can’t beat it, then we’re better off guessing “Not a bad guy.”
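
To make the skill comparison concrete, here is a minimal sketch; the 1-in-1,000 base rate is the one supposed above, and the recognizer’s hit and false-alarm rates are illustrative assumptions, not any tested system’s numbers:

    # Skill check: does a recognizer beat always guessing "Not a bad guy"?
    # The base rate comes from the text; the sensitivity and specificity
    # of the hypothetical recognizer are invented for illustration.
    base_rate = 1 / 1000                       # fraction of bad guys in line

    naive_accuracy = 1 - base_rate             # always guess "Not a bad guy"

    sensitivity, specificity = 0.90, 0.99      # hypothetical recognizer
    model_accuracy = (base_rate * sensitivity
                      + (1 - base_rate) * specificity)

    print(f"always-innocent guess: {naive_accuracy:.4f}")   # 0.9990
    print(f"hypothetical model:    {model_accuracy:.4f}")   # ~0.9899 -- no skill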

Of course, you might be willing to accept some extra false positives in exchange for not missing real bad guys. Better to inconvenience a few travelers than let some bad guy fly to Hollywood and harass the indigenous populace. There are formal ways to account for this asymmetry (ahem).
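
One way to sketch that formal accounting, with made-up costs standing in for the real ones:

    # Cost-weighted decision: flag a traveler when the expected cost of
    # waving them through exceeds the expected cost of a check.
    # Both costs are invented stand-ins for illustration.
    COST_MISS = 1000.0        # letting a real bad guy through
    COST_FALSE_ALARM = 1.0    # inconveniencing an innocent traveler

    def should_flag(p_bad: float) -> bool:
        """Flag when the expected cost of a miss beats that of a false alarm."""
        return p_bad * COST_MISS > (1 - p_bad) * COST_FALSE_ALARM

    print(should_flag(0.002))   # True: even a 0.2% chance justifies a check
    print(should_flag(0.0005))  # False: below the break-even probability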

Anyway, performance is surely not what’s reported in the press.

We’ll discuss much more about AI/stats and cause later.



Replies

  1. The real problem is teaching computers to understand that faces are 3D objects, and that light and shadow are things. And different haircuts. And beard growth. And mascara. And hats. And that black people exist, and have faces.

  2. A lot of human minds, or rules derived from human reasoning, go into these algorithms [followed by a lot of stuff about license plate reading]

    But what if the rules were developed by the algorithm? Who needs cause?

    Here’s an algorithm that can generate new images of distracted drivers. It was shown only images of distracted drivers from the Kaggle State Farm Distracted Driver Detection dataset (see article for link to the dataset).
    https://towardsdatascience.com/generative-adversarial-networks-gans-for-beginners-82f26753335e
    As it can generate new images, it must have modeled them. They aren’t merely noisy copies of the training set but genuinely new images. Granted, they don’t differ much from the training set, but then it won’t produce pictures of distracted cats either.

    The very same code (slightly modified to account for differences in image attributes like the number of pixels, called its shape in TensorFlow) could have been used to generate images of handwritten digits had it been given a dataset containing samples of them (such as the MNIST dataset: https://en.wikipedia.org/wiki/MNIST_database).
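
    (Below is a minimal sketch of the adversarial setup described here; the linked article uses TensorFlow, but PyTorch is used for brevity, and the layer sizes and flattened 28x28 image shape are illustrative assumptions, not the article’s code.)

        # Minimal GAN skeleton: a generator G maps noise to images, a
        # discriminator D scores real vs. generated, and each trains
        # against the other. All sizes here are illustrative.
        import torch
        import torch.nn as nn

        latent_dim, img_dim = 64, 28 * 28

        G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, img_dim), nn.Tanh())
        D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                          nn.Linear(256, 1), nn.Sigmoid())

        bce = nn.BCELoss()
        opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
        opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

        def train_step(real):  # real: (n, img_dim) tensor scaled to [-1, 1]
            n = real.size(0)
            fake = G(torch.randn(n, latent_dim))

            # D learns to call real images 1 and generated images 0.
            opt_D.zero_grad()
            d_loss = (bce(D(real), torch.ones(n, 1))
                      + bce(D(fake.detach()), torch.zeros(n, 1)))
            d_loss.backward()
            opt_D.step()

            # G learns to make D call its fakes 1.
            opt_G.zero_grad()
            g_loss = bce(D(fake), torch.ones(n, 1))
            g_loss.backward()
            opt_G.step()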

  3. “A field test is harder. Say, point a camera at the airport security line and see how many bad guys on the wanted list are discovered”

    Trials of the UK’s Metropolitan Police system show a false positive rate of 98%. (A sketch of how a tiny base rate produces a number like that appears at the end of this comment.)

    https://www.theregister.co.uk/2019/05/16/police_face_recognition/

    https://www.theregister.co.uk/2019/02/06/met_police_cop_to_just_one_successful_arrest_during_latest_facial_recog_trial/

    “Freedom of Information responses show that the force has only correctly identified two people using the kit – neither of whom was a criminal.”

    And an unwillingness to take the tech to places where police/public relations are not totally cordial…

    https://www.theregister.co.uk/2018/05/24/met_police_wont_use_facial_recognition_tech_at_notting_hill_this_year/
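
    (The promised sketch: with a rare enough target, even a matcher with a small false-alarm rate produces mostly wrong alerts. All the rates below are illustrative assumptions, not the Met’s figures.)

        # Why ~98% of alerts can be wrong: Bayes' theorem with a rare target.
        base_rate = 1 / 10_000   # assumed fraction of faces on the watch list
        sensitivity = 0.90       # assumed P(alert | on list)
        false_alarm = 0.005      # assumed P(alert | not on list)

        p_alert = base_rate * sensitivity + (1 - base_rate) * false_alarm
        ppv = base_rate * sensitivity / p_alert  # P(on list | alert)

        print(f"share of alerts that are wrong: {1 - ppv:.1%}")  # ~98.2%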

  4. If you’re going to badmouth the use of AI you may be going about it the wrong way. AI is highly desired for some of the same reasons that annoying phone menus are adopted: it means fewer people need to be hired.

    Since AI is showing up in places like self-driving autos, instead of arguing the philosophical case for why AI is imperfect, everybody would be better off if the attack came from the legal-liability perspective: certain shortcomings were foreseeable, the firm adopted the technology anyway, and property and people were damaged, injured, or killed as a result.

    Most couldn’t care less about philosophy if they tried, but pose a threat to reputation or cash flow and they will pay attention.

  5. I recall seeing a recent post (in one of those PC publications) that all of these facial recognition systems are sexist/racist because the AIs are identifying black women as men, or bad at identifying people taking hormones, etc.

    https://www.vox.com/future-perfect/2019/4/19/18412674/ai-bias-facial-recognition-black-gay-transgender

    What’s funny is the algorithms have rediscovered some of what we all know to be true – people of different group backgrounds have different behaviors and features, and when you make predictions based on group behavior you get results that humans often instinctively recognize (but over-certainty is still a problem of course). The fact that algorithms are (re)discovering that blacks commit more crime, for example, is a hate fact that cannot be left to fester!

    Before Heartiste was depersoned he had a post about facial recognition mistakenly identifying black women as men. Much wailing and gnashing of teeth about the need to fix it, instead of saying, huh, seems like maybe the features used to identify masculinity/femininity are different across blurred race boundaries…

  6. A cause is a letter, etc.

    Well, sure, but I asked who needs it?
    You don’t need it to recognize letters when you see them.

    There are algorithms that can produce textual descriptions of images which include what is happening in the image (e.g., person riding a horse, child playing in a sandbox). All without any understanding of the world beyond what can be found in images.

    There are even algorithms which can mimic artistic style after being shown samples. There’s a similar one I came across which can mimic clothing styles. And another which can identify the mood of a facial photo (happy, sad, etc.). Again, only from models derived from images.

  7. Nate,

    One of the problems with facial recognition of blacks results from lower contrast levels in the image. Plus, what is being called facial recognition isn’t much more sophisticated than fingerprint cataloging; a shortcut at best. Expect that to improve over time. Don’t confuse what has been done with what can be done.

  8. I have heard so many times, “You look just like X.” One time a dentist whom I had never met before was cowering in the corner because he thought I was his ex (no pun intended). I not only looked like her, but my handwriting on the intake form was similar to hers. The human eye is the finest detector of the human form, and even humans make mistakes.

  9. Briggs: A cause is a letter, etc.

    DAV: Well, sure but I asked who needs it?
    You don’t need it to recognize letters when you see them.

    Since “it”=”cause”=”a letter”
    “You don’t need [a letter] to recognize letters when you see [letters]”

    Of course, the scanner may register an H shape, but it won’t tell you whether it means the sound “aitch”, the sound “en” or the sound “mi” or that there is a hospital in the neighborhood. Or that a symbol previously unseen ?? is a letter or not.

  10. Any fancy-dancy AI algorithm must beat 99.9% or it’s useless. If it can’t beat it, then we’re better off guessing “Not a bad guy.”

    How good are people who do this? I doubt that anyone reaches that level of accuracy. (See below). Why does it have to be better than a human?

    https://www.apa.org/action/resources/research-in-action/eyewitness

    Lining up suspects in front of a one-way mirror and allowing eyewitnesses to choose which one is the perpetrator is standard police procedure. Yet DNA evidence has repeatedly revealed the limitations of this technique: Many prison inmates whose convictions hinged on eyewitness identification were later proven innocent by DNA testing.

  11. Of course, the scanner may register an H shape, but it won’t tell you whether it means the sound “aitch”, the sound “en” or the sound “mi” or that there is a hospital in the neighborhood. Or that a symbol previously unseen ?? is a letter or not.

    Same for a human not familiar with English. And by “unseen” I assume you meant one without an example in the training set. The first step in what you’ve described starts with recognition.

  12. Since “it”=”cause”=”a letter”
    “You don’t need [a letter] to recognize letters when you see [letters]”

    Silly substitution. You only need to know what patterns are associated with those found in the current image. To do this, you don’t need to know anything else (such as: it’s a letter), although context can be useful in the presence of noise.

  13. “If you could get rid of all the spurious non causal correlations”
    Is there such a thing as a causal correlation? I know the doctors believe in causal associations because I have read that in medical studies.

  14. Is there such a thing as a causal correlation?

    Yes. If X causes Y then X and Y are necessarily correlated.

  15. But Dave, I remember reading that correlation is not causation.

    Almost true. While correlation is necessary for causation, correlation by itself does not necessarily imply causation, particularly with only two variables. An example: the price of rum is correlated with New England preachers’ salaries. They only appear linked because both have a common cause: inflation.
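
    (A minimal simulation of that common-cause pattern, assuming numpy; the series are invented, not historical rum prices or salaries.)

        # Two series, neither causing the other, both driven by "inflation":
        # they come out almost perfectly correlated anyway.
        import numpy as np

        rng = np.random.default_rng(42)
        inflation = np.cumsum(rng.normal(0.02, 0.005, 100))  # common cause

        rum_price = 5 * (1 + inflation) + rng.normal(0, 0.05, 100)
        salary = 2000 * (1 + inflation) + rng.normal(0, 20, 100)

        print(np.corrcoef(rum_price, salary)[0, 1])  # near 1, yet no causation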

  16. “…grasping a causal power is an activity of the intellect, and machines don’t have intellects.”

    Not its only activity. Why can’t investigations and research just carry on without calling time out on AI research?

    It’s biomimicry in another form.

  17. Somewhat OT, but one thing I’ve noticed as I’ve got older is that I increasingly have the feeling when I see strangers that I’ve met them before. Has this happened to anyone else?

  18. Also, I intended to say that the quote above restates the problem in another way.
    The ‘intelligence’ is ‘artificial’, so it would be like other artificial things. Since one claim concerns ‘intelligence’ and the other ‘intellect’, the argument that it can’t be done isn’t clearly made.
    Perhaps the real argument is about the word ‘learning’, and how it happens. It’s easy to see how a tool could be set up to learn within certain boundaries.

    Swordfish,
    Yes, many times, not faces, though.

  19. Briggs, it’s not true that a model must be 99.9% accurate in order to be useful.

    I can think of three scenarios off the top of my head.

    1. Any scenario where it’s only important to know that Person A is NOT Person B. In this case an error rate of 1/10 might even be acceptable (depending on the expected utility).

    2. In a semi-automatic mode of operation; I have 1000 people coming through a line and a database of 50 persons-of-interest. A model could be used to narrow the field of choice (or optimize the review process ergonomically) for a human operator (see the sketch after this list).

    3. As a verification step where the model is augmented with extra information; like an employee ID card for example. In this case a failure rate of 1/100 might be acceptable.
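
    (The sketch mentioned in scenario 2: the model narrows the field for a human operator rather than making the final call. The scoring function, names, and threshold are hypothetical placeholders, not any particular product’s API.)

        # Semi-automatic triage: surface candidate matches for human review.
        from typing import Callable, List, Tuple

        def shortlist(crowd: List[str], watch_list: List[str],
                      score: Callable[[str, str], float],
                      threshold: float = 0.8) -> List[Tuple[str, str, float]]:
            """Return (face, candidate, score) pairs above the threshold,
            best first, for an operator to confirm or dismiss."""
            hits = []
            for face in crowd:
                for poi in watch_list:
                    s = score(face, poi)
                    if s >= threshold:
                        hits.append((face, poi, s))
            return sorted(hits, key=lambda t: -t[2])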
