William M. Briggs

Statistician to the Stars!

Naomi Oreskes Plays Dumb On Statistics And Climate Change

Our author, thinking thoughts.

This post is one that has been restored after the hacking. All original comments were lost.

Remember how I said, again and again—and again—that everybody gets statistics wrong? Here’s proof fresh from the newspaper “of record”, which saw fit to publish prominently an odd article by Naomi Oreskes, who wrote:

Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20. But it also means that if there’s more than even a scant 5 percent possibility that an event occurred by chance, scientists will reject the causal claim. It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.

This is false, but it’s false in a way everybody thinks is true. I hate harping (truly, I do), but “significance” is this. An ad hoc function of data and parameter inside a model produces a p-value less than the magic number. Change the function or the model and, for the same data, “significance” comes and goes. Far from being “scant”, that 1 in 20 is trivially “discovered” given the barest effort and creativity on the part of researchers. As regular readers know, time and again nonsensical results are claimed real based on 1 in 20.
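
As a sketch of how easily that happens, here are two standard tests run on the same made-up, skewed data (none of this is from Oreskes or the work she discusses; numpy and scipy are assumed available). The point is only that "significance" can come and go with the choice of test.

```python
# A sketch, not anyone's actual analysis: the same made-up data run through
# two standard tests. "Significance" can come and go with the choice of test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.lognormal(mean=0.0, sigma=1.0, size=25)   # skewed "control" group
b = rng.lognormal(mean=0.5, sigma=1.0, size=25)   # skewed "treatment" group

_, p_t = stats.ttest_ind(a, b)      # t-test: assumes roughly normal data
_, p_u = stats.mannwhitneyu(a, b)   # rank-based test: a different model
print(f"t-test p = {p_t:.3f}, Mann-Whitney p = {p_u:.3f}")
# With skewed data and modest samples the two p-values routinely disagree,
# and it is common for only one of them to land under the magic 0.05.
```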

That’s the small error. The big one is where she says scientists “will accept a causal claim” when wee p-values are found. It isn’t that Oreskes is wrong here: scientists will indeed accept a causal claim in the presence of wee p-values. The problem is that they should not. A wee p-value does not prove causality. A non-wee p-value does not—it absolutely DOES NOT—say that the results “occurred by chance”. No result in the history of the universe was caused by or occurred by chance. Chance or randomness are not causes. They are states of knowledge, not physical forces.

If I thought it’d do any good, I’d scream those last four sentences. It won’t. You’re too far away. Do it for me.

Oreskes goes on to discuss “Type 1” and “Type 2” errors (statistical terminology is usually dismal like this). “Type 1” is the false positive, accepting that which is false as true. Sociologists, educationists, psychologists, any person with “studies” in their title, and similar folk know this one well. It is their bread and butter. “Type 2” is the false negative, not accepting that which is true. Die-hard consensus lovers in changing fields know this one intimately.

By far, and without even a skosh of a scintilla of a doubt, false positives are the larger problem. Most new ideas are wrong, for the same reason most mutations are bad. We can certainly understand somebody holding to a mistaken consensus, like those who disbelieved in continental drift, or those who believe the world will end in heat death unless the government is given orders of magnitude more control over people’s lives. Going against the flow is rarely encouraged. But if you’re rewarded for coming up with “unique” and politically favorable findings, as indeed scientists are, trumpets will be incorrectly sounded all too often.

Yet Oreskes embraces false positives for the good they will do.

When applied to evaluating environmental hazards, the fear of gullibility can lead us to understate threats. It places the burden of proof on the victim rather than, for example, on the manufacturer of a harmful product. The consequence is that we may fail to protect people who are really getting hurt.

She next aptly uses the word “dumb” to describe thinking about this situation. No better term. Look: the manufacturer is guilty because it has made a harmful product. The poor victim can’t have justice for fear of false positives. Yet how do we know the manufacturer is guilty? According to Oreskes’s logic: because it is a manufacturer! That’s dumb thinking all right.

She then asks:

What if we have evidence to support a cause-and-effect relationship? Let’s say you know how a particular chemical is harmful; for example, that it has been shown to interfere with cell function in laboratory mice. Then it might be reasonable to accept a lower statistical threshold when examining effects in people, because you already have reason to believe that the observed effect is not just chance.

So we know this chemical boogers up some mice. Is that proof it does the same in men? No, sir, it is not. Especially when we consider that the mice might have been fed a diet of nothing but the chemical in order to “prove” the chemical’s harmful effects.

And she misunderstands, again, the nature of probability. We want to know the probability that, given this chemical, a man will fall ill. That can be answered. But simply loosening the p-value requirement does nothing to help to answer it. Lowering an evidential standard which is already a wide-open door can only mislead. You also notice the mistake about the observed effect being “just chance.”

This is what the United States government argued in the case of secondhand smoke. Since bystanders inhaled the same chemicals as smokers, and those chemicals were known to be carcinogenic, it stood to reason that secondhand smoke would be carcinogenic, too. That is why the Environmental Protection Agency accepted a (slightly) lower burden of proof: 90 percent instead of 95 percent.

Yes. The EPA misled itself then us. What we wanted, but did not get, was, given a person inhales this known amount of secondhand smoke (of this and such quality), what is the probability the person develops cancer? What we got were crappy p-values and preconceptions passed off as causes. We remain ignorant of the main question.

Sigh. It’s reality and probability deniers like Oreskes and the EPA that give science a bad name.

Despite all evidence, Oreskes claims scientists are fearful of embracing false positives. Why?

The answer can be found in a surprising place: the history of science in relation to religion. The 95 percent confidence limit reflects a long tradition in the history of science that valorizes skepticism as an antidote to religious faith.

Dear Lord, no. No no no. No. Not even no. If this were a private blog, I’d tell you the real kind of no, the sort Sergeant Montoya taught me in basic training. No. That rotten 95-percent “confidence” came from Fisher and quickly transmogrified into pure magic. That level is religion. It is in no way an antidote to it, nor was it ever meant to be. Good grief!

I stopped reading after this, having been reduced to an incoherent sputtering volcanic mass. This person, this misinformed and really quite wrong person, is feted, celebrated, and rewarded for being wrong, being wrong in the direction politically desired, while folks like Yours Truly are out in the cold for being impolitely right. Hard to take sometimes.

Update: The post by our friend D.G. Mayo (an unrepentant frequentist) on this subject is worth reading.

Real Climate Temperature “Trend” Article Gets It Wrong (Like So Many Do)

This post is one that has been restored after the hacking. All original comments were lost.

Everything that can go wrong with a time series analysis has gone wrong with the post “Recent global warming trends: significant or paused or what?” at Real Climate. So many classic mistakes are made that I hesitate to show them all. But it’ll be worth it to do so. Be sure to read to the end where I ascribe blame.

The model is not the data

Here is the author’s Figure 2, which is the “HadCRUT4 hybrid data, which have the most sophisticated method to fill data gaps in the Arctic with the help of satellites”. Keep that “data gaps” phrase in the back of your mind; for now, let it pass.

Fig. 2 from Real Climate

The caption reads “Global temperature 1998 to present” and (from Fig. 1) “monthly values (crosses), 12-months running mean (red line) and linear trend line with uncertainty (blue)”.

Supposing no error or misunderstandings in the data (for now), those light gray crosses are the temperatures. They are the most important part of this plot. But you can’t tell, because the data has, in effect, been replaced by a model. Two models, actually, both of which, because they are so boldly and vividly colored, take on vastly more importance than mere reality.

The data happened, the models did not. That blue line did not occur; neither has the red line anything to do with reality. These are fictions; fantasies; phantasms. The red line claims nothing; no words are devoted to it except to announce its presence; it is a mystery why it is even there. It is a distraction, a visual lie. Well, fib. There is no reason in the world to condense reality in this fashion. We already know how reality happened.

The blue line is an animal of different stripe. It seems to say something about a trend.

A trend is not a trend is not a trend

Look only at the crosses (which is very difficult to do). Has the as-defined-above global temperature increased since 1998? Yes. That is to say, no. Rather, yes. Well, it depends on what is meant by increased.

I’ve talked about this dozens of times (see the Netherlands Temperature Controversy: Or, Yet Again, How Not To Do Time Series for a terrific example), but there is no mystery whether or not a given set of (assumed-error-free) data has or hasn’t a trend. To tell, two things must be in place: (1) a definition of trend or increase and (2) a check to see whether the definition has been met.

There is no single “scientific” definition of an increasing trend: the possibilities are legion. One might be that the data during the second half of the time period has a higher arithmetic mean than the first half. Another is that the last point in time is higher than the first point. Another is that there are more values in the second half (last quarter, or whatever) higher than some constant than in the first half (quarter, etc.). It could be that each successive point must be equal to or greater than the previous point. And there are many more possibilities.

It doesn’t matter which definition you pick, though what you pick should relate to the decisions to be made about the data: once the definition is in hand all you have to do is look. The trend will be there or it won’t; i.e. the criterion implied by the definition will have been realized or it won’t have been. That’s all there is to it.
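
Here is a minimal sketch of that “just look” procedure, with made-up numbers rather than the HadCRUT4 series: pick a definition, check it, done. No test required.

```python
# A sketch with made-up numbers (not the HadCRUT4 series): pick a definition
# of "trend", then simply check whether the data meet it.
import numpy as np

temps = np.array([0.40, 0.42, 0.38, 0.45, 0.41, 0.47, 0.44, 0.43, 0.49, 0.46])
half = len(temps) // 2

definitions = {
    "second-half mean exceeds first-half mean":
        temps[half:].mean() > temps[:half].mean(),
    "last value exceeds first value":
        temps[-1] > temps[0],
    "every value >= the one before it (monotone)":
        bool(np.all(np.diff(temps) >= 0)),
}
for name, met in definitions.items():
    print(f"{name}: {'trend' if met else 'no trend'}")
# Different defensible definitions can disagree, and none of them needs a
# significance test: you just look.
```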

In particular, no “tests” of “statistical significance” need or should be announced. The trend will or won’t be there, full stop; indeed, a statistical test at this point is dangerous. It is apt to mislead—as it has misled the author of the graph.

The model is not the data

You see the blue line. Accompanying it are two light-blue curves. What could those be? The line itself we know is a fiction. It is what did not happen. The crosses happened. The blue line is a “smoother”, in this case a regression line. Its purpose is to replace the data with something which is not the data. Why? Well, so that the thing-that-did-not-happen can be spoken of in statistical language, here a grammar of obfuscation.

We’ll get to the light-blue curves, but first examine the title of the plot “Trend: 0.116 +/- 0.137 °C/decade 2σ”. This seems to indicate that the author has decided on a—not the—definition of a trend and discovered its value. That definition is the value of the parameter in a simple linear regression with one parameter as an “intercept”, another attached to time as a linearly increasing value, and a third for the spread (the σ). The parameter attached to time is called “the trend”. (Never mind that this trend changes depending on the starting and stopping point chosen, and that good choices make good stories.)
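
For the curious, this is roughly how a “trend plus or minus 2σ” of that kind is computed, sketched here on simulated monthly anomalies (not the HadCRUT4 hybrid data; statsmodels is assumed available), along with how the number shifts when the start year shifts.

```python
# A sketch of the regression "trend" described above, on simulated monthly
# anomalies (not the HadCRUT4 series): the slope of a straight line, plus or
# minus two standard errors, converted to degrees C per decade.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
years = np.arange(1998, 2015, 1 / 12)                            # monthly axis
anoms = 0.01 * (years - 1998) + rng.normal(0, 0.1, years.size)   # toy data

def decadal_trend(start_year):
    keep = years >= start_year
    X = sm.add_constant(years[keep])          # intercept + time
    fit = sm.OLS(anoms[keep], X).fit()
    slope, se = fit.params[1], fit.bse[1]
    return 10 * slope, 10 * 2 * se            # per decade, 2-sigma

for start in (1998, 2000, 2005):
    trend, pm = decadal_trend(start)
    print(f"start {start}: trend = {trend:+.3f} +/- {pm:.3f} C/decade")
# The number called "the trend" moves with the chosen start year, and none of
# these fitted lines is the data.
```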

Here’s where it becomes screwy. If that is the working definition of trend, then 0.116 (assuming no miscalculation) is the value. There is no need for that “+/- 0.137” business. Either the trend was 0.116 or it wasn’t. What could the plus or minus bounds mean? They have no physical meaning, just as the blue line has none. The data happened as we saw, so there can not be any uncertainty in what happened to the data. The error bounds are persiflage in this context.

Just as those light-blue curves are. They indicate nothing at all. The blue line didn’t happen; neither did the curves. The curves have nothing to say about the data, either. The data can speak for themselves.

The author on some level appears to understand this, which causes him to speak of “confidence intervals.”

Have no confidence in confidence intervals

(Note: it is extremely rare that anybody gets the meaning of a confidence interval correct. Every frequentist becomes an instant Bayesian the moment he interprets one. If you don’t know what any of that means, read this first. Here I’ll assume the Bayesian interpretation.)
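
For those who want to see what the frequentist guarantee actually is, here is a minimal coverage simulation on made-up normal data: roughly 95 percent of intervals constructed this way cover the true parameter over many repeated samples. That is a statement about the procedure, not a 95 percent probability statement about any single interval.

```python
# A minimal coverage simulation: over repeated samples, about 95% of the
# intervals constructed this way cover the true mean. The guarantee is about
# the procedure, not about the one interval you happen to have.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean, n, trials = 5.0, 30, 10_000
covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, 2.0, n)
    half_width = stats.t.ppf(0.975, n - 1) * sample.std(ddof=1) / np.sqrt(n)
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= true_mean <= hi)
print(f"coverage: {covered / trials:.3f}")   # close to 0.95
```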

The light-blue curves and the plus-or-minuses above had to do with confidence intervals. The author, like most authors, misunderstands them. He says

You see a warming trend (blue line) of 0.116 °C per decade, so the claim that there has been no warming is wrong. But is the warming significant? The confidence intervals on the trend (+/- 0.137) suggest not — they seem to suggest that the temperature trend might have been as much as +0.25 °C, or zero, or even slightly negative. So are we not sure whether there even was a warming trend?

That conclusion would be wrong — it would simply be a misunderstanding of the meaning of the confidence intervals. They are not confidence intervals on whether a warming has taken place — it certainly has. These confidence intervals have nothing to do with measurement uncertainties, which are far smaller.

Rather, these confidence intervals refer to the confidence with which you can reject the null hypothesis that the observed warming trend is just due to random variability (where all the variance beyond the linear trend is treated as random variability). So the confidence intervals (and claims of statistical significance) do not tell us whether a real warming has taken place, rather they tell us whether the warming that has taken place is outside of what might have happened by chance.

This is a horrible confusion. First, “significant” has no bearing on reality. If the temperature, for instance, had increased by (say) 20 degrees, that would have been significant in its plain-English sense. It would have been hot! Statistical significance has no relation to plain-English significance. In particular, many things are statistically “significant” which are in actuality trivial or ignorable. Statistical significance is this: that a certain parameter in the model, when input into an ad hoc function set equal to a predetermined value, produces a p-value smaller than the magic number. Significance thus relies on two things (1) a statistic, many of which are possible for this model, and (2) a model. Two statistics in the same model can produce one instance of “significance” and one instance of “non-significance”, as can simply switching models.

Here the author decided a linear regression trend was the proper model. How does he know? Answer: he does not. The only—as in only—way to know if this model is any good is to use it to forecast values past 2014 and then see if it has skill (this is a formal term which I won’t here define). To prove he is fiddling and not applying a model he has deduced, look into his article, where he applies different models with different starting dates, all of which give different blue lines. Which is correct? Perhaps none.

Second, the author says that a trend “certainly was” in the data. This is true for his definition of a trend (see the Netherlands post again). It isn’t true for other definitions. But the author needed no statistical test to show his version of a trend obtained.

Third, the real error is in the author’s failing to comprehend that statistical models have nothing to do with causes. He claimed his test was needed to rule out whether the data was caused by (or was “due to”) “random variability”. This term is nonsensical. It quite literally has no meaning. Randomness, as I’ve said thousands of times, cannot cause anything. Instead, something caused each and every temperature datum to take the value it did.

Don’t skimp your thinking on this. Prove to yourself how bizarre the author’s notion is. He drew a straight line on the data and asked whether “random variability” caused the temperature. Yes, it did, says his statistical test: his test did not reach statistical significance. Even the author said the blue line was a chimera (he didn’t erase it, though). He asks us to believe nothing, because randomness is not a physical thing, caused the temperature.

Now another model, or another statistic inside his model, might have produced a wee p-value, which would have rejected the “null hypothesis” that nothing caused the data (as it did in his Fig. 1). Very well. Suppose that was the case. What then caused the data if it wasn’t “random variability”? The statistical model itself couldn’t have. The straight line didn’t. Physical forces did. Does anybody anywhere believe that physical forces are causing the data to increase at precisely the same rate year on year, as in a straight line? Answer: no, that’s bizarre.

So, significance or not, the statistical model is useless for the purposes to which the author put it. The only proper use is to forecast new data. The model then says, and only says, “Given my model, here is the uncertainty I have in future data.” We can then check whether the model has any value.
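
A sketch of that kind of check, on toy data rather than the Real Climate series: fit the straight line on the early portion, forecast the holdout, and score it against a naive guess. Positive skill means the model beats the naive guess; anything else means it is useless for prediction.

```python
# A sketch of the only proper check: fit on early data, forecast the holdout,
# and compare against a naive guess (here, "the future looks like last year's
# mean"). Toy numbers, not the HadCRUT4 series.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(120)                                  # 120 "months"
y = 0.001 * t + rng.normal(0, 0.1, t.size)          # toy anomalies
train, test = slice(0, 96), slice(96, 120)

b1, b0 = np.polyfit(t[train], y[train], 1)          # straight-line model
pred_line = b0 + b1 * t[test]
pred_naive = np.full(24, y[train][-12:].mean())     # naive: last year's mean

mse_line = np.mean((y[test] - pred_line) ** 2)
mse_naive = np.mean((y[test] - pred_naive) ** 2)
skill = 1 - mse_line / mse_naive                    # > 0: model adds value
print(f"skill vs naive forecast: {skill:+.2f}")
```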

If the author believes in his creation, I invite him to put his money where his model is. The Chicago Mercantile Exchange deals in heating and cooling degree day futures (which are simple functions of temperature); he can make a fortune there if his model really does have skill.

But before he does that, he’ll have to do a bit more work. Remember those “data gaps”?

Mind the gaps

Suppose global average temperature (GAT) were defined as “The numerical average of the yearly average values at locations A, B, …, and Z”. This is comprehensible and defensible, at least mathematically. Whether it has any use to any decision maker is a question I do not now answer except to say: not much.

As long as locations A-Z, and the manner in which the temperatures were computed at each location, remained constant, then nothing said above need be changed one whit. But—and this is a big but—if the locations change, or the manner in which the measurements are taken changes, we must necessarily become less certain than we were before. That “necessarily” is inescapable. Something like that is the case here. The HadCRUT4 data are not constant: locations change, as does the way the measurements are taken (the algorithm used to produce the measurements has changed, and more).

For example, suppose one of the locations (say, D) dropped out this year. That makes any comparison with the GAT this year with previous years impossible. It’s apples and oranges. It’s like, though in a smaller way, saying last year the GAT used locations Cleveland and Vera Cruz and this year only Vera Cruz.1 Hey. We never said how many locations we had to have, right?

Well, we might estimate what the temperature was at D before we form our GAT. That’s acceptable. But—and this is where the bigness of the but comes in—we have to carry forward everywhere the uncertainty which accompanies this guess. We can no longer say that this year’s GAT is X, we must say it is X +/- Y, where the Y is the tricky bit, the bit most authors get wrong.

To guess the temperature at location D requires a statistical model. That model will be some fancy mathematical function with a parameter (or parameters) associated with temperature. We don’t know the value of this parameter, but there are techniques to guess it. We can even form a confidence interval around this guess. And then we can take this guess and the confidence interval and use it as the proxy for D—and then go on to compute the GAT, which is now X +/- Y.

Sound good? It had better not, because it’s wrong. Who in the world cares about some non-existent parameter! We wanted a guess of the temperature at D, not some lousy parameter! That means we have to form the predictive confidence (really, credible) interval around the guess at D, which is also necessarily larger than the interval around the guess of the parameter. That larger interval can be plugged into the formula for the GAT, which will produce this year (again) an X +/- Y.
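
A minimal sketch of the difference, with made-up station temperatures standing in for whatever would actually be used to guess D: the interval for the parameter (the mean) is always narrower than the predictive interval for a new value, and it is the wider one that has to be carried into the GAT.

```python
# A sketch of the point above: the interval for the *parameter* (the mean) is
# narrower than the interval for an actual new value at location D.
# Toy temperatures, not real station data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
neighbors = rng.normal(15.0, 2.0, 40)      # stand-in temps used to guess D
m, s, n = neighbors.mean(), neighbors.std(ddof=1), neighbors.size
tcrit = stats.t.ppf(0.975, n - 1)

ci_half = tcrit * s / np.sqrt(n)            # interval for the mean (parameter)
pi_half = tcrit * s * np.sqrt(1 + 1 / n)    # interval for a new observation
print(f"parameter interval:  +/- {ci_half:.2f} C")
print(f"predictive interval: +/- {pi_half:.2f} C   (always wider)")
```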

The crosses aren’t the data after all

That means those crosses, because of the way the HadCRUT4 hybrid data were stitched together, aren’t the data like we thought they were. Instead of crosses, we should be looking at fuzzy intervals. We are not certain what the value of the GAT was in any year. Adding in this uncertainty, as is or should be mandatory, would make the picture appear blurry and unclear—but at least it would be honest.

Above I said the uncertainty we had in the GAT must be carried forward everywhere. I meant this sincerely. The author of the blue regression line thus cannot be sure that his value of a trend really is real. Neither can anybody. The author’s confidence intervals, which are wrong anyway, are not based on the true and complete uncertainty. And that means his model, if he does choose to use it to predict new data, will have prediction intervals that are too narrow.

And that means that he’s much more likely to lose his money, which I’m sure he’ll be putting in HDD futures. (Didn’t James Hansen make a fortune on these, Gav?)

Ego te absolvo

Why pick on this article? Well, it is one of many, all of which use and compound the same statistical slip-ups. The point, ladies and gentlemen, is that bad statistics have so badly skewed our view of reality that our dear leaders have turned this once scientific field into yet another political playground.

The blame for this boondoggle lies with—wait for it—me. Yes, me. It is I and other professional statisticians who are responsible for the gross misunderstandings like those we saw above which plague science. I cannot blame Real Climate. The author there really did think he was doing the right thing. Mea maxima culpa. I absolve the author.

Our textbooks are awful; the errors which you see nearly everywhere are born there and are allowed to grow without check. Professors are too busy proving yet another mathematical theorem and have forgotten what their original purpose was; and when you ask them questions they answer in jargon and math. Probably the more egregious fault is how we let students escape from our classrooms misunderstanding causality. We really do write blush-worthy things like “due to random chance” or “the result wasn’t statistically significant so A and B aren’t related.”

How to fix this mess is an open question. All suggestions welcomed.

———————————————————————————————

1. The reason your mind reels from this example is that you understand that the temperatures, and the physics that drove those temperatures, are different in nature at those two cities. The problem with statistical modeling is that it encourages you to cease thinking of causality, the real goal of science, and to think instead in terms of ritual. Wave this mathematical wand and look at the entrails of the data. See any wee p-values? Then your faith has been rewarded. If not, not.

Summary Against Modern Thought: That God Is Goodness Itself

See the first post in this series for an explanation and guide of our tour of Summa Contra Gentiles. All posts are under the category SAMT.

This post is one that has been restored after the hacking. All original comments were lost.

Previous post.

A short exercise (I’m traveling) showing God is not just good, like your breakfast taco might have been, but goodness itself.

Chapter 38: That God is Goodness Itself

1 FROM the above we are able to conclude that God is His own goodness.

2 For to be in act is for every thing its own good. Now, God is not only being in act, but is His own being, as proved above.[1] Therefore He is goodness itself and not merely good.

3 Further. The perfection of a thing is its goodness, as we have shown above.[2] Now the perfection of the divine being does not consist in something added thereto, but in its being perfect in itself, as proved above.[3] Therefore God’s goodness is not something added to His essence, but His essence is His goodness.

Notes God is pure act, actuality itself, which is to say, being itself. God has no potentiality. God’s existence and essence (as was shown earlier) are one. Potentiality (we learned last week) is to have the tendency to imperfection; rather, it is the presence of imperfection (think about any real instantiation of a circle), while being in act is a kind of perfection. Since God is pure act, He is perfect, which is a good, and thus goodness itself.

4 Again. Any good that is not its own goodness is good by participation. Now that which is by participation presupposes something antecedent to itself, from which it derives the nature of goodness. But it is not possible to continue thus to infinity: since in final causes there is no proceeding to infinity, for the infinite is inconsistent with finality: and the good has the nature of an end. We must therefore come to some first good, that is good not by participation in relation to something else, but by its essence. Now this is God. Therefore God is His own goodness.

Notes Perhaps another way to put this is that there must be an ultimate reference. If Goodness Itself isn’t God, then the good is a matter of dispute, mere opinion. And not even mere opinion, because I could have the opinion that good is not a matter of opinion. You cannot even say one thing is better, i.e. more good, than another. Goodness disappears without God. All goodness. There is nothing but brute fact. Which is absurd. Therefore God must be the ultimate comparator.

Intransitivity can exist in real choices (A is better or more good than B, B better than C, but C better than A, as perceived by you), but the idea that one thing can be better than another also exists. Again, you can say that a good interocitor is one which is long. A is longer than B, which makes it seem as if good is quantitative. But it is the idea that the good exists which is at base. Long interocitors are good, and longer ones better, by definition. But none will be of infinite length. The same idea of “flaw” is present in every material thing. Only God is without this “flaw.”

5 Again. That which is can participate something, but being itself can participate nothing: because that which participates is potentiality, whereas being is act. Now, God is being itself, as we have proved.[4] Therefore He is good not by participation, but essentially.

6 Moreover. In every simple thing, being and that which is are one: for if they be distinct, there is no longer simplicity.[5] Now, God is absolutely simple, as we have proved. Therefore that He is good is not distinct from Himself. Therefore He is His own goodness.

Notes A circle hewn of wood can participate in circleness. But that the circle exists, rather its existence, is act, and to be in act is to exist, and act doesn’t participate in being, it is in being. And we earlier showed that God is being itself, and simple. Don’t forget that “simple” is a technical word here. It means lacking potentiality.

————————————————————————-

[1] Ch. xxii.
[2] Ch. xxxvii.
[3] Ch. xxviii.
[4] Ch. xxii.
[5] Ch. xviii.

On The Uselessness Of Lie Detector (And Medical) Screenings, And The Ames Spy Case

This post is one that has been restored after the hacking. All original comments were lost.

Have you seen the television series The Assets? Dramatization of the Sandy Grimes-Jeanne Vertefeuille book Circle of Treason: A CIA Account of Traitor Aldrich Ames and the Men He Betrayed. Highly recommended.

In the movie, we twice see Ames hooked to a “polygraph”, which is to say “lie detector”, a device which (as we’ll see) should always be written in scare quotes. Ames is pictured as being nervous, fretting he wouldn’t pass because, of course, he was a spy for the Soviet Union, that happy place where Equality by law reigned supreme. Skip it.

The set designers did a good job reproducing the equipment of the time: it looked a lot like they showed. I know this because I was in the service in those years in a super-secret field (cryptography) which required that I, too, be hooked up and tested.

Television being television, shortcuts are taken, but the mood isn’t too far off. The examiner comes into the room and the attempted intimidation begins. An Expert Is Here! He fastens tight things around your chest, arms, hands. You are told to sit perfectly still—movement will disrupt the test! You, the test subject, feel (a) like an idiot, and (b) guilty.

The examiner doesn’t jump right into are-you-a-spy questions. No. He instead wants to prove to you the machine works, so that you don’t dare conceal a lie. I recall once the man asked me to pick a number between (I think) one and ten. He asked me which. Six. He says, “I’m going to ask you if your number was one, two, and so forth. Each time you must say no, even when I reach your number.”

“…Is your number five?” No. “Is your number six?” No. Etc.

At the conclusion of this scientific demonstration, the examiner shows you some squiggles on a piece of paper. “See here? That’s when you said no to six. These lines indicate you’re lying.” If you have any brains, you know it’s at this point you’re supposed to marvel at both the examiner’s and the machine’s perspicacity. “Wow. That’s cool.”

And then it’s off to the spy questions, the wording of which is well realized in the movie. You can’t help, helpless as you are, staring at a blank wall (the examiner never lets you see him during the test), trying not to breathe “abnormally”, to feel that, hey, maybe I am a spy.

I wasn’t.

When the test is over, it isn’t. Invariably, there is a long pause. And then a sigh from the examiner. “Sergeant Briggs…we have a little problem with one of the questions. Can you help me with that?” Which question he doesn’t say. But, and this is true, at this point many crack and begin to confess. Whereas any with an IQ greater than the median knows to say, “Golly. I don’t know.”

If you do that, the game for the examiner is up. He’s forced to pick one of the questions and ask something specific. “It was when I asked about selling information. There was a slight indication.” And you say, “Wow. Really? I have no idea.” Back and forth a couple of times like that, with you playing the happy, cooperative, friendly fool, and you’re done.

Just like Aldrich Ames. Who always passed his tests. Ted Koppel asked Ames about this, and Ames scoffed (properly, in my view) calling them “sorcery.” I’m unable to discover the second half of the video in which Ames makes this statement, but here is a letter he wrote on the same subject.

The polygraph is asserted to have been a useful tool in counterintelligence investigations. This is a nice example of retreating into secret knowledge: we know it works, but it’s too secret to explain. To my own knowledge and experience over a thirty year career this statement is a false one. The use of the polygraph (which is inevitably to say, its misuse) has done little more than create confusion, ambiguity and mistakes. I’d love to lay out this case for you, but unfortunately I cannot — it’s a secret too.

Most people in the intelligence and CI business are well aware of the theoretical and practical failings of the polygraph, but are equally alert to its value in institutional, bureaucratic terms and treasure its use accordingly. This same logic applies to its use in screening potential and current employees, whether of the CIA, NSA, DOE or even of private organizations.

Deciding whether to trust or credit a person is always an uncertain task, and in a variety of situations a bad, lazy or just unlucky decision about a person can result not only in serious problems for the organization and its purposes, but in career-damaging blame for the unfortunate decision-maker. Here, the polygraph is a scientific godsend: the bureaucrat accounting for a bad decision, or sometimes for a missed opportunity (the latter is much less often questioned in a bureaucracy) can point to what is considered an unassailably objective, though occasionally and unavoidably fallible, polygraph judgment. All that was at fault was some practical application of a “scientific” technique, like those frozen O-rings, or the sandstorms between the Gulf and Desert One in 1980.

I’ve seen these bureaucratically-driven flights from accountability operating for years, much to the cost of our intelligence and counterintelligence effectiveness. The US is, so far as I know, the only nation which places such extensive reliance on the polygraph. (The FBI, to its credit in a self-serving sort of way, also rejects the routine use of the polygraph on its own people.) It has gotten us into a lot of trouble.

Ames said the CIA believed. Which is true. Why do they believe? Because lie detectors sometimes “work”, in the sense that some confess. But people confess to interrogators all the time, which is no proof the machine works.

There is instead ample proof that Ames was right and that lie detectors are no better than eye-of-newt witchcraft. So why are they still around?

Now most people are not spies. Something far north of 99% of those screened are innocent. (This should remind you of mammograms and prostate cancer screenings.) If the examiner says no one is a spy, then he will be right north of 99% of the time.

The examiner may then boast to himself, to CIA, to Congress, to God Himself, that his machine has an accuracy rate higher than 99%! Sure, he missed a handful of fellows, but nobody bats 1.000. You just can’t beat 99%!

Yes, you can. This is why we need the idea of skill, which measures improvement over naive guesses like “everybody’s innocent.” I’ve written about the use of these skill scores in medicine (most women don’t have breast cancer, most men don’t have prostate cancer), but they have yet to gain any traction.
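
Here is a minimal sketch with made-up counts (the sensitivity and specificity are assumptions, and generous ones) showing why raw accuracy flatters the examiner while a simple skill score does not.

```python
# A sketch with made-up counts: out of 10,000 screened, 10 are spies. Compare
# a screening test against the naive rule "everybody's innocent".
spies, innocents = 10, 9_990
sensitivity, specificity = 0.80, 0.90        # assumed, and generous to the test

tp = sensitivity * spies                     # spies correctly flagged
tn = specificity * innocents                 # innocents correctly cleared
accuracy_test = (tp + tn) / (spies + innocents)
accuracy_naive = innocents / (spies + innocents)   # call everyone innocent

print(f"test accuracy:  {accuracy_test:.3f}")      # about 0.90
print(f"naive accuracy: {accuracy_naive:.3f}")     # 0.999
skill = (accuracy_test - accuracy_naive) / (1 - accuracy_naive)
print(f"skill over naive guess: {skill:+.2f}")     # deeply negative: no skill
# The naive rule wins on raw accuracy; the skill score exposes that the test
# adds nothing against the base rate.
```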

Oh, until global warming came around, meteorologists and climatologists used to judge their models with skill scores. I wonder why they stopped?
