McKibben’s Folly
Suppose it is true that we have E = “A six-sided object, just one side of which is labeled 6, and when tossed only one side will show.” We want the probability of R1 = “A 6 shows.” That is, we want
(2) Pr( R1 | E ) = 1/6.
This result follows from the evidence in E to R1 and from nothing else. It has nothing to do with any die or dice you might have. We are in French-speaking-cat land (see Part I and links within) and only interested in what follows from given information. In particular, E need not contain information about “fairness” or “randomness” or “weightedness” or anything else.
(For those seeking depth, (2) is true given our knowledge of logic.)
Now suppose I want to know Rn = “A 6 shows n times.” This is just
(3) Pr( Rn | E & n ) = (1/6)^n,
where, if we want a number, we have to change the equation/expression/proposition. Anyway, let n be large, as large as you like: (3) grows smaller as n grows larger. Except in the limit, (3) remains a number greater than 0, but if n is bugambo big, (3) is tiny, small, wee. Indeed, let n = 126 and (3) becomes 8.973 x 10^-99, a number which is pretty near 0, but still of course larger than 0. Right, Mr McKibben?
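To check the arithmetic, here is a minimal sketch (exact rationals, so there is no worry the computer rounds the answer to 0):

```python
from fractions import Fraction

# Pr( Rn | E & n ) = (1/6)^n, equation (3), computed exactly.
def pr_rn_given_e(n: int) -> Fraction:
    return Fraction(1, 6) ** n

print(float(pr_rn_given_e(126)))  # ~8.973e-99: wee, but still greater than 0
```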
Since we assume in (3) that E is true (and n is given), the result is of very little interest. All we have are changing and decreasing numbers as n increases. End of story.
Now suppose we turn equation (3) around and ask
(4) Pr( E | Rn & n) = ?
In words, given we have seen a 6 show in n consecutive tosses, what is the chance E is true? That is, given we have seen, or assume we have seen, a 6 show in n consecutive tosses, what is the chance that we have a six-sided object, just one side of which is labeled 6, and when tossed only one side will show? On this reading, “a 6 shows in n consecutive tosses” implies at least part of E, though not all of it. There is no information that the object is six-sided, or that just one side can show, but there is evidence that a 6 is there.
In other words, “Rn & n” gives us very little information except that E is possible; therefore
(4′) 0 < Pr( E | Rn & n) < 1
and that is the best we can do. And this is so no matter how large n is, no matter how small (3) is. What follows from this is that E is not proved true or false no matter what (3) equals; indeed, in (3) we assume E is true. Again: E cannot be proved true or false whatever (3) is or whatever n is. And this is it. The end.
Yet somehow E “feels false” if n is large. It is not false, as we have just proved, but it might feel that way. And that’s because we reason something like this: given our B = “experience at other times and places with data and situations that vaguely resemble the data and situation I saw this time, simple models like E turned out to be false and other, more complicated models turned out to be true.” That is,
(5) Pr( E | B ) = small.
This is well enough, though vague. It has nothing to do with (4) however: (4) is entirely different. Plus, (5) says absolutely nothing about any rival “theories” to E. And then there are these interpretations:
(6) Pr( E | The only theory under consideration is E ) = 1,
and
(7) Pr( E | E can’t be true [because of (3)]) = 0.
Equation (7) is true but circular; equation (6) is also true because, unless we have specified a rival theory for how the data arose, we have no choice but to believe E (this is key). Equation (7), whose premise smuggles in a fallacy, is what many have in mind when judging E.
Now let M = “A 6 shows on roll 1, a 6 shows on roll 2, …, a 6 shows on roll n.” Then
(8) Pr( M | Rn & n) = 1.
Our “model” M is obviously true, though (8) is also circular. But we can “uncircle” it by changing M to M’ = “A 6 always shows.” Then
(8′) Pr( M’ | Rn & n) = 1.
Of course, if at the n+1-th roll a 6 does not show, we have falsified M’ (but not M; we’d have to write Pr(M’|Rn & n & not-six on n+1) = 0).
M and M’ are very good models because they explain the data perfectly. But equations (8) and (8′) have nothing to do with (4) or (4′). To judge between E and M, we’d have to start with a statement which assigns a prior belief (before seeing the data R) to which of these models is true; then, after seeing the data, we can update the probability that either model is true. But we cannot, using just R, say “E is false” or “M is true.”
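To make the updating concrete, here is a minimal sketch comparing E against M’; the 50/50 prior is an assumption supplied for illustration, not anything the data R give us:

```python
# Posterior Pr(E | Rn & n) when the only rival is M' = "A 6 always shows",
# starting from an assumed 50/50 prior between E and M'.
def posterior_e(n: int, prior_e: float = 0.5) -> float:
    like_e = (1 / 6) ** n   # Pr( Rn | E & n ), equation (3)
    like_m = 1.0            # Pr( Rn | M' & n ), equation (8')
    num = prior_e * like_e
    return num / (num + (1 - prior_e) * like_m)

for n in (1, 5, 20):
    print(n, posterior_e(n))  # falls toward 0 as n grows, but never reaches it
```

Notice it is the assumed prior and the named rival doing the work; Rn by itself, as (4′) says, settles nothing.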
The main point, if it is not already obvious, is that no observation will prove a model or belief false, unless it’s a very special and rare situation like M’ (and we observe something M’ said was impossible). What we really, or often, mean when we say “Just look at R; E is false!” is that we have some rival model N in mind, a model which we are sure is true. McKibben is convinced that N = “death-from-the-skies global climate disruption” is true but uses an equation like (3) to prove N. This is a fallacy. Eq. (3) has nothing in the world to do with N; and if E is true, then no observation can prove E false, or even show it is unlikely, because, of course, E is true.
Now for real temperatures, the model N could be true; but so could many other rival models, and so could a model like E, suitably modified. The climate version, call it Ec, can be “the temperature can fall into one and only one of three buckets, labeled low, normal, high.” Thus Pr( high | Ec ) = 1/3 and so on. Or Pr( high n times in a row | Ec & n ) = (1/3)^n as before. This means
(9) 0 < Pr( Ec | high n times in a row & n) < 1
just as before. Since there is no news about N in (9), N is irrelevant.
And just as before, we can start with an a priori judgment about the likelihood of N or Ec being true; and after seeing the data we can update these judgments. It will be the case, unless the a priori judgment is very skewed towards Ec, that N will be more likely than Ec given the data.
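A sketch of that comparison, with every number invented for illustration (neither the prior nor Pr(high month | N) is given by anything above):

```python
# Posterior Pr(N | high n times in a row), comparing N only against Ec.
# prior_n and p_high_given_n are hypothetical values chosen for illustration.
def posterior_n(n: int, prior_n: float = 0.5, p_high_given_n: float = 0.9) -> float:
    like_n = p_high_given_n ** n   # assumed Pr(high month | N), per month
    like_ec = (1 / 3) ** n         # Pr(high month | Ec), per month
    num = prior_n * like_n
    return num / (num + (1 - prior_n) * like_ec)

print(posterior_n(12))  # ~0.99999: N swamps the straw man Ec quickly
```

But beating Ec is all this shows; it says nothing about N against W or any other serious rival.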
But this does not mean that N is more likely true when judged against other models of the climate. We can, as we have just seen, compare N only against the straw man Ec, but this gives no evidence whatsoever about N and (say) W = “a climate model which does not assume the world will soon end unless new taxes are raised and given to politicians” (or any other climate model we might imagine).
It is therefore cheating, like all straw man arguments are cheating, not to use the best available competitor to N.
Life was tough in the Stone Age. The probability of a human fetus surviving to become a full-grown human was very small. Babies died in the womb, in childbirth, from infant diseases and childhood accidents, or, if they made it to adolescence, from a thousand and one other fatal factors.
Let’s call S the event of a pre-modern human fetus surviving to adulthood, and estimate Pr{S} = 0.01. That is to say, there was only a one in a hundred chance that any fetus survived the rigors of birth and early life and made it all the way to adulthood.
Let’s call B the event that Bill McKibben exists.
In order for Bill McKibben to exist, an unbroken chain of survivors (his ancestors in his family tree) must stretch back to at least Mitochondrial Eve, the matrilineal most recent common ancestor (MRCA) of modern humans, who lived in Africa no more recently than 150,000 years ago. At 20 years per generation, that’s 7,500 generations of Bill’s direct ancestors who must have survived infancy and childhood, grown to adulthood, and procreated successfully, or else he wouldn’t be here.
Clearly, Pr{B} = Pr{S}^7,500 = (0.01)^7,500 = zero point zero one raised to the seventy-five-hundredth power, a number so vanishingly small as to be zero for all intents and purposes.
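For scale (a minimal check; (0.01)^7,500 underflows ordinary floating point, so we take logarithms):

```python
import math

# log10 of Pr{B} = 0.01 ** 7500; the raw value underflows a double to 0.0.
log10_pr_b = 7500 * math.log10(0.01)
print(log10_pr_b)  # -15000.0, i.e. Pr{B} = 10**-15000
```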
Therefore Bill does not exist, or else he is so unlikely as to be ridiculous and not worth talking about.
I went and looked at the Rolling Stone article. I think his fallacy is far less subtle than equating a low probability of N consecutive high record months to global warming. His claim is: you (not me or Al, of course) are causing global warming and you must change your evil ways.
Let’s grant him the point that umpteen top temperature months prove global warming. Its existence doesn’t prove the cause, nor the expected result of any course of action.
The proper response to his low probability event is: so what?
I didn’t see where Mr. McKibben defined his three buckets beyond “exceeded the 20th century average” for the one bucket he is interested in. “Average” is a single number, but a real analysis would require some measure of dispersion or distribution, such as the standard deviation.
Note that “If P(B|A)=0, then P(A|B)=0” is correct, logically, assuming P(A)>0 and P(B)>0. It’s analogous to the “contrapositive” taught in high school: “If A implies not B, then B implies not A.” It’s tempting to think that the 0’s in the first expression can be replaced by very very small positive numbers. But the logic breaks down: it’s easy to find an example where P(B|A) is tiny, but P(A|B) is very close to 1. Just take a case where A and B are independent, P(B) is tiny, and P(A) is close to 1.
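A quick numeric instance of that breakdown, with the numbers picked arbitrarily:

```python
# Independent A and B: P(B|A) = P(B) and P(A|B) = P(A).
p_a, p_b = 0.99, 1e-9   # arbitrary: P(A) near 1, P(B) tiny

p_b_given_a = p_b   # tiny, the "very very small" conditional probability
p_a_given_b = p_a   # yet nowhere near 0

print(p_b_given_a, p_a_given_b)  # 1e-09 0.99
```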
Speaking of straw men, while McKibben referred to “the 327th consecutive month in which the temperature of the entire globe exceeded the 20th-century average, the odds of which occurring by simple chance were 3.7 x 10^-99”, I don’t see any evidence of his confusing this (dubious) claim about P(data|no warming) with any specific claim about P(warming|data), and certainly not with anything about the actual rate of warming (which he does address – but by other arguments).