Probability is screwy, and we statisticians do a horrible, rotten job of teaching it. The first thing students learn in normal statistics classes is about “measures of central tendency” or some such thing. The idea of what probability means and why anybody would have the slightest interest in “central tendency” is never broached. As a consequence, students leave statistics classes with a bunch of half-remembered formulas and no clear idea of what probability is.
This is unfortunate, because it allows educated men like Rolling Stone’s Bill McKibben to write the following:
June broke or tied 3,215 high-temperature records across the United States. That followed the warmest May on record for the Northern Hemisphere — the 327th consecutive month in which the temperature of the entire globe exceeded the 20th-century average, the odds of which occurring by simple chance were 3.7 x 10^-99, a number considerably larger than the number of stars in the universe.[see note at bottom of page]
Poor man! Poor readers! McKibben actually believes he has said something of interest; he has worked himself into a lather over these numbers and goes on to say things like “the seriousness of our predicament”. McKibben figures that such a small number can only mean that we are doomed unless, of course, massive amounts of money are taken from this country’s citizens and given to its politicians to apply as they see fit.
Now over the last week I tried to explain, via two examples, just what probability is and what it isn’t, and why numbers like McKibben’s aren’t of the slightest interest. See this post about global warming and this one about nine feet tall men. And if you find yourself disagreeing with me, read this one about foundations. You must at least read the first two posts because I assume them below.
What Probability Is
Suppose I let the symbol Q stand for “There are no men taller than nine feet,” and the expression D = “I observe a man 8.979 feet tall.” Let’s take this equation, or as some readers prefer to say, expression:
(1) Pr(D | Q)
and try to solve it.
Equation (1) is a matter of logic. It is just the same as Lewis Carroll’s French speaking cats: We know that if R = “All cats are creatures understanding French and some chickens are cats” that the proposition F = “Some chickens are creatures understanding French” is true; that is Pr(F | R) = 1. And this is so even if nobody ever, not ever never, in no possible world in no possible time, never never never measures or observes or sees or posits on genetical arguments any cats understanding French. It is true even if we learn tomorrow from God Himself that He has decreed that it is a logical and physical impossibility that any cat could understand French. F given R is true and that is that: and it is true because, again, logic only makes statements about the connections between propositions. Logic is mute on the propositions themselves.
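For readers who like to see the gears turn, here is a rough sketch of that claim as a brute-force check. It enumerates every small "world" of made-up individuals, each of which may or may not be a chicken, a cat, or a French speaker, and confirms that F holds in every world where R holds. The function names (`worlds`, `entails`) and the cap of three individuals are my own assumptions for illustration; a finite check like this is a sanity test, not a proof.

```python
from itertools import product

def worlds(n):
    """Yield every world of n individuals; each individual is a
    (is_chicken, is_cat, understands_french) triple of booleans."""
    yield from product(product([False, True], repeat=3), repeat=n)

def R(world):
    # "All cats are creatures understanding French and some chickens are cats"
    all_cats_french = all(french for (_, cat, french) in world if cat)
    some_chicken_is_cat = any(chicken and cat for (chicken, cat, _) in world)
    return all_cats_french and some_chicken_is_cat

def F(world):
    # "Some chickens are creatures understanding French"
    return any(chicken and french for (chicken, _, french) in world)

def entails(premise, conclusion, max_size=3):
    """True if the conclusion holds in every enumerated world where the premise holds.
    max_size is a hypothetical bound chosen to keep the enumeration small."""
    return all(
        conclusion(w)
        for n in range(1, max_size + 1)
        for w in worlds(n)
        if premise(w)
    )

print(entails(R, F))  # R forces F in every enumerated world
print(entails(F, R))  # the reverse does not hold
```

Notice that nothing in the code asks whether any real cat understands French. It only asks what follows once R is granted, which is the whole point.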
All logic, which is to say all probability, because it is solely interested in the connection between expressions, must regard propositions as fixed. In any given equation, we cannot add or subtract from these expressions: we must leave them as they are: they are not to be touched: they are sacrosanct: they exist as they are and are carved out of uncuttable stone: we are forbidden upon pain of death to manipulate them in any way. For I testify unto every man that heareth the words of these theorems, If any man shall add unto these propositions, God shall add unto him the plagues that are written in Greenpeace press releases: And if any man shall take away from the words of these propositions, God shall take away his part out of the Book of Life. I am not sure how much more of a dire warning I can issue. Don’t touch Q or D!
Equation (1) says that assuming Q is true, assuming, that is, that there are no men taller than 9 feet, that it is true that there are no men taller than 9 feet, that it is impossible there are men taller than 9 feet, that God himself has willed that there are no men taller than 9 feet, that in any possible world there cannot be men taller than 9 feet, that it is just a fact, immovable, imperturbable, irrevocable that no man can be taller than 9 feet—even if we want one to be, even if we can imagine it to be so, even if real men are actually observed to be taller than 9 feet, even if you yourself are 9’1″—given, as I say, all that, what is the chance you see a man a quarter-inch short of 9 feet?
Well, on reading D to mean seeing a man shorter than 9 feet, (1) is certain, i.e. Pr(D|Q) = 1; or on reading D to mean seeing a man precisely 8.979 feet—the actual writing of D after all, and we know we should not touch D—the best we can say is 0 < Pr(D|Q) < 1 because we have no information on how heights are distributed; all we know is that heights are contingent, meaning it is not certain (given the information we have) that all men must be precisely 8.979 feet. And therefore all we can say is “I don’t know.”
We must judge equation (1) as written! Not as we imagine it to be written, or how it might be written differently if we change the meaning of Q and D. Or about how we feel about Q and D. How it is written and nothing else.
It’s kind of funny, but if we turn probability into math there wouldn’t be the slightest interest or confusion. Suppose instead Q = “X < 9” and D = “X = 8.979” where X is just some number unrelated to any physical real thing. Then Pr(D | Q) no longer seems mysterious. In this case it’s hard to see where to add bits about, “In my opinion, we might see X larger than 9” or “I would suspect that if X did equal 8.979 then X will be greater than 9.” Indeed, if anybody did announce the latter, you would regard him as eccentric. You’d say to him, “Listen, pal. These are just numbers. They don’t mean anything. And by assumption, no number can be greater than 9. So you are speaking out of your hat.”
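A small sketch may make the two readings of D concrete. Below are two invented distributions for X, both obeying Q (every possible value is below 9). The numbers in `dist_a` and `dist_b` are pure assumptions, picked only to show that Q fixes the probability of "X < 9" at 1 but leaves the probability of "X = 8.979" wide open until something about the distribution is added to the premises.

```python
from fractions import Fraction

NINE = Fraction(9)
TARGET = Fraction("8.979")

def prob(dist, event):
    """Probability of an event under a discrete distribution {value: probability}."""
    return sum(p for x, p in dist.items() if event(x))

# Two hypothetical distributions, both consistent with Q = "X < 9".
dist_a = {Fraction("5.5"): Fraction(1, 2), TARGET: Fraction(1, 2)}
dist_b = {Fraction("6.0"): Fraction(99, 100), TARGET: Fraction(1, 100)}

for name, dist in (("a", dist_a), ("b", dist_b)):
    assert sum(dist.values()) == 1          # a proper distribution
    assert all(x < NINE for x in dist)      # Q holds by construction
    below_nine = prob(dist, lambda x: x < NINE)
    exactly_target = prob(dist, lambda x: x == TARGET)
    above_nine = prob(dist, lambda x: x > NINE)
    print(name, below_nine, exactly_target, above_nine)
```

Under both distributions the first probability is 1 and the last is 0, because Q alone settles them. The middle one differs between the two, which is why, given only Q, the honest answer for the exact reading of D is "somewhere between 0 and 1."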
Or change them again: Q = “Just half of all winged blue cats who understand French are taller than 9 feet” and D = “Observe a winged blue cat who understands French standing 8.979 feet”. Once again, we are not tempted to change Q and D and we interpret them as written.
Today’s lesson: don’t touch the propositions!
In Part II: McKibben’s Fantasy
———————————————————————–
If there were only 3.7 x 10^-99 stars in the universe, there would not even be 1 star. 3.7 x 10^-99 is of course less than 1.
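As a quick arithmetic check, here is a sketch of one way a number of that size can arise. The premise is my assumption, not something the quoted passage states: treat each of the 327 months as an independent fair coin flip, above or below average with chance 1/2 each. The result lands close to the quoted figure, which suggests (but does not prove) that this is the premise behind it.

```python
from fractions import Fraction

MONTHS = 327  # the count of consecutive months in the quoted passage

# ASSUMPTION (not stated in the source): each month is independently
# above the average with probability exactly 1/2.
p = Fraction(1, 2) ** MONTHS

print(f"{float(p):.2e}")  # a number on the order of 1e-99
print(p < 1)              # far smaller than one star, let alone all of them
```

Whatever the premise really was, the number is a probability conditional on that premise and on nothing else.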
Continuing the comments from the previous post… Alan does seem to have (1) backwards. Expression (1) is P(D|Q) not P(Q|D). However, it’s not clear to me why P(“You find a man just a quarter inch short of nine feet” | “No man is more than nine feet tall”) = 1. I think that P(“You find a man not more than nine feet tall” | “No man is more than nine feet tall”) = 1.
After reading your current post again more carefully… you seem to be meaning D=”The height of this guy here whom we’ve already found and measured is 8’11.75″”. Assigning probabilities then seems pretty weird. Are you trying to give meaning to the statement “P(X=8.979|X<9)=1" ? Yipe. Is it the same as "P(8.979<9)=1" ?
I am also a little confused by what D means.
In the 1st post I took it to mean that I have met someone 8’11.75″. In this case P(D|Q) = 1 – it’s happened! This is contrasted with F – meeting someone who is 8’11.75″. Where the P(F|Q) is some unknown between 0 and 1, depending upon the variation of height in the population.
The importance of tense in the propositions (“I have met” vs. “meeting”) indicates to me they are tricky things where interpretation can really affect the results.
In this post the statement D is tense neutral: “I observe a man 8.979 feet tall”
How that is interpreted would seem to greatly affect how you calculate your priors, likelihoods etc. Prof Briggs says we can’t “touch D”. But how we interpret it seems to affect the probabilities we assign to it.
That seems to blur the precision that this is only about the relationship between propositions. Oh woe!
Yes I was wrong to accuse Briggs of claiming that P(Q|D)=1, but I share the concerns of Jonathan D and SteveBrooklineMA regarding P(D|Q).
In fact I don’t see how one can deduce P(D|Q)=1 even for the less restrictive case where D=”I observe a man” as there is no logical entailment that any men even exist from the assertion that none are taller than 9ft. (This contrasts with the Lewis Carroll example in which the existence of chickens which are cats and the assertion that all cats understand French do together imply the existence of chickens who understand French.)
Surely if Q = “X < 9” and D = “X = 8.979” then D implies Q so in that case we CAN say P(Q|D)=1 but the claim that P(D|Q)=1 (or for that matter any other substantive claim about P(D|Q)) remains, to me, mysterious.
Pr(D|Q) is a statement and its value is the likelihood that it is true. Its value must be 1 if D is true**. Effectively, in context, Q is irrelevant. The statement could just as easily have been Pr(D | vanilla ice cream has no bones). The problem with converting English into math-like expressions is that it loses the nuances between the verbs “did” and “could have done”.
Pr(found a man 9.25 ft. tall| no man can exceed 9 ft.)=0 as it is logically impossible since the D and Q here are mutually exclusive. Another case where converting English (or partial English) looks strange, particularly if you treat the ‘|’ as a special ‘and’ operator making all that follow true.
** So, to get the value, you have to make the tacit modification of the proposition: Pr(D|Q & you never lie about things like D).
Seems this kinda breaks Briggs’s lesson du jour: don’t touch the propositions!. Irony and all that.
The second sentence, “Its value must be 1 if D is true,” really should have been:
Its value must be 1 if D and Q are true.
Another interpretation of ‘|’: when all that follows is true.
Ahem, carved out of uncuttable stone… speaking of logic.
Thanks DAV, that’s helpful. I think Pr(SteveBrooklineMA<9' | any true statement)=1. Briggs' original text though is: "It says given that we know, or accept as true, that no man can be taller than nine feet, what is the probability of seeing a man less than nine feet, specifically a man a quarter inch shorter than nine feet." If he had ended it before "specifically" then I think this probability is 1.0 (assuming per Alan that we assume the ability to meet a man). But with the part after "specifically" in there, it's not so clear. I've come to believe that common language doesn't work well for these sorts of problems. I'd prefer well-defined sets and functions.
I thought an equation has an equal sign somewhere.
Okay this is freaky. What is the probability that 1 person would be present at 2 different shootings in her lifetime?
One of the victims was present at a shooting in Toronto last June and died in the theater shooting in Colorado.
Considering how rare these events are, I mean you have to be the unluckiest person to witness something like that.
SteveBrooklineMA, All,
No. D means “I observe a man 8.979 feet tall.” D does not mean “I already have observed a man 8.979 feet tall.” When stated as Pr(D|Q) it means the chance of seeing a man 8.979 feet tall given that no man can be taller than 9 feet. Q contains no information about having observed anything.
Your “Pr(SteveBrooklineMA< 9′ | any true statement)=1” is false. T = “Either DAV has cancer or he hasn’t” is a true statement. Then Pr(Steve < 9′ | T) is either unknown or is strictly between 0 and 1—but only if we read into T information that “Steve < 9′” is contingent.
Alan,
No. If we accept as true Q = “X < 9” then Pr(“observe an X = 8.979” | Q) is either true on one reading or merely possible on another. If in all these things we take D to mean “The possibility of observing a man less than 9 feet” then if we assume Q contains information that men will be less than 9 feet, then Pr(D|Q) = 1. But if we interpret D to mean any man exactly 8.979 feet, the probability D is true given only that we know that no man can be taller than 9 feet is merely probable, i.e. this probability is between 0 and 1 until such time as we add information to Q about the distribution of heights.
DAV,
Nope. The equation/statement/expression/call-it-what-you-will is about D’s truth, it does not assume it. We judge its truth assuming Q true, so Q is absolutely relevant. There is no and can be no Pr(D): there must be some Pr(D|E) where E is some evidence/premises. And Pr(D|E) = Pr(D|Q) is only true if E=Q or coincidentally.
It is perfectly possible to write “Pr(D | vanilla ice cream has no bones).” In this case all we can say is the answer is unknown since we cannot even glean that D is contingent from this information.
Your second equation “Pr(found a man 9.25 ft. tall| no man can exceed 9 ft.)=0” is so, and so is “Pr(will find a man 9.25 ft. tall| no man can exceed 9 ft.)=0” and so is “Pr(will have found a man 9.25 ft. tall| no man can exceed 9 ft.)=0” and so is “Pr(can find a man 9.25 ft. tall| no man can exceed 9 ft.)=0” and so is “Pr(possible to find a man 9.25 ft. tall| no man can exceed 9 ft.)=0” and so on.
Sylvain Allard,
This is an excellent question. To put it in notation
Pr(1 person would be present at 2 different shootings in her lifetime)
And it cannot be answered. There is no “|” and therefore no evidence on which to base an answer.
JDougherty,
Exactly!
Briggs,
Well it did come out a little muddled.
If you aren’t assuming the truth of D in your example then Pr(D|Q)=1 and Pr(D’|Q)=0 are coming out of thin air. Pr(X less than 9| X must be less than 9) can only be 1 when it is known for certain that X is indeed less than 9. Without any assessment of the truth of D, how can you claim a value for Pr(D|Q)? Surely, you aren’t claiming a D with any X value still makes Pr(D|Q)=1?
DAV,
Yes, I was sloppy in my language. Strictly as she is written:
0 < Pr(D|Q) < 1
and that is it. I should have better said
Pr(The possibility of D|Q) = 1
which left side is now not D, but which some interpret as D (including me at one point, but I did not make it clear).
Briggs,
Wouldn’t it be better to say that the value of Pr(D|Q) is effectively what Q says about D? Then if I have a D in hand, I can say (e.g.) that Q says the 9.25 ft man I found is impossible (silly Q)? Or if the man was 8.95 ft. tall then Q says “Yep. Told ya”?
Let N be a positive integer. Let H be the set of real-valued functions which have domain {1,2,…,N} and which are bounded above by 9. Then I might buy saying Pr(there is an h in H with 8.979 in its range)=1, because such a function h exists. I might buy saying Pr(there is an h in H with 10 in its range)=0, because no such function h exists. But Pr( 8.979 is in the range of h | h in H) = 1 seems wrong to me, because not every h in H has 8.979 in its range. This last statement seems akin to Briggs’ statement “Pr(The possibility of D|Q) = 1.” In order to give the last statement meaning, I would assume a probability distribution on H.
Briggs,
By saying 0 < Pr(D|Q) < 1 are you meaning to imply that P(X = 8.979 | X<9) actually has a well defined value which is somewhere between 0 and 1 or are you really just meaning to say that it does not exist?