Men Nine Feet Tall And Bayes Theorem

The OFloinn put up a most readable and recommended essay When is Weather Really “Climate”? and in one of the comments a reader named Gyan in part said:

Many economists and radical empiricists claim to reduce the whole of rationality to the Bayes’ Theorem. But John Derbyshire in his popular book on Riemann Hypothesis provides a curious counterexample.
Suppose you have a proposition that no man is more than nine feet tall. Then you find a man just a quarter inch short of nine feet.
Should your confidence in the proposition increase or not?
By Bayes’ it seems it should but common sense tells me that it should decrease.

I admit to flying somewhat blind here, because I don’t have Derbyshire’s book and can’t read his example; nevertheless, nothing ventured etc.

The economists and radical empiricists are partly right: it’s Bayes all the way, but only in the logical sense, i.e. the sense in which Bayes describes the probabilistic relationship between propositions, just as traditional logic only describes the logical relationship between propositions. About the origin of the propositions, and of fundamental truth, about which propositions are worthy of entertaining and which not, Bayes and logic are silent. In other words, radical empiricism is false, as is just-plain empiricism, and most of what economists say is best left unsaid. But of these things, another day. On to the example!

For ease of writing, let Q = “No man is more than nine feet tall,” and let D (for data) = “You find a man just a quarter inch short of nine feet.” These are two propositions and we can use Bayes, i.e. extended logic, to say something about their relationship. For instance, we can ask

     (1) Pr( D | Q )

or we can ask

     (2) Pr( Q | D ).

These probabilities are not the same, and are rarely the same for any two propositions; and unless you are clear about which you mean, you can easily mix them up.

Equation (1) is easily solvable. It says given that we know, or accept as true, that no man can be taller than nine feet, what is the probability of seeing a man less than nine feet, specifically a man a quarter inch shorter than nine feet. The answer is, in this interpretation, 1, or 100%. Of course it is! We have just said that it is a fact that no man can be taller; and here is a man who is indeed not taller.

This interpretation is not the same as F = “Any man a quarter inch shy of nine feet”. That would be

     (3) Pr( F | Q )

and to answer it fully would require we know more about the distribution of heights (F is about any old man; D is about a man). What we do know about heights is this: we know, via deduction, they are greater than zero feet, and, by assumption, they are less than nine feet. Therefore, the best we could say about (3) is that its probability is between 0 and 1. Now you might be tempted to say it is closer to 0 than to 1, but that is because you are implicitly adding information to Q, to the right-hand-side. That is, you might add information to Q about your experience with real heights of real men, experience which suggests a decreasing probability for very high heights. If you say (3) is closer to 0 than to 1, you are actually answering

     (4) Pr( F | Q & My experience about actual heights)

which you can see is not (3) and is therefore not an answer to (3).
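
To make the difference between (3) and (4) concrete, here is a minimal sketch in Python. Everything in it is an assumption added for illustration: heights are chopped into quarter-inch bins up to nine feet (so Q holds by construction), and the two sets of weights, `flat` and `bell`, are hypothetical stand-ins for two different versions of "my experience." Neither is implied by Q itself; the point is only that the number you get depends on what you add to the right-hand side.

```python
import math

# Heights in inches, in quarter-inch steps from 0.25 up to 108.0 (nine feet).
# Restricting to these bins is how the sketch encodes Q.
BINS = [q / 4 for q in range(1, 9 * 12 * 4 + 1)]


def prob_of_height(weights, target_inches):
    """Probability of the bin at target_inches under the given (unnormalized) weights."""
    total = sum(weights)
    return weights[BINS.index(target_inches)] / total


# Hypothetical extra information 1: every quarter-inch bin is equally plausible.
flat = [1.0] * len(BINS)

# Hypothetical extra information 2: bell-shaped weights centered at 69 inches
# with a spread of 3 inches (illustrative numbers, not data).
bell = [math.exp(-0.5 * ((h - 69.0) / 3.0) ** 2) for h in BINS]

target = 107.75  # a quarter inch shy of nine feet

p_flat = prob_of_height(flat, target)
p_bell = prob_of_height(bell, target)

print(p_flat, p_bell)
```

Both answers are conditional on Q, yet they differ by many orders of magnitude, because each also conditions on something Q does not say.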

Now turn the question around and answer (2): this is the chance that no man is taller than nine feet given we have seen one just shy of that number. The answer feels like it will be close to 0, but again that is because we are not strictly answering (2)—the strict answer to (2) is unknown, or perhaps just between 0 and 1 if we assume the contingent nature of these events. But what we really think we are answering is

     (5) Pr( Q | D & My experience about actual heights),

and that seems to make (5) close to 0. Let’s call E = “My experience about actual heights.”

What about Bayes’s theorem? Well, it’s easy to work out that (5) is equal to (via Bayes’s theorem):

     (5′) Pr( Q | D & E) = Pr( D | Q & E )Pr( Q | E )/Pr( D | E ).

This “updates” our belief in Q from Pr(Q | E) to Pr( Q | D & E) based on observing our “data” D. About the exact value of Pr(Q | E), I don’t know (here’s another point where we depart from economists and empiricists: Bayes does not claim all probabilities are quantifiable). As long as E doesn’t contain information contradictory to Q, such that Q is false given E, then we’re okay. In my mind, using my E, Pr(Q | E) is high, close to 1 (my E says I don’t know of any man taller than nine feet).

That leaves us Pr( D | Q & E ) and Pr(D | E) to figure out. We can attack Pr(D|E) directly, or we can expand it by the law of total probability: Pr(D|E) = Pr(D|Q&E)Pr(Q|E) + Pr(D|not-Q&E)Pr(not-Q|E). The first term is just a repeat of the numerator, and “not-Q” means “it is false that no man is more than nine feet tall.” Let’s be lazy and answer Pr(D|E) directly: this is the probability of seeing a man 8′ 11.75″ given E. Pr(D|E) might be close to 0. But then so will Pr( D | Q & E ) be.

We already assumed Pr(Q|E) was “large”, so that if Pr( D | Q & E ) < Pr(D|E) then Pr(Q|D&E) < Pr(Q|E), i.e. our belief in Q shrinks after seeing D. But if Pr( D | Q & E ) > Pr(D|E) then Pr(Q|D&E) > Pr(Q|E) and our belief in Q increases after seeing D. Whether “Pr( D | Q & E ) < Pr(D|E)” or “Pr( D | Q & E ) > Pr(D|E)” is true depends entirely on E, which since it is so fuzzy makes this problem difficult and (sometimes) seemingly against intuition.
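
The two cases above can be sketched numerically. This is only an illustration: the function name `posterior_q` and every number fed to it are hypothetical, since the real values depend on an E nobody has written down. It uses the expansion Pr(D|E) = Pr(D|Q&E)Pr(Q|E) + Pr(D|not-Q&E)Pr(not-Q|E), under which Pr(D|Q&E) < Pr(D|E) exactly when Pr(D|Q&E) < Pr(D|not-Q&E), provided Pr(Q|E) is strictly between 0 and 1.

```python
def posterior_q(prior_q, like_q, like_not_q):
    """Return Pr(Q | D & E) by Bayes's theorem.

    prior_q    -- Pr(Q | E)
    like_q     -- Pr(D | Q & E)
    like_not_q -- Pr(D | not-Q & E)
    """
    # Pr(D | E) by the law of total probability.
    evidence = like_q * prior_q + like_not_q * (1.0 - prior_q)
    return like_q * prior_q / evidence


prior = 0.95  # hypothetical "large" Pr(Q | E)

# Case A: D is less probable under Q than under not-Q, so belief in Q shrinks.
shrinks = posterior_q(prior, like_q=0.001, like_not_q=0.01)

# Case B: D is more probable under Q than under not-Q, so belief in Q grows.
grows = posterior_q(prior, like_q=0.01, like_not_q=0.001)

print(shrinks < prior < grows)
```

The same observation D moves belief in Q in opposite directions in the two cases; the only thing that changed is what E is taken to say about how likely D is with and without Q.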

11 Comments

  1. Alan D McIntire

    I remember years ago reading a similar article by Martin Gardner. His example was, “Test the hypothesis that all spades in a deck of cards are black.” One picks up a
    deck of cards, turns over a red heart ( confirming example), a red diamond(confirming example), and a blue club(confirming example).

  2. Carmen D'oxide

    I think he meant pretend economists like Paul Krugman, for instance.

  3. The crux of this problem lies in what is meant by the proposition “that no man is more than nine feet tall”.

    If this means that all men have been measured and none is more than nine feet tall (or if physics or biology tells us that being taller than nine feet is impossible) then seeing someone almost nine feet tall is irrelevant: the probability is 1 that any man is no more than nine feet tall.

    If the proposition means that we have observed a thousand men (or a million men) and none was more than nine feet tall so we extrapolate and propose that no man is taller than nine feet, then seeing a man who is just short of nine feet tall might change our proposition, or it might not. Did we see any men almost nine feet tall before we formed our proposition? If so, then seeing another won’t change our estimate much. However, if the tallest man we saw previously was, say, 7 feet and now we see someone almost 9 feet, we probably should update our probabilities.

    Everything depends on how and why we formed the original proposition.

  4. Briggs

    Charlie B,

    Thank you. I understand what you mean, however it’s not quite on target.

    The proposition Q = “No man is taller than 7′” does not mean anything except what it says. All those other things you mention like “we have observed 1000 men…” or that we did or did not see any men taller than 7′ are not relevant to Q as it stands. However, they are relevant only when we want to ask questions about Q, as in Pr(Q | we have observed a man taller than 7′) or Pr(Q | we have never observed a man taller than 7′) or Pr(Q| we have observed 1000 men, none taller than 7′) or whatever.

    Again, logic and probability have only to do with the connections between propositions and not the propositions themselves. Earlier this week I wrote (suitably modified) “we can conclude that Dr. Dodgson was right when he wrote that given, E = ‘All cats are creatures understanding French and Some chickens are cats’ that the proposition Q = ‘Some chickens are creatures understanding French’ is true—and deduced to be true at that.”

    That is, Pr(Q|E) = 1. And that is so whether we have observed 1000 or none cats understanding French, and 1000 or none chickens who are cats. If we have additional evidence that F = “We have observed 1000 cats understanding French” then we can ask what is Pr(Q|E&F) and that may or may not equal Pr(Q|E), but they are in any case different questions.

    The rule is we ask, for some Q and some explicit evidence E, Pr(Q|E). We do not ask Pr(Q|E) & F for any F; though we can ask Pr(Q|E&F). There is a world of difference.

  5. “Equation (1) is easily solvable. It says given that we know, or accept as true, that no man can be taller than nine feet, what is the probability of seeing a man less than nine feet, specifically a man a quarter inch shorter than nine feet. The answer is, in this interpretation, 1, or 100%. Of course it is! We have just said that it is a fact that no man can be taller; and here is a man who is indeed not taller.”

    Am I correct in assuming that by the first sentence you mean that the *expression* (1) is easily *evaluated*? Even if that is the intent, I am still puzzled by your argument because finding a man less than nine feet tall does not preclude the existence of others who are taller. Even if the single height sampled at 8’11.75″ comprises the entirety of our knowledge of the concept of “man” I would expect that your concept of probability should take some account of the fact that there may be other information that we do not know and so that the conditional probability should still be less than 1.

    Of course there is nothing special about the number 9 here, so I gather that your argument is also intended to imply that if D was that we observe a man less than 5′ tall and Q was the assertion that no man is taller than 5′ then it would be true that P(Q|D)=1. Am I reading you right here, or am I missing some restriction on your belief in the power of a consistent example as giving actual confirmation of a claim?

  6. Jonathan D

    It seems that Alan Cooper has read (1) the wrong way round, but I am also puzzled by the assertion that (1) can be known. Perhaps I am simply speaking a slightly different version of English, as I am also bothered by what you mean by proposition F, but it seems to me that D requires the existence of at least one man around 273.5cm tall, if not also the circumstances where I come across one of said men. I don’t see why (1) is any more answerable than (2).

  7. Briggs

    Alan, Jonathan,

    See today’s (20 July) post for a reply. I don’t want it to get buried.

  8. Chinahand

    I am unclear. Initially Prof Briggs states that P(D|Q) = 1. He then contrasts this with P(F|Q), which is some unknown probability between 0 and 1 depending upon the height distribution of people in the population.

    Later he asks us to speculate about the values of P(D|Q&E) and P(D|E).

    Here his speculations seem to be far closer to trying to answer P(F|Q&E) and P(F|E).

    Which is it D or F? It seems we are able to interpret the statements either way to fit our argument.

    I find this flexibility in definition (i.e., are we talking about F or D?) a serious problem when trying to understand Bayes’ Theorem.

    When talking about the number of black marbles in a set of jars it’s easy, but in real-world situations, working out the values of your priors, likelihood and marginal likelihood depends very much on interpretation and real-world evidence, and this doesn’t fit well with the idea that we are discussing the relationship between propositions and nothing more.

  9. Jonathan, you are right. I was thinking of P(Q|D) and I don’t know where I got the idea Briggs was claiming that was equal to 1 – especially since the passage I quoted correctly describes P(D|Q).

    With regard to P(D|Q), though, I agree with your concern. Just knowing that all men are under 9 feet tall doesn’t even guarantee that we will ever see a man at all, let alone one who happens to be exactly 8’11.75″.

    Perhaps I can claim that I first had your reaction then had a drink and came back to express my concern having forgotten exactly what was bothering me. I leave it to you and Briggs to estimate the probability of this being the correct explanation.

  10. Caroline

    I’m pig ignorant, but how can the likelihood of the probability of seeing a man less than nine feet be “1, or 100%” based on a single sample that is under nine feet? Don’t we have to assume that some of the population is precisely nine feet tall, but not taller?
