Statistics

Failed Counterexamples To The Principle Of Indifference

Say Abbot, this probability stuff isn’t so hard.

What is the so-called Principle Of Indifference? A semi-screwy, semi-right idea in probability. To cadge an example from David Stove, let T be any tautology (a truth), then

     Pr(Bob is black | T) = Pr(Abe is black | T) = etc.

where the constants “Bob”, “Abe”, or whom or whatever, are equal not because, as the Principle Of Indifference would say, you are “indifferent” between them, and not because there is “no reason” to select between them, but because any constant swapped in must, on this evidence, have the same probability. There is no proof of this: it is axiomatic. It gives the same answers as the POI, but I like this positive way of stating things because it more easily avoids mistakes and misunderstandings.

The biggest error in figuring probabilities is forgetfulness. Unnecessary paradoxes arise when evidence that is tacit is shunned or mislaid. Since all probability is conditional, we have to keep in the fore the precise premises we’re using.

With that warning, some examples of supposed counterexamples to the POI given in the unpublished paper “A new theory of intrinsic probability” by Purdue’s Paul Draper. The paper was kindly provided to me by Justin Schieber. (I gather this paper makes the rounds; anyway, these examples are common.)

Incidentally, as always I follow a logical view of probability—which is not subjective Bayesianism nor frequentism. I’ll use single quotes to encapsulate propositions and double quotes for actual quotations.

No need to take these all at once. Read at your leisure.

Example 1: A Die

We have evidence E = ‘A six-sided object with sides labeled 1 through 6, which is tossed and which when tossed must show one of these sides.’ The probability of Q = ‘The die comes up 6’ given E is 1/6. In notation:

     Pr(Q | E) = 1/6.

And this would evidently be the same if instead of Q we had Q1 = ‘The die comes up 1’ or Q2 = ‘The die comes up 2’ etc. via the logical argument used above or via the POI. Draper:

…a person not only has no more reason to believe any one of the six statements ‘the die comes up 1’, ‘the die comes up 2’, etc., than any other, but also no more reason to believe either of the statements ‘the die comes up 6’ and ‘the die does not come up 6’ than the other. In such a case, the principle of indifference implies that (relative to that person’s epistemic situation) the statement ‘the die comes up 6’ is equal in probability to each of the statements ‘the die comes up 1’, ‘the die comes up 2,’ etc. and also equal to the statement that the die does not come up 6. But that can’t be right. So the principle of indifference must be false.

Draper agrees with our first calculation but says that R = ‘The die does not come up 6’ has, conditional on something not quite E, probability of 1/2. But R, given E, is equivalent to ‘The die comes up 1 or 2 or 3 or 4 or 5’, and the probability of that given E is 5/6, clearly not 1/2.

Draper forgot he had E. It appears, in calculating the probability of R, he was thinking of something like E’ = ‘Either a 6 comes up or it doesn’t’, but then Pr(Q | E’) does not equal 1/2 because E’ is a tautology; it is true no matter what, even if we are not dealing with dice. Pr(Q | E’) is thus the interval from 0 to 1, i.e. no fixed number. Of course, Draper may have had something other than E’ in mind, but what it was is a mystery since he didn’t make it explicit.

Assuming probabilities can be calculated conditional on tautologies is another common mistake. This compound mistake led him to reject the POI in this case.
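For those who like to check by counting, here is a minimal sketch in Python of the calculation with E kept in view. It assumes only what E says, six labeled sides of which exactly one shows, each weighted equally; the names `sides` and `pr` are mine, not Draper's.

```python
from fractions import Fraction

# E: a six-sided object, sides labeled 1 through 6, exactly one of which shows.
sides = [1, 2, 3, 4, 5, 6]

def pr(event):
    """Pr(event | E): share of the equally weighted sides that satisfy event."""
    return Fraction(sum(1 for s in sides if event(s)), len(sides))

pr_q = pr(lambda s: s == 6)  # Q = 'The die comes up 6'
pr_r = pr(lambda s: s != 6)  # R = 'The die does not come up 6'

print(pr_q, pr_r, pr_q + pr_r)  # 1/6 5/6 1
```

Q and R cannot both get 1/2 unless the six sides quietly drop out of the evidence.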

Example 2: Tile size

Our evidence E = ‘Square tiles are produced from a factory having sides anywhere from 1 to 3 inches and this is a tile.’ Draper asks, “How probable is it that the length of [this] tile’s side is between 1 and 2 inches?” And he answers, conditional on E, 1/2.

But we also know the surface area of our tile can be from 1 to 9 square inches. Thus it appears that the probability, given E, of Q = ‘A surface area of this tile of between 1 and 5 square inches’ is 1/2. But the lengths corresponding to these surface areas are 1 and 2.23607 inches. A contradiction! and bad news for the POI.
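The clash is easy to see with one event described two ways: ‘side at most 2 inches’ is the same as ‘area at most 4 square inches’. A small sketch, assuming the continuous reading of E in which every length (or, alternatively, every area) in the range is weighted equally; the function names are illustrative only.

```python
from fractions import Fraction

LOW, HIGH = 1, 3  # side length limits in inches, from E

def pr_side_at_most(x):
    """Pr(side <= x) if all lengths between LOW and HIGH are weighted equally."""
    return Fraction(x - LOW, HIGH - LOW)

def pr_area_at_most(a):
    """Pr(area <= a) if all areas between LOW**2 and HIGH**2 are weighted equally."""
    return Fraction(a - LOW**2, HIGH**2 - LOW**2)

side = 2
by_length = pr_side_at_most(side)    # weighting lengths equally
by_area = pr_area_at_most(side**2)   # same event, weighting areas equally

print(by_length, by_area)  # 1/2 3/8
```

Same tile, same event, two different numbers. That is the supposed paradox.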

Draper then proposes a solution to this paradox which—I’m guessing he didn’t understand this—re-invokes the POI, but after a twist which I could not support more fully—because I’ve insisted on it many times.

Tile lengths cannot be continuous. No matter what, physics will limit the dimensions to a set of discrete sizes. It doesn’t matter what these are, only that they exist. Draper supposes 1 inch increments, but that doesn’t matter. Let them be 1/64 inch or whatever you like, and even let them be unequal; i.e. some increments are larger and some smaller than others, whatever. It doesn’t even matter if the tiles are square, but for ease of explanation suppose they are.

For fun and with Draper, amend E so that the square tile lengths are punched out in increments of 1 inch, but still between 1 and 3 inches. That means the only possible sizes are

     (1, 2, 3) inches,

which correspond to only these surface areas

     (1, 4, 9) square inches.

Recall Draper’s original question: “How probable is it that the length of the tile’s side is between 1 and 2 inches?” Given our modified E, the answer is 0. No chance at all. Tiles can be made with 1 inch or 2 inch sides all right, but no tile can be made between these lengths.

So let’s form a new Q: ‘The length of the tile is less than or equal to 2 inches’, which has, given E and the POI, probability 2/3. We can then ask what is the probability, given E, of R = ‘The surface area is less than or equal to 4.5 square inches’, which is also 2/3. (We’d get the same answer for any number less than 9 but more than or equal to 4 square inches.)
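A quick sketch of the discrete version, assuming the amended E (sides of 1, 2, or 3 inches only) and equal weight on each possible size; `lengths`, `areas`, and `pr` are names I made up for the illustration.

```python
from fractions import Fraction

lengths = [1, 2, 3]                   # possible side lengths in inches (amended E)
areas = {s: s * s for s in lengths}   # each length fixes exactly one area

def pr(event):
    """Pr(event | amended E): share of the equally weighted sizes satisfying event."""
    return Fraction(sum(1 for s in lengths if event(s)), len(lengths))

pr_between = pr(lambda s: 1 < s < 2)   # strictly between 1 and 2 inches
pr_q = pr(lambda s: s <= 2)            # Q: length at most 2 inches
pr_r = pr(lambda s: areas[s] <= 4.5)   # R: area at most 4.5 square inches

print(pr_between, pr_q, pr_r)  # 0 2/3 2/3
```

Q and R pick out the same two tiles, so they cannot disagree.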

Because why? Because each possible length is tied to exactly one possible surface area. And because we invoked the POI, or rather its logical equivalent, just as Draper did without noticing it. And he probably didn’t notice it because he gave a new name to the POI which did the same service. About that new name, I say nothing more here. We have bigger problems.

Example 2 Extended: Infinite tile lengths

Draper and I solved the problem by changing it, which seems like cheating. But it isn’t, because the original problem can never be met in real life. No manufacturing process can ever make tiles in infinitely many different lengths. But if you want infinite lengths, the trick is always to work things out discretely, and to only pass to the limit when searching for approximations to tough combinatorial problems. There just are no real life examples of infinitely graduated measurements. None. Zippo. Zilch. Nada.
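To see what working discretely and then passing to the limit looks like, here is a sketch that assumes equally spaced lengths from 1 to 3 inches in steps of 1/denom, each weighted equally. The spacing is my assumption for the illustration, nothing more.

```python
from fractions import Fraction

def pr_length_at_most(cutoff, step):
    """Pr(length <= cutoff) with lengths 1, 1 + step, ..., 3 weighted equally."""
    n = int(Fraction(2) / step)  # number of steps between 1 and 3 inches
    lengths = [1 + k * step for k in range(n + 1)]
    return Fraction(sum(1 for s in lengths if s <= cutoff), len(lengths))

for denom in (1, 2, 64, 1024):
    p = pr_length_at_most(2, Fraction(1, denom))
    print(denom, p, float(p))
```

With 1 inch steps the answer is 2/3, with 1/64 inch steps it is 65/129, and it creeps toward 1/2 as the steps shrink. At every stage ‘length at most 2’ and ‘area at most 4’ are the same set of tiles, so they get the same number.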

That really is the full answer. It’s that passing to the limit which creates, incidentally, the parameters of the usual probability models (see this pdf example).

Example 3: Balls in a bag

Our E is that there are three balls in a bag, each of which must be either white or black (that link immediately above works this problem out in nauseating mathematical detail). Draper asks, given E, “what is the probability that all three balls are black?” He says 1/8 via the route, “Consider the following eight statements: all three balls are black, the first two are black and the third white, the first two are white and the third black, etc. One can easily imagine having no more reason to believe any one of those eight statements than any other.”

But then, says Draper, “there are also four possible ratios of black balls to total balls in the urn (i.e., 1, 2/3, 1/3, and 0)…[and] the principle of indifference implies that the probability of the urn containing three black balls is 1/4.” Contradiction!

As above, Draper forgets some of his evidence. One of the ratios is indeed 3 out of 3, and another is 2 out of 3. But there are three ways to get 2/3: B1B2W3, B1W2B3, W1B2B3. Likewise, there are three ways to get 1/3, and just one way to get 0/3. That makes 8 total arrangements, only one of which contains all black balls; thus, conditional on the full evidence (and notice even Draper started by labeling the balls but then forgot), we’re back to 1/8.
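The counting can be checked by brute force. A minimal sketch, assuming three labeled balls, each black (B) or white (W), with every labeled arrangement weighted equally; the variable names are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every labeled arrangement of three balls, each black (B) or white (W).
arrangements = list(product("BW", repeat=3))

# How many arrangements give each number of black balls (i.e. each ratio).
by_black_count = Counter(a.count("B") for a in arrangements)

pr_all_black = Fraction(by_black_count[3], len(arrangements))

print(dict(sorted(by_black_count.items())))  # {0: 1, 1: 3, 2: 3, 3: 1}
print(pr_all_black)                          # 1/8
```

The four ratios are there all right, but they are backed by 1, 3, 3, and 1 arrangements, so giving each ratio 1/4 throws away the labels.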

Conclusion

To solve the last problem, Draper again used his renamed and morphed POI to get back to where he never should have left. He did that in support of what he calls “intrinsic probability”, which is the probability a proposition has in absence of “our” evidence. I don’t buy that, but that’s a subject for another day, particularly since I haven’t let Draper have a say.

Forget that for now. Before us is the Principle of Indifference, or rather its logical counterpart. It manfully stood up to the supposed counterexamples Draper presented, as indeed it has rebuffed all challenges I ever heard tell of. All finite discrete challenges, that is.

Nothing in the world wrong with infinity. Why, God Himself is infinite. But in mathematics it makes a difference how you approach it. Two people taking different paths will end up at Infinity, all right, but in vastly different neighborhoods. And both will claim that because he can’t see the other, the other isn’t there, thus somebody must be in error.

(There may be no better demonstration of this than the first three minutes of this video.)
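The tile example shows the different neighborhoods in numbers. This sketch assumes two hypothetical routes to the limit: one grid equally spaced in side length, the other equally spaced in area, each with n + 1 equally weighted points. Both grids are my invention for the illustration, not anything Draper proposed.

```python
from fractions import Fraction

def pr_by_length_grid(n):
    """Pr(side <= 2) with lengths 1 + 2k/n, k = 0..n, weighted equally."""
    lengths = [1 + Fraction(2 * k, n) for k in range(n + 1)]
    return Fraction(sum(1 for s in lengths if s <= 2), n + 1)

def pr_by_area_grid(n):
    """Pr(side <= 2), i.e. Pr(area <= 4), with areas 1 + 8k/n, k = 0..n, weighted equally."""
    areas = [1 + Fraction(8 * k, n) for k in range(n + 1)]
    return Fraction(sum(1 for a in areas if a <= 4), n + 1)

for n in (8, 80, 8000):
    print(n, float(pr_by_length_grid(n)), float(pr_by_area_grid(n)))
```

One route heads to 1/2, the other to 3/8. Neither is wrong given its own premises; the trouble only starts when the premises are left unstated.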

Update: Computing tip

Notation can be a dangerous tool, leading easily to the Sin of Reification, which is when the notation becomes real and that which it represents is forgotten. But notation is also darn helpful. When computing some probabilities, it is best to use it. Write out Draper’s first counterexample in notation, keeping all our premises:

    Pr( The die does not come up 6 | A six-sided object with sides labeled 1 through 6, which is tossed and which when tossed must show one of these sides)

Stated so plainly, it is (next to) impossible to make the mistake of thinking this probability is 1/2.

Update: See Part II: There Is No Such Thing As Intrinsic Probability.

19 replies »

  1. The first die example is so bad, I can’t help but wonder how it even made it to publication. I don’t want to commit the genetic fallacy, but it does lower my trust in David Stove.

    [A] person not only has no more reason to believe … but also no more reason to believe either of the statements ‘the die comes up 6’ and ‘the die does not come up 6’ than the other.

    Did he forget P(A’) = 1-P(A)? That is such a basic rule that instead of talking about intrinsic probabilities he should have checked his work first. I wonder how he would approach the Monty Hall problem.

    My question is why does the POI even matter? It seems that if you have a problem in probability that gathering evidence and using the rules of probability is more useful than appealing to principles. If many probabilities of things you care about are the same (and the evidence is sound), then you don’t need a principle to tell you that you are unable to tell the difference between them. You can just, as Mr. Briggs says, look at the data.

  2. James,

    Good comment, except that Stove is on the happy side of truth. It’s Paul Draper who illustrated the (very common) mistakes.

  3. Now I look like the fool with poor reading comprehension! Apologies to Stove.

  4. Okay, I’m confused….

    Doesn’t your (and Draper’s) solution to #2 invoke an imaginary extension in the information available? You say: “Our evidence E = ‘Square tiles are produced from a factory having sides anywhere from 1 to 3 inches and this is a tile.’” but your solution requires the factory to produce equal numbers of each tile size it makes. If you make this explicit by adding information about the ratio of the number of tiles of each size produced to the total number of tiles produced, your problem #2 becomes the same as #1 – and the right way to get the right answer equally obvious.

    Of course doing this brings in another problem: produced over what period? – but this, I suspect, actually brings out what you’re really trying to say (? guessing, of course): that every P = 1 or 0, all other values are estimates based on inadequate knowledge of the set of things E that all have to be true for P to = 1.

  5. Paul,

    To your first question: I don’t think so, no, because tacit in E has to be some knowledge about how tiles come out. Do any tiles appear with infinite gradations in measurement? No, they do not. Do tiles come out in discrete sizes? They do. If you claim that tiles can be infinitely graduated, then you still have to specify what you mean by “infinitely graduated.” There is more than one kind of infinity, e.g. countable, non-countable, and so on. Which do you mean? That’s where the paradoxes arise, by not accounting properly for the tacit evidence. Keeping things discrete obviates all the other questions.

    I don’t understand your second question.

  6. Briggs:

    1 – there is no second question – which does explain why it’s hard to understand 🙂

    2 – the factory makes discrete quantities of tiles in discrete sizes. If it makes 100 tiles a year in the 2 x 2 unit size your solution only works if it makes 100 in each of the other sizes. Consider what happens to your solution if it makes 100 in size A, but 1E5 in B, 1E7 in C.. etc.

  7. Paul,

    No, the number of tiles the factory makes does not matter. Suppose it only makes one and is then bombed to rubble by the Luftwaffe. Using the evidence E, what is the probability that tile length is 1 inch? One-third. And so on.

  8. Ah, this post basically discusses the problems of classical probability.

    Is “reality” a requirement for epistemic probability and probability calculus? I don’t think so. Just as the non-existence of perfect triangles doesn’t invalidate the mathematics involving triangles.

    So, what is intrinsic probability exactly? Where I can find the paper? Anyway, below is what I understand about POI.

    The problem with POI rests on how the sample space is partitioned or defined. Evidence (ignorance) gives no reason to think any one of the mutually exclusive events is more probable than any other, hence equal epistemic probabilities are assigned to those events. Therefore, in the example of rolling a die, a probability of ½ is assigned to the event of “the die coming up 6” if the sample space is said to contain only the two events ‘the die comes up 6’ and ‘the die does not come up 6’.

  9. Briggs:

    I’m liking this post. Thanks for turning me on to both Draper and Schieber, both look like interesting reads.

    Going in order, here’s where I’m getting stuck on Example 1; I think mostly because I don’t have all the vernacular yet loaded. Yeah, pun intended.

    Draper forgot he had E. It appears, in calculating the probability of R, he was thinking of something like E’ = ‘Either a 6 comes up or it doesn’t’, but then Pr(Q | E’) does not equal 1/2 because E’ is a tautology; it is true no matter what, even if we are not dealing with dice. Pr(Q | E’) is thus the interval from 0 to 1, i.e. no fixed number. Of course, Draper may have had something other than E’ in mind, but what it was is a mystery since he didn’t make it explicit.

    I think what you’re saying is that Draper is potentially in danger of accepting the gambler’s fallacy: “I’ve lost some large x number of times in a row, so the next bet is more likely to be a winner than the last since long streaks like this are relatively infrequent.” Am I in the neighbourhood?

  10. Brandon,

    No. The best answer, I think, is the update I gave at the end, combined with the changed evidence I believe Draper used.

  11. Thanks. I’m notationally challenged retarded, which may explain my struggles with probability as well. No time like the present to try and remedy it.

  12. “Concepts that have proven useful in ordering things easily achieve such authority over us that we forget their earthly origins and accept them as unalterable givens. Thus they might come to be stamped as ‘necessities of thought,’ ‘a priori givens,’ etc. The path of scientific progress is often made impassable for a long time by such errors.”
    — Albert Einstein

  13. Oh that’s good. Can’t remember if I’ve used this quote here before, but:

    “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.” — Max Planck

    He and Albert were good friends, of course, and played strings together while chatting about wave functions. It shows.

  14. @JH

    It’s a die. Which means it must have 6 sides with numbers from 1 to 6 on a side and so on.

    It is not a square block of material with sides numbered in an unknown way.

  15. Sander van der Wal,

    Right, it is a die with six sides. So we know the insufficiency/ignorance is not on what the sample space is. Draper seems to have used the following as the POI rule – (http://en.wikipedia.org/wiki/Principle_of_indifference)

    “Suppose that there are n > 1 mutually exclusive and collectively exhaustive possibilities. The principle of indifference states that if the n possibilities are indistinguishable except for their names, then each possibility should be assigned a probability equal to 1/n.”

    The sample space is partitioned into the two events of A={observing a 6} and AC={not observing a 6}. The two events are mutually exclusive and collectively exhaustive. Now, apply the above rule. A probability of ½ is assigned to each using the POI rule. Note the evidence doesn’t tell us whether the die is biased. Nor does it tell us whether the probability is uniform on {A, AC} or uniform on {1,2,3,4,5,6}.

    If the POI formulation is to assume equal probability to each simple outcome/possibility, then the POI is the same as the classical theory of probability.

  16. @JH

    You are assuming that a die with a sample space size of 6 can be mapped without loss of information to a smaller sample space with size 2, with one of the possibilities being the same as one possibility in the original sample space, the other possibility being the aggregate of the 5 other states. This clearly does not work, as one can use the same argument for all 5 other faces.

    The principle works, and might even be meant just to work, for the natural sample space of a problem. If you get things that look like paradoxes, you’re not applying the principle in the way it was meant to be applied.

  17. Sander van der Wal,

    The sample space is represented differently. It stays the same since the union of A and AC is {1,2,3,4,5,6}. The number of simple outcomes in the sample space remains 6.

    In your terms, the ignorance on {1,2,3,4,5,6} is mapped to the ignorance on {A, AC}. The indifference or ignorance has not decreased in terms of volume or area. It is not any louder, either. 🙂

    If you get things that look like paradoxes, you’re not applying the principle in the way it was meant to be applied.

    Or it implies that there are problems with POI, which you have already pointed out. The POI cannot justify a unique distribution.

    Apparently, Draper believes that he has correctly applied the POI rule given in the wiki link. I can’t tell you in what way it is “meant” to be applied, and I don’t see how the rule is applied inappropriately. If it’s applied incorrectly, in what way? Which condition of the rule is violated? I’d like to know.

    If one wants to save the POI with the formulation Mr. Briggs employs, i.e., assign equal probability to each SIMPLE possibility, I don’t see how uniform probability distribution can be derived from ignorance. Is it axiomatic? I don’t think so. Is it a belief? Perhaps. I would say such uniform probability assignment is reasonable if it is necessary, e.g., when performing Bayesian analysis.

    A reference for you: Probability and chance by M. Strevens. In D. M. Borchert (ed.), Encyclopedia of Philosophy, pp 24 – 40.
    http://lembarannalar.files.wordpress.com/2012/10/encyclopedia_of_philosophy_vol_8.pdf
