Background

A paradox is a mistake in thinking; an artificial, human creation which usually arises because a conclusion which follows from a set of beloved premises is itself unloved.

Twitter user @alpheccar asked me to look at an example of the so-called Jeffreys-Lindley paradox, which was lately discussed by physicist Tommaso Dorigo at the Science 2.0 website. It is best to go there to read the complete details of his imagined experiment, but I’ll summarize it here. (I change his notation slightly.)

A particle counter collects n = 1,000,000 instances of some quantum mechanical event, of which n+ = 498,800 were “positive” and n- = 501,200 were “negative.” The details aren’t especially interesting to us, except to note that the theory which informs these counts suggests that the numbers of positive and negative hits should be equal.

Suppose you fire up the machine and run n = 1 instance of the event. Will n+ = n-; i.e., will the counts be equal? Obviously not: it is impossible. Is the theory (which we can call T) that said that n+ = n- then wrong? The answer, which may be counter-intuitive, is yes, it is. T is false.

That is, suppose T says, “In any experiment, n+ = n-.” We ran an experiment, n+ did not equal n-, therefore we have falsified T. Not fair? Well, the word any does mean any: there is no escape. Suppose instead that T actually means, “In any experiment where n is divisible by 2, n+ = n-.” That is more of a fair playing field.

Okay, fire up the detector: n = 2 and (say) n+ = 2, n- = 0. Is T true or false? False again, and for the same reason: n+ does not equal n-.

But wait a second. We’re talking about quantum mechanical events here, the realm of the truly uncertain. T allows more wriggle room; it is not as demanding as we have been suggesting. What T really means is this: “The probability of n+ is 1/2.” From T we can infer that the probability of n- is also 1/2. So now if we see n = 2 and n+ = 2, n- = 0, we are no longer sure whether T is true or false, because given T these kinds of results can happen.

In fact, no matter what n is, if in any experiment we see n+ = n, n- = 0—or we see any other value of n+, n- —we cannot say that T is false because T says that any sequence of n+, n- can happen. As long as n is less than infinity, which it always will be (I mean always in the sense of always), no observations can prove T wrong.

There is only one other thing we can do. We can, for fun, calculate the probability, given T and a fixed n, of seeing n+, n- equaling their observed values. Since this is simple, I leave it as a homework assignment. And that, without adding more information to T, or providing alternatives to it, is all we can say (I mean all in the sense of all).
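For readers who want to check their homework, here is a minimal sketch of that calculation. It assumes T means each event is independently “positive” with probability 1/2, so the count n+ follows a binomial distribution; the function name log_binom_pmf is just an illustrative helper, and the work is done in logs because the raw binomial coefficient is far too large for floating point.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k positives in n events."""
    # lgamma(m + 1) equals log(m!), which avoids computing huge factorials.
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

n = 1_000_000
n_plus = 498_800

# Probability, given T (p = 1/2) and this fixed n, of seeing exactly the observed counts.
prob_observed = math.exp(log_binom_pmf(n_plus, n, 0.5))
print(prob_observed)
```

Note that this is the probability of the one outcome actually observed, not of that outcome “or anything more extreme.”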

Dorigo imagines a frequentist statistician seeing n+ = 498,800 and n- = 501,200. That frequentist, in order to simplify life, calculates x = n+/n and s^2 = x*(1-x)/n, and then plugs these values into a normal distribution as the central and spread parameters, i.e. he forms N(x, s). The frequentist also accepts that T is true. This lets him calculate the probability that x < 0.4988 (the observed value), given that T is true, that this normal approximation is okay, and that the plug-in values for the parameters are uncertainty-free. This calculation gives p = 0.0082.
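The frequentist’s arithmetic can be sketched in a few lines. This is one reading of the calculation, not Dorigo’s own code: it takes x to be the observed fraction of positives (0.4988), uses the plug-in spread s, and asks how far x sits below the value 1/2 that T predicts. The helper normal_cdf is an illustrative name.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

n = 1_000_000
n_plus = 498_800

x = n_plus / n                  # observed fraction of positives, 0.4988
s = math.sqrt(x * (1 - x) / n)  # plug-in spread, treated as uncertainty-free
z = (x - 0.5) / s               # distance from T's predicted 1/2, in units of s

p_value = normal_cdf(z)         # lower-tail probability under the normal approximation
print(z, p_value)
```

The observed fraction lands about 2.4 spreads below 1/2, which is where the quoted p of roughly 0.0082 comes from.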

The normal approximation isn’t really necessary; we can easily do the actual binomial calculation (in R this is pbinom(498800, 1e6, 0.5)) and it gives essentially the same answer. So skip worrying about the approximation. Worry instead about what this number means. It is, assuming T is true, the probability of seeing n+ = 498,800 or fewer hits in an experiment with n = 1,000,000 runs. Okay so far?
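For those without R to hand, the same lower-tail binomial probability can be sketched in plain Python. The function names here are illustrative, and the shortcut of summing downward from the observed count and stopping once the terms become negligible is an assumption that only works because the observed count lies below the mean, so the terms shrink steadily.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k positives in n events."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def binom_lower_tail(k, n, p, rel_tol=1e-18):
    """P(X <= k), summed downward from k; intended for k below the mean n*p."""
    term = math.exp(log_binom_pmf(k, n, p))
    total = 0.0
    j = k
    while j >= 0:
        total += term
        if term < rel_tol * total:
            break  # remaining terms are too small to matter
        # Recurrence: pmf(j - 1) = pmf(j) * j / (n - j + 1) * (1 - p) / p
        term *= j / (n - j + 1) * (1 - p) / p
        j -= 1
    return total

exact_tail = binom_lower_tail(498_800, 1_000_000, 0.5)
print(exact_tail)
```

This agrees with the normal approximation to about two significant figures, which is why the approximation is not worth worrying about.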

Now Dorigo imagines a (possibly inebriated) Bayesian thinking to himself that T might be true, as he was told it might be by the physicist. The Bayesian says to himself that “I might as well suppose that the probability that T is true is 1/2, which means the probability T is not true is also 1/2. Now T says that the probability of n+ is 1/2. But an alternative to T, call it T’, might say that the probability of n+ = 1/4. Still another alternative, T”, might say the probability of n+ = 2/3, and so on for all the other alternatives.”

How many alternatives to T are there? Uncountably many. Every number between 0 and 1 (excepting 1/2) is a potential alternative. The Bayesian doesn’t know which of these uncountably many alternatives is more likely true than another and so decides to give them all the same probability: the actual amount is that 1/2 left over and spread out after assuming the probability T is true is 1/2.

Now, through the miracle of Bayes’s theorem, the Bayesian can calculate the probability T is true given the observed values of n+, n- and the assumption that T and its alternatives each had those certain chances to be true. This probability, again using the normal approximation, is q = 0.978.
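Here is a hedged sketch of that Bayesian calculation. It uses the exact binomial likelihood instead of the normal approximation mentioned above, and it assumes the alternatives are spread uniformly over the probability of a positive between 0 and 1. Under that assumption the averaged likelihood of the alternatives works out to exactly 1/(n + 1), whatever count was observed. The variable names are illustrative.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k positives in n events."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

n = 1_000_000
n_plus = 498_800
prior_T = 0.5  # the Bayesian's assumed prior probability that T is true

# Likelihood of the observed counts if T (p = 1/2) is true.
like_T = math.exp(log_binom_pmf(n_plus, n, 0.5))

# Likelihood averaged over a uniform prior on p: the integral of
# C(n, k) p^k (1 - p)^(n - k) over [0, 1] equals 1 / (n + 1) for every k.
like_alternatives = 1.0 / (n + 1)

# Bayes's theorem: posterior probability of T given the data and these priors.
posterior_T = (prior_T * like_T) / (
    prior_T * like_T + (1 - prior_T) * like_alternatives
)
print(posterior_T)
```

The result lands close to the q = 0.978 quoted above. Notice that everything hinges on the two prior choices: the 1/2 given to T and the uniform spread given to everything else.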

Whoa! The frequentist “rejects” his “null” hypothesis that T is true, but the Bayesian says it’s all but certain that T is true. A paradox!

The resolution

Both the frequentist and the Bayesian are out of their minds. Both have produced numbers which are completely useless and answer no real-life questions.

If T is true, the probability that n+ is 498,800 or fewer is of no interest to anybody. Remember, if T is true, any value of n+ from 0 to n is possible, so merely seeing one of these values tells us nothing.

And what was that Bayesian drinking? Why is it that he thought the probability of T being true was 1/2 and that every other value was equally likely? Too in love with the math (or bottle), we can suppose. We’d have to verify this with the physicist (who knows the quantum mechanical niceties of the apparatus), but it strikes me as a nutty assumption. Since the Bayesian started with something absurd, his result is nothing more than a curiosity.

The only paradox here is why people trust us statisticians as much as they do.

What to do?

I’m going to disappoint you, but in so doing also answer the implicit question I just posed, by saying that the real answer is not easy, and perhaps not even quantifiable. In other words, people trust us because we provide these pretty little, and comforting, quantifications.

If our question is to ask, “What is the probability T is true?” we must provide alternatives or we must just accept that T is true. This is the answer—there is no other. By no other I mean no other.

Specifically, calculating the probability that n+ is 498,800 or fewer given T is true is not calculating the probability T is true. It assumes T is true. And anyway, who cares about the probability of values we didn’t see? Values such as n+ = 498,799, n+ = 498,798, and so on.

The Bayesian has the better idea—he does calculate the probability T is true given the observations—but his implementation was bizarre. He put strange a priori probabilities on alternatives that are (probably, we’d have to check) physically nonsensical.

What we need are realistic alternatives. What are they? I certainly don’t know; at least, not for this experiment. The physicist might be able to provide these, and even to say (given some theory) what the probability is that each of the alternatives is true. If he can, then the Bayesian can work his magic, incorporate them into Bayes’s formula, and produce a quantification of the probability that T is true given the evidence of the alternatives and the experimental data.
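To make the mechanics concrete, here is a sketch of Bayes’s formula applied to a short list of named alternatives. The single rival theory and both prior probabilities below are invented purely for illustration; they are not physical claims, and supplying real ones is exactly the job that belongs to the physicist. The function names are likewise hypothetical.

```python
import math

def log_likelihood(p, n_plus, n):
    """Log-likelihood of the counts under a theory that says P(positive) = p.

    The binomial coefficient is the same for every theory, so it cancels
    in Bayes's formula and is omitted here.
    """
    return n_plus * math.log(p) + (n - n_plus) * math.log(1 - p)

def posterior_over_theories(theories, n_plus, n):
    """theories maps a name to (prior probability, p). Returns name -> posterior."""
    log_weights = {
        name: math.log(prior) + log_likelihood(p, n_plus, n)
        for name, (prior, p) in theories.items()
    }
    # Subtract the largest log-weight before exponentiating to avoid underflow.
    largest = max(log_weights.values())
    weights = {name: math.exp(lw - largest) for name, lw in log_weights.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# HYPOTHETICAL inputs: a made-up rival theory and made-up priors.
theories = {
    "T": (0.5, 0.5),          # the physicist's theory: p = 1/2
    "T_prime": (0.5, 0.499),  # an invented alternative: p = 0.499
}
posterior = posterior_over_theories(theories, 498_800, 1_000_000)
print(posterior)
```

Change the invented alternative or its prior and the answer changes with it, which is the point: the number that comes out is only as good as the alternatives that go in.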

If the physicist can’t quantify these alternatives (and chances are he can’t, since there are too many ways for an experiment to go wrong), he would be better off going by his gut. Are there any cables loose? Did somebody forget to divide by 2? Probably T + E is true (T plus some measurement error). Or perhaps the theory that gave rise to T needs to be altered: T really is false, but something like T is true. Do changes to this theory give us models (different T’, T”) which better predict the observed data? This is all hard work, and unavoidable hard work at that.

Conclusion: there is no paradox, only unrealistic assumptions. For a terrific discussion of this, and for more reasons why the answer is more complicated than we would like, go to Jaynes and read his “Queer uses for probability theory.”

Update: Although I go over a specific example here, it is paradigmatic: the paradox just isn’t one.

9 replies

1. Big Mike says:

Very nicely stated.

One minor complaint: the last link seems to be broken.

2. Briggs says:

Big Mike,

Thanks, fixed.

3. alpheccar says:

Thanks for another excellent post. I am not surprised by the result because I have read Jaynes. But not enough people have. I am not a statistician and I discovered Bayesian probability only a few years ago. I was not satisfied at all by the frequentist approach and finally found Jaynes’s book.

What you do on this blog is important: highlighting again and again the misuses of statistical tools and pointing out when people are too confident about the results.

And for people, like me, who are still learning, it is very useful.

4. Mike B says:

Hey Briggs, before I got to the end of your post, I was screaming at my screen, “check your damn measurement system, Mr. Physicist!”

I’m neither a Bayesian nor a Frequentist, just a statistician that has spent way too much time in labs, plant QA labs, and any number of other places for me to even begin to tackle something like this without some evidence of a reliable measurement system.

Unfortunately, experience also tells me that Mr. Physicist’s answer to my question on his measurement system would be to quote how much he spent on it.

5. Doug M says:

Let me see if I have this straight….

The physicist says: My theory predicts that I should see phenomenon X in 50% of my observations.
He runs his experiment 1 million times and sees X in 49.88%.

The frequentist says: if this theory is true, what is the probability of seeing X with the observed frequency? And he says: unlikely.

The Bayesian would try to say what is the likelihood that a rival theory fits the data better than the physicist’s theory. The flaw in the paradox as described above is that the “Bayesian” assumed that there is an unlimited range of rival theories and that the physicist’s is better than most. This is an incorrect starting point.

In fact, unless we have a solid rival theory, the Bayesian really can’t get started.

A possible explanation is that the theory is fundamentally sound, but the apparatus fails to observe X 0.25% of the time.

6. genemachine says:

The frequentist comes out of this best. The physicist asks an unreasonable question, the Bayesian answers a nonsensical question, and the frequentist answers a closely related tractable question in a way that we can all easily understand.

7. SteveBrooklineMA says:

The frequentist is right. What value of p fits the data best? Clearly p = 498800/1e6, the maximum likelihood estimator. How many times more likely is it to see n+=498800 when p=498800/1e6 than when p=1/2?

octave:12> binopdf(498800,1e6,498800/1e6)/binopdf(498800,1e6,1/2)
ans = 17.814

about 18 times more likely. Without an a-priori reason to favor p=1/2 over p=498800/1e6, it seems reasonable to reject p=1/2. Thus, this semi-bayesian approach agrees with the frequentist. I think the frequentist reasoning about “at least as extreme results” is nonsense, but in practice I don’t think it really matters.

8. Briggs says:

SteveBrooklineMA,

The frequentist is indeed out of his mind: he has lost the question.

We saw n+ = 498800. We did not see n+ = 498799, nor n+ = 498798, nor any other number but n+ = 498800. Any decisions we make should be based on what we observed and the prior information we have. Decisions should not be made on observations that were not made.

As you point out, it is true that seeing n+ = 498800 is about 18 times more likely if p = 0.4988 than if it is 0.5. But so what? We are not interested in the theory T’ = “The probability of n+ = 0.4988.” We only want to know if T is true or false; and how likely T is if it is not certainly true or false.

The mistake the Bayesian made was not so much choosing an absurd prior as creating a prior for theories for which no other evidence exists.

The physicist has not provided us with any other theoretical reasons for theories other than T. All we can do is suppose, as I say earlier, that T + E might be in effect for this experiment: that is, T + E = “The probability of n+ = 0.5 but the equipment might give imperfect measurements.” Therefore, large departures from n+ = n/2 give evidence that T + E and not just T is true. Naturally, this does not mean that T is false: large departures probably only mean that E is true.

If all we start with is T, then we are left with T. We can only rate the probability of T in the presence of alternatives. The Bayesian did that, but with alternatives which there was no reason to contemplate.