**Twitter @ceptional reminded me of this post, which I had forgotten. Since it is highly relevant to The Great Bayesian Switch, I decided to repost. Some minor errors in grammar have been corrected.**

The Twitter user @alpheccar asked me to look at the discussion with the same name at Stack Exchange. We’ve covered this before—a statistical pun: see below—but it’s worth another look.

It helps to have an example, so here is a trivial one. I want to quantify my uncertainty in the heights of male U.S. citizens, from age 18 until dead. This will let me answer such scintillating questions as, “What is the probability that a man (meeting this demographic condition) is taller than the perfect height of 6’2″?”

I decide, based on reasons unfathomable to the lay mind (but really because the software allows me to), to quantify my uncertainty in heights using a normal distribution. I do *not* say “Heights are normally distributed”, because they are not. Heights are determined by various biological and environmental causal factors. But I *can* say, “My *uncertainty in heights* is quantified by a normal.”

Normal distributions are characterized by two parameters, a central and a spread, usually labeled with the Greek letters μ and σ. In order to characterize my uncertainty in height, I must needs supply values for these parameters.

I can, if I like, just guess what these parameters are. Why not? I can set μ = 5’8″ and σ = 3″, and who are you to say I am wrong? Can you *prove* I am wrong? You cannot. In this case, via my argument by authority, there is no question of confidence or credible intervals. My guesses are fixed and final.
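For the curious, here is a minimal sketch of what those fixed guesses buy me. It assumes heights are measured in inches (so 5’8″ is 68 and 6’2″ is 74) and uses Python's standard-library `NormalDist`; the constant names are mine, chosen for illustration only.

```python
from statistics import NormalDist

# Heights in inches. MU_GUESS and SIGMA_GUESS are the guessed values from the text.
MU_GUESS = 68.0      # 5'8"
SIGMA_GUESS = 3.0    # 3"
THRESHOLD = 74.0     # 6'2"

uncertainty = NormalDist(mu=MU_GUESS, sigma=SIGMA_GUESS)

# P(height > 6'2") under the guessed normal: one minus the CDF at the threshold.
p_taller = 1.0 - uncertainty.cdf(THRESHOLD)
print(f"P(taller than 74 in) = {p_taller:.4f}")  # prints 0.0228
```

With the parameters fixed by fiat, the answer is a single number and there is no interval of any kind to report.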

But I am humble and decide to be amenable to empirical evidence, so I grab a sample of men (alive now) and measure their heights. Because I am a mathematical wizard, I can compute the mean and standard deviation of this sample. There is no uncertainty in these calculations, or in this sample. If I want to know the probability of a man in this sample being taller than the average, I just count. There is still no question of credible or confidence intervals.
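A small sketch of that wizardry, using a simulated sample because I have no real measurements to hand. The 50 heights, the seed, and the values 69 and 3 used to generate them are all hypothetical stand-ins, not data.

```python
import random
from statistics import mean, stdev

random.seed(1)
# Hypothetical sample of 50 heights in inches (simulated, not real measurements).
sample = [round(random.gauss(69.0, 3.0), 1) for _ in range(50)]

m = mean(sample)    # sample mean
s = stdev(sample)   # sample standard deviation (n - 1 denominator)

# Probability that a man *in this sample* is taller than the sample average:
# no model needed, just count.
frac_taller = sum(h > m for h in sample) / len(sample)

print(f"mean = {m:.2f}, sd = {s:.2f}, fraction taller than mean = {frac_taller:.2f}")
```

Everything printed here is a plain fact about the sample in hand, which is why no interval attaches to it.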

What have the mean and standard deviation to do with the values μ and σ? Not a damn thing. Unless I embrace frequentist theory, which allows me to substitute these empirical measures as *guesses* of the parameters. And recall I need guesses, otherwise I cannot quantify my uncertainty in heights.

No frequentist believes that these guesses of the parameters are perfect, without error. Here is where *confidence intervals* arise: through a formula, I can compute the 95% confidence interval for these guesses. But first, an old joke:

“Excuse me, professor. Why a 95% confidence interval and not a 94% or 96% interval?” asked the student.

“Shut up,” he explained.

**What is a confidence interval**

The confidence interval formula supplies a numerical interval for the guess of μ and another for the guess of σ. But if I wanted, I could go back and regather a new sample, and I could compute a new guess of μ and a new guess of σ, and new numerical intervals. Agreed?

Well, I could do it a third time, too, producing a new set of guesses and intervals. Then a fourth time, then a fifth, and so on *ad infinitum*. Literally. Still with me?

Here is what a confidence interval is: 95% of those intervals from the repeated samples will “cover”, or contain, the true values of μ and σ. That’s it, and nothing more.

But what about *the* interval I calculated with the only sample I do have? What does *that* interval mean? Nothing. Absolutely, positively nothing. The only thing you are allowed to say is to speak the tautology, “Either the true values of μ and σ lie in the calculated intervals or they do not.”
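The repeated-sampling definition can be checked by simulation, which is the only place we ever get to know the "true" values. The sketch below is restricted to μ for brevity and makes several assumptions of my own: a true μ of 69 and σ of 3 inches, samples of size 30, the usual t-based interval for a mean, and a hard-coded critical value of 2.045 (the 97.5th percentile of Student's t with 29 degrees of freedom).

```python
import random
from math import sqrt
from statistics import mean, stdev

TRUE_MU, TRUE_SIGMA = 69.0, 3.0   # hypothetical "true" values, known only because we simulate
N = 30                            # size of each repeated sample
T_CRIT = 2.045                    # 97.5th percentile of Student's t with N - 1 = 29 df

def mu_interval(sample):
    """Standard 95% t-interval for the mean of one sample."""
    m, s = mean(sample), stdev(sample)
    half = T_CRIT * s / sqrt(len(sample))
    return m - half, m + half

def coverage(reps=2000, seed=0):
    """Fraction of repeated-sample intervals that contain TRUE_MU."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(N)]
        lo, hi = mu_interval(sample)
        hits += lo <= TRUE_MU <= hi   # True counts as 1
    return hits / reps

print(f"fraction of intervals covering the true mu: {coverage():.3f}")
```

The printed fraction should land near 0.95. Note what the number describes: the long run of intervals, not any single one of them.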

**What is a credible interval**

The Bayesian way is not satisfied with a mere guess of μ and σ. Instead, prior information about the values of these parameters is gathered or assumed and used to probabilistically quantify the uncertainty in their values *before* any data is gathered. That is, before new data comes in, I can ask questions like, “Given this prior information, what is the probability that μ < 5″?”

But after I gather a sample, and through the magic of Bayes’s rule, I can calculate the uncertainty I have in the values of μ and σ accounting for this data. I can then picture the complete distribution of uncertainty of both parameters, or I can pick an interval of values where I think the true values of the parameters are most likely to lie.

Here is what a credible interval is: Given the data and the model, there is a 95% chance the true values of μ and σ lie in that interval. That’s it again, and nothing more.
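To make this concrete, here is a minimal sketch of the simplest textbook case, which is not the full problem above: σ is assumed known (3 inches), and the prior on μ is a normal with mean 68 and standard deviation 2. Those values, the eight sample heights, and the function name `posterior_mu` are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import NormalDist, mean

def posterior_mu(sample, sigma, prior_mean, prior_sd):
    """Posterior for mu under a normal prior and a normal model with known sigma."""
    n = len(sample)
    prior_prec = 1.0 / prior_sd**2     # precision = 1 / variance
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * mean(sample)) / post_prec
    return NormalDist(post_mean, sqrt(1.0 / post_prec))

# Hypothetical heights in inches (illustrative, not real data).
sample = [66.5, 71.0, 68.2, 69.9, 72.4, 67.3, 70.1, 68.8]
post = posterior_mu(sample, sigma=3.0, prior_mean=68.0, prior_sd=2.0)

# Central 95% credible interval: the 2.5% and 97.5% posterior quantiles.
lo, hi = post.inv_cdf(0.025), post.inv_cdf(0.975)

# Direct probability statements about the parameter are allowed here.
p_mu_below_68 = post.cdf(68.0)

print(f"95% credible interval for mu: ({lo:.2f}, {hi:.2f})")
print(f"P(mu < 68 | data, model) = {p_mu_below_68:.3f}")
```

Unlike the confidence interval, the interval printed here does carry a probability statement about μ, conditional on the data, the prior, and the model.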

**But what about the heights of those men?**

We forgot! Just like everybody else who does a statistical analysis, we got so distracted by computing guesses and intervals that we plumb forgot what our original purpose was. How silly of us! We were so happy that we knew the difference between confidence and credible intervals that we abandoned our original question and spoke as though the certainty we had in the parameters translated to the certainty we had in heights. But of course, this is miles from the truth. See this post.
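A rough sketch of the gap between the two kinds of certainty, under assumptions of my own: σ is taken as known (3 inches) and the uncertainty in μ is a normal with mean 69 and standard deviation 0.5. In that simple case the uncertainty in a new man's height is again normal, with the two variances added.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 3.0                       # assumed known spread of heights, inches
post_mu = NormalDist(69.0, 0.5)   # hypothetical uncertainty in mu after seeing data

# Uncertainty in one new height: parameter uncertainty and height spread combine,
# so the variances add.
predictive = NormalDist(post_mu.mean, sqrt(SIGMA**2 + post_mu.stdev**2))

mu_interval = (post_mu.inv_cdf(0.025), post_mu.inv_cdf(0.975))
height_interval = (predictive.inv_cdf(0.025), predictive.inv_cdf(0.975))

# The original question: how likely is a man to be taller than 6'2" (74 inches)?
p_taller = 1.0 - predictive.cdf(74.0)

print(f"95% interval for mu:       ({mu_interval[0]:.1f}, {mu_interval[1]:.1f})")
print(f"95% interval for a height: ({height_interval[0]:.1f}, {height_interval[1]:.1f})")
print(f"P(height > 74 in) = {p_taller:.3f}")
```

The interval for μ spans about two inches while the interval for an actual height spans about a foot, which is the point: a tight statement about a parameter is not a tight statement about heights.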


Jeezily crow. I feel dumber than ever after reading this one!

HA! I still enjoyed it.

Hi Matt,

What do you think of this related to an earlier post?

http://www.bmj.com/content/343/bmj.d5531

I am handed a bag containing a billion balls, 95% of which are of the same color. I reach in and pull out a ball, and declare with 95% confidence that the majority of the balls in the bag are the same color as the ball I pulled out. If, as you seem to say, I can only conclude from my single draw that either the majority of balls in the bag are the same color as my ball or they are not, then you should be willing to bet that I am wrong, giving even odds. The same should be true when 95% becomes 99%, 99.9999% etc. I’d take that bet.

I am handed a bag containing a billion balls. I reach in and pull out a ball. Look! A coloured ball!

Steve,

The kind of problem you describe has been fully worked out (for finite populations, like yours). See this paper.

I am still working this in my head. Tell me I have this straight.

Professor Briggs has 50 male students in his algebra sans algebra class. On the first day of class he measures them all. We don’t need to talk about normal distributions. We have the data. If there is one student taller than 6’6″ and one student shorter than 5’5″, we can say that if we were to select a male student at random, there is a 96% probability that he is taller than 5’5″ and shorter than 6’6″. However, this is NOT a confidence interval.

If Mr. Briggs took his sample of 50 students and used that data to make a statement about the male population at the school or the male population in America, he might use a confidence interval.

The confidence interval does not say that given his data there is a 95% probability that the population mean is within the interval. Rather, it says that there is a 5% chance that my sample is unusual, and that the calculated interval does not include the population mean.

Doug M,

First two are correct, but the confidence interval says nothing about “unusualness” of samples. It’s only true that some say it does. But according to the theory, no. The explanation I gave is correct. Look up “Neyman” on this site for the story behind this strange creature.

Mr. Briggs, I’m 95% confident that I am confused.

If I take your point correctly, and that CI and CrI don’t really say all that much about the data outside the data, what’s the point of calculating them at all? Basically my question is: why bother calculating the number CI and CrI? Sorry if this is a dumb question.

Will,

It is an excellent question. It is because practitioners, who accept the truth of their models, are interested only in the parameters of those models. Most statistical practice is like a quest whose goal is to find tangible proof of a relationship, but which gets distracted by some shiny rocks, the beauty of which is taken as the tangible proof sought; only it isn’t.

Now, that metaphor stinks to high heaven. I’ll work on a better one.

I don’t know exactly what practitioners do, but if you use a parametric model, you need to first estimate the parameters regardless of the method you employ and then predict the response accordingly. For example, a simple linear model of y = α + β x + ε. Also, the parameters α (intercept) and β (slope) have their own interpretations. Estimation of the parameters and prediction of a new response y answer different questions.

I hate this time of year. Nothing but repeats everywhere.

I always thought a confidence interval was the time spent by a beginner at the end of a diving board before jumping.

@ SteveBrooklineMA,

I hope I’m not misrepresenting your argument, but from what you wrote you seem to misunderstand the probabilistic meaning of confidence intervals. My take on CI’s is this: Let’s say your estimate m for a mean μ is m = 10, with confidence interval [8,12]. Indeed, if μ were exactly equal to 10, and you repeated your experiment and sampled your data again and again, then 95% of all the confidence intervals calculated from these data would cover μ, according to the definition of confidence intervals. But this does *not* mean that there is a 95% probability that the true mean μ is an element of [8,12]. Even if we forget that μ is a parameter in classical statistics and cannot be given a probability distribution, and instead interpret μ in terms of Bayesian probability, the probability that μ falls into the given interval is still unknown. The reason is this: The procedure to calculate the one and only confidence interval we have forces us to *assume* that μ = m = 10 (or whatever value the mean of the sample at hand has). Therefore, no conclusion about a probability for μ can be drawn, because μ has a fixed value (with probability 1) by assumption.

So it is possible that μ is indeed 10, and our confidence interval is one of the 95% of intervals covering it. μ could of course have other values, like μ = 10.5, or 11, or 8.75 etc. Also in these cases, our (one and only) confidence interval happens to cover the true mean. On the other hand, μ could equal -2, 27.5, 111073.333, or any other value outside our calculated confidence interval. It was of course unlikely for us to find an interval which does not cover μ; but this is all we can say. The true mean μ is either inside the interval, or not.

So, confidence intervals share the disadvantage of p-values: They refer to probabilities of data, given the value of the parameter. To turn this information around via Bayes’ formula, we would need an a priori distribution for the parameters, which we don’t have (and cannot have in classical statistics). If, for instance, the a priori distribution for μ were zero for the interval [8,12], then the probability of μ being an element of [8,12] would be strictly zero, despite the luring 95% probability interpretation of our confidence interval; but these 95% refer only to the probability of certain data given the parameter, while we want to know the reverse, namely the probability of the parameter, given our data.