Confidence Interval Interpretation

This post originally ran 3 November 2014.

Reader Professor Doctor Moritz Heene writes:

I read your post on CIs with great interest, especially this part (see “Thinner is better”): “Frequentists prefer thinner, which is to say, narrower intervals over wide, assuming that, ceteris paribus, narrow intervals are more precise. For example, larger samples result in narrower intervals than small samples. But since all you can say is your interval either contains the true value or it doesn’t, its width does not matter. The temptation to interpret the width of an interval in the Bayesian fashion is so overwhelming that I have never seen it passed up.”

However, a colleague with whom I discussed this issue sent me the following lines, and I wonder what you think of them. I think he made a reasonable point: “For me a confidence interval is a summary of the effects I would [have] rejected if submitted to a hypothesis test (and we don’t need to think discretely here, we can think of the p-value as the continuous measure that it is of inconsistency of data with null, so I have stronger evidence against effects closer to the end of the confidence interval).

So a tight confidence interval is one that rejects many effects I may find interesting to know are rejected. A wide confidence interval is one that does not reject many effects I may find interesting.”



Your colleague didn’t pass up the Bayesian interpretation, either. He can’t really be blamed. The official frequentist meaning is so perplexing to keep in mind, and its consequences so intolerable, that relief is sought.

To repeat the official definition. Observe data, posit a parameterized probability model to “explain” that data, and construct a confidence interval (for one of these parameters). Now repeat the “experiment” such that the repetition is exactly the same as the first run, except that it is “randomly” different. Reconstruct the confidence interval. Repeat the “experiment” a third time, giving a third confidence interval. Repeat again. Then again. Then again forever.

When you reach forever, 95% of your collection of confidence intervals will overlap the “true” value of the parameter.
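
The repeated-experiment definition can be mimicked on a computer, at least for a finite number of repetitions. The sketch below is only an illustration and not part of the original argument: it assumes a normal model with known standard deviation, and the names `known_sigma_interval` and `coverage`, the true mean, and every other number are made up for the example. Note that the simulation only works because we set the “true” value ourselves, which is exactly what we never get to do with real data.

```python
import random
import statistics


def known_sigma_interval(sample, sigma, z=1.96):
    """95% interval for the mean, assuming sigma is known: x-bar +/- z*sigma/sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half_width = z * sigma / n ** 0.5
    return mean - half_width, mean + half_width


def coverage(true_mu=10.0, sigma=2.0, n=50, repeats=2000, seed=0):
    """Fraction of simulated intervals that contain true_mu (all values hypothetical)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # One "experiment": n draws from the assumed normal model.
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        low, high = known_sigma_interval(sample, sigma)
        # Each individual interval either contains true_mu or it doesn't.
        hits += low <= true_mu <= high
    return hits / repeats


print(coverage())
```

With enough repetitions the printed fraction settles near 0.95. Any single interval inside the loop still either contains `true_mu` or it doesn’t.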

But what, you ask, about the confidence interval you have in hand? What does it mean? Well, it means just what I said, and nothing more. The only thing you can say about the confidence interval before you—regardless of its width—is that either the true value of the parameter lies within it or it doesn’t.

Suppose your interval is [a, b]. Either the true value of the parameter is in the interval or it isn’t. Introduce hypothesis testing: form a “null” which says the true value of the parameter is some number c. The frequentist then checks whether c is outside [a, b]. If so, he “rejects” the null.
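
The mechanical rule is easy to write down. What follows is a minimal sketch under assumed conditions (a normal model, known standard deviation, a two-sided 5% test); the function names and the numbers are hypothetical. It shows only that “c lies outside the 95% interval” and “the p-value is below 0.05” are the same decision. It says nothing about what justifies that decision.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()          # mean 0, standard deviation 1
Z95 = STD_NORMAL.inv_cdf(0.975)    # two-sided 95% critical value, about 1.96


def interval(xbar, sigma, n):
    """95% interval [a, b] for the mean, assuming sigma is known."""
    half_width = Z95 * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


def rejects_null(xbar, sigma, n, c):
    """The rule from the text: reject the null value c when c is outside [a, b]."""
    a, b = interval(xbar, sigma, n)
    return c < a or c > b


def p_value(xbar, sigma, n, c):
    """Two-sided z-test p-value for the null 'the true mean equals c'."""
    z = abs(xbar - c) / (sigma / n ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(z))


# Hypothetical observed summary: sample mean 10, sigma 2, n = 50.
for c in (9.0, 9.8, 10.5, 10.6):
    print(c, rejects_null(10.0, 2.0, 50, c), round(p_value(10.0, 2.0, 50, c), 4))
```

Values of c far from the sample mean get small p-values and fall outside [a, b]; values near it do not.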

Rejects is a word more apt than you think. For, as Neyman, the man who invented confidence intervals, tells us, rejecting a “null” is a pure act of will, just as is assigning the “null” a value of c. When the “null” is rejected, there is no basis besides “I want”, because all we know is that the true value of the parameter is in [a, b] or it isn’t, which is a tautology and true for any interval.

Your colleague says he would reject “nulls” where c is anywhere not in a to b. Well, he might. But he does so—on the official frequentist theory—with no basis. We are not entitled to say that the true value of the parameter has any probability to be c nor any probability to be in the interval [a, b]. We are not entitled to say that any finite collection of confidence intervals will be “well behaved” either. Only once we have an infinite collection are we allowed to speak—but only because we have observed everything that can ever be observed.

It is a Bayesian interpretation to say that the parameter “probably” or “likely” lies in [a, b]. It is a Bayesian interpretation that the parameter “could very well be” c. If you decide to reject the “null”, or to “fail to reject” it, with any kind of sureness or conviction or hope (the word “confidence” is lost to us here) then you have used a Bayesian interpretation.

Of course, this Bayesian interpretation is not a formal one, where the priors have been set up in the official fashion and so forth, but assigning any kind of probability, quantified or in the form of a “feeling”, to a parameter just is to be Bayesian.

If this is confusing, well, so it is. But that’s frequentism for you. A bizarre idea that you only know a probability at the end of all time.


  1. “If you decide to reject the “null”, or to “fail to reject” it, with any kind of sureness or conviction or hope”.

    No. We can surely be confident about a claim because the process that generated this claim is reliable. This does not have to do with degrees of belief.

  2. Having done some amateur tutoring for intro prob/stats, no one I’ve seen has really grasped what a p-value means, and they instinctively want to think Bayesian. It’s such a natural way of thinking! P-values seem almost pedagogically impossible to teach unless you start a prob/stats class with the ideas of logical probability. For students who just “have” to take the class, they don’t generally care enough to get the nuances, they just want to plug and chug, get a passing grade, and move on.

    It actually disappoints me a bit that the Bayesian results for various basic tests and regressions are the same (mathematically) as Frequentist formulations. If there were more differences, students might grasp the difference in how the two schools think of things.

  3. James,

    Well, that frequentist theory sometimes gets lucky can’t really be a disappointment, because that’s the way things worked out. I often use regression as an example to seemingly unbudgeable frequentists who can’t abide priors. “Well, the Bayesian flat-prior and frequentist answers overlap. How do we explain that?”

    But let’s don’t argue about priors here. I’m no fan of non-probability “probabilities”, i.e. “improper” priors, and no fan of parameters period. See the Classic Posts page for material on this.

  4. For the benefit of more recent readers and for Professor Doctor Moritz Heene, his colleague, and others, I re-post an example of the fact that Matt’s statement about the meaning of the confidence interval is admitted to in all the best statistics textbooks.

    Which is to say, frequentists themselves admit that Matt’s definition is completely correct — and then spend the next 200 pages of their texts ignoring what they admitted. For example, the following is from my wife’s old Bio-Stat book, which I had pulled down off our bookshelves one day long ago to check Matt’s claim. Here is Bernard Rosner (Rosner B. Fundamentals of Biostatistics. Fourth Edition, 1995. p. 162.):

    “Therefore, we cannot say that there is a 95% chance that the parameter µ will fall within a particular 95% CI. However, we can say the following: Over the collection of all 95% confidence intervals that could be constructed from repeated random samples of size n, 95% will contain the parameter µ.”

    Remove the obfuscatory language, and this says precisely what Matt says: the meaning of any particular CI that we can calculate within history is undefined, or, more precisely, as Matt writes: “all you can say is your interval either contains the true value or it doesn’t”.

    Rosner admits this exactly: “we cannot say that there is a 95% chance that the parameter µ will fall within a particular 95% CI.” That is, just as Matt states, there is no meaning to a CI that we could actually calculate. The entire meaning of a CI is exclusively in the infinite set, or as Rosner writes: “[the set of] all 95% confidence intervals that could be constructed from repeated random samples of size n”.

    And of course, even the ‘infinity-minus-one’ set of CI has no meaning. Rosner is obfuscating, using the idea that somehow an infinite set “could be constructed”, if we really tried.

    But we can’t calculate an infinite number of CIs within any amount of history. And “we cannot say” what meaning to give to any particular CI that anyone, anywhere, at any time, might calculate.

    There is no getting around this. There is no wiggle room. It’s not subject to interpretation. The best frequentist statistics texts say out loud that a Confidence Interval is exactly what Matt states that it is. There is NO meaning that can be given (“we cannot say”) to any actual calculated CI, by the admission of frequentist statisticians themselves.

  5. John K,

    Thanks! And it would be a fun exercise for other people to find similar quotes in whatever stats books they have.

    Look to where the CI is first introduced. Then look later when it’s actually used. The frequentist definition is almost immediately forgotten and the author turns into a closet Bayesian.


    Good idea about Silver’s predictions. I won’t be able to get to it until Wednesday or after, but it turns out there’s an important point to be made.

  6. So if the CI was as thin as a line, would that be the true value of the parameter? At what point would a narrow CI be so narrow as to be the parameter?

  7. Define “reliable.”

    I would say a procedure is reliable when:
    i) when H is true, it will most surely tell me it is true. And;
    ii) when H is false, it will most surely tell me it is false.

    Or, in terms of CI.
    i) most of the time the constructed CI will contain the true value.

    If the evidence was generated by a reliable procedure, I can have confidence in that evidence.

  8. anon,

    Now define “most surely” and “most of the time” and how you know the “reliable” procedure will yield it and, for that matter, how you know when the procedure yields it so the procedure can be deemed “reliable”.

    I’m curious how knowing how good a parameter might be tells you anything beyond the quality of the parameter and, in particular, lets you make a statement about H if H is anything other than “I have a good parameter” (given the data at hand).

    Andy,

    No. Read JohnK’s comment.

  9. John K,
    So true. I have a statistics book where the author correctly explains that the confidence interval has to do with repeated experiments and then proceeds to use confidence intervals through the rest of the book. It’s bizarre. He gives the correct explanation and then uses the confidence interval incorrectly throughout the book.

  10. So Neyman rejects inductive reasoning as a source of knowledge and so rejects all conclusions based on sampling. And evades this problem by proposing an infinite number of samples? Which nobody could ever draw so they skip that step? And then they pretend they did? So that’s alright then? How did this pass peer review? More, how did it become the standard?

  11. When students have a wobbly understanding of confidence intervals in the early days of a course, and after what is now revealed as the sleight-of-hand by the prof or the text, there are incredible periods of struggle trying to reconcile the two definitions, or the initial definition with the subsequent application. A student who has a more flexible outlook will likely be able to adjust to and accept the inconsistencies, but one who is more concerned with methodical mastery of the material will continue to have trouble with statistics (as generally taught).

  12. “So a tight confidence interval is one that rejects many effects I may
    find interesting to know are rejected.”
    – My line of argumentation: This is like saying that the parameter c has a probability distribution, which is, of course, a Bayesian interpretation. But, as we
    know, c is supposed to be a fixed parameter in frequentist statistics.
    What we actually have in frequentist statistics is a **sampling**
    distribution, which is not the same as a probability distribution of the
    parameter c itself. The difference *seems* to be subtle but is, in fact, substantial.

  13. Suppose our model distribution is uniform on [mu-1,mu+1]. Suppose we sample N times. Let m be the minimum sample and M the maximum. I claim a 100% confidence interval is [M-1,m+1]. Not only that, I claim the probability is 100% that mu is in [M-1,m+1]. I claim a narrower region is more precise. I claim it works for any finite N, but (roughly) as N goes to infinity precision becomes absolute. I claim I can say more than “it’s either in there or it isn’t.” I say… it’s in there!

  14. Ethologists and animal trainers speak of “intermittent rewards”. Doggys learn best if they don’t get a treat every time they perform the trick. The options, of course, are “treat or no treat”, not “treat or swat on the muzzle”, for successful performance.


  15. “*Note: This study used cancer registry data to estimate the amount of HPV-associated cancer in the United States by examining cancer in parts of the body and cancer cell types that are more likely to be caused by HPV. Cancer registries do not collect data on the presence or absence of HPV in cancer tissue at the time of diagnosis. In general, HPV is thought to be responsible for about 91% of HPV-associated cervical cancers.”

    Someone told me that all cervical cancers were caused by HPV. I pull this up and I see two statements that cause me to wonder if people read them.

    “Cancer registries do not collect data on the presence or absence of HPV in cancer tissue”


    “in general, HPV is ***** thought ***** to be responsible for 91% of HPV associated cervical cancers”

    Can someone define the confidence interval of “thought to” and ‘associated with’? Two separate statements of uncertainty.

    How many HPV non associated non cervical cancers are there?

  16. from Paul L. Meyer, Introductory Probability and Statistical Applications (Addison-Wesley, 1965)
    Ch. 14 Estimation of Parameters, pp 281-282.
    “Consider $2\Phi(z) - 1 = P(\bar{X} - z\sigma/\sqrt{n} \le \mu \le \bar{X} + z\sigma/\sqrt{n})$.
    This last probability statement must be interpreted very carefully. It does not mean that the probability of the parameter $\mu$ falling into the specified interval equals $2\Phi(z) - 1$. $\mu$ is a parameter and either is or is not in the above interval. Rather, the above interval estimate should be interpreted as follows: $2\Phi(z) - 1$ equals the probability that the random interval $(\bar{X} - z\sigma/\sqrt{n},\ \bar{X} + z\sigma/\sqrt{n})$ contains $\mu$.”

    Meyer goes on to note that forming multiple interval estimates will not necessarily yield the same intervals, since the confidence interval is a sampling variable every bit as much as the sample average. Both the interval estimate and the point estimate are attempts to get a handle on the mean of the population, the interval being an effort to determine how precise the estimate is. I have always compared the mean to the stake in a game of quoits or horseshoes and the interval estimate to the quoit or horseshoe itself. It may or may not ring the stake, but the stake is not the variable, the horseshoe is. “When we make the statement that [$\mu$ lies between the endpoints of a particular interval] we are simply adopting the point of view of believing that something is so which we know to be true ‘most of the time.'”

    This text was used in my undergraduate course in the math department.
    V.K. Rohatgi, An Introduction to Probability Theory and Mathematical Statistics (Wiley, 1976)
    Ch. 11 Confidence Estimation p.46.
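    Let $\theta \in \Theta$ and $0 < \alpha < 1$. A function $\underline{\theta}(X)$ satisfying
    $P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha$ for all $\theta$
    is called a lower confidence bound for $\theta$ at confidence level $1 - \alpha$. The quantity
    $\inf_{\theta \in \Theta} P_\theta\{\underline{\theta}(X) \le \theta\}$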
    is called the confidence coefficient.

    This has the singular advantage of at least being different from that which is usually encountered! I do not do the typography justice here. The author goes on to remark:
    “It is not quite correct to read [the equations] as follows: The probability that $\theta$ lies in the set S(X) is $\ge 1 - \alpha$. $\theta$ is fixed; it is S(X) that is random here. One can give the following interpretation:
    Choose and fix $\alpha$ (usually small, say 0.01 or 0.05). Consider a sequence of independent experiments of size n from a population of unknown parameter $\theta$, where n and $\theta$ may vary from experiment to experiment. For each sample point x, compute S(x). Then the parameter of the corresponding population may or may not be covered by S(x). Although the set S(x) will vary from sample to sample, the probability that the statement “S(x) includes $\theta$” will be true is roughly at least $1 - \alpha$. … Note that a given confidence set either includes the actual parameter point or not; but we would not know which unless we already knew the actual parameter value.”
    Cochran’s Sampling Techniques (Wiley, 1977) addresses only the practical aspects of calculating confidence intervals in various sampling situations, taking account of sample sizes, skewness, multivariate categories, and the like.
    H.A. Freeman’s Industrial Statistics (Wiley, 1943) does not mention confidence intervals as such.
    W.A. Shewhart’s Economic Control of Quality of Manufactured Product does not mention confidence intervals as such.
    I did not feel like looking through the rest of my books.

    I intuit a relationship between confidence intervals and topological proximity spaces; but I never got around to working it out.
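
The uniform example in comment 13 can be checked numerically. The sketch below assumes exactly the setup described there: draws uniform on [mu-1, mu+1], with m the sample minimum and M the sample maximum. The function names, the value of mu, and the sample sizes are hypothetical choices for illustration. Because every draw lies within 1 of mu, the interval [M-1, m+1] must contain mu, and its width, 2 - (M - m), shrinks as the sample grows.

```python
import random


def sure_interval(sample):
    """Interval [M-1, m+1] from comment 13, where m = min and M = max of the sample."""
    return max(sample) - 1.0, min(sample) + 1.0


def check(mu=3.0, n=20, repeats=1000, seed=1):
    """Return (fraction of intervals containing mu, average interval width)."""
    rng = random.Random(seed)
    covered = 0
    total_width = 0.0
    for _ in range(repeats):
        # Draws from the assumed model: uniform on [mu-1, mu+1].
        sample = [rng.uniform(mu - 1.0, mu + 1.0) for _ in range(n)]
        low, high = sure_interval(sample)
        covered += low <= mu <= high
        total_width += high - low
    return covered / repeats, total_width / repeats


for n in (5, 20, 200):
    print(n, check(n=n))
```

Here the statement “mu is in the interval” is deduced from the model and the data in hand, with no appeal to an endless run of repetitions.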
