William M. Briggs

Statistician to the Stars!

Readers Write: Confidence Interval Interpretation

Reader Professor Doctor Moritz Heene writes:

I read your post on CIs with great interest, especially this one: http://wmbriggs.com/blog/?p=11862, see “Thinner is better”: “Frequentists prefer thinner, which is to say, narrower intervals over wide, assuming that, ceteris paribus, narrow intervals are more precise. For example, larger samples result in narrower intervals than small samples. But since all you can say is your interval either contains the true value or it doesn’t, its width does not matter. The temptation to interpret the width of an interval in the Bayesian fashion is so overwhelming that I have never seen it passed up.”

However, a colleague with whom I discussed this issue sent me the following lines, and I wonder what you think of them. I think he made a reasonable point: “For me a confidence interval is a summary of the effects I would [have] rejected if submitted to a hypothesis test (and we don’t need to think discretely here, we can think of the p-value as the continuous measure that it is of inconsistency of data with null, so I have stronger evidence against effects closer to the end of the confidence interval).

So a tight confidence interval is one that rejects many effects I may find interesting to know are rejected. A wide confidence interval is one that does not reject many effects I may find interesting.”

Regards,

Moritz

Your colleague didn’t pass up the Bayesian interpretation, either. He can’t really be blamed. The official frequentist meaning is so perplexing to keep in mind, and its consequences so intolerable, that relief is sought.

To repeat the official definition. Observe data, posit a parameterized probability model to “explain” that data, and construct a confidence interval (for one of these parameters). Now repeat the “experiment” such that the repetition is exactly the same as the first run, except that it is “randomly” different. Reconstruct the confidence interval. Repeat the “experiment” a third time, giving a third confidence interval. Repeat again. Then again. Then again forever.

When you reach forever, 95% of your collection of confidence intervals will overlap the “true” value of the parameter.
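To make the definition concrete, here is a minimal simulation sketch. It is not from the original post: it assumes a normal model with a known standard deviation, and the function names and the values 3.0, 2.0 and 25 are invented for illustration. Note that the simulation can only count hits because it is told the “true” value in advance, which is exactly what nobody knows in practice.

```python
import math
import random


def normal_mean_ci(sample, sigma, z=1.959963984540054):
    """Two-sided 95% z-interval for a normal mean, sigma assumed known."""
    n = len(sample)
    mean = sum(sample) / n
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(true_mu, sigma, n, repeats, seed=0):
    """Fraction of repeated-experiment intervals that contain true_mu."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(repeats):
        # One "experiment": n fresh draws from the assumed model.
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        lo, hi = normal_mean_ci(sample, sigma)
        hits += lo <= true_mu <= hi
    return hits / repeats


if __name__ == "__main__":
    # Hypothetical values; the fraction only settles near 0.95 as repeats grows.
    for repeats in (10, 100, 10_000):
        print(repeats, coverage(true_mu=3.0, sigma=2.0, n=25, repeats=repeats))
```

With a small number of repetitions the observed fraction can land well away from 0.95; nothing in the definition constrains any finite run.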

But what, you ask, about the confidence interval you have in hand? What does it mean? Well, it means just what I said, and nothing more. The only thing you can say about the confidence interval before you—regardless of its width—is that either the true value of the parameter lies within it or it doesn’t.

Suppose your interval is [a, b]. Either the true value of the parameter is in the interval or it isn’t. Introduce hypothesis testing: form a “null” which says the true value of the parameter is some number c. The frequentist then checks whether c is outside [a, b]. If so, he “rejects” the null.
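The check described above is mechanical, and in the simplest case it agrees exactly with the usual p-value rule. A sketch, assuming a two-sided z-test for a normal mean with known standard deviation (the model, the function names and the numbers are assumptions for illustration, not anything in the post):

```python
import math
from statistics import NormalDist


def z_interval(mean, sigma, n, level=0.95):
    """Interval [a, b] for a normal mean with sigma assumed known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / math.sqrt(n)
    return mean - half, mean + half


def p_value(mean, sigma, n, c):
    """Two-sided p-value for the "null" that the parameter equals c."""
    z = abs(mean - c) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(z))


def rejects(mean, sigma, n, c, level=0.95):
    """True when c falls outside [a, b]."""
    a, b = z_interval(mean, sigma, n, level)
    return c < a or c > b


if __name__ == "__main__":
    # Hypothetical observed mean 10.4 from n = 25 draws with sigma = 2.
    for c in (9.5, 10.0, 11.5):
        print(c, rejects(10.4, 2.0, 25, c), p_value(10.4, 2.0, 25, c) < 0.05)
```

The code only mechanizes the decision rule; it says nothing about what probability, if any, attaches to c or to [a, b].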

Rejects is a word more apt than you think. For, as Neyman, the man who invented confidence intervals, tells us, rejecting a “null” is a pure act of will, just as is assigning the “null” a value of c. When the “null” is rejected, there is no basis for it besides “I want”, because all we know is that the true value of the parameter is in [a, b] or it isn’t, which is a tautology and true of any interval.

Your colleague says he would reject “nulls” where c lies anywhere outside [a, b]. Well, he might. But he does so, on the official frequentist theory, with no basis. We are not entitled to say that the true value of the parameter has any probability to be c nor any probability to be in the interval [a, b]. We are not entitled to say that any finite collection of confidence intervals will be “well behaved” either. Only once we have an infinite collection are we allowed to speak, but only because we have observed everything that can ever be observed.

It is a Bayesian interpretation to say that the parameter “probably” or “likely” lies in [a, b]. It is a Bayesian interpretation that the parameter “could very well be” c. If you decide to reject the “null”, or to “fail to reject” it, with any kind of sureness or conviction or hope (the word “confidence” is lost to us here) then you have used a Bayesian interpretation.

Of course, this Bayesian interpretation is not a formal one, where the priors have been set up in the official fashion and so forth, but assigning any kind of probability, quantified or in the form of a “feeling”, to a parameter just is to be Bayesian.

If this is confusing, well, so it is. But that’s frequentism for you. A bizarre idea that you only know a probability at the end of all time.

19 Comments

  1. “If you decide to reject the “null”, or to “fail to reject” it, with any kind of sureness or conviction or hope”.

    No. We can surely be confident about a claim because the process that generated this claim is reliable. This does not have to do with degree of beliefs.

  2. Briggs

    November 3, 2014 at 9:29 am

    anon,

    That so? Define “reliable.”

  3. Having done some amateur tutoring for intro prob/stats, no one I’ve seen has really grasped what a p-value means, and they instinctively want to think Bayesian. It’s such a natural way of thinking! P-values seem almost pedagogically impossible to teach unless you start a prob/stats class with the ideas of logical probability. For students who just “have” to take the class, they don’t generally care enough to get the nuances, they just want to plug and chug, get a passing grade, and move on.

    It actually disappoints me a bit that the Bayesian results for various basic tests and regressions are the same (mathematically) as Frequentist formulations. If there were more differences, students might grasp the difference in how the two schools think of things.

  4. Briggs

    November 3, 2014 at 9:39 am

    James,

    Well, that frequentist theory sometimes gets lucky can’t really be a disappointment, because that’s the way things worked out. I often use regression as an example to seemingly unbudgeable frequentists who can’t abide priors. “Well, the Bayesian flat-prior and frequentist answers overlap. How do we explain that?”

    But let’s don’t argue about priors here. I’m no fan of non-probability “probabilities”, i.e. “improper” priors, and no fan of parameters period. See the Classic Posts page for material on this.

  5. This seems like a good place to discuss Nate Silver’s latest predictions.

  6. Briggs

    November 3, 2014 at 10:38 am

    Scotian,

    Sounds good! What are they?

  7. Regarding:

    Isn’t the null rejected when c is outside [a,b]?

  8. Briggs

    November 3, 2014 at 12:08 pm

    JPetersen,

    Yes, idiot mistake on my part. I can’t even blame my enemies. I fixed it. Thanks.

  9. For the benefit of more recent readers and for Professor Doctor Moritz Heene, his colleague, and others, I re-post an example of the fact that Matt’s statement about the meaning of the confidence interval is admitted to in all the best statistics textbooks.

    Which is to say, frequentists themselves admit that Matt’s definition is completely correct — and then spend the next 200 pages of their texts ignoring what they admitted. For example, the following is from my wife’s old Bio-Stat book, which I had pulled down off our bookshelves one day long ago to check Matt’s claim. Here is Bernard Rosner (Rosner B. Fundamentals of Biostatistics. Fourth Edition, 1995. p. 162.):

    “Therefore, we cannot say that there is a 95% chance that the parameter µ will fall within a particular 95% CI. However, we can say the following: Over the collection of all 95% confidence intervals that could be constructed from repeated random samples of size n, 95% will contain the parameter µ.”

    Remove the obfuscatory language, and this says precisely what Matt says: the meaning of any particular CI that we can calculate within history is undefined, or, more precisely, as Matt writes: “all you can say is your interval either contains the true value or it doesn’t”.

    Rosner admits this exactly: “we cannot say that there is a 95% chance that the parameter µ will fall within a particular 95% CI.” That is, just as Matt states, there is no meaning to a CI that we could actually calculate. The entire meaning of a CI is exclusively in the infinite set, or as Rosner writes: “[the set of] all 95% confidence intervals that could be constructed from repeated random samples of size n”.

    And of course, even the ‘infinity-minus-one’ set of CIs has no meaning. Rosner is obfuscating, using the idea that somehow an infinite set “could be constructed”, if we really tried.

    But we can’t calculate an infinite number of CIs within any amount of history. And “we cannot say” what meaning to give to any particular CI that anyone, anywhere, at any time, might calculate.

    There is no getting around this. There is no wiggle room. It’s not subject to interpretation. The best frequentist statistics texts say out loud that a Confidence Interval is exactly what Matt states that it is. There is NO meaning that can be given (“we cannot say”) to any actual calculated CI, by the admission of frequentist statisticians themselves.

  10. Briggs

    November 3, 2014 at 2:40 pm

    John K,

    Thanks! And it would be a fun exercise for other people to find similar quotes in whatever stats books they have.

    Look to where the CI is first introduced. Then look later when it’s actually used. The frequentist definition is almost immediately forgotten and the author turns into a closet Bayesian.

    Scotian,

    Good idea about Silver’s predictions. I won’t be able to get to it until Wednesday or after, but it turns out there’s an important point to be made.

  11. So if the CI was as thin as a line, would that be the true value of the parameter? At what point would a narrow CI be so narrow as to be the parameter?

  12. Define “reliable.”

    I would say a procedure is reliable when:
    i) when H is true, it will most surely tell me it is true. And;
    ii) when H is false, it will most surely tell me it is false.

    Or, in terms of CI.
    i) most of the time the constructed CI will contain the true value.

    If the evidence was generated by a reliable procedure, I can have confidence in that evidence.

  13. anon,

    Now define “most surely” and “most of the time” and how you know the “reliable” procedure will yield it and, for that matter, how you know when the procedure yields it so the procedure can be deemed “reliable”.

    I’m curious how knowing how good a parameter might be tells you anything beyond the quality of the parameter and, in particular, lets you make a statement about H if H is anything other than “I have a good parameter” (given the data at hand).

    Andy,

    No. Read JohnK’s comment.

  14. John K,
    So true. I have a statistics book where the author correctly explains that the confidence interval has to do with repeated experiments and then proceeds to use confidence intervals through the rest of the book. It’s bizarre. He gives the correct explanation and then uses the confidence interval incorrectly throughout the book.

  15. So Neyman rejects inductive reasoning as a source of knowledge and so rejects all conclusions based on sampling. And evades this problem by proposing an infinite number of samples? Which nobody could ever draw so they skip that step? And then they pretend they did? So that’s alright then? How did this pass peer review? More, how did it become the standard?

  16. When students have a wobbly understanding of confidence intervals in the early days of a course, and after what is now revealed as the sleight-of-hand by the prof or the text, there are incredible periods of struggle trying to reconcile the two definitions, or the initial definition with the subsequent application. A student who has a more flexible outlook will likely be able to adjust to and accept the inconsistencies, but one who is more concerned with methodical mastery of the material will continue to have trouble with statistics (as generally taught).

  17. “So a tight confidence interval is one that rejects many effects I may find interesting to know are rejected.”
    – My line of argumentation: This is like saying that the parameter c has a probability distribution, which is, of course, a Bayesian interpretation. But, as we know, c is supposed to be a fixed parameter in frequentist statistics. What we actually have in frequentist statistics is a **sampling** distribution, which is not the same as a probability distribution of the parameter c itself. The difference *seems* to be subtle but is, in fact, substantial.

  18. SteveBrooklineMA

    November 11, 2014 at 2:01 pm

    Suppose our model distribution is uniform on [mu-1,mu+1]. Suppose we sample N times. Let m be the minimum sample and M the maximum. I claim a 100% confidence interval is [M-1,m+1]. Not only that, I claim the probability is 100% that mu is in [M-1,m+1]. I claim a narrower region is more precise. I claim it works for any finite N, but (roughly) as N goes to infinity precision becomes absolute. I claim I can say more than “it’s either in there or it isn’t.” I say… it’s in there!
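The arithmetic behind the claim in comment 18 can be checked numerically. The sketch below is an illustration only, not the commenter’s code: the function names, the seed and the value mu = 5.0 are assumptions. Because every draw lies in [mu-1, mu+1], the maximum M is at most mu+1 and the minimum m is at least mu-1, so [M-1, m+1] must contain mu under that model.

```python
import random


def sure_interval(sample):
    """Interval [M-1, m+1] from the sample maximum M and minimum m."""
    return max(sample) - 1.0, min(sample) + 1.0


def check(mu, n, repeats, seed=0):
    """Return (fraction of intervals containing mu, average interval width)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    covered = 0
    total_width = 0.0
    for _ in range(repeats):
        # n draws from the assumed model: uniform on [mu-1, mu+1].
        sample = [rng.uniform(mu - 1.0, mu + 1.0) for _ in range(n)]
        lo, hi = sure_interval(sample)
        covered += lo <= mu <= hi
        total_width += hi - lo
    return covered / repeats, total_width / repeats


if __name__ == "__main__":
    # Hypothetical settings; the interval narrows as the sample size grows.
    for n in (2, 5, 50):
        print(n, check(mu=5.0, n=n, repeats=2000))
```

The guarantee here comes from the bounded support of the assumed uniform model, not from any count of repeated experiments.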


© 2016 William M. Briggs
