
Confidence Intervals, Logic, Induction

Induction

“Because all the many flames observed before have been hot is a good reason to believe this flame will be hot” is an example of an inductive argument, and a rational one.

An inductive argument is an argument from contingent (not logically necessary) premises which are, or could have been, observed, to a contingent conclusion about something that has not been, and may not be able to be, observed. An inductive argument must also have its conclusion say about the unobserved something like what the premises say about the observed.

In classical, frequentist statistics inductive arguments are forbidden—not frowned upon, but disallowed. Even some Bayesians have adopted the skeptical belief that inductive arguments are “ungrounded”, or that there is a “problem” with induction. This is not the time to enter into a discussion of why these thoughts are false: David Stove’s masterpiece “The Rationality of Induction” can be consulted for particulars.

Anyway, only academics pretend to be mystified by induction, and only in writing. They never act as if induction is irrational. For example, I’ve never met a skeptical philosopher willing to jump off a tall building. I assume inductive arguments to be rational and unproblematic.

There are deductive arguments and non-deductive arguments; not all non-deductive arguments are inductive ones, though it is a common mistake to say so (perhaps because Carnap often made this slip). Logical probability can be used for any type of argument. Frequentist probability is meant to represent a substitute for deductive arguments in non-deductive contexts, somewhat in line with Popper’s ideas on falsification. We will talk about that concept in a different post.

Confidence Intervals

In frequentist statistics, a confidence interval (CI) is a function of the data. Custom dictates a “95%” CI, though the size is irrelevant to its interpretation. Often, at least two data points must be in hand to perform the CI calculation. This, incidentally, is another limitation of frequentist theory.
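
A minimal sketch of what “a function of the data” means in practice (the data and the t critical value below are invented purely for illustration): the textbook t-based 95% interval for a mean, which indeed cannot be computed from fewer than two observations.

    # Sketch only: a hand-rolled 95% t-interval for a mean. Numbers are made up.
    import math
    from statistics import mean, stdev

    def t_interval_95(data, t_crit):
        """Return the endpoints of a 95% CI for the mean of `data`."""
        n = len(data)
        se = stdev(data) / math.sqrt(n)   # stdev() itself refuses fewer than 2 points
        xbar = mean(data)
        return (xbar - t_crit * se, xbar + t_crit * se)

    # t_crit is the 0.975 quantile of t with 4 degrees of freedom (about 2.776)
    print(t_interval_95([4.1, 3.8, 5.0, 4.4, 4.7], t_crit=2.776))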

The CI says something about the value of an unobservable parameter or parameters of a probability model. It does not say anything about observables, theories, or hypotheses. It merely presents an interval (usually contiguous) that relates to a parameter.

Its interpretation: If the “experiment” in which you collected your data were to be repeated a number of times that approached the limit of infinity, and in each of those experiments you calculated a CI, then 95% of the resulting (infinite) set of CIs will “cover” the true value of the parameter.

Please read that over until you have assimilated it.

Problems

Problem one: The “experiment” must be, but almost never is, defined rigorously. But even when it is, the idea that you could recreate an experiment that is identical to the milieu in which you collected your original data an infinite number of times is ludicrous. That milieu must be the same in each re-running—except that it must be “randomly” different. That “randomly” is allowed to remain vague and undefined.

Problem two, and the big one. Even if you can satisfy yourself that an infinite number of trials is possible, you are still confronted with the following fact. The CI you collected on your data has only one interpretation: either the true value of the parameter lies within it or it does not. Pause here. That is all you are ever allowed to say.

That statement—a tautology—is so important, so crucial, so basic to the critique of frequentist theory that few can keep it in focus, yet nothing could be simpler.

Whatever interval you construct, no matter how wide or how narrow, the only thing you are allowed to say is that the true value of the parameter lies within it or it does not. And since any interval whatsoever—the interval [1, 2], for example—meets this tautological test, the CI we have actually calculated in our problem means nothing.

Yet everybody thinks their CI means something. Further, everybody knows that as more data are collected, the calculated CIs grow narrower, which seems to indicate that our confidence about where the true value of the parameter lies grows stronger.

This is false. Strictly false, in frequentist theory.

Making any definite statement about a CI other than the above-mentioned tautology is a mistake. The most common is to say that “there is a 95% chance that the parameter lies within the CI.” That interpretation is a Bayesian one.

The other, just mentioned, is to say that narrower CIs are more certain about the value of the parameter than are wider CIs. That is an inductive argument which attempts to bypass the implications of the tautology.

Neyman

The gentleman who invented confidence intervals, Jerzy Neyman, knew about their interpretational problems and was concerned. But he was more concerned about inductive arguments, which he thought had no business in statistics.

Neyman tried to take refuge in arguments like this: “Well, you cannot say that there is a 95% chance that the actual value of the parameter is in your interval; but if statisticians everywhere were to use confidence intervals, then in the long run, 95% of their intervals will contain their actual values.”

The flaw in that workaround argument is obvious (make sure you see it). And so, with nowhere else to turn, in 1937 Neyman resorted to a dodge and said this: “The statistician…may be recommended…to state that the value of the parameter…is within [the just calculated interval]” merely by an act of will.

Since that time, statisticians have been illegally willing CIs to mean more than they do.

Induction and CIs

You will often read of “numerical simulation experiments” in which a statistician tries out his new method of estimating a parameter. He will simulate a, necessarily finite, run of CIs where the true value of the parameter is known and note the percentage of simulated CIs that cover the parameter.
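
A rough sketch of such a simulation (the normal model, sample size, number of runs, and t critical value below are arbitrary choices made only for illustration):

    # Repeatedly draw samples from a normal distribution whose mean is KNOWN,
    # compute a 95% t-interval each time, and count how often the interval
    # covers that known mean.
    import math, random
    from statistics import mean, stdev

    TRUE_MEAN, TRUE_SD = 10.0, 2.0
    N, RUNS = 20, 5000
    T_CRIT = 2.093                      # 0.975 quantile of t with 19 df

    random.seed(1)
    covered = 0
    for _ in range(RUNS):
        sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
        se = stdev(sample) / math.sqrt(N)
        xbar = mean(sample)
        covered += (xbar - T_CRIT * se <= TRUE_MEAN <= xbar + T_CRIT * se)

    print(covered / RUNS)               # close to 0.95 in a finite run, not the limit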

If the percentage is close to 95%, then the statistician will state that his procedure is good. He will convince himself that his method is giving proper results: that is—this is crucial—he will convince himself that his estimation method/theory is likely to be true.

Just think: he will use an inductive argument from his observed experimental data to infer that future CIs will be well behaved. But this is forbidden in classical statistics. You are nowhere allowed to infer the probability that a theory is true or false: nowhere.

Any such inference is the result of using induction.

Of course, classical statisticians everywhere use induction, especially when interpreting the results of studies. We just never seem to remember that the frequentist theory of probability forbids such things. Challenge two: find one study whose conclusions do not contain inductive arguments.

Logical Probability, Bayesian

Any theory of probability should be all-encompassing. It shouldn’t just work for the technical apparatus inside a probability model, and not work for events outside that limited framework. A proper theory should apply to its technical apparatus, its conclusions and the extrapolations made from them. The Bayesian and logical theories of probability, of course, are general and apply to statements of any kind.

Somehow, frequentists can use one form of probability for their models and then another for their interpretations of their models. This inconsistency is rarely noted; perhaps because it is more than an inconsistency: it is fatal to the frequentist position.

Now, if you have ever had any experience with CIs, you know that they often “work.” That is, we can interpret them as if there were a 95% chance that the true value of the parameter lies within them, etc.

This is only an artifact of the close association between Bayesian and classical theory when the Bayesian procedure opts for “non-informative” priors. The agreement is coincidental, of course: the association fails to obtain in complex situations.
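
The simplest such coincidence, sketched here under illustrative assumptions (normal data with a known standard deviation and a flat prior on the mean, neither taken from anything above): the 95% credible interval and the 95% CI come out as literally the same numbers.

    # Normal data with a KNOWN sigma (assumed) and a flat prior on the mean.
    # The posterior for the mean is then N(xbar, sigma^2/n), so the central 95%
    # credible interval coincides exactly with the frequentist 95% CI.
    import math
    from statistics import mean

    sigma = 2.0                                # assumed known
    data = [9.3, 11.1, 10.4, 8.7, 10.9, 9.8]   # made-up observations
    xbar, se = mean(data), sigma / math.sqrt(len(data))

    interval = (xbar - 1.96 * se, xbar + 1.96 * se)
    print("frequentist 95% CI:     ", interval)
    print("flat-prior 95% credible:", interval)  # identical here; not in complex models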

There are those who would reconcile frequentist and Bayesian theories. They say, “What we want are Bayesian procedures that have good frequentist properties.” In other words, they want Bayesian theory to operate at the individual problem level, but they want the compilation, or grouping, of those cases to exhibit “long-run” stability.

But this is merely Neyman’s mistake restated. If each individual problem is ideal or optimal (in whatever sense), then the group of them, considered as a group, is also ideal or optimal. Plus, you do not want to sacrifice optimality in your study for the tenuous goal of making groups of studies amenable to frequentist desires.

Recommendation

As always: we should immediately cease teaching frequentist mathematical theory to all but PhD students in probability. In particular, no undergraduates should hear of it; nor should casual users.

Next stop: p-values.

Categories: Philosophy, Statistics

11 replies

  1. Did my last message get stuck in a filter somewhere?

    Matt:
    Are there any current statisticians who explicitly champion the frequentist position?

    Are there any significant practical implications of the differences in the two positions? I thought about doctors telling patients about the risks associated with a procedure. In such cases many of the immediately obvious moderating factors can be measured. However, it seems pretty transparent that no doctor could withstand much of a cross-examination as to the basis of his or her risk assessment.
    I would bet many insurance companies base their risk assessments on a frequentist position. I initially thought that the insurance company that is advertising accident forgiveness may have adopted a Bayesian approach – but on further reflection they may have simply looked at the frequency with which policy holders have multiple accidents.

  2. Bernie,

    (You accidentally posted it on the Ernie Harwell thread.)

    No: there are only a few statisticians left who exclusively champion frequentism. Most are compromisers. They concentrate—rightly, for the most part—on getting answers to practical questions. The philosophical debates don’t interest them.

    Because of this we have tremendous inertia. All our class notes and textbooks are in the frequentist tradition. Who has time to change them?

    I’d say we are at a point at which the introduction of new methods is perfectly timed. Nobody can ever remember to use confidence intervals and p-values properly—despite the hundreds of reminders that are written perennially. Too, our software has reached the point where implementations of modern methods are trivial.

    What the majority of students (of all ages, and at most levels) want are procedures that are easy to understand and to implement. Procedures which are logically coherent and are natural and easy to interpret.

    Bayesian (logical probability) procedures are simple: once you learn how to do it in one problem, it’s the same in every other problem. Just as in frequentism, there are always models to choose from, but there are no tests to choose. Testing, as we know, leads to cheating and overconfidence.

    The strongest point, besides internal coherence, is that Bayesian procedures can be used to talk directly about observables. We need to move away from this odd reliance on unobservable parameters for our inferences. Once more, we are always too sure of ourselves when speaking of parameters.

    And why do it wrong when doing it right is so much easier? Bayes is easier to teach and to retain.

    A huge problem we have as teachers is that we can’t make up our minds whether we’re teaching a mathematical or an empirico-philosophical class. Mathematics at least has the advantage of being difficult. And it does form the guts of what we do. But that doesn’t mean we should teach it that way.

  3. OMG, this sounds serious. Do “frequentists” patrol the streets in uniforms and hit you in the face with a rifle butt if you don’t calculate your p-values in the approved way? (I am not talking about university professors and journal editors — I can show you my scars — but actual working people who use statistics).

    It might be less aggravating to argue against “1930’s statistical methods” instead of personalizing the issue. Have a beer and cool off there, Matt. It is 95% certain that there is no vast conspiracy of perverted and irrational “frequentists.” Even old dinosaurs such as myself who (without meaning any harm) are occasional users of confidence intervals and hypothesis tests would welcome clear expositions of alternatives offered by modern statistics.

    I think the prevalence of 1930’s methodology in the workplace is partly explained by ignorance of what has been developed since then. You are right — Stats 101 needs a serious overhaul, and the outdated textbooks need to be replaced. The same kind of problem exists in other fields too.

  4. JJD,

    You’re right: it’s my natural combative nature. Probably causes more trouble than it solves.

    On the other hand, I’d say we’re in an epidemic of over-certainty caused by poor practices. If it were constrained to academia, I wouldn’t care so much. But it’s reached the government and taken a nasty turn towards absolute assurance.

  5. Matt:
    I am assuming that ensemble runs of computer models with subsequent predictions and CIs are fruits of this poisoned tree?

  6. Bayesian statistics doesn’t really ever allow inference either. It tells you how to change your existing (prior) beliefs, but it’s never clear what those prior beliefs should be, at least for the purposes of publication. Maybe you have some prior belief, but how many of your readers would agree? There’s no way of differentiating a reasonable prior (noninformative perhaps?) from a prior that says “I am highly confident in my pet theory, for which I have no empirical evidence.”

    Conversely, in many actual experimental settings, the logic of frequentism makes a lot of sense. “I did an experiment and found that two things were different. I have done frequentist statistics and determined that somebody else would also find them different 95% of the time if they did my same experiment over again.” The inference comes afterwards. Since it seems that this experiment will produce this difference reliably, does this interfere with theories that wouldn’t predict such an effect?

    This might sound unprincipled, but actually Bayesian statistics only seems more principled because it builds some (though of course not all) of the unprincipled aspects of science (specifically, the prior and inferential model) into the statistics. I can see why some mathematically-inclined individuals might prefer this way of doing things, but I don’t see how it’s any better in principle.

  7. John,

    Ah, but there is the field of objective Bayesian statistics, otherwise known as logical probability, in which evidence can be used to specify the model and its parameters (PDF). This is not so with frequentist statistics, in which the model is always unspecified; always, that is, ad hoc, subjective and prone to abuse.

    There is no universal solution to uncertainty, though. No matter whether frequentist, subjective, or objective Bayesian, all inferential statements are conditional on the given model, which is of far greater importance than the priors set on parameters.

    You’ll find that Bayes can be split in two: subjective and objective. The former is more prone to insist on precise answers for all questions. Oddly, the objective branch does not and allows imprecision in its answers.

    Also, as the standard demonstration goes, the influence of the prior on the posterior parameter distribution is small to disappearing as your evidence increases. Even stronger, the priors’ influence on the posterior observable (predictive) distribution is nearly zip. And don’t neglect the correspondence between many frequentist procedures and Bayesian results assuming “flat” priors.

    And how about situations in which no data is possible; say, counterfactuals? There is no frequency of something that did not happen. But this is no bar to forming a logical probability on a counterfactual.

    Your example is false. You say, “I did an experiment and found that two things were different. I have done frequentist statistics and determined that somebody else would also find them different 95% of the time if they did my same experiment over again.”

    You may not make that kind of claim in frequentist theory. You may, of course, in Bayesian theory; what you have said is an inductive inference.

    In frequentist theory, you may make statements conditional on the model’s parameters taking certain specified values (usually zero, a value which is itself subjectively chosen). But these are odd statements to make and do not relate directly to observables.

    Bernie,

    Well, we always have the “approximate” (inductive) inference available to us. Assuming the experiments are well done, and the (subjectively chosen) models used are reasonable, then the answers spit out can be interpreted in a (loose) inductive way.

    Think of it this way: nobody, even champions of frequentism, believes that the results they produce are exact. Everybody builds into the results a bit of fuzz. But this is not allowed in frequentist theory, because doing so means engaging in yet another inductive inference. Of the type, “I’ve seen many models before and none of them are perfect; they are all off by a little; therefore this one will be too.” Again, can’t say that in frequentism.

    And once more we have the embedding problem…which we’ll skip!

  8. Admittedly, I haven’t given careful thought (lazy me) to this post, so my comment below might seem a bit unorganized.

    Yes, you are correct. Some people simply are not interested; and those who are (or were) interested in the philosophy know that Bayesian vs. frequentist is an old debate. Moreover, I would say that most statisticians know of various concepts of probability.

    [T]he influence of the prior on the posterior parameter distribution is small to disappearing as your evidence increases.

    You mean, as your sample size increases… Hmm, I thought Bayesians don’t use large-sample theory. ^_^

    You may say there is such a thing as objective Bayesian; to me, the word “objective” is misleading.

    This is not so with frequentist statistics, in which the model is always unspecified; always, that is, ad hoc, subjective and prone to abuse.

    I am not sure what you meant here. The likelihood function is the essential component of statistical modeling, irrespective of the inference method. Once you have modeled a theoretically and practically appropriate likelihood function, you may choose a Bayesian, frequentist, likelihood-ist, nonparametric, or whatever inference method. So why would the model be always unspecified in frequentist statistics?

    Many statisticians have chosen to use more than one method and then carefully compare the results. And there are happy coincidences where Bayesian and frequentist CIs look the same.

    As a whole, the basic idea of Bayesian statistics is simple, but it can get complicated computationally. We know that in practice people blindly run whatever models without checking model assumptions. Off the top of my head, here is one worrisome problem about Bayesian analysis: it seems to me that people would conveniently employ conjugate priors. And out of numerous (infinite?) possible distributions, normal, gamma, beta, etc. just turn out to be the right choice??!! So, I am not sure whether frequentist or Bayesian statistics is more prone to abuse.

    Well, I don’t wish to pick on Bayesian or frequentist; I see them as different statistical tools. As far as I know, every methodology has its faults and merits.

    And this comment is too long!

  9. Why does everyone forget Fisher in these Frequentist/Bayesian debates? The greatest scientific statistician of the 20th Century was neither a Frequentist nor a Bayesian.

    Bradley Efron neatly delineates them in this lecture, summarized by a triangle, with one apex each for Frequentist, Bayesian, and Fisherian.

    http://www.uweb.ucsb.edu/~utungd00/fisher.pdf

  10. Mike B,

    Excellent point. Fisher’s views on induction were somewhat inconstant. Unlike the school that followed him, he allowed the usefulness of induction, but only up to a point. He accepted its use in developing mathematical theorems, for example (not for their proofs, of course). But he did not allow it to be formally quantified probabilistically. At this point, possibly recognizing the paradox and attempting to reconcile it, he developed his idea of fiducial probability. He wanted to avoid explicit use of evidence—that is to say, induction—and especially disliked the idea of “prior” distributions (I’ve never loved that name; it is misleading.)

    Of course, fiducial inference was never a success. It gave rise to many counterexamples, and did not provide actual probability measures. Fisher himself, in a departure from his usual self-confidence, never loved his own theory. He let it lie fallow.

    Fisher, too, disliked confidence intervals. Many of the standard complaints against them originated with him.

    And let’s not forget that Fisher’s p-values were built in such a way as to approximate Popper’s falsification standard for probabilistic arguments, for which no falsification is possible. Fisher acknowledged this, but Popper never did.

    Unfortunately, we all know the problems of p-values. I plan on discussing these in another article. I’ll point out a little-known argument against their use.

  11. Matt:
    Can I suggest that you use concrete examples to illustrate the differences between the positions? After a while, and without spending significant time expanding my technical vocabulary, the practical significance of the discussion gets obscured. After reading about the work of Gödel, Heisenberg, Arrow and Briggs ( ;>) ), I am convinced that too many people are too certain of too many things … what’s next?
