## Confidence Intervals, Logic, Induction

**Induction**

“Because all the many flames observed before have been hot is a good reason to believe *this* flame will be hot” is an example of an inductive argument, and a rational one.

An inductive argument is an argument from contingent (not logically necessary) premises which are, or could have been, observed, to a contingent conclusion about something that has not been, and may not be able to be, observed. An inductive argument must also have its conclusion say about the unobserved something like what the premises says about the observed.

In classical, frequentist statistics inductive arguments are forbidden—not *frowned upon*, but *disallowed*. Even some Bayesians have adopted the skeptical belief that inductive arguments are “ungrounded”, or that there is a “problem” with induction. This is not the time to enter into a discussion of why these thoughts are false: David Stove’s masterpiece “The Rationality of Induction” can be consulted for particulars.

Anyway, only academics pretend to be mystified by induction, and only in writing. They never *act* as if induction is irrational. For example, I’ve never met a skeptical philosopher willing to jump off a tall building. I assume inductive arguments to be rational and unproblematic.

There are deductive arguments and non-deductive arguments; not all non-deductive arguments are inductive ones, though it is a common mistake to say so (perhaps because Carnap often made this slip). Logical probability can be used for any type of argument. Frequentist probability is meant to represent a substitute for deductive arguments in non-deductive contexts, somewhat in line with Popper’s ideas on falsification. We will talk about that concept in a different post.

**Confidence Intervals**

In frequentist statistics, a confidence interval (CI) is function of the data. Custom dictates a “95%” CI; though the size is irrelevant to its interpretation. Often, at least two data points must be in hand to perform the CI calculation. This, incidentally, is another limitation of frequentist theory.

The CI says something about the value of an unobservable parameter or parameters of a probability model. It does *not* say anything about observables, theories, or hypotheses. It merely presents an interval (usually contiguous) that relates to a parameter.

Its interpretation: If the “experiment” in which you collected your data were to be repeated a number of times that approached the limit of infinity, and in each of those experiments you calculated a CI, then 95% of the resulting (infinite) set of CIs will “cover” the true value of the parameter.

Please read that over until you have assimilated it.

**Problems**

Problem one: The “experiment” must be, but almost never is, defined rigorously. But even when it is, the idea that you could recreate an experiment that is identical to the milieu in which you collected your original data an infinite number of times is ludicrous. That milieu must be the same in each re-running—except that it must be “randomly” different. That “randomly” is allowed to remain vague and undefined.

Problem two, and the big one. Even if you can satisfy yourself that an infinite number of trials is possible, you are still confronted with the following fact. The CI you collected on your data has only one interpretation: *either the true value of the parameter lies within it or it does not*. Pause here. That is *all* you are ever allowed to say.

The italicized statement—a tautology—is so important, so crucial, so basic to the critique of frequentist theory that few can keep it in focus, yet nothing can be as simple.

Whatever interval you construct, no matter how wide or how small, the *only thing* you are allowed to say is that the true value of the parameter lies within it or it does not. And since any interval whatsoever also meets this tautological test—the interval [1, 2] for example—then the CI we have actually calculated in our problem *means nothing*.

Yet everybody thinks their CI means something. Further, everybody knows that as more data collected, the calculated CIs grow narrower, which seems to indicate that our confidence about where the true value of the parameter lies grows stronger.

This is false. *Strictly false*, in frequentist theory.

Making any definite statement about a CI other than the above-mentioned tautology is a mistake. The most common is to say that “there is a 95% chance that the parameter lies within the CI.” That interpretation is a Bayesian one.

The other, just mentioned, is to say that narrower CIs are more certain about the value of the parameter than are wider CIs. That is an inductive argument which attempts to bypass the implications of the tautology.

**Neyman**

The gentleman that invented confidence intervals, Dzerzij (Jerzy) Neyman knew about the interpretational problems of confidence intervals, and was concerned. But he was more concerned about inductive arguments, which he thought had no business in statistics.

Neyman tried to take refuge in arguments like this: “Well, you cannot say that there is a 95% chance that the actual value of the parameter is in *your* interval; but if statisticians everywhere were to use confidence intervals, then *in the long run*, 95% of *their* intervals will contain their actual values.”

The flaw in that workaround argument is obvious (make sure you see it). And so, with nowhere else to turn, in 1937 Neyman resorted to a dodge and said this: “ The statistician…may be recommended…to state that the value of the parameter…is within [the just calculated interval]” merely by an act of will.

Since that time, statisticians having been illegally willing CIs to mean more than they do.

**Induction and CIs**

You will often read of “numerical simulation experiments” in which a statistician tries out his new method of estimating a parameter. He will simulate a, necessarily finite, run of CIs where the true value of the parameter is known and note the percentage of simulated CIs that cover the parameter.

If the percentage is close to 95%, then the statistician will state that his procedure is good. He will convince himself that his method is giving proper results: that is—this is crucial—he will convince himself that his *estimation method/theory is likely to be true*.

Just think: he will use an inductive argument from his observed experimental data to infer that future CIs will be well behaved. But this is forbidden in classical statistics. You are nowhere allowed to infer the probability that a theory is true or false: nowhere.

Any such inference is the result of using induction.

Of course, classical statisticians everywhere use induction, especially when interpreting the results of studies. We just never seem to remember that the frequentist theory of probability forbids such things. Challenge two: find one study whose conclusions do not contain inductive arguments.

**Logical Probability, Bayesian**

Any theory of probability should be all-encompassing. It shouldn’t just work for the technical apparatus inside a probability model, and not work for events outside that limited framework. A proper theory should apply to its technical apparatus, its conclusions and the extrapolations made from them. The Bayesian and logical theories of probability, of course, are general and apply to statements of any kind.

Somehow, frequentists can use one form of probability for their models and then another for their interpretations of their models. This inconsistency is rarely noted; perhaps because it is more than an inconsistency: it is a fatal to the frequentist position.

Now, if you have ever had any experience with CIs, you know that they often “work.” That is, we can interpret them *as if* there were a 95% chance that the true value of the parameter lies withing them, etc.

This is only an artifact caused by the close association of Bayesian and classical theory, where the Bayesian procedure opts for “non-informative” priors. This is coincidental, of course, because the association fails to obtain in complex situations.

There are those who would reconcile frequentist and Bayesian theories. They say, “What we want are Bayesian procedures that have good frequentist properties.” In other words, they want Bayesian theory to operate at the individual problem level, but they want the compilation, or grouping, of those cases to exhibit “long-run” stability.

But this is merely Neyman’s mistake restated. If each individual problem is ideal or optimal (in whatever sense), then the group of them, considered as a group, is also ideal or optimal. Plus, you do not want to sacrifice optimality for your study for the tenuous goal of making groups of studies amicable to frequentist desire.

**Recommendation**

As always: we should cease immediately teaching frequentist mathematical theory to all but PhD students in probability. In particular, no undergraduates should hear of it; nor should casual users.

Next stop: p-values.