
Category: Philosophy

The philosophy of science, empiricism, a priori reasoning, epistemology, and so on.

May 5, 2010 | 11 Comments

Confidence Intervals, Logic, Induction


“That all the many flames observed before have been hot is a good reason to believe this flame will be hot” is an example of an inductive argument, and a rational one.

An inductive argument is an argument from contingent (not logically necessary) premises which are, or could have been, observed, to a contingent conclusion about something that has not been, and may not be able to be, observed. An inductive argument must also have its conclusion say about the unobserved something like what the premises say about the observed.

In classical, frequentist statistics, inductive arguments are forbidden: not frowned upon, but disallowed. Even some Bayesians have adopted the skeptical belief that inductive arguments are “ungrounded”, or that there is a “problem” with induction. This is not the time to enter into a discussion of why these thoughts are false: David Stove’s masterpiece “The Rationality of Induction” can be consulted for particulars.

Anyway, only academics pretend to be mystified by induction, and only in writing. They never act as if induction is irrational. For example, I’ve never met a skeptical philosopher willing to jump off a tall building. I assume inductive arguments to be rational and unproblematic.

There are deductive arguments and non-deductive arguments; not all non-deductive arguments are inductive ones, though it is a common mistake to say so (perhaps because Carnap often made this slip). Logical probability can be used for any type of argument. Frequentist probability is meant to serve as a substitute for deductive arguments in non-deductive contexts, somewhat in line with Popper’s ideas on falsification. We will talk about that concept in a different post.

Confidence Intervals

In frequentist statistics, a confidence interval (CI) is a function of the data. Custom dictates a “95%” CI, though the level chosen is irrelevant to its interpretation. Often, at least two data points must be in hand to perform the CI calculation. This, incidentally, is another limitation of frequentist theory.

The CI says something about the value of an unobservable parameter or parameters of a probability model. It does not say anything about observables, theories, or hypotheses. It merely presents an interval (usually contiguous) that relates to a parameter.

Its interpretation: If the “experiment” in which you collected your data were to be repeated a number of times that approached the limit of infinity, and in each of those experiments you calculated a CI, then 95% of the resulting (infinite) set of CIs will “cover” the true value of the parameter.

Please read that over until you have assimilated it.
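For concreteness, here is the textbook case written out; it is an illustration added here, not part of the original argument. Assume n independent observations X_1, …, X_n from a normal distribution with unknown mean μ and known standard deviation σ, with sample mean X̄. The usual 95% interval, and the coverage property that defines it, are:

```latex
\[
  \mathrm{CI}(X_1,\dots,X_n) = \left[\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\;
                                     \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right],
  \qquad
  \Pr_{\mu}\!\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \approx 0.95
  \quad \text{for every } \mu .
\]
```

The probability is taken over the random endpoints, with μ held fixed. Once a particular sample mean x̄ is substituted, no random quantity remains in the statement.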


Problem one: The “experiment” must be, but almost never is, defined rigorously. But even when it is, the idea that you could recreate an experiment that is identical to the milieu in which you collected your original data an infinite number of times is ludicrous. That milieu must be the same in each re-running—except that it must be “randomly” different. That “randomly” is allowed to remain vague and undefined.

Problem two, and the big one. Even if you can satisfy yourself that an infinite number of trials is possible, you are still confronted with the following fact. The CI you calculated from your data has only one interpretation: either the true value of the parameter lies within it or it does not. Pause here. That is all you are ever allowed to say.

The statement “either the true value of the parameter lies within it or it does not” is a tautology, and it is so important, so crucial, so basic to the critique of frequentist theory that few can keep it in focus, yet nothing can be as simple.

Whatever interval you construct, no matter how wide or how small, the only thing you are allowed to say is that the true value of the parameter lies within it or it does not. And since any interval whatsoever also meets this tautological test—the interval [1, 2] for example—then the CI we have actually calculated in our problem means nothing.

Yet everybody thinks their CI means something. Further, everybody knows that as more data are collected, the calculated CIs grow narrower, which seems to indicate that our confidence about where the true value of the parameter lies grows stronger.

This is false. Strictly false, in frequentist theory.

Making any definite statement about a CI other than the above-mentioned tautology is a mistake. The most common is to say that “there is a 95% chance that the parameter lies within the CI.” That interpretation is a Bayesian one.

The other, just mentioned, is to say that narrower CIs are more certain about the value of the parameter than are wider CIs. That is an inductive argument which attempts to bypass the implications of the tautology.
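The narrowing itself is plain arithmetic. A minimal sketch, assuming the textbook normal model with known σ, where the 95% interval is x̄ ± 1.96 σ/√n; the function name ci_width is hypothetical. Quadrupling n halves the width:

```python
import math

def ci_width(sigma: float, n: int, z: float = 1.96) -> float:
    """Width of the textbook interval xbar +/- z * sigma / sqrt(n), known sigma assumed."""
    return 2 * z * sigma / math.sqrt(n)

# Widths for increasing sample sizes; each fourfold increase in n halves the width.
for n in (25, 100, 400):
    print(n, round(ci_width(sigma=1.0, n=n), 3))
```

This prints widths of 0.784, 0.392 and 0.196. Nothing in the arithmetic says the narrower interval is more likely to contain μ; that reading is the inductive step described above.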


The gentleman who invented confidence intervals, Dzerzij (Jerzy) Neyman, knew about the interpretational problems of confidence intervals and was concerned. But he was more concerned about inductive arguments, which he thought had no business in statistics.

Neyman tried to take refuge in arguments like this: “Well, you cannot say that there is a 95% chance that the actual value of the parameter is in your interval; but if statisticians everywhere were to use confidence intervals, then in the long run, 95% of their intervals will contain their actual values.”

The flaw in that workaround argument is obvious (make sure you see it). And so, with nowhere else to turn, in 1937 Neyman resorted to a dodge and said this: “The statistician…may be recommended…to state that the value of the parameter…is within [the just calculated interval]” merely by an act of will.

Since that time, statisticians have been illegally willing CIs to mean more than they do.

Induction and CIs

You will often read of “numerical simulation experiments” in which a statistician tries out his new method of estimating a parameter. He will simulate a necessarily finite run of CIs in which the true value of the parameter is known, and note the percentage of simulated CIs that cover the parameter.

If the percentage is close to 95%, then the statistician will state that his procedure is good. He will convince himself that his method is giving proper results: that is—this is crucial—he will convince himself that his estimation method/theory is likely to be true.
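A minimal sketch of such a simulation, assuming the textbook normal model with known σ and the interval x̄ ± 1.96 σ/√n. The true mean, σ, sample size, number of runs and seed are arbitrary illustrative choices, and simulate_coverage is a hypothetical name. It reports the fraction of a finite run of intervals that cover the known true mean:

```python
import math
import random

def simulate_coverage(mu: float, sigma: float, n: int, runs: int, seed: int = 0) -> float:
    """Return the fraction of `runs` simulated 95% z-intervals that cover the true mean mu."""
    rng = random.Random(seed)            # seeded so the finite run is reproducible
    half_width = 1.96 * sigma / math.sqrt(n)
    covered = 0
    for _ in range(runs):
        # one simulated "experiment": n draws from the assumed normal model
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / runs

rate = simulate_coverage(mu=5.0, sigma=2.0, n=30, runs=10_000)
print(f"observed coverage: {rate:.3f}")
```

The observed fraction will typically land near 0.95 without hitting it exactly. Concluding from this finite run that the method will behave well in future runs is the inductive step the next paragraph points to.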

Just think: he will use an inductive argument from his observed experimental data to infer that future CIs will be well behaved. But this is forbidden in classical statistics. You are nowhere allowed to infer the probability that a theory is true or false: nowhere.

Any such inference is the result of using induction.

Of course, classical statisticians everywhere use induction, especially when interpreting the results of studies. We just never seem to remember that the frequentist theory of probability forbids such things. Challenge two: find one study whose conclusions do not contain inductive arguments.

Logical Probability, Bayesian

Any theory of probability should be all-encompassing. It shouldn’t just work for the technical apparatus inside a probability model, and not work for events outside that limited framework. A proper theory should apply to its technical apparatus, its conclusions and the extrapolations made from them. The Bayesian and logical theories of probability, of course, are general and apply to statements of any kind.

Somehow, frequentists can use one form of probability for their models and then another for their interpretations of their models. This inconsistency is rarely noted, perhaps because it is more than an inconsistency: it is fatal to the frequentist position.

Now, if you have ever had any experience with CIs, you know that they often “work.” That is, we can interpret them as if there were a 95% chance that the true value of the parameter lies within them, etc.

This is only an artifact caused by the close association of Bayesian and classical theory, where the Bayesian procedure opts for “non-informative” priors. This is coincidental, of course, because the association fails to obtain in complex situations.
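One case where the association is exact, offered as a hedged illustration: assume a normal model with known standard deviation σ, n observations with sample mean x̄, and a flat (improper, “non-informative”) prior on the mean μ. The posterior and the resulting 95% credible interval are then:

```latex
\[
  \mu \mid x_1,\dots,x_n \sim \mathcal{N}\!\left(\bar{x},\, \frac{\sigma^2}{n}\right)
  \quad\Longrightarrow\quad
  \Pr\!\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}} \,\middle|\, x_1,\dots,x_n\right) \approx 0.95 .
\]
```

This is numerically the same interval as the frequentist x̄ ± 1.96 σ/√n, now carrying the Bayesian reading. With nuisance parameters, bounded parameters, or discrete data, the two intervals generally need not coincide.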

There are those who would reconcile frequentist and Bayesian theories. They say, “What we want are Bayesian procedures that have good frequentist properties.” In other words, they want Bayesian theory to operate at the individual problem level, but they want the compilation, or grouping, of those cases to exhibit “long-run” stability.

But this is merely Neyman’s mistake restated. If each individual problem is ideal or optimal (in whatever sense), then the group of them, considered as a group, is also ideal or optimal. Plus, you do not want to sacrifice optimality in your study for the tenuous goal of making groups of studies amenable to frequentist desire.


As always: we should immediately cease teaching frequentist mathematical theory to all but PhD students in probability. In particular, no undergraduates should hear of it; nor should casual users.

Next stop: p-values.

May 2, 2010 | 13 Comments

The Truth of Things: When Probability & Statistics Cannot Be Used

I am always struggling to find ways, within my limited ability, to describe the philosophy behind logical probability, especially to people who have a difficult time unlearning classical frequentist theory. This post is more a test of a sketch of an explanation than a complete explication of that theory. I am writing to those who already know statistics.

If a theory—or hypothesis, argument, whatever—cannot be deduced it remains uncertain. Formally or informally, probability is used to quantify this uncertainty.

Consider your trial for murder. Your guilt must be established in the collective mind of the jury “beyond reasonable doubt.” That phrase acknowledges that the certainty of your guilt or innocence is unattainable—in their minds—but its probability can still be established.

Incidentally, whether the phrase “beyond reasonable doubt” is a historical accident or the result of careful logical reasoning is irrelevant. Its common meaning is enough for us.

Through an obviously informal process, each jury member begins the trial with an idea of your guilt, which is modified as each new piece of evidence arises, and through discussions with other jury members. There are mathematical ways to model this process, but at best these models are crude idealizations.

Bayes’s formula can illustrate how jurors update their evidential probabilities, but since nobody—even the jurors—knows how to verbalize how each piece of evidence relates to the probability of guilt, these models aren’t much help.
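To show the mechanics only, here is Bayes’s formula in odds form: posterior odds equal the likelihood ratio times the prior odds, applied once per piece of evidence. This minimal sketch assumes the pieces of evidence are conditionally independent given guilt or innocence, and the likelihood ratios are hypothetical numbers; supplying real ones is exactly what the paragraph says nobody, jurors included, knows how to do. The function name update_odds is illustrative:

```python
def update_odds(prior_prob: float, likelihood_ratios: list[float]) -> float:
    """Apply Bayes's formula in odds form for each piece of evidence; return Pr(guilt).

    Each ratio is Pr(evidence | guilty) / Pr(evidence | innocent), assumed known
    and assumed conditionally independent of the other pieces of evidence.
    """
    odds = prior_prob / (1 - prior_prob)   # convert probability to odds
    for lr in likelihood_ratios:
        odds *= lr                         # posterior odds = likelihood ratio * prior odds
    return odds / (1 + odds)               # convert odds back to probability

# Hypothetical juror: starts at 0.5, then three pieces of evidence.
print(round(update_odds(0.5, [3.0, 0.5, 4.0]), 3))  # 0.857
```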

A probability model can be written in the form of a statement: “Given this background evidence, my uncertainty in this observable is quantified by this equation.”

The background evidence is crucial: it is always there, usually (unfortunately!) tacitly. All statements made, no matter how far downstream in analysis, are conditional on this evidence.

You possess different background evidence than does the jury. Your evidence allows you to state “I did it” or “I did not do it.” In other words, this special case allows you—and only you—to say, “Given my experience, the probability that I did the deed is zero”; or one, as the case might be.

Deduction, then, is a trivial form of probability model.

The observable in this case is the crime, in statement form: “You committed the murder.” The truth of that observable is not knowable with certainty to the jurors given any probability model.

The probability model itself is where the confusion comes in. It cannot exist for this, or for any, unique situation.

A classical model a statistician might incorrectly use to analyze the jury’s collective mind is “Their uncertainty in your guilt is quantified by a Bernoulli distribution.” This model has a simple mathematical form, which is this: θ, where 0 < θ < 1. Notice that those bounds are strict. People mistakenly call the parameter θ of the Bernoulli “the probability” (of your guilt). It is not, unless the parameter θ has been deduced and is equal to a precise number (not a range). If we do not know the value of θ, then it is just a parameter. In itself, it means nothing.

The probability (of your guilt) can be found by accounting for the uncertainty in the parameter. This is accomplished by integrating out the uncertainty in θ—essentially, the probability (of guilt) is a weighted average of possible values of θ given the evidence and background information.
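In symbols, writing E for the evidence and background information, the weighted average is Pr(guilt | E) = ∫₀¹ θ p(θ | E) dθ. A minimal numerical sketch, assuming (hypothetically) that p(θ | E) is a Beta(a, b) density with a, b ≥ 1; the function names are illustrative. For a Beta density the integral has the closed form a / (a + b), which gives a check on the numerics:

```python
import math

def beta_pdf(theta: float, a: float, b: float) -> float:
    """Beta(a, b) density at theta, normalised with the gamma function."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)

def prob_from_parameter(a: float, b: float, steps: int = 100_000) -> float:
    """Integrate theta * p(theta) over (0, 1) with the midpoint rule, i.e. average out theta."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h              # midpoint of the i-th subinterval
        total += theta * beta_pdf(theta, a, b) * h
    return total

print(prob_from_parameter(3, 2))  # close to the closed form 3 / (3 + 2) = 0.6
```

This shows only the calculation the paragraph describes; it does not touch the point that follows, that a unique trial needs no θ at all.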

But just think: we do not need θ. We have a unique situation (your trial!), and the jurors, not you, ponder your probability of guilt; they certainly do not invoke an unobservable parameter. The trial evidence modifies actual probabilities, not parameters.

Now, if we wanted to analyze specific kinds of trials—note the plural: that “s” changes all—then and only then, and as long as we can be exact in the kinds of trials we mean, we can model trial outcomes.

This model is useless for outcomes of trials we have already observed. And why? Because the evidence we have is the outcomes of those trials—whose outcomes we know! Silly to point out, right? But surprisingly, this easy fact, and its immediate consequences, is often forgotten.

Another way to state this: We only need to model uncertainty for events which are uncertain. We can model your trial, but only assuming it is part of a set of (finite!) trials, the nature of which we have defined. The nature of the set-model tells us little, though, about your trial with your unique evidence.

The key is that there does not exist in the universe a unique definition for kinds of trials. We have to specify the definition in all its particulars. This, of course, becomes that background information we started with. Under which definition—exactly!—does your trial lie?

It is the uncertainty of ascriptions of guilt in those future trials that is of interest, and not of unobservable parameters.

Oh, remind me to tell you how mathematical notation commonly used in probability & statistics interferes with clear thinking.

April 30, 2010 | 15 Comments

Is Experimental Economics Irrational?

Everybody knows there’s no such thing as money. So how come everybody acts like it’s real?

In particular, why do economists and other similar creatures find the lack of “rationality” curious when reviewing the transactions, and game-theory simulations of transactions, between real people?

There do exist bits of shiny metal, certain organic byproducts, and slips of paper that are called “money.” But these objects have no intrinsic value. Money is a concept, not a thing. It is a proxy for agreements between people, a mechanism to ease the trading of things that do have value.

Like I said, everybody knows this. So why has it been so difficult—why did it take so long—to see the logical consequences of this truth? Why, that is, is there such consternation over the lack of “rationality” when it comes to theories of money?

Take the example of buying a six-pack of beer from a bodega in New York City. Not some homeopathic brew like Coors Light. Real beer, like Brooklyn Brewery’s IPA.

The Korean lady in charge can announce that the beer is “Regular price $10; Today 10% off” or she can say “Regular price $8, plus New York City health tax surcharge of $1.” (This example is not fictional: NYC is always ratcheting up its sin taxes.)

Which would you prefer? According to classic economic and game theories, you’re not supposed to have a preference. Any deviation from indifference is considered “irrational” because, either way, you’re out nine bucks. Either way gets you the six-pack.

But nobody buys just a six-pack. You don’t “buy” anything, actually. You make an agreement between you and the shopkeeper; or, more realistically, between you and your crew of family and friends against the shopkeeper and her organization.

Your contract negotiations are short; much is agreed upon before you walk into the store. When our lady offers the beer for $1 less than usual, there is at least the appearance that she is giving a gift. What I receive is the beer, plus the good feeling that I am being treated nicely.

As all marketers know, I am negotiating for both the beer and the experience of buying it. Management with experience in negotiating with unions knows that experience counts. What often becomes a sticking point in these, obviously more formal, negotiations is not money but respect.

A union will sometimes accept fewer dollars in return for more autonomy, or better toilets, or anything that awards its members more esteem from the suits. This makes sense—it is rational—because the non-economist union members know that money isn’t everything.

Lack of respect—between me and my rapacious government—is why I would be in a bad mood after shelling out eight bucks for beer plus yet another one for tax.

That extra dollar might be so irksome that I am willing to take a car to Jersey—New Jersey!—to give my $8 to a different family-owned organization. I’d pay more money for this, but I’d receive the additional experience of being able to shop for beer and wine simultaneously; an experience New York State forbids. (Oh, yes.)

Over the past decade, the field of “experimental economics” has grown fat. It’s the same old economics, but married to the more practical mathematics of game theory, with the addition of college students corralled into prisoner’s dilemmas.

It is from these fields that economists are finally accepting that money isn’t real. Only they don’t put it that way. They say, in wonderment, that “man is not rational”—by which they mean that we don’t function as if money were real.

They carry out various simulations and discover, like our example above, that the “optimal” solution is often neglected for an “irrational” one. But “optimal” means quantitatively optimal given that money is real.

We’ll have to talk more about this, but these experiments suffer from an irremovable fault. Since money is not real, and since our economic transactions are really just negotiations and agreements, then the experimenters can never remove themselves from the experiment. They are just as much a part of it as the volunteers are.

Those volunteers will, of course, react differently to different experimenters. The hope is that the differences in oddities, irascibilities, quirks, and other weirdnesses of these volunteer-experimenter interactions will even out somehow. This is a matter of faith, and misplaced faith at that.

Even when they recognize this, it will be difficult to shake economists loose from money. It’s so quantitative! It can be p-valued, plotted and pie-charted, set into percentages. Mostly, it can be modeled with soothing mathematics.

But how do you quantify my willingness to drive to Jersey to avoid a tax, or a student’s distraction due to the “teacher pants” the experimental economist wears to the game?

You cannot. So—once more: everybody all together now—in the end, the conclusions will be too certain, too sure.


The idea of this post came from Karl Sigmund’s interesting review of Herbert Gintis’s The Bounds of Reason: Game Theory and the Unification of the Behavioral Sciences. Linked on—where else?—Arts & Letters Daily.

April 16, 2010 | 37 Comments

Randomness is a Matter of Information

How many pads of paper do I have on my desk right now? How many books are on my shelves this minute?

You don’t know the answer to any of these questions, just as you don’t know if the Tigers will beat the Marlins tonight, whether IBM’s stock price will be higher at the close of today’s bell, and whether the spin of an outermost electron in my pet ytterbium atom is up or down.

You might be able to guess—predict—correctly any or all of these things, but you do not know them. There is some information available that allows you to quantify your uncertainty.

For example, you know that I can have no pads, or one, or two, or some other discrete number, certainly well below infinity. The number is more likely to be small rather than large. And since we have never heard of a universal physical law of “Pads on Desks”, this is about as good as you can do.

In a severe abuse of physics language, we can say that, to you, there are exactly no pads, exactly one pad, exactly two pads, and so forth, where each possibility exists in a superposition until…what? Right: until you look.

I know how many pads of paper I have because, according to my sophisticated measurement, there are three. And now, according to your new information, the probability that there are three has collapsed to one. Given this observation—and accepting the observation is without error and given the belief in our mental stability—the event is no longer random, but known.
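A minimal sketch of the bookkeeping. The original says only that small counts are more likely than large ones and that the count is well below infinity, so the geometric decay rate r = 0.6 and the cap of 50 pads are arbitrary, hypothetical assumptions, as are the function names. An error-free observation moves all the probability onto the observed count:

```python
def pad_prior(max_pads: int = 50, r: float = 0.6) -> list[float]:
    """Hypothetical prior over 0..max_pads pads: weights decay geometrically."""
    weights = [r ** k for k in range(max_pads + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def condition_on_observation(prior: list[float], observed: int) -> list[float]:
    """Condition on an error-free count: zero out every other count, then renormalise."""
    post = [p if k == observed else 0.0 for k, p in enumerate(prior)]
    total = sum(post)
    return [p / total for p in post]

prior = pad_prior()
post = condition_on_observation(prior, observed=3)
print(round(prior[3], 3), post[3])  # uncertain before the look, exactly 1.0 after
```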

The point of this tedious introduction is to prove to you that “randomness” merely means “unknown.” Probability, and its brother randomness, are measures of information. What can be random to you can be known to me.

An event could be random to both of us, but that does not mean that we have identical information that would lead us to quantify our probabilities identically. For example, the exact number of books I have on my shelf is unknown to me and you: the event is random to both of us. But I have different information because I can see the number of shelves and can gauge their crowdedness, whereas you cannot.

A marble dropping in a roulette wheel is random to most of us. But not to all. Given the initial conditions—speed of the wheel, spin and force on the marble, where the marble was released, the equations of motion, and so forth—where the marble rests can be predicted exactly. In other words, random to thee but not to me.
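The claim can be made concrete with a deliberately crude toy model. It is hypothetical and is not anyone’s real roulette physics: the ball starts at angle 0 on a stationary wheel and slows at a constant angular deceleration, so it sweeps ω₀²/(2α) radians before stopping. Given the same initial conditions, it returns the same pocket every time:

```python
import math

POCKETS = 37  # assumption: a single-zero wheel with equal-width pockets

def predict_pocket(omega0: float, decel: float) -> int:
    """Toy model: the ball starts at angle 0 on a stationary wheel with angular speed
    omega0 (rad/s) and slows at a constant angular deceleration decel (rad/s^2).
    Returns the index (0..POCKETS-1) of the pocket under the stopping angle.
    Bounces, the spinning wheel and air resistance are all ignored."""
    travel = omega0 ** 2 / (2 * decel)      # total angle swept before the ball stops
    final_angle = travel % (2 * math.pi)    # position on the wheel, in [0, 2*pi)
    pocket = int(final_angle // (2 * math.pi / POCKETS))
    return min(pocket, POCKETS - 1)         # guard against a floating-point edge at 2*pi

print(predict_pocket(10.0, 1.0))
print(predict_pocket(10.0, 1.0))
```

With ω₀ = 10 rad/s and α = 1 rad/s², the ball sweeps 50 radians and the function returns pocket 35 on every call. To an observer who does not know ω₀ and α precisely, the outcome looks random; the randomness lives in the missing information, not in the wheel.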

I am happy to say that Antonio Acin, of the Institute of Photonic Sciences in Barcelona, agrees with me. On NPR, he said, “If you are able to compute the initial position and the speed of the ball, and you have a perfect model for the roulette, then you can predict where the ball is going to finish — with certainty.” (My aunt Kayla sent me this story.)

The story continues: “[Acin] says everything that appears random in our world may just appear random because of lack of knowledge.” Amen, brother Antonio.

A Geiger counter measures ionizing radiation, such as that emitted by the decaying atoms in a lump of uranium. That decay is said to be “random”, because we do not have precise information on the state of the lump: we don’t know where each atom is, where the protons and so forth are placed, etc. Thus, we cannot predict the exact times of the clicks on the counter.

But there’s a problem. “You can’t be certain that the box the counter is in doesn’t have a mechanical flaw…” In other words, information might exist that allows the clicks to be semi-predictable, in just the same way as the number of books on my shelves is to me but not to you.

So Acin and a colleague cobbled together ytterbium atoms to produce “true” randomness, by which they mean the results of an electron being “up” or “down” cannot be predicted skillfully using any information.

In their experiment, the information on the ytterbium atoms’ quantum (which means discrete!) state is not humanly accessible, so we can never do better than always guessing “up”.¹

It is misleading to say that they are “generating” randomness: you cannot generate “unknownness.” Instead, they have found a way to block information. Information is what separates the predictable from the unpredictable.

The difference is crucial: failing to appreciate it accounts for much of the nonsense written about randomness and discrete mechanics.


¹Brain teaser for advanced readers. Acin’s experiment generates an “up” or “down”, each occurring half the time unpredictably. Why is guessing “up” every time better than switching guesses between “up” and “down”?

Update: This is what happens when you write these things at 5 in the morning. The teaser is misspecified. It should read:

Acin’s experiment generates an “up” or “down”, each occurring as they may. When is guessing “up” (or “down” as the case might be) every time better than switching guesses between “up” and “down”?

You will see that I idiotically gave away the answer in my original, badly worded version.