## Why probability isn’t relative frequency: redux

(Pretend, if you have, that you haven’t read my first weak attempt. I’m still working on this, but this gives you the rough idea, and I didn’t want to leave a loose end. I’m hoping the damn book is done in a week. There might be some Latex markup I forgot to remove. I should note that I am more than half writing this for other (classical) professor types who will understand where to go and what some implied arguments mean. I never spend much time on this topic in class; students are ready to believe anything I tell them anyway. )

For frequentists, probability is defined to be the frequency with which an event happens in the limit of “experiments” where that event can happen; that is, given that you run a number of “experiments” that approach infinity, then the ratio of those experiments in which the event happens to the total number of experiments is *defined* to be the probability that the event will happen. This obviously cannot tell you what the probability is for your well-defined, possibly unique, event happening now, but can only give you probabilities in the limit, after an infinite amount of time has elapsed for all those experiments to take place. Frequentists obviously never speak about propositions of unique events, because in that theory there *can be no unique events*. Because of the reliance on limiting sequences, frequentists can never know, with certainty, the value of any probability.

There is a confusion here that can be readily fixed. Some very simple math shows that if the probability of A is some number p, and it’s physically possible to give A many chances to occur, the relative frequency with which A does occur will approach the number p as the number of chances grows to infinity. This fact—that the relative frequency sometimes approaches p—is what lead people to the backward conclusion that probability *is* relative frequency.

Logical probabilists say that sometimes we can deduce probability, and both logical probabilists and frequentists agree that we can use the relative frequency (of data) to help guess something about that probability if it cannot be deduced^{1}. We have already seen that in some problems we can deduce what the probability is (the dice throwing argument above is a good example). In cases like this, we do not need to use any data, so to speak, to help us learn what the probability is. Other times, of course, we cannot deduce the probability and so use data (and other evidence) to help us. But this does not make the (limiting sequence of that) data *the* probability.

To say that probability is relative frequency means something like this. We have, say, observed some number of die rolls which we will use to inform us about the probability of future rolls. According to the relative frequency philosophy, those die rolls we have seen are embedded in an infinite sequence of die rolls. Now, we have only seen a finite number of them so far, so this means that most of the rolls are set to occur in the future. When and under what conditions will they take place? How will those as-yet-to-happen rolls influence the actual probability? Remember: these events have not yet happened, but the totality of them *defines* the probability. This is a very odd belief to say the least.

If you still love relative frequency, it’s still worse than it seems, even for the seemingly simple example of the die toss. What *exactly* defines the toss, what explicit reference do we use so that, if we believe in relative frequency, we can define the limiting sequence?^{2}. Tossing *just this* die? *Any* die? And how shall it be tossed? What will be the temperature, dew point, wind speed, gravitational field, how much spin, how high, how far, for what surface hardness, what position of the sun and orientation of the Earth’s magnetic field, and on and on to an infinite list of exact circumstances, none of them having any particular claim to being the right reference set over any other.

You might be getting the idea that *every* event is unique, not just in die tossing, but for everything that happens— every physical thing that happens does so under very specific, unique circumstances. Thus, nothing can have a limiting relative frequency; there are no reference classes. Logical probability, on the other hand, is not a matter of physics but of information. We can make logical probability statements because we supply the exact conditioning evidence (the premises); once those are in place, the probability follows. We do not have to include every possible condition (though we can, of course, be as explicit as we wish). The goal of logical probability is to provide conditional information.

The confusion between probability and relative frequency was helped because people first got interested in frequentist probability by asking questions about gambling and biology. The man who initiated much of modern statistics, Ronald Aylmer Fisher^{3}, was also a biologist who asked questions like “Which breed of peas produces larger crops?” Both gambling and biological trials are situations where the relative frequencies of the events, like dice rolls or ratios of crop yields, can very quickly approach the actual probabilities. For example, drawing a heart out of a standard poker deck has logical probability 1 in 4, and simple experiments show that the relative frequency of experiments quickly approaches this. Try it at home and see.

Since people were focused on gambling and biology, they did not realize that some arguments that have a logical probability do not equal their relative frequency (of being true). To see this, let’s examine one argument in closer detail. This one is from Sto1983, Sto1973 (we’ll explore this argument again in Chapter 15):

Bob is a winged horse

————————————————–

Bob is a horse

The conclusion given the premise has logical probability 1, but has no relative frequency because there are no experiments in which we can collect winged horses named Bob (and then count how many are named Bob). This example, which might appear contrived, is anything but. There are many, many other arguments like this; they are called *couterfactual arguments*, meaning they start with a premise that we know to be false. Counterfactual arguments are everywhere. At the time I am writing, a current political example is “If Barack Obama did not get the Democrat nomination for president, then Hillary Clinton would have.” A sad one, “If the Detroit Lions would have made the playoffs last year, then they would have lost their first playoff game.” Many others start with “If only I had…” We often make decisions based on these arguments, and so we often have need of probability for them. This topic is discussed in more detail in Chapter 15.

There are also many arguments in which the premise is not false and there does or can not exist any relative frequency of its conclusion being true; however, a discussion of these brings us further than we want to go in this book.^{4}

Haj1997 gives examples of fifteen—count `em—fifteen more reasons why frequentism fails and he references an article of fifteen more, most of which are beyond what we can look at in this book. As he says in that paper, “To philosophers or philosophically inclined scientists, the demise of frequentism is familiar”. But word of its demise has not yet spread to the statistical community, which tenaciously holds on to the old beliefs. Even statisticians who follow the modern way carry around frequentist baggage, simply because, to become a statistician you are *required* to first learn the relative frequency way before you can move on.

These detailed explanations of frequentist peculiarities are to prepare you for some of the odd methods and the even odder interpretations of these methods that have arisen out of frequentist probability theory over the past ~ 100 years. We will meet these methods later in this book, and you will certainly meet them when reading results produced by other people. You will be well equipped, once you finish reading this book, to understand common claims made with classical statistics, and you will be able to understand its limitations.

(One of the homework problems associated with this section)

{\sc extra} A current theme in statistics is that we should design our procedures in the modern way but such that they have good relative frequency properties. That is, we should pick a procedure for the problem in front of us that is not necessarily optimal for that problem, but that when this procedure is applied to similar problems the relative frequency of solutions across the problems will be optimal. Show why this argument is wrong.

———————————————————————

^{1}The guess is usually about a parameter and not the probability; we’ll learn more about this later.

^{2}The book by \citet{Coo2002} examines this particular problem in detail.

^{3}While an incredibly bright man, Fisher showed that all of us are imperfect when he repeatedly touted a ridiculously dull idea. Eugenics. He figured that you could breed the idiocy out of people by selectively culling the less desirable. Since Fisher also has strong claim on the title Father of Modern Genetics, many other intellectuals—all with advanced degrees and high education—at the time agreed with him about eugenics.

^{4}For more information see Chapter 10 of \citet{Sto1983}.