Why probability isn’t relative frequency: redux

(Pretend, if you have read it, that you haven’t read my first weak attempt. I’m still working on this, but this gives you the rough idea, and I didn’t want to leave a loose end. I’m hoping the damn book is done in a week. There might be some LaTeX markup I forgot to remove. I should note that I am more than half writing this for other (classical) professor types who will understand where to go and what some implied arguments mean. I never spend much time on this topic in class; students are ready to believe anything I tell them anyway.)

For frequentists, probability is defined to be the frequency with which an event happens in the limit of “experiments” where that event can happen; that is, if you run a number of “experiments” that approaches infinity, then the ratio of the experiments in which the event happens to the total number of experiments is defined to be the probability that the event will happen. This obviously cannot tell you the probability of your well-defined, possibly unique, event happening now; it can only give you probabilities in the limit, after an infinite amount of time has elapsed for all those experiments to take place. Frequentists obviously never speak about propositions of unique events, because in that theory there can be no unique events. Because of the reliance on limiting sequences, frequentists can never know, with certainty, the value of any probability.

There is a confusion here that can be readily fixed. Some very simple math shows that if the probability of A is some number p, and it’s physically possible to give A many chances to occur, the relative frequency with which A does occur will approach the number p as the number of chances grows to infinity. This fact, that the relative frequency sometimes approaches p, is what led people to the backward conclusion that probability is relative frequency.
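
That piece of simple math is usually stated as the (weak) law of large numbers. Here is a sketch of it, under the added assumptions, not spelled out above, that the chances are independent and that A has the same probability p on each one. The symbols n, k_n, and ε are introduced here only for illustration: k_n is the number of times A occurs in the first n chances.

```latex
\[
\lim_{n \to \infty} \Pr\!\left( \left| \frac{k_n}{n} - p \right| > \varepsilon \right) = 0
\qquad \text{for every } \varepsilon > 0 .
\]
```

Note the direction of the statement: p is supplied first, and the behavior of the relative frequency k_n/n is derived from it, not the other way around.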

Logical probabilists say that sometimes we can deduce probability, and both logical probabilists and frequentists agree that we can use the relative frequency (of data) to help guess something about that probability if it cannot be deduced [1]. We have already seen that in some problems we can deduce what the probability is (the dice throwing argument above is a good example). In cases like this, we do not need to use any data, so to speak, to help us learn what the probability is. Other times, of course, we cannot deduce the probability and so use data (and other evidence) to help us. But this does not make the (limiting sequence of that) data the probability.

To say that probability is relative frequency means something like this. We have, say, observed some number of die rolls which we will use to inform us about the probability of future rolls. According to the relative frequency philosophy, those die rolls we have seen are embedded in an infinite sequence of die rolls. Now, we have only seen a finite number of them so far, so this means that most of the rolls are set to occur in the future. When and under what conditions will they take place? How will those as-yet-to-happen rolls influence the actual probability? Remember: these events have not yet happened, but the totality of them defines the probability. This is a very odd belief to say the least.

If you still love relative frequency, it’s still worse than it seems, even for the seemingly simple example of the die toss. What exactly defines the toss? What explicit reference do we use so that, if we believe in relative frequency, we can define the limiting sequence? [2] Tossing just this die? Any die? And how shall it be tossed? What will be the temperature, dew point, wind speed, gravitational field, how much spin, how high, how far, for what surface hardness, what position of the sun and orientation of the Earth’s magnetic field? And on and on to an infinite list of exact circumstances, none of them having any particular claim to being the right reference set over any other.

You might be getting the idea that every event is unique, not just in die tossing, but for everything that happens: every physical thing that happens does so under very specific, unique circumstances. Thus, nothing can have a limiting relative frequency; there are no reference classes. Logical probability, on the other hand, is not a matter of physics but of information. We can make logical probability statements because we supply the exact conditioning evidence (the premises); once those are in place, the probability follows. We do not have to include every possible condition (though we can, of course, be as explicit as we wish). The goal of logical probability is to provide conditional information.

The confusion between probability and relative frequency was helped along because people first got interested in frequentist probability by asking questions about gambling and biology. The man who initiated much of modern statistics, Ronald Aylmer Fisher [3], was also a biologist who asked questions like “Which breed of peas produces larger crops?” Both gambling and biological trials are situations where the relative frequencies of the events, like dice rolls or ratios of crop yields, can very quickly approach the actual probabilities. For example, drawing a heart out of a standard poker deck has logical probability 1 in 4, and simple experiments show that the relative frequency of hearts drawn quickly approaches this value. Try it at home and see.
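
The “try it at home” experiment can also be run on a computer. This is a minimal sketch in Python, assuming each draw is made from a full, well-shuffled 52-card deck (that is, the card goes back before the next draw). The names `DECK` and `heart_frequency`, and the fixed seed, are illustrative choices and not anything from the book.

```python
import random

SUITS = ["hearts", "diamonds", "clubs", "spades"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

# A standard 52-card deck as (rank, suit) pairs.
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]


def heart_frequency(n_draws, seed=0):
    """Return the relative frequency of hearts in n_draws single-card draws.

    Each draw picks one card uniformly from the full deck (drawing with
    replacement). The seed is fixed only so that runs are repeatable.
    """
    rng = random.Random(seed)
    hearts = sum(1 for _ in range(n_draws) if rng.choice(DECK)[1] == "hearts")
    return hearts / n_draws


if __name__ == "__main__":
    # Print the relative frequency for increasingly long runs of draws.
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, heart_frequency(n))
```

As the number of draws grows, the printed frequencies should settle near 0.25. The 1 in 4 itself, though, was deduced from the premises about the deck before a single card was drawn.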

Since people were focused on gambling and biology, they did not realize that some arguments have a logical probability that does not equal their relative frequency (of being true). To see this, let’s examine one argument in closer detail. This one is from Sto1983, Sto1973 (we’ll explore this argument again in Chapter 15):

Bob is a winged horse
————————————————–
Bob is a horse

The conclusion given the premise has logical probability 1, but has no relative frequency because there are no experiments in which we can collect winged horses named Bob (and then count how many are horses). This example, which might appear contrived, is anything but. There are many, many other arguments like this; they are called counterfactual arguments, meaning they start with a premise that we know to be false. Counterfactual arguments are everywhere. At the time I am writing, a current political example is “If Barack Obama did not get the Democrat nomination for president, then Hillary Clinton would have.” A sad one, “If the Detroit Lions had made the playoffs last year, then they would have lost their first playoff game.” Many others start with “If only I had…” We often make decisions based on these arguments, and so we often have need of probability for them. This topic is discussed in more detail in Chapter 15.

There are also many arguments in which the premise is not false and there does not, or cannot, exist any relative frequency of the conclusion being true; however, a discussion of these would take us further than we want to go in this book. [4]

Haj1997 gives examples of fifteen—count `em—fifteen more reasons why frequentism fails and he references an article of fifteen more, most of which are beyond what we can look at in this book. As he says in that paper, “To philosophers or philosophically inclined scientists, the demise of frequentism is familiar”. But word of its demise has not yet spread to the statistical community, which tenaciously holds on to the old beliefs. Even statisticians who follow the modern way carry around frequentist baggage, simply because, to become a statistician you are required to first learn the relative frequency way before you can move on.

These detailed explanations of frequentist peculiarities are to prepare you for some of the odd methods, and the even odder interpretations of these methods, that have arisen out of frequentist probability theory over roughly the past 100 years. We will meet these methods later in this book, and you will certainly meet them when reading results produced by other people. You will be well equipped, once you finish reading this book, to understand common claims made with classical statistics, and you will be able to understand its limitations.

(One of the homework problems associated with this section)
EXTRA. A current theme in statistics is that we should design our procedures in the modern way but such that they have good relative frequency properties. That is, we should pick a procedure for the problem in front of us that is not necessarily optimal for that problem, but that when this procedure is applied to similar problems the relative frequency of solutions across the problems will be optimal. Show why this argument is wrong.

———————————————————————
[1] The guess is usually about a parameter and not the probability; we’ll learn more about this later.

[2] The book by Coo2002 examines this particular problem in detail.

[3] While an incredibly bright man, Fisher showed that all of us are imperfect when he repeatedly touted a ridiculously dull idea: eugenics. He figured that you could breed the idiocy out of people by selectively culling the less desirable. Since Fisher also has a strong claim on the title Father of Modern Genetics, many other intellectuals at the time, all with advanced degrees and high education, agreed with him about eugenics.

[4] For more information see Chapter 10 of Sto1983.

21 Comments

  1. Better, much better, and I applaud the inclusion of real-life examples of counter-factual arguments, but I guess I have to wait for another book to understand how that translates to premises that aren’t false.

    Another quip I have is this little Freudian slip of elitist pretentiousness from the author in this paragraph (don’t take this the bad way, Mr Briggs, I beg ya!):

    Haj1997 gives examples of fifteen—count `em—fifteen more reasons why frequentism fails and he references an article of fifteen more, most of which are beyond what we can look at in this book.

    Bokay, but apart from the argument from authority there, there’s no juice. I now have learned that there are fifteen plus fifteen reasons why frequentism fails, but I am not even fed up with a “passerby” example of one of them, I am just being told that I should count (com’on, count’em!) the number of reasons, as if that somehow raised the probability of my endorsement towards bayesianism. Hint, it does not. I only need one good reason, and if there are just one or two good ones inside those 15, then I’m bought. It’s hardly a matter of reason counting.

    Worse, it only makes me raise my eyebrows. It’s dangerously similar to when I read someone stating that there are 400 reasons why evolution is wrong, or that you have 3000 scientists backing up global warming, or 200 signing a paper on how Einstein was wrong, etc., if you get my grip.

  2. Luis,

    You are right. I’m dropping the snideness.

    I’ll try to fix up the argument from authority, but to some extent this is impossible in an introductory book. For example, later I show mathematical results which I cannot prove using the limited math required of my readers. These, too, are arguments from authority.

    I do provide the reference so that readers not satisfied with my gloss can consult the original. That reference is: Hajek, A, 1997. “Mises redux” — redux: fifteen arguments against finite frequentism, Erkenntnis, 45, 209-227. Hajek also answers the question “Why do many arguments? Are you nervous?”

    But I agree that I presented it badly, so I’ll fix it.

    Thanks.

    I have also added this:

    In his A Philosophical Essay on Probabilities Laplace (1996), also quoted in (Tipler, 2008), opened his remarks with

    All events, even those which on account of their insignificance do not seem to follow the great laws of nature, are a result of it just as necessarily as the revolutions of the sun. In ignorance of the ties which unite such events to the entire system of the universe, they have been made to depend upon final causes or upon hazard, according as they occur and are repeated with regularity, or appear without regard to order; but these imaginary causes have gradually receded with the widening bounds of knowledge and disappear entirely before sound philosophy, which sees in them only the expression of our ignorance of the true causes.

    That is, probability is a measure of ignorance, or information, and is not a physical entity.

  3. It’s a nice read. The part about Bob being a winged horse made me curious to see more. I was wondering though if “the frequentist” is a bit of a straw man. Does anyone really define probability this way any more? The only adequate definition of probability with which I am familiar is the modern one coming from measure theory.

    When I read the Bob/Horse example, I have in my mind the set of all horses, of which winged-horses is an empty subset. If, as you say, the logical probability above is 1, then since the empty set is a subset of all sets, “‘bob is a winged horse’ implies ‘bob is a pumpkin'” should also have a logical probability of 1. Is that right?

  4. I was doing great till the quote from Laplace. If probability is a measure of our ignorance then it has to be subjective since my ignorance is different to yours. Or are you using “subjective” in a slightly different way?

    Rich

  5. Rich:
    Probability has to be a measure of ignorance since the unknown is the unknown. The future, past or present event that is not yet understood or realised is still subject to speculation, logical, objective or subjective.
    Probability must be subject to evidence or premises. These may be more or less subjective in their nature. Even well measured data must be subject to error.
    So the probability is calculated from the premise or evidence. The calculation mathematical or logical is not necessarily subjective.
    Makes perfect sense to me.

  6. A minor editing error: the example referred to in paragraph 4 “(the dice throwing argument above is a good example)” has been removed.

    bob

  7. Mr Briggs, I’m glad to be helpful. Just a tip. You don’t need to include everything. If too much is too much, then don’t feel obliged to put it there. Just don’t treat your reader badly, I mean you even promise to give some examples in this sentence:

    most of which are beyond what we can look at in this book

    …but then you escape and close the parenthesis without giving out where we can look at in this book the “least of which”. I feel kinda cheated.

    Sorry for the quibble. I think the rest of the text is excellent. I just don’t like it very much when people assert that theory X is wrong but then again not gonna give ya examples of why because it’s in other books… so why buy this one? A book isn’t a html. Contrast that with Gould’s book “Mismeasure of Man”, where he spends his whole book explaining why his opponents are wrong… while explaining his point.

    And while I don’t know the size of your book, I think you are fully capable of explaining it succinctly.

  8. Luis,

    Change “most” to “all”.

    Anyway, I later cover some of the ideas, in great detail, as they apply to actual statistical practice (p-values, confidence intervals, point estimates, parameter reliance, and so on).

    Rich,

    No. Sometimes we have the same—identical—prior evidence and so we must share the same probability of the conclusion. Probability is not subjective.

    Schnoerkelmanon,

    Thanks.

  9. Is the universe made up of just “frequentists” and “logical probabilists”? How many other “types” of probabilists are there?

    Also it would be intriguing to hear your take on how insurance companies “make their bets.”

  10. Noblesse,

    If it is, then a fortiori, at least according to Tipler, one of the biggest advocates of the MW hypothesis, logical probability is the only explanation because every physical thing, including movement at the very small, becomes deterministic. Go to arxiv.org and look up “Tipler” for me. For example, this paper.

    Bernie,

    Many. Popper has his propensity theory, there are various types of comparative probability, some argue for complexity, there are upper and lower probabilities, fuzzy logic, non-classical (i.e. non-dichotomous) logics, Shafer and others have belief functions, computer scientists like Halpern have learning functions, and on and on.

    Books and books exist on these topics. I am not convinced that any of them offer more than classical logical probability. Several try to incorporate decision analysis and probability into one “thing”, but I don’t see the necessity of that.

    In statistics, the largest by far are frequentism and subjective Bayesianism. Any other flavors are trivially small (in use) compared to these.

    Insurance is an interesting topic. I hope to cover it soon.

  11. Dear Briggs,

    I recently got my hands on Edwin Jaynes’ article “Where do we stand on maximum entropy” from the late 70s. I was particularly surprised about his description of the history of the dispute between frequentists and the followers of Bernoulli’s original use of probability. We have it all there: personal insults instead of rational arguments, consensus of experts, and so on (interesting similarities to actual debates about forthcoming end-of-world scenarios…). However, we also see intellectual integrity, namely Wald (do the similarities end here?).

    What I would like to know is, does the definition/use of probability by Jaynes match the interpretation you are giving here? For me, it certainly seems that way. And what is the connection to the measure theoretic definition (Kolmogorov axioms, Borel algebras,…)? In the probability theory books I find in our university library, you usually get the axioms and derive tons of results, but no one actually bothers to clarify what probability really means… (it seems we got the wrong books… looking forward to seeing you finishing yours! ;-))

  12. OK. We seem to have gone in a circle. Evidence may be subjective but, once you have it, the probability must be logically determined from it. I’m getting the feeling that I’m examining the craquelure while missing the Mona Lisa. When’s the book out?

    Rich

  13. Nick,

    It’s safe to say that I learned my probability from Jaynes (and similarly-minded authors; I never met him). So, yes, his view and mine are the same.

    Kolmogorov wrote his axioms in terms of events (functions on measurable spaces, sigma algebras, etc.), and Cox wrote his on propositions (plain English statements and so on). Now, it is possible to show that propositions can be mathematically turned into events, so there is a lot of overlap.

    However, typical probability books go right for the math and bypass propositions, so you never get the idea of how probability is useful in these situations. And since infinite sequences are everything in continuous mathematics, you can easily get the idea that probability only works mathematically.

    I highly, heartily recommend Jaynes’s book “Probability Theory”. You need at least calculus to read it. My book, which has probability in it, is more concerned with applied probability, i.e. statistics, so I have material that Jaynes does not cover. Of course, I do not prove anything rigorously in my book. No more math than high school algebra is required.

    The writing of my book is finally finished. Now it’s clean up and small detail time!

  14. Rich,

    Not at all. There are several situations where the evidence itself can be deduced (obviously from simpler evidence).

    It might not seem like a big deal, but it is key to understand that once you and I agree on the premises, we must agree on the probability.

    If, say in some political situation, we disagree on the probability of a proposition (e.g. “John McCain will win”) it thus must mean that we disagree on the evidence. If we want to reconcile our probabilities, this means we must reconcile our evidence, piece by piece. This, as we all know, can be a very illuminating process.

  15. Matt:
    For what it is worth, I was thinking about your dice example, the frequentist position and their implicit assumption that “all other things, beyond those specified, have no impact on the outcome.”
    Now for a 6-sided regular dice we can specify a priori the probability that a particular face will be in a particular pre-defined aspect, e.g., for a six sided regular dice, horizontal to surface on which it lands and visible. One can then imagine additional regular solid objects with additional numbers of sides. Again one can calculate a priori the probability that a particular side will be in a particular aspect. As we increase the number of sides though our simple a priori model for calculating the probability of a side taking a particular aspect (i.e. 1/number of sides ) becomes inadequate since there becomes an increasing probability that our regular object might land not on a side but on an edge. This outcome will be more likely if the surface on which the dice lands is not flat and hard and horizontal, e.g., a carpet and on the ratio of the surface area of “edges” to those of “sides”. The implicit assumption of “all other things have no impact” is problematic and it follows that frequentists are more certain than they should be! (Tossing a “fair” coin that lands on a soft or muddy surface also means that the probability of heads or tails is less than .50. For us Brits there is also the case of the 50P coin with facets on the edge of the coin – not to mention the threepenny piece?)

    Insurance underwriting or risk assessment, I would guess, is about specifying all those things that do have an impact on the outcome.

  16. Nitpicking comments on the revised article.

    1.
    Ach. In the earlier part of the article, a bit too much of the text is enclosed in brackets, which makes it difficult for the reader to follow.

    2.
    Is this one of the points you are trying to make in the article? “Since people were focused on gambling and biology, they did not realize that some arguments that have a logical probability [that does] not equal their relative frequency (of being true).” Since it is simply used as a preface to an example, I am not sure if it was important.

    I’m also not sure if there are any other points you were trying to make. If there were, can you word them such that they are easier to discern? The bright guys here will know what you are talking about, but casual readers such as students are likely to have the same difficulty that I am having. (Alternatively, what sentences are the students supposed to highlight so they can memorise them for the exam?)

    3.
    Your homework problem is wonderfully thought provoking, at least as I interpreted it. Was this what you meant?:

    “A current theme in statistics is that we should design our [experiments / methods / actions / data collection] in the modern way but such that [the results] have good relative frequency properties. That is, we should pick an [action / experiment / data collection method] for the problem in front of us that is not necessarily optimal for that problem, but that when this [action / experiment / data collection method] is applied to similar problems the relative frequency of [successful resolution] across the problems will be optimal. Show why this argument is wrong.”

    Note: The last time I reviewed and commented on someone else’s stuff – which happened to be my mother’s MBA thesis – I found out that (a) books can be quite aerodynamic when properly wielded and (b) my mother has disconcertingly good aim.

  17. Steve,

    No, it’s not supposed to be. Jaynes died before finishing the book, so I’d recommend buying the actual book because Larry Bretthorst has extensively edited it. Bretthorst also added some homework problems.

    Be aware, though, that you will need a good understanding of calculus to get through the book. Jaynes had a big brain and for some methods he would jump multiple steps through proofs, steps that were probably second nature to him, but for us are not.

  18. Jinnah,

    1. Ok, thanks.

    2. This paragraph especially is for professor types and not so much for students.

    I don’t actually spend much time teaching any of this section in class. Like I said, students are usually happy to believe anything I tell them, and since my course around this book is an introductory one, they will not have seen much or any frequentist material before.

    I also don’t like exams in statistics. You haven’t seen the preface, but I require everybody to collect, store, manipulate, explore, analyze, and explain their own datasets. The final grade is based on this project. I have not discovered any other way to effectively teach statistics.

    3. Yep, pretty much it.
