It is well to collect cogent proofs of frequentism’s failings so that supporters of that theory can look upon them and find joy.
Alan Hájek has done yeoman service in this regard with two papers listing 30 arguments against the relative frequency theory of probability.1,2 These do not exhaust the criticisms, nor are all (as he admits) strong, but they are a good start. Today we’ll draw from his “Fifteen Arguments Against Hypothetical Frequentism” and include some of my own. Hájek defines hypothetical relative frequentism as:
The probability of an attribute A in a reference class B is p [if and only if] the limit of the relative frequency of A’s among the B’s would be p if there were an infinite sequence of B’s.
Below is my numbering, not Hájek’s. I skip some of his more technical criticisms which are not of interest to a general audience (such as those referring to Carnap’s “c-dagger” or to facts about uncountable sets) or are not quite the ticket (about different limits for a named sequence, as I think these mix up causality and evidence of the same). I also do not hold with his alternative to frequentism, but that is another matter. This list is also not complete, and essays could be written for each point, but this is enough to get us started.
Before we begin, the natural question is why does it seem that frequentism sometimes works? The answer: why does any approximation work? When frequentist methods hew close to the real definition of probability, they behave well, but the farther away they venture, the worse they get. It is not as if frequentists are bad people trying to pull the p-values over people’s eyes. It’s that they are relying on a theory which has no bearing on reality, a harsh claim justified below.
Most “frequentists” implicitly know this, and tacitly and unthinkingly reject the idea of infinite sequences in practice without realizing that they have kicked out their theoretical support, i.e. that they are not using frequentism. Plus, most users of statistics haven’t the training to know details of the theory which guides them. They have memorized “Wee p-values are good”, which is all that is needed for success.
When rebutting, be sure not to invoke the So’s-Your-Old-Man fallacy, which in this case would have the form, “Oh yeah? Frequentism may stink, but what about improper priors, fella!” You have not proven frequentism is swell because some other version of probability has failings; indeed, you have admitted frequentism is dead. Help us by using the numbering, too.
1 In order to know the probability of any proposition, we have to observe an infinite sequence. There are no observed or observable infinite sequences of anything. We can imagine such sequences—we can imagine many things!—but we can never see one. Therefore, we can never know the probability of any proposition.
Hájek: “any finite sequence—which is, after all, all we ever see—puts no constraint whatsoever on the limiting relative frequency of some attribute.” The relative frequency in the finite observed sequence may equal 0.9, but the limit may be 0.2. Who knows? As Keynes famously said about waiting to know a frequentist probability in the long run, i.e. the preferred euphemism for infinity, “In the long run we are all dead.”
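To see how little a finite prefix constrains the limit, here is a minimal sketch (my construction, not Hájek’s): an explicit 0/1 sequence whose first ten terms have relative frequency 0.9, but whose limiting relative frequency is 0.2.

```python
# A 0/1 sequence with early relative frequency 0.9 but limit 0.2:
# nine 1s and one 0 to start, then exactly one 1 in every 5 trials.

def outcome(n):
    """The n-th term of the sequence (1-indexed)."""
    if n <= 10:
        return 1 if n <= 9 else 0   # first ten terms: frequency 0.9
    return 1 if n % 5 == 0 else 0   # thereafter: one in five

for n in (10, 100, 10_000, 1_000_000):
    freq = sum(outcome(i) for i in range(1, n + 1)) / n
    print(n, freq)   # 0.9, 0.27, 0.2007, 0.200007: heading to 0.2
```

And nothing stops a rival sequence from starting the same way and heading to 0.7 instead; the prefix is silent about the destination.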
In order to imagine an infinite sequence, we also, as Hájek emphasizes, must imagine a universe “utterly bizarre” and totally alien to ours. “We are supposed to imagine infinitely many radium atoms: that is, a world in which there is an infinite amount of matter (and not just the 10^80 or so atoms that populate the actual universe, according to a recent census).” Universes with infinite matter are impossible (not unlikely: impossible) on any physics that I have heard of, but they are required if frequentism is true.
If you do not see this first criticism as damning, you have not understood frequentism. You have said to yourself, “Very large sequences are close enough to infinity.” No, they are not. Not if frequentism is to retain its mathematical and philosophical justification.
As you’ll see, the main critique of frequentism is that it confuses ontology and epistemology, i.e. existence with knowledge of the same.
2 If our premises are E = ‘This is an n-output machine with just one output labeled * which when activated must show an output, and this is an output before us’, the logical probability of Q = ‘An * shows’ is 1/n. A frequentist may assert that probability for use in textbook calculations (which he often does, for example in demonstrating the binomial for multiple throws of hypothetical dice), but in strict accordance with his theory he has made a grievous error. He has to wait for an infinite sequence of activations first before he knows any probability.
The only way to get started in frequentism is to materialize probability out of thin air, on the basis of no evidence except imagination. Probabilities may be guessed correctly, but never known.
3 In the absence of an infinite sequence, a finite sequence is often used as a guess of the probability. But notice that this is to accept the logical definition, which in this case is: given only E = ‘The observed finite relative frequency of A’, the probability of Q = ‘This new event is A’ is approximately equal to the observed relative frequency. (Notice that both Bayes and logical probability have no difficulty taking finite relative frequencies as evidence.)
For a frequentist to agree to that, he first has to wait for an infinite sequence of observed-relative-frequencies-as-approximations before he can know the probability that P = ‘Pr(Q | E) is approximately equal to the observed finite relative frequency’ is high or 1. Nothing short of infinity will do before he can know any approximation is reasonable. Unless, that is, he takes only a finite sequence of approximations and uses that as evidence that all finite sequences are good approximations; but then he is stuck in an infinite regress of justifications.
4 Hájek: “we know for any actual sequence of outcomes that they are not initial segments of collectives, since we know that they are not initial segments of infinite sequences—period.” This follows from above: even if we accept that infinite collectives exist, how do we know the initial segments of those collectives are well behaved? “It is not as if facts about the collective impose some constraint on the behavior of the actual sequence.”
If hypothetical frequentism is right, to say any finite sub-sequence is “like” the infinite collective (Von Mises’s more technical definition relies on infinite sub-sequences embedded in infinite sequences, a common method in analysis; here I mean finite sub-sequences) is to claim that the infinite collective, which is not yet generated, “reaches back” and causes the probabilities to behave. And this is impossible. In other words, something else here and now is causing that sequence to take the values it does, and probability should be a measure of our knowledge of that here-and-now causality.
5 Hájek: “For each infinite sequence that gives rise to a non-trivial limiting relative frequency, there is an infinite subsequence converging in relative frequency to any value you like (indeed, infinitely many such subsequences). And for each subsequence that gives rise to a non-trivial limiting relative frequency, there is a sub-subsequence converging in relative frequency to any value you like (indeed, infinitely many subsubsequences). And so on.”
And how, in our finite existence, do we know which infinite subsequence we are in? Answer: we cannot. The problem with infinities is that anything possible can and will happen.
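Here is a minimal sketch of Hájek’s point (details assumed by me): take one realized “fair coin” sequence, whose overall relative frequency sits near 0.5, and greedily select from it a subsequence whose relative frequency converges to any target you like.

```python
import random

random.seed(1)
flips = [random.randrange(2) for _ in range(200_000)]  # frequency ~0.5

def steered_frequency(seq, target):
    """Greedily keep terms so the subsequence frequency chases target."""
    ones = picked = 0
    for x in seq:
        running = ones / picked if picked else target
        # keep x only if it nudges the running frequency toward target
        if (x == 1 and running <= target) or (x == 0 and running >= target):
            picked += 1
            ones += x
    return ones / picked

for t in (0.1, 0.5, 0.9):
    print(t, round(steered_frequency(flips, t), 3))  # ~0.1, ~0.5, ~0.9
```

The same coin, the same outcomes; only the selection rule differs. Which selection is the “real” one?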
6 Our evidence is E = ‘One unique never-before-seen Venusian mlorbid will be built. It has n possible ways of self-destructing once it is activated. It must be activated and must self-destruct. X is one unique way it might self-destruct.’ The probability of Q = ‘X is the way this one-of-a-kind mlorbid will self-destruct’ is unknown, unclassifiable, and unquantifiable in frequency theory. In logical probability it is 1/n. Even if we can imagine an infinite collective of mlorbids, there is no way to test the frequency because the Venusians will build only this one. No sequence can ever be observed.
“Von Mises famously regarded single case probabilities as ‘nonsense’ (e.g. 1957, p. 17).” Yet, of course, all probabilities are for unique or finite sequences of events.
David Stove listed this as a key criticism against frequentism. The sequence into which a proposition must be embedded is not unique. Take Q = ‘Hillary Clinton wins the next presidency.’ Into which sequence does this unambiguously belong? All female leaders? All female elected leaders? All male or female leaders elected in Western democracies? All presidential elections of any kind? All leadership elections of any kind? All people named Hillary with the title of president? And on and on and on. Plus none of these can possibly belong to an infinite collective. Of course, if probability is logical, each premise naturally leads to a different probability.
7 Hájek: “Consider a man repeatedly throwing darts at a dartboard, who can either hit or miss the bull’s eye. As he practices, he gets better; his probability of a hit increases…the joint probability distribution over the outcomes of his throws is poorly modeled by relative frequencies—and the model doesn’t get any better if we imagine his sequence of throws continuing infinitely.”
We have to be careful about causality here, but the idea is sound. The proposition is Q = ‘The man hits the bull’s eye.’ What changes each throw is our (really unquantifiable) evidence. The premises for the n-th throw are not the same as for the n+1-th throw. Hájek misses that in his notation, and lapses into the classical language of “independence”, which is a distraction. The point is that each throw is necessarily a unique event conditioned on the premise that practice brings improvements. The man can never go back (on these premises), so there is no way to embed any given throw into a unique infinite collective.
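A minimal simulation sketch (the skill curve is my assumption, not Hájek’s) makes the trouble plain: give the thrower a hit probability that climbs with practice, and the pooled relative frequency matches the probability of no particular throw.

```python
import random

random.seed(2)

def hit_prob(n):
    """Assumed skill curve: climbs from 0.1 toward 0.9 with practice."""
    return 0.9 - 0.8 / (1 + n / 50)

throws = 10_000
hits = sum(random.random() < hit_prob(n) for n in range(throws))
print("pooled relative frequency:", hits / throws)                   # ~0.88
print("probability, first throw: ", hit_prob(0))                     # 0.1
print("probability, last throw:  ", round(hit_prob(throws - 1), 3))  # 0.896
```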
8 Our Q = “If the winter of 1941 had been mild, Hitler would have won the war.” There are many ways of imagining evidence to support Q to varying degrees (books have been written!). But there is no relative frequency, not infinite and not even finite. No counterfactual Q has any kind of relative frequency, but counterfactuals are surely intelligible and common. A bank manager will say, “If I had made the loan to him, he would have defaulted”, a proposition which might be embedded in a finite sequence, but the judgement will have no observations because no loans will have been made. The logical or Bayesian view of probability handles counterfactuals effortlessly.
9 If the evidence is E = ‘Quite a few Martians wear hats and George is a Martian’, the probability of Q = ‘George wears a hat’ is not uniquely quantifiable and has no frequency, infinite or finite. But it has a logical probability (which isn’t a single number). Most evidence comes to us vaguely and is stated ambiguously, such that unique probabilities are impossible to assign.
Update Note to the mathematically minded, especially in regard to criticisms 1–3. If we assume we know a probability, we can compute how good a finite approximation of that probability is, which is essentially what frequentist practice boils down to. But since, if frequentism is true, we can never know any probabilities, we can never know how good any approximation in practice is. Frequentism means flying blind while saying, “Ain’t that a pretty view?”
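A minimal sketch of this, assuming i.i.d. Bernoulli trials with a granted probability p: Hoeffding’s inequality bounds the chance that the finite relative frequency strays far from p. Grant the probability and the guarantee is computable; refuse to grant it, as strict frequentism must, and there is nothing to plug in.

```python
import math

def hoeffding_bound(n, eps):
    """Pr(|f_n - p| >= eps) <= 2 exp(-2 n eps^2) for i.i.d. 0/1 trials."""
    return 2 * math.exp(-2 * n * eps ** 2)

for n in (100, 1_000, 10_000):
    print(n, hoeffding_bound(n, eps=0.05))
# n = 100: bound > 1, i.e. vacuous; n = 10,000: about 3.9e-22
```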
———————————————————–
1 Hájek, A. (1997). ‘Mises Redux’—Redux: Fifteen Arguments Against Finite Frequentism, Erkenntnis 45, 209–227.
2 Hájek, A. (2009). Fifteen Arguments Against Hypothetical Frequentism, Erkenntnis 70, 211–235.
Even though I am not sure if I really know what frequentism is, and even spell check doesn’t recognize it, it would seem that some probabilities can be empirically tested. The Monty Hall problem, for example, or intersecting chords for another.
Scotian,
Good point. They can—sort of. Decisions can be checked. Assuming no errors in calculation, probabilities given fixed evidence are true and don’t need checking.
The Monty Hall problem is a case where there are two different sets of evidence, both of which give true probabilities. But one of these sets of evidence gives a true probability that leads to better decisions.
The two sets, incidentally, are found on the Classic Posts page (search Monty Hall).
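For the empirically minded, a minimal simulation sketch (assuming the standard rules: Monty always opens a goat door and always offers the switch) checks the decision directly: switching wins about two-thirds of the time.

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the win rate for the stick or switch strategy."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # contestant's first choice
        # Monty opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d not in (pick, car))
        if switch:
            pick = next(d for d in range(3) if d not in (pick, opened))
        wins += pick == car
    return wins / trials

print("stick: ", round(monty_hall(switch=False), 3))  # ~0.333
print("switch:", round(monty_hall(switch=True), 3))   # ~0.667
```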
These so-called criticisms, holdovers from logical positivism, have nothing to do with frequentist statistical inference. Probability models are useful for frequentist inference insofar as they let us compute relative frequencies. In problems of statistical inference, these are relevant for assessing error probabilities and error rates, which are relevant, in turn, insofar as they enable us to assess and control the capabilities of methods to detect and discriminate errors and biases in inference. People shouldn’t confuse frequentist statistics with the so-called frequentist definition of probability. That’s why I prefer focusing on the key feature of this approach to inference (the focus on error probabilities and error rates), rather than the defn of prob.
Mayo,
Not only are these criticisms “so-called”, they are actual genuine solid-gold criticisms, too. (With which it appears you agree?) And they have everything to do with frequentist statistical practice, which draws on frequentist probability theory for its derivations.
But you say not. Could you then provide a concrete example of a frequentist statistics procedure you think well done? And how it does not rely on the frequentist definition of probability?
Briggs
How does a poor non-professional-statistician make decisions? The whole Bayesian construct seems to depend on a prior decision (pun intended) – the subjective/objective-prior debate seems to illustrate this.
You may rail, but we simple (maybe over-qualified) chemists need a tool (sharp or not!). We accept (or, at least, allow for) your strictures about p-values, but how do we test hypotheses? Simply? Without vanishing up our own philosophical fundamental apertures?
Frequentism is, at least (almost), understandable.
Hamish,
Good questions. How do you decide whether to carry an umbrella?
Frequentist practice is surely understandable; take, say, p-values. Wee? Then do X. Not wee? Then do Y. All the work is done for you.
With that comes over-confidence and mistakes. The point is that decisions are made easier than they should be.
But perhaps this discussion would do better with an example. Have you one in mind?
Most days, it has rained in London in the winter. Most days, it has not rained in London in the summer. I tend to carry an umbrella in the winter. Sometimes, I’m wrong – I get wet. The penalty for being ultra-cautious (and here speaks experience) is leaving the umbrella on the train. No mention of Bayes. Oh, I ignore the Met Office – too much experience!
Hamish,
Just so. No mention of Bayes needed. No need of quantifying unquantifiable probabilities for decisions with unquantifiable consequences. Yet still you were able to make decisions, and even get good at it.
I of course espouse the logical view of probability, and not Bayes per se. It more naturally fits in with real-life decisions because it doesn’t insist on quantification. But when a situation can be quantified, it tends to agree with the objective Bayes (though it is fuzzier on the edges).
Frequentism is a contributor to scientism, as is Bayes most of the time. All is not number.
But again, maybe we’d do better with a more “scientific” example?
Briggs
Your first paragraph, I thoroughly agree.
Your last paragraph, not so much. “Scientism”? (from your third?).
Briggs:
Ah, that just clicked into place for me.
And that really brings it home. I mostly follow the rest but stumble at (8), particularly the concluding sentence. How would the logical or Bayesian view better handle “the judgement will have no observations because no loans will have been made”?
From your response to Hamish:
I would like to see a real world example where the Bayesian approach produced a more skilful prediction than a frequentist method over the same observables. If you’ve already done so in a prior post, please feel free to direct my question there.
It would also be great if you knew a real world example where either approach produced similarly skilful predictions.
Thanks.
Brandon, Hamish,
In the interest of time (I’m tied up at the moment), try this for an example.
http://arxiv.org/abs/1201.3611
But even I don’t love all of it anymore.
B,
You can’t observe what you don’t do. If you make no loans worried about defaults, you’ll never know where you were wrong, i.e. where the loan would have been repaid.
Thanks for the link Dr. Briggs. Re: tied up; of course. Consider my now explicit request for a future post as your time and interest permits.
Not-observed was the point I was driving at, and why I am tripping on, “The logical or Bayesian view of probability handles counterfactuals effortlessly.” The implicit argument is that a Bayesian approach would yield better loan default risk predictions given the same credit history data. I’m not grokking the “how” or “why” … or even sure that’s your argument.
Briggs
Hey, we’re human. We can’t observe infinite sequences. (How many of the arguments above use the words ‘infinite’ or ‘infinity’?) The best we can do is take a sample (=use what we can) – what we (given our limited lifespan &c.) can see (pick your tense).
Come back to my original question (accepting all your caveats). What’s a poor chemist to do? I ask this in all humility, and genuinely. It would seem to me that frequentism (allowing for obvious pitfalls), helps. Assistance (in a chemical issue) is all I ask. If it doesn’t work for the umbrella, it doesn’t work.
Bayes vs. frequentism may provide intellectual satisfaction. How does it help me with the ‘umbrella’ problem (which you introduced)?
Hamish,
Let’s have more particulars on the example you have in mind.
Brandon, Hamish
I really enjoy this blog when Briggs talks about probability. Very insightful! Nevertheless, when you ask a direct question he never answers, as in never.
Joe,
Which question do you think is unanswered?
Update Unless you think it’s the umbrella question. But Hamish has answered it himself! Complete and with no mathematics he has hit upon the very solution when to carry and when not. No p-values were needed, no posteriors. What more could you want about carrying an umbrella?
About the chemistry, I haven’t heard a direct question yet.
Joe: I’ve gotten at least 3 answers to direct questions, and seen him respond to others direct queries as well. So never is overstating. “Less frequently than one might wish” is more defensible. But what one wishes Briggs to do is not for Briggs to control. 🙂
Cheers.
Briggs: unanswered for me, “The implicit argument is that a Bayesian approach would yield better loan default risk predictions given the same credit history data. I’m not grokking the ‘how’ or ‘why’ … or even sure that’s your argument.”
But I don’t wish to ever be demanding.
Brandon,
I must be explaining badly. I don’t advocate Bayesian solutions per se. I am not a Bayesian (except in the sense that I use the math). I do say, and have said many many times, not all probability is quantifiable. Not all risk is quantifiable. Not all loss and gain is quantifiable. But that does not mean we cannot reach some kind of understanding, as we do in real life.
I often have the idea that people are looking for a New & Improved! formula which accepts data and spits out superior answers. What I’m selling is the sad news that such a formula doesn’t exist. Not always. That we are over-certain in too many things, that we grasp onto and invent quantities simply because they’re quantities. This is scientism.
“But we have to make a decision. So what do we do?”
Well, what decision? And how are you deciding now?
Briggs, I’m reading your paper right now. Background already gained from your paper advocating for a switch to Bayes theory in curriculum helps.
My math retardation is real and working against me, very much mitigating “badly”. Your past use of Bayes vs. frequentist probability theory in statistics in the context of polemics against scientism particularly compounds things for me — hard to keep biases at bay and remain as dispassionately objective as I’d like on those posts.
Not your problem, mine. Among the many reasons I come here, one is specifically to work at being more rational on controversial topics.
PS: “That we are over-certain in too many things, that we grasp onto and invent quantities simply because they’re quantities. ”
Yes, 100% with you on that; was already there before I started reading you. The way you’ve argued it in the past has been most welcome and helpful.
At 4:19 p.m. 5 Jun ’14, and elsewhere, I think you answer my question (obliquely, it’s true). I could give you ~137 specific chemical questions, but I think you’d say “use your common sense – not everything is quantifiable”. I’m sure you’d cut me to ribbons on most of them. You haven’t (to me) made a case for Bayes, but that doesn’t matter.
Chemists basically work with models – the simpler, the better – we’re dreadfully practical/pragmatic/empirical/whatever. My question was always general – the umbrella works for me.
Late here, so beddy-byes. See you in the morning.
Hamish:
Which I think may be a valid argument for frequentist approaches over Bayesian. But only if the maths are easier, and then only if the easier maths do not unduly degrade the robustness of conclusions.
But I personally can’t ask those questions without understanding the maths better than I do, so I haven’t yet asked.
Hamish,
Well, I haven’t here, in this post, advocated for Bayes. I’ve shown frequentism is faulty and fallacious.
But email with a paper or question. We can go over it.
B,
If frequentism is faulty, then why do its methods “work”? Like I said above, they work when they approximate what is right. That’s what we’re after.
Briggs,
Yep, totally with you on that, been there for some time now. Why use one over the other is my question. I suspect ease of calculation might be one, but not sure. I may have more homework to do before I could parse other answers.
Hamish,
You’ll be up before I am, but briefly and without knowing anything about your problem:
1) Use whatever “objective” Bayesian procedure is usual with real probability priors (no “flat” ones).
2) Use the model in its “predictive” sense, i.e. integrate out the uncertainty of the parameters
3) Create “scenarios” of your observable variables and ascertain what they mean to the uncertainty in the “Y”.
See my book (link on right), or the Classic Posts page for predictive stats.
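For concreteness, a minimal sketch of steps (1) through (3) for a binomial-type problem, with every number assumed for illustration: a proper Beta prior (not flat), a conjugate update, then prediction with the parameter integrated out via the closed-form Beta-Binomial.

```python
from math import comb, exp, lgamma

def betaln(x, y):
    """Log of the Beta function, via log-gamma."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

a, b = 4.0, 2.0                # assumed proper prior: Beta(4, 2), not flat
s, f = 30, 10                  # assumed observed successes and failures
a_post, b_post = a + s, b + f  # conjugate update

def predictive(k, m):
    """Pr(k successes in m future trials), parameter integrated out."""
    return comb(m, k) * exp(betaln(k + a_post, m - k + b_post)
                            - betaln(a_post, b_post))

# A step-(3) style scenario: the uncertainty in ten future observables
for k in range(11):
    print(k, round(predictive(k, 10), 4))
```

The parameters never appear in the answers; only statements about observables do, which is the point of the predictive approach.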
William
Universes with infinite matter are impossible (not unlikely: impossible) on any physics that I have heard of
This is incorrectly formulated.
Finite universes cannot contain infinite matter even if they can contain infinite matter densities.
Granting that our Universe has a finite life (e.g. a beginning), the observable Universe is precisely an example of a finite Universe (some 26 billion light years across) which cannot contain infinite matter.
However there is no certainty that the Universe (without the adjective observable) is finite, so it could very well have infinite matter.
This doesn’t invalidate your argument. It only compels one to make it more accurate: to measure a property of an infinite collection of atoms, one would have to go beyond the observable Universe, which implies a faster-than-light velocity.
As this is impossible according to special relativity, it is irrelevant whether the Universe contains infinite matter or not. The only relevant facts are restricted to the finite observable Universe, and this one has necessarily finite matter.
Tom,
Thank you.
Dear Doc:
I have to disagree – guess I’m a hate filled bigot after all! – again. Mostly I either agree with you or know that I don’t understand, so this is a bit unusual – however, it seems to me that the critique of pure frequentism you offer here amounts to saying that it’s wrong because it’s useless except, perhaps, as an example of the impractical.
That’s wrong – it’s logical and reasonable and perfectly consistent, just not about probability as most people understand it. Basically, if probability were defined a la Cournot et al, all estimates for any P(e) would approach the same limit: 0.5. With all due respect to Poisson there is nothing fishy about that because “p=0.5” can be interpreted either as saying that we have no information or as saying that all outcomes are equally likely – and both are direct consequences of assuming anything infinite exists.
In “The art of the endgame” Paul Keres says, “A bad plan is better than no plan at all”. If I remember correctly, his reason is that a plan will focus your efforts and provide an organising principle which will make your play more effective than just random moves. Maybe we can say the same about frequentism. At least you get to make a decision instead of just sitting by and pointing out that the uncertainty is much larger than we thought.
The same can be said for astrology and reading tea-leaves. Effort is focused instead of diffused and nobody can be sure of the right answer anyway.
The problem in this case is that nobody sees frequentist statistics as a handy though flawed way out of indecision. They see it as The Answer. As they do with astrology and tea-leaves of course.
Briggs, the kind of “frequentism” that Hajek defines (in order to tear down) seems to me to be a bit of a straw man.
According to Wikipedia in a quote (unreferenced!) from Kolmogorov:
“The basis for the applicability of the results of the mathematical theory of probability to real ‘random phenomena’ must depend on some form of the frequency concept of probability, the unavoidable nature of which has been established by von Mises in a spirited manner.” It may be that von Mises or someone else actually proposed the ill-posed definition that Hajek tears down but I think it has been well out of the mainstream for a long time, and I think the intent of Kolmogorov (at least as interpreted nowadays) was not to endorse “limiting frequency” as a *definition* of probability but rather as a condition whose apparent violation in a large sample would by the Law of Large Numbers be expected to be an extremely rare occurrence.
DMayo has noted above that the main emphasis of frequentist statistics is on the study of error rates, and I would go so far as to insist on the idea that a statement of “probability” only makes sense in situations where it can be tested against a sequence or ensemble of experiments or games. This does not claim to define probability as anything other than an *expected* frequency (which may in turn be based on other expected frequencies starting if you like with some assumption of equally likely cases or other “priors”). The only quarrel such a position has with Bayesianism or with your “logical probability” theory is in cases where the probability claim is not open to some kind of frequentist test. It seems to me that you and the Bayesians will always agree with the frequentists whenever the frequentists claim to have an answer, and so it is you rather than the frequentists who is more likely to claim the knowledge of probabilities when such a claim is without foundation.
P.S. A frequentist of the kind I imagine is not precluded from betting on an event like a presidential election because, even though the event “Hillary Clinton wins in 2016” is not testable by repetition, the event “I make a correct political prediction” or, more fully, “When I feel my current level of confidence in such an event it comes to pass with relative frequency p” is so testable and so can be used as the basis for a rational wager.
For readers wanting the other side of the story here, I give a defense of frequentism over at https://rgmccutcheon.wordpress.com/2015/02/09/hajeks-fifteen-non-starters-against-hypothetical-frequentism/