There Is No Prior? What’s A Bayesian To Do? Relax, There’s No Model, Either

I saw colleague Deborah Mayo casting, or rather trying to cast, aspersions on Bayesian philosophy by saying there is “no prior”.

Bayesians might not agree, but it’s true. Mayo’s right. There is no prior.

There’s no prior, all right, but there’s no model, either. So into the tubes goes frequentism right ahead of (subjective) Bayes.

We here at WMBriggs.com adopt a third way: probability. Probability is neither frequentist nor Bayesian, as outlined in that magnificent book Uncertainty: The Soul of Modeling, Probability & Statistics.

Now specifically, Mayo tweeted “There’s no such thing a ‘the prior’! The onus on Bayesians is to defend & explain the meaning/obtainability of the ones they prefer out of many, types (even within a given umbrella. Is the principle of indifference snuck in?”

The way subjective Bayes works is to suppose an ad hoc, pulled from some mysterious dark realm, parameterized continuous probability model. Which is exactly the same way frequentism starts. Bayes puts ad hoc probabilities on the parameters, frequentism doesn’t. Bayes thus has two layers of ad hociness. Frequentism has at least that many, because while frequentism pretends there is no uncertainty in the parameters, except that which can be measured after infinite observations are taken, frequentism adds testing and other fallacious horrors on top of the ad hoc models.

The models don’t exist because they’re made up. The priors are also made up, and so they don’t exist, either. But a frequentist complaining about fiction to a (subjective) Bayesian is like a bureaucrat complaining about the size of government.

Frequentists and, yes, even objective Bayesians believe probability exists. That it’s a thing, that it’s a property of any measurement they care to conjure. Name a measurement, any measurement at all (say, the number of scarf-wearing blue turtles that walk into North American EDs), and voilà!, a destination at infinity is instantaneously created at which the probability of this measurement lives. All we have to do to know this probability is take an infinite number of measurements, and there it is! We’ll then automatically know the probability of the infinity-plus-one measurement without any error.

No frequentist can know any probability because no infinite number of measures has yet been taken. Bayesians of the objective stripe are in the same epistemic canoe. Subjectivists carve flotation devices out of their imaginations and simply make up probability, guesses which are influenced by such things as how many jalapeno peppers they ate the day before and whether a grant is due this week or next month.

Bayesians think like frequentists. That’s because all Bayesians are first initiated into frequentism before they are allowed to be Bayesians. This is like making Catholic priests first become Mormon missionaries. Sounds silly, I know. But it’s a way to fill classrooms.

Frequentists, like Bayesians, and even native probabilists like ourselves can assume probabilities. They can all make statements like “If the probability of this measurement, assuming such-and-such information, is p, then etc. etc. etc.” That’s usually done to turn the measurement into math, and math is easier to work with than logic. Leads to the Deadly Sin of Reification too often, though; but that’s a subject for another time. Point is: there is nothing, save the rare computational error, wrong with this math.

Back to Mayo. Frequentists never give themselves an onus. On justifying their ad hoc models, that is, because they figure probability is real, and that if they didn’t guess just the right parameterized continuous model, it’ll be close enough by the time the happy trail ends at infinity.

Only infinity never comes.

You’d think that, given all we know about the paradoxes that arise from the paths taken to reach infinity, and that most measurements are tiny in number, and that measurements themselves are often ambiguous to a high degree, frequentists would be more circumspect. You would be wrong.

The Third Way is just probability. We take what we know of the measurement and from that deduce the probability. Change this knowledge, change the probability. No big deal. Probability won’t always be quantifiable, or easy, and it won’t always be clear that the continuous infinite approximations we make to our discrete finite measurements will be adequate, but mama never said life was fair. We leave that to SJWs.

If I had my druthers, no student would learn of Bayes (I mean the philosophy; the formula is fine, but itself is not necessary) or frequentism until well on his way to a PhD in historical statistics. We’d start with probability and end with it.

Maybe that’s why they don’t let me teach the kiddies anymore.

Update: I’ll have an article on the Strong Law and why it doesn’t prove probability is ontic, and why using it to show probability is ontic is a circular argument, and why (again) frequentism fails. Look for it after the 4th of July week. Next week will be mostly quiet. If you’re in a hurry, buy the book!

13 Comments

  1. Hi,

    Regarding: “No frequentist can know any probability because no infinite number of measures has yet been taken.”

    You don’t need an “infinity” to talk about probability. This is one of the common fallacies against frequentist probability.

    The Strong Law of Large Numbers says that it is almost certain that between the mth and nth observations in a group of length n, the relative frequency of Heads will remain near the fixed value p and be within the interval [p-e, p+e], for any e > 0, provided that m and n are sufficiently large numbers. That is, P(Heads) in [p-e, p+e] > 1 – 1/(m*e^2). Sufficiently large…no infinity needed. I think most rational people would conclude that after 100 trillion flips of a coin, we have a good handle on the probability of heads of that or a similar coin, and 100 trillion flips of a coin is still less than infinity.

    If you want “exact probability”, I’d say there is no exact in the real world. But we can probably get closer than any measuring device can tell the difference. Tell me an e>0, and I can tell you an m needed to achieve that (which will still be less than infinity). Much like an integral and partial sums. Your computer program is adding up smaller and smaller but more and more rectangles under a curve, not an infinite amount, and those approximate sums sure seem to be OK in the real world even if they aren’t “exact”. Are you really bickering about .5000000000000000000001 vs .5 ?

    The relative frequency sure seems to settle down to a flat line pretty fast when you flip a coin and keep track of the results. Why is that? What is it settling down to then?

    Thanks,
    Justin
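
An aside on the "tell me an e>0, and I can tell you an m" claim in comment 1: the arithmetic can be sketched in a few lines of Python. This is only a sketch under stated assumptions. It takes the bound exactly as the commenter wrote it, 1 – 1/(m*e^2); the function name `flips_needed` and the tolerance `delta` (the largest acceptable chance of landing outside the interval) are hypothetical, introduced here for illustration. Note that the bound itself presupposes a fixed p and independent flips, which is precisely what the post disputes.

```python
from fractions import Fraction
import math


def flips_needed(e, delta):
    """Smallest m with 1/(m * e**2) <= delta, using the bound quoted in comment 1.

    e     : half-width of the interval [p-e, p+e] (pass a string like "0.01"
            so the arithmetic stays exact)
    delta : hypothetical tolerance, the largest acceptable chance of falling
            outside the interval (our assumption, not the commenter's)
    """
    e, delta = Fraction(e), Fraction(delta)
    if e <= 0 or not (0 < delta < 1):
        raise ValueError("need e > 0 and 0 < delta < 1")
    # 1/(m e^2) <= delta  <=>  m >= 1/(delta e^2)
    return math.ceil(1 / (delta * e * e))


if __name__ == "__main__":
    print(flips_needed("0.01", "0.05"))
```

For instance, e = 0.01 and delta = 0.05 give m = 200,000: large, but finite, which is the commenter's point. Whether a finite m settles the argument is what the rest of the thread takes up.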

  2. DAV

    I think your definition of Bayesians is confusing. The difference between Bayesians and Frequentists lies in their view of probability which fundamentally has nothing to do with statistical model construction. Your Third Way is just the Bayes Way sans unobservable model parameters. Perhaps you should be calling converted Frequentists ex-Frequentists and end this double meaning of Bayesian?

  3. DAV

    Justin,

    To a Frequentist, probability is what pops out of the Central Limit Theorem. A limit is when you take something to infinity. It’s still, though, a ratio between relative counts. To a Bayesian, probability is a measure of certainty which may lead you to deducing a relative count but can also be applied to one-time events.

  4. Michael Dowd

    Is frequentism similar to a modal average?

  5. Dav, it can be applied to one-time events, but then a decent question is, how does it perform if we repeat the experiment again, or over and over? That is, does this Bayesian procedure have good frequentist properties?

    Justin

  6. JohnK

    I’m beginning to understand that nothing more weighty than a blank incuriosity is among the (numerous) reasons Matt is not allowed to teach the kiddies anymore. After all, Probability (ontologically) exists because it Just Does.

    That the entire edifice amounts to a series of incantations, spells, appeals to Authority, half-remembered things from Stats class, doesn’t — can’t — even come up. The question, “Does a peach have probability?” is unable even to be formulated.

    The Standard Positions (that is, both frequentism and ‘Bayesianism’) are beyond reproach, precisely because they are beyond ridicule. It would merely irritate the Adept to exclaim to them, “So, a peach ‘has’ no probability, it is not part of its ‘peachiness’, but when two or three peaches are gathered in Probability’s name, there it is in the midst of them?”

    After all, Nassim Taleb explained it all thoroughly — and he banned Matt from tweeting anything at all back to him for very, very good reasons.

    Because Nassim Taleb is a genius. And Matt is not even allowed to teach the kiddies anymore. QED.

    The current edifice weighs down everything. By and large, careful analysis, logic, examples, ridicule, merely bounce off. The Central Limit Theorem proves that a peach has probability. Next question.

    As Matt has documented, cracks in the Giant Machine are beginning to appear. Matt has scientific colleagues, even friends. But the Giant Machine is so very very convenient to so much of Modern Times.

    The very real question is, Why bother to get it (more) right? Is there more money, more power, more influence, more status, in getting it more right in this present moment?

    The answer is obvious.

    I think it comes down to cool. Things will change when the kiddies think it might be cool to wonder, “Does a peach have probability?”

  7. Ed

    Justin says:
    “The Strong Law of Large Numbers says that it is almost certain that between the mth and nth observations in a group of length n, the relative frequency of Heads will remain near the fixed value p and be within the interval [p-e, p+e], for any e > 0, provided that m and n are sufficiently large numbers. That is, P(Heads) in [p-e, p+e] > 1 – 1/(m*e^2). Sufficiently large…no infinity needed. I think most rational people would conclude that after 100 trillion flips of a coin, we have a good handle on the probability of heads of that or a similar coin, and 100 trillion flips of a coin is still less than infinity.”

    This is a common mistaken belief about the various laws of large numbers, the strong law of large numbers uses the probability p of a trial to derive propositions about the frequencies of an INFINITE ensemble of possible FINITE sequences of trials. When YOU have an ACTUAL sequence of trials with a certain frequency of successes and make the inference from this ACTUAL value to the probable value of p you are actually using a prior derived from the infinite ensemble of finite sequences.

    More precisely, you are making the following inference: the probability that my actually obtained frequency is one of the finite sequences whose obtained frequency is inside the interval [p-e, p+e] is the fraction (here’s your prior) of finite sequences in the INFINITE ensemble of finite sequences whose frequencies lie inside the interval [p-e, p+e]; that fraction is > 1-1/(m*e^2).

    As our gracious host is fond of saying: every frequentist is a secret Bayesian.
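
Ed's "fraction of finite sequences" can be computed exactly for small m instead of bounded. A minimal sketch, assuming independent flips with an assumed p (the premise the calculation starts from, not something it establishes); `fraction_within` and `stated_bound` are hypothetical helper names, and the bound is the one quoted from comment 1.

```python
from fractions import Fraction
from math import comb


def fraction_within(m, p, e):
    """Probability-weighted fraction of length-m head/tail sequences whose
    relative frequency of heads lies in [p-e, p+e], GIVEN the assumed p."""
    p, e = Fraction(p), Fraction(e)
    total = Fraction(0)
    for k in range(m + 1):  # k = number of heads in the sequence
        if abs(Fraction(k, m) - p) <= e:
            # comb(m, k) sequences have k heads, each with weight p^k (1-p)^(m-k)
            total += comb(m, k) * p**k * (1 - p) ** (m - k)
    return total


def stated_bound(m, e):
    """The lower bound 1 - 1/(m e^2) as written in comment 1."""
    e = Fraction(e)
    return 1 - 1 / (m * e * e)


if __name__ == "__main__":
    exact = fraction_within(100, "0.5", "0.1")
    print(float(exact), float(stated_bound(100, "0.1")))
```

With m = 100, p = 0.5 and e = 0.1 the exact value is roughly 0.96 while the stated bound is 0, so the bound is loose. Either way p goes in as an input: the output is a statement about sequences given p, not a measurement of p.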

  8. DAV

    Justin,

    It has no frequentist properties at all. It has nothing to do with counts. In fact, one-time events don’t repeat so asking what would happen when repeated is a meaningless question.

    An example, if you have something that can be in one of three states (call them A, B and C) that can change over time — and that’s all you know — what is the probability (your certainty) that the very next state change will result in C? A Frequentist can’t answer but a Bayesian will reply 1/3. Why? Because, lacking any additional information, one can’t have more or less certainty in state change resulting in C than any other state so the certainty in outcome must be equally divided among all possible states. The only thing counted is the number of possible outcomes.

    Later information could change the certainty in outcome between the states.

    Another example (without numbers) is what is the probability the Lakers will win their next game? Since no two games are identical, the next game is a one-time event. All a frequentist can do is answer what the win/loss ratio might be over a number of identical games or perhaps, similar games –whatever that would mean — and assume it somehow applies to the next game. How that ratio is obtained would be a good question.
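
DAV's three-state example is small enough to write out. A minimal sketch, assuming the only premises are the list of possible states plus, optionally, a list of states later ruled out; the function name `deduce` and the extra premise "A cannot occur" are hypothetical, added here only to illustrate "later information could change the certainty".

```python
from fractions import Fraction


def deduce(states, impossible=()):
    """Split certainty equally among the states the premises leave open."""
    open_states = [s for s in states if s not in impossible]
    if not open_states:
        raise ValueError("premises rule out every state")
    share = Fraction(1, len(open_states))
    # States ruled out by the premises get probability 0.
    return {s: (share if s in open_states else Fraction(0)) for s in states}


if __name__ == "__main__":
    print(deduce("ABC")["C"])                    # premises: three states, nothing else
    print(deduce("ABC", impossible={"A"})["C"])  # added premise: A cannot occur
```

Given only "three possible states", C gets 1/3; add the premise that A cannot occur and C gets 1/2. Nothing is counted except the possibilities the premises allow.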

  9. Bill R

    Probability is the future tense of a proportion.

    Why not drop that too and go straight to statistics without probability? (E.g. Wasserman’s blog or the French school) Treat everything as permutation or combinations of orders, with the occasional metric thrown in for distances.

  10. Ken

    Every tool is useful for something, to some degree, some more or less than others. People understand this. When a tool isn’t suitable, they get a better tool for the job. Sometimes they know the tool isn’t suitable, but will work well enough anyway. A screwdriver can make a good chisel even though that’s not its designed-for use, but for some jobs a genuine chisel is required.

    Arguing over whether a given tool (Bayes, Frequentism, etc.) is any good is petty. What matters are a tool’s pros, cons, and limits of suitability. Observing that a do-it-yourselfer relied too often on a screwdriver when they should’ve used a chisel is not a reflection on the tool used. But it is that kind of backward-looking reasoning that Briggs argues justifies abandoning screwdrivers! That’s philosophy, the wrong philosophy, creeping in where it has no business, like an overprotective mother who sees certain children get into conflicts and, rather than having the kids learn how to get along and work things out, refuses to let them play together.

  11. Ed,

    “Justin says:

    “The Strong Law of Large Numbers says that it is almost certain that between the mth and nth observations in a group of length n, the relative frequency of Heads will remain near the fixed value p and be within the interval [p-e, p+e], for any e > 0, provided that m and n are sufficiently large numbers. That is, P(Heads) in [p-e, p+e] > 1 – 1/(m*e^2). Sufficiently large…no infinity needed. I think most rational people would conclude that after 100 trillion flips of a coin, we have a good handle on the probability of heads of that or a similar coin, and 100 trillion flips of a coin is still less than infinity.”

    This is a common mistaken belief about the various laws of large numbers, the strong law of large numbers uses the probability p of a trial to derive propositions about the frequencies of an INFINITE ensemble of possible FINITE sequences of trials. When YOU have an ACTUAL sequence of trials with a certain frequency of successes and make the inference from this ACTUAL value to the probable value of p you are actually using a prior derived from the infinite ensemble of finite sequences.”

    Sorry, I don’t know what you’re saying here with the last sentences (not being snarky, I really don’t know). I got it from von Mises’ book. If I have an actual sequence of 1s and 0s, I can compute the mean at each trial and graph it. I notice this graph settles down more and more as the (finite) trials increase, no matter what the true p is. If I repeat this process many times, the settling down almost always occurs.

    How do you explain the settling down that occurs (not just with simulation, but also in reality with physical coin flips)?

    p. 4 of this http://www.dklevine.com/archive/strong-law.pdf has “Using some Probability Theory, the Strong Law can be rewritten into a form with probabilities involving finitely many random variables only.”, so I’m not totally sure if you need infinity here,

    Thanks,
    Justin
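
The "compute the mean at each trial and graph it" exercise Justin describes is easy to reproduce. A minimal sketch, assuming simulated flips from Python's pseudo-random generator with a chosen p; `running_means` is a hypothetical helper name. Since p is fed in as a premise, the simulation shows what the model implies about finite sequences and cannot by itself say whether a physical coin "has" a probability.

```python
import random


def running_means(p, n, seed=0):
    """Relative frequency of heads after each of n simulated flips.

    p    : assumed chance of heads fed to the simulator (a premise)
    seed : fixes the pseudo-random stream so runs are repeatable
    """
    rng = random.Random(seed)
    heads = 0
    means = []
    for i in range(1, n + 1):
        if rng.random() < p:  # rng.random() is uniform on [0, 1)
            heads += 1
        means.append(heads / i)
    return means


if __name__ == "__main__":
    means = running_means(0.5, 100_000)
    for checkpoint in (10, 100, 1_000, 10_000, 100_000):
        print(checkpoint, means[checkpoint - 1])
```

Printing the running mean at a few checkpoints (or plotting the whole list) shows the settling down Justin asks about, for whatever p was assumed at the start.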

  12. “Justin,

    It has no frequentist properties at all. It has nothing to do with counts. In fact, one-time events don’t repeat so asking what would happen when repeated is a meaningless question.”

    Well, if not identical situation, maybe a very similar situation.

    “An example, if you have something that can be in one of three states (call them A, B and C) that can change over time — and that’s all you know — what is the probability (your certainty) that the very next state change will result in C? A Frequentist can’t answer but a Bayesian will reply 1/3. Why? Because, lacking any additional information, one can’t have more or less certainty in state change resulting in C than any other state so the certainty in outcome must be equally divided among all possible states. The only thing counted is the number of possible outcomes.”

    Yeah, I’m not sure I’d have enough information to make a decision, unless I make an assumption and model it.

    “Another example (without numbers) is what is the probability the Lakers will win their next game? Since no two games are identical, the next game is a one-time event. All a frequentist can do is answer what the win/loss ratio might be over a number of identical games or perhaps, similar games –whatever that would mean — and assume it somehow applies to the next game. How that ratio is obtained would be a good question. ”

    True, or model the individual players on the teams is another approach. I’d say that while the Bayesian can model and plug and chug and get a number, it might not be a “probability”, maybe more like a “personal chance”. That is, I probably wouldn’t have much confidence in their (or mine, or any) analysis with such small n involved.

    If they had good historical success with such an approach I may be more interested in believing it,

    Justin

  13. DAV

    “I’m not sure I’d have enough information to make a decision, unless I make an assumption and model it.”

    Yes. You can add things to premises which will change things but, if you are adding assumptions, you are adding guesses and not information. How does that help?

    “while the Bayesian can model and plug and chug and get a number, it might not be a ‘probability’, maybe more like a ‘personal chance’. That is, I probably wouldn’t have much confidence in their (or mine, or any) analysis with such small n involved.”

    You seem to be insisting on the Frequentist definition of probability: a ratio of counts. To a Bayesian, it’s a measure of confidence (certainty of outcome). Why wouldn’t you have confidence in your confidence measure?

    “If they had good historical success with such an approach”

    What do you mean by “historical success”? How does one measure success in a confidence level?
