Bayesian Theorists Were Little Better Than Cranks

I stole today’s title from David Papineau’s essay “Thomas Bayes and the crisis in science”, which many readers sent in.

When I was in grad school back in the early to mid 1990s, Bayes was just past its first flush of respectability, which it gained mostly in the 1980s. But then, as now, and as you’ve all heard me lament before, all statisticians must first be initiated into frequentism. As such, they find it difficult to overcome. The experience is not unlike trying to leave the religion of your youth. Sure, you can stop practicing it. But you can never stop feeling its influence.

This is why you still hear from self-styled Bayesians admonitions to develop Bayesian procedures with “good frequentist properties”, which is (a) begging a Simpson’s paradox-type situation, and (b) incoherent. If Bayes is right (about which sense, more in a moment), then it’s always right and frequentism is wrong, and vice versa. The two are not compatible philosophies of probability.

See Uncertainty: The Soul of Modeling, Probability & Statistics for more on all this, incidentally.

Anyway, Bayes has three interpretations. The subjective which says, and I do not jest, probability is a function of the indigestibility of your food. The probability of any proposition is how you feel about it. It is therefore an effeminate philosophy (do not confuse feminine with effeminate). The objective, which is frequentist in character, and which thinks probability is ontic. This is a mistake. And then the logical, which says probability is epistemic. This is the correct view (which is not really called “Bayesian” by anybody, though people use it that way). I’m not proving this here: I’m telling you. Read the book for arguments.

The importance of Bayes is not in the formula, as I have stressed hundreds of times, to little avail. The formula is not strictly needed, not ever. It is nice, it is helpful. But that is it. What we always want is

     Pr(Y | X)

where Y is the proposition of interest and X is the totality—I’d shout this if I thought it would do any good—of evidence. This probability is not always quantifiable. Tough cookies. How we get to Pr(Y | X) is only of interest to technicians, and is where the formula might be of use. But it is always beside the point.

Which means all the ya-ya-ya about “updating beliefs” is beside the point. First, subjective probability is wrong, and second, the update is a technical matter. What always counts is the totality of evidence you accept. And the evidence you accept is not necessarily the same as I accept—or the same as anybody else accepts. Hence disputes. Probability is only a dull function of the evidence accepted.
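To make the point concrete, here is a minimal sketch of the same proposition Y receiving different probabilities under different accepted evidence X. Everything in it is an illustrative assumption, not something from Papineau's essay: the six-sided object, the helper name `probability`, and above all the rule of counting the allowed states equally, which only applies when the evidence gives no reason to favor one state over another. Plenty of real evidence will not reduce to a count like this.

```python
from fractions import Fraction


def probability(y, evidence, states):
    """Pr(Y | X) by counting: the share of states allowed by the
    evidence in which Y holds. Assumes the evidence gives no reason
    to weight any allowed state more than another."""
    allowed = [s for s in states if all(e(s) for e in evidence)]
    if not allowed:
        raise ValueError("the evidence excludes every state")
    return Fraction(sum(1 for s in allowed if y(s)), len(allowed))


# Hypothetical setup: a six-sided object, exactly one side will show.
states = range(1, 7)


def shows_six(s):
    return s == 6


# Evidence I accept: nothing beyond the setup itself.
x_mine = []
# Evidence you accept: the setup, plus "the side showing is even".
x_yours = [lambda s: s % 2 == 0]

p_mine = probability(shows_six, x_mine, states)
p_yours = probability(shows_six, x_yours, states)
print(p_mine, p_yours)  # 1/6 1/3
```

Same Y, two different answers, and neither is wrong: each is correct given the evidence accepted. No formula was needed, only the evidence.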

The real revolution in Bayesian thought is that everything uncertain can be assigned a probability, though not always in number. There is nothing wrong with that sentiment, and everything right. But like I just said in other words, it is the evidence which counts. And only the evidence. The math connecting evidence to probability (the least interesting aspect) we can leave to geeks and nerds.

This is why we know statements like the following (from the article) are false in the strict sense:

Bayes’s reasoning works best when we can assign clear initial probabilities to the hypotheses we are interested in, as when our knowledge of the minting machine gives us initial probabilities for fair and biased coins.

No. What works best is assembling the evidence that comes closest to showing the cause of the proposition of interest Y. The wrong path has already been chosen, as we see by the next sentence: “But such well-defined ‘prior probabilities’ are not always available.”
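For readers who have not seen the minting-machine calculation the quoted passage has in mind, it can be sketched as below. The numbers are made up for illustration (a machine that turns out nine fair coins for every biased one, a biased coin that lands heads three times in four, five heads observed in five flips), and the names `posterior`, `priors`, and `heads_chance` are hypothetical. Note that the "initial probabilities" are themselves deduced from evidence about the machine.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' formula over a finite set of hypotheses:
    Pr(H | data) is proportional to Pr(H) * Pr(data | H)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


# Assumed evidence about the machine (invented numbers).
priors = {"fair": Fraction(9, 10), "biased": Fraction(1, 10)}
heads_chance = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

# Assumed observation: five heads in five flips.
n_heads = 5
likelihoods = {h: heads_chance[h] ** n_heads for h in priors}

post = posterior(priors, likelihoods)
print(post["fair"], post["biased"])  # 32/59 27/59
```

The arithmetic is the easy part. Change what is accepted about the machine and the answer changes with it.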

We don’t need “prior probabilities” on the theory that some thing causes heart attacks. We need evidence that it does or doesn’t. Sometimes we start out ignorant. So what? We build evidence from that ignorance.

Thinking Bayes is a panacea, or a universal formula, is why die-hard frequentists are still scared of leaving their incorrect theory of probability. No panacea exists. Subjectivism is silly. And they are right.

But it is a false dichotomy to insist on either subjective/objective Bayes or frequentism. There is a third way.

Categories: Statistics

13 replies »

  1. Results have causes. We can all agree on this. Well, except for anti-theists and some physicists.

    The disputes arise here: Not all causes lead to the same results, and not all results derive from the same causes. We call this “random” deviation. Then we debate endlessly over a lack of information, and invent ways to pretend that the lack of information isn’t important.

  2. “Results have causes. We can all agree on this.”

    Yes, they do, but it is because of our perception of time. Something (A) comes before something else (B), thus B must have been ‘caused’ by A. Only our conscious perception allows us to call something a ‘cause’ and the other thing a ‘result’.

    The problem is that the dimension we call ‘time’ is just a function of our memory decay and we perceive it in one direction only. On top of that, there’s about a 1/4 to 1/2 of a second delay, so we never consciously experience the presence of anything. You can’t ‘live in the moment’, so to speak.

    What if everything happens at the same ‘time’? Then there is no cause and no result.

    I believe that in higher physics dealing with multiple dimensions, one can remove the time variable and the other ‘things’ will work just fine.

  3. Every time I read one of these attempts, I hear myself attempting to point at simple ideas while flailing my hands.

    The priors are never quite as given as we hope. With many of the stories coming out today that are pushing inverted rationale, and the more than 0 people believing in that rationale, I wonder if I am not stuck in the Matrix. “The bullets from an AR-15 do terrifying damage!” The bullets from a 45 don’t? The bullets from a 9mm don’t? WHAT? When starting a conversation with such a person, there is no prior to cling to.

    And there is sense in frequentists not wanting to join reality. At least we can fall back on a die and point at the six sides of the die and have a starting place. There is something tangible to put in people’s hands and the frequentists can roll dice to their hearts’ content making distribution curves of so many varieties… Breaking out of those distribution curves is both necessary and terrifyingly dangerous. You suddenly enter a realm that doesn’t quantify well. What is a frequentist worth if he can’t quantify?

    I always wear my seatbelt, but I have never really used it.

    I always brush my teeth, but it turns out that I was brushing too hard.

  4. Briggs, what is your definition of probability? I.e., what’s the correct definition? 🙂

    It sounds as if it’s only sharply defined in trivial cases, such as when we choose more-or-less randomly from a well-defined sample space. But in the cases which matter, such as the probability that a certain factor is the cause of a disease, or the probability of success in a certain endeavor, what does probability even mean?

  5. Thanks. I presume, then, that you don’t put forward a relatively short, one-size-fits-all, no-more-than-a-sentence-or-paragraph definition.

  6. Alan,
    If you want a mathematical treatment, Wikipedia has a nice starting point under “Probability axioms”, and for the logicist approach under “Cox’s theorem”.

    I personally tend towards the first: a positive-valued additive measure on a set of mutually exclusive and exhaustive propositions/events/states. But then I usually was dealing with experiments and real outcomes/actions in an adversarial context (product development), where it was possible to actually count the outcomes (kinda-sorta). And I had to “eat my own cooking.”

  7. From Wikipedia:
    “99942 Apophis is a near-Earth asteroid that caused a brief period of concern in December 2004 because initial observations indicated a probability of up to 2.7% that it would hit Earth on April 13, 2029.”

    What does the probability of 2.7% mean to a statistician? What is the operational meaning of this quantification of uncertainty?

    As a physicist I interpret it this way: there is some error in the measurements of the asteroid’s position (dx) and velocity (dv) that were presumably done in 2004. Now, create an ensemble of trajectories starting from the zone (dx * dv) of initial uncertainty and observe how many of these trajectories intersect with Earth’s trajectory in 2029. That gives me the probability in a concrete operational sense.
    I suppose this interpretation is frequentist and I am sure this is how probability is understood by working scientists.

  8. Alan,

    Aha. Sorry, I had written about it so often, I just assumed. Probability, like logic, is a (not always quantifiable) measure of the relation between propositions. Probability, like logic, is the relation itself, and is silent on the nature of the propositions themselves. Why, how, and so forth can be found in many of the linked articles.

  9. Kalif – No. Just no. Time is real. It is a fundamental part of the structure of the universe. Just because two different people disagree on what time something happened does not invalidate the existence of time. It just makes our experience of its passage subjective.

    But this is the place to talk about probability and statistics, not advanced physics, so I’ll stop now.

  10. Why should we be comfortable saying Y is the evidence *I* accept and that you and I might differ on what evidence to accept, but uncomfortable with the idea of subjective prior probabilities?

    I don’t get it.

  11. Regarding “updating”, given the emphasis upon “totality of evidence”, what if the “update” is a mutation in the model, which affects, when it is used, the character of the likelihood function? Does one enumerate all possible models, then, run them on the data, and elect the “best” using a device like Bayes Factors?

    And what if the “update” is data, but rather than coming from a measurement with its own description of variability, the update is the result of a new calculation. Sure, the calculation will have a description of the uncertainty which attends its predictions, but it is structurally different than the “updates” provided by another, competing calculation. How to quantify the contrast between the two structures? What’s the “uncertainty” of such?

    I don’t mean to be difficult, not at all. I do consider myself to be a full Bayesian, although I, of course, don’t buy Digestive Subjectivity. It’s just that the program or approach suggested in the above — totality of evidence — seems to fully embrace these kinds of things.

    There may be approaches to this, such as Empirical Likelihood or Approximate Bayesian Computation, applicable to invertible models, but these are serious questions.
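A note on the ensemble reading in comment 7: it can be sketched in a few lines. This is a toy, not the orbit calculation anybody actually performed. It assumes one-dimensional straight-line motion, Gaussian measurement errors, arbitrary units, and invented numbers; the function name `hit_fraction` and all of its parameters are hypothetical.

```python
import random


def hit_fraction(n, x_nominal, v_nominal, dx, dv, duration, radius, seed=0):
    """Fraction of sampled trajectories that pass within `radius` of
    the target after `duration`. The target sits at position 0."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(n):
        # Draw one starting state from the assumed measurement error.
        x0 = rng.gauss(x_nominal, dx)
        v0 = rng.gauss(v_nominal, dv)
        # Toy dynamics: constant velocity along a line.
        miss = x0 + v0 * duration
        if abs(miss) < radius:
            hits += 1
    return hits / n


estimate = hit_fraction(
    n=100_000,
    x_nominal=0.0, v_nominal=0.2,  # invented nominal measurements
    dx=1.0, dv=0.1,                # invented measurement uncertainties
    duration=25.0, radius=0.5,
)
print(estimate)
```

The fraction that comes out depends entirely on what went in: the error model, the dynamics, and the measurements accepted. Change any of them and the number changes.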
