Philosophy

The Bayesian Metaphor Can Do More Harm Than Good: Update

A Bayesian bat.

Quoting from a post on vampires, “In Bayesian inference, you start with some initial beliefs (called ‘Bayesian priors’ or just ‘priors’), and then you ‘update’ them as you receive new evidence.”

This is the standard metaphor, and it’s not so much wrong as unhelpful, misleading, and restricting. The metaphor derives from Bayes’s rule (the details of which can be looked up anywhere), which gives a formula whose right-hand side contains an element that is supposed to represent “prior beliefs.” The formula itself is correct, as most math is. But that the math is correct does not mean that it means what you think it means.

(Incidentally, all (as in all) frequentist methods should be dumped forthwith: no hypothesis testing, no p-values, no parameters, no infinities. It is a false dichotomy to suppose that if not Bayes then frequentist, and vice versa.)

You’re interested in some proposition Y. Is Y true or false? How do we know? Well, by identifying, if we can, what caused Y to be true or false. Failing that, we discover evidence which is probative of Y. Gather everything you can related to Y and call this evidence, which is in the form of a complex proposition, E. What we want is then:

     Pr(Y | E).

Of course, E may contain knowledge of the cause or determination of Y (or not-Y), in which case Pr(Y | E) = 1 (or Pr(Y | E) = 0). It is also not the case that Pr(Y | E) need be a unique number, or any number at all. Not all beliefs are quantifiable.

Anyway, point is, Pr(Y | E) is what we want, and it is all we really need. There is no Bayesian metaphor of a “prior” and “posterior” needed. If Pr(Y | E) does have a numerical value, we can figure it out directly. There is no absolute need to invoke Bayes’s formula. Sometimes, Bayes’s formula is just the thing, as an aid in computation. But other times it is a hindrance. The example at this link shows Bayes can lead to more work (this was a post restored after the website was hacked). Bayes gets to the right answer, of course, because again there is nothing wrong with the math. But an equation does not have to be used because one has it in hand.

It’s not only that Bayes sometimes causes extra effort, but the equation can cause a weird form of reification, especially when it is written incorrectly, like this:

     Pr(Y | X) = Pr(X | Y) x Pr( Y ) / Pr( X )

This is wrong because there is no such thing as “Pr(Y)” or “Pr(X)”. These objects do not exist. Numbers can be put in their place and the equation can be made to work out, but it is the step of putting numbers in that is wrong. There is no such thing as an unconditional probability, so we can never write without error “Pr(Y)” or “Pr(X)”. Instead, we should write e.g. Pr(Y|W) or Pr(X|W), where W is the knowledge we start with, i.e. our real prior (knowledge).
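For comparison, here is a sketch of how the formula might look with the conditioning made explicit. It is only one way of writing it, assuming, as in the paragraph above, that W stands for the knowledge we start with and that XW means X and W taken together:

```latex
\Pr(Y \mid XW) = \frac{\Pr(X \mid YW)\,\Pr(Y \mid W)}{\Pr(X \mid W)}
```

Every term now carries the evidence it depends on, so there is no bare “Pr(Y)” left to tempt anyone into assigning a number without saying what it is conditional on.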

The term “Pr(Y)” is where the trouble starts. It encourages people to put a number on the “prior”, usually subjective, belief we have in Y. We then “update” this belief by adding in knowledge of X. Sometimes people say they are ignorant of Y entirely, yet still they place a number on Pr(Y), which is, of course, a contradiction. If we truly know nothing of Y, then no number at all can be assigned to its probability.

For instance, I have a proposition Y in mind as I write this. What, dear reader, is your “prior” on its truth? Notice I tell you nothing about Y. You are as ignorant as possible. You don’t even know if Y is simple or multi-dimensional. Real ignorance means no probability. This result (to give it a grandiose name) applies with greater force to non-observable (continuous) parameters, too. Ignoring this result leads to the well-known paradoxes of Bayesian statistics and to the use of “improper” priors, which aren’t probabilities at all.

So we must start with some kind of knowledge about Y, if only a tautology. I.e., we must have at least Pr(Y | T), where T is any tautology. This form does not give a unique numerical value, of course. But it is at least philosophically tenable. In any case, if we start with some W and add to this knowledge X, we still desire Pr(Y | WX), which can be got by direct examination or via Bayes’s formula.
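As a small illustration of the two routes to Pr(Y | WX), here is a toy sketch in Python. Everything in it is an assumption made up for the example, not something from the argument above: W is taken to be “two six-sided dice, with no evidence favoring any side”, Y is “the sum is 8”, and X is “the first die is even”. The helper `pr` is hypothetical and simply counts the outcomes W allows.

```python
from fractions import Fraction
from itertools import product

# W (assumed for illustration): two six-sided dice, and nothing in our
# evidence favors any of the 36 possible outcomes over another.
outcomes = list(product(range(1, 7), repeat=2))


def always(outcome):
    """No evidence beyond W."""
    return True


def pr(event, given=always):
    """Pr(event | given, W), found by counting the outcomes W allows."""
    allowed = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in allowed if event(o)), len(allowed))


def Y(outcome):
    """Proposition of interest: the sum is 8."""
    return outcome[0] + outcome[1] == 8


def X(outcome):
    """New evidence: the first die is even."""
    return outcome[0] % 2 == 0


# Route 1: direct examination of Pr(Y | WX).
direct = pr(Y, given=X)

# Route 2: Bayes's formula, Pr(X | YW) * Pr(Y | W) / Pr(X | W).
via_bayes = pr(X, given=Y) * pr(Y) / pr(X)

print(direct, via_bayes)
```

Both routes give the same value, as they must, but the direct route needs one count and the formula needs three. Note also that every quantity here is conditional on W; nothing in the code plays the part of a bare “Pr(Y)”.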

But that’s still not right, because usually when we learn X, we modify W, often by tossing out parts of it or by modifying it to some extent so that it is “closer” to X. So we never really compute Pr(Y | WX) but instead find something like Pr(Y | W’X), where W’ does not equal W, and where Pr(Y | W’X) isn’t and cannot be found by using the prior Pr(Y|W).

The reification enters with the quantification. Because numbers are often shoehorned onto the formula, Pr( Y| WX ) takes on life and becomes “the” probability of Y, instead of the evidence for Y given WX, which may or may not really be quantifiable.

The point of the linked vampire post, that tradition, given to us by our elders and old books, must not be ignored and must be given, especially in our rapidly declining age, more weight than our present, is a conclusion with which I am in complete agreement.

Update: Bayesian “updating” is everywhere! See “How to get the most out of realizing you are wrong by using Bayes’ Theorem to update your beliefs”.

Categories: Philosophy, Statistics

6 replies »

  1. Great post; it’s got me more impatient for the book.

    I like the criticism that Bayes’ Theorem isn’t anything special. Since it’s derivable from simple set theory and Venn diagrams, its elevation to the status of “this is how the human mind works” is just more reification.

  2. Curiously, I am left unimpressed by a tacit admission that mathemagician’s fancies and fabrications do not re(-)present reality.

  3. Are there any plans to give out a posterior version of “Breaking the Law of Averages”? There seems to be a high need for the updated book with an extensive Q&A section.

  4. Although I could see the math was right, I have always resisted Bayesian methods because I saw no logical method of limiting the priors, i.e. in the extended form (read the lowercase i and j as subscripts)
    P(Ai | B) = P(B|Ai)P(Ai) / Σj P(B|Aj)P(Aj)
    I couldn’t find a logical method of limiting the Aj’s.

  5. Thank you for the post. I must say, though, that I’m not buying it.

    You say the following: “You’re interested in some proposition Y. Is Y true or false? How do we know? Well, by identifying, if we can, what caused Y to be true or false”.

    To know if Y is true or false, I don’t have to know what caused it. For example, to know if someone is dead, I don’t have to solve who the murderer is. Instead, I check their pulse. The cause is not needed to determine truth.

    Please correct me if I am wrong. Thanks!
