
P-values vs. Bayes Is A False Dichotomy

A urologist discussing pee values.

There still exist defenders of p-values. The largest class, the superstitious, are those who remember nothing about p-values except that they must be wee. Let us in haste pass by these amateur magicians.

The hardcore cadre of p-value champions are our concern. These fine folks do not recognize that every use of a p-value except one results in a fallacy. P-values cannot discover cause, nor “links”, nor tell you the probability any hypothesis is true, nor judge the goodness or value of any model. They can do one thing alone: they can tell you the probability, given the model and an ad hoc test statistic and assuming the parameter or parameters of that model are set equal to some value, of seeing a larger value of the statistic also given you were able to repeat the identical “experiment” that gave rise to the statistic an infinite number of times.
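
Here is a minimal sketch of that one licit use, assuming (for illustration only) a normal model with its parameter set to zero, the absolute sample mean as the ad hoc statistic, and a large finite stand-in for the impossible infinite repetition:

```python
# A minimal sketch of the definition above. Assumptions (not from the
# post): model y ~ Normal(mu, 1); parameter set to mu = 0; ad hoc
# statistic = absolute value of the sample mean; the "infinite"
# repetition approximated by 100,000 simulated repetitions.
import numpy as np

rng = np.random.default_rng(42)

observed = rng.normal(loc=0.3, scale=1.0, size=20)  # stand-in for the data we saw
t_obs = abs(observed.mean())                         # the ad hoc test statistic

reps = 100_000
t_rep = np.abs(rng.normal(loc=0.0, scale=1.0, size=(reps, 20)).mean(axis=1))

p_value = (t_rep > t_obs).mean()
print(f"approximate p-value: {p_value:.4f}")
# This number says nothing about the probability that mu = 0 is true,
# nor about the probability of the data; it speaks only of larger
# statistics in imagined repetitions with the parameter fixed at zero.
```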

P-values rely on the ad hoc model choice. They rely on the ad hoc model error choice. They rely on the ad hoc statistic. Change any of these, change the p-value. There is no unique p-value for any set of observations.
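
A small sketch of that non-uniqueness, with made-up data and two common, but still ad hoc, test statistics chosen purely for illustration:

```python
# The same data, two different ad hoc test statistics, two different
# p-values. The data and both tests are assumptions for this sketch,
# not anyone's recommendation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 30)
b = rng.normal(0.4, 1.5, 30)

t_p = stats.ttest_ind(a, b, equal_var=False).pvalue  # t statistic
u_p = stats.mannwhitneyu(a, b).pvalue                 # rank-sum statistic

print(f"Welch t-test p-value:   {t_p:.4f}")
print(f"Mann-Whitney U p-value: {u_p:.4f}")
```

Neither number is “the” p-value of these observations; each answers only its own arbitrarily specified calculation.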

P-values do not answer questions people ask. Most ask, “What is the probability of Y given X?” P-values say, “Don’t ask me.” Others query, “What is the probability the parameter lies in this interval?” P-values say, “It is forbidden for me to answer.” Still more want to know, “If I give this patient the new treatment, what is the chance he improves?” P-values say, “Let me be wee!”

Given all this, why are there still p-value champions? Because of their quite realistic fear of out-of-the-box Bayesian procedures.

For one, P-valuers (or most of them) deny that probability can be subjective, as most Bayesians say it is. Probability is not subjective. If your evidence is “There are 99 green states and 1 yellow state in this interocitor, which must take one of these two states”, then it is an unimpeachable statement of subjective probability to say, “Given the evidence, the probability of the yellow state is 82.7578687%”. If probability is subjective, and given there are no such things as interocitors, how can you prove this assessment wrong? Answer: you cannot.

Probability is not subjective, but is instead a deduction, not always quantitative, given evidence. (From the interocitor evidence, for instance, the deduction gives a probability of 1/100 for the yellow state.)

Some Bayesians, however, are “objective”, and reject subjectivism. P-valuers still dislike these Bayesians. Why? Because of “priors”.

Both P-valuers and Bayesians begin by proposing an ad hoc model, usually parameterized on the continuum, or on a segment of the continuum which is itself continuous. A regression, for instance, supposes one set of observables (the “Xs”) relate in a certain linear mathematical way to a parameter or parameters of the main observable (the “Y”).

The next step, also agreed to by P-valuers and Bayesians, is to specify the ad hoc model error. The regression supposes the central parameter of the observable can take any value from negative to positive infinity, and that every value in the continuum is a possibility.

This is a convenient mathematical approximation, but it is always an approximation. Nothing infinite actually exists, and nothing can be measured to infinite precision. No process or sequence goes on to infinity, as the math of the continuum insists. (It remains to be seen whether space itself is continuous in this sense.)
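
For concreteness, here is a sketch of the sort of ad hoc specification both camps typically begin with; every name and number in it is an assumption for illustration, nothing more:

```python
# A sketch of the shared ad hoc starting point: a regression in which
# the central parameter of Y is a linear function of X, with a normal
# "error" allowed to fall anywhere on the continuum.
import numpy as np

rng = np.random.default_rng(7)

n = 50
x = rng.uniform(0.0, 10.0, n)            # the "X" observable
beta0, beta1, sigma = 2.0, 0.5, 1.0      # unobservable parameters of the model
y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)  # the "Y" observable

# Both camps fit this same ad hoc form; they part ways afterwards, the
# P-valuer testing hypotheses about beta1, the Bayesian putting a prior
# on (beta0, beta1, sigma).
slope, intercept = np.polyfit(x, y, 1)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
```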

These ad hoc models are not strictly needed. Finite, discrete models that match the measurement process exactly do exist, but they are not in wide use; indeed, they are mostly unknown.

It is at this point the P-valuers and Bayesians split, the P-valuers to their hypothesis testing fallacies, and the Bayesians to their priors. These are the ad hoc assumptions of the uncertainty of the ad hoc parameters of the ad hoc model.

P-valuers complain that the posterior probabilities of the parameters depend on the choice of prior. But this is a feature, not a bug. It is a feature because all probability is conditional on the assumptions made. It cannot therefore be a surprise that if you change the assumptions, you change the probability, but P-valuers do express surprise.

They do so because frequentist theory says probability exists in the ontological sense. P-valuers know their models are ad hoc but they also believe that by imagining their data could go on forever, the ad hociness vanishes in some mysterious way. Which is false.
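
To put the “feature, not a bug” point in numbers, here is a small sketch with an assumed binomial model, assumed data, and two assumed priors; none of these choices come from the post itself:

```python
# Change the prior (one of the stated assumptions) and the posterior
# changes with it. The binomial model, the data, and both priors are
# assumptions made only for this illustration.
from scipy import stats

successes, trials = 7, 10

priors = {
    "Beta(1, 1), flat":              (1.0, 1.0),
    "Beta(20, 20), tight about 1/2": (20.0, 20.0),
}

for name, (a, b) in priors.items():
    posterior = stats.beta(a + successes, b + trials - successes)
    print(f"{name}: Pr(theta > 0.5 | data, prior) = {posterior.sf(0.5):.3f}")
# Same data, different assumptions, different (and equally valid)
# conditional probabilities.
```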

Now almost all Bayesians stop at talking about posteriors of parameters, as if these parameters were of interest. They too have forgotten the questions people ask.

That means, as should now be clear, there is a third choice between frequentist and Bayesian theory. And that is probability: plain, unadorned, matter-of-fact probability, not about parameters, but directly about observables themselves. This is the so-called predictive approach.
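
A minimal sketch of the predictive idea, reusing the assumed binomial model and flat prior from the sketch above, with the (again assumed) observation that 7 of 10 past patients improved:

```python
# State the probability of the observable itself, not of a parameter.
# Assumed for illustration: a binomial model with a flat Beta(1, 1)
# prior, and 7 of 10 past patients improving on the treatment.
a, b = 1.0, 1.0
successes, trials = 7, 10

# Probability the NEXT patient improves, given the model, the prior,
# and the observations: an answer to a question somebody actually asks.
pr_next = (a + successes) / (a + b + trials)
print(f"Pr(next patient improves | model, prior, data) = {pr_next:.3f}")
```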

Try, when possible, to use a finite-discrete model, deduced from the measurement process. This is the least ad hoc approach of all.

How to do that is detailed in Uncertainty.

Categories: Statistics

8 replies »

  1. The ASA published a statement re p-values generally in line with Briggs’ sentiments (June 2016); various reporting suggests this has been viewed on-line about 200,000 times. The motivation was largely due to p-values being a deciding criterion for selecting research for publication. The ASA’s view on the matter is summed up as: ‘p-values are not badges of truth and p < 0.05 is not a line that separates real results from false ones. They’re simply one piece of a puzzle that should be considered in the context of other evidence’.

    With the ASA's statement were published some 20 authoritative associated commentaries re related viewpoints (abandoning p-values vs recognizing limited utility, trade-offs resulting from taking a given action, etc.), worth reading:

    http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108?scroll=top#.WcUSgLKGPcs

  2. I have read your book carefully twice, and your articles on the subject, as well as several other books, and I have to confess that I have only a vague idea of what you are talking about. You need to take a deep breath and go back and define “p” values in a comprehensible way, as for a 10 year old. You have to explain what is at stake. You did the best job with the example of sub 2.5 micron particles. Perhaps you should revisit that subject, which is still important, and try to explain to a ten year old what is happening at the EPA with regard to 2.5 micron particles and how “p” values fit in. (Urology doesn’t help, by the way). You are an expert, and a resourceful, scintillating writer on almost all subjects. But in discussing your own specialty, like so many others, you are hopelessly murky and confusing. You have to eschew jargon, all the jargon.

  3. George Gilder,
    Here’s a definition which I copied down from the real Briggs, the sensible one:
    “Given the model ASSUMED and
    ONLY the data we’ve SEEN, and
    accepting the null hypothesis IS true and
    it’s calculating the probability of a test statistic greater (in absolute value) than the one obtained,
    if we were to repeat the experiment an infinite number of times”.
    Note the Royal ‘we’.

  4. For George Gilder:

    It is rather a credit to you that your careful reading of the exact meaning of a p-value leaves you confused.

    In his post today, Matt carefully laid out all the meaning there is in any p-value – the sum total of everything that p-values can validly tell you, given their underlying mathematical assumptions and structure: “they can tell you the probability, given the model and an ad hoc test statistic and assuming the parameter or parameters of that model are set equal to some value, of seeing a larger value of the statistic also given you were able to repeat the identical ‘experiment’ that gave rise to [the] statistic an infinite number of times.”

    So, let’s unpack that a little bit. First, note that the p-value is defined to be accurate ONLY under mythical conditions: a) the ‘identical’ experiment (exactly how ‘identical’ is left to our imaginations) b) repeated an infinite number of times (which is impossible within any finite time).

    So, right away, the meaning of any p-value calculated under non-mythical conditions (viz., any actual p-value) is undefined. As in: the rigorous meaning of a p-value actively prevents us from saying what any actual, calculated p-value means.

    Second, notice that the p-value’s meaning depends upon an ‘ad hoc’ test statistic. This means that there is no such thing as ‘THE’ test statistic for a set of data. You can use many different test statistics (and therefore calculate many different ‘p-values’) for the same data. And the question of which test statistic (and therefore, which p-value) is ‘best’ (to publish) is ‘ad hoc’ – that question is formally undefined and cannot be answered mathematically.

    Do you see the irony? Simply by taking the meaning of p-values with complete seriousness, we have already strayed far from mathematical rigor. We see that a) the meaning of any actual calculated p-value is undefined, because it is certainly not part of any infinite series of ‘identical’ experiments; b) a plethora of different test statistics (with differing calculated p-values) can be chosen, and that choice is entirely ‘ad hoc’ – no mathematics exists that points us to the test statistic that is ‘best’.

    And there is more. Of course, absolutely nothing in the meaning of a p-value tells us what a ‘low’ p-value is. There is no mathematical justification whatever for taking a p-value less than 0.05 seriously, or for not taking one greater than 0.50 seriously. That, too, is completely ‘ad hoc’.

    And there is more. As Matt stated, the meaning of a p-value rests on an assumption that “the parameter or parameters of that model are set equal to some value.” Obviously, if our assumptions about the parameters are incorrect, then our calculation of the p-value is invalid.

    And commonly, we ‘check’ the correctness of our assumptions about these parameters by calculating a ‘low’ p-value. But in a famous 1987 paper, Berger and Sellke established that it is inaccurate to say that a ‘low’ p-value justifies our assumptions about the parameters. So the p-value does not provide a valid ‘check’ of our assumptions about the parameters of the probability distributions of our variables. We still must assume that they are “set equal to some value,” though the p-value cannot tell us whether they are.

    And so on. It is important to note that Matt delineated the entire meaning of p-values. It is questionable whether ‘p-value’ has any realistic meaning whatever, given that its definition relies on an infinite repetition. But even aside from this, p-values never had any valid meaning beyond the one that Matt gave, and they never will.

    Thus the key: the meaning, and hence the utility, of p-values, cannot be validly extended to include the meanings ascribed to p-values in contemporary research.

    Hence, p-values cannot be used to determine cause. They cannot be used to ‘compare’ a model to a ‘null hypothesis’. They cannot even be reliably used to tell you if your assumptions about your ad hoc parameter values are correct. The question of which p-value is ‘best’ (to publish) is formally undefined and cannot be answered mathematically. And on and on.

    You are over-thinking this. There’s nothing to ‘understand’. It’s just that the implication – what is at stake – is so stark.

    What is at stake? The entire contemporary p-value enterprise is therefore founded on chimeras, phantasms. Every contemporary use of p-values is bereft of any valid mathematical justification.

    Perhaps an analogy can help. From now on, call all p-values-in-use “phlogiston-values,” instead. It would not be too far, at all, to state that a “p-value” in contemporary research has much the same utility and meaning as “phlogiston.”

    P-values-in-use, exactly like the term “phlogiston,” are located within a universe of untenable assumptions, spurious logic, and invalid conclusions; and they only make sense within that universe, and no other.

    You are over-thinking. A concept like “p-value” cannot be ‘understood’; it can only be abandoned. There is no way to ‘recover’ it, modify it, or tinker with it, that will save it. It belongs to, and was created, and lives, entirely within a vaguely-linked universe of quasi-meanings that will never add up to a single sound thought. Run away, and think of it no more. Think “Phlogiston-value”, and much will become clear.

