Another Proof Against P-Value Reasoning

This isn’t so much an extra proof as a clarification of one of the proofs used in “Everything Wrong With P-values Under One Roof” (the first two arguments).

Calculation of the p-value does not begin until it is accepted or assumed the null is true: p-values only exist when the null is true. Now if we start by accepting the null is true, logically there is only one way to move from this position and show the null is false. That is if we can show that some contradiction follows from assuming the null is true. In other words, we need a proof by contradiction in the following way:

  • If “null true” then some proposition Q is true;
  • Not-Q (Q is false in fact);
  • Then “null true” is false; i.e. the null is false.

Yet there is no proposition Q in frequentist theory consistent with this kind of proof. Indeed, under frequentist theory, which must be adhered to if p-values have any hope of justification, the p-value assuming the null is true is uniformly distributed. This statement (the uniformity of p) is the only Q available. There is no theory in frequentism that makes any claim on the size of p except that it can equally be any value in (0,1). And, of course, every calculated p (except in some circumstances to be mentioned presently) will be in this interval. Thus we have:

  • If “null true” then Q = “p ~ U(0,1)”;
  • p in [0,1] (note the now-sharp bounds).
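The uniformity claim can be checked with a quick simulation. What follows is only a sketch under stated assumptions: a two-sided z-test with known standard deviation stands in for "the test", and the names `z_test_p_value` and `simulate_null_p_values`, the sample size of 30, and the seed are all illustrative choices, not anything from the original paper.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def simulate_null_p_values(n_experiments=10_000, n=30, seed=0):
    """Draw data with the null true (mean 0, sd 1) and collect the p-values."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_experiments)
    ]

p_values = simulate_null_p_values()

# Share of p-values landing in each tenth of the unit interval.
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1
fractions = [count / len(p_values) for count in bins]

# Share of p-values below the conventional 0.05 cutoff.
frac_below_005 = sum(p < 0.05 for p in p_values) / len(p_values)

print("share per tenth of (0,1):", [round(f, 3) for f in fractions])
print("share below 0.05:", round(frac_below_005, 3))
```

With the null true by construction, each tenth of the interval should collect roughly a tenth of the p-values, and about 5% should fall below 0.05. No particular size of p is favored, which is all that Q asserts.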

We cannot move from observing p in (0,1), which is almost always true in practice, to concluding that the null is true. This would be the fallacy of affirming the consequent. On the other hand, in the cases where p in {0,1} (the set with just elements 0 and 1), which happens in practical computation when the sample size is small or when the number of parameters is large, then we have found that p is not in (0,1), and therefore it follows that the null is false. But this is an absurd conclusion when p=1.
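A computed p can land exactly on an endpoint. The sketch below shows two ways this can happen under assumptions of my own choosing: the same known-sigma z-test as an illustrative stand-in, a tiny sample whose mean equals the hypothesized value, and a sample so far from it that floating-point arithmetic rounds the tail area to zero. These are not necessarily the situations the text has in mind (small samples, many parameters), only easy ones to reproduce.

```python
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Sample mean equals the hypothesized mean, so z = 0 and p is exactly 1.
p_one = z_test_p_value([-1.0, 0.0, 1.0])

# Sample mean is absurdly far from the hypothesized mean; the tail
# probability is smaller than floating point can represent near 1.
p_zero = z_test_p_value([50.0, 50.0, 50.0])

print("p for the perfectly centered sample:", p_one)
print("p for the far-away sample:", p_zero)
```

Read through the syllogism above, both results sit outside (0,1), so both would "prove" the null false, including the sample that agrees with the null as closely as any sample could.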

Importantly, there is no statement in frequentist theory that says if the null is true, the p-value will be small, which would contradict the proof that it is uniformly distributed. And there is no theory which shows what values the p-value will take if the null is false. There is no Q which allows a proof by contradiction.

Think of it this way: you begin by declaring “The null is true!”; therefore, it becomes almost impossible to move from that declaration to concluding it is false.

There is no justification for use of p-values other than the will or desire of the user.

9 Thoughts

  1. Considering the proposition “if null true then Q is true”:

    Wouldn’t you need it to be “if and only if null is true then Q is true”? The way it is written, Q could be true whether the null is true or false.

  2. Good rhetoric, but misses the point. It’s not a deduction so proofs are irrelevant. Isn’t that begging the question? Never has been a deduction. It’s a retroduction, aka a Peircean abduction. From https://en.wikipedia.org/wiki/Abductive_reasoning:

    The surprising fact, C, is observed;
    But if A were true, C would be a matter of course,
    Hence, there is reason to suspect that A is true.

    Like induction, it can be in error. (Hence the need for repetition to control errors.) It is Bayesian to the core, where you have to come up with a prior state that explains the observed.

    Also, in passing, I’ve never seen it used as a unitary or sole criterion for real cases (e.g. product release decisions). That seems to be an academic thing.

  3. Per,

    This is only the ordinary modus tollens written out to emphasize our search for a Q.

    William R,

    No, not quite. Your argument is handled in the paper. Don’t forget it’s illegal (if you like) to put any measure of uncertainty on fixed propositions in frequentism. Everybody does it, of course, and so everybody is wrong. Rather, they are not, but frequentism is. There is no way to show A is true, false, or unlikely. See the Holmes/Tukey quotes in the paper.

  4. Matt,

    Not that I’m a frequentist, but where do you get the idea that one can’t make (weak or strong) statements about the relative uncertainty of data given prior conditions? That is specifically allowed, via empirical likelihoods, Bayes’ theorem or parametric likelihoods.

    P-values as percentile empirical ranks with respect to a sampling distribution are just one way, and are particularly apt if you are interested in randomization/exchangeability reference sets. Avoids the whole non-existence of parameters and infinite distributions issue.

    Probability is just the future tense of a proportion.

  6. Bill,

    Hmm. I don’t think I understand your question. Can you restate? I of course agree that we can make probabilistic predictions every time we specify premises for some propositions; these just are predictions, if you like.

    P-values, however, say nothing about what folks think they say something about. Do read the original paper.

  7. @Will

    “Probability is just the future tense of a proportion.”

    Probability needs a conscious observer in order to exist or mean anything. Things/events/etc. do not have any inherent property called probability. We make it up by observing ‘events’ and calculating frequency/probability.

    Probability looks like the future tense of a proportion, but we don’t have access to all future trials.

    If you roll a six-sided die, the probability of getting a certain number is 1/6, but it is 1/6 of all the possible trials, ever, until the end of time. For one trial (experiment) it is not.

    That’s why p values are next to useless, as they are always a one-off.

  8. “If you roll a six-sided die, the probability of getting a certain number is 1/6, but it is 1/6 of all the possible trials, ever, until the end of time. For one trial (experiment) it is not.”

    That’s the frequentist view. To a non-frequentist the probability is the certainty in the result. P(side | object) = 1/6 for a six-sided object means the certainty in any given result is 1/6.

    “That’s why p values are next to useless, as they are always a one-off.”

    P-values are useless for determining the efficacy of the model as they only give the measure of the current fit (the model) with the given data. They say nothing about future performance. In addition, larger data sets tend to have smaller p-values as a matter of course.
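The remark about larger data sets can be made concrete with a little arithmetic. This is a sketch under assumptions of my own: the observed discrepancy from the null is held fixed at a tiny 0.02 standard deviations, the test is a two-sided z-test with known sigma, and `p_value_for_fixed_effect` is a made-up helper name. If the null were exactly true, p would stay uniform at every sample size; the point here is only what happens once any fixed nonzero discrepancy is present.

```python
from statistics import NormalDist

def p_value_for_fixed_effect(effect, n, sigma=1.0):
    """Two-sided z-test p-value when the observed mean differs from the
    null value by `effect`, with n observations and known sigma."""
    z = effect / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

sample_sizes = [10, 100, 1_000, 10_000, 100_000]

# Same tiny observed discrepancy at every sample size.
p_by_n = {n: p_value_for_fixed_effect(0.02, n) for n in sample_sizes}

for n, p in p_by_n.items():
    print(f"n={n:>7}  p={p:.4g}")
```

The discrepancy never changes, yet p marches toward zero as n grows, because the test statistic scales with the square root of n.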
