JASA: The Substitute for P-Values

The paper is finally out: The Substitute for p-Values, in the Journal of the American Statistical Association, Volume 112, Issue 519, 2017, pp. 897-898. Here is the abstract (this paper was peer reviewed, so you know every word is true):

If it was not obvious before, after reading McShane and Gal, the conclusion is that p-values should be proscribed. There are no good uses for them; indeed, every use either violates frequentist theory, is fallacious, or is based on a misunderstanding. A replacement for p-values is suggested, based on predictive models.

Says things like:

There are no good reasons nor good ways to use p-values. They should be retired forthwith….We have interest in proposition Y, which might be “This patient gets better.” We want the probability Y is true given we know X_0 = “The patient will be treated by the usual protocol” or X_1 = “The patient will be treated by the New & Improved! protocol.” We have a collection of observations D detailing where patients improved or not and which protocol they received.

We want Pr(Y| X_i D).

Strangely, none of the usual procedures, Bayes or frequentist, give this commonsense answer.
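The predictive idea can be sketched with a minimal beta-binomial model. This is a hypothetical illustration, not a calculation from the paper: the counts are invented, and the uniform Beta(1,1) prior is an assumption made here for simplicity. Under that prior, the posterior predictive probability that the next patient improves is (s + 1)/(n + 2), Laplace's rule of succession.

```python
# Sketch of the predictive replacement for p-values: compute
# Pr(Y | X_i, D) directly, for each protocol, from the observed data D.
# Hypothetical counts; uniform Beta(1,1) prior assumed on each
# protocol's improvement rate.

def predictive_prob(successes: int, trials: int) -> float:
    """Pr(Y | X_i, D): probability the next patient on protocol i improves.

    With a Beta(1,1) prior and s successes in n trials, the posterior
    predictive probability is (s + 1) / (n + 2) (rule of succession).
    """
    return (successes + 1) / (trials + 2)

# D: invented observations for each protocol.
usual = predictive_prob(successes=40, trials=100)     # usual protocol
improved = predictive_prob(successes=55, trials=100)  # New & Improved! protocol

print(f"Pr(improve | usual protocol, D)    = {usual:.3f}")
print(f"Pr(improve | improved protocol, D) = {improved:.3f}")
```

The output is a pair of directly comparable probabilities about the observable Y itself, which is the commonsense answer the passage above asks for; no null hypothesis or threshold appears anywhere.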

This paper was an invited response to Blakeley B. McShane & David Gal’s “Statistical Significance and the Dichotomization of Evidence“. Abstract:

In light of recent concerns about reproducibility and replicability, the ASA issued a Statement on Statistical Significance and p-values aimed at those who are not primarily statisticians. While the ASA Statement notes that statistical significance and p-values are “commonly misused and misinterpreted,” it does not discuss and document broader implications of these errors for the interpretation of evidence. In this article, we review research on how applied researchers who are not primarily statisticians misuse and misinterpret p-values in practice and how this can lead to errors in the interpretation of evidence. We also present new data showing, perhaps surprisingly, that researchers who are primarily statisticians are also prone to misuse and misinterpret p-values thus resulting in similar errors. In particular, we show that statisticians tend to interpret evidence dichotomously based on whether or not a p-value crosses the conventional 0.05 threshold for statistical significance. We discuss implications and offer recommendations.

Besides my response, there is also one by Donald Berry, “A p-Value to Die For“; there is no Abstract for it, but a couple of paragraphs of the response are at the link. I can also tell you—gleefully—that it contains the sentence “We have saddled ourselves with perversions of logic—p-values—and so we deserve our collective fate.” Berry also says:

I just Googled “p < 0.05 implies statistical significance” and found this garbage on the first site listed: “Most authors refer to statistically significant as P < 0.05 and statistically highly significant as P < 0.001 (less than one in a thousand chance of being wrong).” Many of these articles have statisticians as co-authors. They are our students, our colleagues, us!

Another is from Andrew Gelman & John Carlin: “Some Natural Solutions to the p-Value Communication Problem—and Why They Won’t Work“. No Abstract there, either.

They say things like “A common conceptual error is that researchers take the rejection of a straw-man null as evidence in favor of their preferred alternative…Unfortunately…a low p-value is not necessarily strong evidence against the null…” And “[while] confidence intervals contain some information beyond that in p-values, they do not resolve the larger problems that arise from attempting to get near-certainty out of noisy estimates.”

Eric B. Laber & Kerby Shedden provide the last response, “Statistical Significance and the Dichotomization of Evidence: The Relevance of the ASA Statement on Statistical Significance and p-Values for Statisticians“. No Abstract. These authors are pro wee p.

Finally, McShane and Gal have a Rejoinder. It says things like:

…because individual-level measurements are typically quite errorful, sample sizes are not especially large, and effects are small and variable, study estimates are themselves often rather noisy; noisy estimates in combination with the fact that the publication process typically screens for statistical significance results in published estimates that are biased upward (potentially to a large degree) and often of the wrong sign…Further, the screening of estimates for statistical significance by the publication process to some degree almost encourages researchers to conduct studies with errorful measurements and small sample sizes because such studies will often yield one or more statistically significant results. Of course, all of these issues are further compounded when researchers engage in multiple comparisons—whether actual or potential.

Amen.

4 Comments

  1. Congratulations, of course – for the paper, to be sure, but also, and I think much more importantly, that it was an invited comment.

    This is progress.

  2. Yes, congratulations on the invited comment. We, of course, still disagree on much, but invited comments are important, career-wise.

  3. A concise and clear paper, but it would be good to see some real world examples with M and D specified and the probability of Y calculated. In my world it would be useful to have a model that estimates the probability of a student passing a college course with a productive grade (anything but a D, F, Incomplete, or Withdrawal). How about a demonstration of actually developing one?
