Manipulating the Alpha Level Cannot Cure Significance Testing — Update: Paper Finally Live!

A new paper has been submitted to a well-known journal, “Manipulating the Alpha Level Cannot Cure Significance Testing: Comments on ‘Redefine Statistical Significance’”, by David Trafimow, Valentin Amrhein, Fernando Marmolejo-Ramos, and (count ’em!) fifty-one others, among whom is Yours Truly. The other authors kindly and graciously allowed me to add my Amen, for which I am most grateful.

The “comments” refer to the paper by DJ Benjamin, Jim Berger, and a slew of others, “Redefine statistical significance” in Nature Human Behaviour 1, 0189. Our submission is to the same journal, obviously as a rebuttal.

We looked at Benjamin before, in the post “Eliminate The P-Value (and Bayes Factor) Altogether & Replace It With This”. The replacement is predictive modeling, which I wrote about extensively in Uncertainty and briefly in the JASA paper “The Substitute for P-Values”.

From the new paper, the One sentence summary: “We argue that depending on p-values to reject null hypotheses, including a recent call for changing the canonical alpha level for statistical significance from .05 to .005, is deleterious for the finding of new discoveries and the progress of cumulative science.”

You may download the entire paper as a PDF preprint at PeerJ Preprints.

Here (not set in blockquote to avoid the italics) is the entire Conclusion. Help spread the word! It’s time to kill off p-values and “null hypothesis” significance testing once and for all — and restore a great portion of Uncertainty that has falsely been killed off. (Yes, Uncertainty.)

Conclusion

It seems appropriate to conclude with the basic issue that has been with us from the beginning. Should p-values and p-value thresholds be used as the main criterion for making publication decisions? The mere fact that researchers are concerned with replication, however it is conceptualized, indicates an appreciation that single studies are rarely definitive and rarely justify a final decision. Thus, p-value criteria may not be very sensible. A counterargument might be that researchers often make decisions about what to believe, and using p-value criteria formalize what otherwise would be an informal process. But this counterargument is too simplistic. When evaluating the strength of the evidence, sophisticated researchers consider, in an admittedly subjective way, theoretical considerations such as scope, explanatory breadth, and predictive power; the worth of the auxiliary assumptions connecting nonobservational terms in theories to observational terms in empirical hypotheses; the strength of the experimental design; or implications for applications. To boil all this down to a binary decision based on a p-value threshold of .05, .01, .005, or anything else, is not acceptable.
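
As a rough illustration of the binary-threshold point in the Conclusion (a sketch that is not from the paper; the study names and p-values are invented for the example), the Python below shows how the same result counts as "significant" or not depending only on which alpha is picked:

```python
# Hypothetical p-values, invented purely for illustration; not from any real study.
P_VALUES = {"study_a": 0.030, "study_b": 0.008, "study_c": 0.004}

# The three thresholds named in the Conclusion.
ALPHAS = (0.05, 0.01, 0.005)


def verdicts(p_values, alphas):
    """Return, for each study, whether p < alpha under each candidate alpha."""
    return {
        name: {alpha: p < alpha for alpha in alphas}
        for name, p in p_values.items()
    }


if __name__ == "__main__":
    for name, row in verdicts(P_VALUES, ALPHAS).items():
        # Each row shows one fixed p-value judged against every cutoff.
        print(name, P_VALUES[name], row)
```

Nothing about the data changes from one cutoff to the next; only the verdict does, which is the sense in which the decision depends on the threshold and not on the evidence.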

UPDATE PeerJ says “This manuscript has been submitted and is being checked by PeerJ staff.” I thought it would have cleared by now. It hasn’t, so the link above won’t work yet, as John discovered. Once the paper clears, I’ll update again. Sorry for the confusion.

UPDATE The difficulty is that PeerJ says all authors have to confirm authorship, which means 54 people have to sign up for an account, etc. Stay tuned.

UPDATE Paper is finally live! Follow this link!

5 Comments

  1. I cannot access the preprint. I signed up for PeerJ and verified my email address with them, but then I get this error:

    “403 Access Denied
    Manuscripts being checked by staff, in review or already accepted for publication cannot be edited unless returned by staff. Additionally, only the corresponding author may be able to access specific pages to edit the submission.”

  2. Briggs

    Thanks, John. Looking into it. Stay tuned.

    See update(s) above!

  3. Tom Johnson

    A serious challenge to the way things have been done for 125 years or more.
    And credibly based, too.
    Frame an hypothesis.
    Conduct a study. Collect a sample.
    Then what? What did the experiment show us? Is the hypothesis to be accepted or discarded?
    How do we know what we know? What does verifiable mean?
    This question of p-values is the basis for an increasing number of decisions in the public sphere, in medical research, in advertising, in engineering, in politics, just about everywhere!
    Very basic questions here. I found the reference to the Michelson-Morley experiment particularly vexing, as there is no question about their study having a small sample size and thus a large uncertainty.
    And yet Einstein based his Special Theory on the results.
