A Simple Argument Proving You Must Not Trust Standard Scientific Evidence

In my forced isolation, which was in many ways a gift, I took to thinking of my failures to convince people not to use or trust standard statistical evidence. Here is another in a long line of such arguments. I know it’s not to everybody’s taste, but this argument uses only elementary logic, so that anybody can read it.

You are growing weary of hearing these complaints, but here is pictorial proof of why you must distrust “research shows” science:

I do not say disbelieve; I say distrust. This pic is from “The significance filter, the winner’s curse and the need to shrink” by Erik W. van Zwet and Eric A. Cator in Statistica Neerlandica.

Recently, Barnett and Wren (2019) collected over 968,000 confidence intervals extracted from abstracts and over 350,000 intervals extracted from the full-text of papers published in Medline (PubMed) from 1976 to 2019. We converted these to z-values and their distribution is shown in Figure 1. The under-representation of z-values between -2 and 2 is striking.

What a z-value is isn’t important for us. The +/-2 is because (for those who don’t know) z-values greater than or equal to this, in absolute value, are awarded the magical distinction of significance. That is, possession of a wee P: a p-value less than the magic number.

The magic number is known by every practicing scientist, so I need not repeat it here. Speaking it aloud brings bad luck: frequentist probability is pagan.

The point is that once you can announce a wee P, fame and glory, in the scientific sense, are yours. And they are easy to get, easy to create. It’s child’s play to edge your models up those cliffs and into the Land of Mystical $|z| \ge 1.96$ (the technical value of “2” is 1.96, for obscure mathematical reasons; the two vertical lines mean the value between them is always made positive).

I have tried many times, in as many different ways I could think of, to convince you not to follow, produce, or believe any “hypothesis test”, i.e. the quest for $|z| \ge 1.96$, or to join the wee P spree. I am asking much of you, because this means rejecting the great bulk of modern science. Let me try one more time.

Put everything you know of this subject out of your mind. Let’s instead review simple logic. We’ll start with modus ponens and modus tollens, written in probability notation (yet another way to write logic):

$\Pr(Q|P \:\&\: P\to Q) = 1$,
$\Pr(\overline{P}|\overline{Q} \:\&\: P\to Q) = 1$; and so
$\Pr(Q|P \:\&\: P\to Q) = \Pr(\overline{P}|\overline{Q} \:\&\: P\to Q)$.

The first line is the probability the proposition Q is true assuming the proposition P is true, and assuming it is true that “if P is true then Q is true” (that’s what the arrow means). The probability equals 1, because Q is true given these assumptions. That’s modus ponens. The second line is the probability P is false (the overline here means logical contrary) assuming Q is false, and assuming that “if P is true then Q is true”. This probability also equals 1, because P is false given these assumptions. This is modus tollens.
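For those who like to check such things by brute force, here is a minimal sketch in Python. It lists the four possible truth assignments to P and Q, keeps only those in which “if P then Q” holds, and confirms both results. The helper name `implies` and the word “worlds” are my own labels for illustration, not standard notation.

```python
from itertools import product


def implies(p, q):
    """Material implication: "if p then q" fails only when p is true and q is false."""
    return (not p) or q


# All four truth assignments ("worlds") for the propositions P and Q.
worlds = list(product([True, False], repeat=2))

# Keep only the worlds in which the premise "P -> Q" holds.
consistent = [(p, q) for p, q in worlds if implies(p, q)]

# Modus ponens: in every consistent world where P is true, Q is true.
ponens_worlds = [(p, q) for p, q in consistent if p]
modus_ponens = all(q for p, q in ponens_worlds)

# Modus tollens: in every consistent world where Q is false, P is false.
tollens_worlds = [(p, q) for p, q in consistent if not q]
modus_tollens = all(not p for p, q in tollens_worlds)

print(modus_ponens, modus_tollens)
```

Both checks pass only because the premise $P\to Q$ was used to filter the worlds. Drop that filter and neither conclusion is guaranteed.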

The two probabilities are equal, because (and you didn’t need me telling you this) 1 = 1. But suppose we don’t assume $P\to Q$; i.e. we do not suppose that “if P is true then Q is true”. Instead, we have any other proposition that is not logically equivalent to that. Call it “E”. Example: E = “The sun is shining” or E = “7 > 5”. Whatever you like, just not $P\to Q$. Then we get this:

$\Pr(Q|P \:\&\: E) \ne \Pr(\overline{P}|\overline{Q}\:\&\: E)$.

This says the probability that Q is true accepting P is true and E is true does not equal the probability P is false given Q is false and E is true. Except, of course, by accident, or when P = Q. By which I mean, since all probabilities live on [0,1], it is always possible that the calculation of any two probabilities may happen, by coincidence, to give the same value. And if P = Q it is obvious the probability Q is true given Q is true equals 1, whatever E is, as long as it isn’t nonsensical. Except for these non-interesting exceptions, this inequality will hold for any propositions P and Q (again, where $P \ne Q$ and E doesn’t contradict either).

No logician or probabilist disagrees with these (really, trivial) facts. The modus ponens and tollens results flow because, and only because, we had that crucial premise $P\to Q$. If we pop in some “random” E in its place, we can get anything.

Here’s an example. Let E = “blue is not red” (why not). And let P = “Two dice tossed” and Q = “Sum on dice is 2” (P also has implicit premises about dice which we all know). Then

$\Pr(Q|PE) = 1/36$ (we don’t need the “&” between P and E; it’s implied).

It should be obvious that

$\Pr(\overline{P}|\overline{Q}E) \ne 1/36$,

which reads the probability of “Two dice aren’t tossed” given “Total on two dice isn’t 2”, and assuming E, is not equal to 1/36. Because, of course, two dice could be tossed even if the total wasn’t 2.
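Here is a rough sketch of the dice example in Python. The first part just counts outcomes to get $\Pr(Q|PE) = 1/36$. The second part needs loud caveats: E = “blue is not red” says nothing about how likely it is that dice get tossed at all, so to put any number on $\Pr(\overline{P}|\overline{Q}E)$ I have to invent extra premises. The function `pr_not_p_given_not_q` and its arguments `prior_p` and `pr_q_given_not_p` are hypothetical assumptions added for illustration only; they are not part of the argument above.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of tossing two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Pr(Q | P E): the chance the sum is 2, given two dice are tossed.
pr_q_given_p = Fraction(sum(1 for a, b in outcomes if a + b == 2), len(outcomes))


def pr_not_p_given_not_q(prior_p, pr_q_given_not_p=Fraction(0)):
    """Pr(not-P | not-Q) by Bayes' theorem.

    ASSUMPTIONS (not in the original argument):
      prior_p          -- an invented probability that two dice are tossed
      pr_q_given_not_p -- chance of "sum is 2" when no dice are tossed,
                          taken to be 0 by default
    """
    pr_not_q_given_p = 1 - pr_q_given_p
    pr_not_q_given_not_p = 1 - pr_q_given_not_p
    numerator = (1 - prior_p) * pr_not_q_given_not_p
    denominator = numerator + prior_p * pr_not_q_given_p
    return numerator / denominator


print(pr_q_given_p)                             # 1/36
print(pr_not_p_given_not_q(Fraction(1, 2)))     # 36/71
print(pr_not_p_given_not_q(Fraction(9, 10)))    # 4/39
```

The answer moves with every invented premise, and nothing ties it to 1/36. Knowing $\Pr(Q|PE)$ alone tells you nothing about $\Pr(\overline{P}|\overline{Q}E)$.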

Well, we haven’t done anything up to this point, except review basic logic. Yet in it is everything.

For instance, what would you think of a guy who announces, for our dice P and Q, “I have just calculated $\Pr(Q|PE)$, and it has given me a small number; lo, even smaller than the magic number; indeed 1/36 = 0.028. Therefore, I insist you believe $\Pr(\overline{P}|\overline{Q}E)$ is high. Indeed, I ask you to act like $\Pr(\overline{P}|\overline{Q}E)=1$!”?

I can’t answer for you, but I’d ask him to go home and sleep it off.

One more bit of logic before we come to the stunning conclusion.

It is possible to have bad arguments with true (or likely) conclusions. Let P = “It’s Friday morning and green is less than red”, let E be whatever you like except the exception above, and let Q = “7>5”. $\Pr(Q|PE)$ doesn’t make much sense, even though it is true that “7>5”. We would, as good logicians, toss out the argument from P and E to Q. We would not insist that, “Well, since obviously 7>5, it must be true it is Friday morning and green is less than red.” If we tried that, we’d be laughed out of the logic club.

And now—the stunning conclusion!

The dice mistake is the same made in every hypothesis test. Where by every I mean every. And the 7>5 mistake is made in some of them.

In hypothesis tests, P = “No effect exists”, and Q = “Data more extreme than we actually got”. If $\Pr(Q|PE)$ is small (less than the magic number) we are asked to believe $\Pr(\overline{P}|\overline{Q}E)$ is high, or act like it is equal to 1. That is another way of saying Pr(effect exists | data we saw & E) = 1 (or high).

Not only is all this forbidden in the theory that gives p-values, though everybody does it, the act itself is a fallacy. It is an invalid argument. It is wrong. It is bad reasoning. It might be true, because of other reasons, that “effect exists” is true or is of high probability (with respect to those other reasons), but that is no justification for accepting the hypothesis test. At all. Ever. Every use of hypothesis tests is fallacious.

Can good choice of E save testing? Let’s see.

E is certainly not “$P \to Q$”. If it were, then E would read, “If no effect exists, then we’d get data more extreme than we actually got.” Nobody believes that because it doesn’t make sense. But that’s what E would have to be in order to invoke modus tollens and modus ponens and make a valid logical argument.

E is also not $P \to \overline{Q}$, which reads “If no effect exists, we’d get the data we got”. Though that in fact might be believed in some situations. In others, E is believed to be the opposite: E = “If effect exists, we’d get the data we got”. Neither of these E is of any help; neither rescues the logic.

Thus, the only E that would make good logic is not believed, isn’t used, and isn’t true. There is no rescuing hypothesis testing. Every use is a fallacy.

“So what do we use instead, Briggs?”

Find $\Pr(\overline{P}|\overline{Q}E)$, which (though this notation is awkward) is the probability the effect exists, given the data we got, and whatever else we believe represented by E.

Simple, ain’t it.
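To make that concrete, here is a toy sketch in Python, and I stress toy. Everything specific in it is an assumption I made up for illustration: a coin-flip setting, a “no effect” claim that heads has probability 0.5, an “effect” claim that heads has probability 0.6, a prior of 0.5 for each, and data of 60 heads in 100 flips. The function name `posterior_effect` is likewise hypothetical. The only point is that the probability of the effect is computed directly from stated premises, and changes when the premises change.

```python
from math import comb


def binomial_likelihood(heads, flips, p_heads):
    """Probability of exactly `heads` heads in `flips` flips."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


def posterior_effect(heads, flips, prior_effect=0.5,
                     p_no_effect=0.5, p_effect=0.6):
    """Pr(effect | data, E) for two rival claims, by Bayes' theorem.

    ASSUMPTIONS (all invented for this sketch): only two claims are in
    play, each fixes the chance of heads, and `prior_effect` is given.
    """
    like_effect = binomial_likelihood(heads, flips, p_effect)
    like_none = binomial_likelihood(heads, flips, p_no_effect)
    numerator = prior_effect * like_effect
    return numerator / (numerator + (1 - prior_effect) * like_none)


# Same data, different premises E, different answers.
even_prior = posterior_effect(60, 100, prior_effect=0.5)
skeptical_prior = posterior_effect(60, 100, prior_effect=0.1)
print(round(even_prior, 3), round(skeptical_prior, 3))
```

With the even prior the result comes out somewhat under 0.9; with the skeptical prior it drops below one half. Neither number is a p-value, and neither pretends to be 1.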

“Oh yeah, if it was simple, why doesn’t everybody use it?”

Because people want to believe in magic. Because they don’t know better. Because it’s too hard. Because, worst of all, the direct method I advocate leads to far, far fewer claims of effects “discovered”.

4 Comments

  1. Stan Young

    Now, the lack of distribution between -2 and +2 could be publication bias, researchers failing to report non-significant studies. Some of the pile up of distribution outside the range -2 to +2 might be due to data or analysis manipulation to turn a nonsignificant result into a significant one. The greater distribution above +2 could be due to researchers knowing that a positive result calls for more research, i.e., government grants, whereas a negative result signals one should give up on this question.
    W.C. Fields: A thing worth having is a thing worth cheating for.

  2. Briggs

    Stan,

    Very true.

  3. George Gilder

    Sorry, but your notation is incomprehensible and you don’t explain it. You are talking mostly to yourself. All those dollar signs and overlines are gobbledygook. And I am someone who loved your book on logic.

  4. Briggs

    George,

    Dollar signs? Uh oh. It sounds like something went badly wrong with the LaTeX interpreter. You should not see any dollar signs at all.

    Might I suggest trying the Substack mirror?

    https://wmbriggs.substack.com/p/a-simple-argument-proving-you-must

    That appears to be working. So does my site to me, however. I’ll see if I can find a way to break and diagnose it.

    Thanks.

    UPDATE

    I’ve determined the EMAILS aren’t rendering the LaTeX, and I had thought they were. Naturally, if you try to read LaTeX and aren’t familiar, which is most people, it looks like gibberish. It’s a math coding language that is supposed to be rendered by certain plugins to WordPress. But it didn’t work here.

    I’m looking into it. But it does work on the site.
