Class 30: Hypothesis Testing Stinks I

Today we cover the math, which you don't strictly need to follow if you can't calculate, behind how to think about Research Shows headlines and papers. This will be an invaluable exercise. You must try this, even if you have no other statistical, mathematical, or scientific experience.

Happy Thanksgiving. No class on 2 December.

Jaynes’s book (first part): https://bayes.wustl.edu/etj/prob/book.pdf

Permanent class page: https://www.wmbriggs.com/class/

Uncertainty & Probability Theory: The Logic of Science

Video

Links:

YouTube

Twitter

Rumble

Bitchute (often a day or so behind, for whatever reason)

Link to all Classes.

HOMEWORK: Read Jaynes, and see below: find a Research Shows paper and apply the methods you learned today.

Lecture

You and I, dear reader, have seen hundreds, and more than hundreds, of Research Shows headlines and papers. We have investigated these works in depth, and shown how to critique them in a rigorous fashion. But we do not always have the means to do that; we can, however, think, or draw on experience which contradicts, or sometimes even confirms, what these Research Shows claims show. Today, following Jaynes, we show you how to think about these things.

If you want to grasp the entire mathematical argument, you must read Jaynes 5.1 and 5.2. I do not detail it below. There is no point in my repeating material he covered, and said better. I go over the highlights in the lecture, and add amplifications, but there is a lot I leave out, too.

Here also are clues to what is coming. What we did today could be considered a form of hypothesis testing, only it wasn't. A hypothesis test is a bizarre blend of probability and decision, where the decision is made for you, using criteria that may have no relevance for you, and which answers probability questions no one ever asks. Except for certain statisticians, who have confused the point of their studies. All of which we'll come to in time.

Here Jaynes simply calculates the probability of a proposition given background information, and then again using that same background information and adding to it new information, in the form of certain claimed experimental evidence. That’s it. There’s nothing more. As I have told you many times, this is it. This is everything. Every lesson in this class is a variation of Pr(Y|X). That is the beauty of logical probability. No special apparatus is needed.

This is what hypothesis testing should be, but isn't. It will turn out that real hypothesis testing is nothing like this, and is a bizarre procedure better termed mathematical magic.

What I want to emphasize here is the technique of adding your own hypotheses to those provided by "Research Shows" headlines and researchers. We do not have to be limited to what is put in papers. We can judge evidence for ourselves. Which doesn't mean, of course, that we'll get it right.

If a Research Shows (RS) paper uses ordinary statistics (frequentist or Bayes), it will have a hypothesis it wants to tout. Call it H_r; 'r' for research. This is usually "tested" against a simple or trivial hypothesis, usually called a "null". Often, too often, this is a straw man. In today's example it wasn't: it was no skill in ESP, or no ESP ability. Call this H_n; 'n' for null.

We have ordinary Bayes theorem, using our background knowledge (or assumptions, etc.) X, and whatever new data D we have in the RS paper. We can write

$$Pr(H_r|DX) = \frac{Pr(H_r|X)Pr(D|H_rX)}{Pr(H_r|X)Pr(D|H_rX) + Pr(H_n|X)Pr(D|H_nX) + c}$$

where c is usually 0. Make sure you say to yourself what each of these terms means. (I mean it. Do it.) Now, using ordinary hypothesis testing, you don't get any of this. You should, but don't. But you do get the implication from the typical RS paper that Pr(H_r|DX) is high, if not "certain". Sadly, this calculation is almost never made, and we have to infer it. Once we do, we can deduce Pr(H_n|DX) (since, when c = 0, the two must sum to 1).
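
Since the arithmetic is nothing but prior times likelihood, normalized, here is a minimal sketch in Python of the formula above. Every number is invented for illustration; none comes from Jaynes or any study, and the function name is mine.

```python
# A minimal sketch of the Bayes calculation above.
# All priors and likelihoods are made-up illustrative numbers.

def posterior_r(prior_r, like_r, prior_n, like_n, c=0.0):
    """Pr(H_r|DX): prior times likelihood, normalized over all hypotheses."""
    numerator = prior_r * like_r
    denominator = prior_r * like_r + prior_n * like_n + c
    return numerator / denominator

# Equal priors, and data D five times more likely under H_r than H_n:
p_r = posterior_r(prior_r=0.5, like_r=0.05, prior_n=0.5, like_n=0.01)
print(p_r)      # 0.833...
print(1 - p_r)  # Pr(H_n|DX), since the two sum to 1 when c = 0
```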

Jaynes’ point is that we do not have to accept the word of the RS researchers. We are free to add contrary hypothesis that also explain the data D. In the ESP example, we listed a bunch that all came down to various forms of cheating, bias, bad data, mistakes, and sensory leakage. These became our c:

$$c = \sum_i Pr(H_i|X) Pr(D|H_iX)$$

This notation is a little misleading, because when we include c we have to adjust Pr(H_r|X) and Pr(H_n|X), so that Pr(H_r|X) + Pr(H_n|X) + Σ_i Pr(H_i|X) = 1. And remember this is our X, not the researchers'. That said, if c is large, that is, if we are able to give good weight to alternate hypotheses beside or beyond those given by the researchers, then Pr(H_r|DX) will have a tough time being large. Which is to say, it will be small, and therefore H_r will be more difficult to believe.
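
To see how the adjustment works, here is another sketch, again with invented numbers, showing how admitting alternate hypotheses, with priors readjusted to sum to 1, drags down Pr(H_r|DX):

```python
# Sketch of how alternate hypotheses enlarge c and shrink Pr(H_r|DX).
# All priors and likelihoods are invented for illustration.

def posteriors(priors, likelihoods):
    """Given Pr(H_i|X) (summing to 1) and Pr(D|H_i X), return Pr(H_i|DX)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)
    return [j / total for j in joints]

# The researchers' view: only H_r and H_n on the table.
print(posteriors([0.5, 0.5], [0.05, 0.01]))
# -> [0.833, 0.167]: H_r looks strong.

# Our view: same data, but two alternates (say, sensory leakage and bad
# data) that explain D as well as H_r does; priors readjusted to sum to 1.
print(posteriors([0.2, 0.2, 0.3, 0.3], [0.05, 0.01, 0.05, 0.05]))
# -> Pr(H_r|DX) drops to about 0.24.
```

Notice the data did not change; only the menu of hypotheses did. That is the whole trick.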

Whether to act as if H_r is true, or to decide it is, is entirely different from the probability we calculate for it. That depends on what you will do with it, or what somebody will do with it to you. Alas, ordinary hypothesis testing makes the decision for you, even if, for you, or for anybody, it is a lousy decision. These are matters we'll leave for another day.

Finally our homework. I (and you will too) searched for “research shows” and this was the first study headline that came up: “New research shows younger and middle-aged adults have worse long COVID symptoms than older adults”. This points to the paper “Neurologic Manifestations of Long COVID Disproportionately Affect Young and Middle-Age Adults” in Annals of Neurology by Choudhury et al.

Their conclusion: “Younger and middle-age individuals are disproportionally affected by Neuro-PASC regardless of acute COVID-19 severity.” By “Neuro-PASC” they mean neurologic post-acute sequelae of COVID-19, i.e. lingering neurological covid symptoms. They said technical things like this (with my emphasis):

…10 months from COVID-19 onset, we found significant age-related differences in Neuro-PASC symptoms indicating lower prevalence, and therefore, symptom burden, in older individuals. Moreover, there were significant age-related differences in subjective impression of fatigue (median [interquartile range (IQR)] patient-reported outcomes measurement information system [PROMIS] score: younger 64 [57–69], middle-age 63 [57–68], older 60.5 [50.8–68.3]; p = 0.04) and sleep disturbance (median [IQR] PROMIS score: younger 57 [51–63], middle-age 56 [53–63], older 54 [46.8–58]; p = 0.002) in the NNP group, commensurate with higher impairment in quality of life (QoL) among younger patients.

Those “p”s in the parentheses are the results of ordinary hypothesis testing, which, the ritual tells us, have to be smaller than the magic number of 0.05. Their conclusion said:

Younger and middle-age individuals are disproportionally affected by Neuro-PASC regardless of acute COVID-19 severity. Although older people more frequently have abnormal neurologic findings and comorbidities, younger and middle-age patients suffer from a higher burden of Neuro-PASC symptoms and cognitive dysfunction contributing to decreased QoL. 

They ask us to believe, roughly, that young and middle-aged people experienced subjective impressions of fatigue because of “long covid”, but that old people, who, you will recall, had worse acute covid symptoms than the young, did not have as many. This is their H_r. Their null, H_n, is that there is no difference in these kinds of symptoms or impressions. Naturally, they believe Pr(H_r|DY) is very high, where Y is their background information about such matters.

My Pr(H_r|DX), however, is low, mainly for two reasons. First, my Pr(H_r|X) was low. I don't buy that “long covid” is real, or that it is at least ill defined, in the sense that the symptoms are all over the place, inconsistent, and not really suffered by those who haven't heard of it, or were skeptical of it (like “sick building syndrome”). I do leave open the possibility I'm wrong, and that some specific, long-term, hard-measurable effects might exist in some people for whatever reason, as opposed to vague feelings and goofy scores on questionnaires (which try to quantify the unquantifiable).

Second, I also have a few alternate hypotheses that would explain this data. One is the narrow range of people chosen for this study. I'm not confident the same results would be found if somebody else did the picking. I don't mean fraud, I mean bias. For instance, some of the patients were from “video-based telehealth visit[s].” Another: I put very little trust in quantified questionnaires for unquantifiable things. Another: they tested for a lot of things, and reported on only a few. The more things you check, the more likely something will eventually be “correlated”, and verified with wee Ps.

The end result is that, for me, my Pr(H_r|DX) is not that different from my Pr(H_r|X). I started low and ended low. Notice I do not have to quantify any of this. I could, and you could use the math, but that's only necessary if you're ate up about it. Which I'm not. But if you are, then you go ahead and plug some numbers in, as in the sketch below.
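
For instance, here is a sketch using my own entirely personal, assumed weights for this example. The hypotheses and every number below are mine, not the researchers':

```python
# Plugging numbers in: my assumed weights for the long-COVID example.
# H_r = their claim; H_n = no difference; then three alternates:
# selection bias, questionnaire artifacts, multiple testing.

def posterior_r(priors, likelihoods):
    """Pr(H_r|DX), with H_r listed first."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    return joints[0] / sum(joints)

priors      = [0.05, 0.35, 0.20, 0.20, 0.20]  # my Pr(H_i|X): H_r starts low
likelihoods = [0.9, 0.3, 0.9, 0.9, 0.9]       # alternates fit D as well as H_r
print(posterior_r(priors, likelihoods))       # ~0.07: started low, ended low
```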

Indeed, this is your homework. Find a Research Shows headline or paper and critique it in a way sensible to you.

Which doesn't mean sensible to anybody else. This kind of analysis doesn't make you right and the researchers wrong. It only explains how both sides view the evidence.

And that is what probability is! The expression of uncertainties given certain assumptions.

Happy Thanksgiving. Again, no class on 2 December.

