Remember! Everybody can and should read Part I. Part II today requires the Class background.
Uncertainty & Probability Theory: The Logic of Science
HOMEWORK: You must look up and discover a Research Shows paper and see if it conforms to the conditions given below. Have you done it yet?
Lecture
Go back and re-read Class 32, so the notation makes sense. Recall we ended, as most epidemiological studies end, by computing Pr(E|YX) and Pr(E|No YX), but that neither of these is Pr(Y|DEX), and that claiming either is commits the epidemiologist fallacy. We want Pr(Y|DEX), the probability of the disease given exposure and a “sufficient” dose, and given whatever background X.
We can get to Pr(Y | DEX) easily enough using the correct rules of probability. First we see:
$$\Pr(Y | EX) = \frac{\Pr(Y |X)\Pr(E | YX)}{\Pr(Y | X)\Pr(E | YX) + \Pr(No Y | X)\Pr(E | No YX)}.$$
This simply applies Bayes's theorem properly, and gives us the probability of having the disease given exposure (not yet dose), and given our background knowledge. Notice very carefully that it contains the two elements of the epidemiologist fallacy, Pr(E | YX) and Pr(E | No YX). Which means epidemiologists started on the right path, but never completed their journey.
For Bonvicini (go back and review!) we can plug numbers in:
$$\Pr(Y | EX) = \frac{0.00027 \times 13/41}{0.00027 \times 13/41 + 0.99973 \times 11/82 } = 0.00063.$$
Also note that this applies only to the 82+41 = 123 people that were measured. If we want to use this as an estimate for people not yet measured, we must take more care (and we recall the 0.00027 is suspect). But these are niceties we can ignore today.
We can, and should, also calculate this:
$$\Pr(Y | No EX) = \frac{0.00027 \times 28/41}{0.00027 \times 28/41 + 0.99973 \times 71/82 } = 0.00021.$$
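The two Bonvicini calculations can be checked with a few lines of code. This is only a sketch: the function name `posterior` and the variable names are mine, and Pr(No Y | X) is taken to be 1 minus the 0.00027 prior. The results land at roughly 0.0006 and 0.0002, matching the figures above up to the last digit shown.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """Bayes's theorem for a yes/no proposition.

    prior        -- probability of the proposition before the new evidence
    lik_if_true  -- probability of the evidence if the proposition is true
    lik_if_false -- probability of the evidence if the proposition is false
    """
    numerator = prior * lik_if_true
    return numerator / (numerator + (1.0 - prior) * lik_if_false)


p_y = 0.00027  # Pr(Y | X), the (suspect) background rate used in the post

# Exposed: 13 of 41 diseased were exposed, 11 of 82 non-diseased were exposed.
p_y_ex = posterior(p_y, 13 / 41, 11 / 82)

# Not exposed: 28 of 41 diseased, 71 of 82 non-diseased.
p_y_noex = posterior(p_y, 28 / 41, 71 / 82)

print(f"Pr(Y | EX)    = {p_y_ex:.5f}")
print(f"Pr(Y | No EX) = {p_y_noex:.5f}")
```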
Now Pr(Y | EX) > Pr(Y | X) and Pr(Y | EX) > Pr(Y | No EX), which is intriguing, and should be whispered to those who want to dig deeper. It is in no way proof that pesticides cause ALS. Because if that were true, then all exposed people would have got ALS. Which all did not. And some who were not exposed also got ALS. And these two facts at least mean other things besides pesticide are causing ALS. Which means those other things, which we have proved with certainty must exist, might account for all causes of ALS, and pesticide account for none. As long as there is no error in the exposure as claimed, that is.
We still have not got to dose. We can, using probability as before:
$$\Pr(Y | DEX) = \frac{\Pr(Y |EX)\Pr(D | YEX)}{\Pr(Y |E X)\Pr(D | YEX) + \Pr(No Y | EX)\Pr(D | No YEX)}.$$
We now know Pr(Y | EX), which means we also know Pr(No Y | EX) (which is 1 – Pr(Y | EX)). Yet we do not know what Pr(D | YEX) or Pr(D | No YEX) are. But we do know that if pesticides have anything to do with causing ALS then it must be that Pr(D | YEX) > Pr(D |No YEX).
If, for example, Pr(D | YEX) = Pr(D |No YEX) (for any value of either) then Pr(Y | DEX) = Pr(Y | EX), which says that knowing D is irrelevant after knowing E and X, a very curious thing to claim. It would mean that (always assuming X), after exposure, both diseased and non-diseased patients have the same chance of receiving a dose. But recall the causes and causal pathways in any two people need not be the same.
Only if Pr(D | YEX) > Pr(D |No YEX) is Pr(Y | DEX) > Pr(Y | EX). Indeed, if Pr(D | YEX) < Pr(D |No YEX) then Pr(Y | DEX) < Pr(Y | EX), and the dose is in the direction of protective against the disease. Which might be the case if the dose has a hormesis effect.
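One way to see why the direction of the inequality settles everything is to put the dose equation in odds form. This is a standard rearrangement of Bayes's theorem, not something written out in the lecture: divide the expression for Pr(Y | DEX) by the matching expression for Pr(No Y | DEX), and the common denominator cancels.

$$\frac{\Pr(Y | DEX)}{\Pr(No Y | DEX)} = \frac{\Pr(Y | EX)}{\Pr(No Y | EX)} \times \frac{\Pr(D | YEX)}{\Pr(D | No YEX)}.$$

The last ratio is the only place dose enters. If it equals 1 the odds do not move, if it is above 1 the odds of disease rise, and if it is below 1 they fall. Since a probability rises and falls with its odds, the same holds for Pr(Y | DEX) compared with Pr(Y | EX).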
Let’s pay careful attention to the inequality Pr(D | YEX) > Pr(D |No YEX). In both cases (always accepting X), we assume exposure. Both the diseased and non-diseased have been exposed. We then want to know about dose. If, after being equally exposed the chance of getting a dose is also equal for both the diseased and non-diseased, then dose is irrelevant, as mentioned, and we only need Pr(Y|EX). We don’t need to know dose. But that means, since exposure is not dose, that if Pr(Y|EX) > Pr(Y|No EX), it must be something besides dose but related to exposure that accounts for the increased probability of disease.
There could be any number of things that are exposure but that are not dose-related that could be causally important in the disease. This is why any method that only gives, or gives functions of, mere exposure can be very badly misleading. Which is why most epidemiological studies that rely on exposure are in fact misleading.
Problem is, of course, that we typically don’t know what Pr(D | YEX) or Pr(D |No YEX) is. We certainly don’t in Bonvicini. Merely for the sake of argument, let us suppose that indeed Pr(D | YEX) > Pr(D |No YEX). Not knowing reasonable numbers, we might guess (and recall this assumes all people involved were exposed) Pr(D | YEX) = 0.1 and Pr(D |No YEX) = 0.08. Which is, some 10% of the exposed diseased people were dosed, and only 8% of the exposed non-diseased people were. To stress: I made these numbers up. I have no idea of their accuracy.
With that we get Pr(Y | DEX) = 0.0008. Recall Pr(Y | EX) = 0.00063. That difference, 0.0008 – 0.00063 = 0.00017 accounts for adding dose to our evidence. It is a very small number (of course assuming our guesses are correct). But it is the difference, whatever it might be, that is important, because we already know that mere exposure in the absence of dose gives evidence of disease. Exposure itself can’t be a cause, so it means that things associated with exposure are our culprits.
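Here is a small sketch of that dose update, using the made-up values Pr(D | YEX) = 0.1 and Pr(D | No YEX) = 0.08 and the rounded Pr(Y | EX) = 0.00063. The function and variable names are mine, chosen for illustration. It also tries the other two cases discussed earlier: equal dose probabilities, where dose is irrelevant, and reversed ones, where dose points in the protective direction.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """Bayes's theorem for a yes/no proposition."""
    numerator = prior * lik_if_true
    return numerator / (numerator + (1.0 - prior) * lik_if_false)


p_y_ex = 0.00063   # Pr(Y | EX), rounded value from the lecture
p_d_yex = 0.10     # Pr(D | YEX): a made-up guess, as in the lecture
p_d_noyex = 0.08   # Pr(D | No YEX): likewise made up

# Diseased are more likely to have been dosed: probability of disease goes up.
p_y_dex = posterior(p_y_ex, p_d_yex, p_d_noyex)

# Equal dose probabilities: knowing D changes nothing.
p_equal = posterior(p_y_ex, 0.10, 0.10)

# Reversed: diseased less likely to have been dosed, so dose looks protective.
p_reversed = posterior(p_y_ex, 0.08, 0.10)

print(f"Pr(Y | DEX), guessed values = {p_y_dex:.5f}")
print(f"Pr(Y | DEX), equal doses    = {p_equal:.5f}")
print(f"Pr(Y | DEX), reversed       = {p_reversed:.5f}")
```

Swapping in other guesses for the two dose probabilities shows how much the answer depends on numbers nobody measured.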
All this analysis is before considering the importance to people outside those already measured. For that, we’d need a more sophisticated analysis. But we can do a quick preview of this. Suppose Pr(Y | DEX) = 0.0008, as above. We assumed Pr(D | YEX) = 0.1, which we borrow here as a rough stand-in for Pr(D | EX). And from above we had Pr(E | YX) = 13/41 = 0.32, which we likewise borrow for Pr(E | X). For a group of new people, say a region of a state, or for whatever number of new unmeasured people, we want the probability that somebody has the disease and is exposed and is dosed. This is
$$\Pr(YDE|X) = \Pr(Y | DEX)\Pr(D | EX)\Pr(E |X) = 0.0008\times 0.1 \times 0.32 = 0.000026.$$
Knowing nothing about new people except that they “fit” whatever information we have in X, then for every million new people, we’d expect about 26 (the Pr(YDE|X) times a million) people with the disease who were both exposed and dosed.
For those just exposed, we have Pr(YE|X) = Pr(Y | EX)Pr(E | X) = 0.00063 × 0.32 = 0.0002. And so in that same new million people, we’d expect about 202 people with the disease who have been exposed. That means about 202 – 26 = 176 come from “exposure”, or whatever it is that is causally related to exposure (because again it cannot be exposure itself).
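The per-million arithmetic can be laid out as a short sketch. It uses the lecture's rounded inputs, including the guessed 0.1 for the dose probability and 0.32 for the exposure probability, so the outputs are only as good as those guesses. The variable names are mine.

```python
N = 1_000_000      # size of the hypothetical new group

p_y_dex = 0.0008   # Pr(Y | DEX), from the made-up dose numbers
p_d_ex = 0.1       # value the lecture plugs in for Pr(D | EX); a guess
p_e = 0.32         # value the lecture plugs in for Pr(E | X)
p_y_ex = 0.00063   # Pr(Y | EX)

# Diseased, exposed and dosed: Pr(YDE | X) times N.
diseased_dosed = p_y_dex * p_d_ex * p_e * N

# Diseased and exposed, dosed or not: Pr(YE | X) times N.
diseased_exposed = p_y_ex * p_e * N

# What remains is attributed to "exposure", meaning whatever travels with it.
exposure_only = diseased_exposed - diseased_dosed

print(f"diseased, exposed and dosed : {diseased_dosed:.0f}")
print(f"diseased and exposed        : {diseased_exposed:.0f}")
print(f"difference                  : {exposure_only:.0f}")
```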
These numbers are only one side of the story, and all must be balanced by whatever good the pesticide does. Suppose, to make up numbers, that we determine the pesticide saves the lives of 100 people out of every million (to lowball it). Then, in balance, the pesticide is still ahead on average. But this kind of thing is an analysis for another time.
Meanwhile, we have one more trick epidemiologists play. And that is not only to skip dose, but to skip exposure. A model of exposure is substituted for exposure, and it is forgotten the model is a model, that a guess is not the real thing.
We do this next time, the final class in our initial Epidemiology series.
Briggs, loving the class so far! I have a somewhat related question regarding what was discussed in this lecture, specifically about dose-response models.
Let’s say we are working with a graded outcome (rather than a dichotomous one), such as systolic blood pressure, and we want to determine if there is a dose-response effect of intervention “X” on systolic blood pressure.
How can we ensure that what we are observing is not just an artifact but a real dose-response effect? This becomes particularly challenging when accounting for the fact that individuals start at different baselines and have varying adaptive capacities. It’s not uncommon to see the mean change in intervention groups be only about half of the standard deviation.
Would using ANCOVA be an appropriate method to “control” for these intraindividual differences?
If not, would it make sense to compare the number of participants in each intervention group who reach a smallest effect size of interest, and then compare the calculated probabilities?
I hope this question isn’t too much of a bother. Again, loving the class!
PS: I also sent you an email not so long ago. If you get the chance to take a look at it, I’d greatly appreciate it.
Best regards,
Ignacio