Remember! Everybody can and should read Part I. Part III today requires the Class background.
Uncertainty & Probability Theory: The Logic of Science
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: You must look up and discover a Research Shows paper and see if it conforms to the conditions given below. Have you done it yet?
Lecture
Recall last time we finally reached Pr(Y|DEX), the probability of the disease given exposure and a dose, and whatever background information X we have. The problem was that, in many instances, we don't know what Pr(D|YEX) and Pr(D|No YEX) are, so we cannot get to Pr(Y|DEX). Thus we should not be making any causal claims. Which, of course, epidemiologists often cannot resist doing.
We finally came to the worst sin. But recall the first sin, which is that, again often, epidemiologists stop at calculating functions of Pr(E|YX) and Pr(E|No YX), and then intimate or imply that E is a cause of Y if these two numbers differ. Which has most certainly not been shown.
If that isn't bad enough, and it is, sometimes exposure (which is not dose) is not known, and instead a guess is substituted for it. And then it is forgotten that a guess was made, and the uncertainty in that guess vanishes into the mists.
That is, exposure is neither known nor measured. Instead a substitute for it is given, usually by some complex model. For pesticides, such models might be some statistical measure of the distance from a farm or a busy highway to a person's home zip code (both examples have been tried). Whether, or for how long, a person was at that zip code is never considered, and what protection a house might or might not have provided is unknown.
Once these wild guesses about exposure are made, calculations like Pr(E'|YX) and Pr(E'|No YX) are made, where E' indicates the modeled guess of exposure (almost all of these analyses skip Pr(Y|E'X)). All uncertainty in the model's guess is then forgotten.
It would be better for us to write M = E’, where M is the model’s guess of exposure. These models are sometimes more complex than just exposure or no exposure, but quite a number of them are just that: dichotomous exposure. Let’s examine these models for a start.
For our purposes, we need know nothing about them except that there is uncertainty in the model’s guess. That is,
$$\Pr(E|MX) < 1$$,
which is the probability of exposure given that the model said there would be exposure, and given whatever background information we have, which now includes the information that led to this form of exposure model. We know this is not 1, because if it were, the model would be perfect, and none are.
The rest of the analysis is as above, adding only the uncertainty in the exposure model. As above, epidemiologists would first calculate something like Pr(M|YX)/Pr(M|No YX), which is not the probability of disease. Yet it is reported that exposure is a cause of the disease.
We can get what we want as before, but now we must start with Pr(Y | MX), the probability of disease given the model’s guess and background information X.
$$\Pr(Y | MX) = \frac{\Pr(Y |X)\Pr(M | YX)}{\Pr(Y | X)\Pr(M | YX) + \Pr(No Y | X)\Pr(M | No YX)}.$$
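To see the difference between the ratio epidemiologists report and the probability we actually want, here is a minimal Python sketch of the Bayes calculation above. The function name `bayes_update` and every input number are hypothetical, chosen only for illustration; nothing here comes from a real study.

```python
def bayes_update(prior_y, like_y, like_not_y):
    """Return Pr(Y | new evidence, X) via Bayes' theorem.

    prior_y    -- Pr(Y | X), the prior probability of disease
    like_y     -- Pr(evidence | Y X)
    like_not_y -- Pr(evidence | No Y X)
    """
    numerator = prior_y * like_y
    denominator = numerator + (1.0 - prior_y) * like_not_y
    return numerator / denominator


# Hypothetical inputs, for illustration only.
pr_y = 0.001            # Pr(Y | X)
pr_m_given_y = 0.4      # Pr(M | Y X)
pr_m_given_not_y = 0.2  # Pr(M | No Y X)

ratio = pr_m_given_y / pr_m_given_not_y
pr_y_given_m = bayes_update(pr_y, pr_m_given_y, pr_m_given_not_y)

print(f"Pr(M|YX)/Pr(M|No YX) = {ratio:.2f}")
print(f"Pr(Y|MX)             = {pr_y_given_m:.4f}")
```

With these made-up numbers the ratio is 2, which sounds alarming, while Pr(Y|MX) is only about 0.002. The ratio says nothing by itself about how probable the disease is.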
Next is the probability of the disease given actual exposure and the model saying there was exposure (where there is agreement between the model saying exposure and exposure taking place):
$$\Pr(Y | EMX) = \frac{\Pr(Y |MX)\Pr(E | YMX)}{\Pr(Y |M X)\Pr(E | YMX) + \Pr(No Y | MX)\Pr(E | No YMX)}.$$
The problem is Pr(E | YMX) and Pr(E | No YMX). If we knew the chance of exposure (E) given Y or No Y, we wouldn't need the model of exposure in the first place. Recall these two probabilities assume the model said there was exposure, and assume Y or No Y, respectively (all assuming X). They can be known from the exposure model's performance if it is ever investigated in an independent, predictive way, which it almost never is.
But if we're willing to trust the skills of the modeler, which we most definitely should not if we must make any important decision using the model, we might assume there is no bias in the model, so that Pr(E | YMX) = Pr(E | No YMX) to a first approximation. Neither need equal 1, which would mean a perfect model; the assumption is only that the model is just as good at predicting exposure for a person with the disease as for one without. If we let Pr(E | YMX) ≈ Pr(E | No YMX), then the common factor cancels from the numerator and denominator above, and since Pr(Y | MX) + Pr(No Y | MX) = 1, we are left with
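If an exposure model were ever checked against independently confirmed exposure, Pr(E | YMX) and Pr(E | No YMX) could be estimated from the results. Below is a minimal sketch, assuming a hypothetical validation table of people the model flagged as exposed; the counts, the `VALIDATION` table, and the `pr_exposed_given` helper are all invented for illustration. It uses Laplace's rule of succession, (k + 1)/(n + 2), as one simple predictive estimate under a uniform prior; that choice is ours, not necessarily the one the lecture would make.

```python
from fractions import Fraction

# Hypothetical validation counts (illustration only): people the exposure
# model flagged as exposed (M), split by disease status and by whether
# exposure was independently confirmed.
VALIDATION = {
    ("Y", True): 30,       # diseased, truly exposed
    ("Y", False): 10,      # diseased, not exposed
    ("No Y", True): 600,   # not diseased, truly exposed
    ("No Y", False): 400,  # not diseased, not exposed
}


def pr_exposed_given(disease, counts):
    """Estimate Pr(E | disease, M, X) with Laplace's rule of succession.

    (k + 1) / (n + 2) is the predictive probability that the next flagged
    person with this disease status is truly exposed, under a uniform prior.
    """
    k = counts[(disease, True)]
    n = k + counts[(disease, False)]
    return Fraction(k + 1, n + 2)


pr_e_y = pr_exposed_given("Y", VALIDATION)         # Pr(E | Y M X)
pr_e_not_y = pr_exposed_given("No Y", VALIDATION)  # Pr(E | No Y M X)

print(f"Pr(E|YMX)    ~ {float(pr_e_y):.3f}")
print(f"Pr(E|No YMX) ~ {float(pr_e_not_y):.3f}")
```

In this invented table the two estimates differ (about 0.74 versus 0.60), so the model would be biased, and the "no bias" shortcut discussed next would not hold.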
$$\Pr(Y | EMX) \approx \Pr(Y | MX).$$
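A quick numerical check of the cancellation, as a minimal sketch with a hypothetical value Pr(Y | MX) = 0.002 and a hypothetical `bayes_update` helper: when Pr(E | YMX) and Pr(E | No YMX) share a common value a, the update returns Pr(Y | MX) whatever a is. When they differ, as in a biased model, it does not.

```python
import math


def bayes_update(prior_y, like_y, like_not_y):
    """Pr(Y | new evidence, ...) from the prior and the two likelihoods."""
    numerator = prior_y * like_y
    return numerator / (numerator + (1.0 - prior_y) * like_not_y)


PR_Y_GIVEN_MX = 0.002  # hypothetical Pr(Y | M X)

# Unbiased model: Pr(E|YMX) equals Pr(E|No YMX), whatever the common value a.
for a in (0.3, 0.6, 0.9):
    pr = bayes_update(PR_Y_GIVEN_MX, a, a)
    assert math.isclose(pr, PR_Y_GIVEN_MX)  # a cancels: Pr(Y|EMX) = Pr(Y|MX)
    print(f"a = {a}: Pr(Y|EMX) = {pr:.4f}")

# Biased model (hypothetical): exposure confirmed more often among the diseased.
biased = bayes_update(PR_Y_GIVEN_MX, 0.75, 0.60)
print(f"biased: Pr(Y|EMX) = {biased:.4f}")
```

Equality is exact in algebra and holds here up to floating-point rounding, which is why the check uses `math.isclose` rather than `==`.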
Finally, we can calculate (to first approximation) the joint probability of disease, exposure, and the model saying exposure, as before:
$$\Pr(YEM|X) = \Pr(Y | EMX)\Pr(E | MX)\Pr(M |X) \approx \Pr(Y | MX)\Pr(E | MX)\Pr(M |X)$$.
which is calculated as the probability of the disease given a person has been exposed and the model said they were exposed, times the probability of exposure given the model claimed exposure, all times the probability of the model claiming exposure. We can get both Pr(E | MX) and Pr(M | X), at least presumably, from the model's characteristics, though again, it is rare to nonexistent for these numbers to be provided.
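The chain-rule product is simple arithmetic once the three pieces are in hand. Below is a minimal sketch with hypothetical inputs; `joint_y_e_m` is an illustrative name, and Pr(Y | MX) stands in for Pr(Y | EMX) under the no-bias approximation. It shows how the joint probability shrinks as the model's reliability Pr(E | MX) falls below 1.

```python
def joint_y_e_m(pr_y_given_emx, pr_e_given_mx, pr_m_given_x):
    """Pr(Y E M | X) = Pr(Y|EMX) Pr(E|MX) Pr(M|X), by the chain rule."""
    return pr_y_given_emx * pr_e_given_mx * pr_m_given_x


# Hypothetical inputs, for illustration only.
pr_y_given_mx = 0.002  # used for Pr(Y|EMX) under the no-bias approximation
pr_m_given_x = 0.10    # Pr(M | X), how often the model claims exposure

for pr_e_given_mx in (1.0, 0.7, 0.4):  # 1.0 would be a perfect model
    p = joint_y_e_m(pr_y_given_mx, pr_e_given_mx, pr_m_given_x)
    print(f"Pr(E|MX) = {pr_e_given_mx:.1f} -> Pr(YEM|X) ~ {p:.5f}")
```

Because Pr(E | MX) enters as a plain multiplier, the joint probability falls in direct proportion to the model's unreliability.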
Notice that because the model is imperfect, Pr(E | MX) < 1, which shrinks the product above, and so we should have less confidence in the results. Which makes perfect sense, since the model is only a prediction of exposure, with uncertainty.
And all this is still before we get to dose, which makes the whole analysis even more uncertain.
$$\Pr(Y | DEMX) = \frac{\Pr(Y |EMX)\Pr(D | YEMX)}{\Pr(Y |EMX)\Pr(D | YEMX) + \Pr(No Y | EMX)\Pr(D | No YEMX)}.$$
To know this probability in depth, and not merely to approximation, requires work rarely or never seen. But it is the only correct probability when a model is in use.
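The full calculation is a chain of Bayes updates, each using the previous posterior as its prior: first the model's claim M, then exposure E, then dose D. Here is a minimal sketch of that sequence; every likelihood below is hypothetical, and in a real analysis each would have to be conditioned on all the earlier evidence (for example Pr(D | YEMX), not merely Pr(D | YX)), which is exactly the work that is rarely done.

```python
def bayes_update(prior_y, like_y, like_not_y):
    """Posterior Pr(Y | new evidence, old evidence, X) via Bayes' theorem."""
    numerator = prior_y * like_y
    return numerator / (numerator + (1.0 - prior_y) * like_not_y)


# Every number below is hypothetical, for illustration only. Each pair of
# likelihoods must be conditioned on all evidence from the earlier steps.
steps = [
    # (evidence, Pr(evidence | Y, ...), Pr(evidence | No Y, ...))
    ("M", 0.40, 0.20),  # Pr(M|YX), Pr(M|No YX)
    ("E", 0.75, 0.60),  # Pr(E|YMX), Pr(E|No YMX)
    ("D", 0.50, 0.30),  # Pr(D|YEMX), Pr(D|No YEMX)
]

pr_y = 0.001  # Pr(Y | X)
history = [pr_y]
for evidence, like_y, like_not_y in steps:
    pr_y = bayes_update(pr_y, like_y, like_not_y)
    history.append(pr_y)
    print(f"after {evidence}: Pr(Y | ...) = {pr_y:.5f}")
```

The last value printed corresponds to Pr(Y | DEMX). Change any one of the invented likelihoods and the answer moves, which is the point: without honest estimates of each, the final number cannot be trusted.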
Subscribe or donate to support this site and its wholly independent host using credit card click here. Or use the paid subscription at Substack. Cash App: \$WilliamMBriggs. For Zelle, use my email: matt@wmbriggs.com, and please include yours so I know who to thank. BUY ME A COFFEE.