All can and should read this post! Not just those actively taking the Class. This is an introduction to Epidemiology, a science which has done tremendous damage to clear thinking. All need to understand the basics, because “results” from this field are foisted on us more often than you know. It gets technical at the end, and only those in the Class will follow. But all should read until that point.
THIS IS THE LAST CLASS UNTIL 6 JANUARY 2025! JOYOUS ADVENT, MERRY CHRISTMAS, HAPPY NEW YEAR.
Uncertainty & Probability Theory: The Logic of Science
Video
Links:
Twitter (latest class at bottom of thread)
Bitchute (often a day or so behind, for whatever reason)
HOMEWORK: You must look up and discover a Research Shows paper and see if it conforms to the conditions given below.
Lecture
Suppose you want to know whether giving a man a dose of some substance causes him to develop a dread disease, worrisome malady, or some untoward difficulty. How would you go about learning this?
You could give him the substance and see. Problem with that is that if you give the substance to him now, there may be things about the man that resist the disease (a word I’ll use to describe any bad outcome); things such as diet, circumstance, history, or biology, modified by age and any number of physical states may be involved. Which means our man might not get the disease now, but that he might later, after he has forgotten he took the substance, or he might never get it for the same kinds of reasons.
If he doesn’t get the disease, maybe you didn’t give him enough of the substance, or maybe the substance at low levels because of hormesis was even helpful. Measuring the outcome of just one man would not seem to bring definite knowledge of cause or size of effect.
You could give the substance to a number of men and then count who got sick. But you haven’t solved your problem. You have magnified it, because each of those men has his own individual diet, circumstances, age, history, biology. Almost certainly you will have measured very, very little of these things, and if any of them, or groups of them, in complex, non-linear combinations, are important to the presence of the malady, you might never know what is going on.
Even worse, if the man does get the disease, unless you can control absolutely everything about the man, which you cannot, it could well be that something besides the substance gave him the disease.
Your labors would be lessened were the substance to act quickly and reliably, like arsenic or a lighted stick of dynamite. But you’re going to be in deep kimchi if the substance acts slowly, cumulatively, or needs highly specialized, or even variable, circumstances to activate.
Ideally, you would identify the exact causal path from the ingestion or inhalation of the substance to the disease or its absence. By “causal path” I mean the precise circumstances, which may not be unique, in which the substance, modified by any number of conditions, each precisely measured and known, causes or fails to cause the disease. Tough task. Sometimes practically impossible.
Take heart. It can be done! After all, how many times do you suppose we need to try the trick with dynamite to prove to any observer that it causes indigestion?
Problem for you is that your substance is not dynamite. It’s some routine chemical, like a pesticide, say, or a food additive, or something to do with a car’s exhaust. Something which sounds bad, but which you’re not sure is bad, a thing which is also known to cause some good. The causal path can still be found, but it’s going to take herculean effort, and a long, long time. Mostly because there is always some squeamish effeminate who will object to direct experimentation.
Barring direct experimentation, what if instead you could substitute correlations and offer them as evidence of causation?
Not if you mean those correlations are proof of causation, because they aren’t, and can never be. As I have written innumerable times, every scientist knows that correlation is not proof of causation. Unless it is his correlation. Then he believes his correlation is causation with the ardor and zeal of a convert.
Epidemiology is the field which officially mistakes correlations for causations.
It would be a useful field if it remained in the world of correlations, presenting results quietly and only as speculations of possible, itinerant or occasional causes, as only hints and clues for other scientists over which causal paths to investigate. Consider that we wouldn’t need epidemiology to grasp big, obvious causes, like live dynamite and dead bodies.
We only need or invoke epidemiology when there is great uncertainty.
Which means we must use probability. So I hope you (non-regulars in the Class) will forgive me if I now lapse into very light notation, because it greatly eases writing of these things. All we need to know is that “Pr(Y|X)” means the probability of Y assuming X is true, whatever X and Y might be. (Those who have been taking the Class know all this.)
Let Y be the disease, malady or other bad thing in which we’re interested. X will be whatever evidence we consider probative of Y. If X represents full knowledge of the causal path of a substance to disease then Pr(Y|X) = 0 if this knowledge says a person won’t get the disease, or Pr(Y|X) = 1 if this knowledge says he will. Any other number between 0 and 1 means we are unsure.
If we’re unsure we cannot claim cause: it is a fallacy to do so.
To fix an example, let’s use Amyotrophic Lateral Sclerosis (ALS) as our disease, and for our substance we’ll consider pesticides. I’m going to draw numbers from the 2010 paper “Exposure to pesticides and risk of amyotrophic lateral sclerosis: a population-based case-control study” in Annali dell’Istituto Superiore di Sanità by Francesca Bonvicini and others.
Now, before coming to pesticides, a person interested in ALS will have accumulated evidence about “background” rates for this, and many other, diseases. Let’s call this evidence X. (It will generally vary by person, etc.)
From Bonvicini we learn there are about 150,000 residents in the Reggio Emilia area of Italy in which ALS patients were identified in hospital records over an 11-year period. This means the greater population is somewhat larger, because some of the 150,000 would have died or moved out, and others will have been born or moved in over that time. There is uncertainty in the population.
During the 11 years, after a thorough search of all public and private hospitals, and a search of records for a drug used to treat ALS, they found 41 ALS cases, some of which were only “probable”. Which means they might not have been ALS cases at all. All this is part of our X. There is uncertainty in our diagnoses.
This is the point to pause. We already have two uncertainties and what’s interesting about them is that they never make their way into any models (in the paper, or elsewhere). It will turn out there are more than just these two uncertainties, and they never enter models either. They therefore do not exist—officially. All concentration is given to the final models (which we will not come to today), i.e. those parts that can easily become math, the rest is forgotten, and the math becomes “The Results”.
Which means, at the least, that epidemiological results will be over-certain, the probabilities too sure. This is the other side of the Deadly Sin of Reification. That is when a model or theory replaces Reality. Here we have bits of Reality thrown out, and never remembered, because they do not fit ordinary model procedures. This is extremely common! The Lure of Quantification guarantees too much weight will be given to The Results.
Now, because of our uncertainties, we know the overall ALS rate is smaller than 41/150000 = 0.00027, because the 150,000 is too small and the 41 is probably too large. But, for a start and to mimic The Results, we’ll let Pr(Y|X) = 0.00027. Which is the probability a person in this vicinity was diagnosed with ALS or “probable” ALS in those 11 years. These figures are open to dispute, as said, and I don’t insist on any of them. Indeed, I insist that nobody insists, because nothing here is certain.
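The crude-rate arithmetic here is simple enough to sketch in a few lines. The counts are those quoted above from Bonvicini; both are uncertain, so treat the output as a rough figure, not a fixed truth.

```python
# A minimal sketch of the crude-rate arithmetic. Counts come from the
# Bonvicini paper as quoted in the text; both are uncertain, so the
# result is only a rough figure, not a fixed truth.
cases = 41            # diagnosed ALS cases, including merely "probable" ones
population = 150_000  # nominal residents of Reggio Emilia over 11 years

pr_y_given_x = cases / population
print(f"Pr(Y|X) is roughly {pr_y_given_x:.5f}")  # about 0.00027
```

Note the division itself carries no record of either uncertainty: that loss is exactly the point being made.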
This is already a weak form of epidemiology, if you like. It becomes genuine epidemiology when we want to tie pesticides to ALS, not knowing the causal path.
It is at this point epidemiology goes awry. So pay attention. What is it we want to know? Whether, or how often, and at what dose, pesticides cause ALS, or don’t cause ALS. Which pesticide? All of them? Any of them? Just this one? There is incredible vagueness here, another glaring uncertainty. Many papers in this area exist (see the video), and each peels off different pesticides. Are they all bad? Equally bad? In the same way with dose? We never answer any of these questions.
It is at this point epidemiology becomes something like a treasure hunt. As long as we can find correlations, we are allowed to paint a substance as “bad”. It’s not that we start with a specific singular substance and want to investigate its properties, its powers and conditions. It’s that we are willing to take anything that sounds bad and seek correlations. This is a fecund fount of false positives. Technically, as we’ll see another day, this ignores or exaggerates the X in our equations.
In any case, let’s call a sufficient dose D. What is “sufficient”? I have no idea. Neither, almost all of the time, does anybody else.
PAY ATTENTION. Knowing whether anybody got a sufficient dose of pesticides is often impossible. Which means we’re stuck, and can’t do anything. Except that an epidemiologist invented the idea of exposure.
Exposure is not dose! Exposure is what happens before dose. You must be exposed before you can get a dose. But you might be exposed and never get a dose. An analogy is cable TV. If you have cable TV, and part of your “package” has CNN, an epidemiologist would say you are exposed to CNN, because the cable brings CNN into your house. But, like most people, you might never watch this channel. If so, while you are exposed, you never received a dose.
My friend Jaap Hanekamp is fond of the shower analogy. In a shower you’re exposed to around 25 gallons of water in ten minutes, and are better off for it. But if you drank, i.e. dosed yourself with, ten pints of water in that time, you could die. Exposure is not dose.
In the cable example, if all you know is whether the cable comes into the house, you can never know dose. But we might guess, and since all guesses are uncertain, it means probability enters again. Let D be the dose and E the exposure, then we can write
$$\Pr(D|EX) \ne \Pr(E|DX)$$,
where we must have whatever background information X we used in order to make our guesses. We can go further. It should be clear that Pr(E|DX) = 1: if you got the dose, you were surely exposed. But Pr(D|EX) can be any number from 0 to 1, depending on the information X we assume.
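The asymmetry between exposure and dose can be made concrete with a toy table. The people and numbers here are invented purely for illustration:

```python
# Toy illustration, with invented people, of exposure versus dose:
# everyone who got a dose was exposed, but not every exposed person got a dose.
people = [
    {"exposed": True,  "dosed": True},   # watched CNN
    {"exposed": True,  "dosed": False},  # has cable, but never watches
    {"exposed": False, "dosed": False},  # no cable at all
]

dosed = [p for p in people if p["dosed"]]
exposed = [p for p in people if p["exposed"]]

pr_e_given_d = sum(p["exposed"] for p in dosed) / len(dosed)    # always 1
pr_d_given_e = sum(p["dosed"] for p in exposed) / len(exposed)  # here 1/2
```

However the toy numbers are set, Pr(E|DX) stays pinned at 1 while Pr(D|EX) floats anywhere in [0, 1], which is the whole point of the distinction.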
The doctor analyzing the data from Bonvicini learns that 82 people who did not have ALS were asked whether they were ever exposed to pesticide (of certain kinds, which we ignore here), and 11 said they were. It was also discovered, via questionnaire, that 13 of the 41 ALS patients were exposed to pesticide.
There is uncertainty in the questionnaire. Which is ignored.
The authors in Bonvicini then commit one of the most common blunders in epidemiology. They calculated something like
$$\Pr(E | YX) = 13/41 \approx 0.317$$
and
$$\Pr(E | \text{No } YX) = 11/82 \approx 0.134.$$
The first equation is the probability of exposure of any of these 41 people who had ALS, and the second is the probability of exposure of any of these 82 people who did not have ALS. Both equations also assume whatever background information X we have. Bonvicini concluded, or heavily implied, that because Pr(E | YX) > Pr(E|No YX) (or rather a function of this) that therefore pesticides cause ALS.
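The counts above are easy to verify. The odds ratio at the end is the conventional case-control summary, my guess (not a quote from the paper) at the kind of function of these probabilities that gets reported:

```python
# The two conditional probabilities from the counts given in the text, plus
# the odds ratio: the conventional case-control summary, offered here as a
# guess at the "function of this" the paper computes, not a quote from it.
exposed_cases, total_cases = 13, 41
exposed_controls, total_controls = 11, 82

pr_e_given_y = exposed_cases / total_cases            # ~0.317
pr_e_given_not_y = exposed_controls / total_controls  # ~0.134

odds_ratio = (exposed_cases / (total_cases - exposed_cases)) / (
    exposed_controls / (total_controls - exposed_controls)
)
print(round(odds_ratio, 2))  # about 3.0
```

Nothing in this arithmetic says anything about cause; it only summarizes the two exposure frequencies.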
We’ll not do this until much later in the class, but really the authors computed a function of these two probabilities in order to do “hypothesis testing”, a procedure I wish would die in science. We’ll see why another day.
These are the wrong probabilities. What we want is this: Pr(Y|DEX), which is the probability of having ALS after first being exposed and then receiving a sufficient dose of pesticide, all given our initial information X. It should be obvious, in case it is not, that
$$\Pr(Y | DEX) \ne \Pr(E|YX) \ne \Pr(E|\text{No } YX)$$.
I never counted, but I’d say a hefty number of papers in epidemiology, almost surely a majority, make the mistake of assuming Pr(Y | DEX) = Pr(E|YX), even though this is a blatant error in probability. This is a pandemic in bad probability.
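To see the scale of that error, here is a rough Bayes inversion using only the figures already quoted. The careful computation is promised for next time, so take this as a sketch of magnitude only:

```python
# Rough Bayes inversion using the text's own figures, to show the scale of
# the error in equating Pr(Y|EX) with Pr(E|YX). Illustration only; the
# careful version, with its uncertainties, is for a later lecture.
pr_y = 41 / 150_000         # crude ALS rate, ~0.00027
pr_e_given_y = 13 / 41      # ~0.317
pr_e_given_not_y = 11 / 82  # ~0.134

# Total probability of exposure, then Bayes' theorem:
pr_e = pr_e_given_y * pr_y + pr_e_given_not_y * (1 - pr_y)
pr_y_given_e = pr_e_given_y * pr_y / pr_e

print(pr_y_given_e)  # roughly 0.00065: hundreds of times smaller than 0.317
```

Even this crude sketch shows the two quantities differ by nearly three orders of magnitude, which is why treating one as the other is no small slip.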
Next time we’ll compute Pr(Y|EX), and then Pr(Y|DEX). We’ll see these are not the same. We’ll then move to what I call the epidemiologist fallacy, which is when a model for exposure is substituted for exposure—which happens all the time!—and in which causal claims are made about dose, which nobody knows.
Merry Christmas.