Lecture
This is an excerpt from Chapter 7 of Uncertainty.
Inference to the best explanation, a.k.a. abduction, is often defined with respect to surprise. But all surprise is conditional on tacit or already accepted knowledge. Your dad comes into the house, his clothes smeared with red paint. What is the explanation for the paint (and not your dad entering the house per se; as always, we pick the propositions of interest)? Well, it could be that he was out painting the house. Or perhaps he was painting the car, or maybe even an old chair in the garage. Or it could be that scamps drove by your house and paint-bombed him. Or perhaps he was abducted by Martians and the splotches are their probe markings. And so on, endlessly.
Pick one of these explanations and suppose it’s the only one we have. Say E = “Dad was out painting the house”. The observed data is D = “Paint-smeared clothes.” There must be tacit premises floating about, something to connect painting with paint on clothes, like “One who paints often but not always gets smeared,” or a plain “One who paints [always] gets smeared”. Whichever, E is the best explanation. Why? We have inferred the best explanation because it was the only explanation. Probabilities like $\Pr(\mbox{D} | \mbox{E})$ and $\Pr(\mbox{E} | \mbox{D})$ are irrelevant, and incorrect. This point is often confused because of miswriting Bayes’s theorem. Some incorrectly, or rather incompletely, write
$$
\Pr(\mbox{E} | \mbox{D}) = \frac{\Pr(\mbox{D} | \mbox{E})\Pr(\mbox{E})}{\Pr(\mbox{D})}.
$$
The mistakes are “$\Pr(\mbox{E})$” and “$\Pr(\mbox{D})$”, which do not exist (they have no conditions). It might be tempting to argue the denominator can be expanded:
$$
\Pr(\mbox{D}) = \sum_i \Pr(\mbox{D}|\mbox{E}_i)\Pr(\mbox{E}_i),
$$
but this repeats the error (for every $i$) in writing $\Pr(\mbox{E}_i)$, none of which exist. No unconditional probability exists, as proved earlier. The same mistakes would be made were D and E swapped in these equations.
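For contrast, the expansion is unobjectionable once some background premise—call it X—is made explicit, as is done below with "Experience":
$$
\Pr(\mbox{D}|\mbox{X}) = \sum_i \Pr(\mbox{D}|\mbox{E}_i,\mbox{X})\Pr(\mbox{E}_i|\mbox{X}),
$$
where X says the E$_i$ are exhaustive and exclusive. Every probability now carries a condition, as every probability must.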
There is no way to say how surprising D was except by reference to some explanation. To think otherwise is to commit the fallacy of p-values in a Bayesian context. Suppose that $\Pr(\mbox{D} | \mbox{E})$ is small but non-zero. Yet if E is all we have on offer (completed by whatever tacit premises we accept), then E must be conditionally true, and thus it is irrelevant how likely or surprising D is. If $\Pr(\mbox{D} | \mbox{E}) = 0$ (precisely 0) then, given D, E must be false even if E is the only possible explanation. In cases like that, we are left in ignorance, which is no bad thing, as long as we admit it.
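To make the zero-probability step explicit: $\Pr(\mbox{D} | \mbox{E}) = 0$ says that E, together with its tacit premises, entails not-D,
$$
\Pr(\mbox{D} | \mbox{E}) = 0 \iff \Pr(\mbox{not-D} | \mbox{E}) = 1,
$$
so observing D refutes E by plain modus tollens; Bayes's theorem isn't even needed.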
How did we arrive at E anyway? Well, we searched our past experiences and hit upon the explanation which would make the probability of D, given that explanation E, high, if not certain. Since all we could come up with is this E, then, conditionally on our experiences, this E is the best—the only—explanation.
This is one of those areas where notation is both responsible for and the fix for our troubles. There isn't anything wrong with Bayes's theorem (how could there be?), only with improper uses of it, like the equation above. $\Pr(\mbox{E})$ isn't the "prior" probability of the explanation "before" we see D; it isn't anything; it is a set of meaningless symbols. To become meaningful, some premises are needed. Perhaps "Experience":
$$
\Pr(\mbox{E} | \mbox{D}, \mbox{Experience}) = \frac{\Pr(\mbox{D} | \mbox{E}, \mbox{Experience})\Pr(\mbox{E}|\mbox{Experience})}{\Pr(\mbox{D}|\mbox{Experience})}.
$$
And this works if our experience suggests only this explanation because
$$
\Pr(\mbox{D} | \mbox{E}, \mbox{Experience}) = \Pr(\mbox{D}|\mbox{Experience}),
$$
neither of which is 0, and because $\Pr(\mbox{E}|\mbox{Experience}) = 1$. That makes the left-hand side 1, too. No magic has been done here, and it is as simple as said above. Because this was the only E we could think of, it is the only possible explanation; a tautology. Note carefully that neither probability qua probability nor Bayes's theorem tells us how we came to E, what caused us to think of this explanation. That is done by induction, as was shown earlier. Not for the last time, I emphasize that probability does not solve all problems of uncertainty and evidential reasoning.
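Spelling out the substitution makes this plain:
$$
\Pr(\mbox{E} | \mbox{D}, \mbox{Experience}) = \frac{\Pr(\mbox{D} | \mbox{Experience})\times 1}{\Pr(\mbox{D}|\mbox{Experience})} = 1.
$$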
Instead of only one possibility, suppose our intuition or experience suggested two possibilities—or we are told to consider only two possibilities, which amounts to the same thing—E$_1$= “Garage painting with splatter,” E$_2$= “Scamps drove by and paint-bombed dad.” Experience, or outside directive or whatever, is the proposition “Just one of E$_1$ or E$_2$ is the correct explanation.” (This is similar to our familiar two-sided coin proposition and is not a tautology.) We then have
$$
\Pr(\mbox{E}_1 | \mbox{D}, \mbox{Experience}) = \frac{\Pr(\mbox{D} | \mbox{E}_1, \mbox{Experience})\Pr(\mbox{E}_1|\mbox{Experience})}{\Pr(\mbox{D}|\mbox{Experience})}.
$$
Now
$$
\Pr(\mbox{E}_1|\mbox{Experience}) = \Pr(\mbox{E}_1|\mbox{E}_1 \mbox{ or } \mbox{E}_2) = 1/2,
$$
and
$$
\Pr(\mbox{D} | \mbox{E}_1, \mbox{E}_1 \mbox{ or } \mbox{E}_2) =\Pr(\mbox{D} | \mbox{E}_1).
$$
And it must be that $\Pr(\mbox{D}|\mbox{Experience}) = \Pr(\mbox{D} | \mbox{E}_1 \mbox{ or } \mbox{E}_2) = 1$ since at least one of these must explain the data we have, and to explain means to describe the cause. That means $\Pr(\mbox{E}_1 | \mbox{D}, \mbox{Experience}) = 0.5\times\Pr(\mbox{D} | \mbox{E}_1)$, and likewise $\Pr(\mbox{E}_2 | \mbox{D}, \mbox{Experience}) = 0.5\times \Pr(\mbox{D} | \mbox{E}_2)$.
But wait: if E$_1$ is an explanation, then $\Pr(\mbox{D} | \mbox{E}_1) = 1$, since E$_1$ says why D came out; and the same is true for E$_2$! That makes $\Pr(\mbox{E}_1 | \mbox{D}, \mbox{Experience}) = \Pr(\mbox{E}_2 | \mbox{D}, \mbox{Experience}) = 1/2$.
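For those who like to see the arithmetic done mechanically, here is a minimal sketch (mine, not from Uncertainty; the function and its names are invented for illustration) of the same calculation, with the equal priors deduced from the "just one of E$_1$ or E$_2$" premise:

```python
# Minimal sketch: posteriors over a fixed list of candidate explanations,
# with equal priors deduced from "exactly one of E_1, ..., E_n is correct".
def posteriors(likelihoods):
    """likelihoods[i] = Pr(D | E_i, Experience)."""
    n = len(likelihoods)
    joint = [lik / n for lik in likelihoods]  # Pr(D|E_i, X) * Pr(E_i|X), prior = 1/n
    total = sum(joint)                        # Pr(D|X), by total probability
    return [j / total for j in joint]         # Pr(E_i | D, X)

# Each explanation fully accounts for D, so each likelihood is 1:
print(posteriors([1.0, 1.0]))  # -> [0.5, 0.5]
```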
And this should make eminent sense: we start with the premise that one of E$_1$ or E$_2$ must be the cause or explanation of D, a trivial premise. Since either can explain D, each has equal probability of being the cause. There is nothing in D to help us pick: D is a consequence of either explanation. It is now clear that inference to the best explanation is a misnomer in any sense which makes use of an observation. There was never a problem with the non-illuminating mathematics. It was our misapplication, writing things like $\Pr(\mbox{D})$, that caused confusion.
Yet it seems, in the context of the paint-splattered father, that his painting the garage is more likely than his being paint-bombed. Why? Both activities would cause the same paint; and our intuitions or some outside directive told us to consider these and no other explanations. Well, at least my experience—my premises, which are rich and not all articulable—suggests that paint-bombing is pretty rare and garage painting isn't. To pick either E$_1$ or E$_2$ given these premises and the deduced non-numerical probability is, of course, to make a decision. And that means accounting for non-probability matters like losses and gains: probability isn't decision. Supposing none of those are important, and using the (not here justified) rule of picking the most likely cause, I'd pick garage painting. Simple as that. Once again, to understand cause and explanation is to seek after knowledge of nature and essence.
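Since picking an explanation is a decision, and decisions involve losses and gains as well as probability, here is a hypothetical sketch of how a loss table, and not probability alone, settles the pick. All numbers are invented; the real probabilities here are non-numerical:

```python
# Hypothetical numbers only: probability alone does not make the decision.
probs = {"garage": 0.95, "paint_bomb": 0.05}  # invented Pr(E_i | D, Experience)

# loss[act][truth] = cost of acting on `act` when `truth` is the real cause
loss = {
    "blame_garage":     {"garage": 0.0, "paint_bomb": 10.0},
    "blame_paint_bomb": {"garage": 1.0, "paint_bomb": 0.0},
}

def expected_loss(act):
    return sum(loss[act][truth] * p for truth, p in probs.items())

best = min(loss, key=expected_loss)
print(best, {act: expected_loss(act) for act in loss})
```

With these made-up losses the expected losses are 0.5 and 0.95, so "blame_garage" wins; crank up the cost of wrongly blaming the garage painter and the decision can flip even though the probabilities stay fixed.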
A word about the multiplicity of cause. Consider Olympic runners competing in the 100-meter dash. The gold medalist comes in at 9.67 seconds, while the silver medalist is right behind at 9.69 seconds. The winner has won by being faster. We do not need a statistical test to verify this, assuming (conditional on) the measurement equipment is working properly—but then we never need a statistical test or model to tell us what happened in the absence of measurement error. Something caused the winner to win. The cause was not one thing but many, because the movement of the human body over that distance comes about not by one cause but by many. Some of these causes will be more important than others, and they can (possibly) be known by controlling and measuring whatever these conditions are. The exact number of causes is enormous and scarcely countable: among them are everything the athlete has eaten (over such-and-such a time), all the elements of his training regime, and on and on.
Next consider winning times over a number of races. Male gold medalists are always faster than the fastest women. Does male sex cause the men to out-race their female competitors? As everybody knows, it is not sex per se that causes men to be faster; instead, sex causes differences in anatomy and physiology that are tied to athletic performance. Men have more muscle, and muscle causes fleetness, and so on. This is why the myriad models that "control" for sex and imply sex is a cause are always wrong—unless they model direct effects of sex, such as pregnancy, in which case no model is needed because we understand the essence. Sex is a proxy for (usually) multiple other causes and is itself not a cause. The same kind of reasoning applies to such things as race, income, and so on. Statistical models aren't capable of discerning cause.