January 9, 2018
This is our last week of theory. Next week the practical side begins in earnest. However much fun that will be, and it will be a jolly time, this is the more important material.
Last time we learned the concept of irrelevance. A premise is irrelevant if when it is added to the model, the probability of our proposition of interest does not change. Irrelevance, like probability itself, is conditional. Here was our old example:
(7a) Pr(CGPA = 4 | grading rules, old obs, sock color, math) = 0.05,
(7c) Pr(CGPA = 4 | grading rules, old obs, math) = 0.05,
In the context of the premises “grading rules, old obs, math”, “sock color” was irrelevant because the probability of “CGPA = 4” did not change when adding it. It is not that sock color is unconditionally irrelevant. For instance, we might have
(7d) Pr(CGPA = 3 | grading rules, old obs, sock color, math) = 0.10,
(7e) Pr(CGPA = 3 | grading rules, old obs, math) = 0.12,
where now, given a different proposition of interest, sock color has become relevant. Whether it is useful depends, and always will depend, on whether it is pertinent to any decisions we would make about CGPA = 3. We might also have:
(7f) Pr(CGPA = 4 | grading rules, old obs, sock color) = 0.041,
(7g) Pr(CGPA = 4 | grading rules, old obs) = 0.04,
where sock color becomes relevant to CGPA = 4 absent our math (i.e. model) assumptions. Again, all relevance is conditional. And all usefulness depends on decision.
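The notion of conditional (ir)relevance above can be sketched numerically. Here is a minimal Python sketch using invented tallies (the counts are hypothetical, not from the examples above): a premise is irrelevant, relative to the other premises, exactly when conditioning on it leaves the probability of the proposition unchanged.

```python
from fractions import Fraction

# Hypothetical tallies of past students, classified by sock color
# (light/dark) and whether CGPA = 4. Counts invented for illustration.
counts = {
    ("light", True): 5,  ("light", False): 95,
    ("dark",  True): 5,  ("dark",  False): 95,
}

def pr_cgpa4(sock_color=None):
    """Pr(CGPA = 4 | premises), optionally adding the sock-color premise."""
    keep = [k for k in counts if sock_color is None or k[0] == sock_color]
    total = sum(counts[k] for k in keep)
    hits = sum(counts[k] for k in keep if k[1])
    return Fraction(hits, total)

# With these counts, sock color is irrelevant: the probability is unchanged.
print(pr_cgpa4())         # Pr(CGPA = 4 | old obs)             -> 1/20
print(pr_cgpa4("light"))  # Pr(CGPA = 4 | old obs, sock color) -> 1/20
```

Change the tallies so light- and dark-sock students graduate at different rates and the two probabilities diverge, which is all "relevance" means here.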
Decision is not unrelated to knowledge about cause. Cause is not something to be had from probability models; it is something that comes before them. Failing to understand this is the cause (get it!) of confusion generated by p-values, hypothesis tests, Bayes factors, parameter estimates, and so on. Let’s return to our example:
(7a) Pr(CGPA = 4 | grading rules, even more old obs, sock color, math) = 0.05,
(7b) Pr(CGPA = 4 | grading rules, even more old obs, math) = 0.051.
Sock color is relevant. But does sock color cause a change in CGPA? How can it? Doubtless we can think of a story. We can always think of a story. Suppose sock color indicates the presence of white or light colored socks (then, the absence of sock color from the model implies dark color or no hosiery). We might surmise light color socks reflect extra light in examination rooms, tiring the eyes of wearers so that they will be caused to miss questions slightly more frequently than their better apparelled peers.
This is a causal story. It might be true. You don’t know it isn’t. That is, you don’t know unless you understand the true causal relation of sock color to grades. And, for most of us, this is no causation at all. We can tell an infinite number of causal stories, all equally consistent with the calculated probabilities, in which sock color affects CGPA. There cannot be proof they are all wrong. We therefore have to use induction (see this article) to infer that sock color by its nature is acausal (to grades). We must grasp the essence of socks and sock-body contacts. This is perfectly possible. But it is something we do beyond the probabilities, inferring from the particular observations to the universal truth about essence. Our comprehension of cause is not in the probabilities, nor in the observations, but in the intellectual leap we make, and must make.
This is why any attempt to harness observations to arrive at causal judgments must fail. Algorithms cannot leap into the infinite like we can. Now this is a huge subject, beyond that which we can prove in this lesson. In Uncertainty, I cover it in depth. Read the Chapter on Cause and persuade yourself of the claims made above, or accept them for the sake of argument here.
What follows is that any kind of hypothesis test (or the like) must be making some kind of error, because it is claiming to do what we know cannot be done. It is claiming to have identified a cause, or a cause-like thing, from the observations.
Now classical statistics will not usually say that “cause” has been identified, but it will always be implied. In a regression for Income on Sex, it will be claimed (say) “Men make more than women” based on a wee p-value. This implies sex causes income “gaps”. Or we might hear, if the researcher is trying to be careful, “Sex is linked to income”. “Linked to” is causal talk. I have yet to see any definition (and they are all usually long-winded) of “linked to” that did not, in the end, boil down to cause.
There is a second type of cause to consider, the friend-of-a-friend cause, or the cause of a cause (or of a cause etc.). It might not be that sock color causes CGPAs to change, but that sock color is associated with another cause, or causes, that do. White sock color sometimes, we might say to ourselves, is associated with athletic socks, and athletic socks are tighter fitting, and it’s this tight fit that causes (another cause) itchiness, and the itchiness sometimes causes distraction during exams. This is a loose causal chain, but an intact one.
As above, we can tell an infinite number of these cause-of-a-cause stories, the difference being that here it is much harder to keep track of the essences of the problem. Cause isn’t always so easy! Just ask physicists trying to measure effects of teeny weeny particles.
If we do not have, or can not form, a clear causal chain in our mind, we excuse ourselves by saying sock color is “correlated” or (again) “linked to” CGPA, with the understanding that cause is mixed in somehow, but we do not quite know how to say so, or at least not in every case. We know sock color is relevant (to the probability), but the only way we would keep it in the model, as said above, is if it is important to a decision we make.
Part of any decision, though, is knowledge of cause. If we knew the essences of socks, and the essence of all things associated with sock color, and we judge that these have no causal power to change CGPA, then it would not matter if there were any difference in calculated probabilities between (7a) and (7b). We would expunge sock color from our model. We’d reason that even a handful of beans tossed onto the floor can take the appearance of a President’s profile, but we’d know the pattern was in our minds and not caused intentionally by the bean-floor combination.
If we knew that, sometimes and in some but not necessarily all instances, that sock color is in the causal chain of CGPA (as in for instance tightness and itchiness) then we might include sock color in our model but only if it were important for decision.
If we are ignorant (but perhaps only suspicious) of the causal chain of sock color, which for some observations in some models we will be, we keep the observation only if the decision would change.
Note carefully that it is only knowledge of cause or decision that leads to our accepting or rejecting any observable from our model. It has nothing to do (per se) with any function of measurements. Cause and decision are king in the predictive approach. Not blind algorithms.
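The keep-only-if-the-decision-changes rule can also be sketched. In this hypothetical Python sketch, the decision threshold and the action names are invented for illustration; the probabilities echo (7f) and (7g) above. The point is that a premise can shift the probability yet leave every decision untouched, in which case it earns no place in the model.

```python
# Hypothetical decision: offer tutoring when Pr(CGPA = 4) falls below
# an (invented) threshold. Only the comparison of decisions matters.
THRESHOLD = 0.045

def decide(prob):
    """Map a probability to an action (a stand-in for any real decision)."""
    return "tutor" if prob < THRESHOLD else "no tutor"

pr_without = 0.04   # Pr(CGPA = 4 | grading rules, old obs), as in (7g)
pr_with    = 0.041  # Pr(CGPA = 4 | ..., sock color), as in (7f)

# Sock color is relevant (the probabilities differ) but useless here:
# both probabilities lead to the same action, so we would drop it.
keep_sock_color = decide(pr_without) != decide(pr_with)
print(decide(pr_without), decide(pr_with))  # -> tutor tutor
print(keep_sock_color)                      # -> False
```

Lower the threshold between the two probabilities and the decisions split, making sock color worth keeping; the judgment lives in the decision, not in the bare numbers.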
In retrospect, this was always obvious. Even classical statisticians (and the researchers using these methods) do not put sock color into their models of grade point. Every model begins with excluding an infinity of non-causes, i.e. of observations that can be made but that are known to be causally (if not probabilistically) irrelevant to the proposition of interest. Nobody questions this, nor should they. Yet to be perfectly consistent with classical theory, we’d have to try and “reject” the “null” hypotheses of everything under, over, around, and beyond the sun, before we were sure we found the “true” model.
Lastly, as said before and just as obvious, if we knew the cause of Y, we don’t need probability models.
Next week: real practical examples!
Homework: I do not expect to “convert” those trained in classical methods. These fine folks are too used to the language in those methods to switch easily to this one. All I can ask is that people read Uncertainty for a fuller discussion of these topics. The real homework is to find an example of, or try to define, “linked to” without resorting somewhere to causal language.
Once you finish that impossible task, find a paper that says its results (at least in part) were “due to” chance. Now “due to” is also causal language. Given that chance is only a measure of ignorance, and therefore cannot cause anything, and using the beans-on-floor example above, explain what people are doing when they say results were “due to” chance.