## William M. Briggs

### Statistician to the Stars!


What’s the difference between “independence” and “irrelevance”, and why does that difference matter? This typical passage from A First Course in Probability by Sheldon Ross (p. 87) is lovely because it commits many major misunderstandings, all of which prove “independence” a poor term. (This is a book I recommend; for readability, I changed Ross’s notation slightly, from e.g. “P(E)” to “Pr(E)”.)

The previous examples of this chapter show that Pr(E|F), the conditional probability of E given F, is not generally equal to Pr(E), the unconditional probability of E. In other words, knowing that F has occurred generally changes the chances of E’s occurrence. In the special cases where Pr(E|F) does in fact equal Pr(E), we say that E is independent of F. That is, E is independent of F if knowledge that F occurred does not change the probability that E occurs.

Since Pr(E|F) = Pr(EF)/Pr(F), we see that E is independent of F if Pr(EF) = Pr(E)Pr(F).

The first misunderstanding is “Pr(E), the unconditional probability of E”. There is no such thing. No unconditional probability exists. All, each, every probability must be conditioned on something, some premise, some evidence, some belief. Writing probabilities like “Pr(E)” is always, every time, an error, not only of notation but of thinking. It encourages and amplifies the misbelief that probability is a physical, tangible, measurable thing. It also heightens the second misunderstanding. We must always write (say) Pr(E|X), where X is whatever evidence one has in mind.

The second misunderstanding, albeit minor, is this: “knowing that F has occurred generally changes the chances of E’s occurrence.” Note the bias towards empiricism. In other places Ross writes “An infinite sequence of independent trials is to be performed” (p. 90, an impossibility); “Independent trials, consisting of rolling a pair of fair dice, are performed” (p. 92; “fair” dice are impossible in practice). “Events” or “trials” “occur”, i.e., propositions that can be measured in reality, or are mistakenly thought to be measurable. Probability is much richer than that.

Non-empirical propositions, as in logic, easily have probabilities. Example: the probability of E = “A winged horse is picked” given X = “One of a winged horse or a one-eyed one-horned flying purple eater must be picked” is 1/2, despite that “events” E and X will never occur. So maybe the misunderstanding isn’t so minor at that. The bias towards empiricism is what partly accounts for the frequentist fallacy. Our example E and X have no limiting relative frequency. Instead, we should say of any Pr(E|F), “The probability of E (being true) accepting F (is true).”

The third and granddaddy of all misunderstandings is this: “E is independent of F if knowledge that F occurred does not change the probability that E occurs.” The misunderstanding comes in two parts: (1) use of “independence”, and (2) a mistaken calculation.

Number (2) first. It is a mistake to write “Pr(EF) = Pr(E)Pr(F)” because, given the same E and F, there are times when this equation holds and times when it doesn’t, depending on the conditioning evidence. A simple example. Let E = “The score of the game is greater than or equal to 4” and F = “Device one shows 2”. What is Pr(E|F)? Impossible to say: we have no evidence tying the device to the game. Similarly, Pr(E) does not exist, nor does Pr(F).

Let X = “The game is scored by adding the total on devices one and two, where each device can show the numbers 1 through 6.” Then Pr(E|X) = 33/36, Pr(F|X) = 1/6, and Pr(E|FX) = 5/6; thus Pr(EF|X) = Pr(E|FX)Pr(F|X) (~0.139) does not equal Pr(E|X)Pr(F|X) (~0.153). Knowledge of F in the face of X is relevant to the probability E is true. (Recall these do not have to be real devices; they can be entirely imaginary.)
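These numbers are easy to verify by brute enumeration. A minimal sketch in Python (the encoding of E, F, and X as functions over the 36 outcomes is mine, purely for illustration):

```python
from fractions import Fraction
from itertools import product

# Premise X: 36 equally likely (device one, device two) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def pr(event, given=lambda o: True):
    """Conditional probability of `event` given `given`, by counting."""
    cond = [o for o in outcomes if given(o)]
    return Fraction(sum(event(o) for o in cond), len(cond))

E = lambda o: o[0] + o[1] >= 4   # score (sum of both devices) >= 4
F = lambda o: o[0] == 2          # device one shows 2

print(pr(E))           # 11/12, i.e. 33/36
print(pr(F))           # 1/6
print(pr(E, given=F))  # 5/6
# Pr(EF|X) = 5/36 ~ 0.139, while Pr(E|X)Pr(F|X) = 11/72 ~ 0.153.
print(pr(lambda o: E(o) and F(o)) == pr(E) * pr(F))  # False
```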

Now let W = “The game is scored by the number shown on device two, where devices one and two can each show the numbers 1 through 6.” Then Pr(E|W) = 1/2, Pr(F|W) = 1/6, and Pr(E|FW) = 1/2, because knowledge of F in the face of W is irrelevant to knowledge of E. In this case Pr(EF|W) = Pr(E|W)Pr(F|W).
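Swapping in W, the same kind of enumeration confirms the irrelevance (again only a sketch; the encoding as functions over outcomes is mine):

```python
from fractions import Fraction
from itertools import product

# Premise W: two devices, but the score is device two's number alone.
outcomes = list(product(range(1, 7), repeat=2))

def pr(event, given=lambda o: True):
    """Conditional probability of `event` given `given`, by counting."""
    cond = [o for o in outcomes if given(o)]
    return Fraction(sum(event(o) for o in cond), len(cond))

E = lambda o: o[1] >= 4   # score >= 4, under W
F = lambda o: o[0] == 2   # device one shows 2

print(pr(E))           # 1/2
print(pr(E, given=F))  # 1/2: learning F changes nothing, so F is irrelevant
print(pr(lambda o: E(o) and F(o)) == pr(E) * pr(F))  # True
```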

The key, as might have always been obvious, is that relevance depends on the specific information one supposes.

Number (1). Use of “independent” conjures up images of causation, as if, somehow, F is causing, or causing something which is causing, E. This error often happens in discussions of time series, as if previous time points caused current ones. We have heard times without number people say things like, “You can’t use that model because the events aren’t independent.” You can use any model, it’s only that some models make better use of information because, usually, knowing what came before is relevant to predictions of what will come. Probability is a measure of information, not a quantification of cause.

Here is another example from Ross showing this misunderstanding (p. 88, where the author manages two digs at his political enemies):

If we let E denote the event that the next president is a Republican and F the event that there will be a major earthquake within the next year, then most people would probably be willing to assume E and F are independent. However, there would probably be some controversy over whether it is reasonable to assume that E is independent of G, where G is the event that there will be a recession within two years after the election.

To understand the second example, recall that Ross was writing at a time when it was still possible to distinguish between Republicans and Democrats.

The idea that F or G are the full or partial efficient cause of E is everywhere here, a mistake reinforced by using the word “independence”. If instead we say that knowledge of the president’s party is irrelevant to predicting whether an earthquake will soon occur, we make more sense. The same is true if we say knowledge of this president’s policies is relevant to guessing whether a recession will occur.

This classic example is a cliché, but it is apt. Ice cream sales are positively correlated with drownings. The two events, a statistician might say, are not “independent”. Yet it’s not the ice cream that is causing the drownings. Still, knowledge that more ice cream is being sold is relevant to fixing a probability that more drownings will be seen. The model is still good even though it is silent on cause.

Keynes

This section contains more technical material.

The distinction between “independence” and “irrelevance” was first made by Keynes in his unjustly neglected A Treatise on Probability (pp. 59–61). Keynes argued for the latter, correctly asserting, first, that no probabilities are unconditional. Keynes gives two definitions of irrelevance. In our notation, “F is irrelevant to E on evidence X, if the probability of E on evidence FX is the same as its probability on evidence X; i.e. F is irrelevant to E|X if Pr(E|FX) = Pr(E|X)”, as above.

Keynes tightens this to a second definition. “F is irrelevant to E on evidence X, if there is no proposition, inferrible from FX but not from X, such that its addition to evidence X affects the probability of E.” In our notation, “F is irrelevant to E|X, if there is no proposition F′ such that Pr(F′|FX) = 1, Pr(F′|X) ≠ 1, and Pr(E|F′X) ≠ Pr(E|X).” Note that Keynes has kept the logical distinction throughout (“inferrible from”).

Lastly, Keynes introduces another distinction (p. 60; pardon the LaTeX):

$h_1$ and $h_2$ are independent and complementary parts of the evidence, if between them they make up $h$ and neither can be inferred from the other. If $x$ is the conclusion, and $h_1$ and $h_2$ are independent and complementary parts of the evidence, then $h_1$ is relevant if the addition of it to $h_2$ affects the probability of $x$.

A passage which has the footnote (in our modified notation): “I.e. (in symbolism) $h_1$ and $h_2$ are independent and complementary parts of $h$ if $h_1 h_2 = h$, $\Pr(h_1|h_2) \ne 1$, and $\Pr(h_2|h_1) \ne 1$. Also $h_1$ is relevant if $\Pr(x|h) \ne \Pr(x|h_2)$.” Two (or however many) observed data points, say $x_1$ and $x_2$, are independent and complementary parts of the evidence because neither can be deduced—not mathematically derived per se—from the other. Observations are thus no different than any other proposition.

Last Wednesday, the Daily Mail told the world of the peer-reviewed paper Lord Monckton, Willie Soon, David Legates and I wrote, entitled “Why models run hot: results from an irreducibly simple climate model” (the post which highlighted this will be restored soon). The article was “Is climate change really that dangerous? Predictions are ‘very greatly exaggerated’, claims study”.

• Researchers claim global warming predictions are ‘greatly exaggerated’
• Large climate models typically require computers to perform calculations
• They consider factors such as animal numbers and tectonic variations
• By comparison, a team of researchers has created a ‘simple’ model
• It looks at levels of solar energy absorbed and reflected by Earth
• Using this simple model, they claim current predictions are wrong
• Once errors are corrected, global warming in response to a doubling of CO2 is around 1°C or less – a third of the predicted 3.3°C

The scientific community reacted with calm, reasoned, logical argument.

Kidding! I’m kidding. People flipped out. Less than two days after our paper was generally known, I was hacked. The posts and comments from my old WordPress account were wiped out. Thank the Lord, I had backups for most things. Although I was off line for almost five days, I’m mostly back.

Here is one of the other asinine reactions. I’ll have more later because this makes for a fascinating case study of how outrageously political science has become.

A meager-witted unctuous twit of a “reporter” rejoicing under the unfortunate name Sylvan Lane (cruel parents) from the far-left Boston Globe was assigned to attack the authors of “Why Models Run Hot”. Lord Monckton and I are independent and Legates’s position is solid. So Lane went after Soon. He emailed asking for “information.” I offered to provide it. Lane wrote back:

I apologize if I wasn’t clear before. The kind of questions I would like to ask Dr. Soon are the same ones Science Bulletin insisted you and your colleagues answer before it published your paper. Here’s a link to its conflict of interest policy, which outlines the same type of questions any writer is required to answer before being published in the journal.

I do agree with you that these questions are best left up to him, which is why I’ve cc’d him on this email. While Science Bulletin’s conflict of interest policy is comprehensive, it doesn’t specify whether it pertains to the specific submitted study or an author’s body of work. I’ve contacted them to clarify and contacted Dr. Soon and Harvard-Smithsonian to ask them about their interpretation of the policy. Those are my only intentions.

I replied:

Allow me to doubt that “clarifying” Dr Soon’s employment status and his employer’s understanding of a journal’s publication policies are your only intentions. But if on the wee small chance they are, is it your habit to investigate the employment status of every author of every science paper, or just those papers the content of which are disconsonant (in some way) with your employer’s or your views? What a dull job that would be.

But now I come to think of it, this might be a fun line of questioning. Let me try. How much money are you getting for this work? Do you feel that this money discredits the work you’re doing? Do you feel tainted by the money? Do you feel tempted to, or will you, change what you write so that it more closely matches the views of your employer? Have you had training as a scientist, or do you in other ways feel competent to judge the content of science papers like ours? If not, why are you writing about this particular paper?

You’ll of course know the fallacy of the non sequitur. If not, here’s an example. A man makes a claim X. X might be true or again it might be false. A reporter says, “I don’t like that man, therefore X cannot be true. I shall write a story about this, to the cheer and admiration of my fellow journalists.” He does so, and is feted as predicted.

Anyway, if you have relevant scientific, logical, climatological, meteorological, or statistical questions, I’d be glad to help. But I’ll trade answer for answer.

Not surprisingly, the dull-minded Lane did not respond. Instead, filled with notions of his own self-importance and a nearly complete ignorance of how conflict-of-interest declarations work, the untutored Lane filed a report with his partisan political sheet: “Climate change skeptic accused of violating disclosure rules”.

I contacted Lane on Twitter (@SylvanLane: his visage reminds me of a smugger version of Pajama Boy) to let him know what a foolish and stupid thing he had done. The coward did not respond.

Absolutely nowhere in this fictional “controversy” are any questions of science asked, addressed, or even hinted at. What is that Alinsky tactic? Teach the controversy and not the idea, or whatever? So blatant was Lane’s purpose that I hope his parents, if they haven’t been forced into hiding, are at least blushing for him.

Need I point out that it doesn’t matter if any or all of us authors were racist sexist homophobe slave trader twice-convicted con artists from Pluto, none of that, in any way, would be relevant to the points we made in “Why Models Run Hot”?

Any notion of responding to Lane’s preposterous “charges” would be giving him a victory, if you can call such callow acts “victorious.” Therefore I’ll insist that if you want to talk about the paper, talk about the paper.

We’re nearly back, ladies, gentlemen, and things in between!

I have restored most of the posts and comments. I haven’t uploaded the old images, so no pictures. The site needs lots of work, tweaks, adjustments. But it’s there!

Ha ha! Thanks to those who hacked me, I was able to move servers, which I’ve wanted to do for a long time.

Update I was hacked shortly after our paper “Why models run hot: results from an irreducibly simple climate model” became generally known. The posts and comments from my WordPress database were wiped out. Nothing else in the database or on the site was touched.

Except…backups. Every time I asked Yahoo (my old host) about them, they temporized. How could site-wide backups disappear? From me, I can see, but from Yahoo itself? Not so easy, that.

Like I said earlier, I forgive my hackers. You’ve done me a favor (what it is, I shan’t tell you!).

The pictures that used to appear above and in posts I have. But. Inside each post is a link which no longer points to the right place. I can fix this through a far-too painful grep session, then try to overwrite the new database, or skip it. I’m skipping it. I’ll go back and fix those posts which receive traffic, leaving the rest picture-less. Worse things can happen.

I have about 30 posts to put back up. I’ll put most of these up over the course of a week or so. I don’t want to overwhelm subscribers with emails. Your comments to these posts are, sadly and forever, lost. But you can make them anew! I tried waiting for Yahoo to see if they could restore a snapshot of the database, but one day turned into two, into three, which today turned into a “snag” and then a soulless announcement that “more information” would be available within “24 to 48 hours.” Since this was doubtful, to say the least, I made the move.

I lost no emails nor any files. Only those two tables.

I have to fix all the little things with the theme that I had before. This will take a day or three. I’m in no rush.

More later.

Update Any WordPress.com experts out there? My new registered blog is “wmbriggs.com”, whereas the old one was “wmbriggs.com/blog”. All the site stats and, more importantly, blog subscribers are registered under the latter. I looked around on WordPress.com but couldn’t discover a way to make these the same. Ideas? (Besides emailing their support.)

Update Now is also the time to ask for theme tweaks and minor changes.

Update If you’ve emailed me over the past five days, please email again. I lost these.

A Logical Probabilist (note the large forehead) explains that the interocitor has three states.

This post is one that has been restored after the hacking. All original comments were lost.

Bayesian theory probably isn’t what you think. Most have the idea that it’s all about “prior beliefs” and “updating” probabilities, or perhaps a way of encapsulating “feelings” quantitatively. The real innovation is something much more profound. And really, when it comes down to it, Bayes’s theorem isn’t even necessary for Bayesian theory. Here’s why.

Any probability is denoted by the schematic equation $\Pr(\mbox{Y}|\mbox{X})$ (all probability is conditional), which is the probability the proposition Y is true given the premise X. X may be compound, complex or simple. Bayes’s theorem looks like this:
$\Pr(\mbox{Y}|\mbox{W}\mbox{X}) = \frac{\Pr(\mbox{W}|\mbox{YX})\Pr(\mbox{Y}|\mbox{X})}{\Pr(\mbox{W}|\mbox{X})}$.
We start knowing or accepting the premise X, then later assume or learn W, and are able to calculate, or “update”, the probability of Y given this new information WX (read “W and X are true”). Bayes’s theorem is a way to compute $\Pr(\mbox{Y}|\mbox{W}\mbox{X})$. But it isn’t strictly needed. We could compute $\Pr(\mbox{Y}|\mbox{W}\mbox{X})$ directly from knowledge of W and X themselves. Sometimes the use of Bayes’s theorem can hinder.

Given X = “This machine must take one of states S1, S2, or S3”, we want the probability Y = “The machine is in state S1.” The answer is 1/3. We then learn W = “The machine is malfunctioning and cannot take state S3”. The probability of Y given W and X is 1/2, as is trivial to see. Now find the result by applying Bayes’s theorem, the result of which must match. We know that $\Pr(\mbox{W}|\mbox{YX})/\Pr(\mbox{W}|\mbox{X}) = 3/2$, because $\Pr(\mbox{Y}|\mbox{X}) = 1/3$. But it’s difficult at first to tell how this comes about. What exactly is $\Pr(\mbox{W}|\mbox{X})$, the probability the machine malfunctions such that it cannot take state S3 given only the knowledge that it must take one of S1, S2, or S3? We can argue that if the machine is going to malfunction then, given the premises we have (X), the malfunctioning state is equally likely to be any of the three; thus the probability is 1/3. Then $\Pr(\mbox{W}|\mbox{YX})$ must equal 1/2, but why? Given we know the machine is in state S1, and that it can take any of the three, the probability state S3 is the malfunction is 1/2, because we know the malfunctioning state cannot be S1, but can be S2 or S3. Using Bayes works, as it must, but in this case it added considerably to the burden of the calculation.
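Both routes can be checked by enumeration. The joint space below—uniform over pairs of (current state, malfunctioning state), with the machine never occupying the state its malfunction disables—is my encoding of the assumptions argued above, offered only as a sketch:

```python
from fractions import Fraction
from itertools import product

states = ["S1", "S2", "S3"]

# (current state, malfunctioning state): the machine cannot occupy the
# state its malfunction disables; all such pairs equally likely.
space = [(s, bad) for s, bad in product(states, states) if s != bad]

def pr(event, given=lambda o: True):
    """Conditional probability of `event` given `given`, by counting."""
    cond = [o for o in space if given(o)]
    return Fraction(sum(event(o) for o in cond), len(cond))

Y = lambda o: o[0] == "S1"   # machine is in state S1
W = lambda o: o[1] == "S3"   # malfunction disables state S3

# Direct computation of Pr(Y|WX):
print(pr(Y, given=W))                  # 1/2
# Via Bayes's theorem: Pr(W|YX) Pr(Y|X) / Pr(W|X) = (1/2)(1/3)/(1/3)
print(pr(W, given=Y) * pr(Y) / pr(W))  # 1/2
```

Both routes agree, as they must, but the direct count needed no decomposition at all.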

Most scientific, which is to say empirical, propositions start with the premise that they are contingent. This knowledge is usually left tacit; it rarely (or never) appears in equations. But it could: we could compute $\Pr(\mbox{Y}|\mbox{Y is contingent})$, which is even quantifiable (the open interval (0,1)). We then “update” this to $\Pr(\mbox{Y}|\mbox{X \& Y is contingent})$, which is 1/3 as above. Bayes’s theorem is again not needed.

Of course, there are many instances in which Bayes facilitates. Without this tool we would be more than hard pressed to calculate some probabilities. But the point is that the theorem can, but doesn’t have to, be invoked as a computational aid. The theorem is not the philosophy.

The real innovation in Bayesian philosophy, whether it is recognized or not, came with the idea that any uncertain proposition can and must be assigned a probability, not in how the probabilities are calculated. (This dictum is not always assiduously followed.) This is contrasted with frequentist theory, which assigns probabilities to some unknown propositions while forbidding the assignment to others, the choice being ad hoc. Given premises, a Bayesian can and does put a probability on the truth of an hypothesis (which is a proposition); a frequentist cannot—at least not formally. Mistakes and misinterpretations made by users of frequentist theory are legion.

The problem with both philosophies is misdirection, the unreasonable fascination with questions nobody asks, which is to say, the peculiar preoccupation with parameters. About that, another time.