# William M. Briggs

### Statistician to the Stars!


David Stove, Chance Master

Just review and clarification this time, folks. Dirty hard work. But necessary given the confusion from last post. Time to pump some neurons! Next time we—finally!—get to parameters, models, and all that.

(All the stuff in this series is, in a fuller form, in my new upcoming book, which is tentatively called Logical Probability and Statistics—but I’ve only changed the name 342 times, so don’t count on this one sticking.)

Recapitulation

• Probability is the measure of uncertainty; matters of certainty and uncertainty speak of our knowledge and how we know what we know or of what we are unsure.
• Given a fixed set of premises and conclusion, it follows that conditionally the conclusion (a proposition) is either true, false, or somewhere in between, i.e. uncertain. A proposition can be true given one set of premises, false given another, or uncertain given a third.
• A proposition is necessarily true when it (a) validly follows a chain of true propositions back to a bedrock set of propositions which are accepted as true because (b) they are axiomatically true, i.e. just plain true. We cannot explain why or how these fundamental propositions are true: we accept they are because of their obviousness, because they are revealed to us (Socrates would say we remember them), i.e. by faith.
• A proposition is contingently true when it validly follows from a chain of propositions which are accepted as true. If these accepted propositions (and their sires, grandsires, etc. if present) are themselves true then the proposition at hand is necessarily true as above, and it is misleading to say it is contingently true. It can be that a proposition is accepted as contingently true because it is not known to be necessarily true.
• An objectivist takes an argument as it is and neither adds to nor subtracts from it. The truth, falsity, or in-betweenness of the conclusion follows only from the evidence stipulated.
• A subjectivist takes an argument and adds to or subtracts from it; either from the premises or in modifying the conclusion. This is acceptable only if these modifications are manifest. They usually are not. Logic and probability do not guarantee an absence of confusion.
• A frequentist often acts like a subjectivist unaware of her subjectivism; but even if not, she makes other errors. See the previous post for a sample—and only a sample—of these errors.
• Probability need not be a number; it can be a range or a unique value. A probability of 1 implies truth, of 0 falsity. There is no such thing as probability; it is not a physical thing; neither are numbers.
• An easily seen result of probability is that adding a truth to a list of premises does not change the argument. If, for example, a conclusion follows (or doesn’t) from some set of premises, then saying, “Accepting these premises and this truth” is equivalent to saying, “Accepting these premises.”

Clarifications

Given all these, there was confusion last time about exactly how evidence in premises allows us to deduce probabilities.
Suppose the proposition (conclusion) “A Q will emerge”. What does knowing only the premise, “I have no idea how things emerge from this process” do for us? That is, what is the objectivist probability the conclusion is true given this and only this premise?

The answer “I don’t know” floats to mind. After all, we admitted ignorance and ignorance says nothing about Q. The logical nothing means no thing, incidentally, and not just a little thing, nor mostly nothing, nor uniformity, nor anything else. The probability doesn’t exist if we know nothing about Q. But this is only so if the premise itself truly has nothing to say about Q. Is there anything that can be deduced from the premise which allows us to uncover occult evidence about Q?

Well, it might be argued that “I have no idea how things emerge from this process” taken in conjunction with the conclusion “A Q will emerge” implies that Q is possible. But this is to act like a subjectivist and to change the premise to “I really don’t know, but since the guy is asking about Q, it seems as if Q is at least possible.” This inference is false because it cheats. We cannot go from “I know nothing about Q” to “Q is possible.” At best, this argument is circular because it takes information in the conclusion and places it in the premise (subjectively). Thus “I don’t know” is the probability; i.e. the probability doesn’t exist.

Switch the premise to “A Q might emerge from this process.” The conclusion is still “A Q will emerge.” The argument is invalid, but we feel on firmer probabilistic ground. The answer which appears first might be, “Since a Q might emerge, but we have no other idea about Q, then the probability is some number between 0 and up to and including 1.”

Is there anything that can be deduced from the premise this time? Yes. Taken one way, from “A Q might emerge from this process” it follows that “A Q might not emerge from this process.” Written tersely, this is “Either a Q will emerge or it won’t,” which is evidently a tautology, a statement which is always true, i.e. it is a truth.

From above we know that adding a truth to a list of premises does not change the argument. And any truth in a list of premises may be swapped for another truth. For example, “It will rain July 4th, 2561 in New York City or it won’t.” Making this substitution and keeping the same fixed conclusion, it does not follow that its probability is the interval (0, 1]. Instead, it is admitting we know nothing about Q, and nothing cannot imply a probability.

But, given the fluidity of English, there is another way we can interpret the premise: “A Q is one of the possibilities of this process.” We can still derive the same tautology from this, but now there is ever-so-slightly more information about Q, and with that we can claim our original answer, which is some number between 0 and up to and including 1. Which still isn’t saying much. All we can infer from this premise is that Q is not impossible. That is why the interval is (0, 1] and not [0, 1]. Small comfort!

Incidentally, we could say that the probability is [0, 1] for the tautological or ignorance premises, but since this interval is everything there is—truth, falsity, and in-betweenness—it really says nothing, which is our answer.

Back to the statistical syllogism. Premise: “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge.” Same conclusion. The statistical syllogism allows us to deduce the probability 1/n from this premise, an answer which confused some who insisted that no probability can be deduced.

If that is so—if no probability whatsoever can be derived from this new premise—then we necessarily are in the logically equivalent situation of the tautological or ignorance premises. We have seen it was only from these (or other logically equivalent propositions) that we can deduce the interval [0,1], which is to say no probability at all.

Is it true that “I know nothing about Q” is logically equivalent to “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge”? Obviously not. We could, of course, deduce the tautology “Q emerges or not”, but that is because this tautology is always true even if there is no process in the universe which produces Qs.

Is “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge” logically equivalent to “A Q is one of the possibilities of this process”? If so, then the answer is the interval (0,1], which says something but very little, and which may have been in the mind of some commenters. Well, at least from our new premise we can derive that “A Q is one of the possibilities of this process”, which was the old premise. But if there is no more information than that in the new premise, then we are done and the critics are right.

But what are we to make of the other words in the new premise? In the old premise, we do not know how many different possibilities exist: the number could be infinite. But in the new premise we know, in addition to knowing that Q can be a state (the old premise), that there are n-1 other states besides Q and that one of these states (Q or another) must emerge. There are only these n states and no other. We have certain evidence which says n different things can happen, that there is a distinction between them (somehow). It is from this other information the statistical syllogism works its magic. Let’s see why.

Notice that we have a “variable” in the premise, which we can replace with actual values. Try n = 1. It is doubtful a critic would object to the statistical syllogism in this case and claim we can say “nothing” about whether a Q will emerge.

Now let n be greater than 1 (and an integer). What of the other n-1 states we know are possibilities? These also have probabilities. Switch the conclusion to “A Q will not emerge.” The statistical syllogism would give (n-1)/n for the probability. Well, if that doesn’t seem true to you, then I can offer no proof, just as I can offer no proof that the probability for “A Q will emerge” is 1/n. Yes, the statistical syllogism is axiomatic (in part).
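The two figures in this paragraph can be written out as a small sketch. It takes the statistical syllogism as given (the post says it is axiomatic, not proved), and the function name `statistical_syllogism` is made up here for illustration only.

```python
from fractions import Fraction


def statistical_syllogism(n: int) -> tuple[Fraction, Fraction]:
    """Return (Pr(a Q will emerge), Pr(a Q will not emerge)).

    Premise assumed: n states could emerge, just one of which is called Q,
    and just one must emerge.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    p_q = Fraction(1, n)      # the 1/n of the text
    p_not_q = 1 - p_q         # the (n-1)/n of the text
    return p_q, p_not_q


# n = 1 is the case no critic should object to: Q is certain.
for n in (1, 2, 6):
    p_q, p_not_q = statistical_syllogism(n)
    print(f"n={n}: Pr(Q)={p_q}, Pr(not Q)={p_not_q}")
```

Exact fractions are used instead of floats so the two probabilities always sum to exactly 1.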

Technical Mumbo Jumbo

Some people—Jaynes, notably, Stove, Diaconis, others—have thought there was a proof of the statistical syllogism. All these attempts fail. The proofs rest on certain principles, like the “Principle of Indifference”, “Principle of Maximum Entropy”, or “Principle of Symmetry”, etc. etc. All of them reach a point where they make claims like “Pr(State i | Premise) = Pr(State j | Premise), i,j = 1,…,n” and where “Premise” is our new premise. Propositions like this are certainly true, and if you accept them (and have some mathematical training) you can easily see how acceptance leads to the statistical syllogism.
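For readers who want the "easily seen" step spelled out, here is a short sketch of how the equality leads to 1/n. The symbols $p_i$ and $p$ are introduced here for illustration and are not from the original argument; Q is taken to be one of the n states.

```latex
\begin{align*}
p_i &:= \Pr(\text{State } i \mid \text{Premise}), \qquad i = 1, \dots, n \\
\sum_{i=1}^{n} p_i &= 1 \quad \text{(just one state must emerge)} \\
p_i = p_j = p \ \text{for all } i, j \;&\Longrightarrow\; n\,p = 1 \;\Longrightarrow\; p = \frac{1}{n} \\
\Pr(\text{not } Q \mid \text{Premise}) &= 1 - \frac{1}{n} = \frac{n-1}{n}
\end{align*}
```

Everything after the first line is routine arithmetic; the only step that needs defending is the assumed equality itself.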

The difficulty comes in accepting “Pr(State i | Premise) = Pr(State j | Premise) etc.” Why should we? Where does this truth come from? Well (as I show in this paper), it must be axiomatic. Adding all the various principles of “indifference”, “symmetry” and so on only serves to make the arguments circular.

It is also to act subjectively because we no longer have “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge” as our premise, but “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge and the principle of indifference” (or another principle). Now that principle, like its brothers, is just to say “It is equally likely that each state should emerge.” Well, if we take as a premise that it is equally likely that each state should emerge, then it necessarily follows each state is equally likely. But the argument is circular. Instead, it is from the statistical syllogism we infer these principles.

We cannot say we have no reason to believe the probability of the conclusion. We have definite reasons, and these are the knowledge that there are n possible states of the process and that one of these is Q and that one of these (and none other) must show. This is a heckofa lot of information to have, and hence sufficient reason to believe the probability.

This is worth emphasizing because phrases like “no reason”, “ignorance”, and the like are often tossed around, especially when it comes to formal models. It is good to see up front what little epistemic value these have or how their use can be misleading.

End result? The objectivist must accept the statistical syllogism. The subjectivist may do what he likes. The frequentist must still sit patiently and wait for an infinite number of trials of this process to complete before telling us the probability.

More examples

Your homework (in the comments) is to give more good examples, or if you’re clever, examples which seem good but which aren’t.

“They’re coming to take me away, ha ha!”

Ireland’s Prime Minister Enda—that’s Enda, not Edna, the creature—Kenny is being snubbed by Cardinal Sean O’Malley. Reason? Kenny is being “honored” by Barely Catholic—a.k.a. BC, a.k.a. Boston College—because Kenny is trying to muscle Ireland into legalizing abortions for women who are “at risk” of suicide.

O’Malley is AC, or All Catholic, which means he thinks that the last thing anybody who supports abortion needs is a pat on the back. O’Malley is taking a page that had fallen out of Cardinal Dolan’s playbook (and was thought forever lost) and is giving Kenny the silent treatment to teach him a lesson. No, strike that. To teach others the lesson that there are limits.

Anyway, Kenny is pushing the “The Protection of Life During Pregnancy” bill, another item which proves the tired old joke, “How do you tell when a politician is lying? When he names bills which do the opposite of what they proclaim.”

Idea is that young Irish ladies are so upset at discovering themselves pregnant (science has not yet discovered how this happens) that they commit self-murder on a grand scale. Kenny’s bill would allow scalpel-wielding men—I almost said “doctors”—to kill the lives inside these women so the women don’t kill themselves. Now if these ladies do commit self-murder, they kill two lives and not just one. Thus Kenny’s proposal makes utilitarian sense, right?

How many lives will the new law save? According to Ireland’s The Star, the suicide rate for women is about 5 per 100,000. Accept that for the sake of argument.

A lot of numbers coming up, but stick with me, you’re going to like this.

There are around 4.5 million folks in Ireland. Assuming half are females, then there are 2.25 million potential female victims of self-murder. The Central Statistics Office helps us narrow that. It says (in 2011) there were about 280 thousand women between 15-24 and another 730 thousand between 25 and 44, which together are the most fecund years. Make it 1 million potential pregnancies per year.

The CSO counted just over 75 thousand births in 2010 (a number which dropped from the previous year). Twins and higher numbers of birth are relatively rare, so assume (for a first blush) that each of these births came from one lady. That gives a pregnancy rate of 75 thousand divided by 1 million, which is 0.075, or 7.5%.

Now it gets a little tricky. Recall the suicide rate was 5 per 100,000, or 0.005%. That makes for about 50 dead bodies among women of child-bearing age (if we assume the same rate applies to women of all ages). How many of these were pregnant? Well, obviously those ladies who committed self-murder didn’t give birth, so our pregnancy rate is slightly underestimated, but it’s still a good first guess. Applying it to the number of self-murdered (50 * 0.075), we get about 4 (rounded up, to account for our under-estimating the pregnancy rate).

That means roughly 4 Irish pregnant women kill themselves each year. How many of these left notes saying it was because they were pregnant that they did the deed? Surely not all of them. It’s just a guess, though a reasonable one given all the myriad causes of suicide, but let’s say one-quarter.

Then in Ireland about 1 lady a year kills herself because she is pregnant.
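The chain of back-of-the-envelope numbers above can be retraced in a few lines. This is only a sketch of the arithmetic using the rounded figures quoted in the text; the variable names are made up for illustration, and the one-quarter share is the post's own guess, not data.

```python
# Rounded inputs as quoted in the text.
women_childbearing_age = 1_000_000   # "Make it 1 million"
births_per_year = 75_000             # CSO count for 2010, rounded
suicide_rate = 5 / 100_000           # 5 per 100,000, i.e. 0.005%
share_because_pregnant = 0.25        # the post's guess of one-quarter

# Births divided by women of child-bearing age: about 0.075, or 7.5%.
pregnancy_rate = births_per_year / women_childbearing_age

# Suicides among women of child-bearing age: about 50.
suicides = women_childbearing_age * suicide_rate

# Of those, the number who were pregnant: about 3.75, rounded up to 4 in the text.
pregnant_suicides = suicides * pregnancy_rate

# Of those, the number attributed to the pregnancy itself: a little under 1.
because_pregnant = pregnant_suicides * share_because_pregnant

print(f"pregnancy rate:        {pregnancy_rate:.3f}")
print(f"suicides:              {suicides:.1f}")
print(f"pregnant suicides:     {pregnant_suicides:.2f}")
print(f"because of pregnancy:  {because_pregnant:.2f}")
```

Every step is a simple multiplication, so any error in an input (the suicide rate, say) carries straight through to the final figure in the same proportion.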

Kenny’s bill would allow this woman to apply to the government to restrain her and have some white-garbed character kill the life inside her. But since one life is saved and another is purposely killed, the whole thing would really be a wash.

Time to put on our crystal ball hats (well mixed metaphors are a pleasure). The numbers show that about one lady a year would approach whatever new bureaucracy is created by Kenny’s bill. How many readers—a show of hands, please—believe it will be just one?

Well, there will be variability, plus the statistics above are all approximations, so suppose I erred by an order of magnitude and then, God help us, double the result in Kenny’s favor, so that it will be around 20 and not 1 lady a year who will cry for help. How many think it will be 20?

How many think 100? 1,000? More? How many believe that there will be a sudden epidemic of “suicidal behavior” once Kenny’s bill passes?

How many believe that Kenny’s plan will turn Ireland into an island of fibbers?

The manly, conservative NFL player Evan Mathis demonstrates his dissatisfaction with big government.

As will come as absolutely no surprise to regular readers, but will be a traumatic revelation to the public at large, it has finally been acknowledged in the peer-reviewed literature that the more a man resembles John Wayne, the larger the number of people who say as he passes, “Wasn’t that Chuck Norris!?”, and the more women who against their wills and better judgments swoon in this man’s presence, the more likely he is to embrace conservative principles.

This research was in the field of evolutionary psychology, it was peer-reviewed, and the proof entirely statistical, so you know it has to be true. So say Michael Bang! Petersen, Daniel Sznycer, and a couple of others in the infallible journal Psychological Science (emphasis added; Petersen’s middle name really is Bang).

I can confirm the results. Your own author—who is a soaring six-feet two-inches, two hundred pounds of hardened sinew and pure corded steely muscle, a man who can crack a walnut by blinking and whose five-o’clock shadow appears before the toast grows cold—is among the manliest of the male sex and, as the research suggested, conservative as hell.

You can confirm it, too. Just look what non-conservatives have on offer: Chris Matthews, the pudgy effeminate who admitted to going tingly after peeking at Barack Obama’s pants crease (or whatever), the cadaverous Alan Colmes who has to be duct-taped to his studio chair lest a strong breeze blow him off set. Jimmy Carter. Pretty boy George Clooney. The meekly, mousy, mugging Jon Stewart. Harry Reid? Please.

And then, besides me, who else do we find on the right? Clint Eastwood, baby. Ronald Reagan and Charlton Heston, two dead guys who are still manlier than half the Senate. Bret Baier at Fox; what was he, a linebacker? Arnold Schwarzenegger is a registered Republican. The Geo. Bushes were fighter pilots. And just you take a poll in any major league locker room asking whether the mountainous occupants prefer more or less government.

According to the Daily Mail summary, docile, flat-chested males are “more likely to support the welfare state and wealth redistribution”. You bet they are. They can’t take on the larger challenges, so they plead to be given. And that’s fine, because two trademarks of conservatism are generosity and bigheartedness. We’re pleased to oblige.

That’s not me speaking, that’s science. Yes, this is the way we evolved on the African veldt. And therefore there’s nothing to be done about it. The awesomeness differential is built into nature. Some of us will be big and mighty, others will listen to NPR and never learn what hockey is. This is the Way Things Are. This is tough luck for a lot of you, which might explain why those who can’t bench press their own weights are always going on about fairness.

Listen sugars: if you still believe in fairness at your age, your mother failed in her job. We all have our crosses to bear; it’s just that some of us are quieter about it than others. Which reveals another conservative principle: fortitude. That also spells manliness, which is now no surprise.

The paper is more nuanced than the summary given here, sometimes to the point of absurdity. But that’s because academics can’t resist lathering thick coatings of theory on everything they touch (some jargon about “asymmetric war of attrition” “theory”).

And they only distinguished conservativeness by attitude towards redistribution. That’s always a mistake because manly men (a.k.a. conservatives) are happy to give generously and with love to those truly in need. Yet both charitable duty and confiscatory taxation are called “redistribution.” Conservatives don’t want to give their money to pusillanimous politicians and bloodsucking bureaucrats who’d only use the proceeds to lavish gifts upon themselves and fund the breeding of more of their own kind.

I know a lot of progressives read this site. Never you fret, weaker brothers. We conservatives are here to protect you. The manliness of one conservative (the paper proves this) is more than enough for a passel of progressives. You come right over here and stand behind us and we’ll save you from those who would take by force what is rightfully yours.

——————————————————————

Thanks yet again to Al Perrella and K.A. Rodgers who alerted us to this topic.

Smokestack Seppuku

Whatever you do, don’t look at that smokestack! Do, and next thing you know you’ll be drawing a knife across your throat. Suicide!

Or so says the press and John G. Spangler, M.D., M.P.H., a professor of family medicine at Wake Forest Baptist. The good doctor used a statistical model to “prove” that if you live in a North Carolina county which has a coal-fired power plant, the chance you will kill yourself—from disgust, despair, or moral desuetude we never learn—increases significantly over a county which has none.

Yes, and if you live in a county which has two such power plants, look out! Death lurks on every erg. That you should find yourself surrounded by three such institutions does not bear thinking. Yet we must and will.

Peer-Review Strikes Again

Spangler’s peer-reviewed findings appeared in the Journal of Mood Disorders with title “Association of Suicide Rates and Coal-Fired Electricity Plants by County in North Carolina.”

Bucking the trend in enlightened morals, Spangler starts his paper by claiming “Suicide is a tragedy”. He also admitted that environmental pollution is not “commonly thought of as relating to suicide.” (And for good reason, as we shall see.)

“It is hypothesized that suicide is related to having a coal-fired plant in a county, acting as a substitute measure of air pollution.” How do these ordinarily life-giving buildings (try living in North Carolina without air conditioning) encourage dark forces? Possibly by causing “abnormal cognition, neurological development or degeneration” and lowering “overall life satisfaction” you see.

Statistics To The Rescue

Here is what Spangler did. He gathered county-level suicide rates and various demographics, such as percent whites, median income and the like, and counted the number of coal-fired power plants. He also took genuine air-quality measurements of metals and other pollutants, which was wise. He then “regressed”, i.e. used an unnecessarily complicated statistical model, the suicide rate and the other variables together.

None of the variables except percent whites, median age, and number of coal-fired power plants were “significant.” Spangler claimed that for every increase of one plant the suicide rate increases by about 2 per 100,000. This led Spangler and the press to conclude, as summarized for instance in Scientific American, “that county suicide rates correlated very predictably with the number of coal-fired electricity plants within said county.”

The flaw should already be obvious, and glaringly so, to those who know statistics. For those who don’t, stick around.

Even accepting the (hidden as yet) fallacy, there were some oddities about Spangler’s work that jumped out. He claimed that in North Carolina “sixteen [counties] had one plant; three had 2 plants (Gaston County, Halifax County, and Robeson County); and one had 3 plants (Person County)”. This is 20 counties with 16 + 6 + 3 = 25 plants, which means 80 counties did not have any coal-fired power plants (NC has 100 counties).

Let’s Try This Ourselves

Spangler did not list the sixteen counties with just one plant. However, Sourcewatch, a most progressive organization, has a list which appears complete, and from this we can infer the missing counties. See the tables below for details.

The suicide rates per county were also not in Spangler’s paper, but the CDC: 2003-2010 Final Data has them.

Here is a plot of number of coal-fired power plants by the county-level suicide rate.

Smokestack suicides

The median suicide rate for counties empty of coal-based electricity was 12.9, which was the identical rate in counties which had one plant. For those three—and only three—counties which had two plants, the median was 11.3. In the one county which had three plants, the rate was 10.6.

The green line is the “regression” of these two variables, which seems to indicate that increasing the number of plants decreases suicide rates, the exact opposite conclusion of Spangler’s. Seems that adding coal plants is good for you!
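The medians quoted for counties with plants can be checked against the Appendix tables, and a least-squares slope can be fitted to the same rows. This is a sketch under a loud assumption: the rates for the 80 plant-free counties are not listed in the post, so the fit below covers only the counties with at least one plant and is not the green line in the plot. The helper `ols_slope` is written here for illustration.

```python
from statistics import median

# Suicide rates copied from the Appendix tables, keyed by number of plants.
rates_by_plants = {
    1: [18.2, 15.9, 14.2, 14.0, 14.0, 13.1, 13.0, 12.9,
        12.8, 12.3, 11.5, 11.1, 10.7, 9.8, 7.7],
    2: [15.3, 11.3, 11.0],
    3: [10.6],
}

# Median rate for each plant count.
medians = {plants: median(rates) for plants, rates in rates_by_plants.items()}


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# One (plants, rate) pair per listed county.
xs = [plants for plants, rates in rates_by_plants.items() for _ in rates]
ys = [rate for rates in rates_by_plants.values() for rate in rates]
slope = ols_slope(xs, ys)

print(medians)
print(f"slope among listed counties: {slope:.2f} per extra plant")
```

Even on this partial data the slope comes out negative, which agrees in sign with the line described above, but four counties with more than one plant is far too few to lean on either way.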

Statistics Are Scary For Good Reason

How can this be? Easy. For one, Spangler’s data could be slightly different because suicide rates change from year to year (my rates are aggregated from 2003-2010, and Spangler says his are from 2001-2005). But if that’s true, and because the number of coal plants in each county hadn’t changed, it means the data is too variable to draw any conclusions. It’s also suspicious Spangler doesn’t have a plot like this in his paper.

For another, regression does funny things to data, making lines which should go down, mysteriously go up. Especially when you toss an enormous number of variables at it hoping something will stick. And the more variables you throw, the more likely something will stick, even absurd things. Note that none of the actual environmental variables Spangler used showed up. These are the variables which could actually influence health, and yet all were unimportant.
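How quickly "something will stick" can be illustrated with a simple calculation. The sketch assumes, purely for illustration, that each variable is tested independently at a 0.05 threshold and that none has any real effect; neither the threshold nor the independence comes from Spangler's paper, and the function name is made up.

```python
def chance_of_spurious_hit(num_variables: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_variables` null variables
    comes out "significant" at level `alpha`, assuming independent tests."""
    return 1 - (1 - alpha) ** num_variables


for k in (1, 5, 10, 20):
    print(f"{k:2d} variables: {chance_of_spurious_hit(k):.2f}")
```

Under these assumptions the chance of at least one false "finding" climbs steadily with the number of variables tossed in, which is the point of the paragraph above.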

The model itself is silly: there are only three counties with two plants, and one with three, yet Spangler (and I above) drew a regression line over this wee sample. But the mathematics doesn’t know this, so it will give a result. My green line is just as absurd as Spangler’s: there just isn’t enough data about increasing the number of plants to say anything cogent.

The Fallacy Revealed

And then there’s the fallacy hinted at above. It occurs when people infer individual-level conclusions from aggregate data. Something, or many various things, caused the differences in suicides between counties, but it does not follow that because a correlation was found in a statistical model that the variable identified had any causative effect.

If that were so, then moving to a county which had a higher proportion of whites or older folks would increase your suicide risk. That is obviously ridiculous, but if we follow the press reports and Spangler’s breathless intimations, that is the conclusion we would reach.

We should be especially suspicious here because no pollutants were noted, nor were any of the other demographic variables, like income and education. The county-level is just too crude a scale to be useful. The many journalists who picked up this story should have recognized this, as should have Spangler: a simple plot (like the one here) would have shown him his task was futile.

Appendix

Tables of the data in Spangler’s paper, given in case my counts differ from his. The suicide rates for counties with no plants were taken from the CDC. Semora is an unincorporated town located partly in Caswell county and partly in Person county, which I assigned to Person so that it had 3 plants as indicated by Spangler.

Counties with one plant.

| County (City) | Rank | Suicide rate |
| --- | --- | --- |
| Haywood (Canton) | #11 | 18.2 |
| Rowan (Salisbury) | #24 | 15.9 |
| Brunswick (Southport) | #38 | 14.2 |
| Catawba (Terrell) | #40 | 14.0 |
| Rockingham (Eden) | #39 | 14.0 |
| New Hanover (Wilmington) | #45 | 13.1 |
| Cleveland (Mooresboro) | #47 | 13.0 |
| Buncombe (Arden) | #48 | 12.9 |
| Wayne (Goldsboro) | #51 | 12.8 |
| Edgecombe (Battleboro) | #52 | 12.3 |
| Lenoir (Kinston) | #58 | 11.5 |
| Forsyth (Belews Creek) | #64 | 11.1 |
| Orange (Chapel Hill) | #67 | 10.7 |
| Chatham (Moncure) | #77 | 9.8 |
| Washington (Plymouth) | #96 | 7.7 |

Counties with two plants.

| County (Cities) | Rank | Suicide rate |
| --- | --- | --- |
| Gaston (Mount Holly, Belmont) | #30 | 15.3 |
| Halifax (Weldon, Roanoke Rapids) | #61 | 11.3 |
| Robeson (Lumberton x 2) | #66 | 11.0 |

County with three plants.

| County (Cities) | Rank | Suicide rate |
| --- | --- | --- |
| Person (Roxboro x 2, Semora) | #69 | 10.6 |

The remaining 80 counties had suicide rates from 26.0 to 4.4.