Whatever you do, don’t look at that smokestack! Do, and next thing you know you’ll be drawing a knife across your throat. Suicide!
Or so say the press and John G. Spangler, M.D., M.P.H., a professor of family medicine at Wake Forest Baptist. The good doctor used a statistical model to “prove” that if you live in a North Carolina county which has a coal-fired power plant, the chance you will kill yourself (from disgust, despair, or moral desuetude, we never learn) is significantly higher than in a county which has none.
Yes, and if you live in a county which has two such power plants, look out! Death lurks on every erg. That you should find yourself surrounded by three such institutions does not bear thinking. Yet we must and will.
Peer-Review Strikes Again
Spangler’s peer-reviewed findings appeared in the Journal of Mood Disorders under the title “Association of Suicide Rates and Coal-Fired Electricity Plants by County in North Carolina.”
Bucking the trend in enlightened morals, Spangler starts his paper by claiming “Suicide is a tragedy”. He also admitted that environmental pollution is not “commonly thought of as relating to suicide.” (And for good reason, as we shall see.)
“It is hypothesized that suicide is related to having a coal-fired plant in a county, acting as a substitute measure of air pollution.” How do these ordinarily life-giving buildings (try living in North Carolina without air conditioning) encourage dark forces? Possibly by causing “abnormal cognition, neurological development or degeneration” and lowering “overall life satisfaction,” you see.
Statistics To The Rescue
Here is what Spangler did. He gathered county-level suicide rates and various demographics, such as percent whites, median income and the like, and counted the number of coal-fired power plants. He also took genuine air-quality measurements of metals and other pollutants, which was wise. He then “regressed” the suicide rate on all the other variables together, i.e. he used an unnecessarily complicated statistical model.
None of the variables except percent whites, median age, and number of coal-fired power plants were “significant.” Spangler claimed that for every increase of one plant the suicide rate increases by about 2 per 100,000. This led Spangler and the press to conclude, as summarized for instance in Scientific American, “that county suicide rates correlated very predictably with the number of coal-fired electricity plants within said county.”
The flaw should already be obvious, and glaringly so, to those who know statistics. For those who don’t, stick around.
Even accepting the (hidden as yet) fallacy, there were some oddities about Spangler’s work that jumped out. He claimed that in North Carolina “sixteen [counties] had one plant; three had 2 plants (Gaston County, Halifax County, and Robeson County); and one had 3 plants (Person County).” This is 20 counties with 16 + 6 + 3 = 25 plants, which means 80 counties did not have any coal-fired power plants (NC has 100 counties).
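For readers who like to check such tallies, here is a minimal sketch of the arithmetic. The counts are the ones Spangler reports; the variable names and the figure of 100 counties are simply those used in this post.

```python
# Counties grouped by number of coal-fired plants, as quoted from Spangler.
# Key = plants per county, value = number of counties with that many plants.
counties_by_plants = {1: 16, 2: 3, 3: 1}

TOTAL_NC_COUNTIES = 100  # North Carolina has 100 counties

counties_with_plants = sum(counties_by_plants.values())
total_plants = sum(plants * n for plants, n in counties_by_plants.items())
counties_without_plants = TOTAL_NC_COUNTIES - counties_with_plants

print(counties_with_plants, total_plants, counties_without_plants)
```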
Let’s Try This Ourselves
Spangler did not list the sixteen counties with just one plant. However, Sourcewatch, a most progressive organization, has a list which appears complete, and from this we can infer the missing counties. See the tables below for details.
The suicide rates per county were also not in Spangler’s paper, but the CDC’s 2003-2010 Final Data has them.
Here is a plot of the number of coal-fired power plants by the county-level suicide rate.
The median suicide rate for counties empty of coal-based electricity was 12.9, which was the identical rate in counties which had one plant. For those three—and only three—counties which had two plants, the median was 11.3. In the one county which had three plants, the rate was 10.6.
The green line is the “regression” of these two variables, which seems to indicate that increasing the number of plants decreases suicide rates, the exact opposite conclusion of Spangler’s. Seems that adding coal plants is good for you!
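To get a feel for the direction of that line without the full data set, here is a rough sketch that fits an ordinary least-squares line through the four group medians quoted above. This is an assumption-laden shortcut, not the regression behind the plot (which used every county), and the helper `fit_line` is a hypothetical name written for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Number of plants and the median suicide rate (per 100,000) for each group,
# taken from the text above. Four points only: a toy, not an analysis.
plants = [0, 1, 2, 3]
median_rate = [12.9, 12.9, 11.3, 10.6]

slope, intercept = fit_line(plants, median_rate)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The slope comes out negative, about -0.85 per additional plant, which points the same way as the green line. Four medians are, of course, even less to go on than the wee sample itself.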
Statistics Are Scary For Good Reason
How can this be? Easy. For one, Spangler’s data could be slightly different because suicide rates change from year to year (my rates are aggregated from 2003-2010, and Spangler says his are from 2001-2005). But if that’s true, and because the number of coal plants in each county hadn’t changed, it means the data is too variable to draw any conclusions. It’s also suspicious that Spangler doesn’t have a plot like this in his paper.
For another, regression does funny things to data, making lines which should go down, mysteriously go up. Especially when you toss an enormous number of variables at it hoping something will stick. And the more variables you throw, the more likely something will stick, even absurd things. Note that none of the actual environmental variables Spangler used showed up. These are the variables which could actually influence health, and yet all were unimportant.
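How quickly does “something will stick” become likely? A minimal sketch, assuming each variable is tested independently at the usual 0.05 level and that none has any real effect. Both are simplifying assumptions of mine, and the variable counts below are illustrative, not the number Spangler used.

```python
def chance_of_spurious_hit(n_variables, alpha=0.05):
    """Probability that at least one of n independent null tests
    comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** n_variables

# Illustrative variable counts only.
for k in (1, 5, 10, 20):
    print(k, round(chance_of_spurious_hit(k), 3))
```

Under those assumptions, ten useless variables already give roughly a 40% chance of at least one “significant” result.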
The model itself is silly: there are only three counties with two plants, and one with three, yet Spangler (and I above) drew a regression line over this wee sample. But the mathematics doesn’t know this, so it will give a result. My green line is just as absurd as Spangler’s: there just isn’t enough data about increasing the number of plants to say anything cogent.
The Fallacy Revealed
And then there’s the fallacy hinted at above. It occurs when people infer individual-level conclusions from aggregate data. Something, or many various things, caused the differences in suicides between counties, but it does not follow that, because a correlation was found in a statistical model, the variable identified had any causative effect.
If that were so, then moving to a county which had a higher proportion of whites or older folks would increase your suicide risk. That is obviously ridiculous, but if we follow the press reports and Spangler’s breathless intimations, that is the conclusion we would reach.
We should be especially suspicious here because none of the pollutants turned out to be “significant,” nor did any of the other demographic variables, like income and education. The county-level is just too crude a scale to be useful. The many journalists who picked up this story should have recognized this, as should have Spangler: a simple plot (like the one here) would have shown him his task was futile.
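The fallacy can be made concrete with a toy example. Every number below is invented for illustration (three hypothetical counties, three hypothetical people each); nothing here comes from Spangler’s paper or the CDC. Within every county the individual-level relationship runs downward, yet the county averages line up in the opposite direction.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical counties: (individual exposures, individual outcomes).
counties = {
    "A": ([0, 1, 2], [5, 4, 3]),
    "B": ([2, 3, 4], [8, 7, 6]),
    "C": ([4, 5, 6], [11, 10, 9]),
}

# Slope among individuals inside each county.
within = {name: slope(xs, ys) for name, (xs, ys) in counties.items()}

# Slope across county averages, which is all that aggregate data can see.
mean_x = [sum(xs) / len(xs) for xs, _ in counties.values()]
mean_y = [sum(ys) / len(ys) for _, ys in counties.values()]
between = slope(mean_x, mean_y)

print("within-county slopes:", within)
print("between-county slope:", between)
```

The aggregate slope is positive while every within-county slope is negative, so a county-level correlation says nothing reliable about what happens to any individual.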
Tables of the data in Spangler’s paper, given in case my counts differ from his. The suicide rates for counties with no plants were taken from the CDC. Semora is an unincorporated town located partly in Caswell county and partly in Person county, which I assigned to Person so that it had 3 plants as indicated by Spangler.
One plant:

| County (City) | Rank | Suicide rate |
|---|---|---|
| New Hanover (Wilmington) | #45 | 13.1 |
| Forsyth (Belews Creek) | #64 | 11.1 |
| Orange (Chapel Hill) | #67 | 10.7 |

Two plants:

| County (Cities) | Rank | Suicide rate |
|---|---|---|
| Gaston (Mount Holly, Belmont) | #30 | 15.3 |
| Halifax (Weldon, Roanoke Rapids) | #61 | 11.3 |
| Robeson (Lumberton x 2) | #66 | 11.0 |

Three plants:

| County (Cities) | Rank | Suicide rate |
|---|---|---|
| Person (Roxboro x 2, Semora) | #69 | 10.6 |

The remaining 80 counties had suicide rates from 26.0 to 4.4.