
The Epidemiologist Fallacy Strikes Again: Premature Birth Rate Edition

Hypothesis testing leads to more scientific nonsense than any other practice, including fraud. Hypothesis testing, as regular readers know, cannot identify cause. It conflates decision with probability and leads to vast, vast over-certainties.

Why is it so liked? Two reasons. One, it is magic. When the wee p shows itself after the incantation of an algorithm, it is as if lead has been transmuted into gold, dross into treasure. Significance has been found! Two, it saves thinking. Wee ps are taken to mean that the cause—or “link”, which everybody takes as “cause”—that was hoped for has been certified.

What is “significance”? A wee p. And what is a wee p? Significance.

And that is it.

Here’s the headline: Premature Birth Rates Drop in California After Coal and Oil Plants Shut Down: Within a year of eight coal- and oil-fired power plant retirements, the rate of preterm births in mothers living close by dropped, finds new study on air pollution.

Shutting down power plants that burn fossil fuels can almost immediately reduce the risk of premature birth in pregnant women living nearby, according to research published Tuesday.

Researchers scrutinized records of more than 57,000 births by mothers who lived close to eight coal- and oil-fired plants across California in the year before the facilities were shut down, and in the year after, when the air was cleaner.

The study, published in the American Journal of Epidemiology, found that the rate of premature births dropped from 7 to 5.1 percent after the plants were shuttered, between 2001 and 2011. The most significant declines came among African American and Asian women. Preterm birth can be associated with lifelong health complications.

Now this is a reporter, therefore we cannot expect her to know not to use causal language. The peer-reviewed study is “Coal and oil power plant retirements in California associated with reduced preterm birth among populations nearby” by Joan Casey and six other women.

The journal editors, all good peer reviewed scientists, surely know the difference between cause and correlation though, yes?

No. For in the same issue in which the paper ran there appeared an editorial praising the article in causal terms. The editorial was by Pauline Mendola. She said, “We all breathe.”

Who knew?

She also said “Casey and colleagues have shown us that retiring older coal and oil power plants can result in a significant reduction in preterm birth and that these benefits also have the potential to lower what has been one of our most intractable health disparities.”

Casey did not show this. Casey found wee p-values in (we shall soon see) an overly complicated statistical model. Casey found a correlation, not a cause. But the curse of hypothesis testing is that everybody assumes, while preaching the opposite, that correlation is causation.

On to Casey.

One would assume that near power plants, and even near recently closed power plants, we’d find folks unable to afford the best obstetrical services, and that we’d also find “disparities”—always a code word for differences in races, etc. So we’d expect differences in birth outcomes. That’s causal talk. But with excellent evidence behind it.

Casey’s Table 1 says 7.5% of kids were preterm whose mothers’ address was near a power plant. They called this address the “exposure variable”. These power plants were all over California (see the news article above for a map).

Casey & Co. never measured any effect of any power plant—such as “pollution” or PM2.5 (i.e. dust), or stray electricity, or greater power up time, or etc. Since Casey never measured anything but an address, but could not help but go on about pollution and the like, the epidemiologist fallacy was committed. This is when the thing blamed for causing something is never measured and when hypothesis testing (wee p-values) are used to assign cause.

Anyway, back to that 7.5% out of 316 births. That’s with power plant. Sans plant it was 6.1% out of 272. Seems people moved out with the plants. But the rate did drop. Some thing or things caused the drop. What?

Don’t answer yet. Because we also learn that miles away from existing plants the preterm rate was 6.2% out of 994, while after plant closings it was 6.5% out of 1068. That’s worse! It seems disappearing plants caused an increase in preterm babies! What Cali needs to do is build more plants, fast!

Dumb reasoning, I know. But some thing or things caused that increase and one of the candidates is the closed plants—the same before and after.
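To make that arithmetic concrete, here is a back-of-the-envelope check (mine, not the paper’s) of the preterm counts implied by the quoted rates and denominators, assuming each percentage applies to the stated number of births:

```python
# Preterm counts implied by the rates and denominators quoted above.
# Assumption: each percentage applies to the stated number of births.
near = {"before": (0.075, 316), "after": (0.061, 272)}
far  = {"before": (0.062, 994), "after": (0.065, 1068)}

for name, grp in [("near plants", near), ("miles away", far)]:
    before = grp["before"][0] * grp["before"][1]
    after  = grp["after"][0] * grp["after"][1]
    print(f"{name}: ~{before:.0f} preterm before, ~{after:.0f} after")

# near plants: ~24 preterm before, ~17 after  (rate down)
# miles away:  ~62 preterm before, ~69 after  (rate up)
```

A handful of births either way, in groups this size. Keep that in mind when “significance” is announced.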

So how did Casey reason it was plant removal that caused—or was “linked” to—the decrease in preterm births? With a statistical model (if you can find their paper, see their Eq. [1]). The model not only included terms for plant distance (in buckets), but also “maternal age (linear and quadratic terms), race/ethnicity, educational attainment and number of prenatal visits; infant sex and birth month; and neighborhood-level poverty and educational attainment.”

Linear and quadratic terms for mom’s age? Dude. That’s a lot of terms. Lo, they found some of the parameters in this model evinced wee ps, and the rest of the story you know. They did not look at their model’s predictive value, and we all know by now that reporting just on parameters exaggerates evidence.
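Their Eq. [1] is not reproduced here. The sketch below merely illustrates, on simulated data, the kind of model the quote describes: a logistic regression with an exposure term, linear and quadratic maternal age, and categorical covariates, together with the predictive check the paper reportedly never ran. Every variable name and number is hypothetical.

```python
# A sketch of the kind of model described above, fit to simulated data.
# Not the authors' Eq. [1]; all names and data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 5000
df = pd.DataFrame({
    "preterm": rng.binomial(1, 0.07, n),   # ~7% base rate, as in the post
    "exposed": rng.integers(0, 2, n),      # lived near an operating plant?
    "mom_age": rng.normal(28, 6, n),
    "race":    rng.choice(["A", "B", "C"], n),
    "edu":     rng.choice(["hs", "college"], n),
    "visits":  rng.poisson(10, n),
})

# Many terms, each contributing a parameter whose "wee p" can be
# (mis)read as evidence of cause.
m = smf.logit(
    "preterm ~ exposed + mom_age + I(mom_age**2) + C(race) + C(edu) + visits",
    data=df,
).fit(disp=0)
print(m.summary())

# The predictive check: how much does flipping the exposure term move
# the model's predicted preterm probability? (Here, with noise for data,
# hardly at all -- which is the point of asking.)
p_on  = m.predict(df.assign(exposed=1)).mean()
p_off = m.predict(df.assign(exposed=0)).mean()
print(f"predicted preterm rate, exposed vs not: {p_on:.3f} vs {p_off:.3f}")
```

Report the model as those two predicted probabilities and the reader sees at once how little the exposure buys; report only parameter p-values and the evidence looks far stronger than it is.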

Nevertheless they concluded:

Our study shows that coal and oil power plant retirements in California were associated with reductions in preterm birth, providing evidence of the potential health benefits of policies that favor the replacement of oil and coal with other fuel types for electricity generation. Moreover, given that effect estimates were stronger among non-Hispanic Black women, such cleaner energy policies could potentially not only improve birth outcomes overall but also reduce racial disparities in preterm birth.

Inappropriate causal language and politics masked as science. Get ready for a lot more of this, friends.

Categories: Statistics

9 replies

  1. Another example of “statistics don’t lie, only statisticians do” (and in this case it appears to be some amateur statisticians).

  2. That guy, Random Chance, gets around! Busier than a Gremlin he is. Yet for all his work, it is not of Significance.

  3. Running the numbers, 4 fewer premature births were recorded near the plant, but 3 more were recorded further away. A total change of 1 in a population of over 1,300 is statistically insignificant.

  4. I stopped believing epidemiologists years ago, for several reasons: they believe you can prove causality with statistics, they believe confidence intervals are important, and they extrapolate from the sample to the whole population of the country. Years ago I had a math professor who used to warn us students that the only thing riskier than extrapolation is predicting the future. BTW, that’s a joke.

  5. I don’t suppose they looked at other regions where there never had been such plants or at regions where plants had not been closed. If other regions had seen similar changes, the cause could not be a special one, but a common cause affecting multiple regions, like improvements in prenatal care.

  6. I received a “You do not currently have access to this article” message when I tried to look at the original paper. But Matt is a very reliable reporter when it comes to research papers like this (he’s seen thousands, and been involved in at least dozens).

    So, taking Matt at his word as to its contents, I want to remark on how extraordinarily typical the flaws of this paper are within epidemiology.

    1. The authors in this study, for example, are careful not to use the word ’cause’. (All epidemiologists are taught to studiously avoid that term by the first few pages of their Epi 101 text). Thus they don’t say that an address is a cause of preterm birth. But then they act as if it is! They act (they write their discussion section) as if a (dubiously) calculated ‘association’ is a cause, without showing anything of the sort. This is typical within epidemiology.

    2. They use a proxy as the fundamental measurement. The proxy is not argued for, let alone established as a good one. We are left to use our imaginations as to why that proxy is the right one. Any distortions or other errors introduced by the use of this particular proxy are not even alluded to, let alone calculated. Thus the error of conflating a calculated ‘association’ with a cause is multiplied by an unknown amount when the authors ignore that any ‘association’ they have calculated is merely an association with a proxy. All this is typical within epidemiology.

    3. As Matt noted, to the authors, it is as if predictive statistics does not even exist. They’ve never heard of it. They assume that calculating parameters is itself the prediction. Completely typical within epidemiology, again.

    4. You can tell this study is “serious” because they examined possible confounders and included them in their model. I want to emphasize how standard this is within epidemiological studies. Like so many other authors, these researchers introduce parameter after parameter to avoid being criticized for neglecting other possible ‘associations’ to pre-term birth. But as Matt wrote, “That’s a lot of terms”.

    And it’s a two-edged sword: you ‘account’ for the confounders, by introducing more parameters — which, as Matt has taught us, multiplies the uncertainty in the results. And (Point 3, again), no effort is made to ‘integrate out’ all these parameters, either.

    And why not one million confounders in their model, instead of the paltry dozen or so that they thought “relevant”? (A toy simulation at the end of this comment shows what piling on noise terms buys you.)

    And yet, as “Ye Olde Statistician” points out, what about obvious confounders that they have indeed left out — not even considered?

    5. “The Epidemiologist Fallacy” is that epidemiological studies can identify causal associations between risk factors and disease. In Epi 101, students learn both that you can’t do that — and that you can. Epidemiologists are taught to religiously avoid the word ’cause’ and to religiously use the word ‘association’ instead, but out of the other side of their mouth, they congratulate themselves on finding ‘causal associations’.

    Maybe Matt is right, and epidemiology as a field is founded on a fallacy. I don’t know if I want to go so far as to say, let’s eliminate them all. I still want them around to investigate Hep A and cholera and Ebola instances.

    But sometimes, I’m tempted to agree with Matt.
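    Here is that toy simulation of the “million confounders” point (my own sketch, not from the paper or any comment): regress a pure-noise outcome on fifty pure-noise covariates and some of them will show “wee ps” anyway.

    ```python
    # Pure-noise outcome regressed on 50 pure-noise "confounders":
    # some parameters show p < 0.05 by chance alone.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, k = 1000, 50
    X = sm.add_constant(rng.normal(size=(n, k)))
    y = rng.normal(size=n)          # unrelated to every column of X
    fit = sm.OLS(y, X).fit()
    print((fit.pvalues[1:] < 0.05).sum(), "of", k, "noise terms are 'significant'")
    # Expect roughly 2 or 3 of the 50 at the 0.05 level, by chance alone.
    ```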

  7. My old boss, Ed Shrock, used to say that you have not established a cause until you can turn the problem off and on several times.

  8. Since I am a jobbing clinician with some ability to crunch numbers, let me add a point you miss. This paper has a historical control; these are not serial case-control studies. There is no attempt to show that (a) this is not usual variation, or that (b) this rate of miscarriage is higher than usual. A variation from 5.1 to 7 percent is a risk ratio of 1.37 and an odds ratio of about 1.4 (the arithmetic is at the end of this comment), and for me to consider that there may be a causal link behind a correlation I would want an odds ratio of at least four from a well-done case-control study, and much higher from something like this.

    Half the time statistical significance is a consequence of a large population. It has little to do with clinical relevance.

    The trouble is that you cannot turn miscarriage on and off. We would never deliberately expose women to something that we think could cause miscarriages, for ethical reasons: we don’t let pregnant women into clinical trials of medications for that reason.

    So you need overwhelming indirect evidence to deduce a causal link. Koch would call that weak, and he would be correct. The trouble is that the editors were seduced by random noise that aligned with a political point they wanted to make.
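    For the record, the arithmetic behind those ratios (my computation, from the 5.1% and 7% rates quoted in the post):

    ```python
    # Risk ratio and odds ratio for preterm rates of 7% vs 5.1%.
    p1, p2 = 0.070, 0.051
    rr = p1 / p2                              # risk ratio: 7 / 5.1
    odds = (p1 / (1 - p1)) / (p2 / (1 - p2))  # odds ratio
    print(f"risk ratio ~ {rr:.2f}, odds ratio ~ {odds:.2f}")
    # risk ratio ~ 1.37, odds ratio ~ 1.40 -- far below the "at least
    # four" a cautious clinician would want before entertaining cause.
    ```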

  9. Pukeko, as an OB/Gyn I could not agree more! We have to look at all such data with skepticism. However, there is a lot of what you describe going on in medicine today, and it saddens me a great deal.
