This is a modified, shortened chapter from my upcoming book Too Damn Sure. At least, I hope it’s forthcoming.
Marchmain Pharmaceuticals distributes profitizol, said to cure the screaming willies. To assess that claim, Marchmain assembled volunteers suffering this dread malady. Each provided his address, which was used to count the pharmacies within a five-mile radius of each volunteer's home.
Most pharmacies sell profitizol, and the more pharmacies near a volunteer, the more opportunities that volunteer had to purchase profitizol, and then eat it. Marchmain asked a statistician whether living near more pharmacies correlated with being cured, which resulted in the announcement, “Taking Profitizol Increases Chance of Screaming Willies Cure.” Marchmain’s stock subsequently soared.
I know what you’re thinking: Surely nobody would claim that merely living near a pharmacy is proof that profitizol “works.” That’s where you’re wrong, friend. Not only do people routinely make claims of this very sort; entire scientific fields and numerous government bureaucracies positively rely on this trick for their existence.
The epidemiologist fallacy occurs when an epidemiologist says or implies that X causes Y, but never actually meets, measures, or monitors X, though everybody pretends he has.
The “X”s and “Y”s are placeholders, stand-ins for common English propositions like X = “It is cloudy” and Y = “It is raining.” If X is correlated1 with Y, it means our understanding of the values (states) of Y changes according to changes in the values (states) of X. If X = “It is cloudy” is true, then Y = “It is raining” is more likely, and vice versa. Cloud cover and rain are correlated. But if, no matter what value X takes, our uncertainty in Y remains unchanged, then X is uncorrelated with Y; knowing X is irrelevant to our knowledge of Y; classical statisticians say X is independent of Y.
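This sense of correlation as a change in uncertainty can be shown in a small simulation. A minimal sketch in Python, with made-up probabilities chosen purely for illustration (none of these numbers come from any real weather data):

```python
import random

random.seed(1)

# Made-up probabilities, for illustration only.
def sample():
    cloudy = random.random() < 0.4
    # Rain is more likely when cloudy: knowing X changes our uncertainty in Y.
    rain = random.random() < (0.7 if cloudy else 0.1)
    # A coin flip unrelated to rain: knowing it leaves our uncertainty unchanged.
    coin = random.random() < 0.5
    return cloudy, rain, coin

draws = [sample() for _ in range(100_000)]

def p_rain_given(pred):
    """Fraction of draws with rain, among draws satisfying the condition."""
    sub = [rain for cloudy, rain, coin in draws if pred(cloudy, coin)]
    return sum(sub) / len(sub)

print(p_rain_given(lambda c, f: True))   # ~0.34, the overall rain rate
print(p_rain_given(lambda c, f: c))      # ~0.7, rain likely given clouds
print(p_rain_given(lambda c, f: not c))  # ~0.1, rain unlikely given clear skies
print(p_rain_given(lambda c, f: f))      # ~0.34, the coin tells us nothing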
For Marchmain, Y = “Person cured of the screaming willies” and X = “Person ate profitizol.” But while Y was measured, X was never observed. Yet Marchmain still announced that X was correlated with Y. How?
Via the epidemiologist fallacy. Marchmain invented a W, which is not X, but which was kinda sorta like X—well, loosely like X, in a vague way, if you squinted—and then swapped X for W. The statistician modeled the correlation between W and Y, but announced that this correlation was between X and Y. W was forgotten. Since everybody wanted news of X and Y, they failed to see it was W and not X over which the fuss was being made. Government grants were awarded.
Global warming causes cataracts in babies
The peer-reviewed paper “A Population-Based Case-Control Study of Extreme Summer Temperature and Birth Defects” by Alissa Van Zutphen et alia appeared in the journal Environmental Health Perspectives (2012 October; 120(10): 1443–1449). It purportedly investigated birth defects in New York residents (the Y) and heat waves during pregnancy (X), which are claimed to increase in frequency and severity once global warming finally strikes. “We found positive and consistent associations between multiple heat indicators during the relevant developmental window and congenital cataracts [in newborns]”. Various statistical measures of correlation were attested to, and if the reader wasn’t careful she would decide to stay out of the heat lest her unborn child develop congenital cataracts.
But exposure of women to heat during their “relevant developmental windows” was never measured on any woman. There was no X. But there was a W: the daily air temperature at “18 first-order airport weather stations”. Women were assigned the temperature at the station closest to the residence they listed at the time of birth, for just those days thought to be crucial to fetal development. Nobody knows where the women actually were during those days: they may have been near the assigned airport, or in Saskatchewan, or perhaps in some cool building (“we were unable to incorporate air conditioner use data”). This paper was taken seriously by the press. More research is needed.
Fourth of July parade attendance turns people into Republicans
Harvard Kennedy School Assistant Professor David Yanagizawa-Drott and Bocconi University Assistant Professor Andreas Madestam wondered how it could be that so many innocent Americans turned into Republicans (their Y). They suspected Fourth of July parade attendance (X). Exposure to raw, unfiltered patriotism would take its inevitable toll and cause people to turn wistful at the mention of Ronald Reagan. They speculated, “Fourth of July celebrations in the United States shape the nation’s political landscape by forming beliefs and increasing participation, primarily in favor of the Republican Party.”
It was widely reported that X caused Y. Only it wasn’t so. Yanagizawa-Drott and Madestam instead created a W. They gathered precipitation data from 1920-1990 in towns where study participants claimed to have lived when young. If it rained on the relevant Fourths of July, the authors claimed the participants did not go to a parade, because they assumed all parades would be canceled. If it did not rain, they claimed participants did go to a parade, because all towns invariably have parades on clear days, and if there is a parade one must attend. Nowhere was actual parade attendance (X) measured. And just think: if their hypothesis were true, San Francisco would be teeming with Republicans because it almost never rains there on the Fourth of July.
Air pollution causes heart disease
Yours Truly was involved in a critique of a study submitted to the California Air Resources Board (CARB) which claimed to have discovered a correlation between air pollution (X; particulates of a certain size) and heart disease (Y). A weak, barely there finding of statistical “significance” was enough to embolden CARB to create new and enhance old air pollution regulations in order to “save lives.” Yet X was never measured.
At a very few places, particulate measures were taken for a limited time. The air pollution in these places was then crudely extrapolated to areas in which it was not measured. Finally, the extrapolated air pollution nearest the address of the study participants (where they lived at one time, ignoring moves) was taken as the exposure; this was their W. Nobody knows how much air pollution anybody was actually exposed to.
The sequel to this story is fascinating. I submitted written critiques, which were discussed at a CARB meeting. One panel member thanked me, called me learned, and took my criticisms of the epidemiologist fallacy seriously. But it was judged—and here you must laugh—that because the fallacy was so common, and had produced so many of the results CARB already relied on, this study was no different. And therefore acceptable.
We could multiply examples indefinitely. It has become rare not to see the fallacy used, particularly in faddish research, such as the evils that await us when global warming eventually strikes.
Scientists get away with this fallacy because often W and X are correlated themselves, or are thought or hoped to be. Yet logic insists we must necessarily be less sure of the relationship of X and Y than we are of W and Y. If W is correlated to X, which in turn is correlated to Y, then W will be correlated to Y. But the correlation between W and Y is not the same as that between X and Y. Overconfidence abounds.
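The attenuation can be shown in a toy simulation. A minimal sketch, assuming a simple Gaussian model in which the proxy W is the true exposure X plus measurement noise—the way an airport thermometer stands in for a woman’s actual heat exposure. The model and numbers are mine, for illustration only, and come from none of the studies above:

```python
import math
import random

random.seed(1)
n = 100_000

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

# X: the true exposure, which the fallacious study never observes.
X = [random.gauss(0, 1) for _ in range(n)]
# Y: the outcome, genuinely correlated with X.
Y = [x + random.gauss(0, 1) for x in X]
# W: the proxy -- X plus measurement noise of its own.
W = [x + random.gauss(0, 1) for x in X]

print(round(corr(X, Y), 2))  # ~0.71, the correlation everybody wants
print(round(corr(W, Y), 2))  # ~0.5, the weaker one actually measured
```

In this setup the true X–Y correlation is 1/√2 ≈ 0.71, but the measurable W–Y correlation is only 0.5. Reporting the W–Y number as if it were the X–Y number, with no extra allowance for the uncertainty the substitution introduces, is exactly the overconfidence at issue.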
It’s easy to write papers invoking the epidemiologist fallacy. All it takes is an afternoon, a computer, and a wild theory. And with the brutal competition to publish, to be cited, and to win grants, there just isn’t any way to stop researchers from misbehaving. Your only hope is to cease blindly trusting science reporting and government agencies.
And now my favorite line, beloved by activists everywhere: it’s worse than we thought! Not only do scientists incorrectly swap W for X, sometimes a Z and not a Y is measured. Yet the story is still X causes Y. Sociologists have the most fun with this. Take the peer-reviewed article “Red Light States: Who Buys Online Adult Entertainment?” (Journal of Economic Perspectives, Volume 23, Number 1, Winter 2009, Pages 209–220) by Benjamin Edelman of the Harvard Business School, who claimed red states consume more pornography than blue states. He implied conservatives are naughtier than progressives. Yet no individual’s consumption or political views were ever measured.
1Statisticians have a formal definition of correlated which I do not use here. Correlated merely means what it says above: an association.