A Common, Unfortunate, Avoidable, Devastating Error In Statistics

Smilin' Joe demonstrates our fallacy.

It’s a doozy, this error of ours. So ubiquitous is it that it’s hardly noticeable. Yet it is sinking us into scientism and wild overconfidence.

Every time it appears, both the public and scientists themselves become a tiny bit more over-enamored of science, giving it more honor than it deserves. The effect of any one appearance of the error is small, scarcely noticeable. But when it is repeated ad nauseam the product is deadly to clear thinking.

Of ado, no more. Here’s an example: “conservatives demonstrate stronger attitudinal reactions to situations of threat and conflict. In contrast, liberals tend to be seek out novelty and uncertainty.”

Did you see it? Maybe not. If you thought the corruption lay in the subject matter of the proposition itself, you were understandably wrong. The quote was taken from the peer-reviewed paper “Red Brain, Blue Brain: Evaluative Processes Differ in Democrats and Republicans” by Darren Schreiber and several others in PLOS One.[1]

What you thought was the main error was instead yet another in a long and growing line of misguided, probably ideologically but unconsciously motivated attempts to demonstrate, to the level of satisfaction required by progressive academics, that conservatives are biologically different from those academics.

Need a hint about the bigger error? Here’s another example, culled from the same paper: “Republicans and Democrats differ in the neural mechanisms activated while performing a risk-taking task.”

Have it yet? Not the content. After an incredible amount of statistical manipulation, such that we can’t really be sure of what we’re seeing, the authors discovered that slightly more registered Democrats had high (statistically derived) activity in their left posterior insula than did registered Republicans (groups which they later re-labeled as liberals and conservatives).

From this, I remind us, they concluded that Republicans and Democrats differed.

This is false. They did not differ; or, at least, not all of them did. Only just enough differed to (after scads of manipulation) provide a wee p-value. Because not all of them differed, and because there is no reason to suppose that all members of new batches of registered party members will differ either, the statement is false.
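How can scads of manipulation produce a wee p-value? Here is a small Python sketch of the mechanism. It is an illustration under stated assumptions, not the paper's analysis. Both hypothetical "parties" are drawn from one and the same population on each of 200 made-up measures, so no real difference exists. The sketch then counts how many measures still pass p < 0.05. The group sizes, the number of measures, and the large-sample z approximation (used in place of an exact t-test) are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def two_sample_p(a, b):
    """Two-sided p-value for a difference in means.

    Uses a large-sample normal (z) approximation instead of an exact t-test.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_discoveries(n_measures=200, n_per_group=100, alpha=0.05, seed=1):
    """Draw both hypothetical 'parties' from the SAME population on every
    measure, then count how many measures still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_measures):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_p(group_a, group_b) < alpha:
            hits += 1
    return hits


hits = count_false_discoveries()
print(f"{hits} of 200 measures were 'significant' with no real difference")
```

At a 0.05 cutoff, roughly one measure in twenty should pass by chance alone, so a count near 10 is what one would expect. A researcher who keeps probing is all but guaranteed some "finding".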

Nor did, as cited above, “conservatives demonstrate stronger attitudinal reactions to situations of threat and conflict” than liberals. Leaving aside the soaring ambiguity in measuring political attitude and the even greater hand-waving in defining “situations of threat and conflict”, the statement is still false. It was only found that slightly more “conservatives” than “liberals” answered some questions one way rather than another.
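The next sketch shows the "slightly more conservatives than liberals" situation. It uses made-up counts, not the paper's data: 55% of one hypothetical group and 50% of the other give some answer. A pooled two-proportion z-test is assumed here for illustration because it is a standard textbook test. It returns a small p-value, yet nearly half of the "conservatives" did not answer that way.

```python
import math


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value from the pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts, NOT the paper's data: 1,100 of 2,000 "conservatives"
# and 1,000 of 2,000 "liberals" give the "threat" answer.
p_value = two_proportion_p(1100, 2000, 1000, 2000)
share_conservatives_not = 1 - 1100 / 2000

print(f"p = {p_value:.4f}")
print(f"'conservatives' who did NOT give the answer: {share_conservatives_not:.0%}")
```

A headline reading "conservatives react more strongly" would here describe a group in which 45% did no such thing.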

You must have it by now. The error is Irresponsible Exaggeration, which leads inevitably to Gross Over-Certainty. It is a crude mistake, common among the untrained and ill-educated (reporters, etc.), and should be rare among scientists, but increasingly it isn’t, as our examples prove (here are many more).

It is now (near?) impossible to read any public report of research without meeting this error. Let us call it the Statistical Exaggeration Fallacy. Reporters are lazy, harried, or not intelligent enough to realize they are making the mistake. But it is surprising that scientists never correct it.

Now, as proved here, the purpose of statistics is not to say anything about what happened in a particular experiment, but what that experiment might mean in the future. The future must necessarily be less certain than the past, where the experiment lives (proved here). And not only that: because of the crude statistical methods researchers use, their results are less certain than implied even without the Statistical Exaggeration Fallacy (are all Republicans “conservatives”?).

I mean, relying on p-values already guarantees over-certainty, which is multiplied in the presence of the SEF. It is multiplied again by over-extended definitions, like calling Republicans “conservatives”, and by conflating the answers on some questionnaire with some deep-seated and real psychological tendency.
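The gap between a p-value and what an experiment says about the future can be made concrete. Assume, hypothetically, that two groups are normally distributed with the same spread and that their means differ by 0.2 standard deviations. Assume also a study with 1,000 people per group. The Python sketch computes two quantities:

- the p-value such a study would report, using a z approximation;
- the probability that a new member of one group scores higher than a new member of the other, which is Φ(δ/√2) for a standardized difference δ.

It treats the 0.2 as known exactly, which is itself generous. Honest uncertainty about that estimate would pull the probability even closer to 50%.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def expected_p_value(effect_size, n_per_group):
    """Two-sided p-value if the observed standardized mean difference
    equals effect_size exactly (z approximation, equal group sizes)."""
    z = effect_size / math.sqrt(2 / n_per_group)
    return math.erfc(abs(z) / math.sqrt(2))


def prob_new_a_exceeds_new_b(effect_size):
    """P(a new member of group A scores higher than a new member of B),
    assuming both groups are normal with equal spread and a KNOWN effect."""
    return normal_cdf(effect_size / math.sqrt(2))


d, n = 0.2, 1000  # hypothetical: a small effect measured in a large study
print(f"p-value: {expected_p_value(d, n):.1e}")
print(f"chance a new A beats a new B: {prob_new_a_exceeds_new_b(d):.3f}")
```

With these assumed numbers, the p-value is far below any conventional cutoff. Yet the chance that a randomly chosen member of one group outscores a randomly chosen member of the other is only about 56%, barely better than a coin flip.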

Your help needed

What I’d like you to do, sisters and brothers, when you have the time, is to note in the comments whenever you see an instance of the Statistical Exaggeration Fallacy. It is well to have a large, contemporaneous collection of these to prove my claim of its non-rarity.

Of its harmful effect, well, if it is not obvious to you, it will be after you read the examples.

——————————————————————————

[1] A curiosity of this journal: they put the Results and Discussion before the boring, who-really-needs-to-read-it Methods section, which appears at the bottom.

35 Comments

  1. Sheri

    “they support recent evidence that conservatives show greater sensitivity to threatening stimuli”: wouldn’t that mean conservatives who actually did show this tendency would have a far greater chance of survival, since they recognize threats earlier? Instead of waiting for the tiger to come in and eat villagers, they would go out and find the tiger and dispense with it? Democrats demonstrating the claimed response would start a # campaign to plead with the tigers to stay out of the village.

    Will look for example of the fallacy and post them later.

  2. Using the comments here as our sample it seems clear so far that “conservatives are quicker than liberals at turning everything into a game of political one-upmanship”.

    But thanks, Briggs, for pointing out a very important real issue here – though I wouldn’t call the confusion between group and individual differences a matter of “exaggeration” exactly. No matter how carefully one tries to express a claim about population differences, the risk of feeding prejudice about individuals is always substantial – so much so that I think there may be many true statements that would be best left unsaid.

  3. Sheri

    Alan: There’s one comment besides yours. Is that your “sample”? What is “political one-upmanship”? I don’t see how that relates to the one sample you seem to be referencing.

  4. DAV

    All republicans are Red and democrats are Blue.
    Does this mean republicans are “hot” and democrats are “sad”? 🙂

    Maybe this qualifies for the list: Forget Red State, Blue State: Is Your State “Tight” or “Loose”?. Whatever “tight” and “loose” mean. When I was growing up some people had to get “tight” to be “loose”. Alabamans are “tight” and Marylanders not so much? Ha.

    In fairness, the article does say what they might mean.

  5. Ye Olde Statisician

    The phenomenon is related to the fallacy of reification. This means taking a box with a label on it and imagining that the box is a real thing and the items in the box take their characteristics from their being in the box. That is, there is a real substance called a “liberal” and not simply a heterogeneous agglomeration of people who at this point in time share a number of opinions on certain issues.

    It was hard enough sifting this when the objects of interest were inanimate objects. At least the output of a draw-and-iron machine really can be regarded as a population and can profitably be compared with the outputs of other such machines. But one still must be aware that when the two product streams differ, it is not due simply to the fact that one is Machine A and the other is Machine B.

  6. @YOS, While I suspect that many of us are guilty from time to time of “reification”, I also suspect that many of the accusations of that sin arise from the accuser’s unfounded belief that he or she knows what the accusee is thinking.

    @Briggs, one can also be guilty of the SEF (i.e., interpreting a statistical claim as having implications about particular instances) in its contrapositive form, e.g., by taking one instance of a blue-eyed person with dark hair as disproving the claim that blue-eyed people have, on average, lighter hair colour than average, or by taking a few years of slightly lower temperature as proof that there is not a long-term increasing trend.

  7. JH

    But… what the researchers really mean is, for simplicity, that there is a difference between conservatives and liberals in their mean/average score of a variable. Just like, on average, males are taller than females. So could it be possible that such reporting is a standard practice because the meaning is taken for granted?

    Now as proved here, the purpose of statistics is not to say anything about what happened in a particular experiment, but what that experiment might mean in the future.

    Proved?

    The purposes of statistics include, but are not limited to, making inferences about the future and the unknown by generalizing from findings in the data collected. Proved… because I say so? No proof needed; this is what statistics is!

  8. JH

    They …, who-really-needs-to-read-it Methods section, which appears at the bottom.

    A bit disappointing to hear you say this. Well, I am a statistician.

  9. max

    I am amazed by the vagaries of political attitude tests. I took two this week to kill time, and one rated me a medium anti-authoritarian while the other rated me a solid statist. The difference was that the first asked whether the government should do X while the second asked whether doing X was good. When slight differences in the way questions are asked can make such a big difference, the tests are not useful for much.

  10. Ye Olde Statisician

    the accuser’s unfounded belief that he or she knows what the accusee is thinking

    Example: Line #1 uses fresh hydroxide while Line #2 uses recycled hydroxide. Batteries made on Line #2 have less satisfactory performance. Anyone concluding that recycled hydroxide impairs performance overlooks the possibility that there might be other differences between the two production lines.

    But anyone who takes two groups of people who self-identify as Democrats and Republicans, elides them into liberals and conservatives, and then, because more of one group than the other answers a set of questions differently, concludes that there is a connection between the label and some alleged essential quality of the group is probably a social scientist.

    Pfui, sez I, being in a curmudgeonly mood.

  11. alan cooper, how do you define “few” in “a few years of slightly lower temperatures”?
    2, 3, 20? 100?
    That’s the type of thinking that politicians love, and scientists do not.

  12. and, Alan Cooper, you haven’t replied to Sheri’s very apt criticism of your first comment, as taking one comment as your sample in making a generalization.

  13. @YOS, you may be right sez I, being in a gregarious and generous mood.
    But it is only true that “*Anyone* concluding” *without having additional information* “that recycled hydroxide impairs performance overlooks the possibility that there might be other differences between the two production lines.”

    PS I am not old enough to have observed the origins of the Republican party, but I do know the story and I am old enough to remember the ‘Dixiecrats’. And in Canada I still have a few friends among the “red Tories” who choked on the Conservative Reform Alliance Party, so I share your concern about eliding party affiliation *of an individual* with position on any particular issue. On the other hand, while in no way a defender of current practice in the social “sciences”, I can imagine that recently professed party affiliations may well correlate quite strongly with positions on some currently pressing social issues.

  14. An Engineer

    “Scientists have been warning about global warming for decades. It’s too late to stop it now, but we can lessen its severity and impacts.” — David Suzuki

  15. @Bob, In order to qualify as an instance of contrapositive SEF “few” would have to be a small number in comparison with the length of the preceding sample and in the light of the as yet unexplained and apparently “random” fluctuations. I do not pretend to know how small that would actually be but from eyeballing the purported data I would guess at somewhere between 5 and 40.

    And with regard to my first comment, perhaps those without a sense of humour would not have realized that using a sample of one was a joke – based on the observation that, no matter what the topic, almost every one of Sheri’s comments that I have seen includes some kind of a dig at “liberals” (or “Democrats” which she appears to take as the same).

  16. Alan Cooper, thank you for your good-natured response. I apologize for making snarky comments. I guess my attitude toward the AGW faithful is very much like that of Dawkins or Hitchens to theists, a bad trait. So in the future I will try to be charitable (in the Christian tradition) to those whose faith is not my own.

  17. Sheri

    JH: So you would put the results first and methods last so no one reads the methods? Seems sneaky to me. (Nice you’re a statistician. Do you work for the government?)

    Alan: I was not making a “dig” at Democrats—well, maybe the # tag remark, but you really can’t blame me. That does seem to be the current method of dealing with things. I was actually observing that the report, which I would guess was meant to knock conservatives, actually did not bode well for Democrats. I would have done the same thing if the results were reversed. As for knocking Democrats at every opportunity, guilty. As for knocking conservatives at every opportunity, guilty. Politics creates such easy targets. I tend to knock all politicians of recent times—equal opportunity knocker.

  18. JH

    Sheri,

    The order of the two sections doesn’t matter to me. Authors have to follow strict guidelines given by the journal. Why does it seem sneaky?

    Ah, I get it; I get it! Briggs’ comment about “the boring, who-really-needs-to-read-it Methods section” means to imply the authors are sneaky. You definitely can understand him better than I do!

    The Methods section is important because how the data are collected, and the data structure, play a key role in statistical modelling.

  19. ok Sheri it’s time to bury the hatchet so I’ll take your word for it re “equal opportunity” (but may be tempted to keep count for a while).
    PS I am curious to see that you seem to have (here) a pattern of contrasting “conservatives” and “Democrats” rather than “conservatives” and “liberals” or “Republicans” and “Democrats”. Is this intentional or just a fluke?

  20. Thanks Bob. Actually I try not to express any “faith” re AGW (or anything else for that matter). But I do tend to find that in matters of applied physics the simplest models often make useful predictions, and so I consider it foolhardy to expect that some unknown effects of more complicated models will protect us from the effect of increased CO2 that is predicted by a model which does not pretend to know (eg) whether that increase will raise or lower the planetary albedo.

  21. Sheri

    JH: Yes.

    Alan: Please feel free to count. Keep in mind I am somewhat bound by Briggs’s writings (not that we actually stick to the subject all the time). However, it would be an interesting count.
    Conservatives and Democrats is just a fluke. All four terms were used in the post and I just typed whichever one came to mind. Sometimes I go back and check for the pairing in the post, but it’s really pretty random (as far as I know, anyway. It could be some biological thing I am not aware of! 🙂 )

  22. Briggs

    All,

    Alan reminds us that the list of asinine studies which purport to show academics are different than the rest of us is long. Here is a small sampling. Pay particular attention to the fMRI studies. Really, we could dissect a paper a day and never keep up. And recall that if you take any two groups of self-selecting people and keep probing with p-values, you’re bound to discover there are “statistically significant” differences between them.

    JH reminds us that the SEF is related to another fallacy. To say “Men are taller than women” is to speak truly in a general sense. But to conclude from that that “All or most men are taller than all or most women” would be to commit the exaggeration fallacy. Nobody does that with height, of course, but plenty do it with sketchy propositions, like those given in the post.

    (And JH, that thing about the Methods section was sarcasm. I was noticing how they “hid” that section so as to disguise the warts of the study.)

  23. Sheri

    Actually, Briggs, clothing manufacturers seem to believe the “all men are taller than women” exaggeration.

  24. Briggs

    Sheri,

    If they believed the fallacious interpretation, no woman would fit into any man’s clothing, and vice versa. We all believe the “on average” interpretation, which only means that if you take any man and any woman (without knowing anything other than their sex) then there is a greater than 50% chance that man will be taller than that woman. And that’s all it means.

    Actually, manufacturers used to do a very poor job estimating the distribution of sizes. Probably still do. Particularly at the extremes.

  25. Sheri

    Briggs: Wouldn’t tall women fit into men’s clothing and not women’s (says the woman that buys jeans and shirts off the men’s rack)? Short men can have the jeans and pants tailored to be shorter or buy off the women’s rack in some cases. I agree that manufacturers are not good at estimating the distribution of sizes, particularly at the extremes.

    I guess I’m looking at this from a non-statistical viewpoint. For a long time, I had always believed men were taller than women, until it became apparent that I am as tall or taller than many of my male colleagues. I then began to notice that I was taller than many of the women I know–by a good three to four inches or more. It may have been a subconscious thing, but I really did buy into the “most men are taller than women” belief. I knew it wasn’t “all”, of course.

  26. With respect to fMRI studies, speaking as someone who’s practiced in MRI, I would conclude that many of the studies that try to distinguish between more profound attitudes are not worth much. The problem is that fMRI requires very steep and frequent magnetic field gradient pulses, which are exceedingly loud and distracting (imagine trying to think about Aristotle with a succession of cannon shots outside your window). There is indeed an article that verifies (as much as one can with the small samples required for expensive MRI studies) the distracting effect of noise:
    http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0080564;jsessionid=CA173E81E58716D7D1A3961A9BB7ED8C

  27. JH

    Briggs,

    I understand some people insist that salad is to be served before the main course, however, an implication of a host’s sneakiness or any sarcasm toward the host due to the unexpected serving order is beyond me.

    From this, I remind us, they concluded that Republicans and Democrats differed.

    This is false. They did not differ; or, at least, not all of them did. Only just enough differed to (after scads of manipulation) provide a wee p-value. But because all of them did not differ, and there is no reason to suppose that in new batches of registered party members, all of them will differ either. The statement is false.

    Do you have the data they used? If not, how do you know they did not differ?

    According to the reasoning, does this imply that if an education major had the same IQ as one of the math majors in my sample, then I could conclude that education majors and math majors did not differ in their IQ… by your definition of “differ”?

    Does this also imply that if none of my samples of female and male history majors turned out to have the same IQs (this is probably the case in real life), then I could conclude that female and male history majors differed in their IQ?

    I imagine that no two experimental units will have the same response values, i.e., the volumes activated in those two areas of interest while performing a risk-taking task. Just like, if I sample a group of students, it is highly likely that no two students would have the same heights. I could be wrong.

    Your definition of “differ between two groups” and your explanations as to why the statement is false don’t make sense. Eyeballing the data to decide whether two groups differ doesn’t work. It’s awfully hard to eyeball multivariable data.

    Of course, you can always say that the entire statement quoted or your definition of “differ” was sarcasm.

    If you want to spend time criticizing papers such as this, my suggestion is that you first explain what the authors have done and what their conclusions are fairly, and then offer criticism.

    A conclusion such as “there is a greater than 50% chance that … “ could be very powerful when one has minimal information and needs to make a decision according to the majority rule.

  28. Sheri

    JH: Majority rules is politics. Science is not majority rules. If you don’t have enough information to make a decision, then wait for more information. If you cannot wait, flip a coin. It’s probably pretty close to using data from one study with minimal participants anyway. Much cheaper and easier.

  29. Ye Olde Statisician

    Actually, in industry our reports were supposed to go:
    o Summary
    o Recommendations
    o Results
    o Discussion
    All of the gory details were in the discussion section. What management wanted to know first was the basic gist, then what we were recommending, then the results that supported the recommendations, then the data and analysis.

  30. Sextus

    Here is a good example of (at least) exaggeration. The authors found a 0.08 percentage-point difference in mortality rate between two different techniques employed in anesthesia for orthopedic cases. The finding is reported with an attractive p-value. Interestingly, the report was followed by a critique that was, let’s say, misunderstood by the authors. Here is the letter to the Editor.
    The Overpowered Mega-study Is a New Class of Study Needing a New Way of Being Reviewed. Anesthesiology 2014; 120, p 246.

    Perioperative Comparative Effectiveness of Anesthetic Technique in Orthopedic Patients
    Memtsoudis, Stavros G. M.D., Ph.D., F.C.C.P.*; Sun, Xuming M.S.†; Chiu, Ya-Lin M.S.†; Stundner, Ottokar M.D.‡; Liu, Spencer S. M.D.§; Banerjee, Samprit Ph.D., M.Stat.‖; Mazumdar, Madhu Ph.D., M.A., M.S.#; Sharrock, Nigel E. M.B., Ch.B.§
    Anesthesiology: May 2013 – Volume 118 – Issue 5 – p 1046–1058

    Background: The impact of anesthetic technique on perioperative outcomes remains controversial. We studied a large national sample of primary joint arthroplasty recipients and hypothesized that neuraxial anesthesia favorably influences perioperative outcomes.
    Methods: Data from approximately 400 hospitals between 2006 and 2010 were accessed. Patients who underwent primary hip or knee arthroplasty were identified and subgrouped by anesthesia technique: general, neuraxial, and combined neuraxial–general. Demographics, postoperative complications, 30-day mortality, length of stay, and patient cost were analyzed and compared. Multivariable analyses were conducted to identify the independent impact of choice of anesthetic on outcomes.
    Results: Of 528,495 entries of patients undergoing primary hip or knee arthroplasty, information on anesthesia type was available for 382,236 (71.4%) records. Eleven percent were performed under neuraxial, 14.2% under combined neuraxial–general, and 74.8% under general anesthesia. Average age and comorbidity burden differed modestly between groups. When neuraxial anesthesia was used, 30-day mortality was significantly lower (0.10, 0.10, and 0.18%; P 75th percentile) length of stay, increased cost, and in-hospital complications. In the multivariable regression, neuraxial anesthesia was associated with the most favorable complication risk profile. Thirty-day mortality remained significantly higher in the general compared with the neuraxial or neuraxial–general group for total knee arthroplasty (adjusted odds ratio [OR] of 1.83, 95% CI 1.08–3.1, P = 0.02; OR of 1.70, 95% CI 1.06–2.74, P = 0.02, respectively).
    Conclusions: The utilization of neuraxial versus general anesthesia for primary joint arthroplasty is associated with superior perioperative outcomes. More research is needed to study potential mechanisms for these findings.

  31. Marty

    Isn’t this the same as, or very close to, the “fallacy of the excluded middle”? Where a subtle or tiny difference is re-framed into polar opposites and the range of possibilities in between is assumed away?

    Ignoring the other problems with this sort of research, which is mostly insanity on stilts to begin with.

  32. Marty

    Responding to Mr. Briggs’s request: global warming. There is some indirect reason to believe that through technological processes humans may be causing the climate to warm. This is exaggerated into certain, looming disaster that requires extensive control of those technological processes, in which case all aspects of disaster are avoided. All that, and rather than developing policy based on science, corrupting the science to support the policy, but winding up where the pseudo-science says that even the cure (CO2 restriction) will hardly affect the problem.

  33. JH

    Sheri,
    That’s right, science is not majority rules. Majority rule is a decision rule, politics or not. Though I am not sure what your comments are about, thanks for the advice, anyway.
