William M. Briggs

Statistician to the Stars!

Cluster Failure: Biggest ‘I Told You So’ Yet. fMRI Stinks


As reader Nate Winchester surmised, today, the biggest I Told You So Yet.

Headline: “MRI software bugs could upend years of research: This is what your brain looks like on bad data”.

Listen, dear reader, when your Uncle Matt tells you there’s bad statistics, there’s bad statistics. And when I warned fMRI applied to personality and “free will” research was little better than electronic phrenology, I hope you were paying attention, because I told you so.

A whole pile of “this is how your brain looks like” MRI-based science has been invalidated because someone finally got around to checking the data.

The problem is simple: to get from a high-resolution magnetic resonance imaging scan of the brain to a scientific conclusion, the brain is divided into tiny “voxels”. Software, rather than humans, then scans the voxels looking for clusters.

When you see a claim that “scientists know when you’re about to move an arm: these images prove it”, they’re interpreting what they’re told by the statistical software.

Now, boffins from Sweden and the UK have cast doubt on the quality of the science, because of problems with the statistical software: it produces way too many false positives.

In this paper at PNAS, they write: “the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.”
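To make the voxel-and-cluster idea concrete, here is a toy sketch of why cluster hunting can fire on pure noise. It is not what SPM, FSL, or AFNI do, and it is not the paper's method. It assumes a flat 2D grid instead of a 3D brain, a crude 3x3 box smoother, and made-up settings (grid size, threshold of 2.0, minimum cluster size of 4). Every function name and parameter below is hypothetical and chosen only for illustration.

```python
import random


def null_image(n, rng, smooth):
    """Return an n x n grid of pure noise: no true signal anywhere."""
    raw = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    if not smooth:
        return raw
    # 3x3 box average with wrap-around edges. The mean of 9 independent
    # N(0, 1) values has standard deviation 1/3, so multiply by 3 to put
    # each voxel back on a unit-variance scale.
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    total += raw[(i + di) % n][(j + dj) % n]
            out[i][j] = 3.0 * (total / 9.0)
    return out


def largest_cluster(img, z_thresh):
    """Size of the largest 4-connected group of voxels above z_thresh."""
    n = len(img)
    seen = [[False] * n for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(n):
            if seen[i][j] or img[i][j] <= z_thresh:
                continue
            # Flood fill outward from this supra-threshold voxel.
            stack = [(i, j)]
            seen[i][j] = True
            size = 0
            while stack:
                a, b = stack.pop()
                size += 1
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = a + da, b + db
                    if (0 <= x < n and 0 <= y < n
                            and not seen[x][y] and img[x][y] > z_thresh):
                        seen[x][y] = True
                        stack.append((x, y))
            best = max(best, size)
    return best


def false_positive_rate(smooth, n=20, z_thresh=2.0, min_size=4,
                        sims=200, seed=1):
    """Fraction of signal-free images that still show a 'significant' cluster."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        if largest_cluster(null_image(n, rng, smooth), z_thresh) >= min_size:
            hits += 1
    return hits / sims


if __name__ == "__main__":
    print("independent noise:", false_positive_rate(smooth=False))
    print("smoothed noise:   ", false_positive_rate(smooth=True))
```

Every image here is noise, so any cluster that clears the size cutoff is a false positive. In this toy, the same cutoff that almost never fires on independent voxels fires far more often once neighbouring voxels are correlated. The general lesson is that a cluster-size rule is only as good as its assumption about how the noise hangs together in space. It says nothing about the actual size of the error in any real package.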

The earthy title of the peer-reviewed paper is “Cluster F— —: Why fMRI inferences for spatial extent have inflated false-positive rates”.

No, it’s “Cluster failure”. But one is tempted…One is sorely tempted.

Authors are Eklund, Nichols, and Hans Knutsson. They open the abstract with, “Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data.”

Surprisingly not been validated? Not surprising to us. We know how greedy “researchers” are for results, how willing they are to cut corners, how delighted they are to jump to theory, how apt they are to latch onto the bandwagon and not let go. Scientists, regular readers know, are people too.

Here’s a small sample of what I said in the past (for fuller results, use this search, which will also turn up this article).

Item: “Regular readers will know my opinion on fMRI research. Nothing but newfangled electronic phrenologic theory-discovering machines.” fMRI Discovers Freud, Distribution Plushies Lurking In Brain.

Item: “Can fMRI Predict Who Believes In God?” No. This series takes apart Sam Harris’s “celebrated” study.

Item: Nonpolitical Images Evoke Neural Predictors Of Political Ideology?

Is this a good point to remind us the fMRI data are not pictures of the brain but are themselves output of models and heuristics (“Functional data were first spike-corrected to reduce the impact of artifacts using AFNI’s 3dDespike”, etc., etc.) which themselves are subject to uncertainty which should be carried forward in any analysis but which usually aren’t, and weren’t here? If not, let me know when is.

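AFNI, the package 3dDespike belongs to, is one of the three whose cluster inference was found faulty in Cluster Failure (the bug the paper reports was in AFNI’s 3dClustSim, not in 3dDespike itself).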

Item: Brain-Scan Lie Detectors Don’t Work

Item: Our Brains Are Not Us: Review of Brainwashed. It’s more than just bad statistics.

The lights were from a functional magnetic imaging device, or fMRI, an instrument which Sally Satel (psychiatrist) and Scott Lilienfeld (psychologist) in their terrific Brainwashed: The Seductive Appeal of Mindless Neuroscience compare to an automated phrenological machine, a contrivance which when placed in proximity to the skull is purported to reveal all secrets, desires, motivations; even to expose lies and to prove that we are nothing but wet meat machines, mere automatons…

Studies also rely on those colorful brain scans which are not, as many think, “photographs of the brain in action in real time. Scientists can’t just look ‘in’ the brain and see what it does. Those beautiful color-dappled images are actual representations of particular areas in the brain that are working the hardest—as measured by oxygen consumption—when a subject performs a task such as reading a passage or reacting to stimuli” or when they go off script and wonder why they volunteered to be squeezed into a claustrophobia-inducing tube and told to lie as “still as a corpse” for over an hour.

This distinction is important because there is no (non-circular) way to check if a person is thinking what he is told, thus it’s only a possibility that the heavy oxygen-using regions are directed toward the specified experimental tasks. The best that can be said is the areas which glow brightly are correlated with the emotional states said to be under investigation—never minding that emotions are difficult to define, extraordinarily complex things. Is the “hate” center of the brain found in one experiment that same “hate” found in another experiment?

So why are fMRI statistical analyses so bad? What went wrong? The standard: wee p-values, hypothesis testing, etc., etc., etc. No hypothesis test (whether by wee p-value or Bayes factor) should ever be used again. I mean that “No” as in “No.” Eklund and pals also use hypothesis tests, in their demonstration that past research is too certain, which means even Eklund’s suggested corrective methods will produce results which are too certain. You have been warned. In any case, I’ll let Eklund have the last word.

It is not feasible to redo 40,000 fMRI studies, and lamentable archiving and data-sharing practices mean most could not be reanalyzed either…

Finally, we point out the key role that data sharing played in this work and its impact in the future. Although our massive empirical study depended on shared data, it is disappointing that almost none of the published studies have shared their data, neither the original data nor even the 3D statistical maps.

On the other hand, I’ll take the last word. I told you so!

27 Comments

  1. If one uses a computer and models, doesn’t that make it science?

    Luckily, I’m not in an fMRI tube and no one knows what part of my brain is “lighting up” at the moment. 😉

  2. Wasn’t there a study where someone hooked up an fMRI machine to a dead fish, and found that it registered “reactions” to things? I wish I could remember where I read that…

  3. Gail, Google can help with that. From 2009:

    http://www.wired.com/2009/09/fmrisalmon/

    (Scanning Dead Salmon in fMRI Machine Highlights Risk of Red Herrings)

  4. What is an open question is if, how, and when these findings will seep into the public body of knowledge so that the guy on the street will know these results. After all, he has been passively bombarded with tiny bits of data for a quarter century that suggest the magic of fMRI. While he hasn’t read any papers, he takes this as some kind of truth. It is going to take some doing to penetrate that shell that has been built up and calcified over time, and the process will be quicker for some than for those who will willingly grasp on to a bad “truth” because the scientists said it was so.

  5. According to the FBI director such crimes don’t matter when you haven’t intended to do anything wrong. The researchers should get a stern (heh) scolding and be sent off with more grant funding to continue their work. Maybe somebody will “modify” the software to adjust those rascally voxels. Or not. To quote a leading politician, “What does it matter?”

  6. Does this mean that we probably really do have free will?

  7. Yawrate: No, but you’re free to believe that if you wish. 🙂

  8. Notice that the paper does not discuss the quality of MRI scan results or the ongoing dramatic improvement of MRI sensitivity and resolution?

    I don’t know if one should be surprised to find out how the MRI data are sometimes poorly analyzed and why the fMRI software packages SPM, FSL, and AFNI are not validated by practitioners. I think it’s important to note that the false positive rates produced by all the software are calculated based on simulations.

    For large data, p-hacking is an easy task and can also be easily prevented. What else went wrong?

    Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape.

    Second, a 15-year-old bug was found in 3dClustSim while testing the three software packages (the bug was fixed by the AFNI group as of May 2015, during preparation of this manuscript).

    Well, a college alumnus friend, who works as a director at SAS, complained that he was constantly on the lookout for a great talent because the blend of computational/programming and statistical modeling skills was a rare commodity in high and competitive demand.

  9. Ooops, spelled “blockquote” incorrectly.

  10. Ah, at last something I know something about (fMRI–I was the MRI physicist at Geisinger Medical Center and helped set up their MRI). I think a factor to be considered even above and beyond the errors in statistical analysis is the obtrusive environment: the very loud scans (due to field gradient pulses), the cramped environment make answers to questions that require intellectual focus irrelevant.
    Moreover, most radiologists regard statistics as a screwdriver to unlock the box for another paper. I’ve seen poster papers with regression lines drawn through what looked like random shots on a barn wall. That being said, radiologists do have a trained eye for discovering patterns–textures, forms–that are above and beyond computer analysis. But that is a capacity that applies more to the detection of malignancies than to investigations in “neuroscience” or “cognition”.
    PET and SPECT are better techniques than fMRI for studies of this sort, even though their spatial resolution is not as good as that of fMRI. This has been shown by studies of brain areas modified by prayer and contemplation; see (department of shameless self-promotion) “Are we hard-wired for faith” at
    http://rationalcatholic.blogspot.com/2014/03/are-we-hard-wired-for-faith-religious.html

  11. Likely why they didn’t use an fMRI in the “Here’s your brain on drugs” ads.

    Here’s the original late 80’s remake of “Reefer Madness”:
    https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwiUlc38p9_NAhVMOSYKHXcyDXcQtwIIKjAB&url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dub_a2t0ZfTs&usg=AFQjCNFnWPQW7WqbCltwaeJyF434o913hQ&sig2=EmnbdFI2749pWpg58WbDwQ

    Not sure how they confused a frying pan with statistics but I guess you could cook up your paper with some other pot. Some may.

  12. I like the headline: “Bugs” – as if the design were OK, but regrettable errors were made; “May” – as if we should have confidence in the results until we hear otherwise, you know, just a heads up?

  13. “Although our massive empirical study depended on shared data, it is disappointing that almost none of the published studies have shared their data, neither the original data nor even the 3D statistical maps. ”

    Reminds me of something else. No, don’t help me – I am sure it will come to me in time…

  14. One day in the medium future we’ll be informed that the role of DNA transfer between generations has been greatly misunderstood.

  15. Bob: I wondered about MRI’s for structural irregularities. Seems that would be easier to get actual data on and know the error rate with more accuracy.

  16. Sheri, for structural irregularities, for unusual states of tissue water (e.g. hydrated tumor vs normal grey or white matter), MRI is excellent. And as I said in my comment to Swordfishtrombone, practicing radiologists have an eye for discerning subtle differences in form, texture, intensity that outdoes any computer learning program (programs that can match radiology residents).
    But the fMRI results are quite different, there a subtle differences in intensity that require good analysis to detect–in some cases they shine out, but this might not happen generally. Whence the reliance on intricate statistics in the image processing, a feature that is not generally required for conventional diagnosis of brain lesions.

  17. there a subtle—> there is a subtle

  18. There’s nothing like a good I-told-you-so unless it is on me.

  19. Bob Kurland wrote:

    “Ah, at last something I know something about…”

    Lucky you. I check Briggs every day hoping there’s a topic I know something about so I can shoot my mouth off. No joy.

  20. I’m not surprised to see the real data showing people have a widely disparate variety of neural schematics and that it is very difficult to find specific patterns among large population studies. It is the adaptability of the human brain that makes us so successful a species.

    JMJ

  21. Jersey,
    You seem to blithely assume the scientistic dogma that consciousness and conception are a purely chemical process of the brain.

    I will contend that the brain is only the physical organ that connects the metaphysical mind with the physical and sensory world.

    To me, it is incomprehensible that one could claim that chemistry produces or equals understanding or comprehending unless they are superstitiously committed to the dogmas of Materialism… in which case any observation or logic will be dismissed because, and only because, it conflicts with the ideological assumption.

  22. Bob Kurland: That was what I suspected. I was never really sure why MRI’s were used for psychological process identification. As you noted, PET scans are better and at one time that was what was used.

    JMJ: Nice.

    Oldavid: I think you’re responding more to JMJ’s comment based on his past responses to other things. I guess I’m giving him credit for being very diplomatic this time. (I don’t disagree with you necessarily, but sometimes JMJ makes good points in spite of himself or because of himself, I’m not sure which.)

  23. I wonder if the statistics applied to the models used in the search for “exo-planets” is similarly flawed.

    The technique used for “finding” planets around distant stars arises from analysis of tons of measurements looking for tiny systematic variations that fit a pre-conceived mental model of what those variations SHOULD look like IF a planet was causing them. Seems to me the bias towards false positives may be very high. That bias, IF augmented by any statistical flaws, may have generated a huge amount of popular ideas that are not in any way true. Perhaps planets are not common; perhaps life in the universe is not ubiquitous; perhaps intelligence and souls-in-the-image-of-a-Creator are unique and precious.

    Just wondering

  24. Chuck Bradley

    July 7, 2016 at 11:10 am

    A recent book, “The End of Average: How We Succeed in a World That Values Sameness” by Todd Rose, has a section about fMRI. Attempts to locate regions of the brain that do certain things (by fMRI) seem worthless. He shows the many images that were averaged to locate some emotion or process so that deviations from that average could be used to analyze other subjects/patients. But there was no central tendency in the original images.

    My analogy of the result is we found the average weight of some mice, worms, and horses to conclude that Susan is too fat.

  25. @Oldavid:
    “I will contend that the brain is only the physical organ that connects the metaphysical mind with the physical and sensory world.”

    That seems to me to be concluding that because our tools are too coarse to see all details of physical activity, then the correct conclusion is that there is some non-physical entity involved, despite there being no positive evidence for that, and no explanation that ties that to the physical processes we do know to be true (gravity, electromagnetic radiation, sub-atomic particle behavior, etc.). Why should I not dismiss your “metaphysical mind” as a completely imaginary construct (constructed of course by chemical processes in your brain)?

  26. @Chris Claude:
    Since we’ve gotten to this tangent, let me point out the following:
    1. There is one thing I know to be true, and that is the reality of my current conscious experience.
    2. All the rest (gravity, chemical processes in the brain, not to mention sub-atomic particles) is stuff I heard about from teachers in school and read about in books. I don’t really know all that stuff to be true. I just trust the various (hopefully) smart people who told me that stuff. Occasionally (for example now, with fMRI), my trust in their good judgment turns out to be misplaced. And, in fact, if you think about scientific method for a moment, you’d realize that all those theoretical constructs you are citing as truths should one day be proven false with the future progress of science (they are usually much better guides for practical activities and policy than some other stuff people are putting out, but they are not meant to be the ultimate truth).
    3. Chemical processes in the brain, or what have you, are not the conscious experiences I am having now. The claim that they are is not even false; it’s utterly meaningless, and is tantamount to the claim that I don’t exist (as a conscious being). You may argue that whenever a certain kind of chemical process goes on in a person’s brain, that person would have a certain conscious experience (although we are very far from finding empirical support for any claim of that sort), but even if this is true, you still can’t seriously argue that the experience just doesn’t exist, and all there is are the chemical processes in the brain (are you saying the experience is just an illusion? Well, now, and who’s experiencing that illusion? Brain chemistry can’t be an illusory experience any more than it can be a non-illusory one).
    4. To sum up, you are trying to use school-level “knowledge” to convince me of the falsity of the one thing I can directly attest is true—my own existence as a conscious being in the here-and-now (to put it in much too fancy terms). Sorry, this doesn’t hold water.


© 2016 William M. Briggs
