What The Sports Illustrated Curse Says About The Reproducibility Crisis

The so-called Sports Illustrated curse is easy to understand. A player or team excels, which is to say it does much better than usual. The exceptional performance attracts the attention of reporters, who then feature the player or team.

After the issue with their picture on the cover is delivered, the player or team is then often (but not always) “cursed” by sinking back to, or below, their usual level of performance.

Stated as blandly as that, it’s obvious there is no curse. It’s nothing but a form of “regression to the mean”, where the usual non-exceptional non-cover-worthy performance is evinced after the cover appears, as expected.

Which is another way of saying that a good model, one that makes excellent predictions, guesses that players and teams will perform at or near their average. This is so obvious that it’s almost a truism.
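
To see the curse in numbers, here is a minimal regression-to-the-mean sketch in Python. The skill and noise figures are made up purely for illustration: each team gets a fixed “true” win rate plus season-to-season luck, the best season-one performer lands on the cover, and its season-two number is then compared with both.

    # Regression-to-the-mean sketch: illustrative numbers only, no real data.
    import numpy as np

    rng = np.random.default_rng(42)

    n_teams = 30
    true_skill = rng.normal(0.500, 0.030, n_teams)   # each team's "true" win rate
    luck = 0.040                                     # season-to-season noise

    season1 = true_skill + rng.normal(0, luck, n_teams)
    season2 = true_skill + rng.normal(0, luck, n_teams)  # same skill, fresh luck

    cover = np.argmax(season1)   # the exceptional team that gets the cover
    print(f"Cover team, season 1:   {season1[cover]:.3f}")
    print(f"Cover team, season 2:   {season2[cover]:.3f}")
    print(f"Cover team, true skill: {true_skill[cover]:.3f}")

Season two will typically land well below season one and near the team’s true rate. No curse required: picking the extreme guarantees that, on average, the luck will not repeat.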

The key, of course, is that it was the comparatively rare burst of excellence that got the team on the cover. The exceptional performance was real enough—it happened, and was therefore caused to happen—but the circumstances that led to these causes are difficult to reproduce.

Or even impossible.

Those causes that led to exceptional performance might not have been related to the abilities of the team members themselves; they could have been circumstantial. For instance, a “big game” is won because the star player on the opposing team is (maybe secretly) playing injured. Or the field on which the game is played is somehow juiced, affecting the opposing team more.

Sports fans will undoubtedly think of many examples. All of which lead to the player or team being called “overrated”, or similar such words.

All right, that’s sports. But the same thing happens in science. And when it does it’s called the “reproducibility crisis.”

Maybe it’s already clear to you, but if not, it’s easiest to see at what might be called the edges of science—though I want to emphasize it happens in all of science all the time.

Take parapsychology, which is always struggling with versions of the Sports Illustrated curse. So common are these curses that they have been incorporated into the field as theory. Here’s one author on this (my emphasis):

Many parapsychological writers have suggested that psi may be capricious or actively evasive. The evidence for this includes the unpredictable, significant reversal of direction for psi effects, the loss of intended psi effects while unintended secondary or internal effects occur, and the pervasive declines in effect for participants, experimenters, and lines of research. Also, attempts to apply psi typically result in a few very impressive cases among a much larger number of unsuccessful results.

Another author opens his paper saying:

One of the most puzzling aspects of psi is its apparently capricious nature (Beloff, 1994; Hanson, 2001; Kennedy, 2003). This refers to the oft-reported difficulty of repeating highly successful pilot studies in formal replications, or worse, finding that strong effects in one study significantly reverse in follow-up attempts.

Large numbers of people are set to, say, guessing cards; invariably high scorers will appear, even after successive testing. But not after too much successive testing. The initial astounding results fade. The effect disappears. The curse has taken hold.
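
A toy version of the card experiment, under the same kind of made-up assumptions (25-card runs, a 1-in-5 chance per guess, and no psi built in anywhere), shows how pure chance manufactures stars who then fade on retest:

    # Card-guessing sketch: chance alone produces high scorers who fade on retest.
    import numpy as np

    rng = np.random.default_rng(7)

    n_subjects, n_cards, p_chance = 1000, 25, 0.2    # 1-in-5 chance per guess

    first_run = rng.binomial(n_cards, p_chance, n_subjects)
    stars = np.argsort(first_run)[-10:]                   # the ten highest scorers
    retest = rng.binomial(n_cards, p_chance, stars.size)  # same chance process again

    print("Chance expectation:", n_cards * p_chance)
    print("Stars, first run:  ", first_run[stars].mean())
    print("Stars, retest:     ", retest.mean())

The first-run stars score well above the chance expectation of five; on retest they drift right back to about five. The effect “disappears” because it was never there.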

And it is a curse. That is, theorizers in the field say it is. The last paper quoted cites “tricksters”, entities which sabotage experiments and in whom the blame is laid for poor results. Just as with overrated sportsmen, the card guessers’ superior performance did happen, but its causes lay outside their skills.

You may be chuckling by now, and it is amusing, but one thing parapsychology has to its great credit over many other sciences is the determination to re-do its experiments, over and again, ad infinitum, in the hunt for its ever-elusive signal.

Whereas most other sciences are happy to run their experiment once, get their astonishing result, wave their wee p-value in everybody’s face, and then call their theory true. And then move on to the next theory, which is premised on the truth of the first, and start the cycle anew.

Entire mountains of error are built this way.

This is being discovered when people go back and try to re-do so-called foundational experiments. Those same significant reversals of direction and pervasive declines in effect are found all over—and usually for the same reason.
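
The reason can be sketched in the same way. Assume a tiny true effect (the 0.1 below is made up), let many labs each run one small study, and write up only the runs that clear p < 0.05; scipy’s ordinary two-sample t-test stands in here for whatever test a given field favors. The written-up effects come out several times larger than the truth, so honest replications have nowhere to go but down.

    # "Wee p-value" sketch: filtering on significance inflates published effects.
    # Illustrative numbers only; the true effect is set to a tiny 0.1.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    true_effect, n = 0.1, 20          # small true effect, small samples
    published = []

    for _ in range(2000):             # 2000 labs each run the study once
        treated = rng.normal(true_effect, 1, n)
        control = rng.normal(0, 1, n)
        t, p = stats.ttest_ind(treated, control)
        if p < 0.05 and t > 0:        # only the "astonishing" runs get written up
            published.append(treated.mean() - control.mean())

    print("True effect:               ", true_effect)
    print("Mean published effect:     ", round(float(np.mean(published)), 2))
    print("Fraction of runs published:", round(len(published) / 2000, 2))

Only a few percent of the runs clear the filter, and their average reported effect comes out many times larger than the true 0.1. Anyone who re-does one of those studies honestly will see the familiar decline.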

4 Comments

  1. Hagfish Bagpipe

    “The last paper quoted cites “tricksters”, entities which sabotage experiments and in whom the blame is laid for poor results.”

    Positing “tricksters” as cause isn’t science, per se, but it’s a solid storyline in this Trickster Golden Age*.

    *Fool’s gold.

  2. Incitadus

    Parapsychology’s got nothing on parasomnia…

  3. Kees

    No podcasts anymore?

  4. Forbes

    Kahneman and Tversky (now deceased) made their start, as I best recall, in this field. An early example was written up, in (of all places) Sports Illustrated, about the “hot hand” in basketball, whereby the shooter who has hit several in a row gets the ball fed to him to shoot again. It was not uncommon for a shooter who had hit, say, two in a row to be fed the ball again and again until he misses. Of course, the ball stops getting fed to him once he misses the next shot. So strings of 6 or 7 consecutive made shots happen, in the same manner that coin flipping might turn up 6 or 7 heads in a row–occasionally, not regularly. But just often enough for the “hot hand” label to be affixed.

    IIRC, Kahneman and Tversky also did some work evaluating training systems for Israeli Air Force pilots, and sussed out similar reversion-to-the-mean outcomes. Pilots praised for exceptional performance often followed up with worse performance, while pilots reprimanded for suboptimal performance rebounded with improvement. K&T’s analysis demonstrated reversion to the mean was far more responsible for the outcomes than any praise or upbraiding–despite the training officers’ disposition towards carrots and sticks.

    These are recollections from 40+ years ago, so don’t hold me to specifics…
