October 12, 2008
The way peer review works is broken, according to a new finding by John Ioannidis and colleagues in their article “Why Current Publication Practices May Distort Science”. The authors liken acceptance of papers in journals to winning bids in auctions: sometimes the winner pays too much and the results aren’t worth as much as everybody thinks.
What normally happens is that an author writes up an idea using the accepted third person prose, which includes liberal use of the royal we, as in “In this paper we prove…” His idea is not perfect, and might even be wrong, and he knows it. But he needs papers—academics need papers like celebrities need interviews with network news readers—and so he sends it in, hopeful.
Depending on how good our author thinks his paper is, coupled with the size of his ego, he will choose a journal from a list ranked by quality. This rating is partly informal—word of mouth—and partly pseudo-statistical—“impact factors.” “Impact” factors are based on a formula of how many citations papers in the noted journal get. The idea is that the more citations a work gets, the better it is. This is, as you might easily guess, sometimes true, sometimes not.
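For the curious, here is a minimal sketch of the arithmetic behind an impact factor. It assumes the commonly used two-year definition, which is not spelled out above; the function name and the example numbers are made up for illustration.

```python
def impact_factor(citations_this_year: int, citable_items_prior_two_years: int) -> float:
    """Two-year impact factor (assumed definition, not taken from the post).

    citations_this_year: citations received this year to items the journal
        published in the previous two years.
    citable_items_prior_two_years: number of citable items the journal
        published in those same two years.
    """
    if citable_items_prior_two_years <= 0:
        raise ValueError("journal must have published at least one citable item")
    return citations_this_year / citable_items_prior_two_years


if __name__ == "__main__":
    # Hypothetical journal: 200 papers over two years, cited 600 times this year.
    print(impact_factor(600, 200))
```

Note that the figure is an average over the whole journal, so a handful of heavily cited papers can carry many that are never cited at all.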
“‘Gaming’ of impact factors is explicit. Editors make estimates of likely citations for submitted articles to gauge their interest in publication. The citation game has created distinct hierarchical relationships among journals in different fields. In scientific fields with many citations, very few leading journals concentrate the top-cited work: in each of the seven large fields to which the life sciences are divided by ISI Essential Indicators (each including several hundreds of journals), six journals account for 68%–94% of the 100 most-cited articles in the last decade.”
One of the main advantages of the publish or perish model of academic careerism has been the explosive growth of journals. In the field of mathematical statistics, for example, we have JASA and The Annals, the Cadillac and BMW of journals, but we also have Communications in Statistics and the Far East Journal of Theoretical Statistics, the Pinto and Yugo of publications. As Ioannidis says, “Across the health and life sciences, the number of published articles in Scopus-indexed journals rose from 590,807 in 1997 to 883,853 in 2007, a modest 50% increase.” Similar increases can be found in every field.
Even though there is, as the common saying goes, a journal for every paper, many authors shoot for the best at first because, as the commercial says, “Hey, you never know.” Naturally, then, the better journals end up rejecting most of their submissions. What happens next partially highlights the auction analogy.
Journals closely track and advertise their low acceptance rates, equating these with rigorous review: “Nature has space to publish only 10% or so of the 170 papers submitted each week, hence its selection criteria are rigorous”—even though it admits that peer review has a secondary role: “the judgement about which papers will interest a broad readership is made by Nature’s editors, not its referees”. Science also equates “high standards of peer review and editorial quality” with the fact that “of the more than 12,000 top-notch scientific manuscripts that the journal sees each year, less than 8% are accepted for publication”.
“Elite” colleges and universities do much the same thing: encourage as many applications as necessary just so that they can lower their acceptance rates, that figure figuring high in the algorithm of Eliteness.
Publish or Perish
The auction analogy breaks down at this point because there are so many other outlets for publication. The top journals do end up with better papers for at least three reasons: there are so many outlets that a natural ranking always results, there is the citation arms race, and there is the non-numerical prestige factor. It is true that a paper’s appearing in a top journal is no guarantee that its findings are correct and useful, but I would say that it increases the probability that they are correct and useful.
If you cannot find a journal to take your paper, no matter how atrocious it is, then you aren’t trying hard enough. Many journals’ entire reason for existence is to take in strays. Sending in dreck to a fourth-rate journal isn’t always irrational. Publish or perish is a real phenomenon, and very often those judging your “tenure package” do nothing more than count the papers. When I was at Weill-Cornell (Med School), I was told that the number was 20. Naturally, this number is unofficial and never written down, but everybody knows it. Your colleagues will, however, be aware of which journals are bottom feeders. A friend of mine once said, “I give 1 point for every JASA or Annals paper. And I subtract 2 for every Communications.”
Ioannidis and his co-authors missed one important auction analogy: Fads. I’m thinking of that “artist” who pickles sharks and other dead animals and calls it “art.” That guy recently had an auction selling his taxidermy and raked in millions from fools bigger than himself. Sooner, and probably later, people will return to their senses and no longer buy what this guy is selling.
The same thing happens in “science” publishing. Papers within a fad are given what amounts to a free pass and proliferate. There was a time, right after the discovery of x-rays for example, when there was a proliferation of new “ray” discovery papers. The most infamous is Blondlot’s N-rays. In the ’80s and ’90s in psychology, the fad was “recovered memories” and “satanic cult discovery.”
Once a fad starts, new fad-papers cite the old ones, papers appear at an accelerating rate, and an enormous web of “research” is quickly built. Seen from afar, the web looks solid. But peer closer and you can see how easily the web can be torn to shreds. Today’s fad is “The Evils That Will Befall Us Once Global Warming Hits.” An example of how ridiculous this fad has gotten is this paper, which purports to show how suicides will increase in Italy Once Global Warming Hits.
It is not clear, as it probably never is when in the midst of one, when this fad will peter out. In any case, there is more than auction frenzy and faddishness that explains why peer review is not perfect.
For example, the Italian global-warming suicide paper used statistics to “prove” its results. The statistical methods they used were so appalling that I am still recovering from my review of the paper. The frightening thing is that this paper was not an exception.
Ioannidis is well known for a paper he wrote a few years ago claiming that most published research (that used classical statistics methods) was wrong. He said (quote from the auction paper):
An empirical evaluation of the 49 most-cited papers on the effectiveness of medical interventions, published in highly visible journals in 1990–2004, showed that a quarter of the randomised trials and five of six non-randomised studies had already been contradicted or found to have been exaggerated by 2005…More alarming is the general paucity in the literature of negative data. In some fields, almost all published studies show formally significant results so that statistical significance no longer appears discriminating. [emphasis mine]
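A toy simulation shows how a literature can end up looking like that. It assumes, purely for illustration, that every study tests an effect that is truly zero, that each uses a simple z-test with known variance, and that journals print only results with p < 0.05. None of these settings comes from the paper being quoted; the function names and parameters are hypothetical.

```python
import math
import random


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_literature(n_studies: int = 5000, sample_size: int = 20, seed: int = 1):
    """Run many studies of a nonexistent effect; 'publish' only the significant ones.

    Returns (p-values of all studies, p-values of the published studies).
    """
    rng = random.Random(seed)
    all_p_values = []
    for _ in range(n_studies):
        # Assumption: data are N(0, 1), so the true effect is exactly zero.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        mean = sum(sample) / sample_size
        z = mean * math.sqrt(sample_size)  # z-test with known sigma = 1
        all_p_values.append(two_sided_p_value(z))
    # Assumption: the journal filter is a hard p < 0.05 cutoff.
    published = [p for p in all_p_values if p < 0.05]
    return all_p_values, published


if __name__ == "__main__":
    all_p, published = simulate_literature()
    print("studies run:      ", len(all_p))
    print("studies published:", len(published))
    print("share of published studies that are 'significant':",
          sum(p < 0.05 for p in published) / len(published))
```

By construction every published result clears the 0.05 bar, even though only about one in twenty of these null studies gets through the filter. In such a literature a significant p-value tells the reader nothing about which findings are real.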
Regular readers of this blog will recognize the sentiments. The simple fact is that if you use classical statistics methods—or even a lot of Bayesian parameter-focused methods—the results will be too certain. That is, the methods might give a correct answer to a specific question, but nobody can remember what the proper question is and so they substitute a different one. The answer thus no longer lines up with the question, and people are misled and become too certain.
Just why this is so will have to wait for another day.