Peer Review Not Perfect: Shocking Finding

The way peer review works is broken, according to a new finding by John Ioannidis and colleagues in their article “Why Current Publication Practices May Distort Science”. The authors liken acceptance of papers in journals to winning bids in auctions: sometimes the winner pays too much and the results aren’t worth as much as everybody thinks.

What normally happens is that an author writes up an idea using the accepted third person prose, which includes liberal use of the royal we, as in “In this paper we prove…” His idea is not perfect, and might even be wrong, and he knows it. But he needs papers—academics need papers like celebrities need interviews with network news readers—and so he sends it in, hopeful.

Impact Factors

Depending on how good our author thinks his paper is, coupled with the size of his ego, he will choose a journal from a list ranked by quality. This rating is partly informal—word of mouth—and partly pseudo-statistical—“impact factors.” “Impact” factors are based on a formula of how many citations papers in the noted journal get. The idea is that the more citations a work gets, the better it is. This is, as you might easily guess, sometimes true, sometimes not.
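For the curious, here is a minimal sketch of the usual two-year version of that formula: citations received this year by the papers a journal published in the previous two years, divided by the number of citable papers it published in those two years. The journal names and counts below are made up purely for illustration; they are not figures from Ioannidis's article.

```python
def impact_factor(citations_this_year, citable_items):
    """Two-year impact factor.

    citations_this_year: citations received this year by papers the journal
        published in the previous two years.
    citable_items: number of citable papers the journal published in those
        same two years.
    """
    if citable_items <= 0:
        raise ValueError("need at least one citable item")
    return citations_this_year / citable_items


# Hypothetical journals with invented counts: (citations, citable items).
journals = {
    "Journal A": (12000, 400),
    "Journal B": (150, 300),
}

# Rank the journals from highest to lowest impact factor.
ranking = sorted(
    journals,
    key=lambda name: impact_factor(*journals[name]),
    reverse=True,
)

for name in ranking:
    print(name, impact_factor(*journals[name]))
```

Note that the number is an average over the whole journal, so a handful of heavily cited papers can carry it; it says nothing about any single paper.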

“‘Gaming’ of impact factors is explicit. Editors make estimates of likely citations for submitted articles to gauge their interest in publication. The citation game has created distinct hierarchical relationships among journals in different fields. In scientific fields with many citations, very few leading journals concentrate the top-cited work: in each of the seven large fields to which the life sciences are divided by ISI Essential Indicators (each including several hundreds of journals), six journals account for 68%–94% of the 100 most-cited articles in the last decade.”

One of the main advantages of the publish or perish model of academic careerism has been the explosive growth of journals. In the field of mathematical statistics, for example, we have JASA and The Annals, the Cadillac and BMW of journals, but we also have Communications in Statistics and the Far East Journal of Theoretical Statistics, the Pinto and Yugo of publications. As Ioannidis says, “Across the health and life sciences, the number of published articles in Scopus-indexed journals rose from 590,807 in 1997 to 883,853 in 2007, a modest 50% increase.” Similar increases can be found in every field.

Even though there is, as the common saying goes, a journal for every paper, many authors shoot for the best at first because, as the commercial says, “Hey, you never know.” Naturally, then, the better journals end up rejecting most of their submissions. What happens next partially highlights the auction analogy.

Journals closely track and advertise their low acceptance rates, equating these with rigorous review: “Nature has space to publish only 10% or so of the 170 papers submitted each week, hence its selection criteria are rigorous”—even though it admits that peer review has a secondary role: “the judgement about which papers will interest a broad readership is made by Nature’s editors, not its referees”. Science also equates “high standards of peer review and editorial quality” with the fact that “of the more than 12,000 top-notch scientific manuscripts that the journal sees each year, less than 8% are accepted for publication”.

“Elite” colleges and universities do much the same thing: encourage as many applications as necessary just so that they can lower their acceptance rates, that figure figuring high in the algorithm of Eliteness.

Publish or Perish

The auction analogy breaks down at this point because there are so many other outlets for publication. The top journals do end up with better papers for at least three reasons: there are so many outlets that a natural ranking always results, there is the citation arms race, and there is the non-numerical prestige factor. It is true that just because a paper is in a top journal, it is no guarantee that its findings are correct and useful, but I would say that it increases the probability that they are correct and useful.

If you cannot find a journal to take your paper, no matter how atrocious it is, then you aren’t trying hard enough. Many journals’ entire reason for existence is to take in strays. Sending in dreck to a fourth-rate journal isn’t always irrational. Publish or perish is a real phenomenon, and very often those judging your “tenure package” do nothing more than count the papers. When I was at Weill-Cornell (Med School), I was told that the number was 20. Naturally, this number is unofficial and never written down, but everybody knows it. Your colleagues will, however, be aware which journals are bottom feeders. A friend of mine once said “I give 1 point for every JASA or Annals paper. And I subtract 2 for every Communications.”

Fads

Ioannidis and his co-authors missed one important auction analogy: Fads. I’m thinking of that “artist” who pickles sharks and other dead animals and calls it “art.” That guy recently had an auction selling his taxidermy and raked in millions from fools bigger than himself. Sooner, and probably later, people will return to their senses and no longer buy what this guy is selling.

The same thing happens in “science” publishing. Papers within a fad are given what amounts to a free pass and proliferate. There was a time, right after the discovery of x-rays for example, when there was a proliferation of new “ray” discovery papers. The most infamous is Blondlot’s N-rays. In the ’80s and ’90s in psychology, the fad was “recovered memories” and “satanic cult discovery.”

Once a fad starts, new fad-papers cite the old ones, papers appear at an accelerating rate, and an enormous web of “research” is quickly built. Seen from afar, the web looks solid. But peer closer and you can see how easily the web can be torn to shreds. Today’s fad is “The Evils That Will Befall Us Once Global Warming Hits.” An example of how ridiculous this fad has gotten is this paper, which purports to show how suicides will increase in Italy Once Global Warming Hits.

It is not clear, as it probably never is when in the midst of one, when this fad will peter out. In any case, there is more than auction frenzy and faddishness that explains why peer review is not perfect.

Bad Statistics

For example, the Italian global-warming suicide paper used statistics to “prove” its results. The statistical methods they used were so appalling that I am still recovering from my review of the paper. The frightening thing is that this paper was not an exception.

Ioannidis is well known for a paper he wrote a few years ago claiming that most published research (that used classical statistics methods) was wrong. He said (quote from the auction paper)

An empirical evaluation of the 49 most-cited papers on the effectiveness of medical interventions, published in highly visible journals in 1990–2004, showed that a quarter of the randomised trials and five of six non-randomised studies had already been contradicted or found to have been exaggerated by 2005…More alarming is the general paucity in the literature of negative data. In some fields, almost all published studies show formally significant results so that statistical significance no longer appears discriminating. [emphasis mine]

Regular readers of this blog will recognize the sentiments. The simple fact is that if you use classical statistics methods—or even a lot of Bayesian parameter-focused methods—the results will be too certain. That is, the methods might give a correct answer to a specific question, but nobody can remember what the proper question is and so they substitute a different one. The answer thus no longer lines up with the question, and people are misled and become too certain.

Just why this is so will have to wait for another day.

9 Comments

clyde_m

October 12, 2008, 8:41 am

You wrote, “That is, the methods might give a correct answer to a specific question, but nobody can remember what the proper question is and so they substitute a different one. The answer thus no longer lines up with the question, and people are misled and become too certain.”

I think this applies to many aspects of life.

In politics, we are constantly misled by all participants regarding the opponent’s record. “What does [he, she, it] propose for middle class Americans? The largest tax increase in the history of the Universe.” When we dig deeply into the data, we might find, in fact, the largest proposed tax increase, but then we realize that the question is a bit off, a measure too targeted. The rhetorical question doesn’t line up with the summary answer – yet the fact checkers rarely get around to saying, “While a series of convex mirrors in an ill-lit narrowing hallway might support the answer given, it would be more accurate ab initio if the question were posed as …”

In economics, the answers given are often a finite slice while the questions proffered are broad: “What did President Bush do to the economy? He presided over the largest deficits in history.” The question ignores the role of Congress in writing and passing the budget. It ignores the expansion of GDP in light of the shock of 9/11 and all the security measures – overhead resulting in increased operating costs for most segments of the economy. It ignores the drain of the GWOT. The answer ignores the substantial delayed spending of the Clinton Administration so that their books would look better.

Similar economic misstatements occur when they speak of unemployment “rising” while ignoring frictional and structural unemployment. Journalists can understand the concept of seasonal unemployment, and the published data is usually “seasonally adjusted,” so we get that. But we never hear discussion of true “full employment” and the diminishing productivity of workers taken from the frictional and structural pools. So we get a finite answer cast against the landscape of a broader question.
ElKoz

October 13, 2008, 4:24 am

The one thing that has come to be true for researchers since the end of WWII is that in order to succeed in “science” they must be able to bring in government grant money. Lots of grant money. Truth is no longer the holy grail of science. It is now and in the foreseeable future… GRANT MONEY. Pronouncements of scientific claptrap are rampant and “researchers” have sold their integrity in order to acquire it. Researchers and universities grovel at the shrine of anthropogenic global warming, now called climate change since everything they predicted is falling apart. If those doling out the money demanded that there was no such thing as anthropogenic global climate you would see these universities and researchers do an about face so fast that you would swear that they were the color guard in a military parade. The reason that peer review is failing is that the whole system (those granting research money, the researchers, the universities and the science journals) has been so thoroughly corrupted that it amazes me that any deceptions are discovered or exposed. There is a reason why so much mistrust is directed at the scientific community; you’ve earned it.
Clark

October 13, 2008, 1:37 pm

“It is true that just because a paper is in a top journal, it is no guarantee that its findings are correct and useful, but I would say that it increases the probability that they are correct and useful.”

Actually, the reverse is true. In a study a couple years ago, Nature and Science were found to have the lowest accuracy of major journals. This is certainly because one criterion is the “novelty” factor, where papers making an unprecedented finding, or contradicting a long-held idea are more likely to get on. Of course, long-held ideas have this status for a reason, and a new contradictory finding usually turns out not to contradict in the long run.

A second reason is that Nature and Science papers are so short that they are often not nearly so well supported by experimental data as papers in a more main-line journal.
Francois Ouellette

October 17, 2008, 3:43 pm

Personally, I find that the one sad consequence of the publish or perish system is that we are drowned in insignificant science. Most papers are “correct”, but only because the authors dare not publish something that would be so bold and original that they could be “incorrect”. Most published papers are neither useful, nor cited, nor even read.

When I started in science, I was very excited to read all the new papers coming out. But after ten years, I realized that there were very few novel ideas, and many papers would just rehash something that had already been done. Sometimes you see the same idea being published every ten years or so!

There must be a way out of this system!
TCO

October 18, 2008, 7:23 am

Lot of good content here, but you wander from the subject of publication dynamics to frequentist-bashing.
Dave Andrews

October 18, 2008, 2:00 pm

Francois,

“Sometimes you see the same idea being published every ten years or so!”

In a previous incarnation I had to study a lot of management theory – it was basically a circular system which ‘reinvented the wheel’ with new nomenclature but precious little in the way of new ideas:-)
Briggs

October 18, 2008, 3:29 pm

TCO,

Force of habit. Plus, the corpse is still twitching…
lucia

October 18, 2008, 6:24 pm

Plus, the corpse is still twitching…

Briggs wins the thread! (For usage see Volokh Conspiracy.)
Briggs

October 19, 2008, 4:30 am

All,

Just for the sake of accuracy, there is no such thing as a “Nobel Prize in Economics.” There is a prize created by the Swedish central bank in honor of the Nobel Prize. Nobel himself was wiser than to create an award for the dismal science.
