Have doubts about the latest finding from researchers? Confused that last week red wine “increased the risk of heart disease” but this week red wine “decreased the risk of heart disease”? Concerned that the apocalyptic findings of environmentalist scientists might be mistaken?
You needn’t be. Because science—if you will allow the reification—engages in the act of replication (behind closed doors, and using white lab coats for protection).
In a recent Skeptical Inquirer, James Alcock was saddened that the Journal of Personality and Social Psychology, the journal which published Daryl Bem’s positive “finding” of pre-cognition, would not publish his (Alcock’s) replication of Bem’s experiment, this time with the conclusion of no effect, on the grounds that the journal publishes only positive findings.
Alcock lamented the journal’s attitude, and claimed that it was violating the sacred tenet of replication. Even worse, the policy of letting in positive findings while barring all negative ones guarantees over-certainty. Dr Alcock was right. But more interesting is why he was so.
It makes sense that if you only publish news of positive “findings,” and never let fall a whisper of negative ones, you are bound to occasionally cry wolf or to celebrate erroneously. But what does replication have to do with it? Why replicate at all? I mean, why do scientists themselves say that replication is sacred?
Obviously, and trivially, it is because they acknowledge that mistakes are not only made in experiments, but they are made often. Replication is the hope that the other guy won’t make the same mistake the first guy did. The word is “hope”, because it can and has happened that the other guy does make the same mistake, or makes new mistakes, or makes both new and old ones.
It is only human nature to feel that the other guy is the one more likely to make mistakes. This is why many replications are made by a second guy who is convinced the first guy is nuts. He feels that not only will the replication prove this nuttiness, but it will elevate him as a beacon of Objectivity in the process. Two birds, etc.
Mistakes are more common when scientists are convinced, or nearly so, about the outcome of an experiment before the experiment is carried out. So intent are they on finding the answer they know will come, they often find it, even when it isn’t there. This is why we find the, to some, curious phenomenon that effect size (for some experiment) decreases the more it is replicated.1
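A toy simulation makes the mechanism concrete. This is only a sketch with invented numbers, nothing drawn from Bem or Alcock: if only results that cross a significance threshold see print, the published effect sizes are necessarily inflated, and unfiltered replications will appear to show the effect shrinking, small p-values notwithstanding.

```python
# Sketch: how publishing only "significant" results inflates effect sizes,
# so that unfiltered replications seem to show the effect "declining".
# All numbers here are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
true_effect = 0.1      # small real effect, in standard-deviation units
n = 30                 # subjects per study
n_studies = 10_000

# Each study estimates the effect with sampling noise sd = 1/sqrt(n).
estimates = rng.normal(true_effect, 1 / np.sqrt(n), size=n_studies)
z = estimates * np.sqrt(n)          # z-statistic for each study
published = estimates[z > 1.96]     # journal keeps only "significant" results

print(f"True effect:               {true_effect:.3f}")
print(f"Mean of all studies:       {estimates.mean():.3f}")
print(f"Mean of published studies: {published.mean():.3f}")
# The published mean runs several times the true effect; replications that
# are not filtered by the journal drift back down toward the truth.
```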
From this principle, we have the corollary that the second guy replicating the first guy’s alleged nutty work may himself fall into error, mistaking his animosity toward his enemy for evidence of the unlikelihood of his enemy’s results. And we mustn’t forget the possibility that both guys are wrong.
Another common time for mistakes is when scientists are in the grip of Theory. Not your everyday, mundane theory; no, sir. The Theories I mean must be bright, sparkling, beautiful, and large, in the sense that they purport to explain the Human Condition And How To Fix It. Results contingent on the Theory rise like a tsunami, an unstoppable force, washing away everything before them. Even proof of the Theory’s falsehood, via replication or logic, cannot hold the waters back. We must instead stand aside and wait for the force to spend itself.
The less objective or the more complex the subject, the easier it is to see your own reflection in the data. Anything to do with people is maddeningly complex and rarely wholly objective. This is why mistakes are made in these areas more than others. Physicists trying to discover which mixtures of atoms lead to the highest-temperature superconductor are less apt to trip than sociologists on the hunt for “disparities and their causes.”
It is also so that the more encompassing the Theory, the clearer the levers of social control within it, the closer it aligns to one’s Way Of Life, the more beautiful it is, thus the more compelling it is and the easier it is to cherish.
But love is blind and lovers cannot see
The pretty follies that themselves commit;
Replication or not!
——————————————————————-
1. Even though all the experiments had small p-values!
Well stated. Selection biases alone are enough to cast doubt on many findings (more so with social sciences, less so with physical).
What are your thoughts on nutrition and health science? Between the fact that studies are enormously complex (dealing with humans) and are often epidemiological, and the fact that monetary incentive might drive this particular science more so than say, physics…do nutritionists really know what’s healthy and what’s not?
Maybe it’s just easier if you assign a p-value to how confident we can be in the USDA’s new food plate???
It’s like AGW. If you find humans adversely affect the climate you obtain fame and funding. If you find no effect, you are ignored. You have to find an effect to get on the gravy train.
Briggs…Oh My…this little essay omitted a crucial reference–and you should have known better than to omit it!!! Fortunately you’ve got a cabal of readers that can help out.
Here’s what you missed: a reference to John P. A. Ioannidis’ [now cult-classic] paper, “Why Most Published Research Findings Are False,” available at: http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124 In that paper Ioannidis doesn’t really address replication at all…thus making it very relevant to this essay.
Why?
Because that paper is linked to a challenge (and such challenges are part of the scientific process): “Is Most Published Research Really False?,” succinctly reported at http://www.sciencedaily.com/releases/2007/02/070227105745.htm, which reached the following conclusions:
“However, in this week’s PLoS Medicine, Ramal Moonesinghe (US Centers for Disease Control and Prevention) and colleagues demonstrate that the likelihood of a published research result being true increases when that finding has been repeatedly replicated in multiple studies.
“”As part of the scientific enterprise,” say the authors, “we know that replication–the performance of another study statistically confirming the same hypothesis–is the cornerstone of science and replication of findings is very important before any causal inference can be drawn.” While the importance of replication was acknowledged by Dr Ioannidis, say Dr Moonesinghe and colleagues, he did not show that the likelihood of a statistically significant research finding being true increases when that finding has been replicated in many studies.”
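The direction of that claim is easy to see with a back-of-the-envelope Bayes calculation. This is not Moonesinghe and colleagues’ actual model, only a sketch with assumed, illustrative inputs (the prior probability that the hypothesis is true, the test’s significance level and power, and independence between studies):

```python
# Sketch (not Moonesinghe et al.'s exact model): a simple Bayes update
# showing how the probability a "significant" finding is true grows with
# independent replications. pi_true, alpha and power are assumed values.

def ppv_after_k_replications(pi_true, alpha, power, k):
    """P(hypothesis true | k independent studies all 'significant')."""
    true_and_all_sig = pi_true * power**k          # real effect detected k times
    false_and_all_sig = (1 - pi_true) * alpha**k   # k false positives in a row
    return true_and_all_sig / (true_and_all_sig + false_and_all_sig)

for k in range(1, 5):
    p = ppv_after_k_replications(pi_true=0.1, alpha=0.05, power=0.8, k=k)
    print(f"{k} significant stud{'y' if k == 1 else 'ies'}: P(true) ~ {p:.3f}")
# With these made-up inputs: about 0.64 after one study, above 0.99 after three.
```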
Briggs,
I’m not convinced that the journal’s posture of not publishing a replication that challenges the original conclusions is all bad. Assuming their space is constrained, why not a letter from Alcock stating that he replicated their work, found the conclusions unsupported, and that the write-up is available on-line?
On the more general topic of screwing up, I hired architects and engineers during my career, which involved interviewing the dozens who made it through the resume screening (mine, not human resources’, who would have filtered out the very people I was most likely to want to work with). I wasn’t very good at this in the beginning and hired some problems.
Then I hit on the idea of asking: what was the dumbest thing they’d ever done since they’d been involved in construction, what happened, and what did they do about it?
No-one lacking a story responsive to this question got hired. I found that these stories made the performance and approach of the people who were hired more predictable.
I also collected some truly remarkable stories – things almost unbelievable in terms of best intentions gone awry.
Check out: http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0040028
for: “Most Published Research Findings Are False—But a Little Replication Goes a Long Way,”
While the authors (Ramal Moonesinghe*, Muin J. Khoury, A. Cecile J. W. Janssens) agree with John Ioannidis that “most research findings are false,” here they show that replication of research findings enhances the positive predictive value of research findings being true.
And, a more controversial item:
“When Should Potentially False Research Findings Be Considered Acceptable?,” the abstract for which reads as (find it at: http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0040026):
“Ioannidis estimated that most published research findings are false, but he did not indicate when, if at all, potentially false research results may be considered as acceptable to society. We combined our two previously published models to calculate the probability above which research findings may become acceptable. A new model indicates that the probability above which research results should be accepted depends on the expected payback from the research (the benefits) and the inadvertent consequences (the harms). This probability may dramatically change depending on our willingness to tolerate error in accepting false research findings. Our acceptance of research findings changes as a function of what we call “acceptable regret,” i.e., our tolerance of making a wrong decision in accepting the research hypothesis. We illustrate our findings by providing a new framework for early stopping rules in clinical research (i.e., when should we accept early findings from a clinical trial indicating the benefits as true?). Obtaining absolute “truth” in research is impossible, and so society has to decide when less-than-perfect results may become acceptable.”
I’d wager the model described above is a nice amalgamation (“mish-mash”) of very objective, quantifiable analytical elements with highly subjective, value-laden opinions. As the authors note, such is inevitable in the real world…so the question arises, ‘how good is their model/analysis?’
Certainly worthy of commentary by a suitably smart statistician having interest in such…..
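For what it’s worth, the generic decision rule the abstract describes can be sketched in a few lines. This is not the paper’s actual model, only the expected-payoff threshold it resembles, with made-up benefit and harm figures: accept a finding once the probability it is true exceeds harm / (benefit + harm).

```python
# Sketch of a generic expected-value acceptance rule (not the paper's own
# model): accept a research finding when P(true) * benefit outweighs
# P(false) * harm. Benefit and harm are in whatever units the decision-maker
# cares about; the numbers below are invented.

def acceptance_threshold(benefit, harm):
    """Smallest P(finding is true) at which accepting it has positive expected payoff."""
    return harm / (benefit + harm)

print(acceptance_threshold(benefit=10.0, harm=1.0))   # ~0.09: low bar to accept
print(acceptance_threshold(benefit=1.0, harm=10.0))   # ~0.91: demand near-certainty
```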
Not being an academic myself, I’ve thought for a long time that a journal devoted to negative findings could fill a huge niche. It could start as a catch-all, and then possibly grow to be able to specialize in various fields.
Of course, if it were simply on-line, then space issues wouldn’t really apply. It might also reduce the publish or perish pressure to investigate more and more outlandish propositions, and help sort out the rubbish already in the literature.
Also, it is not necessary for “mistakes” to be made. Lorenz’s early attempts at weather modelling uncovered non-linear dynamical systems, aka chaos theory. The key factor was that under certain conditions minute changes in initial conditions led to wildly differing results.
Replication needed to be performed extremely accurately in these studies. Sometimes it is simply not possible to get good replication even with very simple mathematical structures such as the Logistic Map – just think how many unknowns there are in all the other experiments we do!
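To make the point concrete, here is a minimal sketch of the logistic map at the chaotic parameter value r = 4, with two runs whose starting points differ by one part in ten billion; the numbers are purely illustrative.

```python
# The logistic map: x_{n+1} = r * x_n * (1 - x_n).
# At r = 4 it is chaotic: two starting points differing by 1e-10 agree for a
# few dozen steps and then diverge completely, which is why "replication"
# here demands impossibly exact initial conditions.

def logistic_trajectory(x0, r=4.0, steps=60):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.3)
b = logistic_trajectory(0.3 + 1e-10)   # a "replication" with a tiny error

for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:2d}: |difference| = {abs(a[n] - b[n]):.2e}")
# The gap grows roughly exponentially until the two runs are unrelated.
```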
I’m curious how one can uncover chaotic systems in nature. To do so would be to prove that the system is chaotic. As nature doesn’t necessarily flow from the axioms of mathematics, mathematics itself can’t provide proof of chaotic structure. The very essence of a chaotic system reduces the predictive power of any chaotic model to zero. So how does one go about proving that nature follows a chaotic model? Why would any given chaotic model be better than, say, a patternless pseudorandom number generator?
@Matt,
The Journal of Negative Results would indeed have large volumes. The fact that such a journal is not published is one of the reasons why “data mining” through examination of only published papers starts with an incredible bias that can’t be eliminated.
Chemistry and physics get by quite well without replication – they manage with repeatability and reproducibility instead, at least as far as precision is concerned, and for accuracy they can compare with man-made reference standards such as the meter, kilogram and second. The objective is to quantify the uncertainty of the measurement. Statistical analysis of an experiment should be carried out before the experiment is performed (i.e., how much data will be needed to achieve a particular uncertainty target), instead of as an afterthought once the results are in.
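As a minimal sketch of that before-the-fact arithmetic (assuming roughly normal, independent measurement errors, a rough prior guess at the noise level, and invented numbers):

```python
# Sketch of the pre-experiment calculation the comment describes: given a
# rough estimate of the measurement noise (sigma) and a target half-width of
# the 95% confidence interval (margin), how many measurements are needed?
# Assumes roughly normal, independent errors and a known sigma.
import math

def n_for_margin(sigma, margin, z=1.96):
    """Measurements needed so that z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

# Illustrative numbers: noise of 0.5 units, want the mean pinned to +/- 0.1.
print(n_for_margin(sigma=0.5, margin=0.1))   # about 97 measurements
```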
For life sciences, there are no equivalent reference standards and therefore no ‘true’ values for comparison. I would have more confidence in studies that explain results in terms of mechanism of action than in studies that report only correlation statistics. A correlation may suggest a cause-and-effect relationship – but the next step should be to search for a mechanism that explains the correlation. I don’t see much added-value in simple replication of correlation statistics. Additional data can however be used to try to validate a model.
Briggs, on a related topic, you never did follow up on the “decline effect” briefly mentioned in one of your earlier posts: https://www.wmbriggs.com/blog/?p=3342
“The less objective or the more complex the subject, the easier it is to see your own reflection in the data.”
I’ll remember this well-minted aphorism.