December 26, 2007

How many false studies in medicine are published every year?

Many, even most, studies that contain a statistical component use frequentist, also called classical, techniques. The gist of those methods is this: data is collected, a probability model for that data is proposed, a function of the observed data—a statistic—is computed, and then a thing called the p-value is calculated.

If the p-value is less than the magic number of 0.05, the results are said to be “statistically significant” and we are asked to believe that the study’s results are true.

I’ll not talk here in detail about p-values; but briefly, to calculate one, a belief about certain mathematical parameters (or indexes) of the probability model is stated. It is usually that these parameters equal 0. If the parameters truly are equal to 0, then the study is said to have no result. Roughly, the p-value is the probability of seeing another statistic (in infinite repetitions of the experiment) larger than the statistic the researcher got in this study, assuming that the parameters in fact equal 0.
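To make that definition concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (normal data, two groups of 50, the absolute difference in means as the statistic), and the “infinite repetitions” are approximated by a large but finite number of simulated experiments in which the parameters truly equal 0.

    # A minimal sketch of the p-value definition above. The data, group
    # sizes, and statistic are all invented for illustration.
    import numpy as np

    rng = np.random.default_rng(1)

    # Pretend these are the observed data from one experiment.
    group_a = rng.normal(loc=0.3, scale=1.0, size=50)
    group_b = rng.normal(loc=0.0, scale=1.0, size=50)
    observed = abs(group_a.mean() - group_b.mean())

    # Approximate the "infinite repetitions" with many simulated
    # experiments in which the parameter (the true difference) is 0.
    reps = 100_000
    sims = np.abs(rng.normal(0.0, 1.0, (reps, 50)).mean(axis=1)
                  - rng.normal(0.0, 1.0, (reps, 50)).mean(axis=1))

    # The p-value: how often a statistic at least as large as the one
    # we saw turns up when the parameter really does equal 0.
    p_value = (sims >= observed).mean()
    print(f"simulated p-value: {p_value:.4f}")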

For example, suppose we are testing the difference between a drug and a placebo. If there truly is no difference in effect between the two, i.e. the parameters are actually equal to 0, then 1 out of 20 times we did this experiment, we would expect to see a p-value less than 0.05, and so falsely conclude that there is a statistically significant difference between the drug and placebo. We would be making a mistake, and the published study would be false.
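That 1-in-20 figure is easy to check by simulation. The following is only a sketch under assumptions of my own choosing (normal data, two groups of 50, an ordinary two-sample t-test), not any particular study design:

    # Rough check of the 1-in-20 claim: no true effect, so any
    # "significant" result is a false positive.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(42)

    trials = 10_000
    false_positives = 0
    for _ in range(trials):
        drug = rng.normal(0.0, 1.0, size=50)     # same distribution:
        placebo = rng.normal(0.0, 1.0, size=50)  # no real drug effect
        _, p = ttest_ind(drug, placebo)
        if p < 0.05:
            false_positives += 1

    # Expect a rate near 0.05, i.e. about 1 in 20.
    print(f"false positive rate: {false_positives / trials:.3f}")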

Is 1 out of 20 a lot?

Suppose that about 10,000 issues of medical journals are published in the world each year; this is right to within an order of magnitude. The number may seem surprisingly large, but there are an enormous number of specialty journals, in many languages, with hundreds coming out monthly or quarterly, so a total of 10,000 over the course of a year is not too far wrong.

Estimate that each issue reports on about 10 studies. That’s about right, too: some journals report dozens, others only one or two; the average is around 10.

So that’s 10,000 x 10 = 100,000 studies that come out each year, in medicine alone.

If all of these use the p-value method to decide significance, then about 1 out of 20 studies will be falsely reported as true: about 5,000 studies each year reported as true that are actually false. And these will be in the best journals, done by the best people, at the best universities.

It’s actually worse than this. Most published studies do not report just one result (found by p-value methods). Typically, if the main effect the researchers were hoping to find is insignificant, a search for other interesting effects in the data commences. Other studies look for more than one effect by design. Plus, for all papers, there are usually many subsidiary questions asked of the data. It is no exaggeration, then, to estimate that 10 (or even more) questions are asked of each study.

Let’s imagine that a paper will report a “success” if just one of the 10 questions gives a p-value less than the magic number. Suppose for fun that every question in every study in every paper is false. We can then calculate the chance that a given paper falsely reports success: it is 1 - 0.95^10, which is just over 40%.

This means that about 40,000 out of the 100,000 studies each year would falsely claim success!
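The arithmetic behind these counts is simple enough to verify in a few lines (the journal and study counts are, as above, only order-of-magnitude estimates):

    # Back-of-the-envelope check of the counts above.
    studies = 10_000 * 10            # issues per year x studies per issue

    # One question per study, every null true: 5% false "successes".
    print(int(studies * 0.05))       # 5000

    # Ten questions per study; a paper "succeeds" if any one question
    # gives p < 0.05.
    p_any = 1 - 0.95 ** 10
    print(f"{p_any:.4f}")            # 0.4013, just over 40%
    print(round(studies * p_any))    # about 40,000 false successes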

That’s too high a rate for actual papers—after all, many research questions are asked which have a high prior probability of being true—but the 5000 out of 100,000 is also too low, because the temptation to go fishing in the data is too great. It is far too easy to make these kinds of mistakes using classical statistics.

The lesson, however, is clear: read all reports, especially in medicine, with a skeptical eye.

December 24, 2007

Two differences in perception between global cooling and global warming

As is well known by now, a passel of climatologists in the 1970s, including such personalities as Stephen “It’s OK to Exaggerate To Get People To Believe” Schneider, tried to get the world excited about the possibility, and the dire consequences, of global cooling.

From the 1940s to near the end of the 1970s, the global mean temperature did indeed trend downwards. Using this data as a start, and from the argument that any change in climate is bad, and anything that is bad must be somebody’s fault, Schneider and others began to warn that an ice age was imminent, and that it was mainly our fault.

This global cooling was said to be due to two main things: orbital forcing and an increase in particulate matter—aerosols—in the atmosphere. The orbital forcing—a fancy term meaning changes in the earth’s distance and orientation to the sun, and the consequent alterations in the amount of solar energy we get—was, as I hope is plain, nobody’s fault, and because of that, it excited very little interest.

But the second cause had some meat behind it; because, do you see, aerosols can be made by people. Drive your car, refine some oil, smelt some iron, even breathe, and you are adding aerosols to the atmosphere. Some of these particles, if they diffuse to the right part of the atmosphere, will reflect direct sunshine back into space, depriving us of its beneficial warming effects. Other aerosols will gather water around them and form clouds, which both reflect direct radiation and capture outgoing radiation—clouds both cool and warm, and the overall effect was largely unknown. Aerosols don’t hang around in the air forever: since they are heavy, over time they fall or wash out. It’s also hard to do much to reduce the man-made aerosol burden of the atmosphere, beyond the obvious and easy things, like installing cleaner smokestacks.

Pause during the 1980s when nothing much happened to the climate.
