The title of today’s post is taken from the article of the same name in Nature by William Sutherland, David Spiegelhalter, and Mark Burgman. Several readers asked me to comment.
I’ll assume you’ve read the original. I keep the same order and wording as their points, and try not to repeat any of their good points.
Differences and chance cause variation. Chance can’t and doesn’t cause anything. Chance isn’t a thing, therefore it can never be a cause. Differences don’t cause things per se: things do (sizes of differences can certainly change rates of change). We cannot always identify causal agents, just correlates of change.
No measurement is exact. Well, not quite, but I take their point. Measurement error is vastly more prevalent than acknowledged and almost never accounted for. Leading to…can you guess? Over-certainty.
Bias is rife. Amen and amen. But, just like admonishing the public by reminding them they look ugly in jeans, they always think it’s the other guy and not them. Yes, you, even you, are biased. Even you. And you. Even if you’re part of a team that won prizes.
Bigger is usually better for sample size. Indeed, except for cost and the possibility of being overwhelmed or misled by errors, bigger is always better.
Correlation does not imply causation. But the opposite is true: causation causes correlation. People often forget the distinction between ontology and epistemology. This is also my fault for not making this distinction clearer more often. Most probability models are epistemological, meaning they say how the probability of some outcome changes given changes in the input variables. The problem comes when people interpret the changing probabilities of the outcome as being caused by the input variables, which is usually not true.
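A minimal sketch of “causation causes correlation,” with all numbers invented for illustration: a common cause z drives both x and y, which never touch each other, yet they correlate handsomely.

```python
import random

random.seed(1)

# A common cause z drives both x and y; x and y never touch each other,
# yet they correlate -- the causation (of z on both) produces the correlation.
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]   # z causes x
y = [zi + random.gauss(0, 1) for zi in z]   # z causes y (x does not)

def corr(a, b):
    """Pearson correlation, computed by hand to keep this self-contained."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

r = corr(x, y)
print(f"correlation of x and y: {r:.2f}")  # near 0.5, though x never causes y
```

The theoretical correlation here is 0.5; a model of y on x would “fit” nicely, and the epistemological statement (knowing x changes the probability of y) would be correct, while the ontological one (x causes y) would be flat wrong.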
Regression to the mean can mislead. See this on the so-called Sports Illustrated curse.
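The curse in miniature, with made-up batting-average-like numbers: give every player a fixed skill, let each season’s performance be skill plus luck, then put the best season-one performers on the magazine cover.

```python
import random

random.seed(2)

# Each player has a fixed skill; each season's performance = skill + luck.
# The "cover" goes to the best season-1 performers, who then look worse in
# season 2 -- no curse, just regression to the mean. Numbers are invented.
n = 5_000
skill = [random.gauss(0.260, 0.020) for _ in range(n)]
season1 = [s + random.gauss(0, 0.030) for s in skill]
season2 = [s + random.gauss(0, 0.030) for s in skill]

cutoff = sorted(season1)[int(0.95 * n)]          # top 5% of season 1
stars = [i for i in range(n) if season1[i] >= cutoff]

avg1 = sum(season1[i] for i in stars) / len(stars)
avg2 = sum(season2[i] for i in stars) / len(stars)
print(f"stars' season 1 average: {avg1:.3f}")
print(f"stars' season 2 average: {avg2:.3f}")    # lower: the "curse"
```

The stars were selected partly for good luck; next season the luck resets to average, and the drop looks like a curse.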
Extrapolating beyond the data is risky. The reason is that probability models are usually not causative, and even when they are, few check them for accuracy (everybody checks them for fit, via p-values, posteriors, and the like).
Beware the base-rate fallacy. Think of it this way. If you’re forecasting “No rain” for Tucson each day, you’re likely to be right most of the time. But your boast carries little importance. Try the same forecast for Norman, Oklahoma, and your accuracy heads south. This is why we should speak of skill—the improvement over naive predictions—instead of accuracy rates. Right, climatologists?
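The Tucson-versus-Norman point can be sketched in a few lines of Python. The rain frequencies below (dry 90% of days in Tucson, 70% in Norman) are assumed numbers for illustration only.

```python
# Accuracy is not skill: a constant "no rain" forecast can be highly accurate
# yet improve on nothing. Rain frequencies are assumed for illustration.

def accuracy_and_skill(p_rain):
    accuracy = 1 - p_rain                 # forecasting "no rain" every day
    baseline = max(p_rain, 1 - p_rain)    # climatology: always call the usual
    # Skill: fractional improvement over the naive baseline; 0 means none.
    skill = (accuracy - baseline) / (1 - baseline)
    return accuracy, skill

for city, p_rain in [("Tucson", 0.10), ("Norman, OK", 0.30)]:
    acc, sk = accuracy_and_skill(p_rain)
    print(f"{city}: accuracy {acc:.0%}, skill {sk:.0%}")
```

Tucson comes out at 90% accuracy, Norman at 70%, and the skill in both cases is exactly zero: the constant forecast never beats the climatological baseline it copies.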
Controls are important. The only thing wrong with this point is that “important” should read “everything.” The more you can control, the closer your model comes to causality. The problem is that controlling “everything” in human interactions, or in anything contingent, is impossible. It will always—as in always—be possible that something other than what we thought caused the outcome.
Randomization avoids bias. No. Randomization is not a property. It “gives” nothing to your results. “Randomization” belongs to the old days of magical thinking. Rather, assigning control of an experiment to persons without a financial or emotional interest reduces but cannot avoid bias. That residual bias exists is why there are always calls for replication.
Seek replication, not pseudoreplication. And speaking of replication… Listen up, sociologists, psychologists, and so on: it is not a replication unless the experiment is repeated in exactly the same way, where the only differences are those things you could not control in the first experiment. “More or less” the same way is not exactly the same way and is therefore not a replication. A mass of published literature on the same subject is only a weak indicator of truth. Who remembers frontal lobotomies, etc., etc., etc.?
Scientists are human. And because they are typically in positions commanding money and people, they fall prey more often to the standard sins.
Significance is significant. No, it is not, or at least not necessarily. “Significance” means attaining a wee p-value, one less than the magic number. And this result may not have and usually does not have practical bearing on questions of interest about the thing at hand. Finding a wee p-value is child’s play. Finding something useful to say is far harder.
Separate no effect from non-significance. Here I must quote: “The lack of a statistically significant result (say a P-value > 0.05) does not mean that there was no underlying effect: it means that no effect was detected.” This is only partially true. Lack of a wee p-value might mean the effect was there but undetected. On the other hand, the effect might be there and detectable, too. It’s just that p-values are terrible at discovering which situation we’re in. An effect without a wee p-value may still be important. If instead we looked at probability models as they should be looked at, as predictive statements, we could say more.
Effect size matters. Wee p-values alone mean nothing. Repeat that until you get sick of repeating it. This is another call for predictive analytics.
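A sketch of why a wee p-value alone means nothing, with invented numbers: a difference of 0.03 standard deviations between two groups—practically nothing—earns a tiny p-value once the sample is big enough.

```python
import math
import random
import statistics

random.seed(4)

# Two groups differing by a practically negligible 0.03 standard deviations,
# but with 100,000 observations each. All data simulated for illustration.
n = 100_000
a = [random.gauss(0.00, 1) for _ in range(n)]
b = [random.gauss(0.03, 1) for _ in range(n)]

diff = statistics.fmean(b) - statistics.fmean(a)
se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
z = diff / se
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value

print(f"effect size: {diff:.4f}  p-value: {p:.2g}")  # trivial effect, wee p
```

The p-value is wee; the effect is trivial. Flip it around—small n, real effect—and the p-value balloons while the effect stays important. Neither situation is diagnosed by the p-value itself.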
Study relevance limits generalizations. It’s funny how many reporters never read the papers they report on.
Feelings influence risk perception. And this is because feelings are part of what we risk! Money, after all, is only a crude device to measure our feelings. And just because you hate fat people eating transfats does not mean the risk of disease from eating transfats is high. And just because you hate smoking does not mean that “second-hand” smoke is perilously dangerous, etc., etc.
Dependencies change the risks. Try not to look at anything in isolation, unless the thing is amenable to isolation. Dice come to mind. The changes that await us when global warming finally strikes (soon, soon) do not.
Data can be dredged or cherry picked. “Big data” anyone? One thing Big Data guarantees is shocked looks on the faces of managers who were certain sure they picked up a “significant” signal in their gleaming, massive datasets.
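The guarantee in miniature: dredge a couple hundred columns of pure noise against a noise outcome and “significant” signals appear on schedule. All data here are simulated; the p-value comes from Fisher’s z transform of the correlation.

```python
import math
import random

random.seed(5)

# Test 200 columns of pure noise against a pure-noise outcome at the 0.05
# level; about 5% will come up "significant" by luck alone.
n, cols = 100, 200
y = [random.gauss(0, 1) for _ in range(n)]

def corr_p(x, y):
    """Two-sided p-value for a Pearson correlation via Fisher's z."""
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    r = sxy / (sx * sy)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

hits = sum(
    corr_p([random.gauss(0, 1) for _ in range(n)], y) < 0.05
    for _ in range(cols)
)
print(f"{hits} of {cols} noise columns 'significant' at 0.05")
```

Around ten of the two hundred noise columns pass the test. A manager who only hears about the ten is certain sure the signal is real.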
Extreme measurements may mislead. Like I always say, any survey or result is true conditional on the set of premises belonging to the experiment. Vary any of these premises and the result no longer holds. The more premises, i.e. conditions, there are, the greater the chance the results are not meaningful beyond the realm of this single experiment. Journalists often change these premises in their reporting; but, to be fair, so do many scientists when summarizing their work. Memorize this.