*A version of this post originally appeared on 20 October 2012. But after a Twitter conversation with our friend @Neuro_Skeptic, it’s time for an addition.*

Comes a *Salon* article entitled “The Internet Blowhard’s Favorite Phrase: Why do people love to say that correlation does not imply causation?”.

That article uses a lot of words, more than I in my shaken state can assimilate. It also has pictures of Karl (but unfortunately not Egon) Pearson and some blurry lines whose meaning escaped my bleary eyes.

Anyway, here’s the short truth of it: if there is causation, there is correlation. If there is no causation, there might be aberrant correlation. If there is correlation, there might be causation. If there is no correlation, there is no causation.

Thus correlation implies, but does not prove, causation. It implies in the colloquial definition of that word; it suggests. Its presence does not imply causation in the logical sense, though. But since most people are unaware of the distinction in meanings, and most take *to imply* as a synonym of *to suggest*, it’s not improper to say correlation implies causation.

Suppose you see me shoot my pistol at a plate glass window and observe that it breaks. There is correlation: the two events are coincident. There is causation. The correlation implies the causation.

Later you see a second person doing the shooting. But you learn after the fact that he is a magician practicing the bullet-catching trick. The window indeed breaks after you hear the bang but it breaks because of a hidden device he activates on his person. The bullet is not causing the window to break, but to your mind there is the correlation, the coincidence between the bang and breaking. So there is correlation and no causation.

A *correlation* of X and Y is this: knowledge of X changes our judgment of the possibilities of Y, and vicey verso. If X and Y are not correlated, then knowledge of X does not change our judgment of the possibilities of Y, and the other way round, too.
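For the numerically inclined, here is a minimal sketch of that definition. The tallies are invented for illustration (X is “shot fired”, Y is “window broke”), and `prob_y` is a hypothetical helper, not anything from a library. It compares the relative frequency of Y overall with the relative frequency of Y once X is known.

```python
from fractions import Fraction

def prob_y(pairs, given_x=None):
    """Relative frequency of Y, optionally restricted to pairs where X == given_x."""
    ys = [y for x, y in pairs if given_x is None or x == given_x]
    return Fraction(sum(ys), len(ys))

# Hypothetical (X, Y) tallies, made up for this example.
correlated = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 1)] * 1 + [(0, 0)] * 9
uncorrelated = [(1, 1)] * 5 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 5

for name, data in [("correlated", correlated), ("uncorrelated", uncorrelated)]:
    # Judgment of Y with no knowledge of X, then knowing X = 1, then knowing X = 0.
    print(name, prob_y(data), prob_y(data, given_x=1), prob_y(data, given_x=0))
```

In the first tally, learning X moves our judgment of Y from 9/20 to 4/5 or to 1/10. In the second it stays at 1/2 whatever we learn about X.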

(Here’s the update.) It was suggested to me that there might not be correlation in the presence of causation. The example given was that X causes Y but Z counteracts X. But this isn’t right: in that instance X did not cause Y, because Z counteracted it. X can cause Y, but not all the time, because Z (or whatever) sometimes counteracts X. In this case the observed correlation between X and Y might not be recognized. This is true; but in those times that X did cause Y, there is correlation. We mustn’t confuse our statistical tests’ ability to recognize correlation with the presence or absence of correlation.
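A toy tally may help here; it is only a sketch under assumptions of my own choosing. The records are invented, each a triple (X, Z, Y), with Z blocking X on four occasions out of ten, and `p_y_given_x` is a hypothetical helper. Pooling everything dilutes the association between X and Y; looking only at the occasions where Z stayed out of the way shows it at full strength.

```python
from fractions import Fraction

def p_y_given_x(records, x_value):
    """Relative frequency of Y among records with X == x_value."""
    ys = [y for x, z, y in records if x == x_value]
    return Fraction(sum(ys), len(ys))

# Hypothetical (X, Z, Y) records, made up for this example.
records = (
    [(1, 0, 1)] * 6    # X present, Z absent: X causes Y
    + [(1, 1, 0)] * 4  # X present, Z present: Z counteracts X, no Y
    + [(0, 0, 0)] * 6  # X absent: no Y
    + [(0, 1, 0)] * 4
)
unblocked = [r for r in records if r[1] == 0]

# How much knowing X shifts our judgment of Y, pooled and with Z absent.
pooled_gap = p_y_given_x(records, 1) - p_y_given_x(records, 0)
unblocked_gap = p_y_given_x(unblocked, 1) - p_y_given_x(unblocked, 0)
print(pooled_gap, unblocked_gap)
```

The pooled gap is 3/5 and the unblocked gap is 1. A noisier tally, or a test with little power, could easily fail to flag the former.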

I threw a bunch of navy beans (X) onto the floor, where they remarkably self-ordered into the vague visage of our Dear Leader (Y). That’s aberrant correlation for you, for, as is well known, navy beans normally take the shape of farm animals.

Now along comes an academic anxious for a paper. He sees my X and Y, puts the beans into a statistical model, and out pops a wee p-value, which is nothing more than the same evidence of our eyes unfortunately quantified. I say unfortunate, because the unnecessary quantification gives more evidentiary weight than is due. The X/Y observed correlation has been given a number, and numbers are easier to believe and to reify than anecdote.

And there is where the story usually stops, until the *peer-reviewed* paper^{1} is published, where it catches the eye of one of my dear readers, who dutifully forwards it on to me, wherein I in vain point out once again that academics should be penalized for publishing more than one paper per decade.

It is surely true that, for the observed experiment, X and Y are correlated, and thus causation is implied or suspected. Which is why we have to wait for replication. The statistical model that claimed X and Y are mates can be, and should have been, turned into a predictive model (all statistical models can, and usually should, be so turned). And that predictive model should have been compared against new data, data never before seen or used in any way to form the original model. Only then can we know if the observed correlation is aberrant or lasting and real.
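What that check might look like, as a rough sketch and nothing more: fit a straight line to the old data, predict the new data, and ask how much better the model does than always guessing the old average of Y. The simulated worlds, the sample sizes, the seed, and the names `fit_line`, `skill`, and `sample` are all assumptions made for illustration; a real analysis would use whatever predictive model the paper implied.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(preds, ys):
    """Mean squared error of predictions against observations."""
    return mean((p - y) ** 2 for p, y in zip(preds, ys))

def skill(xs_old, ys_old, xs_new, ys_new):
    """Share of new-data squared error removed by the fitted line, relative to
    always guessing the old mean of Y. Near zero or below: no predictive value."""
    slope, intercept = fit_line(xs_old, ys_old)
    model_err = mse([slope * x + intercept for x in xs_new], ys_new)
    naive_err = mse([mean(ys_old)] * len(ys_new), ys_new)
    return 1 - model_err / naive_err

def sample(rng, n, slope):
    """Simulated data: Y = slope * X + noise. A slope of 0 means X tells us nothing about Y."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(2012)  # arbitrary seed; results vary with it
for label, slope in [("aberrant", 0.0), ("lasting", 2.0)]:
    old = sample(rng, 20, slope)    # the data the paper was written on
    new = sample(rng, 1000, slope)  # data never seen when forming the model
    print(label, round(skill(*old, *new), 2))
```

Whatever pattern chance put into twenty old observations in the aberrant world should buy little or nothing on the new data, while the lasting relationship should keep predicting. The exact figures depend on the seed.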

Problem is, the academic who published the paper can’t be bothered to wait for new data: he is already out in the field on the hunt for new correlations, the curiouser or the more at odds with common sense the better. Find them he will, and so too will his brother and sister academics, who will flood the journals with their research.

This phenomenon will be noted by civilians who, since causation is claimed by some academic every time a spurious correlation is found, will form the opinion, wrong as we now know, that correlation does not imply causation.

The whole thing is rather depressing.

*Update* Clearly there is more to be said about causation and correlation. We barely scratch the surface here.

——————————————————————————————

^{1}“The tendency of startled legumes to spontaneously form images of sainted politicians,” *Journal of Psycho-Social Academic Research*, vol. 72, pp 513-589.