Correlation And Coincidence

A hastily written entry today, folks. Busy day. Typos at no extra charge.

Reader Al Perrella invites us to Jerry Pournelle’s Chaos Manor, where a note from our friend Mike Flynn appears (get his books here) on correlation versus causation. A subject not unfamiliar to us (see this, this, and this among others).

Mike has an excellent example of correlation:

Columbia river salmon runs go up and down in roughly eleven year cycles. So do sunspots on the sun. Do sunspots cause salmon? Do salmon cause sunspots? Is there a lurking Z that makes salmon eager to spawn AND causes the sun to boil?

Any statistical analysis, Bayesian or frequentist, would call this data “significant”. A plot would show a coincidence of lines, one for the salmon, one for sunspots. Maybe somebody would figure that correlation was maximum when the peak of sunspots occurred two years before salmon numbers did, or whatever. P-values would be proffered. Theories would be created. Papers published.

But is it true that the quantity of salmon flesh influences sunspot number?

Well, the wee-ness of the p-value says nothing about the direction of the causation. Besides, if the data were ordered such that we thought sunspots caused salmon flesh, the statistical conclusions would be much the same, or even identical (depending on which classical test one used).
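A minimal Python sketch of that symmetry, using made-up numbers (not real salmon counts or sunspot data) and a hand-rolled Pearson correlation. The names `pearson_r`, `sunspots`, and `salmon` are illustrative only. Swapping which series we call X and which we call Y leaves the coefficient unchanged, so a p-value computed from it cannot point a direction.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up numbers, for illustration only (NOT real observations).
sunspots = [10, 40, 90, 120, 80, 30, 12, 45, 95, 110]
salmon = [200, 260, 330, 400, 310, 240, 210, 270, 350, 380]

r_xy = pearson_r(sunspots, salmon)  # "sunspots explain salmon"
r_yx = pearson_r(salmon, sunspots)  # "salmon explain sunspots"
print(f"r(sunspots, salmon) = {r_xy:.4f}")
print(f"r(salmon, sunspots) = {r_yx:.4f}")
```

The two printed values agree because the formula treats its two arguments identically; nothing in the arithmetic knows which series came first.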

Point is, the ordering of Xs and Ys, as it were, must come from evidence outside the data. There is nothing in salmon numbers nor sunspots to tell us which is the cause and which the effect. But that’s okay, because in any statistical analysis there is no internal evidence which says, “Consider only this list of causes.” This is an inescapable limitation of probability.

Same thing in logic. I’ve used this example many times, but it’s still apt. If we’re interested in the proposition (or conclusion), “George wears a hat,” then we must have premises (evidence) which supply information about it. It is up to us and not logic to provide these premises (evidence).

If our (compound) premise is “All Martians wear a hat & George is a Martian” then we know, conditional on this evidence, that the probability George wears a hat is 1. George’s being a Martian causes him to wear a hat. There is a correlation between Martians and hat wearing. It is not a coincidence George wears a hat.

But if I, or you, change the premise to “Some Martians wear a hat & George is a Martian” then it becomes uncertain—unquantifiably so; see this—whether George wears a hat. The point being it is we who supply the evidence.

In regression, with some Y and list of Xs, it is we who say “Stop!” when delineating all the possible correlates of Y. It is we who compile the Xs. Did we get just the right ones? Well, sometimes, rarely, we know: say, in physics. But usually we do not know—where I use the word “know” to indicate certainty, a truth, and not a suspicion.

The reason we sometimes know in physics is because we have external evidence, a theory, which we accept as true (and which might not be, of course). If the theory is true—given we accept it as true, as we accepted either Martian premise—then we know just which Xs to collect.

Problem with data like salmon and sunspots is we have no theory we’re willing to accept as true which provides us with the precise list of Xs and Ys we can use to infer causation. We’re probably willing to accept that salmon flesh does not cause sunspots, but understand that has not been proved true; it has merely been assumed true. That is the point. All premises are assumed true (there is a deeper sense in which some premises we say are “just true”, but that’s rare and a subject for another day; when I discuss miracles).

So we instead accept it might be true that sunspots cause salmon flesh. Or, as Mike suggested, we instead assume sunspots set some causal train in motion, the end result being a change in salmon flesh. Either set of premises we accept as true as we present our analysis. The data, however, say nothing about which, if either, set of premises is true. Think about it: if the premises were known to be true from the data, then we’d have a circular argument. All we can infer from “George wears a hat” is the premise “George wears a hat”, which is true if we accept George wears a hat. Circularity.

The point Mike made, and which I hope I echoed properly, is fundamental. It is the difference between considering the salmon-sunspot signal a coincidence or part of a causal chain. It is a coincidence only if we assume there is no causal connection. We have not proved the lack of one by calling events coincidental.

Update Flynn sends this Dilbert and this relevant paper (pdf).

Comments

Correlation And Coincidence — 18 Comments

  1. This is a good example of a bad example: Specifically, why nobody should indulge themselves in analyzing things for which they lack any comprehension of the applicable physical processes involved in generating the things being measured. Such ignorance invariably leads to stupid outcomes.

    It is a plain fact of human nature to see correlations and then presume some cause-effect relationship. It’s this fundamental human trait that leads to analytical oversimplifications–including the belief in lucky charms & so forth (e.g. last time I wore this shirt to the game my team won…so if I wear it again that will help them).

    The problem is, this same human feature leads people who ought to know better–and often those who pride themselves on knowing better–to make essentially the same mistake of analytical oversimplification. Consider:

    “The point … is fundamental. It is the difference between considering the salmon-sunspot signal a coincidence or part of a causal chain.”

    There is also a third (at least) possibility: that such a correlation is the result of salmon behavior (or any observed measure) being a proxy measure for something else that moves in tandem, etc. Which is to say that situational conditions, of which we are ignorant at the moment and which may not (usually do not) persist for various reasons, are creating a transient cause-effect relationship that is fundamentally insubstantial.

    Which is to say from some perspectives it is a valid cause-effect relationship insofar as it has been measured but it is also a coincidence as those cause-effect relationships are really subordinate to other causal factors, which are really transient.

    This sort of thing is actually observed very often. In many systems run by competent managers who actually understand the physics & underlying mechanisms (e.g. raised through the ranks & having real-world experience at all levels involved), they quickly adjust & no issues appear to arise. In comparable situations run by brilliant freshly-minted MBAs (these are the management equivalent of philosophers elsewhere–often working in various political positions), one commonly encounters the well-paid consultant exploiting the situation–not only to fix it, but to preserve the image of the MBA hiring him/her to help fix it.

  2. A physical theory is not evidence as such. The observations that caused us to not reject the theory are evidence; the theory is the way we think the causality works.

  3. @Ken: I’m not sure if you’re bringing up what I’d number as a fifth possibility or not. Let’s use A to stand for “an increase in the number of Salmon swimming upstream” and B to stand for “an increase in the number of sunspots”. I’d say the first four possibilities that come to my mind are: 1) A causes B, 2) B causes A, 3) X causes A and B, 4) A and B are coincidental. Where “causes” can include indirectly via C, D, E, etc.

    In that case, I think your proxy idea is actually case #3, but somehow it sounds like something different. I’d like to see more.

  4. Wayne: Your four possibilities are the first four I gave in the linked item at Chaos Manor. The others are riffs on #3. I cited the salmon/sunspots business as an example not necessarily of coincidence but only as an example of the necessary correlation of two data series that happen to have the same pattern during the same time frame. But it might be an example of #3: some lurking variable that causes sunspots may lead by sundry and subtle means to an increase in the salmon run. For example, by messing with the earth’s magnetic field in some way which might affect the salmon’s homing sense.

    But note, as Matt pointed out, that this must come from information outside the sunspot/salmon data. Solar wind, or Ap field strength, or some other sort of thing.

    See here for some commentary on possibility 3:
    http://www.claremontmckenna.edu/pages/faculty/MONeill/Math152/Handouts/Joiner.pdf

  5. I cited the salmon/sunspots business as an example not necessarily of coincidence but only as an example of the necessary correlation of two data series that happen to have the same pattern during the same time frame.

    Mr. Flynn,

    It’s obvious that one can’t declare a meaningful correlation between salmon runs and sunspots simply based on the one claim that they both have an approximate 11-year cycle. So do the two data series actually have the same pattern during the same time frame? To what extent are they correlated? If there is a strong correlation, the question of whether there is a causal relationship might be worth further investigation. That is, it would then make sense to ask whether there is a causal relationship.

  6. @JH
    As I recollect, the two cycles went up and down in synch at the time I read of the example, which is why they are necessarily correlated. Two coincident cycles will always correlate. (Ditto, two coincident trends, like imported automobiles and women in the workforce, at least up to about 1990.) Given that the previous Schwabe cycle and the current one are atypical, with a long “fallow” period between them, it might be useful if someone revisited the matter.

    But only if you think there might be something to it. The Kultur determines which hypotheses are worth testing.
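The claim that two coincident cycles will always correlate can be checked with a small Python sketch. Everything here is an assumption made for illustration: idealized sine waves with an 11-year period, invented amplitudes and offsets, and the hypothetical names `series_a`, `series_b`, `series_c`. Two series that rise and fall together correlate almost perfectly whatever their units; shift one by a quarter cycle and the correlation over whole cycles vanishes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

years = range(44)  # four idealized 11-year cycles (hypothetical)

# Same period, same phase, different scale and offset.
series_a = [50 + 40 * math.sin(2 * math.pi * t / 11) for t in years]
series_b = [300 + 5 * math.sin(2 * math.pi * t / 11) for t in years]

# Same period, but shifted by a quarter of a cycle.
series_c = [300 + 5 * math.cos(2 * math.pi * t / 11) for t in years]

r_in_phase = pearson_r(series_a, series_b)
r_quarter_shift = pearson_r(series_a, series_c)
print(f"in phase:            r = {r_in_phase:.3f}")
print(f"quarter-cycle shift: r = {r_quarter_shift:.3f}")
```

Sharing a period is not enough on its own; the cycles also have to line up in time, which is the "coincident" part.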

  7. So do the two data series actually have the same pattern during the same time frame? To what extent are they correlated?

    Dear JH,

    Very perceptive! The answer: there is no discernible correlation between sunspots and Columbia River salmon. Check for yourself:

    http://www.fpc.org/adultsalmon/adultqueries/Adult_Annual_Totals_Query_form.html

    I’ll send Briggs a graph of the data, and maybe he can post it.

    All of which makes the example kind of crappy, although the lesson in logic is still valid.

    Uncle Mike

    PS – If you want to read some (poorly written) rants on the crapastic nature of politically correct (scientifically wrong) salmon alarmism, please visit http://westinstenv.org/nftsf/

  8. Salmon populations as a “proxy measure” led to our first understanding of the Pacific Decadal Oscillation. Certain salmon populations are extremely sensitive to temperature being within a tight band at the time of transition to salt water. A sun spot role in water temperature, river flow, primary productivity “may” be evidence of a “causal train”?

  9. Yes, it takes a minimum of three variables to determine cause. An experiment is the easiest way to get a third variable.

    When stuck with observational data, however, I see nothing technically wrong with using something like a Salmon/Sunspot correlation to proxy one or the other. If it’s a false correlation it will eventually become evident.

    The problem as I see it is using the correlation to 1) establish a fact (like salmon cause sunspots) and act accordingly or 2) use it as a means for controlling one or the other or people. Examples of (2) abound in the health fields: salt causes high blood pressure, cholesterol causes heart attacks; smoking causes lung cancer**. Outside of health: CO2 is a primary cause of Global Warming. Etc., etc.

    The impact of assuming a causal relationship when one doesn’t exist should be a guide but too often it is not. Worse, the consequences of not doing so get exaggerated by those with agendas*** either personal or monetary. When it becomes political, it also becomes harder to overcome misconceptions when the current causal relationship guess gets modified. Shortly after I turned 40, my optometrist told me I was going to need bifocals BEFORE taking any measurements.

    ** There is indeed a 20x increase in the odds if one smokes but what is often glossed over is that the chances of NOT getting it are roughly the same for smoking or non-smoking because the probability of getting it is rather small. Buying 20 lottery tickets will improve one’s chances of winning but the chances of NOT winning hardly change at all.

    *** Interesting word “agenda”. It’s a Latin plural used as a singular (like scissors, another plural as singular). According to the OED its plural is “agendas” yet there is no English plural of “data” which has the same Latin construct as “agenda”. English is such an easy language.
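The arithmetic behind the ** footnote can be worked through in a few lines of Python. The baseline probabilities below are hypothetical, chosen only to show the calculation; they are not real incidence figures or real lottery odds, and `scale_odds` is an illustrative helper. It multiplies the odds p/(1-p) by a factor and converts back to a probability, so the complement (the chance of NOT getting the outcome) can be compared before and after.

```python
def scale_odds(p, factor):
    """Return the probability whose odds are `factor` times the odds of p."""
    odds = p / (1 - p)
    new_odds = factor * odds
    return new_odds / (1 + new_odds)

# Hypothetical baseline probabilities (NOT real incidence data).
for baseline in (0.0001, 0.005):
    raised = scale_odds(baseline, 20)
    print(f"baseline p = {baseline}: "
          f"P(not) goes from {1 - baseline:.4f} to {1 - raised:.4f}")

# Lottery version: 20 distinct tickets in one draw, assuming a
# hypothetical 1-in-a-million chance per ticket.
one_ticket = 1 / 1_000_000
twenty_tickets = 20 * one_ticket
print(f"P(not winning): {1 - one_ticket:.6f} vs {1 - twenty_tickets:.6f}")
```

How far the complement moves depends on the baseline: the smaller the starting probability, the less a 20x multiplier changes the chance of nothing happening.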

  10. Dear Patrick and DAV,

    But but but but salmon abundance does NOT correlate with sunspots. Correlation is not causation, but in this case there is NO correlation — and therefore no causation either, interestingly enough.

    Evidently “salmon correlate with sunspots” must be some sort of urban myth dilettante scientism. It smacks of Popular Science — you know, stuff that sounds like science but isn’t really.

    However, people being people, there is a tendency to accept baseless claims that sound like science because to do so satisfies our excitable imaginations. And witch burnings are great social gatherings where you can network.

    This wonderful website is special because it celebrates logic and rationality. Let’s keep it that way.

  11. But but but but salmon abundance does NOT correlate with sunspots.

    OK, fine. It’s a side issue not intrinsic to the topic of the post. Pretend it does or substitute another example.

  12. Tyler Cowen just posted about a paper looking at language causing prudence.

    I suppose the best you could say is that apparently the paper wasn’t a data dredge, given that he picked something with a plausible connection and ignored the randomness.

  13. One of the markers of a non-causal correlation is that as new data accumulate the correlation breaks down. The past two sunspot cycles were atypical and whatever spurious connection obtained fifty years ago has likely not survived.
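A toy Python illustration of a correlation breaking down as data accumulate, under stated assumptions: two idealized sine-wave cycles with slightly different hypothetical periods (11 and 12 years) that happen to start in phase. Neither the periods nor the helper names `cycle` and `pearson_r` come from any real data set. Over a short window the series look tightly linked; over a long enough record the drift between them washes the correlation out.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cycle(period, n_years):
    """Idealized yearly samples of a sine wave with the given period."""
    return [math.sin(2 * math.pi * t / period) for t in range(n_years)]

# Hypothetical: periods of 11 and 12 years, in phase at year 0.
r_early = pearson_r(cycle(11, 11), cycle(12, 11))    # first 11 years
r_long = pearson_r(cycle(11, 132), cycle(12, 132))   # 132 years

print(f"first 11 years: r = {r_early:.2f}")
print(f"all 132 years:  |r| = {abs(r_long):.2f}")
```

The 132-year window is chosen because it holds a whole number of both cycles (12 of one, 11 of the other), which makes the washout exact in this idealized setup.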

  14. Uncle Mike! How’ve you been?

    I’m sure you’re right about the data, but as DAV said, swap in any other example you like.

  15. Dear Matt,

    One of my favorite spurious correlations is the Super Bowl-Stock Market game discussed in

    http://wmbriggs.com/blog/?p=3430

    It almost seems logical: that somehow the Super Bowl would boost investor confidence and result in a Dow Jones uptick. But it isn’t.

    My pet peeve is spurious (and usually political) pseudo-environmentalism. So I guess I overreacted to the salmon-sunspot correlation superstition.

    As a general rule to help the uninitiated: among non-sessile animals (the types that move around) population dynamics are governed by predator/prey relations. It’s an eat-or-be-eaten-world out there.

    Stuff like sunspots, weather, climate, habitat (ugh) have little or nothing to do with animal (salmon, owl, deer, muskrats, etc.) population dynamics. Animals move around and are adaptable to many conditions, from deep winter to hot summer. They are not significantly affected by sunspots, any more than you and I are.

    In the case of salmon, what they eat and what eats them are the critical factors. The price of tea in China and other oddball time series have nothing to do with it.

  16. Uncle Mike,
    Your remarks were uncalled for.
    I made no assertion that salmon were correlated with sun spots–note the question mark- and I have not personally seen such a cycle in salmon population data. I was simply thinking if a correlation existed where I would look first in searching for a possible causal link.

  17. Dear Uncle Mike, long time no hear. Your point is well taken, and I don’t think you overreacted.

  18. Uncle Mike,

    Why wouldn’t weather have some influence? When it’s dry there are fewer mosquitoes where I live and things that like mosquitoes are likely to search elsewhere. Other species migrate for better weather. Canadian geese used to but seem to have taken up permanent residence along the Chesapeake. Sometimes they’re worse than the mosquitoes.

    So the correlation with sunspot cycles is weak; however, I’ll bet most cycles in wildlife populations are ultimately driven by energy availability.
    It is improbable that sunspots per se cause anything but something must be driving weather changes. Seems to require energy and the closest suspect is the Sun. Sunspots might just be a clue to its temperament.

    Pray tell, though, why a sunspot-salmon link (or lack of one) is a hot button for you.

    I’ve just discovered the nifty expansion tool in the bottom right of the edit box but once it hit its max size it disappeared. Strange.