This post is one that has been restored after the hacking. All original comments were lost.

Everything that can go wrong with a time series analysis has gone wrong with the post “Recent global warming trends: significant or paused or what?” at Real Climate. So many classic mistakes are made that I hesitate to show them all. But it’ll be worth it to do so. Be sure to read to the end where I ascribe blame.

The model is not the data

Here is the author’s Figure 2, which is the “HadCRUT4 hybrid data, which have the most sophisticated method to fill data gaps in the Arctic with the help of satellites”. Keep that “data gaps” phrase in the back of your mind; for now, let it pass.

Fig. 2 from Real Climate

The caption reads “Global temperature 1998 to present” and (from Fig. 1) “monthly values (crosses), 12-months running mean (red line) and linear trend line with uncertainty (blue)”.

Supposing no error or misunderstandings in the data (for now), those light gray crosses are the temperatures. They are the most important part of this plot. But you can’t tell because the data has, in effect, been replaced by a model. Two models, actually, both of which, because they are so boldly and vividly colored, take on vastly more importance than mere reality.

The data happened, the models did not. That blue line did not occur; neither has the red line anything to do with reality. These are fictions; fantasies; phantasms. The red line claims nothing; no words are devoted to it except to announce its presence; it is a mystery why it is even there. It is a distraction, a visual lie. Well, fib. There is no reason in the world to condense reality in this fashion. We already know how reality happened.

The blue line is an animal of different stripe. It seems to say something about a trend.

A trend is not a trend is not a trend

Look only at the crosses (which is very difficult to do). Has the as-defined-above global temperature increased since 1998? Yes. That is to say, no. Rather, yes. Well, it depends on what is meant by increased.

I’ve talked about this dozens of times (see the Netherlands Temperature Controversy: Or, Yet Again, How Not To Do Time Series for a terrific example), but there is no mystery whether or not a given set of (assumed-error-free) data has or hasn’t a trend. To tell, two things must be in place: (1) a definition of trend or increase and (2) a check to see whether the definition has been met.

There is no single “scientific” definition of an increasing trend: the possibilities are legion. One might be that the data during the second half of the time period has a higher arithmetic mean than the first half. Another is that the last point in time is higher than the first point. Another is that there are more values in the second half (last quarter, or whatever) higher than some constant than in the first half (quarter, etc.). It could be that each successive point must be equal to or greater than the previous points. And there are many more possibilities.
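To make the point concrete, here is a minimal sketch of four of the definitions just listed, each written as a plain yes-or-no check. The function names, the threshold and the numbers are all invented for illustration; they are not temperatures and come from no data set.

```python
from statistics import mean

def second_half_mean_higher(xs):
    """Definition 1: the second half has a higher arithmetic mean than the first."""
    mid = len(xs) // 2
    return mean(xs[mid:]) > mean(xs[:mid])

def last_higher_than_first(xs):
    """Definition 2: the last point is higher than the first point."""
    return xs[-1] > xs[0]

def more_exceedances_in_second_half(xs, threshold):
    """Definition 3: more values above `threshold` in the second half than the first."""
    mid = len(xs) // 2
    first = sum(x > threshold for x in xs[:mid])
    second = sum(x > threshold for x in xs[mid:])
    return second > first

def never_decreasing(xs):
    """Definition 4: each point is equal to or greater than the one before it."""
    return all(b >= a for a, b in zip(xs, xs[1:]))

# Made-up numbers for illustration only; these are not temperatures.
series = [0.5, 0.2, 0.3, 0.4, 0.6, 0.3, 0.5, 0.4]

results = {
    "second half mean higher": second_half_mean_higher(series),
    "last higher than first": last_higher_than_first(series),
    "more exceedances late": more_exceedances_in_second_half(series, 0.45),
    "never decreasing": never_decreasing(series),
}
for name, met in results.items():
    print(f"{name}: {met}")
```

On these invented numbers the first and third definitions are met and the second and fourth are not. Same data, four definitions, two answers, and not a p-value in sight.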

It doesn’t matter which definition you pick, though what you pick should relate to the decisions to be made about the data: once the definition is in hand all you have to do is look. The trend will be there or it won’t; i.e. the criterion implied by the definition will have been realized or it won’t have been. That’s all there is to it.

In particular, no “tests” of “statistical significance” need or should be announced. The trend will or won’t be there, full stop; indeed, a statistical test at this point is dangerous. It is apt to mislead—as it has misled the author of the graph.

The model is not the data

You see the blue line. Accompanying it are two light-blue curves. What could those be? The line itself we know is a fiction. It is what did not happen. The crosses happened. The blue line is a “smoother”, in this case a regression line. Its purpose is to replace the data with something which is not the data. Why? Well, so that thing-that-did-not-happen can be spoken of in statistical language, here a grammar of obfuscation.

We’ll get to the light-blue curves, but first examine the title of the plot “Trend: 0.116 +/- 0.137 °C/decade 2σ”. This seems to indicate that the author has decided on a—not the—definition of a trend and discovered its value. That definition is the value of the parameter in a simple linear regression with one parameter as an “intercept”, another attached to time as a linearly increasing value, and a third for the spread (the σ). The parameter attached to time is called “the trend”. (Never mind that this trend changes depending on the starting and stopping point chosen, and that good choices make good stories.)
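The parenthetical about starting points is easy to check for yourself. What follows is a rough sketch, not the Real Climate calculation: an ordinary least-squares slope computed by hand on a made-up series, refit after dropping the first point, then the first two, and so on. The helper name `ols_slope` and the values are my own assumptions.

```python
def ols_slope(ys):
    """Least-squares slope of ys against time steps 0, 1, 2, ..."""
    n = len(ys)
    t_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(ys))
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    return sxy / sxx

# Made-up values, one per time step; this is not HadCRUT4.
series = [0.6, 0.2, 0.3, 0.4, 0.5, 0.4, 0.6, 0.5, 0.3, 0.4]

# Same data, same definition of "trend", different starting points.
for start in range(4):
    window = series[start:]
    print(f"start at step {start}: slope = {ols_slope(window):+.4f} per step")
```

Each starting point yields its own “trend”, though nothing about the underlying numbers has changed. Whichever start you prefer is a choice, not a discovery.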

Here’s where it becomes screwy. If that is the working definition of trend, then 0.116 (assuming no miscalculation) is the value. There is no need for that “+/- 0.137” business. Either the trend was 0.116 or it wasn’t. What could the plus or minus bounds mean? They have no physical meaning, just as the blue line has none. The data happened as we saw, so there cannot be any uncertainty in what happened to the data. The error bounds are persiflage in this context.

Just as those light-blue curves are. They indicate nothing at all. The blue line didn’t happen; neither did the curves. The curves have nothing to say about the data, either. The data can speak for themselves.

The author on some level appears to understand this, which causes him to speak of “confidence intervals.”

Have no confidence in confidence intervals

(Note: it is extremely rare that anybody gets the meaning of a confidence interval correct. Every frequentist becomes an instant Bayesian the moment he interprets one. If you don’t know what any of that means, read this first. Here I’ll assume the Bayesian interpretation.)

The light-blue curves and the plus-or-minuses above had to do with confidence intervals. The author, like most authors, misunderstands them. He says

You see a warming trend (blue line) of 0.116 °C per decade, so the claim that there has been no warming is wrong. But is the warming significant? The confidence intervals on the trend (+/- 0.137) suggest not — they seem to suggest that the temperature trend might have been as much as +0.25 °C, or zero, or even slightly negative. So are we not sure whether there even was a warming trend?

That conclusion would be wrong — it would simply be a misunderstanding of the meaning of the confidence intervals. They are not confidence intervals on whether a warming has taken place — it certainly has. These confidence intervals have nothing to do with measurement uncertainties, which are far smaller.

Rather, these confidence intervals refer to the confidence with which you can reject the null hypothesis that the observed warming trend is just due to random variability (where all the variance beyond the linear trend is treated as random variability). So the confidence intervals (and claims of statistical significance) do not tell us whether a real warming has taken place, rather they tell us whether the warming that has taken place is outside of what might have happened by chance.

This is a horrible confusion. First, “significant” has no bearing on reality. If the temperature, for instance, had increased by (say) 20 degrees, that would have been significant in its plain-English sense. It would have been hot! Statistical significance has no relation to plain-English significance. In particular, many things are statistically “significant” which are in actuality trivial or ignorable. Statistical significance is this: that a certain parameter in the model, when input into an ad hoc function set equal to a predetermined value, produces a p-value smaller than the magic number. Significance thus relies on two things (1) a statistic, many of which are possible for this model, and (2) a model. Two statistics in the same model can produce one instance of “significance” and one instance of “non-significance”, as can simply switching models.

Here the author decided a linear regression trend was the proper model. How does he know? Answer: he does not. The only—as in only—way to know if this model is any good is to use it to forecast values past 2014 and then see if it has skill (this is a formal term which I won’t here define). To prove he is fiddling and not applying a model he has deduced, look into his article, where he applies different models with different starting dates, all of which give different blue lines. Which is correct? Perhaps none.

Second, the author says that a trend “certainly was” in the data. This is true for his definition of a trend (see the Netherlands post again). It isn’t true for other definitions. But the author needed no statistical test to show his version of a trend obtained.

Third, the real error is in the author’s failing to comprehend that statistical models have nothing to do with causes. He claimed his test was needed to rule out whether the data was caused by (or was “due to”) “random variability”. This term is nonsensical. It quite literally has no meaning. Randomness, as I’ve said thousands of times, cannot cause anything. Instead, something caused each and every temperature datum to take the value it did.

Don’t skimp your thinking on this. Prove to yourself how bizarre the author’s notion is. He drew a straight line on the data and asked whether “random variability” caused the temperature. Yes, it did, says his statistical test: his test did not reach statistical significance. Even the author said the blue line was a chimera (he didn’t erase it, though). He asks us to believe nothing, because randomness is not a physical thing, caused the temperature.

Now another model, or another statistic inside his model, might have produced a wee p-value, which would have rejected the “null hypothesis” that nothing caused the data (as it did in his Fig. 1). Very well. Suppose that was the case. What then caused the data if it wasn’t “random variability”? The statistical model itself couldn’t have. The straight line didn’t. Physical forces did. Does anybody anywhere believe that physical forces are causing the data to increase at precisely the same rate year on year, as in a straight line? Answer: no, that’s bizarre.

So, significance or not, the statistical model is useless for the purposes to which the author put it. The only proper use is to forecast new data. The model then says, and only says, “Given my model, here is the uncertainty I have in future data.” We can then check whether the model has any value.
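I have not defined skill here, so treat what follows as one common convention and nothing more: a model has skill if its forecasts beat those of a naive reference, scored on data the model never saw. This sketch fits a straight line to the “past”, forecasts the “future”, and compares its mean squared error with that of persistence (tomorrow will be like today). The data, the choice of persistence as the reference, and the function names are all assumptions made for illustration.

```python
def fit_line(ys):
    """Least-squares intercept and slope of ys against time steps 0, 1, 2, ..."""
    n = len(ys)
    t_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    slope = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(ys)) / sxx
    return y_bar - slope * t_bar, slope

def mse(forecasts, observed):
    """Mean squared error of forecasts against what actually happened."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, observed)) / len(observed)

def skill_score(model_fc, reference_fc, observed):
    """1 - MSE(model)/MSE(reference); positive means the model beat the reference."""
    return 1 - mse(model_fc, observed) / mse(reference_fc, observed)

# Made-up numbers: `past` is what the model sees, `future` is what happens later.
past = [0.30, 0.42, 0.35, 0.48, 0.40, 0.52, 0.45, 0.50]
future = [0.47, 0.44, 0.51, 0.46]

intercept, slope = fit_line(past)
steps = range(len(past), len(past) + len(future))
line_fc = [intercept + slope * t for t in steps]
persistence_fc = [past[-1]] * len(future)  # naive reference: repeat the last value

print("skill of the straight line vs persistence: "
      f"{skill_score(line_fc, persistence_fc, future):+.2f}")
```

A positive score means the line did better than doing nothing clever at all; a negative score means it did worse. Either way the verdict comes from new data, not from a p-value computed on the old.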

If the author believes in his creation, I invite him to put his money where his model is. The Chicago Mercantile Exchange deals in heating and cooling degree day futures (which are simple functions of temperature); he can make a fortune there if his model really does have skill.

But before he does that, he’ll have to do a bit more work. Remember those “data gaps”?

Mind the gaps

Suppose global average temperature (GAT) were defined as “The numerical average of the yearly average values at locations A, B, …, and Z”. This is comprehensible and defensible, at least mathematically. Whether it has any use to any decision maker is a question I do not now answer except to say: not much.

As long as locations A-Z, and the manner in which the temperatures were computed at each location, remained constant, then nothing said above need be changed one whit. But—and this is a big but—if the locations or the manner in which the measurements were taken change, we must necessarily become less certain than we were before. That “necessarily” is inescapable. Something like that is the case here. The HadCRUT4 data are not constant: locations change as do the way the measurements are taken (the algorithm used to produce the measurements has changed, and more).

For example, suppose one of the locations (say, D) dropped out this year. That makes any comparison of the GAT this year with previous years impossible. It’s apples and oranges. It’s like, though in a smaller way, saying last year the GAT used locations Cleveland and Vera Cruz and this year only Vera Cruz.¹ Hey. We never said how many locations we had to have, right?
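A toy version of the dropout problem, with four hypothetical stations and invented yearly averages. Not one station’s temperature changes between the two years; the only thing that changes is that D stops reporting.

```python
from statistics import mean

# Hypothetical yearly averages (deg C) at four stations; all values are invented.
last_year = {"A": 10.0, "B": 12.0, "C": 11.0, "D": 27.0}
this_year = {"A": 10.0, "B": 12.0, "C": 11.0}  # D dropped out

def gat(station_means):
    """Numerical average of the yearly averages at whichever stations reported."""
    return mean(station_means.values())

print(f"GAT last year (A-D): {gat(last_year):.2f}")
print(f"GAT this year (A-C): {gat(this_year):.2f}")

# A like-for-like comparison uses only the stations present in both years.
common = sorted(set(last_year) & set(this_year))
like_for_like = gat({s: last_year[s] for s in common})
print(f"GAT last year on {common}: {like_for_like:.2f}")
```

The headline number falls from 15 to 11 although no thermometer anywhere read differently. Restricting both years to the common stations removes the artifact, but then it is no longer the GAT as originally defined.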

Well, we might estimate what the temperature was at D before we form our GAT. That’s acceptable. But—and this is where the bigness of the but comes in—we have to carry forward everywhere the uncertainty which accompanies this guess. We can no longer say that this year’s GAT is X, we must say it is X +/- Y, where the Y is the tricky bit, the bit most authors get wrong.

To guess the temperature at location D requires a statistical model. That model will be some fancy mathematical function with a parameter (or parameters) associated with temperature. We don’t know the value of this parameter, but there are techniques to guess it. We can even form a confidence interval around this guess. And then we can take this guess and the confidence interval and use it as the proxy for D—and then go on to compute the GAT, which is now X +/- Y.

Sound good? It had better not, because it’s wrong. Who in the world cares about some non-existent parameter! We wanted a guess of the temperature at D, not some lousy parameter! That means we have to form the predictive confidence (really, credible) interval around the guess at D, which is also necessarily larger than the interval around the guess of the parameter. That larger interval can be plugged into the formula for the GAT, which will produce this year (again) an X +/- Y.
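Here is about the simplest sketch of the difference I can manage. It assumes, purely for illustration, that past yearly averages at D are modeled as draws from a normal distribution with an unknown central parameter, and it uses a normal quantile where a Student’s t quantile would strictly be called for. The station history is invented.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Invented past yearly averages (deg C) at the hypothetical station D.
d_history = [14.1, 15.3, 13.8, 14.9, 15.6, 14.4, 13.9, 15.0]

n = len(d_history)
centre = mean(d_history)
s = stdev(d_history)              # sample standard deviation
z = NormalDist().inv_cdf(0.975)   # about 1.96; normal approximation, not Student's t

# Uncertainty in the model's central parameter.
param_half_width = z * s / sqrt(n)
# Uncertainty in the temperature at D itself, which is what we wanted.
pred_half_width = z * s * sqrt(1 + 1 / n)

print(f"guess for D:          {centre:.2f}")
print(f"parametric interval:  +/- {param_half_width:.2f}")
print(f"predictive interval:  +/- {pred_half_width:.2f}")
```

Under these assumptions the predictive interval is wider than the parametric one by a factor of sqrt(n + 1), here a factor of three. More data shrinks the parametric interval toward nothing; the predictive one never falls below the spread of the temperatures themselves.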

The crosses aren’t the data after all

That means those crosses, because of the way the HadCRUT4 hybrid data were stitched together, aren’t the data like we thought they were. Instead of crosses, we should be looking at fuzzy intervals. We are not certain what the value of the GAT was in any year. Adding in this uncertainty, as is or should be mandatory, would make the picture appear blurry and unclear—but at least it would be honest.

Above I said the uncertainty we had in the GAT must be carried forward everywhere. I meant this sincerely. The author of the blue regression line thus cannot be sure that his value of a trend really is real. Neither can anybody. The author’s confidence intervals, which are wrong anyway, are not based on the true and complete uncertainty. And that means his model, if he does choose to use it to predict new data, will have prediction intervals that are too narrow.
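One crude way to carry the uncertainty forward is by simulation: treat each yearly value as X +/- Y, draw many plausible versions of the series, and recompute the regression slope for each. This sketch assumes independent normal errors with an invented size, which is surely too simple for stitched-together data, and the values themselves are made up.

```python
import random

def ols_slope(ys):
    """Least-squares slope of ys against time steps 0, 1, 2, ..."""
    n = len(ys)
    t_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(ys))
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    return sxy / sxx

# Invented yearly values and an invented one-sigma uncertainty for each.
values = [0.40, 0.38, 0.45, 0.47, 0.44, 0.50, 0.48, 0.52]
sigmas = [0.10] * len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable
slopes = sorted(
    ols_slope([rng.gauss(v, s) for v, s in zip(values, sigmas)])
    for _ in range(2000)
)
lo, hi = slopes[50], slopes[-51]  # roughly the central 95% of simulated slopes

print(f"slope taking the values as exact: {ols_slope(values):+.4f}")
print(f"slope once each value is fuzzy:   {lo:+.4f} to {hi:+.4f}")
```

The single tidy slope becomes a range, and that range reflects only the fuzziness of the inputs. It says nothing yet about whether a straight line was a sensible model in the first place.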

And that means that he’s much more likely to lose his money, which I’m sure he’ll be putting in HDD futures. (Didn’t James Hansen make a fortune on these, Gav?)

Ego te absolvo

Why pick on this article? Well, it is one of many, all of which use and compound the same statistical slip-ups. The point, ladies and gentlemen, is that bad statistics have so badly skewed our view of reality that our dear leaders have turned this once scientific field into yet another political playground.

The blame for this boondoggle lies with—wait for it—me. Yes, me. It is I and other professional statisticians who are responsible for the gross misunderstandings like those we saw above which plague science. I cannot blame Real Climate. The author there really did think he was doing the right thing. Mea maxima culpa. I absolve the author.

Our textbooks are awful; the errors which you see nearly everywhere are born there and are allowed to grow without check. Professors are too busy proving yet another mathematical theorem and have forgotten what their original purpose was; and when you ask them questions they answer in jargon and math. Probably the more egregious fault is how we let students escape from our classrooms misunderstanding causality. We really do write blush-worthy things like “due to random chance” or “the result wasn’t statistically significant so A and B aren’t related.”

How to fix this mess is an open question. All suggestions welcomed.


¹ The reason your mind reels from this example is that you understand that the temperatures, and the physics that drove those temperatures, are different in nature at those two cities. The problem with statistical modeling is that it encourages you to cease thinking in terms of causality, the real goal of science, and to think instead in terms of ritual. Wave this mathematical wand and look at the entrails of the data. See any wee p-values? Then your faith has been rewarded. If not, not.