What Is A “Statistically Significant Trend”?

Longtime reader Nate Winchester found a discussion—among the many, many—of global warming data revolving around statistics, from which we take the following snippet:

You just do the statistics on the data. If you calculate trends over a short period you don’t get statistically significant trends, and over a longer period you do get statistically significant trends. This is true for almost any real life data, and how long it takes for trend to show up over short term non-trended variation will depend on the data.

In the case of global temperature anomalies, it turns out that the trends in temperature become statistically significant over scales of roughly 15 to 20 years or more, and lack significant trend over shorter scales. That’s just a description of what global anomalies are doing.

Where this quotation originated is not important; probably you can find one nearly identically worded at any site in which the subject of climate change arises. But it is a useful comment, because it betrays a standard misinterpretation of statistics which we can here put right.

Suppose in front of you is a picture of a number of dots, one per year arranged sequentially, each dot representing, say, a temperature. Obviously—yes, truly, obviously—those temperatures, assuming they were measured without error, came from somewhere. That is, something, some physical process or processes, caused the temperatures to take the values they did.

They did not appear “randomly”, if by use of that word you mean some vague and mysterious metaphysical engine (run by quantum gremlins?) which spit the temperatures out for humanity to discover. But if by that word you merely mean that you do not know or do not understand what physical process caused the temperatures, then you speak intelligently.

Our second supposition requires us to weakly anthropomorphize either all, or individual portions, of the dots. You have to squint at the collection and say to yourself, “Say, if I draw a straight line running amidst the dots between year A and year B, most of those dots will lie close to the line, though only very few will touch the line.” You are allowed to draw various lines through the dots, some pointing upwards, some downwards, as long as all the lines connect head to foot, starting at the first year and ending at the last.

Once done, you can reach into your bag of statistical tricks and then ask whether the lines you have drawn are “statistically significant.” The first step in this journey to amazement requires that you return to the word “random” and invoke it to describe the behavior of the dots not lying on the line. You have to say to yourself, “I know that nature chose to make the temperatures lie on this line. But since they do not lie on the line, only close to it, something else must have made the dots deviate from the line. What this cause is can only be the normal distribution.”

In other words, you have to say you already know that nature operates in straight lines, but that something ineffable steers your data away from purity. The ineffability is supplied by this odd who-knows-what called the normal distribution, the exact nature and motivations of which are never clear.
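For concreteness, the “line plus normal wiggle” picture just described is what textbooks call the simple linear trend model. The block below states it and, under the further assumption that the errors are independent with a constant, known variance σ² (in practice σ is estimated, and the independence is itself an assumption), shows how the slope's standard error shrinks as the window of n years grows. This is one way to see why, in the quoted comment, longer periods tend to yield “significant” trends: for a fixed slope β the ratio β/SE grows roughly like n^{3/2}, so how long you must wait depends on β/σ, that is, “on the data.” Nothing in this algebra checks whether a line with normal scatter is the right description in the first place.

```latex
\begin{align*}
y_t &= \alpha + \beta t + \varepsilon_t, \qquad \varepsilon_t \overset{\text{iid}}{\sim} N(0,\sigma^2), \qquad t = 1,\dots,n \\
\sum_{t=1}^{n} (t-\bar t)^2 &= \frac{n(n^2-1)}{12} \\
\operatorname{SE}(\hat\beta) &= \frac{\sigma}{\sqrt{\sum_{t}(t-\bar t)^2}} = \sigma\sqrt{\frac{12}{n(n^2-1)}} \\
\frac{\beta}{\operatorname{SE}(\hat\beta)} &= \frac{\beta}{\sigma}\sqrt{\frac{n(n^2-1)}{12}} \approx \frac{\beta}{\sigma}\cdot\frac{n^{3/2}}{\sqrt{12}} \quad (n \text{ large})
\end{align*}
```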

Another thing that isn’t quite clear is the slope of the line you drew. It is a line, though; in that you are certain sure. But perhaps the line points not so nearly high; rather, it might lie flat. Must be a line, though. Has to be. After all, what else could it be?

Now, with all these suppositions, surmises, and say-whats in hand, you feed the dots into your favorite statistical software. It will churn the dots and compute a statistic, and then tell you—the whole point of the article has now come upon us, so pay attention—it will tell you the probability of seeing a statistic larger than the one you actually got, given that your line theory and your ideas about randomness are faultless (I ignore mentioning infinite repetitions of data collection).
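To make the procedure concrete, here is a minimal Python sketch of what the “favorite statistical software” computes for a single straight line: an ordinary least-squares slope, its t statistic, and the two-sided p-value for the hypothesis of a flat line, under the assumptions just listed (a true straight line plus independent, normally distributed scatter). It assumes equally spaced years coded t = 0, 1, …, n−1; the function names and the eight “dots” are hypothetical, and the p-value comes from integrating the Student t density numerically (Simpson's rule) rather than from any particular statistics package.

```python
import math


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t_stat| with Simpson's rule
    (steps must be even); the two tails are whatever area is left over.
    """
    x_max = abs(t_stat)
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = x_max / steps
    inner = sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = (density(0.0) + density(x_max) + inner) * h / 3  # area from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)


def trend_test(y):
    """OLS fit of y = a + b*t, t = 0..n-1; returns (slope, t statistic, p-value)."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the scatter")
    t_bar, y_bar = (n - 1) / 2, sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y)) / sxx
    a = y_bar - b * t_bar
    sse = sum((yt - a - b * t) ** 2 for t, yt in enumerate(y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # assumes independent, equal-variance errors
    t_stat = b / se_b
    return b, t_stat, t_two_sided_p(t_stat, n - 2)


if __name__ == "__main__":
    # Hypothetical "dots": eight made-up annual values, not real temperatures.
    dots = [0.1, 0.3, 0.2, 0.5, 0.4, 0.7, 0.6, 0.9]
    slope, t_stat, p = trend_test(dots)
    print(f"slope={slope:.4f}  t={t_stat:.2f}  p={p:.4f}")
```

A small p here says only that these dots would be unusual if the line-plus-normal-scatter story were true and the line were flat; it says nothing about whether a line is the right story at all.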

If this probability is small, then you are allowed to say your line is “statistically significant.” Further, you are allowed to inform the media of this fact, a tidbit for which they will be grateful.

Of course, saying your imaginary line(s) are “statistically significant” says nothing—not one thing—about whether your line(s) are concrete, whether, that is, they describe nature as she truly is, or whether they are merely figments of your fervid imagination.

The best part of this exercise is that you can ignore the dots (reality) entirely.

1. Strick says:

Another example of how, with very little ingenuity, you can produce meaningless statistics and stay ahead of the publish-or-perish demon.

I’m a bit more interested in the other side of the equation in that quote these days. I’m being told that the apparent decline in temperatures over more than a decade is irrelevant because there are too few samples to be “statistically significant”. It’s a sort of safe harbor from the facts, like the old peer-reviewed argument.

I know you already wrote on that, but I’d still like a short response, clear to even the most dense climate alarmist, that puts this all in perspective.

2. Luis Dias says:

A very good summary of a little problem of intelligence being spread in the internetz. Thank you for that 😉

3. Morgan says:

To be fair, most of the test statistics assume autocorrelation in the temperature time series (see…

…for an example), but I think your main point stands – the test statistic is the result of a presumed model, the adequacy of which we don’t really know.
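Morgan's point can be illustrated with one widely used correction, sketched here in Python under stated assumptions: treat the regression residuals as a first-order autoregressive, AR(1), process, estimate their lag-1 autocorrelation r1, and shrink the sample size to an “effective” n_eff = n(1 − r1)/(1 + r1) before computing the slope's standard error. This is not necessarily the method in the elided link; the helper names and the synthetic series are hypothetical, and the AR(1) correction is itself another presumed model, which is exactly Morgan's caveat.

```python
import math
import random


def ar1_adjusted_trend(y):
    """OLS trend of y on t = 0..n-1, with an AR(1) effective-sample-size correction.

    Returns (slope, naive SE, adjusted SE, lag-1 autocorrelation r1, n_eff).
    """
    n = len(y)
    t_bar, y_bar = (n - 1) / 2, sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y)) / sxx
    a = y_bar - b * t_bar
    e = [yt - a - b * t for t, yt in enumerate(y)]  # residuals (mean zero)
    sse = sum(ei * ei for ei in e)
    r1 = sum(e[i] * e[i + 1] for i in range(n - 1)) / sse
    n_eff = n * (1 - r1) / (1 + r1)  # fewer "independent" points when r1 > 0
    if n_eff <= 2:
        raise ValueError("too few effective observations for a trend estimate")
    se_naive = math.sqrt(sse / (n - 2) / sxx)
    se_adj = math.sqrt(sse / (n_eff - 2) / sxx)
    return b, se_naive, se_adj, r1, n_eff


def make_ar1_series(n=200, phi=0.5, slope=0.01, seed=1):
    """Hypothetical data: a small linear trend plus AR(1) noise with coefficient phi."""
    rng = random.Random(seed)
    y, noise = [], 0.0
    for t in range(n):
        noise = phi * noise + rng.gauss(0.0, 1.0)
        y.append(slope * t + noise)
    return y


if __name__ == "__main__":
    b, se_naive, se_adj, r1, n_eff = ar1_adjusted_trend(make_ar1_series())
    print(f"slope={b:.4f} r1={r1:.2f} n_eff={n_eff:.0f} "
          f"t naive={b / se_naive:.2f} t adjusted={b / se_adj:.2f}")
```

With positively autocorrelated residuals the adjusted standard error is larger, so the same slope looks less “significant”; how much less depends entirely on which error model you are willing to presume.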

4. JH says:

If you calculate trends over a short period you don’t get statistically significant trends, and over a longer period you do get statistically significant trends.

Anything could happen! Let’s use this data set http://www.massey.ac.nz/~pscowper/ts/global.dat to show that this statement is incorrect. Read the data into R.
www <- "http://tur-www1.massey.ac.nz/~pscowper/ts/stemp.dat"
temp <- scan(www); x.ts <- ts(temp, start = 1850, frequency = 12)

Now, extract two subsets of the time series, and fit a simple linear trend model to each series. (Use the 0.05 significance level.)
x.short <- window(x.ts, start = c(1940, 1), end = c(1965, 12))    # significant
x.longer <- window(x.ts, start = c(1940, 1), end = c(1972, 12))   # non-significant
Or
x.short <- window(x.ts, start = c(1946, 1), end = c(1951, 12))    # non-significant
x.longer <- window(x.ts, start = c(1946, 1), end = c(1955, 12))   # non-significant
# For each pair, fit the trend and read the p-value on the time coefficient:
summary(lm(x.short ~ time(x.short)))
summary(lm(x.longer ~ time(x.longer)))

I might have mistakes in my R script though!

[H]ow long it takes for trend to show up over short term non-trended variation will depend on the data.

This says nothing about anything. Just to be honest.

5. Doug M says:

There is a subset of finance guys, technical analysts, who look at the wiggles of the data in price histories. They take out their rulers and draw trends, channels, pennants and triple tops. They have complex rationales that explain how different market psychologies will lead to each of these patterns. If you understand the mood of the market, you can successfully time short-term fluctuations. These guys are generally regarded as the witch doctors of the finance community.

To disprove these methods, finance professors use random number generators and show that all of these patterns will occur with a random number generator driving them.

Start with your fictitious XYZ corp. at $100. Flip a quarter: if it is heads, say that XYZ was up $1 that day; if it is tails, XYZ is down $1. 250 flips later you have a price history for XYZ that jigs up and back down, and likely finishes outside the range [95, 105]. If you calculate a least-squares regression of XYZ price on time, you will probably get a statistically significant fit. If you don’t, then add another 250 flips to the data.
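Doug M's experiment is easy to run. The Python sketch below assumes a fair coin, $1 moves, 250 trading days and 200 repeated histories (all hypothetical choices), fits an ordinary least-squares line of price on day, and reports the share of histories whose slope clears the usual |t| > 1.96 cutoff (the large-sample 5% level; the exact t cutoff for 248 degrees of freedom is slightly larger). Because each price is the previous price plus a coin flip, the scatter around any fitted line is strongly autocorrelated, which the naive test ignores.

```python
import math
import random


def slope_t_stat(y):
    """Naive OLS t statistic for the slope of y on t = 0..n-1 (assumes independent errors)."""
    n = len(y)
    t_bar, y_bar = (n - 1) / 2, sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y)) / sxx
    a = y_bar - b * t_bar
    sse = sum((yt - a - b * t) ** 2 for t, yt in enumerate(y))
    return b / math.sqrt(sse / (n - 2) / sxx)


def coin_flip_prices(n_days, rng, start=100):
    """XYZ price path: up $1 on heads, down $1 on tails."""
    prices, price = [], start
    for _ in range(n_days):
        price += 1 if rng.random() < 0.5 else -1
        prices.append(price)
    return prices


def share_significant(n_days=250, n_trials=200, seed=42, cutoff=1.96):
    """Fraction of simulated price histories with a 'significant' OLS trend."""
    rng = random.Random(seed)
    hits = sum(abs(slope_t_stat(coin_flip_prices(n_days, rng))) > cutoff
               for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    print(f"share of histories with |t| > 1.96: {share_significant():.2f}")
```

With these settings most simulated histories come out “significant”, far more than the nominal 5%, even though no trend was put into the coin.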

6. DAV says:

Fitting a straight line to cyclical data is mostly an exercise in futility, assuming you want to find something useful. If the data are cyclic then fits to shorter segments are closer to reality. This doesn’t mean there aren’t DC components and biases. It’s just much harder to prove you are actually observing them with a statistical fit.

Autocorrelation is supposed to handle that but does it?
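DAV's point about cyclical data shows up even without any noise. In this hypothetical Python sketch the series is a pure sine wave with a 60-step period: a least-squares line fitted to the rising quarter-cycle slopes up, one fitted to the following half-cycle slopes down, and one fitted over two whole cycles has a much smaller (though not exactly zero) slope. Which “trend” you report depends entirely on the window you pick.

```python
import math


def ols_slope(y):
    """Least-squares slope of y against t = 0..n-1.

    Shifting t by a constant does not change the slope, so each window
    can be indexed from 0.
    """
    n = len(y)
    t_bar, y_bar = (n - 1) / 2, sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    return sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y)) / sxx


PERIOD = 60  # hypothetical cycle length (e.g. years)
series = [math.sin(2 * math.pi * t / PERIOD) for t in range(2 * PERIOD)]

rising = ols_slope(series[0:15])    # first quarter-cycle: sine climbs from 0 to ~1
falling = ols_slope(series[15:45])  # next half-cycle: sine falls from 1 to ~-1
whole = ols_slope(series)           # two complete cycles

if __name__ == "__main__":
    print(f"rising={rising:.4f}  falling={falling:.4f}  two cycles={whole:.4f}")
```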

7. Imagine you have two points in time, say the temperature reading this morning and this afternoon. Connect them with a straight line. The line either goes up or down (if they are exactly the same and the line is horizontal, you made a mistake and must go read the thermometer again).

Is the trend “significant”? Why not? Seems to me it is perfectly significant. It got hotter (or cooler, as the case might be). Maybe things thawed. Maybe things froze. Whatever happened, it is significant, i.e., meaningful. You can make an inference (e.g., “Man, it sure got cold in a hurry” or whatever).

Two points. All you need. In fact, more than two points just confuses the issue.

8. Doug Proctor says:

What you are saying here is that there is no “real” certainty, that in the 1 in X cases the reverse is “actually” going on but the data distribution does not clearly reflect this situation. Granted, in an academic way. This is a problem for individual studies or datasets; other unrelated studies are useful for verifying the reasonableness of such a conclusion, as they (presumably) come from different and unconnected events caused by the underlying phenomena. So while germane to a specific study, the lack of certainty does not carry over to multiple studies, similar or not, that share the same underlying cause.

The question of statistical significance has to do with magnitude. If, as has been suggested, the global temperature “anomalies” for the past 15 years are not statistically significantly different from one another, then the conclusion that global warming is consistent and dominated by anthropogenic CO2 is false. Hansen says the temperature has been rising for the past 15 years. Jones says the data do not show a change that is statistically significant. Both can be right, though Hansen should not claim the data support his claim with statistical proofs. Jones does not refute Hansen’s claim of global warming occurring, but only says that Hansen’s claim cannot be backed up by the rules of certainty within the mathematical community.

We are not concerned about the details of the shape of the temperature rise, only the long-term magnitude and indications of an acceleration (on the 90-plus-year scale). Cyclic behaviour in an overall rising situation still has the rise we are told to be worried about. CAGW requires an underlying trend of > 0.20 K/decade. Any less is a continuation of pre-industrial, “normal” trends. Considering the recent alarm of 4 K by 2050 expressed at the Cancun meeting, an acceleration of annual temperature rises has to be seen in the data for the imminent disasters to materialize. Do the current data trends, even if not technically defensible at this time, support extraordinary heating of the planet? That is what we want to see. If so, then we’d be advised to look closer at the situation. If not, then a slower, more considered response is appropriate.

So far, to get away from the scholarly issues and into the course of human affairs, I’d argue, there is no evidence that the CAGW hypothesis has been validated. But denigrating what evidence there is on the basis of academic rules is not helpful. We live our lives within cycles and in anticipation of trends we see only dimly.

The key here is that we are still looking at trends that are not clear, that are not at a level of harshness where even reasonable error still leaves us drowning or broiling. Only reasonable error on the POSITIVE side brings us into areas of trouble, while the same level of reasonable error on the negative side leaves us looking at an approaching ice age. It’s as if we are driving down the street facing a green light that might turn yellow, or maybe even red. Do we pull off to the side, jump out of our car and hide behind a tree to avoid an imminent car crash (that we can’t see approaching the intersection)? Of course not. But we do watch for a change in light and traffic.

9. DAV says:

Doug Proctor,

The problem is, for the last 25 years, we’ve been asked to jump out of the car, hide behind a tree, and junk the car because some see approaching traffic, even though the light hasn’t yet changed.

10. Ray says:

If you fit a straight line to a finite segment of data, there is always a trend even if there is none in the process that generated the data. This is easily demonstrated by a thought experiment. Suppose you flip a coin. In the long run you should have 50% heads and 50% tails. However, if you flip the coin 100 times you would expect an excess of heads or tails on the order of 10, e.g., 55 heads and 45 tails or vice versa. This has been known since the time of Bernoulli. You have a trend in the finite segment of data even though no trend exists in the process that generated the data.
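The size of the imbalance in Ray's example follows from the binomial distribution. Writing H for the number of heads in n fair tosses, the heads-minus-tails excess D has mean zero and standard deviation √n, so for n = 100 a one-standard-deviation excess is 10 (55 versus 45). The average size of the excess, |D|, is a little smaller, about √(2n/π) ≈ 8, using the normal approximation to the binomial.

```latex
\begin{align*}
H &\sim \operatorname{Binomial}\!\left(n, \tfrac12\right), \qquad \operatorname{E}[H] = \tfrac{n}{2}, \qquad \operatorname{Var}(H) = \tfrac{n}{4} \\
D &= H - (n - H) = 2H - n \\
\operatorname{E}[D] &= 0, \qquad \operatorname{Var}(D) = 4\operatorname{Var}(H) = n, \qquad \operatorname{SD}(D) = \sqrt{n} \\
n = 100 &: \quad \operatorname{SD}(D) = 10, \qquad \operatorname{E}|D| \approx \sqrt{2n/\pi} \approx 7.98
\end{align*}
```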

11. JH says:

Ron Number,

I like your comment because it brings up a very important concept. Instead of writing up explanations, I found the site below for you.
http://www.creative-wisdom.com/computer/sas/df.html

It presents the concept of degrees of freedom and explains why the perfect fit of the two-point case you’ve described is… hmm… not useful. A perfect fit is usually associated with poor generalization and prediction.
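The two-point case from comment 7 shows why JH brings up degrees of freedom. A straight line has two parameters (intercept and slope), so with two readings it passes through both exactly: the residual sum of squares is zero and n − 2 = 0 degrees of freedom are left to estimate the scatter, so no standard error, and hence no significance test, can be computed. A minimal Python sketch, with hypothetical readings at 9:00 and 15:00:

```python
def line_fit(t, y):
    """OLS line through (t, y); returns (slope, residual sum of squares, error df)."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    a = y_bar - b * t_bar
    sse = sum((yi - a - b * ti) ** 2 for ti, yi in zip(t, y))
    return b, sse, n - 2  # a residual variance estimate would be sse / (n - 2)


if __name__ == "__main__":
    hours = [9, 15]           # hypothetical: morning and afternoon readings
    temps = [12.0, 18.5]      # hypothetical temperatures
    slope, sse, df = line_fit(hours, temps)
    # sse is (numerically) zero and df is 0, so sse / df is 0/0: no scatter estimate
    print(slope, sse, df)
```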

12. JH says:

Ray,
It’s correct that the chance of getting 50 heads (1s) and 50 tails (0s) in 100 tosses is small. However, whether there is a linear trend over time and whether the trend is statistically significant will depend on the resulting sequence of the tosses.

13. Doug M says:

Ray, JH,

It is actually quite possible to have a sequence of flips such that the endpoints are flat, but the data show a trend.

If you have a streak of heads early in your flipping, and slightly more tails than heads over the rest of the sequence, you could finish with heads – tails = 0, with a surplus of heads over tails at every point in between and a ‘statistically significant’ trend toward tailiness.
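Doug M's construction can be checked directly. The Python sketch below builds one hypothetical 100-flip sequence of the kind he describes (a streak of 10 heads, then 90 flips with 40 heads and 50 tails), tracks the running surplus of heads over tails, and fits an ordinary least-squares line to that running surplus. The surplus never drops below zero and ends at exactly zero, yet the fitted slope is negative with a large naive t statistic. Note that this regresses the running surplus, not the individual 0/1 flips.

```python
import math
from itertools import accumulate


def doug_m_flips():
    """Hypothetical sequence: +1 = heads, -1 = tails."""
    streak = [+1] * 10                             # early run of 10 heads
    block = [-1, +1, -1, +1, -1, +1, -1, +1, -1]   # 4 heads, 5 tails
    return streak + block * 10                     # 50 heads, 50 tails in all


def slope_and_t(y):
    """OLS slope of y on t = 0..n-1 and its naive t statistic."""
    n = len(y)
    t_bar, y_bar = (n - 1) / 2, sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y)) / sxx
    a = y_bar - b * t_bar
    sse = sum((yt - a - b * t) ** 2 for t, yt in enumerate(y))
    return b, b / math.sqrt(sse / (n - 2) / sxx)


flips = doug_m_flips()
surplus = list(accumulate(flips))  # heads minus tails after each flip

if __name__ == "__main__":
    slope, t_stat = slope_and_t(surplus)
    print(f"final={surplus[-1]}  min={min(surplus)}  slope={slope:.3f}  t={t_stat:.1f}")
```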

14. JH says:

Doug M,
Seriously, running an ordinary linear regression of {1s, 0s} on the variable “time” is not something I would recommend…

15. I have a question for the statistical gurus.
In your tutorial you invoke the normal distribution around the line to test for significance. I wonder if this assumes that the errors in the data are random, that is, that there is no bias in the data. Back in first-year physics class we learned that errors can be random or systematic. In the application of correlations to temperature data, the data have bias due to urban heating at some sites. Anthony Watts has revealed many of these sites over the last few years, as well as the use of geographical averaging where temperature sensors are missing.
Given that the global temperature anomalies are small, how can a statistical correlation of global temperatures be called a statistically significant trend without eliminating the systematic errors?
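The question can be illustrated with a hypothetical Python sketch: start from temperatures with no trend at all (independent normal noise), add a slowly growing warm bias of the kind a station might pick up from urban growth, and run the usual trend test. The test duly reports a “significant” trend of about the size of the bias; nothing in the calculation can tell a drifting site from a changing climate, only knowledge of how the data were collected can. The bias rate, noise level, record length, and seed are made-up values.

```python
import math
import random


def slope_and_t(y):
    """OLS slope of y on t = 0..n-1 and its naive t statistic."""
    n = len(y)
    t_bar, y_bar = (n - 1) / 2, sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y)) / sxx
    a = y_bar - b * t_bar
    sse = sum((yt - a - b * t) ** 2 for t, yt in enumerate(y))
    return b, b / math.sqrt(sse / (n - 2) / sxx)


def biased_record(n_years=100, bias_per_year=0.02, noise_sd=1.0, seed=7):
    """Hypothetical station record: trendless noise plus a linear warm bias."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, noise_sd) + bias_per_year * yr for yr in range(n_years)]


if __name__ == "__main__":
    slope, t_stat = slope_and_t(biased_record())
    print(f"estimated trend={slope:.4f} per year  t={t_stat:.1f}")
```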

16. GoneWithTheWind says:

I have just discovered a statistically significant trend that predicts the end of the world. Today I got up before dawn and went down to the freeway to watch the traffic. The traffic flow increased exponentially at an unsustainable rate. So at 8 AM I ran home to warn everyone that at this rate there will be 27 billion humans by 2020. The data are clear; the trend is there for all to see.