It came out at my recent meeting that a compilation of everything wrong with time series was in order. A lot of this I’ve done before, and so, just as much for you as for me, here’s a list of some of the posts I found (many probably hiding) on this abused subject.
A Do not smooth time series, you hockey puck! Used to be scientists liked their data to be data. No more, not when you can change that data into something which is more desirable than reality, via the ever-useful trick of smoothing. Link.
B Do not calculate correlations (or anything else) after smoothing data. No, seriously: don’t. Take any two sets of data, run a classical correlation. Then smooth both series and re-run the correlation. It will increase (in absolute value). It’s magic! Link.
C Especially don’t calculate model performance on smoothed data. This is like drinking. It’s bound to make your model look better than she is. Link.
D Time series (serieses?) aren’t easy. Let’s put things back in observational, and not model-plus-parameter, terms. Hard to believe, but nobody, not even the more earnest and caring, have ever seen or experienced a model or its parameters. Yet people do experience and see actual data. So why not talk about that? Link.
E A hint about measurement error and temperature (time series) using the BEST results. Link. Here’s another anticipating that data. Link. Here’s a third using the language of predictive inference. Link.
F You’ve heard about the homogenization of temperature series. Now read all about it! A thick subject and difficult. This is the start to a five-part post. Link.
G Lots of ways to cheat using time series. Example using running means and “hurricanes.” Did he say running means? Isn’t that just another way of smoothing? Why, yes; yes, it is. Link.
H A “statistically significant increase” in temperature is scarcely exciting. One person’s “significant” increase is another’s “significant” decrease. Link.
I I can’t find a favorite, which shows the “edge” effect. If you use classical statistical measures, you are stuck with the data you have, meaning an arbitrary starting and ending point. However, just changing these a little bit, often even by one time point, can turn conclusions upside down. Michael Mann, “The First Gentleman of Climate Science” relies on this trick.
Here’s the précis on how to do it right, imagined for a univariate temperature series measured via proxy at one location.
A proxy (say O18/O16 ratio, tree rings, or whatever) is matched, via some parameterized model, to temperature in some sample where both series are known. Then a new proxy is measured where the temperature is unknown, then the range of the temperature is shown. Yes, the answer is never a single number, but a distribution.
This range must not be of the parameter, telling us the values it takes. But it must be the value of the temperature. In technical terms, the parameters are “integrated out.” The range of the temperature will always be larger than of the parameter, meaning, in real life, you will always be less certain. As is proper.
The mistakes at this level usually come in two ways: (1) stating the model point estimate and eschewing a range, (2) giving the uncertainty of the (unobservable) parameter estimate. Both mistakes produce over-certainty.
Mistake (1) is exacerbated by plotting the single point number. If the observable range were plotted (as noted above), it would be vastly harder to see if there were any patterns in the series. As is proper.
Mistake (2) plots the point estimate with plus-or-minus bars, but these (based on the parameter) are much too small.
Mistake (3) is to then do a “test”, such as “Is there a ‘statistically significant’ trend?” This means nothing because this assumes the model you have picked is perfect, which (I’m telling you) it isn’t. If you want to know if the temperature increased, just look. But don’t forget the answer depends on the starting and stopping points.
Mistake (4) is to ask whether there was a “statistically significant” change or increase. Again, this assumes a perfect model, perfect prescience. And again, if you want to know there was a change, just look!
Mistake (5) is putting a smoother over points as if the smoothed points were “real” and causative, somehow superior to the actual data. The data (with proper error bounds) is the data. This mistake ubiquitous in technical stock trading. If the model you put over your data was any good, it would be able to skillfully predict new data. Does it, hombre?
Everything said here goes for “global” averages, with bells on. The hubris of believing one can predict what the temperature will be (or was) to within a tenth of degree or what the sea-level will be (or was) within a tenth of millimeter fifty years hence is astonishing. I mean, you expect it from politicians. But from people who call themselves scientists? Boggles the mind.
Stay tuned for an example!