Do NOT smooth time series before computing forecast skill

Somebody at Steve McIntyre’s Climate Audit kindly linked to an old article of mine entitled “Do not smooth series, you hockey puck!”, a warning that Don Rickles might give.

Smoothing creates artificially high correlations between any two smoothed series. Take two randomly generated sets of numbers, pretend they are time series, and then calculate the correlation between the two. It should be close to 0 because, obviously, there is no relation between the two sets. After all, we made them up.

But start smoothing those series and then calculate the correlation between the two smoothed series. You will always find that the correlation between the two smoothed series is larger than between the non-smoothed series. Further, the more smoothing, the higher the correlation.
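A minimal sketch of this experiment, assuming NumPy. The smoother here is a centered moving average (a stand-in chosen for brevity; the post uses loess later), and the series length, window sizes and trial count are arbitrary. It averages the absolute correlation between many pairs of independent random series, smoothed by increasingly wide windows.

```python
import numpy as np

def moving_average(x, window):
    """Centered moving average; 'valid' mode drops the edge points."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def mean_abs_corr(n=200, window=1, trials=500, seed=0):
    """Average |r| between pairs of independent white-noise series,
    after smoothing both with the same moving-average window.
    window=1 leaves the series unsmoothed."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        a = moving_average(rng.standard_normal(n), window)
        b = moving_average(rng.standard_normal(n), window)
        total += abs(np.corrcoef(a, b)[0, 1])
    return total / trials

for w in (1, 5, 21, 51):
    print(w, round(mean_abs_corr(window=w), 3))
```

The raw series share nothing, yet the average size of the correlation climbs with the window, because smoothing leaves fewer effectively independent points for the correlation to average over.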

The same warning applies to series that will be used for forecast verification, like in the picture below (which happens to be the RSS satellite temperature data for the Antarctic, but the following works for any set of data).

Temperature data

The actual, observed, real data is the jagged line. The red line is an imagined forecast. Looks poor, no? The R^2 (squared correlation) between the forecast and the observations is 0.03, and the mean squared error (MSE) is 51.4.
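For reference, these are the standard definitions of the two verification measures (the post does not spell them out), written for observations o_t and forecasts f_t, t = 1, ..., n:

```latex
r = \frac{\sum_{t=1}^{n} (f_t - \bar f)(o_t - \bar o)}
         {\sqrt{\sum_{t=1}^{n} (f_t - \bar f)^2}\,\sqrt{\sum_{t=1}^{n} (o_t - \bar o)^2}},
\qquad R^2 = r^2,
\qquad \mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} (f_t - o_t)^2 .
```

Smoothing the observations means replacing o_t with a smoothed value in both formulas; the forecast f_t does not change.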

That jagged line hurts the eyes, doesn’t it? I mean, that can’t be the “real” temperature, can it? The “true” temperature must be hidden in the jagged line somewhere. How can we “find” it? By smoothing the jagginess away, of course! Smoothing will remove the peaks and valleys and leave us with a pleasing, soft line, which is not so upsetting to the aesthetic sense.

All nonsense. The real temperature is the real temperature is the real temperature, so to smooth is to create something that is not the real temperature but a departure from the real temperature. There is no earthly reason to smooth actual observations. But let’s suppose we do, and then look at the correlation and MSE verification statistics to see what happens.

We’ll use a loess smoother (it’s easy to implement), which takes a “span” parameter: larger values of the span mean more smoothing. The following picture demonstrates four different spans of increasing smoothness. You can see that the black, smoothed line becomes easier on the eye, and gets “closer” in style and shape to the red forecast line.

Smoothed Temperature data

How about the verification statistics? That’s in the next picture: on the left is the R^2 correlation, and on the right is the MSE verification measure. Each is shown as the span, or smoothing, increases.

R^2 grows from near 0 to almost 1! If you were trying to be clever and wanted to say your forecast was “highly correlated” with the observations, you need only say “We statistically smoothed the observations using the default loess smoothing parameter and found the correlation between the observations and our forecast was nearly perfect!” Of course, what you should have said is that the correlation between your smoothed series and your forecast is high. But so what? The trivial difference in wording is harmless, right? No: smoothing always increases correlation, even for obviously poor forecasts such as our example.


The effect on MSE is more complicated, because that measure is more sensitive to the wiggles in the observation/smoothed series. But in general, MSE “improves” as smoothing increases. Again, if you want to be clever, you can pick a smoothing method that gives you the minimum MSE. After all, who would question you? You’re only applying standard statistical methods that everybody else uses. If everybody else is using them, they can’t be wrong, right? Wrong.
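The whole experiment can be sketched as follows, assuming NumPy. Since the RSS data are not reproduced here, the observations are synthetic (a weak downward trend buried in heavy noise) and the straight-line “forecast” is arbitrary. The loess is a simplified local-linear version with tricube weights and no robustness iterations, not a reimplementation of any particular library’s loess.

```python
import numpy as np

def loess(y, span):
    """Local-linear loess with tricube weights, no robustness iterations.
    span is the fraction of points used in each local fit (0 < span <= 1)."""
    n = len(y)
    x = np.arange(n, dtype=float)
    k = max(int(np.ceil(span * n)), 5)              # neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[k - 1]                        # distance to k-th nearest point
        w = np.clip(1 - (d / h) ** 3, 0, None) ** 3  # tricube weights
        X = np.column_stack([np.ones(n), x - x[i]])  # local line a + b*(x - x_i)
        XtW = X.T * w                                # X^T W
        a, b = np.linalg.solve(XtW @ X, XtW @ y)     # weighted least squares
        fitted[i] = a                                # value of the local line at x_i
    return fitted

def verify(forecast, observed):
    """Return (R^2, MSE) of a forecast against a series."""
    r = np.corrcoef(forecast, observed)[0, 1]
    return r ** 2, np.mean((forecast - observed) ** 2)

rng = np.random.default_rng(1)
n = 360                                     # e.g. 30 years of monthly values
t = np.arange(n)
obs = -0.01 * t + rng.normal(0.0, 3.0, n)   # synthetic: weak trend, heavy noise
forecast = 0.5 - 0.004 * t                  # an arbitrary straight-line "forecast"

results = [("raw", *verify(forecast, obs))]
for span in (0.05, 0.1, 0.25, 0.5, 0.75, 1.0):
    results.append((span, *verify(forecast, loess(obs, span))))
for label, r2, mse in results:
    print(f"{label!s:>5}  R^2={r2:.3f}  MSE={mse:.2f}")
```

The forecast never changes; only the observations are replaced by smoothed versions of themselves, and the verification statistics “improve” anyway.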

Our moral: always listen to Don Rickles. And Happy Fourth of July!

Update: July 5th

The results here do not depend strongly on the verification measure used. I picked MSE and R^2 only because of their popularity and familiarity. Anything will work. I invite anybody to give it a try.

The correlation between any two straight lines with non-zero slope is 1 or -1, so their correlation-squared, or R^2, is exactly 1. The more smoothing you apply to a series, the closer to a straight line it gets. The “forecast” I made was already a straight line. The observations become closer and closer to a straight line the more smoothing there is. Thus, the more smoothing, the higher the correlation.
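A quick check of the straight-line claim, assuming NumPy; the intercepts and slopes below are arbitrary.

```python
import numpy as np

t = np.arange(100, dtype=float)
line_a = 3.0 + 0.2 * t       # any straight line with non-zero slope
line_b = -7.0 - 0.05 * t     # another one, sloping the other way

r = np.corrcoef(line_a, line_b)[0, 1]
print(r, r ** 2)             # r is -1 up to rounding, so R^2 is 1
```

Changing either slope (as long as it stays non-zero) or either intercept leaves R^2 at 1.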

Also, comments about frequency spectra, or about measuring signals with noise, miss the point. Techniques to deal with those subjects can be valuable, but the uncertainty inherent in them must be carried through to the eventual verification measure, something which is almost never done. We’ll talk about this more later.


  1. I wonder if maybe you haven’t just demonstrated a limitation to the usefulness of R^2 rather than the uselessness of smoothing per se. Or, a variation on the same thing, a reminder that correlation is not causation, and that if you torture the data long enough it will confess, even to a crime it did not commit.

    As I look at your graphs, I find the loess lines highly informative, especially with the lower levels of smoothing. When I look at the raw data, I imagine that I can see a downward trend, but that the variations around the trend do not appear to be strictly random, but periodic. And your loess line with smoothness = 0.05 seems to bear this out.

    I think that non-linear smoothing is important here, because it shows the limitations of linear smoothing as indicative of anything characteristic of this kind of data.

    As for computing forecasting skill, I can see your point, but only because the forecast in question is not derived from the data itself. If I were to develop a deterministic non-linear forecast from the data, the projection from the smoothed time series would be the appropriate basis for computing forecast skill. But that is not what we are discussing. We’re discussing a test of the forecasting skill of GCMs, and unless they forecast non-linear behavior of the kind embodied in the smoothing, it is opportunistic to use a smoothing that they didn’t forecast to claim higher forecasting skill.

    But that is a condemnation of an inappropriate use of smoothing, not a condemnation of smoothing per se.

  2. It’s easier to understand smoothing if you look at the effect in the frequency domain (Fourier transform) rather than in the time domain. Smoothing retains the low-frequency components of the time series at the expense of the high-frequency components, which produce the wiggles that make the graph so “ugly.” Low-frequency sine waves will always correlate well over a period that is short compared to their natural periods, and DC levels will always correlate perfectly.
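The frequency-domain view can be made concrete with the simplest smoother, assuming NumPy and a centered 5-point moving average (an illustrative choice). Its frequency response is sin(πfk) / (k sin(πf)) at frequency f in cycles per sample; the sketch filters the sum of a slow and a fast sine and checks that each comes out scaled by its own gain.

```python
import numpy as np

def ma_gain(f, k):
    """Frequency response of a centered k-point moving average at frequency f
    (cycles per sample, 0 < f <= 0.5)."""
    return np.sin(np.pi * f * k) / (k * np.sin(np.pi * f))

k = 5
t = np.arange(400)
slow = np.sin(2 * np.pi * 0.01 * t)   # period 100 samples
fast = np.sin(2 * np.pi * 0.4 * t)    # period 2.5 samples
smoothed = np.convolve(slow + fast, np.ones(k) / k, mode="valid")

# The filter is linear, so each sine is scaled by its own gain
# (interior points only; 'valid' mode drops k // 2 samples at each end).
centre = slice(k // 2, len(t) - k // 2)
print(ma_gain(0.01, k), ma_gain(0.4, k))  # about 0.996 and 0, up to rounding
```

The slow wave passes almost untouched while this particular fast wave is removed entirely, which is exactly the “retain low frequencies, lose the wiggles” behaviour described above.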

    I have to agree with you 100% that smoothing shouldn’t be used at all for the purposes of data analysis. Its only purpose is to produce “curves that guide the eye,” which should be explicitly stated.

  3. But smoothing before statistical analysis is used so extensively in climate science that, without it, the poor analysts would be unable to make scary headlines about the trend being “worse than previously thought”.

    Short of a gang of high level statisticians going around to climate scientists’ houses and breaking kneecaps, is there anything that can be done to highlight and stop these heinous practices?

  4. Wicked!! Another example of “if you do not know, you do not know.” Under-specified models are under-specified. This approach of building models of any phenomenon is clearly problematic. Your charts show that the way to hell is paved with smoothing.

  5. Hi –

    I’ve been doing econometrics for the last 25+ years in the commercial environment (i.e. I do industrial forecasting for a living), and I can only say one thing here: Dr. Briggs is absolutely correct, with no equivocation.


    Simple: a regression analysis gives you a path with a calculated likelihood. Nothing more, nothing less. The statistical tests tell you how likely this path is. If you move to smoothed time series (an early idea of mine when my equations didn’t turn out worth crap, and one that was quickly beaten out of me by someone with 40 years’ experience doing econometrics), you are losing information, not gaining it.

    Why? Because with each smoothing you are removing information from the time series. You are NOT adding it. Those outliers are reality: removing them is saying that your equation knows better. It’s a common error of people who learned methods, but not how to use them.

    When I do regression analysis on noisy data and get lousy results, I either accept the results and move on (commercial time constraints) or I get busy respecifying the models. You can do wonders with lag effects and sometimes with some repetitive dummy variables to screen out a recurring outlier, but that belongs in the equation, not in the data. Data is holy, data is king, and your customers will call you asking where the hell you got your data if it isn’t instantly recognizable as the government-issued time series that everyone in the business knows and loves. And if you dick around with the data, you lose all credibility in the business.

    The only adjustment I have ever applied to original data is seasonal adjustment (invariably X11). There is no justification for any other data manipulation. Zilch. None.

    If you want to show a trend – which everyone does – then you use the trend results of the X11 process. Best there is.

    The smoothing-before-analysis-trick of the climate “scientists” is, as far as I am concerned, at least an indication of either statistical incompetence or deliberate abuse of those poor numbers.

    And just a thought: given the nature of the claims of global warming, shouldn’t the data be seasonally adjusted? By removing a standardized, statistically generated and very widely accepted seasonal factor, any underlying trends should appear. Of course, you’d not want to use the X12 process, as the ARIMA components would tend to hide that development.

    Just thinkin’
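For readers without an X11 implementation at hand, here is a much cruder sketch of removing a seasonal factor, assuming NumPy and synthetic monthly data: subtract each calendar month’s long-run mean (the “anomaly” approach). This is not X11, which (unlike this) estimates seasonal factors that can change over time; the function name is hypothetical.

```python
import numpy as np

def monthly_anomalies(values):
    """Subtract each calendar month's mean from the series.
    values: 1-D monthly data starting in January, whole years only.
    A crude stand-in for seasonal adjustment; NOT the X11 procedure."""
    table = np.asarray(values, dtype=float).reshape(-1, 12)  # rows = years
    return (table - table.mean(axis=0)).ravel()

rng = np.random.default_rng(0)
years = 20
month = np.tile(np.arange(12), years)
seasonal = 10 * np.cos(2 * np.pi * month / 12)   # synthetic annual cycle
trend = 0.002 * np.arange(12 * years)            # small synthetic trend
series = seasonal + trend + rng.normal(0, 1, 12 * years)

anoms = monthly_anomalies(series)
```

Each calendar month of `anoms` averages to zero, so the fixed annual cycle is gone and any trend that remains is easier to see.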

  6. For the statistically under-sophisticated, here is another way to look at it. The “forecast” is a straight line, but the real data are all jumpy. If you smooth the jumpy real data enough, they form a straight line, too. Any two straight lines have perfect correlation because the angle (like two sides of a triangle) between them never changes.

    The more you smooth, the more you turn jumpy data into a straight line, which then forms a constant angle with any other straight line.

    There is no utility to doing that. The meanders of rivers and stumbling drunks are their important features. Pretending such phenomena are straight lines discards all that important information, whether you paddle a canoe or are a traffic cop checking the sobriety of a weaving driver.

  7. Matt,

    I have thought for a long time that the practice of Hansen, Jones, and others of averaging hourly, daily, then monthly temperatures to get a “global annual average temperature” time series is a form of smoothing — and a severe one. Do you concur?

  8. This is actually a symptom of a much broader problem and one not limited to climate science: Too often ad hoc validation statistics are used because they are easy to compute and refine.

    I (somewhat facetiously) blame the easy access to computers. In the 70s, there was a computer center director who was the gatekeeper of a valuable resource. In theory, he understood what was and was not allowable when using a computer to do good science. I know there’s no going back, and I don’t really want to go back, but a lot of modern work could use that kind of check. If people were forced to convince a colleague on campus that their validation statistic wasn’t invented out of whole cloth before running it, I think we’d see less of this silliness.

    Somewhat related: there’s a very entertaining essay by Forman Acton in the middle of his book Numerical Methods That (Usually) Work (published in the 70s) where he laments how often computers are used as a substitute for thinking. A lot of what he says still holds true.

  9. Pete,

    There isn’t anything necessarily wrong with producing an “average monthly temperature”, or other time period average. However you define an observable, you just have to be consistent and logical. The ideal “average monthly temperature” would sum all the observed temperatures (say, by hour) over the one-month period and then divide by the number of observations. The current method does the same thing with daily average temperatures and is a reasonable approximation.
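A small check of why the daily-average route is sound, assuming NumPy and synthetic hourly readings: when every day has the same number of readings, the mean of the daily means is exactly the mean of all readings. The approximation enters only when days have different numbers of readings, or when the “daily average” is built some other way (for example from the daily maximum and minimum alone).

```python
import numpy as np

rng = np.random.default_rng(42)
days, per_day = 30, 24
hourly = 15 + rng.normal(0, 4, size=(days, per_day))  # synthetic hourly temps

mean_of_all_hours = hourly.mean()
mean_of_daily_means = hourly.mean(axis=1).mean()
print(mean_of_all_hours, mean_of_daily_means)  # equal up to rounding
```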

  10. Matt, one of the curious metaphors that is embedded in climate statistical work – and one which I found very foreign on first encountering the literature – is the metaphor of “signal” and “noise”. As though thermometers were radio receivers with superimposed static.

    In the case at hand, climate scientists are prone to reify the smoothed version of a series as the “signal” and the variations from the smooth as static. While there is undoubtedly measurement error, the observed fluctuations do not arise from interannual measurement error and there is no basis for regarding the smooth as the true “signal”.

    The point seems very hard for many climate scientists to grasp.

    It’s easy for people used to financial series, which are also “noisy”, but in which you simply don’t see the signal-noise metaphor used.

  11. If smoothing were the only problem with climate science these days, at least it could be addressed. But from reading climate reports and articles, I get the impression that time series data are adjusted, then corrected, then homogenized, then normalized, then culled of outliers, then smoothed across time, then “imputed” to fill in gaps, then smoothed across location, and then statistically analyzed to reach a conclusion… complete with confidence interval!

  12. Joe- I read that entertaining section in Acton’s book recently. However, it seems to me that the battle has been lost. Computers have gotten so fast and so cheap that many problems that would have required thought back in the 70s can be solved just by plugging them into canned code. Each year, the boundary between what can and can’t be handled this way gets pushed out to contain more and more difficult problems. Perhaps it is progress, but I find it a bit depressing as well.

  13. This might be a version of the question Pete asked, and my apologies if it is, but what about running averages? Do these count as smoothing? I assume they would as they seem to eliminate higher frequency components in the data.

  14. Steve,

    Amen, brother. You don’t hear of brokerages tossing out the pennies to get at the dollars.


    It’s a different question, and, yes, running means are smoothers and should never be used unless one wants to predict future data with the belief the data is best modeled by that smoother. There does exist the possibility of using such a smoother to estimate a signal that is known to be measured with noise; however, the uncertainty in that estimation must be carried through with any eventual analysis (like verification). You must not make the guess (estimate) and then ignore the uncertainty in that estimate, though that is what is usually done. And as Steve pointed out, any noise is usually small.
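A minimal sketch of carrying the uncertainty along, assuming NumPy and a hypothetical helper: if the data are a signal plus independent noise of known standard deviation sigma, each k-point running mean has standard error sigma / sqrt(k), and that error should travel with the estimate rather than be dropped.

```python
import numpy as np

def running_mean_with_se(y, k, sigma):
    """Centered k-point running mean of y, plus its standard error.

    Hypothetical helper. Assumes y = signal + independent noise with known
    standard deviation sigma. The SE describes the noise only; it ignores any
    bias from the signal changing within a window. 'valid' mode drops the
    k - 1 edge points."""
    estimate = np.convolve(y, np.ones(k) / k, mode="valid")
    se = np.full(estimate.shape, sigma / np.sqrt(k))
    return estimate, se

rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0, 3, 300))
y = signal + rng.normal(0, 0.5, 300)
est, se = running_mean_with_se(y, k=25, sigma=0.5)
lower, upper = est - 2 * se, est + 2 * se   # rough ~95% band, point by point
```

Neighbouring estimates share most of their data, so their errors are strongly correlated; a verification statistic computed against `est` would need that uncertainty propagated (for instance by simulation), not ignored.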


    And people wonder why statisticians aren’t trusted.

  15. If I apply Rahmstorf’s methods to my brokerage account, I suspect my balance will improve considerably. If I cash out, do you think the broker will pay me on the basis of my “smoothed” balance?

  16. A verification or goodness-of-fit statistic is defined to measure the discrepancy between observed data and fitted (predicted) values calculated from an estimated model. It puzzles me why one would calculate the correlation or MSE between smoothed values (say, calculated from a LOWESS regression) and linearly predicted values (calculated from a linear regression). I can guess some reasons for doing so; I just can’t come up with a good one. I would greatly appreciate it if someone could point me to some peer-reviewed papers where such a calculation was used to make inferences.

  17. JH, old pal. Happens all the time. Go to the Climate Audit link and follow the discussion of the paper mentioned therein.

    This smoothing-before-verification happens in meteorology on a daily basis. Think of field forecasts, which are 3D grids. Surface stations are sparse and upper-level radiosonde sites sparser. What happens is that the observations are interpolated (smoothed) to the same grid as the forecast and then the two are compared. I wrote a proceedings paper many years ago saying this is naughty.

  18. Matt:
    It seems to me that smoothing is a “heads you lose, tails I win” proposition in that you not only may find a relationship that is an artifact of the smoothing technique but also may be losing information, and in fact may be eliminating some viable explanatory variable that is cyclical in nature. Simply put, if you only look at annual average temperatures, you will ignore what is happening on, say, a seasonal or even monthly basis. This is exactly what I think happened in a recent discussion where the initial comment attributed Mountain Pine Beetle infestations to global warming. When you actually look at the temperature record on a monthly basis for the affected regions in the Greater Yellowstone area, it turns out that there is (a) no GW in the affected areas but (b) an appreciable and unaddressed increase in March temperatures (see the discussion here: ). The underlying AGW hypothesis leads to what is essentially a wrong-unit-of-analysis problem: annual temperatures instead of some other unit of aggregation. Is this the temporal equivalent of your spatial smoothing example?

  19. An interesting piece. In response to an earlier comment you seemed to suggest that taking a consistent mean of the data was not equivalent to smoothing, and yet your curve with span = 0.05 looks a lot like the annual mean of the data. It would probably be very close if you then did a consistent data reduction on the smoothed data by sampling, say, once every ten points. And the annual mean of various temperature time series has been used to generate headlines and quite a lot of heat!
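The commenter’s point can be checked exactly, assuming NumPy and synthetic monthly data: calendar-year means are nothing but a 12-month running mean sampled at every 12th point, i.e. a smoother followed by decimation.

```python
import numpy as np

rng = np.random.default_rng(7)
monthly = rng.normal(0, 1, 12 * 10)   # ten years of synthetic monthly data

annual_means = monthly.reshape(-1, 12).mean(axis=1)
ma12 = np.convolve(monthly, np.ones(12) / 12, mode="valid")  # 12-month running mean

# The calendar-year means are the running mean sampled every 12th point.
print(np.allclose(annual_means, ma12[::12]))   # True
```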

  20. What’s been shown here is that classical linear regression and Pearson’s correlation coefficient are inadequate time-series analysis tools. Bona fide analysis does not rely upon them, using cross-spectral coherence instead. Proper smoothing (without ripples in the amplitude response) does not create any spurious signal components; it simply removes those that may be of little interest to the study. There’s no inherent problem there. The problem is that “climate scientists” are dealing with time series using jury-rigged filters and inappropriate analysis tools to sell a belief system that has little to do with real-world observations.

  21. No, John S. No, no, no. The problem is that people are replacing actual observations with non-observations.

  22. In the case of dealing with the “endpoint problem” of symmetric filtering, non-observations are obviously being introduced in place of data. But that is not the case when frequency discrimination (low-pass or high-pass) is used to separate out signal components of different origin, e.g., low-frequency tides and high-frequency sea swell. Both are clearly components of the ocean surface elevation observed by a wave staff. Properly designed filters do not create any spurious signal components, as might a simple moving average with negative side-bands in its frequency response. Are you really saying no to proper filtering intelligently applied?
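The “negative side-bands” of a simple moving average can be seen directly, assuming NumPy and an illustrative 5-point centered mean. At frequencies where its frequency response is negative, a sine wave comes out not just shrunk but upside down.

```python
import numpy as np

k = 5
t = np.arange(200)
f = 0.3                                  # cycles per sample (period ~3.3 samples)
x = np.sin(2 * np.pi * f * t)
y = np.convolve(x, np.ones(k) / k, mode="valid")   # centered 5-point mean

gain = np.sin(np.pi * f * k) / (k * np.sin(np.pi * f))  # about -0.247
interior = x[k // 2 : len(x) - k // 2]   # input samples aligned with the output
print(gain, np.corrcoef(y, interior)[0, 1])  # negative gain: output is inverted
```

An inverted component is a spurious feature of the filter, not of the data, which is why filters designed without such side-lobes are preferred when filtering is used at all.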

  23. John S. Not quite yet. There is just no reason to smooth if you are not trying to estimate some underlying truth. If the observations you are taking are truth, then smoothing destroys it and misleads you. It does not matter if the method you use to smooth is sophisticated and pleasing, it is still destructive.

    See also my next post on models.

  24. Dr (?) Aaron, and John C

    Those wiggles would tell you a lot. But before that, you have identified the key idea: smoothing can be useful for forecasting, for making statements about new data.

    Any smoothing you apply is an implied model of the data. It says you believe that there is some underlying process of importance and another of non-importance (or one that is unidentifiable). Thus, for tides, you have the physics of the moon and its cycles, and the unknown frothing caused by wind, topography, etc. The only way to tell whether your model of the moon’s influence is any good is to see whether it explains not just the data you have in hand, but new data that will arise.

    It matters not whether the model is linear or nonlinear or whatever. Or whether you have only an approximation to the moon’s dynamics or a full, partial differential set of equations of motion.

    Again: you only smooth if you want to forecast. Or in the rarer cases where you believe you have measured your data with error and want to estimate the true signal. If you have not measured with error, or that error is negligible, and if you are not forecasting, then you are replacing your known data with fictional data, i.e., data that is not the real data. Why would you want to do this? Because the smoothed data is more pleasing to the eye? Because it looks like it is going to be a forecast? The latter might be true, but this idea is hardly ever followed up: when is the last time you saw somebody’s smoothing actually used to forecast, and that forecast verified? Outside of business/financial data, I have not seen it.
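A minimal sketch of treating a smoother as a model and verifying it on new data, assuming NumPy and synthetic observations. The “smoother” here is the simplest possible one, a straight-line trend fitted to the first 180 points only; its forecast is then scored against the raw held-out observations (never a smoothed version of them) and compared with a naive baseline, the training-period mean.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_train = 240, 180
t = np.arange(n)
obs = 0.02 * t + rng.normal(0, 1, n)        # synthetic observations

# Fit the "smooth" (here a straight line) on the training period only.
slope, intercept = np.polyfit(t[:n_train], obs[:n_train], 1)
forecast = intercept + slope * t[n_train:]

# Verify against the raw held-out observations.
actual = obs[n_train:]
mse_model = np.mean((forecast - actual) ** 2)
mse_naive = np.mean((obs[:n_train].mean() - actual) ** 2)  # baseline: training mean
print(mse_model, mse_naive)
```

Only a comparison like this, on data the smoother never saw, says anything about whether the implied model is useful.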

  25. Clearly raw data is more valuable than smoothed data. As for why you might want to smooth, you suggested:

    1) Reducing measurement error
    2) For visually appealing graphics
    3) For forecasting

    Let me add one other,

    4) To whiten the noise in your data.

    In this case the power spectrum of the filter should be roughly the inverse of the power spectrum of the noise. If the noise is high frequency, this may mean a low-pass filter; if the noise is low frequency, a high-pass filter; if the noise is strong but confined to a narrow bandwidth, a notch filter. Of course this assumes that you have some idea of what the noise actually looks like. And there are of course optimal algorithms for this, such as Kalman and Wiener filters. All such algorithms need to consider things like phase delay, distortion (amplitude and phase), numerical issues, and perhaps causality (if it is a real-time filter).
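A minimal whitening sketch for the low-frequency (“red”) noise case, assuming NumPy, synthetic AR(1) noise, and a known coefficient phi (in practice phi would have to be estimated). The filter x[t] - phi * x[t-1] is a first-order high-pass whose squared gain, 1 - 2 phi cos(w) + phi^2, is exactly the reciprocal of the AR(1) spectral shape, so its output is white.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

rng = np.random.default_rng(11)
n, phi = 5000, 0.8
e = rng.normal(0, 1, n)
red = np.empty(n)                 # AR(1) noise: low frequencies dominate
red[0] = e[0]
for i in range(1, n):
    red[i] = phi * red[i - 1] + e[i]

# Prewhitening filter: inverts the AR(1) recursion (assumes phi is known).
white = red[1:] - phi * red[:-1]
print(lag1_autocorr(red), lag1_autocorr(white))  # roughly 0.8 and 0
```

Note that this filter is not a smoother at all: it suppresses low frequencies, because that is where this noise lives.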

  26. “Pray tell, what would the wiggles tell us if we’re trying to forecast the tides?”

    Storm surge?

  27. I disagree that smoothing implies a model, that known data is being replaced with fictional data, or that it’s motivated by the desire to hype a forecast. Smoothing is nothing more (or less) than low-pass filtering that attenuates certain high-frequency signal components. It requires no model, often being done on the basis of rigorous physical considerations to discriminate between superimposed processes in distinct frequency ranges. No signal components are replaced with fictional ones. The effect of any linear filter upon the correlation (or spectral) properties of the filtered signal is analytically well established through the corresponding impulse (or frequency) response function. Forecasting may or may not be the eventual goal, but it is not the central issue in this matter. Perhaps in dealing with aggregate statistics that arise from independent trials of drugs or samples of public opinion, data smoothing may be inappropriate. But in geophysical time-series analysis, it often proves indispensable in identifying signals of practical interest. Apparently you have not seen such legitimate applications.

  28. Storm surges are hardly high-frequency wiggles that could be removed by smoothing while leaving the tide signal intact. Their spectrum overlaps that of the tides.

    What seems to be forgotten in this senseless argument against smoothing, is that instrument systems perform their own irreversible smoothing, whereas digital filters do not destroy the raw data. That’s all that really matters.
