Sep 06 2008

Do not smooth time series, you hockey puck!

Published by Briggs at 7:47 am under Bad Stats, Climatology

The advice which forms the title of this post would be how Don Rickles, if he were a statistician, would explain how not to conduct time series analysis. Judging by the methods I regularly see applied to data of this sort, Don’s rebuke is sorely needed.

The advice is particularly relevant now because there is a new hockey stick controversy brewing. Mann and others have published a new study melding together lots of data and they claim to have again shown that the here and now is hotter than the then and there. Go to climateaudit.org and read all about it. I can’t do a better job than Steve, so I won’t try. What I can do is to show you what not to do. I’m going to shout it, too, because I want to be sure you hear.

Mann includes at this site a large number of temperature proxy data series. Here is one of them called wy026.ppd (I just grabbed one out of the bunch). Here is the picture of this data:
[Figure: wy026.ppd proxy series]

The various black lines are the actual data! The red line is a 10-year running mean smoother! I will call the black data the real data, and I will call the smoothed data the fictional data. Mann used a “low pass filter” different than the running mean to produce his fictional data, but a smoother is a smoother and what I’m about to say changes not one whit depending on what smoother you use.

Now I’m going to tell you the great truth of time series analysis. Ready? Unless the data is measured with error, you never, ever, for no reason, under no threat, SMOOTH the series! And if for some bizarre reason you do smooth it, you absolutely on pain of death do NOT use the smoothed series as input for other analyses! If the data is measured with error, you might attempt to model it (which means smooth it) in an attempt to estimate the measurement error, but even in these rare cases you have to have an outside (the learned word is “exogenous”) estimate of that error, that is, one not based on your current data.

If, in a moment of insanity, you do smooth time series data and you do use it as input to other analyses, you dramatically increase the probability of fooling yourself! This is because smoothing induces spurious signals: signals that look real to other analytical methods. No matter what, you will be too certain of your final results! Mann et al. first dramatically smoothed their series, then analyzed them separately. Regardless of whether their thesis is true (whether there really is a dramatic increase in temperature lately), it is guaranteed that they are now too certain of their conclusion.
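
To make the claim about spurious signals concrete, here is a minimal sketch, not Mann's filter and not a computation from this post. It assumes two independent white-noise series of length n=100, a k=10 running mean, and 2000 repetitions (all illustrative choices), and compares the typical size of the correlation between the two series before and after smoothing.

```python
import numpy as np

def running_mean(x, k):
    """k-point running mean; only fully covered windows are returned."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

def mean_abs_corr(n=100, k=10, trials=2000, seed=0):
    """Average |correlation| between two independent N(0,1) series,
    before and after smoothing each with a k-point running mean.
    n, k, trials and seed are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    raw, smooth = [], []
    for _ in range(trials):
        x0 = rng.standard_normal(n)
        x1 = rng.standard_normal(n)   # generated independently of x0
        raw.append(abs(np.corrcoef(x0, x1)[0, 1]))
        s0, s1 = running_mean(x0, k), running_mean(x1, k)
        smooth.append(abs(np.corrcoef(s0, s1)[0, 1]))
    return float(np.mean(raw)), float(np.mean(smooth))

raw_corr, smooth_corr = mean_abs_corr()
print(f"mean |r| on real data:     {raw_corr:.3f}")
print(f"mean |r| on smoothed data: {smooth_corr:.3f}")
```

The two series have nothing to do with each other by construction, so any correlation that grows after smoothing was put there by the smoother.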

There. Sorry for shouting, but I just had to get this off my chest.

Now for some specifics, in no particular order.

  • A probability model should be used for only one thing: to quantify the uncertainty of data not yet seen. I go on and on and on about this because this simple fact, for reasons God only knows, is difficult to remember.
  • The corollary to this truth is the data in a time series analysis is the data. This tautology is there to make you think. The data is the data! The data is not some model of it. The real, actual data is the real, actual data. There is no secret, hidden “underlying process” that you can tease out with some statistical method, and which will show you the “genuine data”. We already know the data and there it is. We do not smooth it to tell us what it “really is” because we already know what it “really is.”
  • Thus, there are only two reasons (excepting measurement error) to ever model time series data:
    1. To associate the time series with external factors. This is the standard paradigm for 99% of all statistical analysis. Take several variables and try to quantify their correlation, etc., but only with a mind to do the next step.
    2. To predict future data. We do not need to predict the data we already have. Let me repeat that for ease of memorization: Notice that we do not need to predict the data we already have. We can only predict what we do not know, which is future data. Thus, we do not need to predict the tree ring proxy data because we already know it.
  • The tree ring data is not temperature (say that out loud). This is why it is called a proxy. Is it a perfect proxy? Was that last question a rhetorical one? Was that one, too? Because it is a proxy, the uncertainty of its ability to predict temperature must be taken into account in the final results. Did Mann do this? And just what is a rhetorical question?
  • There are hundreds of time series analysis methods, most with the purpose of trying to understand the uncertainty of the process so that future data can be predicted, and the uncertainty of those predictions can be quantified (this is a huge area of study in, for example, financial markets, for good reason). This is a legitimate use of smoothing and modeling.
  • We certainly should model the relationship of the proxy and temperature, taking into account the changing nature of the proxy through time, the differing physical processes that will cause the proxy to change regardless of temperature or how temperature exacerbates or quashes them, and on and on. But we should not stop, as everybody has stopped, with saying something about the parameters of the probability models used to quantify these relationships. Doing so makes us, once again, far too certain of the final results. We do not care how the proxy predicts the mean temperature, we do care how the proxy predicts temperature.
  • We do not need a statistical test to say whether a particular time series has increased since some time point. Why? If you do not know, go back and read these points from the beginning. It’s because all we have to do is look at the data: if it has increased, we are allowed to say “It increased.” If it did not increase or it decreased, then we are not allowed to say “It increased.” It really is as simple as that.
  • You will now say to me “OK Mr Smarty Pants. What if we had several different time series from different locations? How can we tell if there is a general increase across all of them? We certainly need statistics and p-values and Monte Carol routines to tell us that they increased or that the ‘null hypothesis’ of no increase is true.” First, nobody has called me “Mr Smarty Pants” for a long time, so you’d better watch your language. Second, weren’t you paying attention? If you want to say that 52 out of 413 time series increased since some time point, then just go and look at the time series and count! If 52 out of 413 time series increased then you can say “52 out of 413 time series increased.” If more or less than 52 out of 413 time series increased, then you cannot say that “52 out of 413 time series increased.” Well, you can say it, but you would be lying. There is absolutely no need whatsoever to chatter about null hypotheses etc.
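
The "just count" advice in the last point can be sketched in a few lines. The helper name `count_increased` and the toy numbers are hypothetical, and "increased since some time point" is taken here to mean "the latest value is larger than the value at that time point", which is one reading among several.

```python
def count_increased(series_list, start):
    """Count the series whose last value exceeds the value at index `start`.
    This definition of "increased" is an assumption made for illustration."""
    return sum(1 for s in series_list if s[-1] > s[start])

# Toy data standing in for real series (made up for this example).
series = [
    [1.0, 2.0, 3.0],   # ends higher than it started
    [3.0, 2.0, 1.0],   # ends lower
    [1.0, 5.0, 1.5],   # wanders, but ends higher
]

n_up = count_increased(series, start=0)
print(f"{n_up} out of {len(series)} time series increased")
```

No model, no p-value: the statement printed is a plain fact about the data in hand.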

If the points—it really is just one point—I am making seem tedious to you, then I will have succeeded. The only fair way to talk about past, known data in statistics is just by looking at it. It is true that looking at massive data sets is difficult and still somewhat of an art. But looking is looking and it’s utterly evenhanded. If you want to say how your data was related with other data, then again, all you have to do is look.

The only reason to create a statistical model is to predict data you have not seen. In the case of the proxy/temperature data, we have the proxies but we do not have temperature, so we can certainly use a probability model to quantify our uncertainty in the unseen temperatures. But we can only create these models when we have simultaneous measures of the proxies and temperature. After these models are created, we then go back to where we do not have temperature and we can predict it (remembering to predict not its mean but the actual values; you also have to take into account how the temperature/proxy relationship might have been different in the past, and how the other conditions extant would have modified this relationship, and on and on).
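
The difference between predicting the mean and predicting the actual values can be illustrated with a small sketch. It assumes synthetic data, a straight-line relationship with normal errors, and the textbook least-squares formulas. None of this is the model used in any proxy study, and the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
proxy = rng.uniform(0.0, 2.0, n)                      # synthetic "proxy" values
temp = 10.0 + 1.5 * proxy + rng.normal(0.0, 1.0, n)   # synthetic "temperature"

# Ordinary least squares fit: temp ~ b0 + b1 * proxy
b1, b0 = np.polyfit(proxy, temp, 1)
resid = temp - (b0 + b1 * proxy)
s2 = resid @ resid / (n - 2)                # residual variance estimate
sxx = np.sum((proxy - proxy.mean()) ** 2)

x_new = 1.0                                 # a proxy value with no temperature reading
lever = 1.0 / n + (x_new - proxy.mean()) ** 2 / sxx
se_mean = np.sqrt(s2 * lever)               # uncertainty in the MEAN temperature
se_pred = np.sqrt(s2 * (1.0 + lever))       # uncertainty in an ACTUAL temperature

print(f"predicted temperature at proxy={x_new}: {b0 + b1 * x_new:.2f}")
print(f"standard error of the mean:        {se_mean:.2f}")
print(f"standard error of an actual value: {se_pred:.2f}")
```

Reporting only the first standard error is what "saying something about the parameters" amounts to; the second is the one that applies to a temperature nobody measured.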

What you cannot, or should not, do is to first model/smooth the proxy data to produce fictional data and then try to model the fictional data and temperature. This trick will always, simply always, make you too certain of yourself and will lead you astray. Notice how the red fictional data looks a hell of a lot more structured than the real data and you’ll get the idea.

Next step is to start playing with the proxy data itself and see what there is to see. As soon as I am granted my wish to have each day filled with 48 hours, I’ll be able to do it.

Thanks to Gabe Thornhill of Thornhill Securities for reminding me to write about this.

81 Responses to “Do not smooth time series, you hockey puck!”

  1. Doubter says:

    underlying process: I didn’t say anything about this for either the climate or the stock market, but since you brought it up, I will concede that there probably is one in both cases. But how much does that actually help us to predict the future?

    There are several levels of science. There’s the Newtonian mechanics level, which is pretty straightforward, even deterministic up to Heisenberg, or until you get into n-body problems. Then there is statistical mechanics, which can deal with huge numbers of particles pretty well, and not too hard until you get into turbulence and so on. Note that it says nothing about predicting the paths of individual particles, so it is in a way less than Newtonian and in a way it is more. The third level is where you have complex complicated systems with many individuals and you want to predict paths at the individual level. Now it’s getting hard. Or fun, if you’re masochistic.

    Can we predict the path evolution will take? We know the fundamental principles pretty well, but we don’t know future events, random or not. And even if we did, it all depends on lots of little chance events. There is an element here that evolution will do what it wants just because it feels like it, if you don’t mind putting it that way.

    Climate is (maybe) like that. It is certainly not deterministic. Even if we knew the present conditions exactly, we could not predict the weather very far into the future, even with perfect computational ability. Edward Lorenz (famous for chaos theory and the butterfly effect) was on the right track.

    What is the physical model behind climate theory? It is certainly not linear. How much statistics does it take to prove that? Do we know enough of the physical model to say for sure it is not a random walk? Throwing everything that does not fit your preferred belief-system model into the category “weather” is not scientific.

    We find ourselves in the situation described above. We might be able to describe the space of possible outcomes, but we cannot know which one of those will be the actual outcome.

    In statistics, you can fit data points as closely as you want, but this is not a good thing, because you want to fit the actual system, not the particular data points.
    In fitting climate models to past weather, you can’t know if you are overfitting.

  2. George E. Smith says:

    Hurrah! Finally some statistical sanity.

    If one assumes that the point to point data changes are not noise, but are actually real believable data, then one may argue that each data point is actually supposed to be different, because the system under study made them different.
    So any kind of low pass filtering algorithm, or for that matter any kind of filtering algorithm, can only THROW AWAY INFORMATION.

    You do not add information by filtering; but you can remove noise by filtering, and noise in climate data is simply measurement errors of various kinds.

    If you do enough low pass filtering, you eventually end up with no variation at all; so why not simply keep a running average of ALL of the data values you have to date, and simply report the single number value of that average; it is probably at least as meaningful as your five year running average is. Remember that the first point of a data set must vanish from a five year running average, as will the last point, so you throw away history as well as actual data.

    Actually, the same concept can be applied to the data gathered from different locations.

    On any northern summer day, one can measure temperatures on earth (surface) that can range as high as +60C, or as low as -90C, and every temperature between those extremes can be found somewhere. Those temperatures are different, because they are supposed to be different; so why attempt to average them in any kind of way, to get some number (+15C say) that wasn’t measured at any actual place at any actual time, and has no scientific significance to it whatsoever.

    That 150 C range of temperatures also covers a wide variety of terrains, and ground cover, even deep oceans, and the thermal energy flows in each of those different environments relate to the local temperature in totally different ways, so there is no relationship between the “average” global temperature (even if it was possible to measure such a number) and the energy balance of the planet.

    You can learn about as much by simply averaging all the telephone numbers in the Manhattan Telephone Directory, to come up with a mean telephone number. The numbers in that book are all supposed to be different, because each relates to a single telephone somewhere, and the average of all the numbers is of no interest to anyone, unless it happens to be your phone number.

    Signals can be improved by filtering out real noise; but nothing extra is learned by throwing away much of the real information contained in the signal.

  3. George E. Smith says:

    One slight correction; since we are being pedantic.

    The “King’s English” is a technical term describing the approved language of the Royal Court (British anyway), and has nothing to do with the occupier of that position.

    So there is no such thing as “The Queen’s English.”

    George

  4. bill r says:

    George,

    Shh! You’ll wake the children. Next you’ll be telling them that parameters and probability don’t exist either.

  5. Briggs says:

    George, Please ask the Queen to forgive me.

    TCO, See today’s post for a “scholarly” rebuttal to those who might claim that smoothing does not increase certainty.

    Bill r, parameters do not exist and neither does probability. Santa Claus, however, is an entirely different story.

  6. bill r says:

    Briggs,

    Hah, I like the Santa Math! I’m familiar with the writings of Bruno D., too.

    How about a discussion on the difference between intensive and extensive measurements with respect to linear operations on temperature? When I try that one on engineers, I usually get funny looks. Some of them average clock times, too.

  7. Patrick Hadley says:

    Does George E Smith not overstate the case when he criticises the use of global mean temperature?

    Imagine Starbucks head office receiving data every day from all its coffee shops with the gross takings, and the details of the sales of each item. At the end of each month it could work out the average sales per shop, the anomaly of this from the long term average for that shop at that time of year, and at the end of the year it could produce a time series to see what the trend is for this anomaly when averaged over the whole company. Now this trend is very far from telling the whole story about the health of the business, but that does not mean that it would not contain some useful information.

    Businesses do regularly publish their like-for-like sales figures and these figures are seized upon by analysts because they do give a picture of whether or not the business is growing. I know this analogy is not perfect, but without a GMT how would we know whether the world is warming or not?

    I don’t agree with him about “The Queen’s English” either. I remember when I was at school in England in 1960, using a text book that was called “The Queen’s English”, and I have just traced a book written in 1856 called “A Plea for the Queen’s English” by Henry Alford.

  8. joy says:

    “Signals can be improved by filtering out real noise; but nothing extra is learned by throwing away much of the real information contained in the signal.”

    TCO

    I put your words through a language sharpness filter to find the sharp bits and never was a truer word spoken in jest. This is how the graph looks: all words out of quotes are mine. It’s poetic.
    “Just a word to the wise…”
    “scientists need to put their own babies on the chopping block and try to kill them.”good job you weren’t talking to the foolish!
    I think the whole jumping up and down on him for a remark is silly. Although opportunistic. I make black humor all the time. Hate PC speech codes or stifling of good fun. Let’s not use the namby pampby tactics of our opponents. It justifies them and makes us ball-less like them.
    Capisce?
    “Using the actors for an inference is silly…’Capisce’?”
    “it is not clear as a ‘scolarly’ rebuttle”
    “I am actually concerned about using smoothed data for instance in a significance testbut I just want it better spelled out…”!!!
    “with math and references and such, not an on high put down”
    “and I love Sarah Palin”
    “I am very sympathetic to a concern about smoothing data…”
    “”but WM Briggs just has an on high post…”
    “If it’s such a no brainer”
    “an obvious thingk”
    “Why doesn’t WM Briggs get off a scholarly comment”?
    “Mann did at least concern himself somewhat with the issue”!!!
    “perhaps not optimally” (Understatement)
    “side just cackling and egging each other on”
    “Hu don’t know about your Math…. But appreciate your approach”
    “well ‘RIGHT’ it up as a comment then…”
    “leave out the “you hockey puck”” Who’s team are you in?
    “it is well know…(fill in”
    “fill in”
    “fill in”
    “fill in”
    “fill in”
    “given the complexity of his method.
    “BTW I’m particularly concerned about using smoothed data…””Zorita has been pretty clear about this ’ inuTitavely’”?
    “of course I’m not sure the mathematical extent..” evidently
    “that’s for a stats guy” Yep
    “heck, maybe there is even some sort ofesoteric argument”!!!
    “but lets “T” things up and have that argument clearly”

    “WM thanks for the kind words”
    “but this thing is out there now”
    “Jeff’s comment is silly and is what makes the rest of us look silly”!! How silly? Could we quantify?
    “I’m on all that and two steps past you”…and I’m half way back!
    “…more clearly nail it and in the literature. Just “stking”won’t cut it”…
    “so I’m actually sympathetic to the kvetch” did you say VET!!!
    “perhaps it really would be good to make a’scolarly’ comment…”!! couldn’t agree more.
    “peace”
    Amen!

  9. bill r says:

    Patrick,

    Starbucks and the global climate are not commensurate. Starbucks pools and averages because all the dollars get mixed into the same pot, where they are exchangeable. (the dollars from Seattle spend just the same as the dollars from, say, Miami). George’s post points out that this is not the case with climate/temperature measurements. The temperature in Seattle does not get pooled with the temperature in Miami in the physical world, as it apparently does in the climate models.

    When I pool Starbucks (net) profits, I am essentially talking about a big pile of cash that “exists” somewhere. When I average temperature values taken at different locations, the “average” is a fantasy (an estimate of a parameter) that doesn’t describe any particular location.

  10. TCO says:

    Air Force: Thanks.

    Joy: You lookin’ mighty fine in those comments…

    BillR: I think that there is obviously more information and potentially some very interesting information in the change in temp distribution (by latitude, by land versus water, for examples) rather than just the average versus time. All that said, looking at the average versus time is a simple way to get started on analysis.

    Pat: You go, girl.

  11. # Chris Colose wrote on 07 Sep 2008 at 3:19 pm

    “(I’m still getting my edumucation!!! Give me some time). It does interest me when blogs and other such places challenge the peer-review, and very rarely do I find the claims end up holding closer scrutiny.”

    Dear Chris,

    If you continue your education, eventually you will find that there is no such thing as a perfect paper and that many faulty papers pass peer review.

    However, I would not look to the claims of error that come out in the popular press or the wilder blogs.

  12. Joy says:

    TCO:
    I make you right there! I missed the quotation marks out of this comment, one of your more poignant. I would hate to take the credit for such wisdom.

    “I think the whole jumping up and down on him for a remark is silly. Although opportunistic. I make black humor all the time. Hate PC speech codes or stifling of good fun. Let’s not use the namby pampby tactics of our opponents.” It justifies them and makes us ball-less like them.
    Capisce?”

  13. TCO says:

    It’s ok, honey.

  14. Patrick Hadley says:

    Bill R, I did say that I knew my Starbucks analogy was not perfect, but I think that it is better than you suggest. Let me see if I can improve it.

    Imagine if a sample of Starbucks coffee shops gave each customer a feedback questionnaire and every day these shops reported back to head office the average score of customer satisfaction. Adding up these scores would not be like adding up the dollars and cents into a big pile of cash that “exists” and presumably you would describe the “average” as a fantasy (an estimate of a parameter) that does not describe any particular location.

    But surely Starbucks would be getting useful information if they found that over a time there was a trend in this average?

  15. JH says:

    This is JH, not JM…wanna be J-L Picard though.

    Smoothing techniques can be used to not only filter out the noise but also remove seasonality to make the long term trend clearer. If one is to look at the trend, I think smoothing is a natural first step. They do sometimes pick up trends undetectable by our eyes.

    You know, I will soon need bifocals.

    The measurement errors can yield distorted modeling results, for example, the choice of best time-series model in terms of certain criteria. I suspect the trend probably won’t be affected. Of course, as you pointed out, it will depend on the (structure) relationship between the proxy/surrogate and true variable. Will have to do some study to be sure.

  16. rdd says:

    I don’t see the problem with smoothing data over time and extrapolating trends. It worked well in the 1920s and 1990s for the stock market and 1945 to 2005 for predicting house prices. All of this variability stuff is vastly over-rated in my humble opinion.

    Just because the relationship between tree rings and temperature has some fluctuations in it and there appear to be fluctuations in the tree ring thicknesses over time doesn’t mean that you can’t get equally good relationships as the geniuses on Wall Street have been able to develop. Mathematical models ARE the real world after all. Everything else is an illusion.

  17. Briggs says:

    Rdd,

    Ah, but that kind of smoothing is for a different purpose. That is standard time series analysis with the goal of trying to predict future values of the same observable.

    The smoothing I am talking about is where one series (or more) is smoothed and then input into another analysis. The results will be too certain.

    We disagree about the “variability stuff.” I argue that most people underestimate variability, which is another way of saying they are too confident.

  18. JM says:

    Briggs

    I’m really sort of confounded here (stats is not my forte), but I can’t really see your point.

    You start with two processes which I’ll rewrite as follows (if I mess this up please let me know):

    X0 = ax + b + w(v,s0)
    X1 = ax + b + w(v,s1)

    where a = slope (same for both), b = offset (same for both), w = white noise process (same for both), v = variance (same for both), but ….

    s0 = seed for process 0
    s1 = seed for process 1

    The important thing here IMHO is that both X0 and X1 are identical *except* for the seed values.

    You then generate your input data by letting both processes run for a while, but because the seeds are different the output is different. To the eye, they look completely uncorrelated (provided the amplitude of w() is large enough to overwhelm, or at least disguise, the linear process “ax + b”).

    At this point you say X0 “has nothing to do with” X1. (I don’t agree, because stochastically they are identical, but I’ll pass on that point for a second.)

    You then start doing correlations and smoothing until you get to the following situation

    X0-smoothed = ax + b
    X1-smoothed = ax + b

    i.e., you’ve removed all the variation which was originally due solely to the differing seeds for the white noise process w().

    So what? You started with the same thing, did two runs to get some output then started smoothing to remove the random element and ended up with the same thing.

    Why are you surprised?

    Now second question:- If your purpose is to uncover (and validate or deny) the signal of “ax + b”, how could you possibly do this without removing w() by smoothing? Isn’t that the point of the whole exercise?

    Now if the white noise process w() was what you were interested in, I could understand your complaint, it has been eliminated by smoothing.

    But if the signal of the physical process you are trying to detect (ax + b) is the item of interest (which it is in climate matters), why are you concerned that the noise (weather) is removed by smoothing?

    Sorry, I just don’t get it.

  19. Briggs says:

    JM,

    Your equations don’t represent what I did. They should start

    X0_i = epsilon_0i
    X1_i = epsilon_1i

    where epsilon_ji ~ N(0,1). The “a” and “b” in your equations should be 0.

    “Stochastically” the same? Well, let’s say that our uncertainty in X0_i is described by a normal distribution with parameters 0 and 1, and knowledge of X0_j (where j does not equal i) or any X1_k (for all k) does not change this. Same for X1_i.

    Your second set of equations aren’t right either (well, aren’t right for what I did).

    S0_i = a X0_(i-1) + a X0_(i-2) + … + a X0_(i-k)

    where a = 1/k, and the same thing for S1_i. This holds when we do the running means smoother. For the low pass, it’s slightly different.

    Thanks.
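
A worked consequence of the running-mean definition in the reply above, under its stated assumption that the inputs are independent N(0,1) draws. This is standard moving-average algebra rather than anything computed in the original exchange, and the lag symbol h is introduced here for illustration.

```latex
\begin{align*}
S_i &= \frac{1}{k}\sum_{j=1}^{k} X_{i-j},
  \qquad X_i \sim N(0,1)\ \text{independent}, \\
\operatorname{Var}(S_i) &= \frac{1}{k^2}\sum_{j=1}^{k}\operatorname{Var}(X_{i-j}) = \frac{1}{k}, \\
\operatorname{Cov}(S_i, S_{i+h}) &= \frac{k-|h|}{k^2}
  \quad\text{for } |h| < k
  \quad(\text{the two windows share } k-|h| \text{ terms}), \\
\operatorname{Corr}(S_i, S_{i+h}) &= 1 - \frac{|h|}{k}
  \quad\text{for } |h| < k, \qquad 0 \text{ otherwise.}
\end{align*}
```

So although neighbouring values of the input are unrelated, neighbouring values of the smoothed series share most of their terms: with k = 10, adjacent smoothed values have correlation 0.9. Any later analysis that treats the smoothed values as independent observations is counting the same data many times over.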

  20. Jude says:

    It all depends. We engineers smooth input data to our Kalman filters all the time. However, you generally want to make sure that the smoothing prefilter has a much higher bandwidth than the Kalman filter, or you may be taking out the information it needs to converge accurately. You have to be especially careful of inducing excessive phase delay, which is why we usually like to have at least an order of magnitude between the bandwidth of the Kalman filter and the smoothing prefilter in any real-time processing. If you are using a “moving average” smoothing prefilter, what we like to call a “finite impulse response filter”, you can compensate phase delay in post-processing just by shifting the data forward. If the smoothing prefilter bandwidth is approaching the bandwidth of the Kalman filter, then you are best advised to model the prefilter difference equations in the Kalman filter. This task can be considerably simplified without increasing the dimension of the filter state if you decimate the prefiltered data, which you might as well do as you just smoothed out all the high frequency content anyway.

    I doubt many of you will read this far. If you skipped ahead, let me put it in a nutshell – you can smooth input data, but you have to be careful how you do it and how you process subsequent data. It takes a very experienced and very talented operator to do it right so, if you are not sure of the bona fides of the person doing it, you should proceed with caution.

  21. John Creighton says:

    I wouldn’t call smoothed data fictional data, I would call it transformed data. I’ll agree that it can increase spurious correlation but that doesn’t mean that there isn’t some dynamic model that fits it well. Whether you have enough data to fit that model with reasonable certainty is another question. If you take the inverse transform of your fit, then you can see how much of the real data is explained by the fit to the transformed data.

  22. Briggs says:

    John,

    Not quite. An infinite number of models will fit your already observed data, and will fit it as close as you like with respect to any measure of goodness. But so what? The real test of a model is its ability to well predict new data.

    Also, search for the term “smoothing” on the site here.
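
The point in the reply above, that a close fit to data already seen says little about new data, can be sketched as follows. The setup is entirely synthetic and assumed for illustration: a straight line plus noise, polynomials of degree 1, 4 and 12, and a second draw from the same process standing in for "new data".

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 30)
y_seen = 0.5 * x + rng.normal(0.0, 0.3, x.size)   # data already observed
y_new = 0.5 * x + rng.normal(0.0, 0.3, x.size)    # new data, same process

def fit_errors(degree):
    """RMS error of a polynomial of the given degree fitted to y_seen,
    measured on the seen data and on the new data."""
    coef = np.polyfit(x, y_seen, degree)
    fitted = np.polyval(coef, x)
    seen_err = np.sqrt(np.mean((y_seen - fitted) ** 2))
    new_err = np.sqrt(np.mean((y_new - fitted) ** 2))
    return float(seen_err), float(new_err)

for degree in (1, 4, 12):
    seen_err, new_err = fit_errors(degree)
    print(f"degree {degree:2d}: seen-data error {seen_err:.3f}, "
          f"new-data error {new_err:.3f}")
```

The error on the seen data can only shrink as the degree grows, because each polynomial contains the lower-degree ones as special cases. The error on the new data carries no such guarantee, and that is the number that matters.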

  23. John Creighton says:

    Whether a model makes good predictions is outside the scope of my comment. Even a simple line can make predictions as it is an estimate of the derivative and the Taylor series tells us how much error is introduced by higher order derivatives assuming we have a decent estimate of the first derivative. In some circumstance within the same process the estimate of the derivative will predict the future well, and in others it won’t. We cannot know when there will likely be a rapid or discontinuous change in the trend without a good understanding of the underlying process.

    With regards to climate science, it is nonlinear, so there are likely processes like hysteresis and possibly bifurcations that will introduce rapid changes in the behavior of the system. When these changes occur, we will have to throw out our fit and start again. Most simple model fits are an approximation about some region of a much more complex system.

  24. John Creighton says:

    I can’t find it here but John S made a comment about cross spectral correlation. While not totally related to this thread, I find it interesting because the peaks of the cross power spectrum are areas in the frequency spectrum where there could possibly exist some correlation, and the narrower the bandwidth of these peaks the easier it will be to use them to make predictions. However, the narrower the peak the less statistical evidence there will be to evaluate whether the correlation is spurious.

  25. Slartibartfast says:

    We engineers smooth input data to our Kalman filters all the time.

    Um, no. Not this engineer. Never, ever, ever. By “smoothing” the inputs, you introduce time-correlation into the inputs, which you then should have to represent in your measurement model. Not good. No, the correct way to handle noisy measurements is by modeling the amount of noise in the measurement in the measurement error covariance (usually called “R”, in notation found in Gelb, for instance).

    If you’re only smoothing in one direction, you’re introducing a time lag, which may throw off your filter in a completely different way. Frequently your measurement is a dynamic quantity, so time delay is going to result in residuals that are time-correlated in a way that is almost sure to confuse a filter that you’ve made overly confident by smoothing its inputs, and then reflecting the attendant reduction in noise by lowering “R”.

    It is true that sometimes we engineers are forced to feed one Kalman filter with the output of another Kalman filter. There are ways for handling these situations, but you wouldn’t do this by design; you do it because you have no other choice. I’ve done a number of these, and never once has it been necessary or desirable for me to smooth the inputs to the downstream filter.

    EOR (End of Rant)

    NB: I work in defense, not climate science. It may be that complete no-nos in the world of defense are less important in climate science, but it is NOT true in general that Kalman filter measurements are smoothed or otherwise manipulated. The measurement is your best information; by messing with it, you are destroying information.
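
The overconfidence described in the comment above can be sketched with a toy scalar filter. Everything here is an assumption chosen for illustration: a constant true state of 0, measurement noise variance R = 1, a 10-point running-mean prefilter, and a filter that is told the prefiltered noise variance is R/k. It is not drawn from any real tracking or climate system.

```python
import numpy as np

def kalman_constant(z, R, P0=1e6):
    """Scalar Kalman filter for a constant state observed as z_t = x + noise.
    The filter ASSUMES the noise is independent with variance R.
    Returns the final estimate and the variance the filter reports for it."""
    x_hat, P = 0.0, P0
    for zt in z:
        K = P / (P + R)                    # Kalman gain
        x_hat = x_hat + K * (zt - x_hat)   # measurement update
        P = (1.0 - K) * P                  # reported posterior variance
    return x_hat, P

def experiment(n=200, k=10, R=1.0, trials=2000, seed=0):
    """Compare actual squared error with reported variance, for raw
    measurements and for running-mean-smoothed measurements."""
    rng = np.random.default_rng(seed)
    window = np.ones(k) / k
    err_raw, err_smooth = [], []
    for _ in range(trials):
        z = rng.normal(0.0, np.sqrt(R), n)          # true state is 0
        zs = np.convolve(z, window, mode="valid")   # smoothed measurements
        x_raw, P_raw = kalman_constant(z, R)
        x_sm, P_sm = kalman_constant(zs, R / k)     # R lowered after smoothing
        err_raw.append(x_raw ** 2)
        err_smooth.append(x_sm ** 2)
    return float(np.mean(err_raw)), P_raw, float(np.mean(err_smooth)), P_sm

mse_raw, P_raw, mse_smooth, P_smooth = experiment()
print(f"raw:      actual error variance {mse_raw:.5f}, reported {P_raw:.5f}")
print(f"smoothed: actual error variance {mse_smooth:.5f}, reported {P_smooth:.5f}")
```

With raw measurements the reported variance tracks the actual error. With smoothed inputs the estimate is no more accurate, yet the filter reports a much smaller variance, because the time-correlation introduced by the smoother is not represented in its measurement model.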

  26. Slartibartfast says:

    I think one of the problems here is that what Mann is doing, IIRC, is smoothing a naturally varying (but accurately measurable, one might suspect) process. In doing so, he’s removing a great deal of the natural variation in the name of “noise” reduction. And also, inevitably, introducing a different kind of time correlation into the data than is in there in the first place. Possibly to the point of dominating or even masking what’s actually in there.

  27. John Creighton says:

    Slartibartfast, doesn’t the process of sampling smooth data because we don’t sample over an infinitesimal period of time and no measurement device has infinite bandwidth?

  28. Slartibartfast says:

    Sorry, what? There seems to be a chunk of that question missing.

  29. John Creighton says:

    Nothing’s missing, a thermometer has thermal mass therefore it smooths data, a microphone diaphragm has inertia therefore it smooths data, a turbine flow meter has rotational inertia, therefore it smooths data, a digital to analog converter has inductance and capacitance and therefore it smooths data, etc, etc, etc, …….

  30. Slartibartfast says:

    My apologies, John; I had interpreted “smooth” in your question as an adjective, not a verb. My error.

    Sampling does inherently roll off the frequency content of the data. Nyquist, and all that. Sampling sensors do have frequency limitations as well. All this says is if we’re missing important frequency content by sampling and by using the sensors we use, then we’re sampling too slowly, and using sensors that are not sufficient to the task. Typically, you use a measurement process that introduces time-correlation of the data that’s short compared with your treatment of the data; in other words, you want to provide data samples to your estimation process that are spaced several time constants apart, so that they are effectively decorrelated. Barring that, you’ve got to account for the time-correlation effect in the data processing.

    Which means that, in the case of proxies (and I feel that I need to emphasize here that I am not any sort of climatologist or dendrochronologist), you have to have some kind of model that represents the timewise correlation (as well as, and this is very important, correlation to other variables) of the measurements. If the sequence in question was the reading of a single thermometer, sampled hourly over a span of years, why then you’d have to account for diurnal and seasonal temperature variation (as well as other important effects) as a prominent artifact of the measurement. I haven’t looked to see what proxies the chart in the main post represents, but a similar understanding of the random and nonrandom drivers in the measurement is a must.

    As I said: not a climatologist. But I do optimal estimation for a living, and I have a great deal of experience with model-based estimation. Most important thing: your model has to be sufficiently detailed to reflect how the real world is behaving.

  31. Peter Melia says:

    From where I stand, a Monte Carlo Routine could be either a statistical process or a bunch of really pretty girls kicking up their legs (I much prefer the latter…).
    What is a Monte Carol Routine?