Statistics

The McShane and Wyner Gordie Howe Treatment Of Mann

Many—as in lots and lots—of folks wrote in and asked me to review the McShane and Wyner paper. Thanks!

Gordie Howe—Mr Hockey to you—didn’t need his stick, his hockey stick, to plaster his opponents against the boards. Nor did he have to wave his blade, Tim-Dr. Hook-McCracken¹ style, in order to fill the other team with fear. No, sir. Old Number 9 relied almost solely on his elbows to raise temperatures on the ice and score goals.

Statistically speaking, McShane and Wyner emulate Howe by applying a forearm check to the throat of Mann’s proxy reconstruction of temperature, cracking his hockey stick irreparably and leaving his models sprawling on the ice.

Like old school players, McShane and Wyner start with a little trash talking, albeit using sophisticated phrasing: “In fact, Li et al. (2007) is highly unusual in the climate literature in that its authors are primarily statisticians.” And they quote Boss Wegman—who once picked on me, publicly in print, for being a prof. at a med. school, but I hold him no grudge; just don’t let me get him out on the ice—“While the literature is large, there has been very little collaboration with university-level, professional statisticians.” The authors also show off their team, my pal Tilmann Gneiting, as well as Larry Brown and Dean Foster, all men of statistical brilliance.

But we can tell these taunts were included as a matter of form, thrown in because it is traditional. They don’t spend much time on them, and instead focus their efforts where it counts, exploiting Mann’s huge, gaping statistical five hole.

There’s little point in summarizing the statistical methods the pair use to pummel Mann: the paper is not especially difficult and can be read by anybody. It’s also true that the boys haven’t said much that is new², but what they do say, they say well and plainly. It’s the sheer spectacle that’s worth attending to.

Hip check! “[A] random series that are independent of global temperature are as effective or more effective than the proxies at predicting global annual temperatures in the instrumental period. Again, the proxies are not statistically significant when compared to sophisticated null models”

High stick to the chops! “[I]t is possible that the proxies are in fact too weakly connected to global annual temperature to offer a substantially predictive (as well as reconstructive) model over the majority of the instrumental period.”

Scientific deke! “[T]he proxy record has some ability to predict the final thirty-year block, where temperatures have increased most significantly, better than chance would suggest.” Proxies and temperatures have less measurement error the closer we are to now. This implies the relationship between proxies and temperature is not stationary, as is usually assumed. That means a model applied to data now won’t work for older data. And that means we should be even less certain.

Are there weaknesses in the boys’ approach? Sure: their base model might stink, which they acknowledge. The Lasso can overfit, usually by following spurious dipsy-doodles in the data too closely. But even if that’s so here, it’s inconsequential. Look to their Figure 16, their backcast from a full Bayesian model. What’s most lovely about this picture is that it (tries to) show the complete uncertainty of the predicted temperatures.

[Figure: McShane and Wyner, Fig. 16]

The jaggy red line is their prediction, over which they lay bands of uncertainty due to various factors. Just look at that envelope of possible temperatures!—the dull gray window. The straight yellow line is mine: notice how it slides right through the envelope, never poking out through it at any point. This suggests that a flat-line, i.e. unchanging, temperature fits just as well as the boys’ sophisticated model. At least, the unchanging “null” model cannot be rejected with any great certainty.
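To make the yellow-line argument concrete, here is a minimal sketch of the check it implies: a constant temperature stays inside the envelope at every year exactly when it is at least the highest lower bound and at most the lowest upper bound. The function name `flat_line_range` and the band values are made up for illustration; they are not read off Figure 16.

```python
def flat_line_range(lower, upper):
    """Return (lo, hi), the range of constant values that stay inside the
    band at every time step, or None if no flat line fits."""
    if len(lower) != len(upper) or not lower:
        raise ValueError("lower and upper must be non-empty and the same length")
    lo = max(lower)   # a flat line must clear the highest lower bound
    hi = min(upper)   # and stay under the lowest upper bound
    return (lo, hi) if lo <= hi else None


# Hypothetical band edges (degrees C anomaly), NOT values from the paper.
lower = [-1.0, -0.8, -0.9, -0.7]
upper = [0.5, 0.4, 0.6, 0.3]
print(flat_line_range(lower, upper))
```

This is a pointwise check only. It says a flat line cannot be ruled out by the band, not how probable that path is compared with any other path through it.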

This is just as beautiful as a shorthanded goal. It means we cannot tell—using these data and this model—with any reasonable certainty if temperatures have changed plus or minus 1°C over the last thousand years.

McShane and Wyner don’t skate off the ice error free. They suggest, but only half-heartedly, that “the proxy signal can be enhanced by smoothing various time series before modeling.” Smoothing data before using it as input to a model is a capital no-no (see this, this, and this).
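Why is smoothing first such a no-no? One reason, which the paper itself notes, is that two smoothed series exhibit artificially high correlations. Here is a minimal simulation sketch of that effect, assuming independent Gaussian white noise and a plain moving average; both are my choices for illustration, as are the function names and the default settings, and none of it comes from the paper.

```python
import random


def moving_average(xs, window):
    """Simple moving average; output is shorter than the input by window - 1."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_abs_correlation(trials=200, n=200, window=20, seed=0):
    """Average |r| between pairs of INDEPENDENT noise series,
    before and after smoothing each series."""
    rng = random.Random(seed)
    raw_total = 0.0
    smooth_total = 0.0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        raw_total += abs(pearson(a, b))
        smooth_total += abs(pearson(moving_average(a, window),
                                    moving_average(b, window)))
    return raw_total / trials, smooth_total / trials


raw, smoothed = mean_abs_correlation()
print(f"mean |r| raw: {raw:.3f}, smoothed: {smoothed:.3f}")
```

The two series in each pair have nothing to do with one another, so any correlation is spurious. Smoothing leaves fewer effectively independent points, and the typical size of that spurious correlation grows.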

Finally, we have our Amen, the touch of Grace, the last element of a Gordie Howe hat trick³, and worthy of an octopus tossed onto the ice:

Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been overconfident in their models.

Update: Reader Harold Vance has answered the call of science—thanks Harold!—and provided us with a greyed out picture, shown here:

[Figure: McShane and Wyner, Fig. 16, greyed out]

Can a hockey stick fit this? Sure. Can a straight line? Also sure. A line which also starts high in 1000 and continuously drops until now also fits. It’s getting colder! Like the authors said, we can tell +/- 5 degrees or better, but not so well with less than +/- 1 degree.

———————————————————————————————–

¹ Slap Shot reference. This and (the original) Bad News Bears are the two best sports movies ever.

² For a plain language summary of the homogenization of temperature series, you can read these articles: I, II, III, IV, V

³ A goal, an assist, and a fight.


68 replies »

  1. Thanks for the great article. The hockey comparisons are really funny. Now, I have to read the actual paper.

    Even though I am not a hockey fan, I was privileged to see Gordie Howe (and his sons) play for the Houston Aeros in the 1970s. I only went to one game, but even in his fifties, Gordie Howe was really amazing, and I almost never use the word “amazing” for anything.

  2. Matt:
    Can you comment on their Bayesian reconstruction? In particular, can you look at their conclusion that there is an 80% chance that the last decade was the warmest in 1000 years?

  3. Bernie,

    The money quote is

    Using our model, we calculate that there is a 36% posterior probability that 1998 was the warmest year over the past thousand. If we consider rolling decades, 1997-2006 is the warmest on record; our model gives an 80% chance that it was the warmest in the past thousand years. Finally, if we look at rolling thirty-year blocks, the posterior probability that the last thirty years (again, the warmest on record) were the warmest over the past thousand is 38%.

    Recall our litany: All probability statements are conditional on certain premises or evidence. One premise here is the truth of the model. It’s unlikely that this model is perfect. If we allow some chance for other, better models, then the chance that, say, the last thirty years were the warmest would be less than 38%. And the chance that the rolling decade 1997-2006 is the warmest would be less than 80%.

    Another piece of evidence is the purity of the data, assumed by their model to be measured without error. Again, not true, so we must damp down the probabilities even more.

    A third piece of evidence, an assumption, is stationarity of the proxy-temperature relationship: note that this is not the same as assuming the stationarity of either series; we only require that the statistical relationship between them is stationary. There is good evidence that this is not so (see the above, and the main article). Once more, this being so, the chances are lowered yet again.

    Since we don’t have a model for these premises, we cannot say explicitly how low the probabilities should drop; but a reasonable guess is by at least a quarter to a half. That’s based on my subjective assessment of the likelihood that (1) the model is perfect and (2) the data are measured without significant error, and (3) the relationship is stationary.

    In other words, we just can’t be that sure what the pre-historical-record temperatures were—at least, not to the tune of fractions of degrees Celsius. Ice ages we can tell; but for the difference between last year and, say, 1640, about the best we can do is say, “It was about the same.”
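
    The damping argument in this reply can be written as a worked equation. Let $W$ be “1997-2006 was the warmest decade in the past thousand years” and $M$ be “the model, with its error-free data and stationary relationship, is true.” The paper’s 80% is $P(W \mid M)$. The values $P(M) = 0.5$ and $P(W \mid \neg M) = 0.4$ below are assumptions picked purely for illustration; nothing in the paper or in this post supplies them.

    ```latex
    \begin{align*}
    P(W) &= P(W \mid M)\,P(M) + P(W \mid \neg M)\,P(\neg M) \\
         &= 0.80 \times 0.5 + 0.40 \times 0.5 \\
         &= 0.60
    \end{align*}
    ```

    Under those made-up inputs the 80% falls by a quarter, to 60%. The drop happens only if the alternatives to the model give the event a lower probability than the model does, which is itself an assumption.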

  4. Sticking with the hockey analogies, I think this paper is more of a jersey pulled over the head followed by a few solid shots. I think the stick across the nose is still to come.

    What the authors have done here quite brilliantly, is debunk the hockey stick even while accepting some of the cr*ppy proxies, and the indefensible infilling, smoothing, and filtering.

    Once they get done with that, Mann, Bradley, Hughes, Wahl, Amman, and the whole lot of them will be lying on the ice in a pool of “That ain’t Coca-Cola”.

    Any statistician worth their salt has known for years that what Mann et al. have done is utter cr*p that wouldn’t pass muster in most 300-level multi-variate statistics classes. Hats on the ice for these two guys for having the stones to take on these goons.

    (Final aside to Briggs: I tried to find a way to work a Probert reference into my response, but it never felt right. Maybe you can do better.)

  5. Any truth to the rumor that Gordie Howe’s stick was made from bristlecone pine treated with sediment from disturbed Finnish lake bottoms which is why otherwise random bounces always went where he wanted them to go?

    It is also unclear as to whether going medieval on an opponent was actually anomalous where compared to later eras of the NHL.

    Lastly, the Slap Shot reference/image I would have preferred would be McShane and Wyner cast as the Hanson Brothers delivering a sequential blade slap to the “Hockey Team.” Tim McCracken would probably be cast as Tamino in a climate-themed remake.

  6. Excellent post! I especially enjoyed the hockey theme, having grown up in western Canada and frostbitten several toes on outdoor rinks. One interesting statement in the paper is the following-

    “We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline. Without taking a position on these data quality issues, we thus take the dataset as given.”

    They do not assume the dataset is accurate. I hope there are several sequels to this paper in the works. For example, what happens to Fig 16 when the surface temperature record uncertainty is added, or when the proxy set is cleaned up, or when both are corrected. I suspect the gray caterpillar of uncertainty will grow another factor of 2.

    It’s paleo-phrenology.

  7. You’ve cut off the rightmost portion of the up-tick in the graph. Are you trying to hide the incline?

  8. I noticed the same thing as Pops… please show the entire graph, it really seems like you are hiding data. I assume it’s unintentional… but my faith in you is slowly eroding with each passing minute that graph remains unchanged!

  9. OK, Pops, you got it. The graph has been fixed.

    Mike B. We already miss Bob Probert. RIP.

  10. Matt:
    What you say makes perfect sense. Given the reasonable assumption that M & W understood these additional sources of uncertainty, what is the benefit of including it? Is it simply a throwaway? It certainly caught the attention of Tamino, Deltoid, et al. Are they perhaps demonstrating that you can produce a hockey stick if you make HS assumptions? What is intriguing to me is the sharp drop off for the probability of the last 30 years being the warmest.

  11. “we cannot tell—using these data and this model—with any reasonable certainty if temperatures have changed plus or minus 1°C over the last thousand years.”

    Is that quite fair? Looking at your graph, and where temperature anomaly is now, it would be difficult for them to have fallen by 1°C?

  12. I, like many of the readers, am not a statistician. When I look at the Mann graph, or the graphs in the subject article, I note the +/- bars and understand that the truth lies somewhere in between the extremes. However, I look to the centered line as a fair representation of the truth, feeling that the true temperature profile lies somewhat close to the mid-point of the uncertainty. But is that true? Statistically, what is the likelihood that the truth lies up and down in the bars, such that even with Mann’s reconstruction the MWP, the LIA, and all the highs and lows we suspect are legitimately present? That we could create our own curve from his and claim equal validity?

    Regardless of the error bars, for at least non-statisticians (which includes, apparently, most of the IPCC, Al Gore and David Suzuki) the central curve IS the closest approximation of the true temperature record, regardless of the error bars or statistical uncertainty. But, is it?

  13. The straight yellow line is mine: notice how it slides right through the envelope, never poking out through it at any point. This suggests that a flat-line, i.e. unchanging, temperature fits just as well as the boys’ sophisticated model. At least, the unchanging “null” model cannot be rejected with any great certainty.

    ———–

    The flat-line, which “can’t be rejected with any degree of certainty” is the shaft of the hockey stick, and looks pretty much the same as Mann’s.

  14. Mike B,
    You are correct it is identical to the paper and stops as the explanation says in 1998. I was expecting it to go to 2008 to conform with the statement in the text. My bad.

  15. Max_NotOK,

    It do, but, and follow me closely here, the straight line doesn’t match the blade of Mann’s stick. Thus, the yellow line is more of a complete shaft.

    Doug Proctor, Roddy,

    Their red line would be the best guess, if you had to commit yourself to just one number. Chance that that one number is right is near zero. The error envelope is always—always—a superior way to do business. It says that there’s a 95% chance that the observable temperature was somewhere in that window.

  16. Terrific article and I love the hockey references, especially since my hometown, Sault Ste. Marie, was the original “Hockey Town USA.” The Detroit Redwings used to have their preseason camps in “The Soo” and they stole the moniker. Maybe that’s why I’m a Sabres fan these days.

    Keep up the good work. Whenever somebody starts talking statistics, my eyes usually glaze over… and this is after taking a couple university-level statistics courses. Breaking the subject down is key to educating the public and revealing the warmists for who and what they truly are…

  17. Briggs says:
    17 August 2010 at 11:19 am
    Max_NotOK,

    It do, but, and follow me closely here, the straight line doesn’t match the blade of Mann’s stick. Thus, the yellow line is more of a complete shaft.

    Doug Proctor, Roddy,

    Their red line would be the best guess, if you had to commit yourself to just one number. Chance that that one number is right is near zero. The error envelope is always—always—a superior way to do business. It says that there’s a 95% chance that the observable temperature was somewhere in that window.
    ——
    No, the flat line isn’t a perfect match with the shaft of Mann’s Hockey Stick. It’s not identical, but it’s pretty close.

  18. McShane and Wyner’s calculations assume that the tree ring measurements are somehow correlated with global temperatures. There was no global thermometer network 1,000 years ago. There weren’t even thermometers. So the paleo-reconstructionists use selected tree rings. However, they cannot establish that there is any correlation between tree rings and temperatures, even modern temperatures. The “proxies” are not good approximations. The data are erroneous. Mann et al. might just as well have used numbers from the phone book. The whole shebang is rife with bogosity.

  19. Dave R.,

    I was up in the Soo for one year, when I was a lowly forecaster for National Weather Service (station was by the airport). I miss the auroras up there.

  20. OK, after all the fun, I got to read the damn paper …

    I have always wondered about measurement errors and how could they possibly know what the real temperatures were for 1000 years ago.

  21. Briggs,
    I saw on your resume that you did a short stint (a sentence?) in the Soo.

    If you like to watch auroras and haven’t yet done so, then you should take a trip to the arctic/subarctic… the northern lights are far more colorful than the ones in the U.P.

  22. Briggs: Why not take more of a strain and actually evaluate the paper technically? What is novel, what is repeated, what has issues. You do a little bit of this, but don’t seem to really make an effort to dig in. You’re a real statistician–I am not (despite my grapevining you on the wrestling mat, zoomie, during the wishcasting kerfuffle).

    When I had questions about McI’s screwups in MMH, I went to Annan. I could sense issues, but wanted to get someone strong to look at it. Turns out what I was smelling was dead on the money, per Annan (who looks both buff and like you, if that is possible).

    I wrote a balanced review of McShane over at Policy lass (who is much more comely than you, urban Republican). How about you really make an effort, man?

  23. Yeah, yeah, but get to the important stuff. What can you do for the Vancouver Canuckleheads this coming season?

  24. Great article Briggs, thanks for helping illustrate and articulate.

    I will add that given the stature and reach of this work and that of these statisticians as upstanding and devoted to their craft, we need (as I provide) reference to Larry Robinson.

  25. If you read the paper, they tell you they stop at 1998 because there are no proxies after that time. It is a very good read.
    lorne

  26. Actually TCO, your review smells to high heaven, shot full of conceptual errors and misunderstandings. Do yourself a favour and actually work out what you’re commenting on first, instead of your usual snide crapfest.

  27. Excellent wit and pacing.

    “start with a little trash talking”

    Would the “Team’s” suppression of journal articles via peer pressure and review be akin to the Hanson Bros. sucker punching an opposing player before the game even begins??

  28. @PolyisTCOandbanned — Nice try for a freebie. From Briggs’ resume it seems that he is a statistician to the stars and is a gun for hire. Wanna be a star?

  29. If respecting Green Gang inputs, methodology, and highly manipulated graphics results in prima facie statistical garbage-out, we infer with a 95%+ confidence-level that Briffa, Hansen, Jones, Mann, Trenberth et al. put garbage in.

    Apparently Schmidt, Romm, and other Cargo Cultists of their ilk are hastening to censor all references to McShane and Wyner lest Chicken Little take offense. We used to think that Reality would nail climate hysterics’ coffins shut around 2016; this plus Satellitegate (sic) and various pending worldwide lawsuits make 2012 seem a better bet.

  30. John Blake,

    The day that a majority of voters understand that these guys never checked their instruments, don’t make data and code available for others, and have no interest in replicating anyone else’s work anyway, Global Warming will be dead and Al Gore will wish he were (from the embarrassment).

    Try it with someone you know who thinks there might be something to global warming. Explain that they never checked their thermometers and when someone finally did, about 9 of 10 flunked basic scientific standards. That’s usually all it takes for the person to realize that incompetence reigns.

  31. “This suggests that a flat-line, i.e. unchanging, temperature fits just as well as the boys’ sophisticated model”

    Is that really true? The flat-line spends more time on the outskirts of the estimated envelope of possible temperatures, suggesting that it is a less likely trajectory than the central result.

  32. I got to see Gordie Howe play in the old timers game before the All-Star game a few years ago. Gordie’s secret was finally made clear – his stick telescopes out to about ten feet long – what a riot!

  33. I saw recently that Gordie Howe never actually had a Gordie Howe Hat-trick. Is there any truth to that, or did Michael Mann concoct that?

  34. Matt

    Very funny post, though the hockey references required a bit of deciphering by this Brit.

    Over at Rabett Run, Eli posts the following vindication.

    My explanation at link for why the M&W reconstruction is erroneous, was a little too simple. It’s the Wabett who gets it completely right: the fundamental error is calibration only against a hemispheric average, when local data — the 5×5 degree grid cells of instrumental data as used by Mann et al. — provide a so much richer source of variability — i.e., signal — to calibrate against.

    It is this poor signal/noise ratio that helps the calibration pick up spurious low vs. high latitude temp difference “signal”, which in the reconstruction interacts with the Earth axis tilt change effect.

    What stands is the observation that doing the calibration against the instrumental PC1 (instead of hemispheric average) will give you back pretty exactly the genuine Mann stick(TM) even in spite of this.

    Congrats Eli!

    In response, one commenter writes

    Actually this argument only goes so far. Yes, the local temperatures vary more, but their coefficients are not the ones you’re interested in. They get averaged when you look at the hemispherical temperatures. And remember that the whole computation is linear in the instrumental temps: you can just add a row representing hemispheric mean to the RegEM computation tableau. Then it shouldn’t make any difference whether you first average instrumental values and then reconstruct, or reconstruct per grid cell and then average. The outcome, and uncertainties, should be the same.

    And another

    I don’t think this local/global calibration will matter one whit after the whole process is completed, given the linearity of both sides of the model, but it would be good to check that with the exact process used, etc.

    In any case, one of the main points of the paper is that because of the weak signal in the proxies you can get almost any shape you want within the error bars depending on meta-assumptions. The reason the shouting is so loud about paleo is because it’s impossible to decide objectively about all these meta and modeling assumptions.

    In summary, because of the weak signal, and the sensitivity to assumptions, it’s (mostly) not science anymore.

    This is beyond me statistically but I wonder if you’d care to comment? Does Eli have anything approaching a point?

  35. Mr. Briggs: I too spent some time at Leader Shouldice’s Squirrel Cage (LSSU nee LSSC). I too miss the auroras. I think it’s too early to declare the “hokey stick” (to borrow someone else’s term for it) broken and dead. It is a good first attempt to get the requisite skilled people to look at the methodologies used by the “Team”. It will be interesting to see what shakes out of the trees in the following days/weeks/months. Go Lakers!!!!!

  36. Matt, I think your criticism at this point is incorrect:
    “McShane and Wyner don’t skate off the ice error free. They suggest, but only half-heartedly, that “the proxy signal can be enhanced by smoothing various time series before modeling.””

    Looking at page 26 it seems you left out the word “Perhaps” at the beginning of that sentence. My reading of the whole paragraph leads me to believe that they took yet another shot at Mann here by first hanging the “literature” on two of his papers and then going on to list the various problems with the method. The last sentence should have had you nodding (if I’ve properly understood).

    “Furthermore, smoothing data exacerbates all of the statistical significance issues already present due to auto-correlation: two smoothed series will exhibit artificially high correlations and both standard errors and p-values require corrections (which are again only known under certain restrictive conditions).”

    bob

  37. OK, I always like to give the comments a go, and especially those that are critical, but TCO keeps coming up as “total cost of ownership” in all the texting dictionaries I try. Any help out there?

    thanks,

    brian

  38. I am not a statistician, and I can see your point that the temperature could conceivably be a flat line and still be within the error window, but it is inescapable that the window itself shows cooling to ca. 1840, and warming since. I guess the key point is that there is no evidence of “unprecedented in the last 1000 years.”

  39. Further to my previous post, I thought it might be worth flagging up the following comment by Chris Watkins at Climate Audit. In subsequent comments, Chris tones down his rhetoric but he does seem to be statistically knowledgeable. Or perhaps as a non-statistician, I’m too easily impressed!

    The new McShane and Wyner paper due to appear in Ann. Stats. is clearly going to be much discussed, so I thought I would get in with a few comments, after scanning it briefly.

    Let me say first that it is great news that some stats journals are taking a look at climate reconstructions. Unfortunately the first half of this paper is very silly, and the second half is slightly more sensible, and the most plausible reconstruction they produce…..looks rather like the hockey-stick.

    In the first half, they take 1200 temperature proxy series (treated as independent variables) and fit them to 119 temperature measurements (keeping overlapping holdout sequences of 30 yearly temperature measurements). Fitting 1200 coefficients to 119 data points is of course hopeless without further assumptions. Instead of doing some form of thoughtful data reduction, they employ the lasso to do the regression directly, with strong sparsity constraints.

    They justify their choice of the lasso by saying:
    “…the Lasso has been used successfully in a variety of p >> n contexts and because we repeated the analyses in this section using modeling strategies other than the Lasso and obtained the same general results.”
    Both parts of this statement are wrong, and the first part is a MORONIC thing for statisticians to say. They give absolutely no reasons to suppose that the Lasso — a method that makes _very_strong_ implicit assumptions about the data — is in any way appropriate for this problem.

    The Lasso _is_ appropriate in certain cases where you believe that only a small subset of your variables are relevant. To use it as a substitute for any data reduction with 1200 variables and 119 data points, when _all_ the temperature proxy series are presumed to be relevant to some degree, and all are thought to be noisy, is simply stupid.

    Not surprisingly, they find they can’t predict anything at all using the Lasso. (It is a completely inappropriate technique for the problem.)

    In the second half of the paper, they do something which is almost sensible, (but less sensible than what the climate modellers do). They take 93 proxy series that go back a thousand years, and do OLS regression on various numbers of principal components of these series. Regressing on just one PC gives more or less Mann’s curve (ironically this is probably the most defensible prediction from all the ones they try)– when they regress on 10, they back-cast historical upward trends. If they were being agnostic statisticians, then I suspect that from the cross-validations they show, the most conservative model they could choose would be a model predicting on one or a very few principal components.

    Hey presto, they’ve recovered Mann’s hockey stick as a most plausible estimate. As Garfield would say, Big Hairy Do.

    That’s the bulk of the paper. Some but not all of the points they make about over-tight confidence intervals in the previous literature seem valid.

    In my opinion they do not introduce any useful new techniques into palaeoclimate reconstruction: their main contribution is to show that using the Lasso with no prior dimension reduction is as useless an idea as any sensible person would expect it to be.

    This paper shows, if proof were needed, that it is possible to get ill-considered papers into good peer reviewed journals, especially if they are on hot topics.

  40. RichieRich,

    Chris Watkins has a difficult time understanding the key figure. Unfortunately, the figure is presented in a sub-optimal way to show that the hockey stick is hokey: perhaps he just has hockey sticks on the mind and sees them everywhere.

    I tried to improve the original figure by showing the straight yellow line. But even better would be to gray out everything between the error envelope. The real result is the envelope itself. If all we see is a sea of gray, then the idea of over-certainty sinks in better. Then you can see that we don’t know half as much as we thought we did.

    I think the original Mann, and its slow degradations and changes over time, are like willowy clouds: people can see any shape they like in them. What somebody sees in it is like a cheesy Rorschach test.

    Once more, if a straight line fits inside the window, it cannot be ruled out.

    Is there anybody graphical out there who can help with this? I mean, who can gray out the figure? My artistic skills are nil.

    Incidentally, the Lasso is no more inappropriate for this data than are (single) principal component regressions. Plus, Watkins has forgotten to examine the overlap sample fit.

  41. Keep in mind that McShane and Wyner do not address the preferential inclusion of those samples showing stick behavior and the systematic omission of large numbers of nearby samples that do not. That is, they dealt only with statistical problems, not the cherry picking.

  42. Flangepp: It’s not true that Gordie Howe never had a “Gordie Howe Hat-trick”. But it is true that he never had many in his career, and several players since then have recorded far more than he did. That statistic, however, must be considered in context. The reason why Gordie Howe recorded so few Gordie Howe Hat-tricks is that in his long career only a few men were crazy enough to fight Gordie Howe. (Just look at the photo of Lou Fontinato’s face.) One must be careful when interpreting old hockey statistics, just as one must be careful when interpreting tree rings.

  43. Noblesse Oblige says:
    Keep in mind that McShane and Wyner do not address the preferential inclusion of those samples showing stick behavior and the systematic omission of large numbers of nearby samples that do not. That is, they dealt only with statistical problems, not the cherry picking.

    I think they dealt with this in the comment “We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline.”

  44. TGSG says:
    I think they dealt with this in the comment “We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline.”

    Indeed. And one can read a firm tongue in cheek there, if one wants, though it is an appropriate statement for them to make since they are statisticians, not climate guys.

    My point was really that the gray error range is really substantially larger because of the cherry picking. Matt may have implied this when he said “it (tries to) show the complete uncertainty of the predicted temperatures,” emphasis “tries to.”

  45. As always, Briggs, outstanding.

    As someone who estimates econometric equations for a living, let me put in my two cents’ worth as well:

    1) the grayed-out chart is indeed where the real story is. This means the “real” numbers could be anywhere within that path, stated with 95% confidence that this is so. That does leave a 5% possibility of individual or multiple outliers, but let’s accept that this is the reality of the situation for now.

    2) Under the assumption of a normal distribution, the red line of the original chart could suggest a cooling and then a warming. What is not, however, stated, is the reason that a normal distribution would be applicable here (if anything, given the claim of warming, you’d expect to see a skewed distribution that would drive up the temperatures…but this is not discussed anywhere).

    3) The authors do look at distribution and find that it is problematic: see fig. 7 on p. 12 of the paper. That is, however, an indication of the fundamental problem: this distribution is actually the test of proxies vs. actual temperatures (lag one autocorrelation coefficients between proxies and instrumental temperatures during the instrument period), meaning…what?

    If these proxies were accurate, the distribution would be around the 0 point on the axis. They are not: they are centered around 0.5° C instead. Hence the proxies have an inherent bias upwards, and are rejected by the authors as being usable to reconstruct the series:

    “To assess the strength of the relationship between the natural proxies and
    temperature, we cross-validate the data. This is a standard approach, but
    our situation is atypical since the temperature sequence is highly autocorrelated.
    To mitigate this problem, we follow the approach of climate scientists
    in our initial approach and fit the instrumental temperature record using only proxy covariates. Nevertheless, the errors and the proxies are temporally
    correlated which implies that the usual method of selecting random
    holdout sets will not provide an effective evaluation of our model.”

    This simply means that the proxies cannot be used for regression analysis to determine temperature, as their error terms are not independent from that of the instrument temperatures. They can be used as a proxy for temperatures, but NOT as an independent variable to regress a temperature from multiple proxies.
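    For anyone who wants to see what the quoted passage is getting at, here is a minimal sketch of holding out contiguous blocks of years instead of randomly chosen ones. The series length (149) and the block length (30) are my own illustrative assumptions, and `contiguous_holdouts` is a hypothetical helper, not anything taken from the paper.

```python
import numpy as np

def contiguous_holdouts(n_obs, block_len):
    """Yield (train_idx, test_idx) pairs where each test set is one
    contiguous block of observations. With autocorrelated data, a randomly
    chosen holdout year sits between two training years that nearly
    determine it; a contiguous block avoids most of that leakage."""
    idx = np.arange(n_obs)
    for start in range(0, n_obs - block_len + 1):
        test = idx[start:start + block_len]
        train = np.concatenate([idx[:start], idx[start + block_len:]])
        yield train, test

# Illustrative sizes only (assumptions, not values from the paper).
splits = list(contiguous_holdouts(n_obs=149, block_len=30))
train, test = splits[0]
print(len(splits), "splits; first test block covers indices",
      test[0], "to", test[-1])
```

    Each split would then be used to fit on `train` and score on `test`, so the model never sees the immediate neighbours of the years it is asked to predict.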

    4) From 1 and 2 we do not know the distribution of the mean from within the set of proxies, just the variance from a possible joint mean. From 3 we know that we can’t use a simple regression analysis to find that mean.

    5) The idea that the series should be smoothed to improve them, especially for regression analysis…ye gods.

    Here’s that paragraph in its entirety:

    “Finally, perhaps the proxy signal can be enhanced by smoothing various
    time series before modeling. Smoothing seems to be a standard approach
    for the analysis of climate series and is accompanied by a large literature
    (Mann, 2004, 2008). Still, from a statistical perspective, smoothing time series
    raises additional questions and problems. At the most basic level, one
    has to figure out which series should be smoothed: temperatures, proxies,
    or both. Or, perhaps, only the forecasts should be smoothed in order to reduce
    the forecast variance. A further problem with smoothing procedures
    is that there are many methods and associated tuning parameters and there
    are no clear data-independent and hypothesis-independent methods of selecting
    amongst the various options. The instrumental temperature record
    is also very well known so there is no way to do this in a “blind” fashion.
    Furthermore, smoothing data exacerbates all of the statistical significance
    issues already present due to autocorrelation: two smoothed series will exhibit
    artificially high correlations and both standard errors and p-values
    require corrections (which are again only known under certain restrictive
    conditions).”

    Here is one of the key problems, correctly identified but ignored: that smoothing seems to be a standard approach for the analysis of climate series. It has always bothered me, since conclusions reached on that basis are determined more by the smoothing methodologies, rather than the data. That is fundamental!

    Smoothing removes exactly the information needed to do things like determine distributions of the data and makes all the fun tests available moot. It creates artificially high values for things like r^2 or DW, making the data look like it is vastly better than it is, and for me that is a sign of intellectual sloppiness and very, very poor methodology.
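    The effect is easy to check with a toy simulation. This is only a sketch under my own assumptions: pairs of independent white-noise series, a plain moving average as the smoother, and arbitrary choices for the series length (150), window (20) and number of pairs (500). The two series in each pair are unrelated by construction, so any correlation between them is spurious.

```python
import numpy as np

def moving_average(x, window):
    """Plain moving average; the output is shorter than the input."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def mean_abs_corr(n_pairs=500, n=150, window=None, seed=0):
    """Average |correlation| between pairs of independent noise series,
    optionally smoothing both series first."""
    rng = np.random.default_rng(seed)
    corrs = []
    for _ in range(n_pairs):
        a = rng.standard_normal(n)
        b = rng.standard_normal(n)
        if window is not None:
            a = moving_average(a, window)
            b = moving_average(b, window)
        corrs.append(abs(np.corrcoef(a, b)[0, 1]))
    return float(np.mean(corrs))

raw = mean_abs_corr(window=None)
smoothed = mean_abs_corr(window=20)
print(f"mean |r|, raw series:      {raw:.3f}")
print(f"mean |r|, smoothed series: {smoothed:.3f}")
```

    The smoothed pairs should come out noticeably more correlated than the raw ones, even though nothing links them. Smoothing cuts the effective number of independent observations, which is why the usual standard errors and p-values need correcting.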

    Now, there is one thing that has always bothered me and in my cursory review of the literature simply doesn’t appear: smoothing methodologies always appear to be some sort of multiple moving averages. Why not use something that is heavily documented, widely accepted, statistically sound, and backed by a very large body of analysis, such as … seasonal adjustment? Census X-11, anyone? Even X-12 with its totally funky AR components?

    It seems to me that this is really what the climatologists should be using: they want to have instrumental readings that have outliers removed in a statistically sound manner, that removes spurious and random variances, that identifies noise, seasonal factors affecting the development of a time series, and provides both a forecast of seasonal factors for a full year as well as a long-term trend component of the deconstructed original data, while all the time remaining within strict statistical parameters.

    Just a thought. As I said, I haven’t been able to research this, but perhaps one of the others has. Seasonal adjustment should be the basis for climate research, methinks…

  46. PS to the above:

    Obviously, seasonal adjustment can only be used for monthly and quarterly frequencies: smoothing annual time series has to be done differently. But my point remains: there has to be a clear reason for smoothing annual series, rather than working with the original data and using statistical distribution analysis to determine variance from the mean. Failing to do so means that the data is not being properly analyzed.

  47. RichieRich:

    On the Rabbet and his idea that M&W only calibrated against a NH Average, I guess he didn’t read section 3.6 of the paper titled:

    3.6. Proxies and Local Temperatures. We performed an additional test which
    accounts for the fact that proxies are local in nature (e.g., tree rings in Montana)
    and therefore might be better predictors of local temperatures than
    global temperatures. Climate scientists generally accept the notion of “teleconnection”
    (i.e., that proxies local to one place can be predictive of climate
    in another possibly distant place). Hence, we do not use a distance restriction
    in this test. Rather, we perform the following procedure.

    In that section M&W use the local temperatures to calibrate the proxies, give them the same test as the NH average test, and find them to be only marginally better:

    The results of this test are given by the second boxplot in Figure 9. As can be seen, this method seems to perform somewhat better than the pure global method. However, it does not beat the empirical AR1 process or Brownian Motion. That is, random series that are independent of global temperature are as effective or more effective than the proxies at predicting global annual temperatures in the instrumental period. Again, the proxies are not statistically significant when compared to sophisticated null models.
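    For readers unfamiliar with the two null models named in that passage, here is a minimal sketch of what an AR1 series and a Brownian motion (random walk) look like next to plain white noise. The AR coefficient of 0.4 and the series length of 2000 are arbitrary illustrative choices of mine, not values from the paper, and the helper names are hypothetical.

```python
import numpy as np

def ar1_series(n, phi, rng):
    """AR(1): x[t] = phi * x[t-1] + noise. Started from a single standard
    normal draw, which is close enough to stationary for illustration."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def brownian_motion(n, rng):
    """Discrete Brownian motion: the running sum of independent shocks."""
    return np.cumsum(rng.standard_normal(n))

def lag1_autocorr(x):
    """Sample lag-one autocorrelation of a series."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(42)
n = 2000
white_r = lag1_autocorr(rng.standard_normal(n))
ar1_r = lag1_autocorr(ar1_series(n, 0.4, rng))
bm_r = lag1_autocorr(brownian_motion(n, rng))
print(f"white noise: {white_r:.2f}  AR1: {ar1_r:.2f}  Brownian: {bm_r:.2f}")
```

    None of these series knows anything about temperature. The point of a null-model test is that real proxies have to predict held-out temperatures better than persistent random series like these before they can be called significant.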

    Section 3.6 starts on page 23 and ends on page 24.

    So they tried both NH average and local.

  48. They were using Mann’s very own personal favourite pictures in his wallet in place of family members data fragments. They’re just so darned cute!

  49. Kris,

    Many people seem to be pointing to the Zorita people as a takedown of M&W. I don’t see that at all. Most of the criticism seems to be on tangential points and it contains a number of inaccuracies itself (as pointed out in the comments). Zorita even slights the paper by saying that its conclusions are already well known! Hardly a devastating criticism.

    The final point by Zorita (“It is not clear to which models the authors are referring to … If they mean ‘climate models’, they are again dead wrong”) is mystifying. The word “model” is used throughout the paper and always with the same meaning (reconstruction statistical models). This attempt by Zorita to set up a strawman is rather foolish and suggests he has not read the paper in detail.

  50. Undoubtedly there will be problems with that post, and I agree that some of them seem tangential comments that do not affect the results. But in that post and a related one at Deep Climate (http://deepclimate.org/2010/08/19/mcshane-and-wyner-2010/) there seem to be quite a few weaknesses being pointed out in the statistical techniques that M&W used. I’m neither a statistician nor a climate scientist so I can’t judge the significance of these comments for the final conclusions of the paper. But I am a working scientist who reads and produces papers for a living, and I know that one paper rarely holds the final word on a topic. Rather, the truth emerges slowly from an ongoing debate between people who are knowledgeable but may hold different scientific opinions.

    In any case, what you have said does not alter my main point of criticism: if it is problematic when climate scientists work without help of statisticians, then the reverse should also be avoided.

  51. CLIMATE SCIENTISTS ACTING BADLY

    RE: Briggs responding to Bernie:

    Money Quote, bottom of page 37: “Using our model, we calculate that there is a 36% posterior probability that 1998 was the warmest year over the past thousand. If we consider rolling decades, 1997-2006 is the warmest on record; our model gives an 80% chance that it was the warmest in the past thousand years. Finally, if we look at rolling thirty-year blocks, the posterior probability that the last thirty years (again, the warmest on record) were the warmest over the past thousand is 38%.”

    NOTE: RealClimate selectively quoted the “80%” phrase out of context to suggest the paper endorses Mann’s original (http://www.realclimate.org/index.php/archives/2010/08/doing-it-yourselves/)

    WHAT they omitted were repeated acknowledgements by the authors that the data did NOT support any real conclusions:

    Page 36: “The major difference between our model and those of climate scientists, however, can be seen in the LARGE WIDTH of our uncertainty bands.”

    Page 37 (preceding the “money quote”): “…our uncertainty bands are so wide that they ENVELOPE all of the other backcasts in the literature.”

    Page 41: “…our Table 2 shows that our model does not pass “statistical significance” thresholds against savvy null models. Ultimately, what these tests essentially show is that the 1,000 year old proxy record has little power given the limited temperature record.”

    ALL OF WHICH is to state that the authors are acknowledging that there’s little or nothing to be concluded trendwise from the data, which is Briggs’ point.

    BUT CONSIDER & CONTRAST WITH REALCLIMATE’s SPIN:

    The RealClimate summary presents the trendlines sans error bars and sans the authors’ proper context to imply agreement! Furthermore, the RealClimate item makes the concluding statement:

    “…In the meantime [see footnote ‘a’ below], consider the irony of the critics embracing a paper that contains the line “our model gives a 80% chance that [the last decade] was the warmest in the past thousand years”….”

    [a] with “meantime” regarding the pending publication of the final paper, which may differ somewhat from that currently published on-line; and, “critics” referring to so-called climate “deniers.”

    That quote, taken grossly out of context, is willfully manipulated & used to create the illusion of agreement where the opposite exists.

    It’s hard, impossible, to believe they’re that stupid there. Which makes for pretty clear evidence of manipulative deception (discouraging critical thinking on the part of their acolytes). Evaluating their presentation via application of the legal standard of “the truth, the WHOLE truth, and nothing but the truth,” one is hard-pressed to conclude anything other than that they’re lying.
