The CRU “climategate” proxy code: a primer

I’m just getting into the CRU code: it’s a lot of material and everything I say here about it is preliminary. Some of you will know what I’m going to say about proxies, but stick with me, it’s important. I apologize for the length: it’s necessary. Please help by linking this around to other sites which discuss the proxy data or code. I’ll make corrections as we go.

How do we know the temperature?

We have no direct observations of temperature, neither at the Earth’s surface nor in the atmosphere, for the vast majority of history. So how can we know what the temperature used to be in the absence of actual measurements?

We can’t. We can only guess.

That’s right, all those squiggly-line pictures of temperature you see from before roughly 1900 are reconstructions; the lines are what is spit out of statistical models. They are therefore merely guesses. Even stronger, we have no way to know exactly how good these reconstructions are. If we did, then, obviously, we would know the actual temperatures, because the only way to know the actual model error is to compare the model’s predictions against the real data (which we don’t know).

To emphasize: the actual error (as opposed to the theoretical model error) is unknown. But we must try to estimate this error, because it is of utmost importance; otherwise we cannot make decisions about the reconstructions.


How do we create a reconstruction? By using proxies, which are not temperatures but are observations of physical entities thought to be related to temperature. Tree ring widths are one well known proxy; bore-hole, ice core, and coral reef measurements are others.

Focus on tree rings, because CRU does. Through various methods, we can generally work out how old a tree is, or rather, the years in which its various rings were grown; sometimes this is a guess too, though not a bad one. When it’s warmer, trees grow better (on average) and have wider rings; when it’s colder, they don’t grow as well (on average) and have narrower rings. The idea is sound: correlate (I use this word in its plain English sense) tree ring widths with temperature, and where we only have tree rings, we can use them and the correlation to guess temps.

(Incidentally, I find this correlation amusing. Can you guess why?)

Proxy reconstruction mechanics

Here’s how proxies work. We have some actual temperature measurements, call them yt, which overlap proxy measures, call them xt, where the subscript represents time. The next step is to build a model

   yt = m(β, xt) + error

which says that yt is modeled as a function m() of the proxies xt and (multi-dimensional) parameter β, plus some error.

The model m() is not given to us from On High. Its form and shapes are a guess; and different people can have different guesses, and different models will give different reconstructions.

Once a model is stated, a statistical procedure (frequentist or Bayesian) makes a guess about β and the error. The error guess allows us to say how good the guess of β is, given that m() is true. The parameter guess, together with the values of xt from the times where we do not know the values of yt, is plugged back into the model, which spits out guesses of yt.
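
To make the mechanics concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: m() is taken to be a straight line, β is the pair (b0, b1) fitted by ordinary least squares, and the "proxy" numbers are synthetic. It is not the CRU model, only the calibrate-then-reconstruct pattern just described.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x. Returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def reconstruct(b0, b1, x_new):
    """Plug the parameter guess and proxy values into the assumed m()."""
    return [b0 + b1 * xi for xi in x_new]


# Synthetic overlap period: proxy values xt with measured temperatures yt.
rng = random.Random(0)
x_overlap = [rng.gauss(1.0, 0.2) for _ in range(100)]
y_overlap = [10.0 + 5.0 * xi + rng.gauss(0.0, 0.5) for xi in x_overlap]

b0, b1 = fit_line(x_overlap, y_overlap)

# Proxy values from years with no thermometer: the output is a guess of yt.
x_old = [0.8, 1.0, 1.2]
y_guess = reconstruct(b0, b1, x_old)
print("beta guess:", (round(b0, 2), round(b1, 2)))
print("reconstructed temperatures:", [round(v, 2) for v in y_guess])
```

A different choice of m() (a curve, several proxies, a time-varying relationship) would give a different reconstruction from the same data, which is the point of the paragraph on model choice.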

Reconstruction variability

Pay attention: we all know that these guesses of yts are not 100% accurate, so uncertainty about their values should be given. All those squiggly-line plots should (ethically) also contain an indication of the error of the lines. Some kind of plus/minus should always be there.

The huge problem with this is that the plus/minus lines are about the guess of β, which we don’t care about. We want to know the value of the temperature, not of some parameter. Technically, the uncertainty due to estimating β should be accounted for in making guesses of the temperature. If this is done, then the range of the plus/minus bands should be multiplied by anywhere from about 2 to 10! (Yes, that much.) And remember, all this is contingent on m() being true.
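
The difference between the two kinds of band can be sketched for the simplest possible case. The code below assumes a straight-line m() with independent, equal-variance errors (my assumption, chosen because the textbook formulas are short) and uses a made-up calibration set. It returns the standard error of the fitted line at a proxy value x0, which reflects only parameter uncertainty, and the standard error for an actual new observation at x0.

```python
import math


def band_standard_errors(x, y, x0):
    """Return (se_mean, se_pred) at proxy value x0 for a fitted straight line.

    se_mean reflects only the uncertainty in the parameter guess;
    se_pred also includes the scatter of an actual observation about the line.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    leverage = 1.0 / n + (x0 - mean_x) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)
    se_pred = math.sqrt(s2 * (1.0 + leverage))
    return se_mean, se_pred


# Tiny made-up calibration set (eight points).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0.3, 0.9, 2.4, 2.8, 4.1, 5.2, 5.7, 7.3]
se_mean, se_pred = band_standard_errors(x, y, 3.5)
print("parameter-only band:", round(se_mean, 3))
print("predictive band:", round(se_pred, 3))
print("ratio:", round(se_pred / se_mean, 2))
```

Under these assumptions, at the centre of the calibration data the ratio of the two is the square root of n + 1: 3 for these eight points, and about 10 for a hundred. Real reconstructions use more elaborate models, so the multiplier there will differ.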

But if there is no plus/minus, how can we tell how confident we should be about any reconstructed trends? Answer: we cannot be confident at all. Since we typically do not see indications of uncertainty accompanying reconstructions, we have to hunt for the sources of uncertainty in the CRU code, which we can then use to figure our own plus/minus bands.

CRU document

One example from something called a “SOAP-D-15-berlin-d15-jj” document. A non-native English speaker shows a plot of various proxy reconstructions from which he wanted to “reconstruct millennial [Northern Hemisphere] temperatures.” He said, “These attempts did not show, however, converge towards a unique millennial history, as shown in Fig. 1. Note that the proxy series have already undergone a linear transformation towards a best estimate to the CRU data (which makes them look more similar, cf. Briffa and Osborn, 2002).”

In other words, direct effort was made to finagle the various reconstructions so that they agreed with preconceptions. Those efforts failed. It’s like being hit in the head with a hockey stick.

Sources of reconstruction uncertainty

Here is a list of all the sources of error, variability, and uncertainty; whether those sources are properly accounted for by the CRU crew (as far as I can see, which means I might be wrong, but I am willing to be corrected); and their likely effects on the certainty we have in proxy reconstructions:

  1. Source: The proxy relationship with temperature is assumed constant through time. Accounted: No. Effects: entirely unknown, but should boost uncertainty.
  2. Source: The proxy relationship with temperature is assumed constant through space. Accounted: No. Effects: A tree ring from California might not have the same temperature relationship as one from Greece. Boosts uncertainty.
  3. Source: The proxies are measured with error (the “on average” correlation mentioned above). Accounted: No. Effects: certainly boosts uncertainty.
  4. Source: Groups of proxies are sometimes smoothed before input to models. Accounted: No. Effect: a potentially huge source of error; smoothing always increases “signal”, even when those signals aren’t truly there. Boost uncertainty by a lot.
  5. Source: The choice of the model m(). Accounted: No. Effect: results are always stated as if the model is true; potentially huge source of error. Boost uncertainty by a lot.
  6. Source: The choice of the model m() error term. Accounted: Yes. Effect: the one area where we can be confident of the statistics.
  7. Source: The results are stated as estimates of β. Accounted: No. Effects: most classical (frequentist and Bayesian) procedures state uncertainty results about parameters, not about actual, physical observables. Boost uncertainty by anywhere from two to ten times.
  8. Source: The computer code is complex, multi-part, and multi-authored. Accounted: No. Effects: many areas for error to creep in; code is unaudited. Obviously boost uncertainty.
  9. Source: Humans with a point of view release results. Accounted: No. Effects: judging by the tone of the CRU emails, and what is at stake, certainly boost uncertainty.

There you have it: all the potential sources of uncertainty (I’ve no doubt forgotten something), only one of which is accounted for in interpreting results. Like I’ve been saying all along: too many people are too certain of too many things.
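
The claim in item 4, that smoothing manufactures "signal", is easy to check by simulation. The sketch below is my own illustration, not anything from the CRU files: it generates pairs of completely unrelated noise series, then compares the typical size of their correlation before and after a simple running mean. The window length, series length, and seed are arbitrary choices.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def moving_average(series, window):
    """Simple running mean; the output is shorter than the input."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def mean_abs_correlation(trials, n, window, seed):
    """Average |r| between pairs of unrelated noise series, raw and smoothed."""
    rng = random.Random(seed)
    raw_total = 0.0
    smooth_total = 0.0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        raw_total += abs(pearson(a, b))
        smooth_total += abs(pearson(moving_average(a, window),
                                    moving_average(b, window)))
    return raw_total / trials, smooth_total / trials


raw_r, smooth_r = mean_abs_correlation(trials=300, n=100, window=10, seed=1)
print("typical |r| before smoothing:", round(raw_r, 3))
print("typical |r| after smoothing:", round(smooth_r, 3))
```

The series are independent by construction, so any correlation is spurious; smoothing makes the spurious correlations larger on average because it leaves fewer effectively independent points.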

Update See this story on evidence.


The CRU “climategate” proxy code: a primer — 42 Comments

  1. The “proxy relationship with temperature is assumed constant through time” source of uncertainty is huge.

    Small trees grow slowly. This is because they are shorter than surrounding trees and the taller trees block the amount of sunlight they receive. Once a tree is tall enough to get out of the shade of its neighboring trees, the ring width growth accelerates until the tree reaches a mature height.

    Trees out in the open don’t have this problem and have a more consistent growth rate (another complication not accounted for). And the problem is that if you are looking at tree ring data encompassing several hundred years, you cannot know if there were any trees blocking sunlight to the tree of interest.

  2. Bill -

    Is there a statistical inconsistency in the whole idea of using tree rings as a proxy?

    Tree growth is also strongly influenced by the level of CO2 in the atmosphere. But CRU is attempting to infer that CO2 in the atmosphere is affecting temperature, which is obtained from tree growth proxies.

    This might not be a problem if CO2 and temperature had a direct relationship. But historical ice core data shows that temperature increases precede CO2 increases by a couple hundred years.

    For example – temperature could be on the increase, but decreasing CO2 would cause a decrease in tree growth…but the proxy relationship would indicate that temperature is decreasing.

    Does that make sense?

  3. Matt:

    Am I correct that the method of data standardization (as opposed to smoothing) also increases the variance of the estimate? It seems to me that most reconstructions standardize the data but use arbitrary centering points for the process. Perhaps this is just a variation on points 1 and 2. That is, if effects are constant across time and space then it doesn’t matter where or when you select a standardization point.

    David C.

  4. I may be reading too much into some of what I’ve seen in various posts from around the web, but as I understand it, the proxy series that are selected as good treemometers are those that match the twentieth century thermometer readings (or can be calibrated to such), so many (most?) of the proxy series (data!) are thrown away.

    Some stats guy once told me never to smooth the data, so, allowing for that being true, what you have left is not a lot.

    If you then have to add in data from another source as a ‘trick’ to ‘hide the decline’ then I would suggest you haven’t got a lot left.

  5. It is OK to try to find temperature proxies or more accurately heat content proxies, but in so doing we should also state the error terms and the unknown likely sources of uncertainty.

    Matt, do you have any comments on the actual way the code is configured? Folks are now unpacking the files that talk about the code. It appears as though the monthly temperature measures for NH, SH and G are not push button, to say the least.

  6. Bernie,

    Looking at the code now. It’s spread out all over the place. Some ad hockeries, some commonplaces, etc. I haven’t found any line of code that says

       call ignore_calculations; output = my_cooked_numbers

    But for code of such international importance (so we are always being told), you’d think it wouldn’t be such a mess. A lot fewer comments than you’d expect.

    I’m hoping one of our regular readers, Dan Hughes, can join in on this. Software auditing is his specialty.

    Kevin B,

    Your stats guy was a genius; amen to him.

    David C,

    It can, and is one of the items I forgot. If all they are doing is subtracting a constant and dividing by another constant, the resulting uncertainty is untouched. But if they are dividing by an estimated mean and an estimated standard deviation, and those estimates can change from year to year or series to series, then you have to carry the uncertainty in their estimation through to the final answer. Probably won’t be a huge source of extra variability, as long as the series are “long.”

    Thomas M,

    It does. It was my first source of uncertainty: it can very well be that the m() should be written as mt(); that is, the model/relationship changes through time, and does so for the very reasons you mention (and perhaps others; we ignore precip, for example).

  7. All,

    And now we have the folks at Real Climate explicitly acknowledging that the proxy/temperature relationship is not fixed in time. Solution? Throw out data past the point it starts being troublesome. Good move.

    The more you toss out, the easier it is to get your data to agree with theory.

    In other branches of statistics, tossing out “outliers” is a popular pastime. Always used to good effect.

  8. Excellent post. Note that most points on your list could have been made before this hacking incident. This is why I have never bought into paleoclimatology, and have found offensive proponents’ insistence that it is accepted science and critics are merely unscientific “deniers.”

  9. By the way, is the code mostly in Matlab? I would be interested in having a look if it were.

  10. SteveBrooklineMA,

    F77, F90, IDL (Matlab-like), and others. There might be some in there, though. I might have just missed it.

  11. Matt:
    Can you link to the RealClimate statement on a non-stable ring-temp relationship? Also, as of 1:20 PM Eastern, moderation at RealClimate appears to have stopped. Anybody want to hazard a guess as to the topic of the meeting Gavin is attending?

    Since you are local to Gavin’s office and if you have some time later this evening, could you keep an eye on the dumpster out back of their office in case they decide to dump their hard copies a la ACORN in San Diego?

  12. I haven’t found any line of code that says
    call ignore_calculations; output = my_cooked_numbers

    Look in


    ; Plots 24 yearly maps of calibrated (PCR-infilled or not) MXD reconstructions
    ; of growing season temperatures. Uses “corrected” MXD – but shouldn’t usually
    ; plot past 1960 because these will be artificially adjusted to look closer to
    ; the real temperatures.

    Sounds like “Cooked Numbers” to me.

  13. B. Humphreys,

    Hmmm. Well, when you’re right, you’re right.


    Link is there, hidden in one of my earlier comments. Seems to work.

    I have to fix the blue color for links: it’s too hard to see.

  14. I wonder if a similar analysis could be applied to much of the post-1860s temperature data, i.e. not proxies but actual thermometer readings. The idea that researchers can look at spotty, inconsistent temperature records from only a small fraction of the earth in the 1800s and get a global temperature average accurate to a few hundredths of a degree is hard to swallow. Take a look at the typical temperature graph here for example:

    Notice there is no error shown. Have you ever seen error shown in something like this? If you came up with a realistic measure of the error and plotted it +/- the mean, would it so overwhelm the graph that it would become meaningless?

    Imagine an Economics researcher estimating world economic output for 1865, based on whatever national reports and other data he could find from countries at that time, and reporting the result to the penny.

  15. What always bothered me about Briffa’s Yamal reconstruction was the fact that in an area with millions of trees, he selected a dozen. And if just one [YAD061, IIRC] had been left out, there would have been no hockey stick at all.

    I’m sure Dr. Briffa gave a reason for using such a tiny sample. But if he had used, say, several hundred trees from that general area, spaced equally apart, then the uncertainty about whether a particular tree had been heavily shaded, or fertilized by bear poop, or had received more sun due to an old tree falling over, or was growing near a seasonal creek, etc., would have averaged out.

    Prof. Freeman Dyson is critical of today’s scientists, many of whom would rather sit in air conditioned offices instead of getting out into the mud. But hey, when you’re just looking for something to back up your expected result, I guess a dozen trees are enough.

  16. “all those squiggly-line pictures of temperature you see from before roughly 1900 are reconstructions”

    What a con! There are very many thermometer records that go back to 1800 and a handful that match these records quite well but happen to go back a whopping 300-350 years.

    The longest running one just happens to be in the exact area of CRU, near London in England.

    Here it is:

    The next longest record matches this one quite well. It is in the Netherlands.

    Note that this single location can be seen to track the very well accepted global average T (HadCRUT3). Read that sentence again before you complain about how poorly “only one location!!!” is expected to reflect a global average.

  17. NikFromNYC,

    You’re right—except for the “con” part, which I can generously put down to enthusiasm on your part. We New Yorkers have to stick together! It would have been better if I had said that since roughly 1900 general and more widespread worldwide coverage became common. Of course, all records in the pre-satellite era are sparse. And even post-satellite, they are still somewhat sparse—and the temperatures themselves are not directly measured, but the result of a statistical algorithm (this is an inverse problem).

    I read your cautionary sentence three times, not just the recommended two. That one station doesn’t track the global temperature that well at all, at least not so closely the measurement error of the “match” isn’t crucially important. Remember that that is our goal: to understand the sources of uncertainty and to quantify our prediction error. Recall that people are exercised about a possible 1 to 2 degree C increase. Therefore, our measurement/prediction error should be well below this. It isn’t.

  18. jack m,

    Depends on the particular model. Some proxy models just use time and no other variables. So β would just apply to the time and of multipliers for the proxies.

  19. Here is what Phil Jones is speaking of. How do I know? Because this series of emails and the code comments all speak of this “decline” and it’s bloody obvious to anyone familiar with dendroclimatology (tree ring studies) that they are referring to a decline in tree ring temperature proxy values from 1960-present. They call it the “divergence problem”. And the reason it’s so important to them is that they are the Hockey Stick Team and they can’t make a truly frightening Hockey Stick that claims to be quite certain due to inclusion of more than one temperature proxy (non-tree ring things like sediments) without those tree rings. The blade of the stick is actually still made of thermometer data since they mysteriously refuse to take new tree ring measurements since the “divergence problem” only gets worse: as T rises even more, cold-adapted trees quite “oddly” refuse to grow faster and faster. Not too surprising that arctic trees don’t like the arctic to warm up too much, eh?

    The latest IPCC report (#4) indeed mentions this possibility. So WHY would Phil Jones and the others and the guy commenting on computer code want to “hide” what is in plain sight in the IPCC report? After all, it’s a “well known” issue, just a technical issue, right?

    Now ask yourself if you’ve ever heard of this problem in the newspaper. Ask yourself how many people own copies of the IPCC report? How many who own a copy or have a link to the online version have actually read it through? How many people are even qualified to understand what’s in there? NOT MANY PEOPLE.

    So it is up to those who DO understand what is in there to tell the public what its content MEANS. What they *say* is that we are VERY sure that their Hockey Sticks are “settled science” and that everybody agrees with them.

    And THAT is what Phil Jones is trying to quite literally and in propagandistic form “hide”.

    He was trying to quite literally hide the fact that climate scientists all agree that tree rings as thermometers are BROKEN thermometers and they are broken in the exact way that is MOST damaging to the Hockey Sticks that appear in the IPCC report: they DEMONSTRABLY FAIL TO ACCURATELY RECORD RECENT HOT TEMPERATURES AND THUS ARE VERY LIKELY TO BE ALSO CONCEALING HOT TEMPERATURES IN THE DISTANT AND NOT SO DISTANT PAST.

    From the IPCC:

    “their large-scale reconstructions based on tree ring density data,
    Briffa et al. (2001) specifically excluded the post-1960 data in
    their calibration against instrumental records, to avoid biasing
    the estimation of the earlier reconstructions (hence they are not
    shown in Figure 6.10), implicitly assuming that the ‘divergence’
    was a uniquely recent phenomenon, as has also been argued by
    Cook et al. (2004a). Others, however, argue for a breakdown
    in the assumed linear tree growth response to continued
    warming, invoking a possible threshold exceedance beyond
    which moisture stress now limits further growth (D’Arrigo
    et al., 2004). If true, this would imply a similar limit on the
    potential to reconstruct possible warm periods in earlier times
    at such sites. At this time there is no consensus on these issues
    (for further references see NRC, 2006) and the possibility of
    investigating them further is restricted by the lack of recent tree
    ring data at most of the sites from which tree ring data discussed
    in this chapter were acquired.”

    So much for there being a consensus. How many people know that contained within the IPCC report is the very debunking of “the consensus” of the science being settled?

    How many people know that the only Hockey Sticks that do NOT use obviously non-linear tree ring proxies either (a) fail to show that recent warming is alarming compared to 1000 years ago, or (b) use so few proxies as to become HIGHLY SPECULATIVE?

    Finally I quote from Hockey Stick man himself, the one whose most famous Hockey Stick relies not on just a tiny region of HIGHLY “DIVERGENT” trees in Siberia but in fact, after he HID his data for a decade, it turned out it relied on a SINGLE TREE (!!!). Take that tree out and his hockey stick fades away. Take that whole little region of Siberia out (Yamal) and there is a scythe instead of a Hockey Stick, meaning temperature reconstructions suddenly skyrocket not today but 1000 years ago:

    “I believe that the recent warmth was probably matched about 1000 years ago. I do not believe that global mean annual temperatures have simply cooled progressively over thousands of years as Mike appears to and I contend that that there is strong evidence for major changes in climate over the Holocene (not Milankovich) that require explanation and that could represent part of the current or future background variability of our climate.” – Keith Briffa

  20. You know, your list of sources of uncertainty had me thinking back 3 or 4 years ago when I first got interested in this subject. My initial skepticism was largely a sense of the total misplaced precision in what I was hearing. This roughly translated into, “they cannot be that certain about what they are saying. The climate system is way too complex, there are likely to be unknown variables and/or interactions of existing variables, there are way too many sources of possible errors and CO2 as the primary culprit is way too convenient.”

    When I started listening and reading, what I heard and saw mostly was confirmation bias. Katrina = global warming. Pine beetles and lodge pole and white bark die off = global warming. Floods in England = global warming. Drought in California = global warming. Glacier quakes in Greenland = global warming. Heat wave in France = global warming. There was no real critical thinking and seldom any rational exclusion of alternative and simpler explanations. The tree ring circus and the use of the iconic hockey stick serve as a microcosm for the field as a whole.

  21. Briggs wrote:

    “I read your cautionary sentence three times, not just the recommended two. That one station doesn’t track the global temperature that well at all, at least not so closely the measurement error of the “match” isn’t crucially important. Remember that that is our goal: to understand the sources of uncertainty and to quantify our prediction error. Recall that people are exercised about a possible 1 to 2 degree C increase. Therefore, our measurement/prediction error should be well below this. It isn’t.”

    Point conceded, but the match is in fact astonishingly good for merely one thermometer! Let readers judge for themselves in my linked graphic above.

    The question that is so hard to answer is complicated. Too complicated to put in a single graph. It is this: a Hockey Stick would require 1900-present VARIATION to be unprecedented. So it’s not so much about absolute temperatures, which indeed are warmer than perhaps the last 1000 and maybe 10,000+ years. Maybe. It’s about natural variations before 1900. The variability in temperature shown in the long running charts seems quite similar to that of the last century and this is in DIRECT conflict with a Hockey Stick view of the past.

    I’ll argue a point for you: thermometers were quite crude prior to 1724 when Fahrenheit created his method of calibrating thermometers with ice and boiling water. And really “good” data indeed does not appear until around 1800. So indeed I am throwing up a highly speculative argument to counter a highly speculative argument of your own (that reconstructed temperatures are reliable). And here is where opinions may legitimately and healthily differ, but that’s my whole point: opinions differ! The science is not settled, so my point is that the IPCC’s Summary For Policy Makers is NOT a fair representation of either the science or even of the actual contents of the IPCC reports themselves.

    If popular writers and journalists would concede this point, those of us interested scientists (Ph.D. in chemistry from Columbia and postdoc at Harvard) might not *get* so excited as to call a misstatement a “con”. But given the TONE in the CRU emails I’m afraid that the assumption must now be made that when people like you “accidentally” make an outlandish statement that is very convenient for Global Warming theory that it is very much indeed likely to be a literal dictionary example of a “con”.

    Long term thermometer records exist. You implied in your article that they didn’t. There is a huge political movement to ration energy. Long term temperature records fly very much in the face of the Hockey Sticks that the likes of Al Gore use to promote that political movement. Do you really believe that if long term records *did* show a clear Hockey Stick that climatologists would utterly omit them from consideration instead of slapping them on the cover of their policy maker’s reports?

  22. Ken,

    Tomorrow, we have some fun with these questions. Stick around.


    This has always been the central problem with AGW. I’ve written about this before, but I was too technical. I’ve just hit on a way, I think, to explain this more clearly. Look for this on Wednesday.


    Sick ‘em!

  23. Being a mischievous sort, & having referenced Michael Crichton earlier/above, I can’t help but wonder if there’s something patentable, something really fundamental, in the Global Warming arena. Say a tree ring evaluation technique(s), for example – then every time someone does that or references that they owe a fee, or some other compensation stipulated by a license agreement (or whatever). Imagine how that could be used to ensure openness & transparency (of the sort that ought to be there anyway).

    Of course that seems [patently] absurd….but….as M. Crichton points out, such is the state of patent law; there’s ample precedent, nutty it be. All that’s needed is some creativity, a good patent attorney/ies, and $$$$. Opportunity awaits!!

  24. Hi, sorry to bother you again, but in your discussion of y=m(beta,x), surely a vital question is the conditioning of m given the data used to construct a coefficient matrix to calculate beta? In my field, I’m always arguing with people who think that one can do totally accurate non-linear regression for 10 parameters with 11 observations or some other intellectually challenging calculation. I often find that a simple Monte-Carlo approach demonstrates that the solutions blow up for small perturbations of the data. I would guess that the CRU crowd probably weren’t into solution stability from what I’ve seen. However, as you seem to be keen on getting their code to work on their data, this might seem to be a useful, and understandable, approach in seeing if their models interpret their data in a predictable fashion.
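
The Monte Carlo check described in this comment can be sketched in a few lines. The example is a deliberately small stand-in, not the commenter's ten-parameter non-linear problem and not CRU's code: it refits a straight-line slope many times after jittering the observations slightly, once for well-spread predictor values and once for predictor values bunched tightly together (an ill-conditioned design). All numbers are invented for illustration.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_spread(x, y, noise_sd, trials, seed):
    """Standard deviation of the refitted slope when y is jittered slightly."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(trials):
        y_jittered = [yi + rng.gauss(0.0, noise_sd) for yi in y]
        slopes.append(slope(x, y_jittered))
    return statistics.stdev(slopes)


# Same underlying line y = 2x, sampled two ways.
x_spread = [float(i) for i in range(11)]           # 0, 1, ..., 10
x_bunched = [5.0 + 0.001 * i for i in range(11)]   # 5.000 ... 5.010
y_spread = [2.0 * xi for xi in x_spread]
y_bunched = [2.0 * xi for xi in x_bunched]

well = slope_spread(x_spread, y_spread, noise_sd=0.01, trials=500, seed=2)
ill = slope_spread(x_bunched, y_bunched, noise_sd=0.01, trials=500, seed=2)
print("slope spread, well-conditioned design:", well)
print("slope spread, ill-conditioned design:", ill)
```

The same tiny perturbation of the data moves the fitted slope far more in the bunched case. Running this kind of test against a real reconstruction would mean perturbing its actual inputs and refitting with its actual model.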

  25. Temperature is an intensive thermodynamic variable, so I’m not certain that the mean atmospheric temperature is a valid metric. After all, mean (or average) implies some sort of addition, which doesn’t work for intensive variables…it’s like finding the mean value of a phone book. Someone explain this to me.

  26. This turned up at L’Ombre de l’Olivier, where they’re talking about the code released as part of the alleged hack:

    The real disaster is in the *.pro files. Try a search on ‘decline’. It is unabashed data manipulation. Here is a sample (flattens a warm 40s period and warms the recent parts):

    ; Apply a VERY ARTIFICAL correction for decline!!
    valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
    2.6,2.6,2.6]*0.75 ; fudge factor
    if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'

    Just crazy.
    debreuil | Homepage | 25.Nov.09 – 0:08 | #
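
For readers who do not read IDL, the arithmetic in the quoted fragment can be reproduced in a few lines of Python. This sketch only evaluates the valadj expression as quoted, reading the tenth entry as -0.1. The yrloc array, which says which years each adjustment applies to, is not part of the excerpt, so nothing here ties the values to particular dates, and the excerpt does not show how valadj is used afterwards.

```python
# The twenty raw values from the quoted IDL line, before scaling.
raw_adjustments = [0.0, 0.0, 0.0, 0.0, 0.0, -0.1, -0.25, -0.3, 0.0, -0.1,
                   0.3, 0.8, 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]
FUDGE_FACTOR = 0.75  # the multiplier labelled "fudge factor" in the excerpt

valadj = [value * FUDGE_FACTOR for value in raw_adjustments]

print("number of adjustments:", len(valadj))
print("most negative:", round(min(valadj), 3))
print("most positive:", round(max(valadj), 3))
```

After scaling, the entries run from -0.225 up to 1.95, with the largest values at the end of the list.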

  27. “Temperature is an intensive thermodynamic variable…”

    That’s a good point. I suppose one could assume the various specific heats and masses (amount of air and water, etc) don’t change much, so Temp = HeatEnergy * constant (or a sum of these).

  28. I suspect that many folks have lost sight of the ball.

    There are far more than 60,000 surface temperatures taken each day as a part of weather observations containing much more – such as wind, cloud cover, moisture, pressure, visibility, etc., from all over the globe. This information is widely distributed in near real-time. Yet these “scientists” use temperature data from a few (around 2000) mostly rag-tag observation sites and then “modify” the data for the UHI (Urban Heat Island) influence using very suspect methods.

    Further, to assert that climate can be predicted by this single dimension (temperature) is absolutely absurd. As an old (ancient actually) weatherman, I remain flabbergasted that this sort of thinking can be widely peddled as either logical or rational.

    A thought experiment: Suppose you were to ask me what tomorrow’s weather will be and I answered, “Oh, it will average 63 degrees.” What would you think? Would it be a good day for golf, grass cutting, watering the grass, fixing the roof, or painting the garage door? Will the sun shine? Will it rain? Will the wind be strong and gusty, or light and variable? Will it be foggy?

    And then what if I told you that was a “global” mean temperature?

    According to some of the program comments these guys are total klutzes that can’t even keep up with WMO block/station numbers for their base data.

  29. The non-existence of a physically meaningful global mean surface temperature follows from the fact that temperature is an ordinal random variable, and not an interval one as claimed in some statistics texts. The same data set can be consistent with both warming and cooling.

  30. Philip,

    Global average temperature is a perfectly understandable concept, as long as it is carefully defined. If, for example, I define it as the numerical mean of a group of stations, where each station provides a mean of its daily temperatures, then there is no difficulty.

    We all understand what “hotter” and “colder” mean without resort to technical terms.

    It is proper, though, to emphasize that nobody or no (interesting) thing experiences a global temperature. We experience temperatures in a place and at a time. But, even so, it is clear that these place-and-time temperatures are correlated to the global mean I have defined. (I use the word “correlated” in its plain English sense.)

  31. Not every perfectly understandable concept is physically meaningful.
    Imagine a planet, call it X, that has only two weather stations. Suppose that for many centuries the two stations both record a constant temperature of 16° X. Suppose that in a certain year one station records an average temperature of 0° X and the other 36° X yielding a mean of 18° X. Suppose the temperature scale ° X is related to another thermometric scale ° Y by an order preserving transformation (thus degree of hotness is preserved). Suppose for simplicity that under this change of scale: 0° X converts to 0° Y; 16° X converts to 4 ° Y; 36° X converts to 6° Y.
    When ° X are used, the data from the two stations shows a rise of 2 ° X in the mean.
    The same data when converted to ° Y shows a fall in the mean of 1 ° Y.
    Thus whether we deduce warming or cooling depends on the scale used.
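    The arithmetic in this example can be checked with a short sketch. The variable names are illustrative, and the lookup table holds only the three conversions stated above; nothing else about the X-to-Y mapping is assumed.

```python
# Station readings from the planet X example, in degrees X.
baseline_x = [16, 16]   # both stations in the long constant period
new_year_x = [0, 36]    # the two stations in the unusual year

# Order-preserving conversion; only these three points are given in the example.
x_to_y = {0: 0, 16: 4, 36: 6}


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


# Change in the two-station mean on each scale.
change_x = mean(new_year_x) - mean(baseline_x)
change_y = (mean([x_to_y[t] for t in new_year_x])
            - mean([x_to_y[t] for t in baseline_x]))

print(f"Change in mean on scale X: {change_x:+.1f}")  # +2.0 (16 -> 18)
print(f"Change in mean on scale Y: {change_y:+.1f}")  # -1.0 (4 -> 3)
```

    The sign flips because the mean is taken after a non-linear rescaling: the ordering of every individual reading is preserved, but the ordering of the averages is not.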

  32. I wish to thank Briggs for this excellent and seemingly objective analysis. Of course the e-mails are a disgrace (and if I had conducted my own PhD research in that way, adjusting the data to fit my hypothesis, I would deservedly not have received it), but the real issue to me is whether the science is correct.

    Someone needs to explain to the public how fundamentally flawed the research that is rammed down the throats of children in school is. This is the real message.

    I formerly accepted the word of these scientists that GW was a fact. After investigating myself, I now am a skeptic, which does not mean that I am not open to persuasion by proper science, but this does not seem to me to be proper science.

    I am not a programmer, but I do have a lot of experience in backtesting mechanical trading systems on stock market data, so I know how tiny changes in data can produce hugely deceptive results. In addition, the ultimate test of any mechanical trading system is whether, when you test it live, its performance going forward is consistent with its past performance.

    At the risk of boring everyone by continuing this analogy, in my own testing I would rapidly have rejected these computer models because they clearly do not work going forward in time, and that is the only real test. As I understand it (please correct me if I am wrong), they fail miserably to predict temperatures going forward.

    So please keep up the good work and thank you once again. The mainstream journalists are not doing the work that they should be, so people like you are vital to bursting the bubble of what I believe is some form of apocalyptic collective madness.

  33. I’ve never seen any paper claiming 1 or 2, and every paper I’ve ever read on climate change has addressed 3. 4-7 would only diminish since the dawn of modeled climate science, and would begin in an era before climate change was such a politically charged issue; thus, assuming the data is on your side, it’s unlikely the fears of virtually all climate change scientists would’ve survived the dawn of sophisticated computer models. Thus half of your arguments actually speak against your point. 8 is a little silly: “the code is complex, therefore scientists made up global warming?” Come on. 4-7 and 9 all require a vast concerted conspiracy by thousands of scientists working across continents and decades.

    This post to me reads more like a Truther’s critique of the 9/11 Commission or a Creationist grasping for straws about carbon-dating than a reasoned argument.

  34. John,

    Thanks; but to say something like, “I don’t believe X” is no proof of X’s truth or falsity. So if you have specific arguments about why one of the points I made was false, please give them. I can equally well argue, via statistical theory, that they are correct. Factually: I have never claimed, anywhere, that scientists have “made up global warming.” Try and read a little more around my site before making more statements like this.

    To say that the post itself is comparable to the asinine things “Truthers” or creationists say is childish of you. It smacks of an ad hominem attack, and suggests you have nothing of substance to contribute.

  35. Matt,
    I’m curious about the model
    y = m(β, x) + error

    where y are temperatures and x are proxies. I would instinctively have gone the other way around (y = proxies, x = temperatures) because I think of the proxies as the dependent variable. E.g., tree ring widths change in response to temperature. Does it make a difference?
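    For the simplest case, a straight-line m fitted by least squares, the direction does change the numbers. The sketch below uses synthetic data only: the 0.5 response, the noise level, and the sample size are assumptions chosen for illustration, not anything taken from the CRU code. It compares regressing temperature on the proxy directly with regressing the proxy on temperature and then solving the fitted line for temperature.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic data: a hypothetical linear proxy response to temperature plus noise.
temps = [random.uniform(10.0, 20.0) for _ in range(200)]
proxies = [0.5 * t + random.gauss(0.0, 1.0) for t in temps]


def mean(v):
    """Plain arithmetic mean."""
    return sum(v) / len(v)


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Direct: temperature regressed on proxy (y = temperature, x = proxy).
direct = slope(proxies, temps)

# Inverse: proxy regressed on temperature, then the line solved for temperature.
inverse = 1.0 / slope(temps, proxies)

# The product of the two regression slopes is the squared correlation.
r_squared = slope(proxies, temps) * slope(temps, proxies)

print(f"direct slope  (temperature per unit proxy): {direct:.3f}")
print(f"inverse slope (temperature per unit proxy): {inverse:.3f}")
print(f"r^2: {r_squared:.3f}")
```

    The direct slope equals r² times the inverse slope, so unless the correlation is perfect the direct fit gives a flatter line and hence a smaller reconstructed temperature swing for the same change in the proxy.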

  36. Is it not a problem that the variables in the model y = m(β, x) + error are ordinal, whereas energy, the measure of global heating, is ratio? Temperature is not even a function of energy.
    Temperature cannot be used to distinguish between, say, water at 0 degrees C and ice at zero degrees C.

  37. I have examined climate time series for many years, and have taken the (naive?) view that researchers have been objective and honest in reporting their observations. I used this belief when working on Mann’s data (used in “MBH98”), which convinced me completely that his infamous hockey stick plot was a fabrication. I may well have been mistaken in my beliefs about the integrity of some climatologists, especially of the paleo sub-species, it seems, but unless deliberate manipulations of original numbers on an industrial scale have been the norm rather than the exception, people like me can attempt reasonable and hopefully meaningful analyses.

    Now we have obtained some proper insight into the behind-the-scenes workings of a famous organisation, and it is not at all reassuring.

    If you have access to the GHCN data – I have a complete download of it on CD as it was about five years ago – and appropriate software, you can readily find out about the numbers of records that began x years ago, where x is a value of your choice. Records back to the early 1800s are quite abundant, although they lack “coverage” in a geographical sense. I can access any record from the above source in seconds. There are in fact 54 sites that have data starting pre-1800. All are in Europe apart from Madras (India), York Factory and Churchill Factory, in Canada, and Philadelphia, New Haven, Boston, Albany and Natchez, in the USA. So not representative, then, but still interesting. The oldest site is Berlin (1701) followed by Moscow (MOSKVA) starting in 1739. The data may not be reliable, but I’d back them against tree rings or lake varves any time!

    What these old records reveal is very intriguing, but I’ll not enter that area here. The Central England Temperatures stretch back to 1659, pre-dating the invention of the thermometer, but nevertheless believed to be reliable on a resolution of ONE DAY. I’m working on this stuff now. Monthly average data (as in the GHCN archive) are however much handier and I’ve done huge amounts of work on these.

    A further point of interest is that the alleged result of anthropogenic CO2 emissions is a very rapid increase in temperature. I’d recommend using instrumental data to check on this, not tree ring width or maximum ring density. Choose sites not too far from the Yamal site, like Abisko or Salehard, and see what you find. (Some data are lacking. These are mid-winter temperatures that seem not to have been collected if it was below -20C. Presumably H & S forbids it!) They show no sign of rapid temperature increase.

    I also notice that yearly average temperatures are widely quoted. Northern latitude trees, however, are said to grow only in the warm months. So how can their growth patterns possibly reflect annual averages?