Phil Jones and the Lack of Warming; Or, Die, Statistical Significance, Die

According to the stunning New York Times headline, which quoted climatologist Phil Jones, there has been no “statistically significant” global warming in the past 15 years.

Just kidding! The Times forgot to write about that. No doubt they were distracted by that golfer-guy’s TV event. Priorities!

Anyway, that’s what Mr Jones has said. Reader Francisco González has asked what that “statistically significant” means. It is an excellent question.

Answer: not much.

Here is what it absolutely, certainly does not mean: “There is a 95% chance that no warming occurred over the past 15 years.” It also does not mean: “There is a 100% chance that no warming occurred over the past 15 years.”

It also, most emphatically—slow down and read this thrice—in no way means: “We don’t know if any warming occurred.” I’ll tell you what it does mean in a minute.

It is time, now, right this minute, for the horrid term statistical significance to die, die, die (old-timers from Usenet days will grok that joke—sorry, couldn’t help myself with the second one). Nobody ever remembers what it means, and, with rare exceptions, almost everybody who uses it gets it wrong.

Statisticians have labored for nearly a century to teach the philosophy behind this term, and we just can’t make it stick. Partly it’s because the philosophy itself is so screwy; but never mind that. We must admit failure.

Here’s what “statistical significance” means in terms of global warming. Mr Jones fit a probability model to a series of data. That probability model had several knobs, called parameters, that needed to be tuned just so until the model fits. These knobs are like old-fashioned radio dials that must be twisted to just the right spot for the signal to be audible. (The data tells us the values at which to point them; only we’re never sure the data tells us the truth.)

Mr Jones looked at the array of knobs and set one of them to zero. He then calculated a statistic, some function of the data (like all the values squared then summed, then divided by another number, which is a function of the number of data points, but is not the exact number of those data points). Confused yet?

Mr Jones looked at that statistic and asked, given that my model is true—given, that is, that it is the one and only model for this data—and given that this particular knob is set to zero, what is the chance that I would see another statistic as large (in absolute value) if the world were to restart and the climate repeated itself, only this time it was “randomly” different, and I recalculated my statistic on this new set of data?

If that probability is low—usually less than the mystical 0.05 level—then the model is said to be “statistically significant.” That probability, incidentally, is called the p-value, of which you might have heard.

If that probability is greater than the 0.05, the results are said not to be statistically significant. (People then leave the knob at zero and ignore what the data says about where to set it.)

Thus, Mr Jones, in saying “there has been no statistically significant warming” actually means “I believe my model is the one and only true model for my data, and that its particular knob should be set to zero.” And that is all it means, and nothing more.

This is bizarre, to say the least, and is why nobody can ever remember what the hell a p-value is saying. Nevertheless, it is consistent with the mathematics and philosophy of a school of statistics called frequentism.
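For readers who want to see the knob-twisting spelled out, here is a minimal Python sketch of the procedure described above. It rests on assumptions the post does not state: a straight-line trend model with independent Gaussian noise, the fitted slope itself as the statistic, and an invented 15-value series. It is not Mr Jones's model or data, and the names `ols_slope`, `simulated_p_value`, and `temps` are hypothetical.

```python
import random


def ols_slope(ys):
    """Least-squares slope of ys against time steps 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    return sxy / sxx


def simulated_p_value(ys, n_sims=5000, seed=0):
    """Share of simulated 'restarted worlds' whose slope is at least as large
    (in absolute value) as the observed one, given the assumed model with the
    trend knob set to zero."""
    rng = random.Random(seed)
    n = len(ys)
    observed = abs(ols_slope(ys))
    # The remaining knobs, tuned from the data: the level and the noise spread.
    level = sum(ys) / n
    spread = (sum((y - level) ** 2 for y in ys) / (n - 1)) ** 0.5
    hits = 0
    for _ in range(n_sims):
        # Restart the world: same model, no trend, fresh random noise.
        fake = [rng.gauss(level, spread) for _ in range(n)]
        if abs(ols_slope(fake)) >= observed:
            hits += 1
    return hits / n_sims


# Hypothetical yearly values, invented for illustration only.
temps = [0.32, 0.18, 0.36, 0.53, 0.31, 0.29, 0.41, 0.46,
         0.47, 0.45, 0.48, 0.43, 0.40, 0.33, 0.44]
p = simulated_p_value(temps)
print(f"slope = {ols_slope(temps):.4f} per year, simulated p-value = {p:.3f}")
```

Note what the number does and does not say: it is a statement about the statistic given the model and the zeroed knob, not the probability that warming did or did not occur. Change the assumed model (say, by allowing correlated noise) and the p-value changes with it.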

But forget all that, too. Let’s ignore statistics and turn to plain English.

Suppose, fifteen years ago, the temperature (of whatever kind of series you like: global mean, Topeka airport maximums, etc.) was 10 °C. And now it is 11 °C. Has warming occurred?

Yes! There is no other answer. It has increased. But now suppose that last year, it was 9 °C (this year it is still 11 °C). Has warming occurred?

Yes! And No! Yes, if by “has warming occurred?” we really mean “Is the temperature now higher than it was 15 years ago?” No, if by “has warming occurred?” we really mean “Has the temperature increased each year since 15 years ago?”

Also Yes, if by “has warming occurred?” we really mean “Has the temperature increased so that it is higher now than it was fifteen years ago, but I also allow that it might have bounced around during that fifteen years?”

Each of these qualifiers corresponds to a different model of the data. Each of them has, that is, a different probabilistic quantification. And so do myriads of other model/statements which we don’t have time to name, each equally plausible for data of this type.
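The plain-English readings above can be written down as separate checks. In this small Python sketch the series is invented: only the 10 °C starting value, the 9 °C value last year, and the 11 °C value now come from the example, and the years in between are hypothetical filler. The function names are also made up for illustration.

```python
def higher_than_start(temps):
    """Is the temperature now higher than it was at the start?"""
    return temps[-1] > temps[0]


def increased_every_year(temps):
    """Has the temperature gone up in every single year?"""
    return all(later > earlier for earlier, later in zip(temps, temps[1:]))


# 16 yearly values: fifteen years ago (10) through last year (9) to now (11).
# The values in between are invented and bounce around on purpose.
temps = [10.0, 10.4, 9.8, 10.1, 10.6, 10.2, 9.9, 10.3,
         10.7, 10.5, 10.0, 9.7, 10.2, 9.6, 9.0, 11.0]

answers = {
    "higher now than 15 years ago": higher_than_start(temps),
    "increased each year": increased_every_year(temps),
}
for question, answer in answers.items():
    print(f"{question}: {'Yes' if answer else 'No'}")
```

The same data answers Yes to one question and No to the other. The third reading, which allows bouncing around, reduces here to the first check. Each check is a crude stand-in for a different model, and nothing in the data alone says which one is right.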

Which is the correct model? I don’t know, and neither do you. The only way we can tell is when one of these models begins to make skillful predictions of data that was not used in any way to create the model. And this, no climate model (statistical or physical or some combination) has done.

So has global temperature not increased? It has not, if by “not increased” we mean…etc., etc.

Comments


  1. Could Dr. (I’m assuming) Briggs comment on the effect that imprecision in the actual measurements has on the above (if any)?

  2. The interesting question is what Prof Jones supposes he meant. I suspect he meant “I executed a numerical procedure I don’t understand and this is what it told me.” Mark you, that may be a description of his whole career. But not a complete description, since it doesn’t touch on many other issues of competence or honesty.

  3. You must also consider the fact that there is no such thing as average temperature. The system that you are measuring is not at equilibrium.

  4. We take an average of a day and use that to compute an average for a month, then compute an average for a year, then the average within 1,200 km, then an average for some grid model, and finally an average for all the world. Along the way we put in questionable adjustments, most of which add a warming bias, make questionable missing-data fills, and do questionable culling of data.

    Why do I believe any of this data?

    As a data administrator I looked at every questionable entry individually for errors, using my data analysis programs to identify possible data errors. I spent many more hours verifying than I did in collecting and producing outputs.

  5. A woman walks up to the window and pushes a ticket across the counter to the Nice Man. She asks the Nice Man, “Did my horse win?”

    The Nice Man says, “That depends.”

    “On what?” says the woman, slightly annoyed.

    “It was a photo finish,” said the Nice Man. “When there’s a photo finish, the officials have to develop the film, make prints, review the pictures, make a judgment call and declare a winner.”

    “How long will that take?” asked the woman.

    “15 minutes or so” replied the Nice Man.

    “I can’t wait – I’ve got a plane to catch. How much will you pay me for this ticket now, before the judges rule?” asked the woman, getting annoyed.

    “Well,” said the Nice Man, thinking. “Your horse, Steady Winner, has won 95 of his last one hundred races. The other horse, Beautiful Loser, has won five of his last 100 races and these two horses have never raced against each other so there’s no help there. I’d have to study their past races, see if they’ve ever run against the same horses, look at their times, track conditions, time of day, phase of the moon and weight of the jockey and then maybe I could give you a price.”

    Suddenly the PA system crackled to life. “Ladies and Gentlemen. There was no film in the finish line camera. We cannot declare a winner. The race is therefore a tie. We apologize for our incompetence.”

    On the other hand, since Beautiful Loser came within a nose of beating Steady Winner, perhaps he should be declared the winner – statistically speaking.

  6. Larry T – Well said. Is there any other field where results based on these sort of data manipulations would be taken seriously? We can’t go back in time and collect better temperature records. That fact doesn’t legitimize the data games these guys are playing.

  7. Nice story, Speed, but FYI: win tickets for both horses are winners in the event of a tie; place tickets (finishing first or second) on these horses are also winners of the place pool; and there will be no horse finishing second.

  8. Surely what Jones is saying is that based on the way he looks at the numbers that he has available to him he is not going to bet his next grant from US Department of Energy that next year’s average temperature is going to be greater than the average of the last 15 years without getting better than even odds?

  9. This endless checking of today’s temperature, this month’s temperature, this year’s temperature, the trend for the past decade, the trend for the past century, whether more snow means warming, or more snow means cooling, and on and on, wears thin. Nobody would even care about all this except that we have become technologically capable of measuring and calculating things we don’t fully comprehend, at a level of precision that is meaningless given the variability of climate. What I figure is this. If it is cold enough to cover nearly all land areas north of 45NLat once every 30 years, then it is hardly too warm. And as long as the snow and ice actually all goes away in the region between the Great Lakes and Hudson Bay each summer, then it is not too cold. We will get by otherwise.

  10. Kevin you are a genius.

    I disagree however with the posted argument re what Dr. Phil Jones meant.

    I think he meant that, given the noise in the data, that by the statistical test he is using the null hypothesis is not rejected. The null hypothesis in this case being ‘there has been no change in the mean global temperature in the past 15 years’.

  11. Ah yes the beauty of significance: and its wonderful but much abused relative confidence.

    Many years ago, to help my students, I invented the Cuprinol study. It’s a UK brand of wood preservative.

    It states as follows: Out of ten wooden soldiers treated with Cuprinol nine said their toes felt better for it. Discuss.

    At the time medicine and particularly epidemiology was rife with statistical abuse, still is for that matter, which is not to say today there have not been excellent large studies: but I wanted to make a point. To my surprise when I came across modern climatology I found almost exactly the same statistical abuse.

    I blame Matlab myself.

    When I was young it was usual for physicists to consult with statisticians familiar with experimental methods before constructing complex experiments in order to decide which out of many possible measurements would be valuable and which nugatory: and in this they carefully considered the limits of precision of the measurement techniques. This saved huge amounts of time and cost in performing the experiment and meant that the subsequent statistical analysis could be done with relative ease and the result had meaning: always useful even if it was a negative outcome.

    Today we have Boulton, he of the CRU enquiry, saying he is collecting 2 gigabytes of data from the Antarctic and that this increase in scale is how the science has changed. Really? And how much of that vast amount of data has any value at all, I wonder? For no amount of number crunching and statistical abuse can turn worthless figures into any kind of meaningful answer.

    Kindest Regards

  12. Starting at about minute 6 of this recent interview,
    http://www.youtube.com/watch?v=A7W4-50n1HE E.M. Smith describes the process of arriving at the average global temperature as a kind of cool computer game. A long series of pre-adjustments, adjustments, estimations, fillings, guessings inform the process. It sounds a bit like voodoo. No wonder the keepers of these treasures are not too keen on sharing their data and methods.

    And starting at minute 5 of the following segment from the same program, John Christy gives his views on how things are being measured:
    http://www.youtube.com/watch?v=Iqi2zEcz7cs

    I once saw an interview with the famous physicist Freeman Dyson where he stated very serenely that really nobody knows what such a thing as the average surface temperature of the Earth is, or should be.

    My impression is that the alarmist AGW hypothesis due to CO2 emissions is the ideal tool if you want a truly indestructible argument capable of withstanding whatever is thrown at it. The uncertainties at every single pillar and beam and joint of this phantasmagoric edifice are so huge and flexible that, paradoxically, the building becomes unbreakable. It simply absorbs every blow and morphs its shape by rearranging the endless could-may-might-can rung ladders on which its stacked-up hypotheses are based. It’s literally an immortal theory. The number of bad things that CO2 may cause is probably inexhaustible, since it is obviously in direct proportion to the grant money available to demonstrate them. No wonder EVERY scientific discipline wants a piece of the pie, as shown here:

    http://www.numberwatch.co.uk/warmlist.htm

  13. In saying that “there has been no ‘statistically significant’ global warming in the past 15 years,” Professor Jones probably meant that the mean of the last 15 years is not significantly different from the mean of some prior periods, unspecified.

    I’m sure that he did not mean that there was no significant upward linear trend in the 15 year data, as every AGW adherent well knows that a brief period of 15 years contains only weather, not climate.

    But perhaps that applies to 15 year means as well.
    Damn, blast etc. Foiled again.

    I think he was just being nice – remember he also said in the same interview that maybe it has not been particularly hot recently and maybe it had been as hot or hotter in the more distant past or whatever.

    What a nice, reasonable man!

  14. Don’t blame Matlab, A. Jones. Blame those who use statistics inappropriately. I use Matlab myself for algorithm prototyping; it is an excellent platform for numerical procedure development.

  15. No. I don’t disagree: Matlab is a wonderful tool in the right hands.

    The problem, as you correctly say, is inappropriate use.

    Statistics have their use and very useful they are too but they are too easily abused or worse misused by those who do not understand them.

    This abuse goes back to almost the inception of statistical analysis: hence lies, damned lies and statistics.

    The truth is that we have not yet learned how to use these modern computational tools properly and so they are wide open to abuse by charlatans and mountebanks, and even worse by genuine people who do not understand the limitations of such analysis.

    Kindest Regards

  16. Many years ago my friend Dave applied for and got a menial job. But to finalize the hiring process, he had to submit a urine sample. Dave knew that his sample would set off all the alarm bells, so he asked another friend, named Bill, if he could use his urine for the test. Bill was an abstainer at the time, if you know what I mean. Bill said okay, for a price. They dickered. Finally Dave agreed to pay Bill $25 for his urine.

    And that, my friends, is a real p-value.

  17. What Dr Jones meant was this: I am going to say something which appears to contradict the global warming message I and others have been giving these many years. But of course it does not, because the statement, as Dr Briggs has pointed out, does not mean much at all.

    The “deniers” (Boo! Hiss!) and media will jump on this and say “See, we told you so,” after which I or some member of the Consensus will point out that what I said did not mean at all what it appeared to mean. This just goes to show how little deniers understand science, which we have been saying for some time, and the media journos look stupid and so will come to heel once again.

    This just proves that only I (and the consensus) know of what we speak, so everyone else should shut up, mind their own business and just believe what we tell them. More grant money please.

    The only flaw in this is I am not 95% confident Dr Jones is that bright.

  18. OK. After that very thorough report, I have but one question.

    Could you please repeat that?!? And in layman’s English.

  19. Briggs,

    You may have covered this topic elsewhere, but perhaps a thread on detecting secular change in the face of cycles would be useful. The climatologist faces a statistical problem like one in manufacturing. In manufacturing we are interested mainly in secular trends that suggest equipment or process is going wrong, but we remain mindful of variation that follows cycles of shift-change, days, weeks, years and so forth. The difference between the two disciplines appears to be that the climatologists, at least those paid for their work, neglect cycles, or maybe even deny they exist.

    It is darned difficult to convince me that a trend is real when I am aware of long period cycles that also affect the data, but the length and amplitude of which are unknown.

  20. Thus, Mr Jones, in saying “there has been no statistically significant warming” actually means “I believe my model is the one and only true model for my data, and that its particular knob should be set to zero.” And that is all it means, and nothing more.

    Wouldn’t it be more accurate to say that there isn’t sufficient information to suggest that this particular knob shouldn’t be set to zero?

  21. I don’t know why I feel compelled to continue posting here when I could be writing a test, but I noticed that you are acquaintances with Triola of the Biostatistics textbook fame. In his chapter (6) on confidence intervals, which certainly pertains to this thread, Triola admonishes students (p. 264) that the wrong interpretation of a C.I., 95% confidence in this case, is that “There is a 95% chance that the true value of p lies between 0.226 and 0.298.”

    But in fact he writes 0.226 < p < 0.298 in order to summarize the confidence interval.

    Doesn't this just reinforce the wrong thinking about confidence interval that he tried to squash, and that is closely related to the wrong thinking about statistical significance you are hoping to correct here? Don't statisticians often make trouble for themselves?

  22. The description in the post was so abstract a lot of things weren’t clear. Should the statistic be high or low? Does the knob mean “CO2’s effect” or something like that? What does it mean to set it to zero – does it mean make the model pretend there’s no effect (or no anthropogenic emissions)? Do you retune the rest of the knobs as well to minimize error? A lot of things weren’t clear there.

    Maybe you could give an analogous example – some data and a simple model, explaining where the knob is and what it means to set it to zero?

    I think a pertinent question is, what do we want the statement to mean anyway? We’re probably most interested in the probability that the model is nonsense, but I guess that’s a necessarily unknown unknown – there are assumptions we can’t qualify, such as unknown science, or the lack of unpredictable divine intervention.

    Or maybe we’re most interested in an accurate probability distribution for “global” temperature in 15 years’ time – or the same prediction made 15 years ago and compared against the current measurement – or how unlikely the model thinks the actual fluctuations over the past 15 years were?

  23. So does it mean anything more than Jones set his knob to zero and has been polishing it ever since?

  24. I would have thought he meant that, given the amount of data over the past n years, the actual sample slope b is not significantly different from the hypothesis that the actual population slope β = 0. That is, given the amount of random variation in the data, it would be possible to obtain the observed sample slope at least five times out of every 100, even if the actual slope were 0.

    All this ignores questions of serial correlation in the series.

    A simpler test is to accept the serial correlation and use the moving range or successive differences as an estimate of short term variation and calculate from that the 3σ limits for random variation. Then check whether a) any of the points exceed +3σ and b) whether there are runs above the series median. Serial correlation upsets the probabilities of various run-lengths, of course, so we might ask for more than six or eight years above the median. If there is no serial correlation, then a) the usual calculations apply and b) I would be very surprised.
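
The simpler test described in comment 24 can be sketched in Python roughly as follows. Several details are assumptions rather than anything the commenter specified: the short-term sigma is taken as the mean moving range divided by 1.128 (the usual constant for ranges of two points), the center line is the series mean, and the data are invented. The names `moving_range_sigma` and `control_chart_checks` are hypothetical.

```python
import statistics


def moving_range_sigma(xs):
    """Short-term sigma estimated from successive differences."""
    ranges = [abs(b - a) for a, b in zip(xs, xs[1:])]
    return statistics.mean(ranges) / 1.128  # constant for ranges of size 2


def control_chart_checks(xs):
    """Check (a) points above the +3 sigma limit and (b) runs above the median."""
    sigma = moving_range_sigma(xs)
    upper = statistics.mean(xs) + 3 * sigma
    above_limit = [i for i, x in enumerate(xs) if x > upper]
    median = statistics.median(xs)
    longest = run = 0
    for x in xs:
        run = run + 1 if x > median else 0
        longest = max(longest, run)
    return {
        "sigma": sigma,
        "upper_limit": upper,
        "points_above_limit": above_limit,
        "longest_run_above_median": longest,
    }


# Hypothetical yearly values, invented for illustration only.
series = [0.32, 0.18, 0.36, 0.53, 0.31, 0.29, 0.41, 0.46,
          0.47, 0.45, 0.48, 0.43, 0.40, 0.33, 0.44]
for name, value in control_chart_checks(series).items():
    print(f"{name}: {value}")
```

As the comment notes, serial correlation changes the probabilities of the various run lengths, so the run length that counts as a signal is a judgment call and is left to the reader here.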