William M. Briggs

Statistician to the Stars!

On Global Warming Apoplexy: Temperature Trends

It is a sure sign that Sanity has packed her bags and headed for the door when otherwise sober scientists begin slinging around terms like “denier” and “denialist.” Language like this displays willful, pretended, or real ignorance of the historical context of these words. Anybody who talks like this makes himself an ass. They’s fightin’ words which start any discussion on an angry footing, their presence a certain indication we are dealing with zealotry, not science.

Let’s look again at the claim made by the scientists at the Wall Street Journal, over which many have popped their corks:

The lack of warming for more than a decade—indeed, the smaller-than-predicted warming over the 22 years since the U.N.’s Intergovernmental Panel on Climate Change (IPCC) began issuing projections—suggests that computer models have greatly exaggerated how much warming additional CO2 can cause.

There are two claims made here. Given the observational evidence we have, both claims appear true. The first (A) is that for the last ten years it has not grown warmer. Since it has grown warmer in some places and colder in others, this is evidently a claim about some global average and not any individual station. The second claim (B) says that the IPCC forecasts have been systematically too large: it is also concerned with some global average.

Both of these claims are quantitative and subject to easy verification. A person’s politics surely has no bearing on whether they are true or false claims. Now, the “global average” referenced is not a static thing, in the sense that, say, measurements from identical (and identically situated) thermometers at fixed locations are averaged together and called (arbitrarily, of course), the global average. Instead, the global average as it is operationally defined mixes sources and locations freely each year (and even within years). Therefore, when the “average” is computed there will be some uncertainty in it. Further, the uncertainty is larger in times historical than in times present. (There is even some uncertainty at individual locations, because no measurement apparatus is perfect, but this is generally small, though not always, especially in the past or when using proxies: see this series.)

The BEST people, for instance, recognized this and attempted to account for measurement uncertainty by speaking not just of averages, but of averages plus-or-minus. We can, and I did, argue over the better way to calculate and display this uncertainty. All we need to understand here is that some techniques underestimate this uncertainty. Actually, we don’t even need to agree about that: but we do need to see that some uncertainty is present, however small.

This is necessary because if we make claim (A), as the WSJ fellows did, we need to take uncertainty over the global average into account or we cannot know whether the claim is true or false. It is at this point when a lack of understanding of statistics can become a real hindrance. Sloppy language also hurts immeasurably. Let’s work through this slowly.

Suppose we have ten years of uncertainty-free global average temperature measurements. We can line them up and ask questions of this series. Was the temperature ten years ago warmer or colder than the temperature this year? All we have to do is look: it will be true or false at a glance. Was the temperature nine years ago warmer or colder than this year? True or false at a glance. And so on.

What does this mean in the context of claim (A)? Well, (A) says that temperatures have not gone up over the last decade. To verify this, all we need do is look to see if any of the temperatures of the last decade are lower than they are this year. If any are, the claim is false. If none are, the claim is true.

Maybe. Because claim (A) can also be taken to mean that at no time over the last decade have the temperatures increased (they could have stayed constant from year-to-year). Again, we can verify this claim with a glance at the data.
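The two readings of claim (A) can be written down directly. What follows is a minimal sketch only: the series `temps` is invented for illustration (it is not real data) and the variable names are hypothetical. It shows that the two readings can disagree about the same uncertainty-free series.

```python
# Hypothetical, uncertainty-free global averages (deg C anomaly), oldest first.
# Ten earlier years followed by "this year". Invented numbers, not real data.
temps = [0.50, 0.46, 0.48, 0.47, 0.49, 0.45, 0.47, 0.46, 0.48, 0.45, 0.44]

past, this_year = temps[:-1], temps[-1]

# Reading 1: the claim fails if any earlier year was lower than this year.
reading_1 = not any(t < this_year for t in past)

# Reading 2: the claim fails if the temperature rose in any year-to-year step.
reading_2 = all(later <= earlier for earlier, later in zip(temps, temps[1:]))

print("Reading 1 holds:", reading_1)
print("Reading 2 holds:", reading_2)
```

On this made-up series the first reading holds and the second does not, which is why the definition has to be fixed before the data are consulted.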

Which of these definitions is right? Evidently neither, because we all understand that the temperatures have some uncertainty in them. Because of that, we cannot just look at the data to say whether it has gone up or down; we instead have to speak of changes in probabilistic terms. And that means hauling in some kind of model.

The simplest (but not so good) model is to imagine each year’s data is irrelevant to knowing every other year’s data. That is, we take this year’s data and display it as an average with a plus-or-minus attached to indicate our uncertainty in it. That plus-or-minus can only come from some kind of probability model, meaning that the range of uncertainty will change when the model changes. Which is the best and most proper model? Nobody knows. But let’s imagine we all agree on one, such that displayed before us is a temperature series of averages and plus-and-minuses.

Now, if claim (A) means that temperatures this year are less than or equal to temperatures ten years ago, then we can make a comparison as before, but our comparison will be accompanied by a measure of uncertainty. Using predictive techniques (yes, this is the proper word: see this series), we can ask questions like, “Given the data and assuming our model is true, what is the probability this year’s temperature is less than or equal to temperatures ten (or nine, etc.) years ago?” Notice that this is not the same as a “t-test” or any other kind of statement about parameters of probability models: it is a statement about observable temperatures.

Or, if claim (A) means that temperatures did not increase even once over ten years, then we can get the probability of this just as simply. In support of either version of claim (A), I said that we cannot know with probability greater than 90% that temperatures have increased (over this last decade). In other words, it is likely that claim (A) is true.
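Both probabilities can be sketched under the simple "each year is independent" model. The code below is an illustration under loud assumptions: the averages and plus-or-minuses are invented, each year's uncertainty is taken to be normal with the stated standard deviation, and the function names are hypothetical. It does not reproduce the 90% figure quoted above, which depends on the actual data and model.

```python
import math
import random
from statistics import NormalDist

# Hypothetical yearly averages and plus-or-minus (one standard deviation),
# oldest first. Invented numbers; the normal, independent model is an assumption.
means = [0.50, 0.46, 0.48, 0.47, 0.49, 0.45, 0.47, 0.46, 0.48, 0.45, 0.44]
sds = [0.05] * len(means)

def prob_not_warmer(m_then, s_then, m_now, s_now):
    """P(temperature now <= temperature then) for independent normal years."""
    difference = NormalDist(m_now - m_then, math.sqrt(s_now**2 + s_then**2))
    return difference.cdf(0.0)

def prob_never_increased(means, sds, draws=20_000, seed=1):
    """Monte Carlo estimate of P(no year-to-year increase anywhere in the series)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        sample = [rng.gauss(m, s) for m, s in zip(means, sds)]
        if all(later <= earlier for earlier, later in zip(sample, sample[1:])):
            hits += 1
    return hits / draws

p_first = prob_not_warmer(means[0], sds[0], means[-1], sds[-1])
p_never = prob_never_increased(means, sds)
print("P(this year <= ten years ago):", round(p_first, 3))
print("P(never increased year to year):", p_never)
```

Both numbers are statements about the temperatures themselves, not about a model parameter, and both change if the assumed model or the assumed plus-or-minus changes.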

This is so using the probability model I indicated. But what if we instead change the model to a linear regression—i.e. a straight line—drawn through the data? Well, we could go through the same steps and ascertain claim (A) in light of this model. But before we can begin we have several things to decide. Why a straight line? Just because it’s easy? Lazy, that. From what year do we start? See this post for the ways that choice can lead you wrong. Do we start with a date (as I joked) in the Jurassic? Or, for fun, in 1973? Every different start date will give a different answer. I will repeat that: every different start date will give a different answer. It is also a stretch, to say the least, to assume temperature always has been increasing in a straight line from whatever start date we pick. (Before the politicization of this subject, every physical scientist would have agreed with that last statement.)
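The dependence on start date is easy to demonstrate. The sketch below assumes a synthetic, deliberately non-linear series (a slow wave plus noise, not real temperatures) and a hypothetical helper `ols_slope`; it fits an ordinary least-squares line from several start years and collects the slopes.

```python
import math
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

rng = random.Random(0)
years = list(range(1964, 2012))
# Synthetic series: a slow wave plus noise. Invented for illustration only.
series = [0.3 * math.sin((y - 1964) / 8.0) + rng.gauss(0, 0.1) for y in years]

slopes = {}
for start in (1964, 1973, 1990, 2001):
    i = years.index(start)
    slopes[start] = ols_slope(years[i:], series[i:])

for start, slope in slopes.items():
    print(start, round(slope, 4))
```

Each start year yields a different fitted slope from the same series, so a trend quoted without its start date is not a well-defined quantity.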

But suppose we do agree on a date: 1964, say, a very fine year. Are we done? No, because we cannot forget that the data that goes into the straight-line model is still measured with uncertainty. We must, just as we did in the first model, account for this uncertainty. That means drawing any kind of naive line (even bold red ones) guarantees over-certainty.

Even if we were to agree on a date—in real life we do not—we could use a model of the measurement error, incorporate that into the model of straight-line change, and then assess claim (A): it is still probably true.

The best thing to do is to model the data in an intelligent way, taking into account the correlations of year-to-year (both auto-regressive and moving average), the measurement error, etc., etc. Hard work! As Doug Keenan has pointed out (often), it’s too much like work for anybody to do. I’d do it myself, but my check from Big Oil hasn’t yet arrived.

Whatever else you do in life, you must not, you must never, look at the pretty red (or blue, etc.) straight line you have just drawn and claim it is, or think of it as, the real data. (It is only in climatology where I have seen scientists forget error bars, and then pitch a fit when somebody points out the omission. You at least have to put predictive, and not parameters-based, error bars on the line, even ignoring measurement uncertainty of the data.)
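The gap between parameter-based and predictive error bars can be made concrete. The sketch below is an assumption-laden illustration: the data are synthetic, the classical regression prediction interval stands in for the predictive interval, and measurement error in the data is ignored. It computes the standard error of the fitted line and the standard error for a new observation at the same point.

```python
import math
import random

rng = random.Random(42)
xs = list(range(10))
# Synthetic data: invented slope and noise level, not real temperatures.
ys = [0.02 * x + rng.gauss(0, 0.1) for x in xs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
intercept = my - slope * mx
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual std. deviation

def standard_errors(x0):
    """Return (std. error of the fitted line, std. error of a new observation) at x0."""
    leverage = 1 / n + (x0 - mx) ** 2 / sxx
    se_line = s * math.sqrt(leverage)      # uncertainty in the line (parameters) only
    se_new = s * math.sqrt(1 + leverage)   # uncertainty in an actual new data point
    return se_line, se_new

for x0 in (0, 5, 9):
    se_line, se_new = standard_errors(x0)
    print(x0, round(se_line, 3), round(se_new, 3))
```

The predictive standard error is always the larger of the two, because it adds the scatter of the data around the line to the uncertainty in the line itself.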

What about claim (B)? Also likely true, as is generally recognized. We still have to incorporate the uncertainty in the global temperature measurements—there is no or little uncertainty in the forecasts—but this is no different than before.

What about the counter-claim (C) that the 2000s were the “warmest years on record” or the like? It is trivially false. The 2000s simply were not the warmest. Four billion years ago, Earth was much hotter. “Wait! It’s obvious we weren’t talking about billions of years ago. Cheater! Denier!” Well, it isn’t obvious. What years did you have in mind as comparators? Ah, that’s the real question, isn’t it.

Did we mean just the last century? The last 1000 years? The last 10,000? What? You must supply a starting year. To make the claim (C) that it’s hotter now than before, you must tell us what you mean by before. If you say “before” means the last ten years, then claim (C) is identical with claim (A). If you say the last 200 years, then you have to do what BEST tried and incorporate the non-parameter error bars, otherwise there is no way to compare what happened a century ago with what happened last year. Obviously, the further you go back, the larger those uncertainty bars become, therefore the more difficult it becomes to claim (with any certainty) that now was hotter than then.

As I often say, over-certainty abounds in this field. People speak of models (statistical and physical) as if they were truth, as if the data that goes into them were granted some kind of special immunity from ordinary criticism. And when the critiques come, that’s when the asinine language breaks out. All sense of humor evaporates.

You would think that because both claims (A) and (B) are likely true (and claim (C) is unproved or likely false) that we have found a reason to celebrate! Perhaps our worst fears won’t be realized after all. This is good news! Wouldn’t it be great if we really did over-emphasize feedback in climate models and that whatever changes we do make to the climate are easily mitigated and not as horrific as posited?

Why so glum that things are so good?

Update See this cartoon which shows that the IPCC has been known to employ the technique of variable start dates.

Update It is imperative that all read this series, where I describe just how so many people make mistakes. Those below who have been shouting the loudest are most in need.

133 Comments

  1. I have often made the point on my blog and elsewhere that this sort of petty name calling is not the behavior of people who are confident in their position or who have faith in the scientific method.

    To me, it smacks of desperation.

  2. Dikran Marsupial

    2 February 2012 at 1:04 pm

    Climatologists tend to use linear trends when discussing whether the earth has been warming or not. Using that approach it is easy to test A, in the sense that you can perform a test for the statistical significance of a warming trend, using zero warming rate as the null hypothesis. Where the skeptics go wrong is in not investigating the statistical power of the hypothesis test, which tells us how likely it is for the null hypothesis to be rejected if it is false. For a decadal trend, the power is very low, so we should expect not to see a statistically significant trend even if it is actually warming.

    Hypothesis tests are asymmetric: failing to reject the null hypothesis doesn’t necessarily mean the null hypothesis is true, or even likely to be true; it just means that there is insufficient evidence for us to be confident that it is false.

    No, the decadal trend is not significantly different from zero. This is not a surprise to anyone with a grasp of statistics who is familiar with the data.
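    The point about power can be checked with a small simulation. This is a sketch under stated assumptions, not the commenter's own analysis: the true trend (0.02 per year), the noise level (0.1) and the independence of the yearly noise are all invented for illustration, and the function names are hypothetical. Real temperature series are autocorrelated, which this sketch ignores.

```python
import math
import random

# Two-sided 5% critical values of Student's t for n - 2 degrees of freedom.
T_CRIT = {10: 2.306, 30: 2.048}

def trend_is_significant(ys):
    """t-test of the OLS slope against zero for a series observed at t = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mx, my = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    return abs(slope / se) > T_CRIT[n]

def power(n_years, true_slope=0.02, noise_sd=0.1, sims=5000, seed=0):
    """Fraction of simulated series in which a real trend is declared significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        ys = [true_slope * t + rng.gauss(0, noise_sd) for t in range(n_years)]
        hits += trend_is_significant(ys)
    return hits / sims

power_10 = power(10)
power_30 = power(30)
print("power with 10 years:", power_10)
print("power with 30 years:", power_30)
```

    Under these assumptions the ten-year test misses a trend that is really there far more often than the thirty-year test does.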

    “We still have to incorporate the uncertainty in the global temperature measurements—there is no or little uncertainty in the forecasts—but this is no different than before.”

    This is simply incorrect, the spread of the multi-model ensemble is an indication of the uncertainty of the forecasts, they are archived and available, and anyone who has been following the discussion of model data comparison will know they are broad.

  3. Briggs

    2 February 2012 at 1:16 pm

    Dikran Marsupial,

    You haven’t read the links I provided. But your misunderstanding is so common, that I’m glad you brought it up.

    A hypothesis test first assumes a probability model is true. It then calculates a test statistic and asks, if this experiment were repeated an indefinite number of times, what is the chance of seeing a test statistic larger (in absolute value) than the one I did see, assuming the parameter associated with straight-line warming is set to 0? If this probability is small, the parameter is set to 0 and the hypothesis is said to be “confirmed.”

    This is entirely—as in entirely—different than asking, “Given the data and assuming the model is true, what is the probability this year’s temperature is larger than the temperature ten/100/1,000/whatever years ago.” There is no “null” hypothesis, no parameters, no mystery. It is a probability only about observables, about what is tangible, about what can be verified. The only thing in common between this “predictive” approach and yours is the model.

    Actually, even these are different, because you assume only a straight line fit. I will even concede the straight line, but you have to concede to include the measurement error.

    Oh, and what year should we start the straight line?

    Read the links, people. We’ll save ourselves much unneeded discussion (and typing).

    Bob,

    You are a denier! You deny my denials!

    My point was that old Phil—a smart and ordinary reasonable guy—started it by calling names. But, to mend fences, I apologize to old Phil. My tone was flip and my bad jokes were not serious enough. I will now await Phil’s apology in return.

  4. Dikran Marsupial

    2 February 2012 at 1:35 pm

    As it happens I am also a Bayesian and I am more than aware of the subtleties involved in the interpretation of frequentist hypothesis tests. While such hypothesis tests are flawed, when used correctly (i.e. in a self-skeptical manner) they are a reasonable approach.

    Only a very poor statistician/scientist (even a frequentist) would ever say that an hypothesis test confirms anything. All you can say is that you can reject the null hypothesis or that you fail to reject the null hypothesis. If you can reject the null hypothesis then this suggests it is reasonable to continue with the experimental hypothesis on a provisional basis.

    Computing “the probability that this year’s temperature is warmer than the temperature of N years ago” is a reasonable thing to do from a statistical point of view, but it is not very useful from a climatological point of view. Climate change has two components, forced climate change (i.e. change that results from a change in forcings, such as CO2 radiative forcing or solar forcing) and unforced climate variability (i.e. weather noise, things like ENSO). The weather for a particular year is very sensitive to unforced variability, so if you look at the temperature for two individual years that is mostly what you will be looking at, and your estimate of the change will be unstable and not very informative. We are more interested in the forced component. The forcings change more slowly with time, so we want to look at the average change over a period rather than the start and end points in isolation. Climatologists have found this more informative, and a good statistician needs to work with those who best understand the data to determine the most appropriate form of analysis.

    Now, if you want to work out the probability that the slope of the linear trend is greater than some threshold, then that would be a much better approach (no null hypothesis there either).

    Or you could have a null hypothesis and compute the Bayes factor instead (that is probably what I would opt for since there are two parties with different hypotheses about what the data mean).

    I have no need to concede the measurement error, uncertainty is what statistics is all about, and we should include all sources of uncertainty.

    As to what year we should start the straight line? Well how about using a long enough timescale so that the (frequentist) test would have useful power. That is pretty much what the climatologists already do; that is why they adopt approx 30 year trends where the data is available.

    From a Bayesian perspective, the more data you have (all things being otherwise equal) generally the less uncertainty there is in the conclusions you can draw from the analysis. So I would use as long a window as I could where the known behaviour of forcings suggests that a linear model is likely to be a reasonable approximation.

  5. This entire argument is a load of evasive hand-waving. As Phil says, “My main point — that the WSJ and DM articles are wrong, that we have lots of evidence the Earth is warming up, that 9 of the 10 hottest years on record occurred since the year 2000, that the DM article specifically uses scientific studies and presents them as if they say the exact opposite of what they actually say — still stands.”

    He called them names because they were obviously and deliberately misrepresenting the conclusion of those studies. They deserved it, and you don’t even try to address the main point, instead diving into a bunch of irrelevant nitpicking.

  6. Why so glum that things are so good?

    Because they bet on the other team. Bet big and didn’t hedge.

  7. Dikran Marsupial

    2 February 2012 at 1:53 pm

    BTW, on the “Bad Astronomer Does Bad Statistics: That Wall Street Journal Editorial” thread, would you be happier if the linear regression were shown with a credible interval (hyperbolic “error bars” above and below showing that there is a 95% probability that the “true” regression lies somewhere in between)?

    I did ask on that thread, but perhaps it got buried.

  8. Briggs

    2 February 2012 at 2:00 pm

    Dikran Marsupial,

    Great News (your Bayesian salvation, I mean)!

    As interesting and useful as discussions of forced versus unforced “variability” are, they are utterly irrelevant to whether claims (A) and (B) are true. It matters not how the data arose to answer the claims of whether the temperature is higher or lower over the last decade.

    Unless you make the claim that the model you use is true: if it is, say if the IPCC models are true, then claim (B) would have been false. Claim (B) is likely true, meaning the models used to forecast are likely false.

    We are also not interested in the “slope” of some straight line: it too is irrelevant to the claims (A) and (B). Even if we are good Bayesians, we still must admit that a posterior on the slope parameter of some straight-line model is still a posterior over an unobservable parameter. And no matter what the posterior says about the parameter, it still does not answer whether claims (A) or (B) are true. We need to speak of the observables, i.e. the “posterior predictive” distribution. Only that distribution—which still assumes the truth of some model—can give us the probabilities that (A) or (B) are true.

    And if our model is any good, it should be able to skillfully predict new data.

    Further, Bayes factors are computed over parameters. We want those parameters out: we want the full uncertainty about observables.

    And, yes sir, you do indeed have to concede measurement error. You, Marsupial, may understand this. But scarcely anybody else does. Why leave it out when you know it’s there, unless your goal is biasing toward a predetermined conclusion or ignorance (I use that word in the nicest, technical sense) of how to incorporate it?

    It is also not so that a linear model is a reasonable approximation, especially when we have “more data.” There are plenty of reasons to suppose it is a horrible approximation, especially as our data set grows (in time).

    Now, just to eliminate my confusion. You agree that claims (A) and (B) are likely true?

  9. Ryan Cooper: “. . .that we have lots of evidence the Earth is warming up . . .”

    The earth is warming up? As compared to when? Who is disputing that it is now warmer than a couple hundred years ago? Is Phil saying that it “has warmed” but isn’t anymore, or is he making the much stronger claim that earth is “continuing” to warm up? It seems the latter, from his verb tense. Ah, but that is the precise point at issue, which is why the discussion is relevant. We cannot say with any certainty that it in fact is continuing to warm; indeed we have very good reason to believe it hasn’t continued to warm over the last decade plus. And why then hasn’t it continued to warm, even in the presence of increased CO2? What should we make of it? Is this a temporary blip in an otherwise ever-increasing rise? Why would such a blip occur? What does it tell us about the ability of our models to predict future temperature increase? On what rational basis should we trust the cited models, which have, to date, overpredicted the warming?

    Inquiring minds want to know.

  10. @Eric Anderson,

    Which part of the nine of the ten warmest years on the instrument record have all been since 2000 do you not understand?

  11. @Ryan Cooper: “He called them names because they were obviously and deliberately misrepresenting the conclusion of those studies.”

    Really? How do you know it’s not just an honest difference of opinion? What’s your evidence that it’s deliberate?

    Moreover, what is Phil’s evidence to the contrary? He did not state what it is, he only asserted that it exists. As far as I can see, he did not demonstrate in any clear fashion what the source of the WSJ error was.

    @Ryan Cooper: “They deserved it, and you don’t even try to address the main point, instead diving into a bunch of irrelevant nitpicking.”

    Name calling says more about the caller than the callee, just so you know.

    Rational people recognize that argumentum ad hominem merely demonstrates the intellectual weakness of those employing that device, and that the need to do so is a consequence of being unable to conduct the argument on its merits.

  12. Dikran Marsupial

    2 February 2012 at 2:23 pm

    Whether forced versus unforced variability are relevant to assertions A or B depends on whether assertions A or B are used in a discussion on anthropogenic climate change. If the discussion is about AGW, then it is the forced component that is of interest. If A and B are merely abstract questions regarding a feature of the data, then yes, they would be irrelevant. However the question is ultimately about AGW.

    No sane statistician would claim that any model is true. As GEP Box said, “all models are wrong, but some are useful”. The IPCC models are the climatologists’ best estimate of the plausible range of climate physics, which they can use to make projections based on their current understanding. The IPCC know that perfectly well.

    Claims A and B are too vague to rule out a linear trend as a reasonable interpretation. You may not think they are relevant, but perhaps you should ask the climatologists.

    “And, yes sir, you do indeed have to concede measurement error.” You miss my point: I wouldn’t have to concede measurement error as it would be in my analysis already. The reason it tends to be left out is that it makes hardly any difference. If you want to demonstrate that this is not the case, that would be a useful addition to the discussion.

    I did say that the period should be chosen so that a linear approximation is reasonable from our knowledge of the forcings. To clarify, what I meant is that one should use the longest period over which we can reasonably expect the forced component of climate change (i.e. the thing we are interested in) is approximately linear. Of course if you look at longer time periods, this approximation becomes less and less valid. But over the periods that climatologists use (e.g. 30 years) it is reasonable.

    To avoid confusion:

    A – “for the last ten years it has not grown warmer.” If you mean that global surface temperatures have not increased on average over the last ten years, then yes, I agree that this is true. Is it surprising or meaningful? No, and probably not. Decadal periods of little or no warming are expected to happen because over a decadal period the effects of the forced component of climate change are smaller than that of unforced variability. However on longer timescales, the unforced component averages out, which is why you need longer windows to make inferences about the forced component (which is the thing of interest in climatology).

    B – “says that the IPCC forecasts have been systematically too large” No, I would not agree with this. We have only one noisy observation of the Earth’s climate, which means we cannot estimate the forced component of the observed change. Hence we cannot know whether the forecast is systematically wrong. The observations lie within the spread of the model, that is all we can expect, even if the models happen to be correct. This is because they predict the forced component of climate change only; the spread of the ensemble gives what is plausible when you add internal variability (which you can simulate but not predict).

  13. “…9 of the 10 hottest years on record occurred since the year 2000…”

    Yes, and Shaquille O’Neal has been one of the tallest men in the NBA for years. But that’s very different from saying that he’s getting taller.

  14. The real hilarity starts when they say that in the decadal trends, natural variation is so noisy compared to the forcings that they’re meaningless (remember, they don’t actually understand either well enough to say on what time scales this claim should operate), and then you squint at them and ask why, then, they are so very very certain that we’re doomed to Warmocalypse by 2100 (esp. given the abysmal predictive failure of Hansen 1988). Usually at that point they mutter something about war crimes trials and storm off to get arrested (objectively!) outside of a coal plant.

    @Trent1492 — Depending on the instrument, the record is either only a few decades long or fraught with so much uncertainty (even for UHI and TOD changes alone!) as to make that observation meaningless.

    @Briggs — The Big Oil claims are perfectly reasonable when you remember who we’re dealing with here — skeptics get just as much money from Big Oil as climatologists get from gov’t+enviro groups, within the same error bars you should see around the IPCC estimates.

  15. “The IPCC models are the climatologists’ best estimate of the plausible range of climate physics, which they can use to make projections based on their current understanding. The IPCC know that perfectly well.”

    It isn’t so much that anyone thinks that the IPCC don’t know that, as that they are of the opinion the IPCC should have striven for a higher level of predictive reliability before climatologists branched out into multi-trillion dollar global policy consulting.

  16. Trent1492: “Which part of the nine of the ten warmest years on the instrument record have all been since 2000 do you not understand?”

    Let’s set aside all the issues about data and coverage and instrumentation, because I’m willing to stipulate just for purposes of your question that 9 of the 10 warmest years in recent memory have been since 2000. Let’s assume for purposes of discussion that this is an incontrovertible fact.

    Now what? Precisely what is the argument at this stage? Is it continuing to warm, as Phil hints, but doesn’t come out and clearly state? Hmmm . . . doesn’t seem to be warming for the last decade plus, and yet CO2 continues to rise. Why is that? Is it an aberration that will soon give way to the inevitable warming, or should we take the data seriously and ask some thoughtful questions? Why hasn’t the increasing CO2 caused increasing warming? Are there other factors at work which might be more significant? How can we tell that the warming, which has apparently plateaued, was anthropogenically caused? What should we make of the fact that models did not predict this hiatus, and that models have regularly overstated the warming? On what basis should we trust distant future temperature predictions, which are farther out, and by definition, even more uncertain?

    (And these are just a few of the scientific questions. We haven’t even begun to discuss the subsequent economic and social questions about what and, indeed, if anything should be done about the warming. Those questions would need to be handled separately and in their own right, even if the alleged “continuing” warming were an iron-clad fact, foreseeable with our crystal-ball models into the distant future.)

  17. @Dikran Marsupial: “… the spread of the multi-model ensemble is an indication of the uncertainty of the forecasts, …”. The variance of different models’ (mean) forecasts bears no relationship to the prediction intervals of each model.

    You could easily have two models which have *large* prediction intervals, but whose mean forecasts (the pretty red lines down the middle) are very close to each other. I believe that is Briggs’ point.

    @Briggs: I’ve been suspicious of linear trends myself, since the justification offered is something like “on a short-enough timescale, just about any time series can be approximated with a linear trend”, but at the end of the day the conclusion extrapolates into the future implying that the linear trend’s fit is in fact not short-term.

    Beyond that, I’ve been wondering about lagged models. For example, someone uses ML to determine that forcing X has maximal influence at lag Y months. This assumes that X enters the system without effect, lingers in the black box for Y-1 months, bursts onto the scene in month Y, then disappears. I’ve experimented a bit with Distributed Lag models, which do improve results, but there are several knobs that should be set to plausible values (as opposed to ML values). Any thoughts?

  18. @Matt,

    That is an extremely bad analogy. You would need one side to claim he is getting shorter and another to point out that fully grown human beings are not on any growth curve by definition.

    @Tall Dave, The temperature record has been subject to a lot of scrutiny in the scientific literature and has passed that scrutiny. You are making assertions that are not factually true. I suggest you contemplate why every indicator for a warming Earth is unreliable using peer reviewed literature. E.g., we know that:

    89% of the glaciers are melting.

    Arctic Sea Ice is shrinking.

    Spring is arriving earlier by weeks.

    Now the above is not even a partial list of natural indicators of a warming world.

  19. Dikran Marsupial

    2 February 2012 at 3:04 pm

    @Wayne The point is that the mean of the multi-model ensemble is a projection of only the forced component of climate change, as the effects of unforced change cancel out when you take the average. It is not directly a projection of what we will actually observe, just the component that is caused by a change in the forcings (in this case CO2). So if the observations don’t match the multi-model mean that doesn’t imply that the models are systematically wrong becuase the difference may be due to the unforced component of the observed climate. However, we should expect the observations to lie within the spread of the ensemble. This point may be lost on many in the general public, but it is central to understanding what the IPCC projections actually say.

    Statement B is essentially a misrepresentation of what the IPCC models actually say. I am not surprised that such misinterpretations occur; I use Monte Carlo simulation on a regular basis, so I am familiar with the basic concepts, but I had to work hard to properly understand how these were applied in climatology.

  20. @Dikran: “I did say that the period should be chosen so that a linear approximation is reasonable from our knowledge of the forcings.”

    Sorry, I realize this is a bit tangential to the topic, and I’m certainly not a “climatologist”, but isn’t this the source of the main debate in “AGW” — whether we know what we think we know about the “forcings”? So, please forgive my ignorance, but it seems to me that choosing model parameters based on what we think we know then means that we shouldn’t be surprised when the models reflect the selection of the parameters, and so the models cannot really be used to test our presumed knowledge without demonstrating the skill of the models in predicting reality. But, if we adjust either the actual observations or the model output to reconcile the divergence between observation and prediction, how are those reconciling values generated independently of the models (or the presumed knowledge that was used to construct the models)? Do climate scientists consider it important to have independent reconciling values? Would you be so kind as to enlighten me on the source of those values, and the uncertainty contained in them.

    You say “decades of little or no warming are expected to happen”. Do the climate models give any quantitative sense to this expectation?

    You say “on longer timescales, the unforced component averages out”. Is this the equivalent of saying that there is some number n, such that T_i – T_(i-n) for all observed T_i would have a mean of 0? If so, do you have an estimate of that number n? Or maybe you refer to something like a least-squares regression through the observations having a slope of 0?
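
    The two readings of “averages out” asked about above can be written down directly. A sketch in Python, assuming an evenly spaced series; the function names and the sample numbers are hypothetical and are not temperature data.

```python
def mean_lagged_difference(temps, n):
    """Mean of T_i - T_(i-n) over every i for which both values exist."""
    diffs = [temps[i] - temps[i - n] for i in range(n, len(temps))]
    return sum(diffs) / len(diffs)


def ols_slope(temps):
    """Least-squares slope of temps against the index 0, 1, 2, ..."""
    count = len(temps)
    x_mean = (count - 1) / 2
    y_mean = sum(temps) / count
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(temps))
    sxx = sum((i - x_mean) ** 2 for i in range(count))
    return sxy / sxx


# A made-up series that alternates around a constant level.
wiggle = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]

print(mean_lagged_difference(wiggle, 2))  # compares like with like
print(ols_slope(wiggle))                  # need not be exactly zero
```

    The two criteria are not interchangeable: on a short series the lagged differences can average to zero while the fitted slope does not, so which definition is meant matters for the question being asked.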

    Is the earth’s natural radiation of heat into space considered a “forcing”?

    I’m very grateful for any help you may be able to provide, as I am trying very hard to untangle all this climate information.

  21. Dikran Marsupial

    2 February 2012 at 3:28 pm

    @Big Mike, the linear regression we are discussing isn’t a model used to make predictions/projections, it is just being used to summarise the rate of warming over some period. We actually know a fair bit about the forcings, most of which we can measure directly (e.g. Total Solar Irradiance, which is measured by satellites) or indirectly (e.g. CO2 radiative forcing – we can measure the rise in CO2 and from that calculate the forcing) with reasonable accuracy. What is more difficult is the unforced variability, which is chaotic and hence unpredictable. This means we can know whether the forced component ought to be approximately linear.

    The general circulation models are based as far as possible on our knowledge of the physics, rather than being statistical models that are fitted to the observations. The projections are best viewed as saying “based on our current understanding of the physics, this is the range of things that we consider to be plausible”. We don’t have observations of the future, but we have decisions that we need to make now, so while this is not an ideal situation, we have to base our decisions on the best understanding we currently have.

    For a discussion of how these models work and how they are parameterised, you would be better off asking a climate modeller (e.g. Gavin Schmidt at Real Climate); I am only a humble statistician, I have learned enough about the models to be able to work with model output and to understand the requirements of the climatologists, but I am no expert on the workings of GCMs.

    The climate models do indeed give a quantitative sense of this expectation [of decades of little or no warming]; there is a good paper by Easterling and Wehner that addresses that very topic:

    http://www.agu.org/pubs/crossref/2009/2009GL037810.shtml

    I don’t know whether the radiation of heat into space is considered a forcing, but I do know it is modelled in general circulation models (you can’t make a model of the greenhouse effect without it).

    Hope this helps!

  22. @Eric Anderson,

    May I suggest that you actually read what Phil Plait says? He does not merely hint that the world is warming and that it is human-caused; he says it flat out.

    So let us start with a headline from the relevant articles: http://blogs.discovermagazine.com/badastronomy/2012/01/30/while-temperatures-rise-denialists-reach-lower/

    How can someone interpret that as some sort of tentative hint? He clearly states it in the headline. Need another example? OK, here you go:

    Go look at my article. If you remove that graph from it, what changes? Nothing. My main point — that the WSJ and DM articles are wrong, that we have lots of evidence the Earth is warming up, that 9 of the 10 hottest years on record occurred since the year 2000, that the DM article specifically uses scientific studies and presents them as if they say the exact opposite of what they actually say — still stands.

    Can it be said any more clearly for you? If so, how about going over to Bad Astronomy and asking the question directly? Your silence on this issue will be interpreted as concession to reality. Good for you.

    Now I wonder how you can both acknowledge that the 9 out of the 10 warmest years on the instrument record have happened since 2000 and then turn around and say we are not warming. You contradict yourself, sir.

    What is further puzzling here is that you seem to think that the laws of thermodynamics have been suspended in the case of global warming. You seem unaware that as far back as 1896 based on the physics it was predicted what a CO2 induced warming world would look like. Are you familiar with anything I just said? Are you aware that predictions were made back in 1896 that have been observed in the late 20th and early 21st century? Can you name those predictions that have been observed? Can you provide an empirically based peer reviewed alternative to that body of predictions? I think it is time Mr. Anderson for you to rethink where you get your information on the science because you have been badly let down.

  23. Darn! My html skills so suck. For the confused the first paragraph with the quotes is the quoted material.

  24. @Trent — Scrutiny is irrelevant to the factual question of what certainty level applies. As for your various “natural indicators,” they were all just as true in the MWP. All indications are we are just continuing the natural warming trend that began after the LIA.

  25. The claim says that temperatures have not gone up over the last decade. To verify this, all we need do is look to see if any of the temperatures of the last decade are lower than they are this year. If any are, the claim is false. If none are, the claim is true.

    Maybe. Because claim (A) can also be taken to mean that at no time over the last decade have the temperatures increased (they could have stayed constant from year-to-year)…

    These are two confusing paragraphs to me.

    Example #1:
    Last decade:
    2002 – 14.45, 2003 – 14.47, 2004 – 14.48, 2005 – 14.48, 2006 – 14.49,
    2007 – 14.49, 2008- 14.50, 2009 – 14.50, 2010 – 14.51, 2011 – 14.51.

    This year: 2012 – 14.46.

    Year 2002-14.45 is lower than this year 2012-14.46. Hence I conclude that the claim that temperatures have not gone up over the last decade is FALSE.

    Example #2:
    Last decade:
    2002 – 14.45, 2003 – 14.47, 2004 – 14.48, 2005 – 14.48, 2006 – 14.49,
    2007 – 14.49, 2008- 14.50, 2009 – 14.50, 2010 – 14.51, 2011 – 14.51.

    This year: 2012 – 14.44.

    None of the temperatures is lower than this year 2012-14.44. Hence I conclude that the claim that temperatures have not gone up over the last decade is TRUE.

    Note that the only difference in the examples is the temperature for year 2012.
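
    The rule quoted at the top of this comment can be applied mechanically to the two examples. A short Python sketch using the numbers listed above; the function name `no_year_lower_than` is an assumption made for illustration.

```python
# The ten yearly values shared by Example #1 and Example #2.
last_decade = {
    2002: 14.45, 2003: 14.47, 2004: 14.48, 2005: 14.48, 2006: 14.49,
    2007: 14.49, 2008: 14.50, 2009: 14.50, 2010: 14.51, 2011: 14.51,
}


def no_year_lower_than(decade, this_year):
    """Apply the quoted rule: the claim 'temperatures have not gone up'
    is TRUE only if no year in the decade is lower than this year."""
    return not any(temp < this_year for temp in decade.values())


print(no_year_lower_than(last_decade, 14.46))  # Example #1 -> False
print(no_year_lower_than(last_decade, 14.44))  # Example #2 -> True
```

    The verdict flips on the single 2012 value even though the ten earlier years rise steadily in both cases, which is the source of the confusion raised here.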

    ______

    Now, Mr. Briggs, how about using the data in my above two examples to demonstrate what you say that one is supposed to do in this post? Ultimately, I would want an answer of yes or no to the question of whether the temperature has gone up over the last decade.

  26. I would want an answer of yes or no to the question of whether the temperature has gone up over the last decade via BAYESIAN METHODS.

  27. @Dikran Marsupial

    “So if the observations don’t match the multi-model mean that doesn’t imply that the models are systematically wrong because the difference may be due to the unforced component of the observed climate. However, we should expect the observations to lie within the spread of the ensemble.”

    With what predictive reliability, though? That’s the kind of question the IPCC needs to be more honest in answering, even though doing so will certainly negatively affect their funding and influence. The scientists who study these kinds of predictions find them to be failing so badly as to have “no scientific basis” by which to predict climate.

  28. @Briggs — BTW that was a noble, if futile, effort to try to teach Tamino some statistics. He still doesn’t understand the issue of the error estimates, or even what the wrong estimates mean, but at least you tried!

  29. @Tall Dave,

    Do you remember when I said that I had not even given a partial list of the indications of a warming world? Well, you should. Further, you are ignoring the context of the conversation when Eric Anderson questions the reliability of the temperature readings and I am pointing out that the natural world is exhibiting all the symptoms of a warming world. So can you try to stay on topic?

    Now onto the attribution of global warming being anthropogenic. There are about a half dozen empirically based peer reviewed evidences. To keep it simple I am going to name just three:

    1. The stratosphere is cooling while the troposphere is warming. This is also a falsification of the solar hypothesis. First predicted back in 1967.

    2. Nights are warming faster than days. Another falsification of the solar hypothesis incidentally. A prediction first made back in 1896.

    3. Winters are warming faster than summers. Again, another falsification of the solar hypothesis, from a prediction made back in 1896.

    Now all of the above are the anthropogenic footprints of a warming world. Some of those empirically based observations were predicted as far back as 1896.

    Now do you have empirically based, peer-reviewed alternative explanations that explain even my partial list of observations? You do not have the option of ignoring the evidence.

    Facts are stubborn things – John Adams

    I am more than willing to provide citations in the scientific literature for the predictions and their observations. Just ask.

  30. @Dikran Marsupial: It seems to me that for what you’re saying to be true, you have to say that:

    1. The various models feature the same forcings. I’m not sure that this is the case. Do they all have exactly the same forcings? Do they all feature, say, only CO2 as a forcing and that’s it?

    2. There are enough models where the (random) non-forcings will cancel out. Seems to me that this is a large-numbers kind of thing. Is the ensemble that large?

    3. The various models have the same set of coefficients, with random settings to tune them. If the settings are not random, they are not exploring the space, but rather are correlated and so will naturally tend to clump together, resulting in a narrow variance between forecasts.

    Or am I misunderstanding the ensemble (perhaps confusing it with ensemble weather forecasts) and it really is something like 1,000 MCMC-style runs with coefficients tweaked, to see what results?

  31. @Trent1492 — My original point was that there is too much uncertainty in the surface temp record for your assertion to be meaningful, and you haven’t posted anything but irrelevancies in response. You don’t even appear to understand the point; I suggest you re-read the author’s post.

    As for your “evidence” of AGW which you brought up off-point, I don’t know why you think any of that is inconsistent with natural variability (which is not the same as “solar”).

    Here’s some real science. Note the temp anomaly has once again gone negative.

  32. Re: Dikran Marsupial:

    “Decadal periods of little or no warming are expected to happen”

    May I ask you how many periods of 10-13 years of ‘little or no warming’ have happened in the past without a preceding major volcanic event?

    Re: Trent1492 says:

    “Which part of the nine of the ten warmest years on the instrument record have all been since 2000 do you not understand?”

    If – miraculously – the temperature were to drop steadily for the next 10, 30 or 50 years then your statement would still be correct. The only thing that statement proves is that it WAS warming, not that it will continue to do so.

  33. @Trent1492:

    “Now all of the above are the anthropogenic footprints of a warming world. Some of those empirically based observations were predicted as far back as 1896.”

    So would it be correct to say, then, that the state of science is such that possible explanations of these phenomena other than human influences can conclusively be ruled out, or is there, in your opinion, room for uncertainty in an objective observer?

    “Now do you have empirically based, peer-reviewed alternative explanations that explain even my partial list of observations? You do not have the option of ignoring the evidence.”

    Can I rightly conclude that your view is that absence of evidence is equivalent to evidence of absence? You do know the logical weakness of that perspective, right?

    “This is American Idol” — Ryan Seacrest

  34. I did try to post a comment last night on Tamino’s blog where he had insisted our host’s statements that “averages are models” and “uncertainty is unaccounted for” were wrong.

    Either I didn’t push “Post Comment” hard or often enough, or perhaps the post didn’t find favour with the blog’s owner, but it has failed to appear. I doubt the latter because all I did was give a little lesson on models and abstraction and also encouraged Tamino to check all the sources of uncertainty (quantified and unquantified) that take one from the individual measurements through to assertions about whether trends are significant or not, to make sure they were properly accounted for.

    It may of course have been that I ended with:

    I do think a lack of methodological understanding creates unnecessary debate about uncertainty. I’m sure it used to be taught in statistics courses because this is what statistics talks to. It is the kind of stuff you also learn in philosophy (and not just the philosophy of science), formal logics, formal systems theories, algebras, automata theory, linguistics and the like where understanding abstraction and meta systems is fundamental to the discipline.

    Perhaps Tamino felt that a philosophical discussion about statistical methodology would be a distraction to his average reader (the chorus there is a little shrill – not at all my usual experience with Die Zauberflöte).

    For my part, this recent discussion about climate science has brought into focus a rather obvious but ironic point.

    Given the complex nature of the climate system and the nature of the policy decisions that rely upon it, the preoccupation of applied climate science should really be much more on the uncertainty, with a view to understanding, quantifying and reducing it.

    Instead discussion of uncertainty is seen as a trick to avoid action rather than the means to the right action.

  35. @Tall Dave,

    Your assertions without any science are not worth the bandwidth they take up. The fact remains that you said that my original indications of a warming world could be just as true for the Medieval Warm Period. I pointed out that this is immaterial to the point I am making in regard to the baseless assertion that the world is not warming. The point of contention is: is the world warming or not? I simply pointed out that nature is weighing in with its own verdict. A verdict that includes shrinking Arctic sea ice, shrinking glaciers, and spring coming earlier and earlier.

    You then moved the goal posts from the question of a warming world to asserting that those observations could be natural. I then pointed out to you that there are a whole host of evidences for anthropogenic warming. Such as, nights warming faster than days, the stratosphere cooling while the troposphere warms and winters warming faster than summers.

    Your response? That I am “off topic”. Really? It is like you think I cannot scroll back up the thread.

    Do you understand that saying all those footprints are simply natural variation demonstrates that you do not understand why those phenomena are unique to being human induced?

    Now I am going to ask you to once again provide a coherent empirically based peer reviewed citation that can explain those phenomena. You do not have the option of ignoring those facts.

  36. Robert in Calgary

    2 February 2012 at 6:27 pm

    @Trent1492,

    “Your assertions without any science are not worth the bandwidth they take up.”

    A perfect assessment of your own babble here.

    “Which part of the nine of the ten warmest years on the instrument record have all been since 2000 do you not understand?”

    If it’s true, yawn…so what? BFD! The science doesn’t support CAGW.

    You’re still losing, get used to it.

  37. @ Big Mike,

    Here is the problem. We are talking about physics and its predicted and observed effect in nature. Full Stop. We have predictions based on physics. We have observations that we have had the physics basically correct, such as the cooling stratosphere, nights warming faster than days, and winters warming faster than summers. We are not operating in a paucity of evidence. This is not post hoc reasoning about the phenomena going on here but is in the best traditions of science of making predictions and letting reality be the final arbiter.

    Saying that there could be some alternative coherent empirically based explanation out there for these physics predicted and observed phenomena is fine. What is not fine is failing to provide such explanations and yet still characterizing your assertions as rational. That is not how the enterprise of science works.

    I see that you have quoted Carl Sagan. May I suggest that if you are truly interested in rational thought that you read the Dragon in My Garage from The Demon Haunted World and ask yourself how your and others’ constant refrain of “it’s natural variation” is not like that invisible dragon.

  38. @Robert From Calgary,

    So why do you not tell us how it is “babble” using peer reviewed empirically based science? Now I note that you do not bother to address the fact that, since you acknowledged that 9 of the 10 hottest years on record have occurred since 2000, you are in direct contradiction of those who idiotically claim we are entering a cooling phase. I wonder why? Oh, I know. Because you would then be in contravention of the First Iron Law of Denialism. That law being that no denier shall ever critique another no matter how contradictory the other’s position is. Glad we got that cleared up.

    I also note that you and your fellow ideologues just simply refuse to address the evidence. You know like:

    1. The stratosphere is cooling while the troposphere is warming. This is also a falsification of the solar hypothesis. First predicted back in 1967.

    2. Nights are warming faster than days. Another falsification of the solar hypothesis incidentally. A prediction first made back in 1896.

    3. Winters are warming faster than summers. Again, another falsification of the solar hypothesis, from a prediction made back in 1896.

    Yet, you just can not even begin to address them using empirically based peer reviewed science. I wonder why? Is it because there is no alternative coherent empirically based explanation for those predicted and observed phenomena? I think so. Disagree? Then show it.

  39. If I may diverge the discussion slightly to one of English, I can see three statements that relate to this discussion:

    1. The world has been warming (past tense)
    2. The world is warming (present tense)
    3. The world will warm (future tense)

    Statement 1 can be analyzed for its truthfulness using the approach stated in the original post, with the issues indicated in the original post.

    Statement 2 is a statement regarding the current rate of temperature change, but since the temperature is always changing (it’s a very noisy dataset, even leaving aside the complexities of calculating the temperature of the world in the first place), it needs a time period to be associated with it to allow the current rate of change to be calculated. An instantaneous rate of change (i.e. 1st derivative) is either not calculable (because we don’t have a universally agreed curve) or meaningless (because the curve is too ‘noisy’ for an instantaneous rate of change to make any sort of practical sense).

    Some of the discussions above are, in effect, asking the question as to whether a ten year period is sufficient to determine the current rate of temperature change. The original post also raises some of the questions on how the current rate of temperature change could be calculated.

    Statement 3 is a statement of prediction. The whole AGW debate is about whether it is possible to predict future temperature changes, which I suspect our host would rephrase as a question of the predictive capabilities of the models being used.

    My opinion is that statement 1 is true over the last 100-200 years, and definitely over the last 40 years.

    My opinion is that statement 2 is probably false. Please note that this statement does not make a prediction as to future activity. It does NOT state that the world will not start warming again – it merely comments on the current state of events.

    My opinion on statement 3 is that I don’t know enough to be able to state if I believe it is true or false.

  40. Trent1492: “. . . when Eric Anderson questions the reliability of the temperature readings . . .”

    That, sir, is either a mistake or a misrepresentation. I specifically assumed the reliability of the readings and everything behind them as part of the discussion.

    We need to take a deep breath and make the intellectual effort to honestly understand the difference between:

    - has warmed, but is no longer warming
    - has been warming, but the warming is slowing
    - is continuing to warm, at a steady pace
    - is continuing to warm, at an increasing pace

    Which is it? That is the simple question. If we take the reported temperatures as gospel, which you insist that we do and to which I am willing to stipulate for purposes of analyzing this specific question, the last decade plus certainly doesn’t give us any reason to think that the last two are right. The first two, however, are more consistent with the data. Why do we see this data, particularly in the face of rising CO2 and the computer projections of warming? These are interesting questions.

    And this still doesn’t address the subsequent questions about the extent of anthropogenic involvement, CO2′s role, whether anything can be done about it, whether anything should be done about it, and on and on — questions that have even less certainty than the simple question of whether we are still seeing warming, a question on which minds can apparently differ . . .

  41. Graeme W, apparently we were writing at the same time and when I posted I saw your post. Good, and similar, thoughts. I am focusing on the recent past and the recent decade+, which is why I don’t discuss the future, but I think you have some good thoughts in that regard.

  42. Thanks, Eric. I believe a lot of the angst on my statement 2 is because too many people mistake it for statement 3. That is, there’s an implicit assumption that the current rate of temperature change will continue indefinitely… which, when examined objectively, I think everyone will agree is an incorrect assumption.

    Why do people get so defensive about a statement on the rate of warming for the last ten years? Because they want to use it either to say it falsifies a prediction of future warming (which, logically, it doesn’t), or to say it’s irrelevant as to whether there will be future warming/cooling (which may be true, but too many people in this camp seem to refuse to acknowledge what the rate of change has been over the last ten years). It would be much better if they acknowledged the point and THEN made the comment that it neither refutes the historical record (statement 1) nor refutes future predictions (statement 3).

  43. Robert in Calgary

    2 February 2012 at 7:44 pm

    @Trent1492,

    It’s not my job to prove anything. These days I don’t invest much time educating people like you with your mind welded shut. There’s no profit for me.

    It’s your job, all the folks promoting CAGW.

    You are the ones making claims and making demands to reorder civilization.

    Close to 25 years pushing this scam.

    I just came across this item. It fits right in.

    http://wattsupwiththat.com/2012/02/02/terrifying-new-book-about-climate-change/

  44. @Eric Anderson,

    Unfortunately, your questions aren’t quite so simple, particularly when people insist on being obtuse about them. They depend on:
    a) The definition of warming. It’s a gerund, so it’s ambiguous as a blanket statement. Where I’m at, it is ‘currently’ warming on a monthly scale since we’re entering Spring, but it’s also cooling on an hourly scale since night is coming on. ‘Has warmed’ depends on the length of the intervals you’re comparing and the ‘since when’.
    b) Similarly, the definition of a ‘steady pace’, ‘continuing’, ‘slowing’, etc. For any of these you can get a yes or no answer by carefully picking your definition and the data you include. That’s the kind of game Briggs is playing above.

    The question is, what definitions and data sets are relevant to the concerns at hand. So, for instance, the temperature of the earth 4 billion years ago is not relevant, while that which is available from the instrument record, or from the beginning of the rapid escalation of industrial CO2 output, is.

    The last decade of data doesn’t give us any reason to think the first two of your scenarios are right or the last two are wrong. Luckily, we have much more than a decade, plus we have an understanding of fundamental physics and measured parameters from mountains of data. That’s not to say that we don’t want to improve further, but we can reasonably say that a) it has warmed, b) there is good reason to think it will continue warming, and c) there is no significant indication that it has or will stop warming. Again, those statements are to be understood in the relevant fashion, which is something like: trends on a multi-decadal time scale, i.e. such that they imply persistent climate shifts rapid enough to have a large-scale impact on human society.

    Why do you see this data ‘in the face of rising CO2…’ ? Because rising CO2 has an effect on the long term averages, while shorter term effects can counteract it on decadal scales but are expected to average out on longer scales, revealing the long term trend. This is consistent with computer projections. It’s like betting at a casino. On a given night you might have a great win, but in the long run the house comes out ahead.

  45. Robert in Calgary wrote: “The science doesn’t support CAGW.”

    If you want to debate climate change policy, that’s a valid goal. However, this thread is not the place for it.

    But what you asserted is a claim regarding the validity of the science that informs the choice of policy. You cannot falsify a scientific observation by mere assertion.

  46. @Robert of Calgary,

    Actually Robert, when you have been provided with evidence for a proposition you can either rationally examine the evidence or be considered another irrational troll who will not consider the evidence. That is your choice. What you do not have the option of is to be considered a rational actor in an area of science if you forgo examining the evidence for a theory. I am sorry Robert but your tone and outright rejection of the evidence places you deep in the crank ranks.

  47. Mr Anderson,

    It seems that Josh has answered a lot of your “questions” but I am going to add a few remarks here. You do realize that the instrument record for NASA goes back to 1880 and for HAD-CRU to 1850? If you look at the NASA temps for example you will see that the temperature anomaly is now 0.8°C above the 19th century. Half of the temperature rise has occurred since 1980.

    Now you seem very stubborn in your refusal to recognize that these temperature rises and the accompanying climate change are based on physics. Could you give me so indication that you accept the laws of thermodynamics? I ask because you seem very reluctant to acknowledge that they have not been suspended in the case of AGW.

    I also want to note that your silence on Phil Plait’s supposed muted support on the reality of the recent temperature rise is just a product of your careless reading.

  48. Robert in Calgary

    3 February 2012 at 12:18 am

    @Trent1492,

    Yes, of course, I must be the crank.

    Assign yourself the victory sir. The rest of us will keep on laughing.

  49. Trent,
    You are losing it. “Could you give me so indication”
    These are the words of a coke addict close to crashing.
    I suggest you take a break and come up with a coherent write up that Mr. Briggs
    will most likely be happy to post and then tear to shreds.
    BTW. You might want to read your discourse and note how it has degraded before
    simply attacking me.

  50. Briggs,
    Since I am a bit belligerent tonight.
    Noticed you never responded to JH.
    I have similar concerns.
    You are consistent in saying “just look at the data” but, as JH pointed out,
    what do you do when the absolute numbers do not behave?

  51. @ Bill,

    May I suggest that you actually proffer a critique about what I wrote instead of declaring it null and void. I am sorry but your declarations hold no authority. Now onto Mr. Briggs and his gross incompetency. People far more qualified than I have torn him a new one.

    William M. Briggs: Numerologist to the Stars

    I particularly like this post because Mr. Briggs shows up and demonstrates gross incompetency. Take this little exchange:

    Briggs said: “Notice old Phil (his source, actually) starts, quite arbitrarily, with 1973, a point which is lower than the years preceding this date. If he would have read the post linked above, he would have known this is a common way that cheaters cheat.”

    Deep Climate: When I showed that in fact 1973 (anomaly = 0.386) is a local maximum, Briggs attempted a rebuttal thus:

    Briggs: Deep Climate–try this from the 1940s.

    Deep Climate: Well, OK:

    1940 0.165
    1941 0.087
    1942 0.084
    1943 0.160
    1944 0.255
    1945 -0.042
    1946 0.022
    1947 0.165
    1948 0.103
    1949 -0.044

    1973 is still not lower than any of those (and neither is it lower than any year in the 1950s and 1960s).
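
    The comparison in the quoted exchange is easy to check. A Python sketch using only the anomaly values listed above and the 1973 figure of 0.386 given earlier in this comment; the variable names are illustrative.

```python
# Anomalies for the 1940s as listed in the quoted exchange.
anomalies_1940s = {
    1940: 0.165, 1941: 0.087, 1942: 0.084, 1943: 0.160, 1944: 0.255,
    1945: -0.042, 1946: 0.022, 1947: 0.165, 1948: 0.103, 1949: -0.044,
}
anomaly_1973 = 0.386

# Years in the 1940s whose anomaly exceeds the 1973 value.
years_above_1973 = [
    year for year, value in anomalies_1940s.items() if value > anomaly_1973
]

print(years_above_1973)                # -> []
print(max(anomalies_1940s.values()))   # -> 0.255
```

    On these numbers the warmest year of the 1940s (1944, at 0.255) is still below the 1973 value.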

    I do not think a demonstration of gross incompetency can get any plainer.

    Or how about this post from Skeptical Science:

    Still Going Down the Up Escalator

    You know, claiming incoherency in your opponent and failing to substantiate it simply allows your opponent to demonstrate where the real irrationality lies.
    Such is the case here.

  52. Trent1492:

    “You do realize that the instrument record for NASA goes back to 1880 and for HAD-CRU to 1850? If you look at the NASA temps for example you will see that the temperature anomaly is now 0.8c above the 19th century. Half of the temperature rise has occurred since 1980. Now you seem very stubborn in your refusal to recognize that these temperature rises and the accompanying climate change are based on physics. Could you give me some indication that you accept the laws of thermodynamics?”

    Who is arguing against physics and thermodynamics? Certainly not me. Who argued that temperatures have not generally risen over the past 150 years. I definitely didn’t. I don’t know why you are making this stuff up. You seem to have come to this site with some pretty deep-seated preconceptions about what you thought “skeptics” would argue, and as a result you end up arguing against things people haven’t said but that you think they would have said had they followed your script.

    As for Phil Plait, hey, I was trying to give him the benefit of the doubt. If he in fact believes that humans are to blame and we are headed for termageddon, fine, I’m willing to stipulate that he believes that if it makes you feel better. All that means is that he has taken yet one more untenable position.

    The simple question this blog post started with was whether it was reasonable to say that there has been a lack of warming for more than a decade. Very focused issue. We don’t need to get all over the map.

    If you’re willing to discuss this focused issue and consider the possible implications of what the recent temperature data mean, great, we can continue the discussion. Otherwise, I’ll pop some popcorn and sit back and watch the show.

  53. @dikran marsupial: I do not understand your argument regarding forcings and models.

    It seems irrelevant if the model builder is interested in CO2, water, or candy apples. The only measure of a model’s skill is its ability to predict unseen values; i.e. validation testing. You seem to be stating that this is not the case. Did I misread you?

    If a model that includes a forcing is better able to predict an outcome than a model that doesn’t include forcings, then you have a test of forecast skill and/or model utility. No null hypothesis needed, no Gaussian or PDF needed. The mechanics of the model are irrelevant– only the predicted outcomes matter. Validation testing again.

    Using an ensemble of models doesn’t change the testing requirement. The ensemble is still a model, despite having many moving parts. The manner in which you merge the results becomes part of the model, and says absolutely nothing about the underlying physical processes that are being modeled. The utility of the ensemble can only be measured by validation testing.

    I understand the utility of Monte Carlo methods, but I was under the impression that the AOGCMs were not able to demonstrate predictive skill: if so, a sensitivity test doesn’t say much about the real world sensitivity of the earth to various forces.

  54. Mr. Anderson,

    The only goal post shifting I am seeing here is on your side of the field. First, you open with a bunch of questions such as “The earth is warming up? As compared to when?” My post and others have answered those questions. You could at least acknowledge to your interlocutors that the questions were asked. I think it displays a substantial amount of chutzpah to disavow a question you asked less than a day earlier.

    You have described Phil Plait’s position as “untenable” yet provide no reason for this but offer it as simply another assertion that should be accepted without question. I am sorry but you do not get to dictate this conversation. As I have repeatedly pointed out, and no one here wishes to address, climate science has for over a century made testable predictions that have been observed. Yet all you and your ilk can offer up against the evidence is that it is “untenable”. I am sorry but that is not going to cut the mustard. So tell me what is “untenable” about:

    1. The stratosphere cooling while the troposphere warms.

    2. Nights warming faster than days

    3. Winters warming faster than summers.

    Phil Plait’s position is based on these and other physics-based predictions and observations. Yet, despite claiming that you hold no quarrel with physics, in the next sentence you call it “untenable”. Regardless of what the poets may say, “Consistency is not the hobgoblin of little minds”. This is science, Mr. Anderson, not poetry or wishful thinking.

    Now you claim that you want to simply discuss the last ten years. Well, here is the problem. It is a strawman: climate trends are not deduced from such short periods. To define a trend in climate you need 20 to 30 years. Do you know, Mr. Anderson, why that is so? It is not a number that was picked out of a hat.

    Now, allowing for the fact that ten years is too short a period to be discussing, let me note that 9 out of the 10 warmest years on the instrument record have been since the year 2000. That would hardly be called a plateau by anybody. Matter of fact, the 2000s are maintaining the same 0.15c to 0.20c trend that has been ongoing since the 80s. Let me put this another way: the 80s were warmer than the 70s, the 90s were warmer than the 80s and the 2000s were warmer than the 90s. Those are all facts, Mr. Anderson. You are not entitled to your own set of facts.

    Now Mr. Anderson I have one last question. How do you think trends are decided? Do you:

    A. Compute a trend by taking all the values of a period and then doing a linear regression.

    B. Take a ruler, draw a line from point A to point B, and call it a trend.

    Which method do you think will result in a fail for your first semester of statistics?
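
    The two methods in this question can be compared directly. The following is a minimal sketch on synthetic numbers (not real temperature data); the function names `ols_trend` and `endpoint_trend` are hypothetical, chosen only for illustration.

```python
import numpy as np

def ols_trend(years, values):
    """Method A: least-squares slope using every point in the period."""
    slope, _intercept = np.polyfit(years, values, 1)
    return slope

def endpoint_trend(years, values):
    """Method B: slope of the line joining only the first and last points."""
    return (values[-1] - values[0]) / (years[-1] - years[0])

# Synthetic anomalies (NOT real data): a 0.02 per-year rise plus random noise.
rng = np.random.default_rng(0)
years = np.arange(1980, 2010)
values = 0.02 * (years - years[0]) + rng.normal(0.0, 0.1, size=years.size)

print("OLS trend per decade:     ", 10 * ols_trend(years, values))
print("Endpoint trend per decade:", 10 * endpoint_trend(years, values))
```

    The endpoint slope depends entirely on two noisy values, so a single unusual first or last year moves it far more than it moves the regression slope.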

  55. Trent,

    I thought the whole point of the original post was that it’s impossible to say that the anomaly has risen by 0.8c because the measurement error over the period 1850-2010 is greater than that? We surely should be saying it has risen by 0.8c +/- 2.5c (or something similar) in order to accurately reflect the amount of error involved in the process. Your extensive reading of the literature must furnish a likely error range, what is it?

    And while 9 of the top 10 warmest years are in the last 12 years, how many of the warmest 5 years are in the last 6? How many of the warmest 3 years are in the last 3? As D Robinson pointed out, that statement by itself actually doesn’t mean a lot because you’re cherry-picking arbitrary boundaries.

  56. Dikran Marsupial

    3 February 2012 at 6:40 am

    @wayne

    > 1. The various models feature the same forcings. I’m not sure that this is the case.
    > Do they all have exactly the same forcings?

    Yes, they are all running the same scenario, defined by the IPCC, which sets out the forcings

    > Do they all feature, say, only CO2 as a forcing and that’s it?

    No, for projections the scenarios define all forcings. However it is only the anthropogenic forcings that really change in the scenario, as those are the only ones we can predict/control. The projections are called projections rather than predictions as they are contingent on the forcings defined in the scenarios (there is no other way of doing it, except in hindsight).

    >2. There are enough models where the (random) non-forcings will cancel out. Seems to me that this is
    > a large-numbers kind of thing. Is the ensemble that large?

    It is large enough for the mean to be dominated by the forced component; there will always be some leakage of natural variability, but I don’t think it is sufficiently large to be an issue.

    > 3. The various models have the same set of coefficients, with random settings to tune them. If the settings
    > are not random, they are not exploring the space, but rather are correlated and so will naturally tend to
    > clump together, resulting in a narrow variance between forecasts.

    Yes, this is because the range of variability depends on what the group responsible for the model considers to be plausible values, given what they know about the physics. Different groups have different opinions.

    It is true to say that the spread of the multi-model ensemble is an indication of both natural variability and the climate modellers’ uncertainty regarding the physics. The spread of a single model ensemble is an indication of natural variability contingent on the physics of that particular model.

    > Or am I misunderstanding the ensemble (perhaps confusing it with ensemble weather forecasts)

    It is very much like an ensemble weather forecast, except that an ensemble weather forecast is looking to predict natural variability, whereas climate model ensembles are trying to average it out to determine the underlying forced climate change.

    > and it really is something like 1,000 MCMC-style runs with coefficients tweaked, to see what results?

    I’m not sure to what extent the coefficients are tweaked, you would have to ask a climate modeller about that, and the ensembles are not that large, but it is the basic idea (Monte Carlo, but without the Markov Chain – the model itself is the same for all model runs).
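
    The averaging argument in this comment can be illustrated with a toy calculation. This is only a sketch under strong assumptions: every run shares one hypothetical forced signal, and the unforced part is modelled as independent white noise, which real model runs need not obey. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n_years, n_runs = 50, 20

# Hypothetical "forced" signal shared by every run: a steady 0.02 per-year rise.
forced = 0.02 * np.arange(n_years)

# Hypothetical "unforced" variability: independent noise in each run.
unforced = rng.normal(0.0, 0.15, size=(n_runs, n_years))
runs = forced + unforced

ensemble_mean = runs.mean(axis=0)

# RMS departure from the forced signal, for one run and for the ensemble mean.
single_run_rms = np.sqrt(np.mean((runs[0] - forced) ** 2))
ensemble_rms = np.sqrt(np.mean((ensemble_mean - forced) ** 2))

print("single run RMS departure:   ", single_run_rms)
print("ensemble-mean RMS departure:", ensemble_rms)
```

    Under these assumptions the noise in the mean shrinks roughly as one over the square root of the number of runs; correlated runs or shared model errors would not average out this way.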

  57. Dikran Marsupial

    3 February 2012 at 6:48 am

    @TallDave “With what predictive reliability, though? That’s the kind of question the IPCC needs to be more honest in answering”,

    we don’t have observations of the future, so how would you go about establishing their predictive reliability?

    As I have already explained, you need 30 years or so of observations to estimate the linear trend reliably, so there is no way of estimating the predictive reliability of the current generation of models. The IPCC and climate modellers are up front about this and do a lot of work on evaluation of models, see for example chapter 8 of the most recent IPCC WG1 report “Climate Models and their Evaluation”. The IPCC are honest about it, it is just that very few people have taken the trouble to read what they have actually written about it.

    If you take the projections made by Hansen’s early models, and choose the most similar scenario to the observed forcings, then the projections made by even those rather crude models (by today’s standards) perform reasonably well.
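
    The point about record length can be made concrete with the textbook standard error of a least-squares slope. This is a sketch assuming independent, equal-variance noise around a straight line; the scatter value `noise_sd` is hypothetical, and autocorrelated real data would give wider uncertainty than this formula suggests.

```python
import numpy as np

def slope_standard_error(n_years, noise_sd):
    """Standard error of an OLS slope fitted to n_years annual values,
    assuming independent noise with standard deviation noise_sd."""
    t = np.arange(n_years)
    return noise_sd / np.sqrt(np.sum((t - t.mean()) ** 2))

noise_sd = 0.1  # hypothetical year-to-year scatter, in degrees C
for n in (10, 20, 30):
    se_per_decade = 10 * slope_standard_error(n, noise_sd)
    print(f"{n} years: slope standard error ~ {se_per_decade:.3f} C per decade")
```

    Because the denominator grows roughly as the record length to the power 3/2, tripling the record from 10 to 30 years cuts the slope uncertainty by about a factor of five under these assumptions.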

  58. Dikran Marsupial

    3 February 2012 at 6:54 am

    D. Robinson asks: (regarding “Decadal periods of little or no warming are expected to happen”)

    “May I ask you how many periods of 10-13 years of ‘little or no warming’ have happened in the past without a preceding major volcanic event?”

    I don’t know off-hand, however IIRC they occur in the model runs with or without volcanos. The current one is likely to be due to ENSO, and you need to pick the start date carefully to coincide with the right phase of ENSO to get a flat trend. If you account for the effects of ENSO, the trend reappears.

    Briggs’s post about cherry picking start dates is worth reading.

  59. Dikran Marsupial

    3 February 2012 at 7:02 am

    @Will says: “It seems irrelevant if the model builder is interested in CO2, water, or candy apples. The only measure of a model’s skill is its ability to predict unseen values; i.e. validation testing.”

    The unforced component of climate change is chaotic, and hence no model can have any skill in predicting the unforced component in the long term. The forced component is (probably) non-chaotic, so we should be able to predict it with some skill. Hence the way to get the best predictive skill for unseen values is to ignore the unforced component and try to predict the forced component alone (and use the error bars to characterise what is plausible when you add in the unforced variability). This is exactly what climate modellers do.

    There is also the point that it is the forced component that we affect by anthropogenic emissions, so it is what we want to infer in order to choose the best course of action.

  60. Dikran Marsupial

    3 February 2012 at 7:07 am

    @Will “I understand the utility of Monte Carlo methods, but I was under the impression that the AOGCMs were not able to demonstrate predictive skill:”

    I don’t think this is true. They can’t be expected to show predictive skill on a decadal scale (as decadal trends are too short to be able to reliably estimate the slope of the trend), but Hansen’s model projections have shown some useful predictive skill in hindsight (provided the baselines are fair and the scenario closest to the observations is chosen).

  61. Parts of this discussion are more than absurd. SkS has written about a “dampening” of global warming and a “pause”. I wonder why nobody complained when they did that. Do we want to open up the dictionaries and compare “pause” to “lack”???

    An adult discussion could have been about “is a decade long enough to make hypothesized temperature trends climate-relevant?”. But of course with the likes of Trent1492 around, it’s impossible to have an adult discussion.

    Two quick points then… Phil Plait is fully aware of measurement problems. Just read his recent post about Fomalhaut B (or lack thereof). It’s quite strange then to see him rely on amateurish efforts such as Skeptical Science, as if he were your average science-challenged journo, rather than “read the sources”.

    And another thing I find amazing is Tamino’s claim that uncertainty in trends reduces if we consider several decades…well, yes, up to a point: because Tamino (and countless others) still don’t get the problem of uncertainty. They live in a world where things can be computed to the 99th decimal, so data is considered having 99 decimals.

    It’s as if Briggs were showing a TV image of a fire to a bunch of primitives, and these complained to him the fire wasn’t giving out any heat. You think “yes, but…” and then give up in despair.

  62. Re: Dikran Marsupial

    “Hansen’s model projections have shown some useful predictive skill in hindsight (provided the baselines are fair and the scenario closest to the observations are chosen).”

    That’s a tough statement to let go. CO2 emissions have been between his A & B scenario but the temperature is significantly below his C scenario which was predicated on a ‘drastic reduction in worldwide CO2 emissions’. To put it another way the linear trend from 1900 – 1988 has shown more predictive skill than Hansen’s 1988 models. How much did he get paid in grants?

    “I don’t know off-hand, however IIRC they [decade scale periods of little to no warming] occur in the model runs with or without volcanos”

    No actually the models do not show a decade scale plateau without a major volcano. AND THEY BETTER NOT, because prior to now the only decade scale ‘plateaus’ in the actual temperature record were due to volcanos (or to the bucket to inlet measurement thing ~ 1945.)

    This current period of little to no warming is unique, and it was not predicted. Can you accept that?

  63. @Dikran “we don’t have observations of the future, so how would you go about establishing their predictive reliability?” You say that as though there is not an entire field of science devoted to answering that exact question.

    Hansen 1988 forecasts very badly, not “reasonably well.” The actual emissions look most like worst-case Scenario A, the actual temperatures are well below even Scenario C.

    At this point GCMs cannot outperform a naive forecast, so there is no reason to think they have any significant predictive reliability, and therefore absolutely no reason why we should be diverting trillions of dollars on their say-so.

  64. Ronald Bailey at the Reason Blog writes, Every month University of Alabama in Huntsville climatologists John Christy and Roy Spencer report the latest global temperature trends from satellite data. Below are the newest data updated through January, 2012.
    (…)
    The 3rd order polynomial fit to the data (courtesy of Excel) is for entertainment purposes only, and should not be construed as having any predictive value whatsoever.

    Truth in statistics.

  65. Dikran Marsupial

    3 February 2012 at 11:57 am

    @D.Robinson Given what Hansen had to work with, the projections were pretty good. I didn’t say they were brilliant, just that they had some predictive skill. If you want to discuss that further, there is a good discussion here http://www.skepticalscience.com/Hansen-1988-prediction-advanced.htm

    The difference between the linear fit from 1900-1988 and Hansen’s projection is that Hansen used physics, rather than statistics. With physics you can make projections based on scenarios; you can’t do that with a linear regression as the only scenario you can use is the one on which it was calibrated. That is a BIG difference, and well worth the grant money as our understanding of climate physics has benefitted substantially from modelling efforts.

    “This current period of little to no warming is unique, and it was not predicted. Can you accept that?”

    “Yes” (if evidence provided) and “no” respectively (the existence of such events was predicted (Easterling and Wehner), their timing cannot as they are chaotic and hence intrinsically unpredictable, so expecting the climate models to predict the timing is clearly unreasonable).

  66. Dikran Marsupial

    3 February 2012 at 11:58 am

    @TallDave yes, we have forecasting, however you need data to test the forecast and we don’t have data for the future.

  67. @Trent1492: Again, your original claim was that some number of the hottest years are in the last ten, and you don’t seem to understand the error bars around those measurements are too large for that assertion to be meaningful (again, read the post). As to your irrelevant raising of “natural phenomena,” I don’t know what your point is even supposed to be; pretty much no one argues today is not warmer than, say, 1850.

  68. @Dikran — Yes, but in the absence of that data, there are in fact other ways to assess the predictive reliability of a forecast, and GCMs tend to fail those measures pretty badly. Read the link.

  69. The SkepticalScience piece is a great example of how not to do science. The only way you get that result is to pretend Hansen A was something other than it really was (nicely done Gavin) and then compare it to Hansen’s own database (no conflict of interest there) with its numerous “corrections.” A more honest comparison using what Hansen A actually represented (little to no slowdown in emissions) and what satellite temperatures show finds that Hansen 1988 does a very poor job of predicting temps.

    http://pielkeclimatesci.wordpress.com/2010/08/13/is-jim-hansens-global-temperature-skillful-guest-weblog-by-john-christy/

    Meanwhile, the main problem supposedly caused by these temps, accelerating sea level rise, has also totally failed to materialize, contra Hansen’s prediction that, for instance, the West Side Highway would be underwater.

    http://sealevel.colorado.edu/

  70. Trent is a bit stupid is he not?

  71. Also, when citing Skeptical Science, one must remember the site is not only heavily censored to remove (non-abusive) comments simply for being inconvenient to the author’s arguments, but the author has also been caught in some very dishonest editing. We have always been at war with Eastasia.

    http://nigguraths.wordpress.com/2011/10/10/skepticalscience-rewriting-history/

    http://www.bishop-hill.net/blog/2011/9/21/the-cook-timeline.html

  72. @Tall Dave,

    What you do not understand is that Mr. Briggs has been shown to be in serious error in this claim. I mean really, he thinks that calling every statistical technique a model is somehow a critique. I have to wonder why suddenly taking an average is suspect. He has also confused descriptions of data, such as averages, with “predictors”. Ridiculous. What is totally egregious is that he does not realize that the predictions are not statistically based but based on the physics. Everyone here keeps on telling me they have no problem with the Laws of Thermodynamics yet every word except for those explicit disavowals demonstrates otherwise.

    I find it particularly galling that Mr. Briggs seems to not understand that the more data you gather the smaller the error bar. I have already addressed this with my last post to Mr. Anderson. I suggest you go read it. I keep on pointing out that there is a reason why ten years is too short a time. I do not think you or Mr. Briggs realize that the longer the time the smaller the uncertainty.

    What is further perplexing from my point of view is that while you bunch go wrongly on and on about the error bars being too big to say anything about it warming or not, the real world out there is having an altogether different say. While Mr. Briggs incoherently navel gazes, the Arctic Sea Ice continues to retreat, 89% of glaciers are now melting, and Spring arrives weeks earlier than it did in the 19th century. I keep on telling you guys that nature is the final arbiter. You should take that to heart if nothing else the next time you seek to defend the indefensible position on whether we know it is warming or not.

  73. Briggs

    3 February 2012 at 1:51 pm

    Trent1492,

    I have no idea who you are, but your comments of today are typical, so I thought I’d single you out. Actually, yours are on top; the only reason I picked you.

    Now, I mean this nicely and in the spirit of constructive criticism. Your comments show that you haven’t any but the barest, slightest familiarity with statistics, particularly Bayesian statistics, and the branch of it called predictive statistics, nor of the philosophy of evidence. I have said, and do say, that I used particular terms in a technical way. I have given links to works which show exactly what those technical definitions are. Further, I have given extensive reasons why those methods should be used and the necessary difference between them and inferior methods of analysis.

    I accuse you of not reading them, of remaining in ignorance of the concepts at hand on purpose and of maintaining a blustery, unsustainable attitude of self-superiority. Your failure to do your homework could be because of laziness, or because of inability to grasp the material, or pigheadedness and inability to admit to a mistake, or some other reason I cannot fathom. If you were a student of mine, I not only would have failed you, I would have written a letter to your parents to come fetch you and give you a lecture on diligence and propriety.

    All are welcome to comment here, but only if they act like ladies and gentlemen. Criticize me, lecture me, tease me, even. By all means show me where I am in error and I will call you friend. But first learn whereof you speak or you make yourself look foolish and waste the time of others.

  74. Dikran Marsupial

    3 February 2012 at 2:03 pm

    Dr Briggs, I intend this as constructive criticism, but your article and comments (particularly regarding forced and unforced variability) suggest that while you have a very good grasp of statistics, you have little familiarity with the climatology. As a result the statistical methods you have adopted are inappropriate, for the reasons I have already given.

  75. Briggs

    3 February 2012 at 2:09 pm

    Dikran Marsupial,

    I say your comments about forced and unforced variability are beside the point in answering the three claims (or if they are to the point, they describe yet another model which has to be subjected to the same verification process as the others). I also say that your reasons, for the reasons I have given, are not persuasive and even in error. But they are honest, informed errors, so we can still be pals. You get a B+.

  76. Dikran Marsupial

    3 February 2012 at 2:16 pm

    Dr Briggs, the claim is

    “The lack of warming for more than a decade—indeed, the smaller-than-predicted warming over the 22 years since the U.N.’s Intergovernmental Panel on Climate Change (IPCC) began issuing projections—suggests that computer models have greatly exaggerated how much warming additional CO2 can cause.”

    The additional warming that CO2 can cause is a claim regarding the forced component and the forced component only. Thus the distinction between forced and unforced variability is central to the claim. So if they are not relevant to statements A and B, then it is because statements A and B are not relevant to the claim.

    As it is a projection of the IPCC that is in question it is manifestly unfair to define warming in any other way than that which a climate modeller would use (i.e. a linear trend). Furthermore it is obviously incorrect to choose a definition that an expert in the data would avoid as they know it is not a robust indicator of the forced component, which they aim to model.

    I am happy to be pals with everybody. Science is not advanced by enmity.

  77. Briggs

    3 February 2012 at 2:21 pm

    Dikran Marsupial,

    In the interest of keeping separate the various claims—you have added a fourth, which is different than (A), (B), or (C)—why don’t you write a guest post on how you would statistically analyze that claim? I’d be happy to post it. Email is on contacts page.

  78. Dikran Marsupial

    3 February 2012 at 2:32 pm

    I haven’t added a fourth, I have merely pointed out that the claim and the discussion is a misinterpretation of the projection.

    The analysis is simple: test whether the observations fall within the spread of the multi-model ensemble. If they do, then reality is consistent with what the modeling community consider to be plausible (given our understanding of the physics and the appropriateness of the scenario which defines the forcings).

    Yes, the spread of the ensemble is pretty wide, which is a reflection of the fact that climate modellers are quite open about the uncertainties of the projections.

    However, the main point is that decadal trends tell you next to nothing about the effect of CO2 on the climate as on those timescales it is dominated by unforced variability. If you want to draw conclusions from a trend, then either make sure the hypothesis test has sufficient power for a failure to reject the null to be interesting or use a Bayes factor, which will give an equivocal result when the data are equivocal.
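
    The consistency check described in this comment (do the observations fall within the spread of the ensemble?) might be sketched as follows. The ensemble values are invented for illustration, the 2.5th to 97.5th percentile band is one possible choice of "spread" rather than anything stated in the thread, and `within_ensemble_spread` is a hypothetical helper name.

```python
import numpy as np

def within_ensemble_spread(observed, ensemble, lower_pct=2.5, upper_pct=97.5):
    """Return True if `observed` lies between the chosen percentiles of `ensemble`."""
    low, high = np.percentile(ensemble, [lower_pct, upper_pct])
    return bool(low <= observed <= high)

# Hypothetical decadal trends (C per decade) from a made-up ensemble of runs.
ensemble_trends = np.array([0.05, 0.10, 0.12, 0.15, 0.18,
                            0.20, 0.22, 0.25, 0.30, 0.35])

print(within_ensemble_spread(0.08, ensemble_trends))   # inside the band
print(within_ensemble_spread(-0.20, ensemble_trends))  # outside the band
```

    A wide ensemble makes this test easy to pass, which is why the comment adds that a failure to reject is only informative when the test has sufficient power.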

  79. Mr. Briggs,

    It is blatantly obvious that you are an incompetent. You have confused descriptive data with predictions. I think it is blatantly clear that you have no clue. How else can your exchange on Tamino’s blog be explained? Since you insist that we revisit it, here we go:

    Briggs Said:

    Notice old Phil (his source, actually) starts, quite arbitrarily,
    with 1973, a point which is lower than the years preceding this date.
    If he would have read the post linked above, he would have known this
    is a common way that cheaters cheat. Not saying you cheated, Phil, old
    thing. But you didn’t do yourself any favors.

    Deep Climate Says:

    Actually 1973 appears to be *higher* not lower, than the years immediately preceding *and* succeeding it. That year was presumably chosen by John Cook (Skeptical Science) because it marked the beginning of the first of many “pauses” that one can identify if one cherrypicks the right subintervals. The choice of 1971 or 1972 would have had little effect on the overall trend to 2009, but it did raise the slope a bit.

    For greater certainty here are the BEST annual values for 1970s with trend slope to 2009:

    Year Anom Trend to 2009
    1970 0.070 0.271
    1971 -0.042 0.277
    1972 -0.137 0.277
    1973 0.386 0.273
    1974 -0.130 0.290
    1975 0.166 0.284
    1976 -0.213 0.291
    1977 0.274 0.276
    1978 0.104 0.286
    1979 0.023 0.284

    And what is your response?

    Briggs:

    try this from the 1940s.

    Deep Climate Says:

    Well, OK:

    1940 0.165
    1941 0.087
    1942 0.084
    1943 0.160
    1944 0.255
    1945 -0.042
    1946 0.022
    1947 0.165
    1948 0.103
    1949 -0.044

    1973 is still not lower than any of those (and neither is it lower than any year in the 1950s and 1960s).

    Want to try again? Let me save you the trouble. It turns out 1973 is *higher* than every single year preceding it. So you were utterly, completely wrong.

    I think that it is patently clear that you are an incompetent.

    For those who want to revisit that train wreck go here: William M. Briggs: Numerologist to the Stars

    Really what more needs to be said here?

  80. Dikran Marsupial

    3 February 2012 at 2:36 pm

    BTW thank you for the kind offer of a guest post, but the issues are really more to do with the climatology rather than the statistics, so it might not be the best venue. I may post something over at SkS at some point if I can find the time.

  81. Briggs

    3 February 2012 at 2:39 pm

    Trent1492,

    The only thing we can agree on is you need say nothing more. Go find somewhere else to play.

    But as you go, see Josh’s cartoon: http://www.cartoonsbyjosh.com/how%20to%20do%20graphs.jpg

  82. I think what is further damning is that in some sense this is a distraction. You critiqued Phil Plait’s original post and missed out (purposely?) the main argument. The Daily Mail and the editorial in the Wall Street Journal lied. Let me say that again, they told blatant and unequivocal lies. Shall we demonstrate those lies? Yes, you say? OK

    The Daily Mail said: “Met Office releases new figures which show no warming in 15 years”

    What the Met Office actually said is: The Met Office: 4 January 2012 – 2012 is expected to be around 0.48 °C warmer than the long-term (1961-1990) global average of 14.0 °C, with a predicted likely range of between 0.34 °C and 0.62 °C, according to the Met Office annual global temperature forecast.

    Now whether you agree with what the Met Office said is accurate or not that is immaterial. It is immaterial because the Daily Mail lied about what the Met Office said. You claim to want a certain amount of decorum. Yet, all I can see here is that you seem to be engaging in distractions about the main thrust of Phil Plait’s article which is that a tremendous amount of disinformation was published by the Daily Mail and the Wall Street Journal.

    How exactly are blatant lies to be handled Mr. Briggs? You are clutching your pearls and pulling out the fainting couch because you think intemperate language is being exercised and yet your silence on the large lies being told ad nauseam is more telling, in my humble opinion, than any words that I have written on this thread. Your silence is your conviction.

  83. Briggs

    3 February 2012 at 3:14 pm

    Trent1492,

    “in my humble opinion” Oh good grief.

  84. @Dikran Marsupial: OK, I was under the impression that weather ensemble forecasts were literally competing models, developed by different groups, and with nothing guaranteed to be in common. By running current data through them, you get a whole range of forecasts, the outcomes of which depend on the totality of the assumptions of each modeler. Perhaps I’m wrong on that.

    So you’re saying that there is a single IPCC model, basically. It no doubt has many parameters that can be adjusted to give reasonable back-casts, and it is thus tuned on historical data. It is then run forward, setting a couple of forcings (particular parameters) to various “scenario” values and the ensemble forecast is basically the mean of these scenarios, and the uncertainty is the spread of those scenarios.

    Does that sum it up correctly?

    It seems to me that the spread of the scenario outcomes is not the uncertainty of anything, but rather the sensitivity of the model to some of its parameters (“forcings”). The uncertainty of the training data is not carried forward. The uncertainty of the many hidden parameters (tuned to the historical data) is not accounted for. Etc.

    As you noted, I used MCMC when I really meant something more like MC, and I think that’s where the ensemble approach would need to go: simulate from the entire range over which historical data could have actually fallen, simulate from the range of variability of all parameters, and so on. Then we could say that the resulting (thousands of) simulations reflect the uncertainty inherent in the process. Right?
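A minimal sketch of the plain Monte Carlo idea described in comment 84, assuming a toy straight-line model rather than any real climate model: the historical data are redrawn within an assumed measurement uncertainty, the model is refit each time, and the spread of the resulting forecasts reflects data and parameter uncertainty together. The series, `obs_sigma`, and the forecast horizon are all made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "historical" record: a linear trend plus measurement noise.
years = np.arange(30)
obs_sigma = 0.1  # assumed measurement uncertainty of each historical value
observed = 0.02 * years + rng.normal(0, obs_sigma, years.size)

def fit_and_forecast(y, horizon):
    """Fit a straight line to y and extrapolate it to `horizon`."""
    slope, intercept = np.polyfit(years, y, 1)
    return intercept + slope * horizon

horizon = 50
n_sims = 5000
forecasts = np.empty(n_sims)
for i in range(n_sims):
    # Redraw the historical data within its assumed uncertainty, then refit,
    # so the fitted parameters vary from one simulation to the next.
    perturbed = observed + rng.normal(0, obs_sigma, years.size)
    forecasts[i] = fit_and_forecast(perturbed, horizon)

point_forecast = fit_and_forecast(observed, horizon)
low, high = np.quantile(forecasts, [0.05, 0.95])
print(f"point forecast: {point_forecast:.2f}")
print(f"90% Monte Carlo interval: [{low:.2f}, {high:.2f}]")
```

The interval here only covers the uncertainties that were actually simulated; anything held fixed (the model form, for instance) contributes nothing to the spread.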

  85. Since you like cartoons. Josh is no xkcd.

  86. Trying that link again xkcd

  87. @wayne: that’s exactly what I have been getting at. The models, so far, have not been validated, and so any sensitivity analysis against the models will only tell you how sensitive the models are to various parameters without reflecting anything about real world sensitivities.

    @dikran marsupial: hind cast, as climatologists use it, is not a valid validation test. It’s a no-no to validate a model against data that was used to create the model. By testing against data that was used to construct the model you are really performing a type of signal-to-noise-ratio measurement, which just confirms how well or badly your model acts as a compression algorithm.

    Dikran, the issues you raise (about detecting and quantifying components in a signal) are entirely about statistics. I can accept that there are many subtle nuances unique to climate modeling, but the underlying concepts are the same for all disciplines.
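The hindcast point in comment 87 can be sketched as follows, assuming a toy series and polynomial fits in place of a real model: the error measured on the data used for tuning is compared with the error on data held back. The series, the split, and the polynomial degrees are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical series: a slow linear trend plus noise.
t = np.linspace(0.0, 1.0, 60)
y = 0.5 * t + rng.normal(0, 0.1, t.size)

# Tune on the first 40 points only; hold the last 20 back.
t_train, y_train = t[:40], y[:40]
t_test, y_test = t[40:], y[40:]

def rmse(a, b):
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

results = {}
for degree in (1, 6):
    coeffs = np.polyfit(t_train, y_train, degree)
    hindcast_err = rmse(np.polyval(coeffs, t_train), y_train)
    held_out_err = rmse(np.polyval(coeffs, t_test), y_test)
    results[degree] = (hindcast_err, held_out_err)
    print(f"degree {degree}: hindcast RMSE {hindcast_err:.3f}, "
          f"held-out RMSE {held_out_err:.3f}")
```

The more flexible fit can never do worse on the tuning data, which is why a good hindcast on its own says little about skill on new data.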

  88. So, the new answer-bot of the Team is now called Trent1492? I see the same pro AGW arguments over and over again, “melting glaciers and ice caps”, “the stratosphere cooling while the troposphere warms.”, “which part of the nine of the ten warmest years on the instrument record have all been since 2000 do you not understand?”, “Spring is arriving earlier by weeks”, (well not the last couple of years here as I remember), pre-programmed sound bytes, triggered by the word “cooling”.

    Trent is a bit stupid is he not?

    Naah, he’s not human, just an android programmed that way.

  89. If Trent and Mann held hands a synapse would form. Sadly it would be inhibitory.

  90. Shorter Hoi Polloi: I got nothing no substantive argument I think I am going just throw out some random insults.

  91. Told you so.

  92. Yo Andy,

    Got anything to add beyond acting like a 12 year old? You lot have been shown in detail where you are wrong. Now you have resorted to simple name calling without even a pretense of engaging the facts and logic. You know like insisting that we do not even know if the world is warming or not when all of the global trends in nature show otherwise. Andy, maybe you should go back to your play yard and start building more sand castles of the mind.

  93. Briggs

    3 February 2012 at 6:22 pm

    Trent1492,

    Start behaving in a civilized manner or you’re out.

  94. Mr. Briggs,

    I will stop posting since you seem to have an asymmetrical standard for behavior for yourself and others. When I say asymmetrical I mean that apparently you are cool with one side calling the other “stupid”, “crack addict” and other meaningless gibberish. You allow to pass without comment the Daily Mail’s and Wall Street Journal’s demonstrable lies. But that is cool. This is your blog and if you wish to act in a hypocritical manner that is your prerogative.

    I should mention though that Google now has William M. Briggs: Numerologist to the Stars as #5 in its search results for your name. Keep it up and your infamy may zoom to #1.

    To Andy and his other ideologues. If you have any wish to continue our “discussion” I will be over at Deltoid in the open thread. I got to inform you though that you are not going to be in a space with Mr. Briggs watching over you.

  95. Dikran,

    You say:
    “The analysis is simple: test whether the observations fall within the spread of the multi-model ensemble. If they do, then reality is consistent with what the modeling community consider to be plausible (given our understanding of the physics and the appropriateness of the scenario which defines the forcings).”

    This question of whether to use the spread of the ensemble seems to have been debated since the beginning of time on the climate blogs – in particular the eternal discussions about Douglass 2007 at ClimateAudit, the AirVent, Lucia’s etc. There was significant disagreement on the appropriate statistical test but most people seemed quite skeptical of just looking at the spread of the ensemble. As numerous commentators pointed out, you would pass this test by just having some very bad models biased to the high and low sides! Your comment that “the analysis is simple” is a gross simplification.

    MMH (2010) is an interesting analysis of evaluating the ensemble versus observations (http://www.rossmckitrick.com/uploads/4/8/0/8/4808045/mmh_asl2010.pdf) which is not rosy for the models of tropospheric trends. To my knowledge, this approach is yet to be refuted in the literature. Perhaps you could take a look and opine as to whether their statistical approach is superior to just looking at the spread of the ensemble.

  96. Briggs

    3 February 2012 at 7:01 pm

    All,

    I missed the “crack addict” crack that Trent1492 alluded to. None of that kind of thing is allowed here. Jokes are okay, ungentlemanly behavior is not.

  97. Trent1492,

    That’s “Dr. Briggs” to you.

  98. Dikran Marsupial

    3 February 2012 at 7:31 pm

    @certy The test used by Douglass et al. is very clearly incorrect, as an infinite ensemble with perfect physics [i.e. the best Monte Carlo ensemble that the modelers could possibly make] would be guaranteed to fail the test. I was rather surprised that the paper made it through peer review given how obvious is the flaw.

    If the ensemble is large enough, feel free to use (say) 5% and 95% quantiles so that it isn’t so strongly dependent on the most extreme. Increasing the spread of the ensemble would be pointless, the broader the credible interval, the less confident the projection, the weaker the conclusions that can be drawn. It would be an entirely counterproductive way to try and make the models look good. Climatologists want an accurate model with a narrow credible interval; both are important. The fact that they make no attempt to hide the breadth of the credible interval is an indication of their honesty in representing the shortcomings of the models (at least to anyone that understands how the models should be interpreted).

    Again much of the discussion suffers from a lack of understanding of the climatology rather than the statistics. The correct test depends on what the models aim to predict and what the mean and variance/spread of the ensemble actually tell you.
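A rough simulation of the argument in comment 98, under stated assumptions: the "perfect" ensemble is modelled as draws from the same distribution as the observation, and the criticised test is taken to be a comparison against the standard error of the ensemble mean (my reading; the thread does not spell out the formula). The quantile-spread test uses the 5% and 95% quantiles suggested in the comment.

```python
import numpy as np

rng = np.random.default_rng(2)

def rejection_rates(n_members, n_trials=2000):
    """Fraction of trials in which each test rejects a perfect ensemble."""
    se_rejects = 0
    spread_rejects = 0
    for _ in range(n_trials):
        # Perfect ensemble: members and the observation are draws from the
        # same distribution (forced response 0, unforced noise of sd 1).
        members = rng.normal(0.0, 1.0, n_members)
        obs = rng.normal(0.0, 1.0)
        # Test 1: observation must lie within 2 standard errors of the mean.
        se = members.std(ddof=1) / np.sqrt(n_members)
        se_rejects += int(abs(obs - members.mean()) > 2 * se)
        # Test 2: observation must lie between the 5% and 95% quantiles.
        lo, hi = np.quantile(members, [0.05, 0.95])
        spread_rejects += int(not (lo <= obs <= hi))
    return se_rejects / n_trials, spread_rejects / n_trials

for n in (10, 100, 1000):
    se_rate, spread_rate = rejection_rates(n)
    print(f"N={n:4d}: standard-error test rejects {se_rate:.0%}, "
          f"quantile-spread test rejects {spread_rate:.0%}")
```

As the ensemble grows, the standard error shrinks toward zero while the unforced noise in the single observation does not, so the first test rejects more and more often even though the ensemble is correct by construction.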

  99. ……over at Deltoid. Says it all really. I think I hit a nerve and he’s taken his ball away. Oh well I will just wonder at the hubris of someone thinking he will get people following him to another site.

  100. @dikran marsupial: I am trying to understand the process you describe. In my field I’m usually only interested in either prediction accuracy or in performance versus some other model. I get to cheat, in a sense, because the underlying process I am modeling is always considered a black box.

    Where I am stumbling is: If the model, given a set of observations, hasn’t been validated then how can you ever know if tweaks to the inputs are good or bad? Don’t you need something to test against besides just trying to reduce variance?

    I use ensemble methods regularly, but not to increase a spread of predictions. The idea behind ensemble methods, as I use them, is that many weak predictors, while not perfect, can often give a better prediction than a single strong predictor. They can also be used to robustly handle missing or corrupt measurements. How does what I described differ from the way ensembles are used in climate modeling?

  101. Dikran Marsupial

    4 February 2012 at 3:23 am

    @Will, In a statistical ensemble (I use them a lot as well, mainly bagging) each individual model is intended to be a predictor of the value of the response variable as a function of the attributes, and ensembling provides a useful variance reduction that on average will improve predictions.

    In a GCM ensemble, the individual model is not intended as an accurate predictor of observed climate (as that is effectively impossible), but a simulation of how the climate might evolve. The way that climate might evolve has two components, a deterministic (hopefully non-chaotic) response to the forcings (the forced response) and a chaotic component (the unforced response) which is essentially “weather noise” comprised of things like ENSO. So with a GCM ensemble, averaging over many runs cancels the unforced component in each run and leaves you with an estimate of the forced response, which is what we need to know for planning a course of action.

    Now the observed climate will have a forced response and an unforced response, so we should not expect it to match the ensemble mean (which is an estimate of the forced response) even if the ensemble is exactly correct. How close we should expect the observations to match the ensemble mean depends on how large we can expect the unforced response to be. At the moment the best way to determine this is to simulate future climate many times and see what range of outcomes we might see around the forced response. This is exactly what the spread of the ensemble runs tells you.

    Sadly we can’t estimate the magnitude of the unforced response from the observations as we have only one realisation of the actual climate to look at.

    I have found a double pendulum is a good analogy to the way a climate model operates http://www.skepticalscience.com/on_consensus.html#20068
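The forced/unforced decomposition in comment 101 can be illustrated with a toy simulation. This is only a sketch: the linear forced response and the AR(1) noise standing in for "weather noise" are assumptions for illustration, not properties of any GCM.

```python
import numpy as np

rng = np.random.default_rng(3)

n_years, n_runs = 50, 40
years = np.arange(n_years)
forced = 0.02 * years  # assumed forced response, identical in every run

def unforced(size):
    """AR(1) noise standing in for chaotic variability such as ENSO."""
    noise = np.zeros(size)
    for k in range(1, size):
        noise[k] = 0.6 * noise[k - 1] + rng.normal(0, 0.1)
    return noise

runs = np.array([forced + unforced(n_years) for _ in range(n_runs)])
ensemble_mean = runs.mean(axis=0)
lo, hi = np.quantile(runs, [0.05, 0.95], axis=0)

# One more realisation plays the role of the observed climate.
observed = forced + unforced(n_years)

mean_err = np.abs(ensemble_mean - forced).mean()
obs_err = np.abs(observed - forced).mean()
inside = np.mean((observed >= lo) & (observed <= hi))
print(f"ensemble mean vs forced response, mean abs error: {mean_err:.3f}")
print(f"single 'observed' run vs forced response:        {obs_err:.3f}")
print(f"fraction of years observation lies in 5-95% band: {inside:.0%}")
```

Averaging cancels most of the noise in the ensemble mean, but the single "observed" run keeps all of its own noise, so it sits about as far from the mean as any individual member does.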

  102. Mr Briggs,

    Thank you very much for this link:

    http://www.cartoonsbyjosh.com/how%20to%20do%20graphs.jpg

    Hat tip to Josh. I was thinking of the same presentation of graphs/double standards a few days ago myself, so it was a delight to come across this. Climate science… it’s an embarrassing mess of a field, isn’t it?

  103. Dikran Marsupial

    4 February 2012 at 4:43 am

    @Will Nitschke There is no double standard, the trends in the SkS escalator (middle figure) are not statistically significant and essentially meaningless (which is why you shouldn’t do that), the ones in the IPCC diagram (bottom figure) are statistically significant, which is a very different matter.

    Amusing cartoon, but don’t take it too seriously.

  104. Briggs

    4 February 2012 at 8:36 am

    Dikran Marsupial, Will Nitschke,

    But then “statistically significant” is what we agreed is not a way to measure model truth, as the opening comments to this thread show.

    Yes, the IPCC makes the same mistake. “Statistical significance” is, as regular readers of this blog know, trivially easy to find. Why those lines for the IPCC? How can we know those models were true, etc.? It’s the same question as we started with.

    As always, the best test is a model which predicts new data well (claim (B)). So far, the IPCC does not do well at that. We are right to suspect their models.

    Update: be sure to see the post on how to fool yourself and others with time series.

  105. Dikran, Yes, I am aware that Douglass 2007 has a statistical error which is why I asked for your opinion about MMH 2010, not Douglass. MMH uses econometric techniques to test models versus observations and to my knowledge its conclusions and techniques have not as yet been refuted.

    As you are familiar with Douglass 2007, you will know that it was refuted by Santer 2008. The latter used an improved statistical test as opposed to your simplistic spread of the ensemble. You should look at the list of co-authors on Santer 2008. You appear to be in a small minority in believing in your simple spread test. Was Santer 2008 also incorrect?

    Further, MMH 2010 showed that if you apply Santer’s own approach to latest data, the model projections are significantly different from observed trends.

  106. Dikran Marsupial

    4 February 2012 at 1:13 pm

    @certy, I suspect the test in Santer is slightly too conservative (I am not sure it fully accounts for the uncertainty in the physics, although I believe it does account properly for unforced variability). However it is rather a long time since I read it and I haven’t analysed the data to determine whether this is a substantial problem.

    I’ve not read MMH2010 yet, so I can’t comment on that. It is worth noting however that the majority of papers that are incorrect are never formally refuted, they just get ignored. There is just so much published these days that not many researchers write comments papers any more as there just isn’t time.

    IIRC Gavin Schmidt is also a co-author of the Santer paper, and he wrote an article on RealClimate discussing model-data comparison, and he used the “lies in the spread of the models” test (I suspect he used +- 2 standard deviations, but it amounts to pretty much the same thing). So there is precedent for the test, even if you don’t accept the explanation of why it is a reasonable test.

    Note that as I mentioned I think the Santer test is slightly conservative, so I am not unduly surprised if the test gives different results for different timespans. It is because the observations are around the limit of what the models can currently explain by natural variability. This could be a problem with the models, or with the observations, or it could be that the realisation of natural variability that we observe is unusual, or a combination of all three.

  107. Dikran Marsupial

    4 February 2012 at 1:34 pm

    Dr Briggs, science is going to continue using frequentist hypothesis tests for the foreseeable future, and they do provide a useful sanity check. The trends in the SkS escalator fail this sanity check, which is a good reason not to use them (which is the point of the diagram). The trends in the IPCC diagram have at least passed that basic sanity check, so it is a non-sequitur as I pointed out to suggest there is a double standard between the escalator and the IPCC diagram.

    “How can we know those models were true,”

    I’m with GEP Box on that one, all models are false, but some are useful.

    “As always, the best test is a model which predicts new data well (claim (B)). ”

    I would agree, however (i) we don’t have enough new data to reliably assess the projections of the model. (ii) the test should be a test of what the models actually predict, which (B) is not, for the reasons I have given.

    “We are right to suspect their models.”

    All models should be subject to suspicion.

  108. @dikran marsupial: Thanks for the information and the link. I appreciate you taking the time to help answer my questions. Hopefully I can return the favor someday. :)

    I will have a look at the double pendulum example you provided.

  109. @dikran marsupial: just read the example. Very well written btw.

    The pendulum example you provided was very well explained, but the analogy is misleading; you are comparing a system where the functions describing the dynamics are known to be perfectly precise and there is only one external influence which can be turned on or off at will.

    This is not the case with our moist blue sphere. We have multiple unknown forces, with an unknown number of interactions which are still not understood, and a method of simulation (box models, etc..) which is known to be imprecise.

    To use the pendulum example; pendulum has two to infinity joints, between 0 and infinity electromagnets, force of gravity is changing, and sometimes the pendulum is a superconductor– and we only get to see the pendulum animation as a 2×2 pixel image. (That’s my attempt at being funny..)

    Is my revised analogy in the right ballpark, or am I still horribly confused?

  110. Dikran Marsupial

    5 February 2012 at 2:02 pm

    @Will, yes the double pendulum is indeed much simpler than climate modelling for exactly the reasons you suggest; however even when you know the physics exactly and there is only one forcing, you can still only expect the observations to lie within the spread of the ensemble of models.

    Another good analogy is to consider having a time machine that could visit alternate realities where Earth had identical climate physics and identical forcings, but different initial conditions (perhaps a different butterfly flapped its wings in different universes). These alternate Earths are perfect models of climate change on our Earth, with perfect physics and essentially infinite temporal and spatial resolution. Climate modellers could not even theoretically improve on this model (without having exact information on initial conditions). We could make an ensemble from these alternate Earths, but even then you could only expect the climate on our Earth to lie in the spread of the ensemble somewhere.

    There would be no reason to expect the observed climate to be any closer to the ensemble mean than any of the alternate Earths comprising the ensemble (there is nothing special about our Earth).

  111. @Dikran Marsupial: Thanks for the response.

    What you are describing as a model ensemble sounds more like a sensitivity analysis. The model results are not actually combined to create an ‘ensemble estimate’, but rather are used in the way results of a risk assessment would be. Really, the models themselves become parameters in a way. Am I correct in thinking this?

    I’m really stuck on something though.. Using the pendulum example:

    If you have an unknown number of electromagnets and the pendulum could have an infinite number of moving segments, how can you be confident that any approximation of the pendulum system is a valid approximation?

    In other words, how can you know that you’re exploring the bounds of a valid, or even plausible, model? If you’re using a 2-10 arm model in your simulation, what reason is there to not use a 50 arm model? What reason is there not to include wind and air viscosity in to the pendulum model?

  112. Dikran Marsupial

    6 February 2012 at 5:19 am

    @Will, it seems to me that the best way to view the ensemble is as a subjective Bayesian posterior distribution of the plausible outcomes given the climate modellers’ current understanding of climate physics. The spread is every bit as important as the mean in characterising the posterior. This will be very familiar to Bayesians as statistical decision theory would suggest that we integrate over the whole posterior in evaluating the expected loss for different courses of action, rather than just concentrating on the most plausible outcome (the ensemble mean). Of course our knowledge of climate physics is imperfect, to say the least, but it would be irrational to plan our course of action based on anything other than our best understanding of climate physics, including all uncertainties, as embodied in the models.

    The climate system does not have an infinite number of forcings, the major forcings such as solar activity, GHG radiative forcings, aerosols are well characterised in the models. Some minor forcings are less well characterised (e.g. clouds), but there isn’t currently a great deal of evidence to suggest that they are dominant, and no model can include all of the features of climate (without an Earth in a parallel universe). This is what GEP Box was getting at when he said “all models are false, but some are useful”; all models are necessarily abstractions or simplifications of reality in order to be tractable, and it is always worth bearing in mind that the model is not “true”, no matter how well it fits the observations. Similarly the temporal and spatial resolution of the model don’t need to be infinite for the model to be useful. Such models are routinely used in science and engineering. The higher the resolution the better the simulation of climate, but that is the nature of approximations.

    The important thing I am trying to say is that if you want to compare observations against the multi-model mean, then there is no reason to expect the observations to be any closer to the mean than the plausible range of effects of unforced variation. The problem is that we cannot estimate that range from a single data point (the observed climate on our Earth); the best we can do is to estimate it from climate models. This makes any reasonable test of the models rather circular (other than the lies in the spread test) as it would rely on the spread of the model ensemble not being an underestimate of the plausible spread due to unforced variability. As the models are less complex than reality, this doesn’t seem a reasonable assumption to me.

    The bottom line is that the model is wrong, we know that as all models are wrong (GEP Box), the question is, “are the models useful”, and if the answer is “no” what are you going to replace them with that will be better (statistical forecasting won’t be as it will involve extrapolating from the model beyond the conditions under which it was calibrated, which is very risky unless the model captures causal rather than statistical relationships)?
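The decision-theory remark in comment 112 (integrate the loss over the whole posterior rather than plugging in the ensemble mean) can be sketched as below. Everything here is hypothetical: the ensemble is random draws treated as posterior samples, and the loss function and its coefficients are invented purely to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical ensemble of projected warming, treated as posterior samples.
ensemble = rng.normal(2.0, 1.0, 10_000)

def loss(warming, defence_level):
    """Made-up loss: cost of defences plus damage when they are exceeded."""
    build_cost = 1.0 * defence_level
    excess = np.maximum(warming - defence_level, 0.0)
    return build_cost + 10.0 * excess**2

levels = np.linspace(0.0, 5.0, 51)
# Expected loss: average the loss over every ensemble member.
expected = np.array([loss(ensemble, d).mean() for d in levels])
# Plug-in loss: pretend the ensemble mean is the only possible outcome.
plug_in = np.array([loss(ensemble.mean(), d) for d in levels])

best_expected = levels[expected.argmin()]
best_plug_in = levels[plug_in.argmin()]
print(f"action minimising expected loss over the spread: {best_expected:.1f}")
print(f"action minimising loss at the ensemble mean:     {best_plug_in:.1f}")
```

With an asymmetric loss like this one, the action chosen by averaging over the spread differs from the action chosen at the mean alone, which is the sense in which the spread matters as much as the mean.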

  113. @Dikran Marsupial: We are on the same page regarding the ensemble. Having never made a perfect model of anything, I happily agree with you that a model can be imperfect while still being useful.

    That said, there are a number of assumptions made in the description you provide that seem to be important to the outcome, and leave me a little unsure of how the process works.

    - How do you know that a forcing is a major or minor one if the system is untestable? From what I have read elsewhere it seems as though the forcings haven’t been nailed down. Like in the pendulum example, simply by observing the pendulum’s location we would have no way of deducing the number (or strength) of electromagnets and arms.

    - How do you know that the model embodies the forcing properly? Back to the pendulum example; how do you know how powerful each magnet is? What is the coefficient of friction for each segment on the pendulum’s arm? We could make a bunch of guesses, but would have no way of knowing if they are right or wrong– it’s untestable.

    I guess what I’m getting at is this: How do you know that the ensemble output is any better, or worse, than simply picking a number at random? How do you test it to make sure that it is better/worse than simply picking a number at random? I understand you check the ensemble spread, but anyone could say ‘0 +/-5’ and be pretty much right.

  114. Dikran,

    I think you miss the point when you say: “I am not unduly surprised if the (Santer) test gives different results for different timespans”.

    The key point is that when you use the full time period of the observations, incorporating all available data from 1979-2009, the model projections significantly differ from the obs. This should give one pause about the models.

  115. Dikran Marsupial

    6 February 2012 at 2:03 pm

    @Will “- How do you know that a forcing is a major or minor one if the system is untestable? ” mainly from the observations (including paleoclimate) and from physics/experiments. I didn’t say that the projections are not testable, just that a fair test should be based on what the modellers claim to be able to do, not what they claim not to be able to do. If the observations lie within the stated uncertainty of the model, then the model is essentially performing as well as it claims to be able to perform.

    “How do you know that the model embodies the forcing properly?” The forcings are the input to the model, not part of it. However if you mean “how do you know the model responds to the forcing properly” then strictly speaking we know that they don’t because all models are wrong, as they are only approximations to reality. The question is whether the approximation is good enough to be useful.

    “I guess what I’m getting at is this: How do you know that the ensemble output is any better, or worse, than simply picking a number at random?” O.K., take a random number generator and see if it would have better predictive ability than Hansen’s 1988 projections. You will find that it won’t.

    However, you are missing the point of the models, which is to tell us the plausible consequences of our actions according to our knowledge of climate physics. Which is more rational, making decisions based on the expert understanding of climatologists, or based on a random number generator? Yes obviously there is a flippant answer to that question, but ask yourself why we have doctors diagnose disease rather than flip a coin. If you think climatology is an imperfect science, then medical science is way worse, we are barely beginning to map out gene regulatory networks for example, which are vitally important in understanding the body’s reaction to drugs, so why do we trust the knowledge of doctors, but not climatologists (I can assure you there is WAY more money involved in drug research than there ever will be in climatology)?

  116. Dikran Marsupial

    6 February 2012 at 2:10 pm

    certy as I have pointed out, the observations are only “significantly” different if you choose the right start and end dates, and choose a test that as I have pointed out is over-conservative. It shouldn’t give anyone “pause about the models” because scrutiny and consideration of the models should be continuous and ongoing, which is in fact exactly what the climatologists actually do. Go read the relevant article at Real Climate and you will find Gavin Schmidt, a climate modeller, openly discussing the shortcomings of the models.

    As I said, the observations are on the borderline of what the models consider plausible and there are three reasons why this may be (the models, the observations, that the unforced variation is currently highly unusual). I don’t see any reason why we should fixate on the models, we should keep an open mind about all three possibilities.

  117. Dikran, Again, no, it is not a matter of whether you “choose the right start and end dates”. This is simply choosing all the available data. When you are doing this and using 30 years of data, the old accusation of cherry picking just doesn’t hold water.

    I also don’t get your point that this is not a reflection on the models because “scrutiny and consideration of the models should be continuous and ongoing.” This is a non sequitur.

    We have a situation whereby a number of different published statistical methodologies show a significant difference between observations and model trends. Of course we should keep an open mind about why this is, but your reluctance to accept that this creates questions about the models is puzzling. As is your continual refusal to concede that the appropriate test is not your simple “spread of the ensemble”. Can you point to a single peer reviewed paper that has used this spread test?

  118. @Dikran Marsupial:

    I think you answered my question. :) The range of the ensemble is used to determine if the models are accurate by measuring the observation against the range of predictions.

    The models are tested, and they are validated (possibly against unseen data.)

    That sounds like a predictive measure to me. No need to talk about monte-carlo or pendulums. Data in, prediction out. :)

    Is this correct?

  119. Dikran Marsupial

    6 February 2012 at 4:18 pm

    @certy I didn’t say cherry picked, I was merely pointing out that the result of the test is not robust because if you change the start and end dates slightly, you change the result of the test. Given ENSO it is quite likely that if we wait a little longer the difference will go back to being not significant again, but that won’t actually mean that anything has suddenly changed either.

    “I also don’t get your point that this is not a reflection on the models because “scrutiny and consideration of the models should be continuous and ongoing.” This is a non sequitur.”

    Given that I didn’t say that it is not a reflection on the models, then of course it is a non-sequitur. What I did say is that the change in the results of the test is not a cause for pause (for thought) regarding the models, the observations were bumping along the threshold already, which is reason for scrutiny of the model already BEFORE the result of the test flipped.

    “We have a situation whereby a number of different published statistical methodologies show a significant difference between observations and model trends. Of course we should keep an open mind about why this is, but your reluctance to accept that this creates questions about the models is puzzling.”

    I don’t know how more clearly I can say this than I have already, but the questions about the models were already there and being discussed in the litterature before the likes of Douglass et al were published.

    “As is your continual refusal to concede that the appropriate test is not your simple “spread of the ensemble”.”

    Well if you could point out why the test is incorrect (i.e. point out the flaw in the reasoning) then I might. However so far all you have done is point out that there are other tests, at least one of which is fundamentally incorrect, and one that I have pointed out is conservative.

    ” Can you point to a single peer reviewed paper that has used this spread test?”

    No, not off hand; to be honest I was rather surprised that Santer et al didn’t use it. Their test is only slightly conservative compared to the “spread of the ensemble” test, so it doesn’t make a great deal of difference. If the models currently fail the Santer test, they are very close to failing the spread of the ensemble test if they haven’t already. It is a bad idea to think of the tests as simply pass-fail, especially if the test is updated so the test is sequentially repeated.

    This is why I prefer Bayes factors, which are a continuous assessment of the relative plausibility of two hypotheses.
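To show what a Bayes factor looks like as a continuous measure (the preference stated at the end of comment 119), here is a small sketch. The two hypotheses, the Gaussian forms, and every number are illustrative assumptions and are not taken from any dataset or paper discussed in this thread.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bayes_factor(obs_trend, obs_se, h1_mean, h1_sd, h2_mean, h2_sd):
    """Bayes factor of H1 over H2 for a trend estimate with Gaussian error.

    Each hypothesis says the true trend is Normal(mean, sd); marginalising
    over the true trend gives Normal(mean, sqrt(sd**2 + obs_se**2)).
    """
    m1 = normal_pdf(obs_trend, h1_mean, math.hypot(h1_sd, obs_se))
    m2 = normal_pdf(obs_trend, h2_mean, math.hypot(h2_sd, obs_se))
    return m1 / m2

# Illustrative numbers only, in arbitrary trend units.
for obs in (0.05, 0.10, 0.15, 0.20):
    bf = bayes_factor(obs, obs_se=0.05,
                      h1_mean=0.20, h1_sd=0.05,   # hypothetical "model trend"
                      h2_mean=0.00, h2_sd=0.05)   # hypothetical "no trend"
    print(f"observed trend {obs:.2f}: Bayes factor (H1 vs H2) = {bf:.2f}")
```

Unlike a pass/fail test, the ratio moves smoothly as the observed trend moves, so a small change in the data produces a small change in the assessment.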

  120. Dikran Marsupial

    6 February 2012 at 4:22 pm

    @will, sorry it seems to me that the discussion is no longer being taken seriously, so I will leave it there.

  121. Dikran,

    Perhaps we are in agreement on one point – the fact that the models and observations are, in your words, “bumping along the threshold” means there is doubt on the accuracy of the models. Matt’s claim (B) that “the IPCC forecasts have been systematically too large” seems buttressed by this (though of course not proven).

    I already pointed out why the spread of the ensemble is a poor test. You can have a very poor model that is biased to the high side and another very poor model that is biased to the low side. The average of the ensemble is unchanged but you will have a spread so wide as to make your suggested test meaningless. As has been described in a number of blog discussions, this seems to be the case we are facing in reality. Some less sophisticated models from less sophisticated countries were included in AR4. Ideally they would have been weeded out, but politics and courtesy dictated otherwise. So, if you are going to attempt a test of an ensemble spread nature, at a minimum you would need to eliminate all outliers. I can’t see you getting a simple spread test past peer review.

    As an aside, I have never said anything contrary to your point that “the questions about the models were already there and being discussed in the litterature (sic) before the likes of Douglass et al were published”. Why do you think otherwise?

    I also disagree with your statement that “the result of the test is not robust because if you change the start and end dates slightly, you change the result of the test.” MMH did not change dates “slightly”, they increased the time period by a full 50% (20 years to 30 years). Thirty years is a robust time period for a trend.
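The objection in comment 121, that padding an ensemble with models biased high and low leaves the mean unchanged while widening the spread, can be checked with a toy example. The trend values, the bias of 0.25, and the ensemble sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

obs_trend = 0.05  # hypothetical observed trend, illustrative units

# Tight ensemble: models cluster around 0.20.
good = rng.normal(0.20, 0.03, 20)

# Padded ensemble: add ten models biased low and ten biased high, placed
# symmetrically about the tight ensemble's mean so the mean is unchanged.
centre = good.mean()
offsets = rng.normal(0.0, 0.03, 10)
padded = np.concatenate([good, centre - 0.25 + offsets, centre + 0.25 - offsets])

def in_spread(ensemble, obs):
    """True if obs lies between the ensemble's 5% and 95% quantiles."""
    lo, hi = np.quantile(ensemble, [0.05, 0.95])
    return bool(lo <= obs <= hi)

for name, ens in (("tight ensemble", good), ("padded ensemble", padded)):
    print(f"{name}: mean {ens.mean():.3f}, "
          f"observation inside 5-95% spread: {in_spread(ens, obs_trend)}")
```

The same observation fails the spread test against the tight ensemble and passes against the padded one. The wider band is also a less informative projection, which is the counterpoint made in comment 98.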

  122. @Dikran: I’m feeling a bit neglected here as you focused on a couple of firefights and missed my question of 3 February 2012 at 3:28 pm. I’ll rephrase and make it a bit briefer:

    The IPCC ensemble, as I understand your description, seems to refer to a single model with let’s say a dozen parameters of which a couple (CO2 being one) have been singled out as “forcings” for exploration through a series of scenarios. The other parameters remain unchanged and their uncertainties are not reflected in the ensemble, nor is uncertainty in the original data. Is this correct?

    If so, it seems that the ensemble does not reveal any uncertainties of the model, but only the usual uncertainties associated with forecasting covariates in order to forecast a desired outcome. You’re playing “what if” assuming the model (and the data upon which it was tuned) itself had no uncertainties.

    Or am I misunderstanding it again?

  123. @Dikran: Reading @certy’s comment after just reposting my question, it appears that there actually is an ensemble of different models from different sources, as with weather forecasting ensembles. This affects my previous question to the extent that the models truly come from different perspectives and thus explore the space.

    I seem to remember you saying in an earlier posting that the ensemble was basically all based on an IPCC model, or something like that, and perhaps in my previous question I misconstrued that to understand this to be a single model. None the less, if the scenarios are common, the models must all be using the same visible parameters (forcings) and a host of invisible parameters (everything that’s tuned with historical data, or to back-cast reasonably). I don’t think there’s any statistical argument that would say that their invisible parameters would cancel each other out, even if they didn’t have many in common and even if they weren’t tuning to the same training set.

    Thoughts?

  124. Dikran – First, I want to thank you for providing the first easy to understand explanation of why some (e.g., Santer, Schmidt, and you) feel that the appropriate method for evaluating models is to see whether the observations fall within the range of the model spread. Never quite got it from either Gavin’s explanation or reading Santer. And I agree with you that this is an appropriate method IF the models are being used for their original purpose of trying to understand how climate works and how various components of climate interact. However, if the models are being used to make projections or predictions (use whichever term you prefer) about how future climate might unfold, then we need a more stringent test that tells us something about the skill of the model at matching whatever our parameter of interest is. The compare-to-the-spread method simply is not useful for that, in that it essentially says all models, from those that should get a D- to those that deserve an A, are good enough. In that case, if the claim is that the ensemble mean is the best estimate of what will happen, then it is fair to directly compare the observations (and associated error bars) to the ensemble mean (and associated error).

  125. My point is that sometimes different questions are better answered by different approaches.

  126. @Dikran Marsupial: Why are you saying that this discussion is not being taken seriously? If that’s how you feel then I can assure you that there has been a misunderstanding.

    I appreciate your willingness to answer the questions of a complete stranger, but try looking at it from the ‘other side’. When I brought up a ‘random number’ model you said:

    O.K., take a random number generator and see if it would have better predictive ability than Hansen’s 1988 projections. You will find that it wont.

    But haven’t you been saying, all along, that the very test you just proposed isn’t applicable?

    You also said:
    “If the observations lie within the stated uncertainty of the model, then the model is essentially performing as well as it claims to be able to perform”

    This would imply that the model is not performing as well if the observations lie outside of the stated uncertainty of the model. Great! We are out of the world of monte-carlo and chaos, and in the land of quantifiable testing. That’s a bad thing?

  127. Dikran Marsupial

    7 February 2012 at 5:41 am

    @certy I already said that if you are concerned about the test depending on the two most extreme models you can define the spread as 2 times the standard deviation of the model runs, which addresses that issue. There is no such thing as an outlier in the ensemble, they all represent outcomes that are plausible according to our understanding of the physics.
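
    A minimal sketch of the two-standard-deviation version of the test. The run values are invented, and the use of the sample standard deviation (`statistics.stdev`) centred on the ensemble mean is an assumption about how the spread would be defined.

```python
from statistics import mean, stdev

def within_two_sd(observed, model_trends):
    """Spread test using mean +/- 2 sample standard deviations of the runs."""
    centre = mean(model_trends)
    half_width = 2.0 * stdev(model_trends)
    return abs(observed - centre) <= half_width

# Hypothetical model-run trends in degrees C per decade (illustrative only).
runs = [0.18, 0.20, 0.22, 0.24]

for obs in (0.17, 0.10):
    print(f"observed {obs:.2f}: within 2 sd? {within_two_sd(obs, runs)}")
```

    Defined this way, the interval depends on every run rather than only on the two most extreme ones, although extreme runs still widen it through the standard deviation.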

    Regarding the point about non-robustness of the test, I was discussing the Santer et al test. I haven’t read the MMH paper yet, so I can’t comment on that test.

    The point that is continually being missed is when discussing whether the models are accurate or not, you need to know how accurate we could reasonably expect a model to be even if it were perfect. This error would not be zero, it would depend on the plausible magnitude of unforced variability, which we have no means of estimating other than by the models we seek to test. This is the key point.

  128. Dikran Marsupial

    7 February 2012 at 5:54 am

    Wayne, the forcings are the input to the models, they are not parameters. We don’t know what these inputs will actually be in the future, so the IPCC have a set of scenarios that they consider to be representative of what might happen. The reason they use the term “projections” rather than “predictions” is to emphasise the fact that they are projecting what might plausibly happen IF the scenario of forcings happens.

    As you have gathered from certy, the multi-model ensemble has models from various different groups, which covers some of the uncertainty in the modelling community regarding climate physics. The modellers also perform “perturbed physics” runs, where they can examine the sensitivity of the projections to changes in the physics in the models.

    “You’re playing “what if” assuming the model (and the data upon which it was tuned) itself had no uncertainties”

    the models are indeed playing “what if” simulations, but the model uncertainties do contribute to the spread of the ensemble run, so they are considered.

    It is a mistake to think of the models as if they were statistical models calibrated on training data. They are mostly constructed from knowledge of physics and only some parts of the models are “parameterised” and calibrated with data. If the models were very sensitive to the training data you can be sure that a skeptic scientist by now would have come up with a GCM that showed that the warming can be explained without CO2, but no such model has been implemented.

    Hope this helps, however I am only a statistician (who has worked with model output), for detailed questions about the models it would be better to ask someone like Gavin Schmidt at Real Climate, who is heavily involved in that side of things.

  129. Dikran Marsupial

    7 February 2012 at 6:02 am

    BobN, we can’t assess the quality of the models without knowing how well a perfect model would be able to predict the observed climate, and we have no way of estimating that from a single datapoint (the climate on the Earth we can actually observe). The only estimate we have of this is from the models themselves.

    If we just choose the model that gives the best prediction of the observed climate it may just be that the systematic bias in that particular model by random chance matches the effects of unforced variability on the observations, and hence may well give worse predictions than the multi-model mean.

    It is a bit like predicting the outcome of one roll of a biased die using an ensemble of biased dice. Say the die in question is biased so that it gives high numbers more frequently than low numbers, but on the occasion we actually observe a roll we get a two. In this case the model from the ensemble giving the best prediction is likely to be one that is biased low and hence will give worse predictions of future rolls and will probably be worse than just averaging over the ensemble.
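
    The die analogy can be worked through exactly, without simulation. In this sketch the face weights, the four ensemble members, and all the names are made up for illustration; each die "predicts" its own expected value, and predictions are scored by expected squared error against the true die.

```python
FACES = [1, 2, 3, 4, 5, 6]

def normalise(weights):
    """Turn raw face weights into probabilities that sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def expected_value(probs):
    return sum(f * p for f, p in zip(FACES, probs))

def expected_sq_error(prediction, probs):
    """Expected squared error of a fixed prediction against rolls of a die."""
    return sum(p * (f - prediction) ** 2 for f, p in zip(FACES, probs))

# The real die favours high faces (hypothetical weights).
true_die = normalise([1, 1, 2, 2, 3, 3])

# Ensemble of candidate dice, none of which is the true die.
ensemble = {
    "low": normalise([3, 3, 2, 2, 1, 1]),
    "fair": normalise([1, 1, 1, 1, 1, 1]),
    "mild_high": normalise([1, 2, 2, 2, 2, 3]),
    "very_high": normalise([1, 1, 1, 1, 4, 4]),
}

observed_roll = 2  # the single observation we happen to get

predictions = {name: expected_value(p) for name, p in ensemble.items()}
# "Best" model = the one whose prediction is closest to the one observed roll.
best_name = min(predictions, key=lambda n: abs(predictions[n] - observed_roll))
ensemble_mean = sum(predictions.values()) / len(predictions)

best_error = expected_sq_error(predictions[best_name], true_die)
mean_error = expected_sq_error(ensemble_mean, true_die)
print(best_name, round(best_error, 3), round(mean_error, 3))
```

    With these weights the model picked on the strength of a single roll is the low-biased one, and its expected error on future rolls is larger than that of the ensemble average.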

    We know that none of the models in the ensemble are “true”, our knowledge of the physics means they are all plausible, and we don’t have enough data to confidently rule any of them out. So we are better off keeping all of the models and having a broad spread of projections which honestly represents our uncertainty.

  130. Dikran Marsupial

    7 February 2012 at 6:12 am

    @Will, the uncertainty of the projections is of vital importance. In statistical decision theory we should evaluate the expected loss by averaging the losses for each outcome, weighted by its plausibility according to the model. You need to stop focussing on the accuracy of the models, and consider the importance of the uncertainty.
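
    As a rough sketch of that calculation: the outcomes, plausibility weights, and quadratic loss function below are hypothetical, chosen only to show that two projections with the same central value can carry different expected losses.

```python
def expected_loss(outcomes, plausibilities, loss):
    """Average the loss of each outcome, weighted by its plausibility."""
    total = sum(plausibilities)
    return sum(loss(o) * p for o, p in zip(outcomes, plausibilities)) / total

def quadratic_loss(warming):
    """Hypothetical loss that grows with the square of the warming."""
    return warming ** 2

weights = [0.25, 0.5, 0.25]

# Same central outcome (2.0), different spread (illustrative numbers only).
narrow = expected_loss([1.9, 2.0, 2.1], weights, quadratic_loss)
wide = expected_loss([1.0, 2.0, 3.0], weights, quadratic_loss)
print(f"narrow: {narrow:.3f}  wide: {wide:.3f}")
```

    Under a loss that rises faster than linearly, the wider projection has the higher expected loss even though its central estimate is identical, which is why the spread matters to the decision and not just the best guess.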

    We can only determine if the models are accurate if we know the plausible magnitude of the effects of unforced variability. If you know how to estimate that without the models we are testing, then explain how.

  131. @dikran marsupial: I don’t think I am ignoring the importance of the uncertainty. I don’t have a problem with saying that a prediction has an upper limit in terms of accuracy. I accept your position regarding upper and lower bounds. You seem determined to avoid any kind of testable metric relating to the models though, and this I find puzzling.

    I cannot abandon the concept of testing against observations, and that is what you seem to be asking me to do. To do otherwise would be an act of faith.

  132. Dikran Marsupial

    7 February 2012 at 12:06 pm

    @will there is an absolute testable metric, which is whether the models can explain the observations (in the sense that the observations lie within the spread of outcomes that the models consider plausible). There are relative metrics as well, we can see if one model predicts the observations better than another in a least-squares sense for example. However the problem is what does that actually mean?

    Nobody has abandoned the idea of testing the models against the observations. That is exactly what Ben Santer’s research group does, for example. There is a chapter in the IPCC report devoted to exactly that topic. The point is that in testing you have to understand what you can reasonably expect in terms of accuracy (e.g. Santer et al.), and what would be an unreasonable expectation (e.g. Douglass et al). The key to that is understanding what the modellers claim their models actually do and basing the test on that, rather than on something they do not claim their models do (or even worse something they would tell you they can’t do).

    I am not against testing, I am for fair testing, and open mindedness and skepticism where the statistical evidence is equivocal.

Comments are closed.

© 2014 William M. Briggs
