Does Averaging Incorrect Data Give A Result That Is Less Incorrect? Climate Modeling

Another question from the statistics mailbag:

Dear Matt: I recently got into a discussion with a CAGW “believer” and of course the discussion turned to global average temperature (whatever that is) anomalies, and to the claim that the predictions of climate catastrophe are based on computer model output. I then said, “If a computer model cannot predict regional changes, it cannot predict global changes. Averaging incorrect data does not give accurate data,” referring to the computer models. Was that a correct statement?

Although I once took statistics courses, about the only things I remember are median, mode, mean, and standard deviation so if you have time to respond to this e-mail, please do so in a ridiculously simple way that I might be able to understand.

Thanks. By the way, I like your new format.

Regards,

Chuck Lampert

Sort of yes, Chuck. The part that’s tricky is your conditional: climate models can do better at “higher” scales than at “lower” ones, so failing at the regional scale does not automatically mean failing at the global scale. But your second part is right: averaging a Messerschmidt, no matter how large the average, still leaves you with a Messerschmidt, if I may abuse the punchline of the old joke.

First, a “climate” model is just a model of the atmosphere. What makes it “climate” is its scale and nothing more; what we call “climate” versus what we label “weather” is really a matter of custom. So imagine a model of the Earth’s climate from the view of Alpha Centauri. From that vantage the Earth is indeed a pale blue dot, and its “global mean” temperature can be modeled to high accuracy, as long as we don’t try for too many decimal places. We can even ignore seasonality at this distance. Heck, I’d even believe a forecast from James Hansen for “climate” as defined this way.

But now imagine the temperature and precipitation on a scale of a city block for each hour of the day and over the entire surface. This would be incredibly complex to model and verify. Even trying to write down the computing resources required produces a dull pain in the occipital lobe. To my knowledge nobody tries this for the globe as a whole, though it is done over very small areas for limited time frames. The hope that this scale of model would be accurate or useful as a climate model matches that of a Marxist who says to himself, “Next time it’ll be different.”

Here’s the tricky part. A climate model built for large-scale climate can do well, while another built for smaller-scale climate will fare more poorly, with each verification conducted at the scale intended for each model. We can, as you suggest, average the small-scale model so that the resultant output is on the same scale as its coarser brother.
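
Mechanically, that kind of averaging is simple block-averaging. A minimal sketch, with invented grid sizes and values (a real regridding would also weight cells by area):

```python
import numpy as np

# Hypothetical fine-scale output: temperature anomalies on a
# 180 x 360 grid (one cell per degree of latitude/longitude).
rng = np.random.default_rng(42)
fine = rng.normal(loc=0.5, scale=2.0, size=(180, 360))

# Average up to an 18 x 36 grid: each coarse cell is the plain
# mean of a 10 x 10 block of fine cells.
coarse = fine.reshape(18, 10, 36, 10).mean(axis=(1, 3))

print(fine.shape, "->", coarse.shape)  # (180, 360) -> (18, 36)
```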

Now it can happen that the averaged model judged on the coarser scale will outperform itself judged on its original scale. This could be simply because the model did well on the coarse scale but poorly on the fine scale. Of course, the averaged model may also perform poorly even on the large scale. There is no way to know in advance which will be the case (it all depends on the competence of the modelers and how well the models reproduce the physics).

But, all things being equal, the variance (of the verification or of the model itself) of the averaged model will be larger than the variance of the large-scale-from-birth model. That means we would have either more trust in the large-scale model, or more trust in its verification statistics (even if those statistics showed the model to be poor), or both.
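
A toy picture of why averaging up does not buy back much trust, assuming only that errors in nearby grid cells share a common component (every number below is invented): the variance of an average of correlated errors never falls below the shared part, no matter how many cells are averaged.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_runs = 100, 100_000
rho, sigma = 0.5, 1.0  # invented cell-error correlation and scale

# Each run: every cell's error is a component shared by all cells
# plus an independent component, giving pairwise correlation rho.
shared = rng.normal(0.0, np.sqrt(rho) * sigma, size=(n_runs, 1))
indep = rng.normal(0.0, np.sqrt(1 - rho) * sigma, size=(n_runs, n_cells))
avg_error = (shared + indep).mean(axis=1)

# Var(mean) = rho*sigma^2 + (1 - rho)*sigma^2 / n_cells, about 0.505
print(avg_error.var())
print(rho * sigma**2)  # the floor averaging cannot get below: 0.5
```

If a born-coarse model’s own error variance sits below that floor, no amount of averaging the fine model will catch it.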

The old tale of the Chinese Emperor’s Nose is somewhat relevant here. Nobody in China knew its length, but they desired to have the information. Why they wanted to know is a separate question we leave to competent psychologists. Anyway, many people each contributed a guess, each knowing that his answer was probably wrong. But they figured the average of all the wrong guesses would be better than any individual guess.

Wrong. Taking the mean of nonsense, as suggested above, only produces mean nonsense. So if the small-scale model stunk at predicting small-scale climate, averaging it with itself (via ensemble forecasting, say) and then examining the averaged model on the same small (not large) scale will still leave you with a Messerschmidt.
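
The Emperor’s Nose in miniature, with every number invented: when all the guesses share a systematic error, their average converges on the error, not on the truth.

```python
import numpy as np

rng = np.random.default_rng(7)
true_length = 3.0  # inches; the crowd never sees this
shared_bias = 1.5  # systematic error common to every guess
guesses = rng.normal(true_length + shared_bias, 0.8, size=1_000_000)

# A million guesses, yet the average settles on 4.5, not 3.0.
print(guesses.mean())
```

The scatter averages away; the shared error survives untouched.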

—————————————————


11 Comments

  1. Big Mike

    Climate. In the title.

  2. Milton Hathaway

    Here’s a thought – if you run a climate model backwards in time, does it accurately ‘predict’ tree ring spacings for, say, the last 100 years?

  3. Briggs

    Big Mike,

    My enemies are out in force!

  4. A couple of observations:

    If an ensemble (perhaps a good collective noun would be a “spread”?) of models produces different results, using slightly different techniques and algorithms, then clearly only one can be “correct”, though it’s not known which one. If so, what is gained by averaging the output of the ensemble? The results from the “accurate” model (doesn’t mean it’s correct, just that it gave the right answer(s)) will be lost in the averaging.

    If forecasts are made, and a single die is thrown, the result may be any of 1 to 6, yet an average of a large number of forecasts is likely to be 3.5, an impossibility (see the quick simulation at the end of this comment).

    Whatever size of region is chosen for a regional model, the weather (short-term) or climate (long-term) in that region will be affected by, and will affect, all adjoining regions. This also applies to the adjoining regions, and so on, until the entire globe has to be taken into account. This means that “regional” models cannot produce anything like the “correct” forecasts (sorry, “projections”), and are likely to generate garbage, which would be consistent with their proven accuracy to date.

    BTW, the “couple” in the first sentence was a preliminary estimate, made using a simple mental model, and proved to be incorrect by 33.33%, as I actually made a “few” observations.
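
    The quick simulation promised above (seed and sample size arbitrary):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    rolls = rng.integers(1, 7, size=1_000_000)  # fair die: faces 1..6

    # The mean converges on 3.5, a value no single roll can ever show.
    print(rolls.mean())
    ```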

  5. Smoking Frog

    Briggs: I agree with you that the average of nonsense is nonsense, but when I was a kid, many times I did an experiment whose results might seem to tend the other way. I would make several guesses at the time of day, and average them. As I recall, the average was usually closer to the truth than any of the guesses. Of course, the guesses were not actual nonsense; some of the guesses would be based on “nothing”; the others would be based on clues taken from events of the past few hours. Maybe the moral of the story is that you have to make sure the inputs really are nonsense.
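
    A sketch of why the experiment could work, with invented numbers: guesses that scatter around the truth with no shared bias average toward the truth; give them a shared bias and averaging stops helping.

    ```python
    import numpy as np

    rng = np.random.default_rng(11)
    true_time = 14.25  # the actual time, as a decimal hour (2:15 pm)

    # Ten guesses scattered around the truth, no shared bias.
    honest = rng.normal(true_time, 0.5, size=10)
    # Ten guesses whose clues all run an hour slow: a shared bias.
    slow = rng.normal(true_time - 1.0, 0.5, size=10)

    print(abs(honest.mean() - true_time))  # small: averaging helps
    print(abs(slow.mean() - true_time))    # ~1 hour: averaging cannot fix bias
    ```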

  6. Will

    Ensemble modeling isn’t the same as random sampling. Don’t ever confuse the two lest you be struck down by a lightning bolt from the heavens.

    A linear regression of a data set, conducted six separate times, will produce the same answer each time. Averaging together the outputs will give you, surprise, the same answer as before.

    Six linear regressions, each made using a unique subset of a single data set, will likely give you six answers that, when averaged together, are less accurate than a single regression performed on the entire data set.

    See the problem? In most, though not all, cases there is a single model that will outperform all others. Averaging in the outputs from other models will just degrade the accuracy of the best available model (a toy demonstration appears at the end of this comment).

    In a world where predictive accuracy is the only measure of success it would be folly to treat models as random samples. There are excellent ways of making ensembles, but taking the mean output is not one of them.

    I think what the climate folks would say is that the ensemble provides a more accurate output “range”. Of course you could also just spit out “-999 to +999” with certainty that your answer is right.
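
    The toy demonstration promised above, with all numbers invented: the same fit repeated is just the same number, and averaging fits from disjoint subsets comes out noisier than one fit on all the data.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    true_slope, n, k, trials = 2.0, 60, 6, 5000
    se_full, se_avg = [], []

    for _ in range(trials):
        x = rng.normal(size=n)
        y = true_slope * x + rng.normal(scale=3.0, size=n)

        # One fit on all the data. Repeating this six times would give
        # the identical number six times; the fit is deterministic.
        b_full = np.polyfit(x, y, 1)[0]

        # Six fits, each on a disjoint sixth of the data, then averaged.
        b_avg = np.mean([np.polyfit(x[i::k], y[i::k], 1)[0] for i in range(k)])

        se_full.append((b_full - true_slope) ** 2)
        se_avg.append((b_avg - true_slope) ** 2)

    # Mean squared error: the averaged-subset estimate comes out worse.
    print(np.mean(se_full), np.mean(se_avg))
    ```

    The averaged coefficients are still unbiased; they are simply noisier than the single fit on all the data.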

  7. The IPCC bases its creed on scenarios developed by computer simulations. Assumptions as to parameters are many. The programs themselves are contaminated by the preset assumption that CO2 increases cause catastrophic global warming, a hypothesis that has never been proven according to the requirements of the Scientific Method.
    Computer simulations (models, if you wish) are useful tools to explore possibilities when looking for explanations of observed events. This is not what the UN’s political IPCC is doing. In any case, simulations do not provide scientific proof.
    As for the temperature case that started this discussion: Get hold of the book “Taken by Storm” by statistician/mathematician Chris Essex and environmental economist Ross McKitrick and see what they say about how mean global temperature tyrannises the climate issue. They call it T Rex.

  8. To Milton H.
    It’s been done. It’s called “hindcasting”. You take the data (such as they are) that were available in, say, 1990, put them into your model, and see whether it accurately predicted the state of affairs in, say, 2010. It doesn’t. For one thing, none of the IPCC models predicted that, based on current satellite records, there would be no warming over the last ten years.
    Your question deals with tree ring spacings. Let us first make clear that tree ring development depends on a lot more than temperature and is actually a pretty poor proxy, but it is used, together with stomata etc., faute de mieux (“for want of anything better”). Better proxies are found in stalagmites, lake-bottom sediments, and other isotope tools. But they’re still proxies, and the fiddling with data (“adjustments”, “corrections”, etc.) is well known from the internal ClimateGate e-mails of the CRU/GISS participants.

  9. j ferguson

    Messershmitt?

  10. j ferguson

    oops, I screwed it up too. Messerschmitt

  11. cb

    Not sure if I agree:
    1. As noted: just say -999 to +999, and you will be right. But since the hippies are not merely damned liars, expect more ‘believable’ values for the statistical stuff.
    2. A climate model, by definition, uses much larger time-steps than a weather model. Seen per step, and taking point 1 into account, will averaging help matters? Given the non-ideal nature of the world (the Ludic fallacy; read about that here somewhere, methinks), it seems more likely that averaging will gather errors together that will not cancel out.

    Then there is the following simple observation: as time passes, ALL the models flat-line.

    The more, er, crappy the model, the more applicable the Ludic-fallacy idea becomes, and the greater the error the ‘average’ should contain.
