William M. Briggs

Statistician to the Stars!

How Good Is That Model? Scoring Rules For Forecasts: Part III

Read Part I, Part II

Part III

This brings us to the second reason for measuring model goodness, or rather to an incorrect implementation of it. A lot of folks announce how well their model fit past data, i.e. the data used to fit the model. Indeed, classical statistical procedure, which includes hypothesis testing, is built around telling the world how well models fit data. Yet model fit is of no real interest for measuring forecast goodness.

I’ll repeat that. Model fit is of no interest for measuring forecast goodness.

It matters not to the world that your patented whiz-bang prize-winning model, built by the best experts government grants can buy, can be painted onto past data so closely that nothing is left to the imagination, because a model that fits well is not necessarily one that predicts well. Over-fitting leads to great fits but bad predictions. Incidentally, this is yet another in a long list of reasons to distrust p-values.
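To see why a close fit says little about predictive skill, here is a minimal Python sketch (NumPy assumed; the data are simulated, not real). It fits polynomials of increasing degree to 20 noisy points from a straight-line process, then scores each fit both on those 20 points and on 200 fresh points the fit never saw. The degrees, sample sizes and noise level are arbitrary choices for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def simulate(n):
    """Hypothetical process: a straight line plus Gaussian noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + rng.normal(0.0, 0.3, n)
    return x, y

def mse(y, yhat):
    """Mean squared error between observations and model output."""
    return float(np.mean((y - yhat) ** 2))

x_fit, y_fit = simulate(20)    # the data the modeler fits to
x_new, y_new = simulate(200)   # later observations, never shown to the modeler

results = {}
for degree in (1, 3, 9, 15):
    model = Polynomial.fit(x_fit, y_fit, degree)  # least-squares polynomial
    fit_err = mse(y_fit, model(x_fit))            # "skill" on the data used to build it
    new_err = mse(y_new, model(x_new))            # skill on genuinely new data
    results[degree] = (fit_err, new_err)
    print(f"degree {degree:2d}: fit MSE {fit_err:.3f}, prediction MSE {new_err:.3f}")
```

The fit error cannot increase as the degree goes up, because each richer polynomial contains the simpler ones as special cases; the prediction error on new data eventually moves the other way.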

You have to be careful. What we’re listening for are claims of model skill. What we sometimes get are announcements of skill that appear to be predictive skill but are really model-fit skill. A model, once created (it can be purely statistical, purely physical, or, as most are, somewhere in between), is used to make “predictions” of the data used to create it, and skill scores are calculated. But these are just another measure of model fit. They are not real predictive skill. We only care about predictions made on observations never made known (in any way) to the model developers.
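One way to make sure every scored prediction uses only observations unknown to the model at the time is a walk-forward (rolling-origin) evaluation: refit the model on the data up to time t, forecast t, and only then look at the observation. The Python sketch below assumes NumPy and uses a straight-line trend purely as a stand-in model; `fit` and `predict` are hypothetical hooks for whatever model is actually being judged. For contrast it also computes the in-sample errors of the same model, which is the kind of "skill" warned against above.

```python
import numpy as np

def walk_forward_errors(y, fit, predict, start):
    """One-step-ahead errors where the forecast of y[t] is built from y[:t] only.

    `fit(history) -> params` and `predict(params, t) -> forecast` are
    placeholders for whatever model is being judged.
    """
    y = np.asarray(y, dtype=float)
    errors = []
    for t in range(start, len(y)):
        params = fit(y[:t])                  # the model never sees y[t] or later
        errors.append(y[t] - predict(params, t))
    return np.array(errors)

def in_sample_errors(y, fit, predict):
    """Errors of a model fit to all of y and then asked to 'predict' that same y."""
    y = np.asarray(y, dtype=float)
    params = fit(y)
    return np.array([y[t] - predict(params, t) for t in range(len(y))])

# Illustrative stand-in model (an assumption, not any particular climate model):
# a straight-line trend in time, fit by least squares.
def fit_trend(history):
    t = np.arange(len(history))
    return np.polyfit(t, history, 1)         # [slope, intercept]

def predict_trend(params, t):
    return float(np.polyval(params, t))

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=100))          # a simulated random walk
fit_mse = np.mean(in_sample_errors(y, fit_trend, predict_trend) ** 2)
pred_mse = np.mean(walk_forward_errors(y, fit_trend, predict_trend, start=30) ** 2)
print(f"in-sample MSE {fit_mse:.2f}, walk-forward MSE {pred_mse:.2f}")
```

Only the walk-forward number says anything about predictive skill; the in-sample number is a measure of fit.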

There are claims that climate models have skill or that they have good proper scores. In the predictive sense this is false for forecasts beyond about a year (climate models a few months out do have a form of skill, noted below). What is usually meant when people claim good model performance is that the model either fit old data well or was able to reproduce features in old data. However much interest this has for model builders (and it does have some), it is of zero, absolutely zero, interest for judging how good the model is at its job.

There are two standard simple, or naive, models in meteorology and climatology: climatology (unfortunately named) and persistence. The climatology forecast is some constant, just like in the naive regression model. It’s usually the value (mean and standard deviation used to fit a normal) over some period of time, like 30 years. Obviously, a complex model that can’t beat the forecast of “It’ll be just like it was on average over the last 30 years” is of no predictive value. Persistence says the next time point will be equal to this time point (again, this time point might be fit to something like a normal, a procedure which uses more than just one time period, in order to make persistence into a probability forecast). Again, complex models which can’t beat persistence are of no predictive value.
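The two naive forecasts can be made concrete as probability forecasts. The following Python sketch is a minimal illustration, not a standard implementation: climatology is a normal fitted to the last 30 observations; persistence is a normal centered on the latest observation, with its spread taken from recent one-step changes (an assumed choice). Each one-step forecast is scored with the continuous ranked probability score (CRPS), using the known closed form for a normal forecast. The window length and function names are illustrative.

```python
import math
import statistics

def normal_crps(mu, sigma, y):
    """CRPS of a Normal(mu, sigma) forecast for observation y; lower is better.

    Closed form: sigma * [z * (2 * Phi(z) - 1) + 2 * phi(z) - 1 / sqrt(pi)].
    Requires sigma > 0.
    """
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def climatology_forecast(history, window=30):
    """Normal fitted to the last `window` observations: (mean, standard deviation)."""
    recent = history[-window:]
    return statistics.mean(recent), statistics.stdev(recent)

def persistence_forecast(history, window=30):
    """Normal centered on the latest observation.

    Spread is the root-mean-square of recent one-step changes, an assumed
    way of turning persistence into a probability forecast.
    """
    recent = history[-window:]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    spread = math.sqrt(sum(c * c for c in changes) / len(changes))
    return history[-1], spread

def mean_crps(series, forecaster, window=30):
    """Average CRPS of one-step forecasts, each made only from earlier data."""
    scores = []
    for t in range(window, len(series)):
        mu, sigma = forecaster(series[:t], window)
        scores.append(normal_crps(mu, sigma, series[t]))
    return sum(scores) / len(scores)
```

A complex model's own probability forecasts, scored with `normal_crps` over the same time points, can then be compared with `mean_crps(series, climatology_forecast)` and `mean_crps(series, persistence_forecast)`.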

Would you use a model which can’t beat just guessing the future will be like the past?

I’m not entirely sure, but I don’t think the sort of models on which the IPCC relies even have climatology predictive skill these last 20 or so years. None have persistence skill.

Again I ask: why use any model which can’t beat just guessing? The “model” of saying the future will be (more or less) like the past is beating the pants off of the highly sophisticated complex models. Why? I have no idea: well, some idea. But even if we’re right in that link, it doesn’t solve the model problems. Indeed, nobody really knows what’s wrong. If they did, they would fix it and the models would work. Since they don’t work, we know that nobody has identified the faults.
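"Beating" a naive forecast is usually expressed as a skill score, SS = 1 - S_model / S_ref, where S_model is the average of a score for which lower is better (squared error, CRPS) and S_ref is the same average for the reference forecast (climatology or persistence) over the same forecast-observation pairs. A minimal Python sketch; the function name and the numbers in the example are hypothetical:

```python
def skill_score(model_scores, reference_scores):
    """Skill of a model relative to a reference forecast.

    Both arguments hold per-forecast scores on the same forecast-observation
    pairs, for a score where lower is better (e.g. squared error or CRPS).
    Returns 1 for a perfect model, 0 for no improvement on the reference,
    and a negative number when the model does worse than the reference.
    """
    if len(model_scores) != len(reference_scores) or not model_scores:
        raise ValueError("need equal-length, non-empty score lists")
    model_mean = sum(model_scores) / len(model_scores)
    reference_mean = sum(reference_scores) / len(reference_scores)
    if reference_mean == 0:
        raise ValueError("reference is already perfect; skill is undefined")
    return 1.0 - model_mean / reference_mean

# Hypothetical squared errors for the same four forecasts.
model = [0.9, 1.3, 0.8, 1.6]
persistence = [1.0, 1.1, 0.9, 1.2]
print(skill_score(model, persistence))
```

A model with negative skill against climatology or persistence is, in this sense, doing worse than just guessing.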

The third reason to check model performance is somewhat neglected. If the model over some time period has this-and-such score, it is rational to suppose, outside of evidence to the contrary, that it will continue to perform similarly in the future. This is why, unless we hear of major breakthroughs in climate science, it is rational to continue doubting GCMs.

But past performance can also be quantified. In effect, the past scores become data for a new model of future performance. We can predict skill. Not only that, but we can take measures of things that were simultaneous with the forecast-observation pairs. These become part of the model to predict skill. Then, if we have a good idea of the value of these other things (say El Niño versus non-El Niño years), we might be able to see in advance whether the forecast is worth relying on.
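The simplest version of such a model of skill just splits past forecast-observation pairs by a condition known in advance (the El Niño versus non-El Niño example above) and computes skill separately within each group; that group's skill is then the guess for the next forecast made under the same condition. A minimal Python sketch, with a hypothetical function name and made-up numbers; regressing scores on several such covariates would be the natural next step.

```python
from collections import defaultdict
from statistics import mean

def skill_by_regime(records):
    """Past skill, split by a condition known at forecast time.

    `records` holds (regime, model_score, reference_score) tuples for past
    forecast-observation pairs, with lower scores better. Returns each
    regime's skill score 1 - mean(model) / mean(reference), the simplest
    possible prediction of skill the next time that regime occurs.
    """
    model = defaultdict(list)
    reference = defaultdict(list)
    for regime, m, r in records:
        model[regime].append(m)
        reference[regime].append(r)
    return {regime: 1.0 - mean(model[regime]) / mean(reference[regime]) for regime in model}

# Made-up numbers, purely to show the mechanics.
history = [
    ("El Niño", 0.8, 1.6), ("El Niño", 1.0, 1.4),
    ("neutral", 1.5, 1.2), ("neutral", 1.3, 1.4),
]
print(skill_by_regime(history))
```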

Those are the basics of forecast verification. There is, of course, much more to it. For instance, we haven’t yet discussed calibration. Of that, more later.

Bonus Via Bishop Hill, this.

19 Comments

  1. “There are two standard simpler or naive model is meteorology and climatology:” Should that be “in” meteorology…?

    Beyond the typo, what you’re saying in this paragraph is that there are two simple models: one is the average of observations from some previous time period and the other is a repeat of the last observation. Right?

  2. Briggs, your point that past fit is no guarantee of predictive skill is nowhere better illustrated than in the stock market.

  3. Bob: The stock market and human behaviour. How many times do we see the child that fits the model of a perfect life end up a drug addict or a murderer? Complex systems, like people, climate and stocks (people are part of the stock equation, of course) may be described well by a model based only on their past behaviour, yet totally strike out on future predictions. To me, it means we really don’t know the factors involved, and the past fit is just a bunch of retrofitting using whatever factors it takes, or creating ones if we can’t find any.

  4. Sheri:
    Complex systems, like people, climate and stocks (people are part of the stock equation, of course)
    … according to theory, people are a part of the Climate and in fact people “drive” the climate.

    Briggs:
    A lot of folks announce how well their model fit past data, i.e. the data used to fit the model
    … and then there’s the accusation that the data they’re fitting has been adjusted to fit their models!

  5. But, Sheri, all the models for future stock prices are supposed to take account of human behavior, which is why, as you point out, they fail.

  6. John: I guess I felt the notion that people are part of the climate was evident and didn’t note that. My bad.

    As for the accusations that the data they’re fitting has been adjusted, the problem seems to be that 20 years ago, before that nasty leveling off of temps while CO2 rose, everything seemed to fit well. Now, it’s either adjust the data or redo the model. My guess is data manipulation is quicker and easier. Since scientists are pure and without “sin” (or that’s the story they are telling), no one would ever suspect them except those pesky, unscientific skeptics.

  7. Perhaps I shouldn’t “assume” everyone is familiar with Warren Meyer.

    http://www.climate-skeptic.com/2010/01/catastrophe-denied-the-science-of-the-skeptics-position.html

    …and speaking of 20 years, his five year old presentation is even more relevant when he discusses plug variables, positive feedbacks and back casting (it’s relatively old but I think even more damaging to the CAGW case).

  8. Well, here’s the problem with this argument. Of all the models, and there are many, NONE fit current warming trends without the added CO2. Not a one. You’re dismissing all of them, and the aggregate! It’s just blithely dismissive.

    JMJ

  9. JMJ:

    That none fit the trend without CO2 means nothing, especially considering they don’t really fit the current trend with it (nor do they predict forward, making any hindcasting look just like a useless optimization exercise). Your statement also ignores models that hindcast well and diminish the impact of CO2: http://hockeyschtick.blogspot.co.uk/2014/04/the-time-integral-of-solar-activity.html

    The point is, we can blather on all day about hind-casting models and how well they fit seen data. It’s too easy to make that work. Since temperature has gone mostly up in the global data sets, any kind of input to a regression that goes mostly up will be able to give you a good fit. The whole point of these posts has been about skill in predicting (looking forward). Talking about what happened in the past is pretty worthless.

  10. “Well, here’s the problem with this argument. Of all the models, and there are many, NONE fit current warming trends without the added CO2.”

    This is the classic logical fallacy called the Argument From Ignorance. Paraphrased, “the cause must be X because we don’t know what else it might be.”

    The claim has been debunked countless times, but like all such claims it has acquired a zombie-like persistence, resistant to all logical argumentation.

    One such debunking among many others:

    http://jonova.s3.amazonaws.com/graphs/hadley/Hadley-global-temps-1850-2010-web.jpg

  11. Will, you’re not making a substantive argument, and with the CO2 levels the way they’re going, wouldn’t it be flat-out just stupid to ignore them? Again, just blithely ignoring…

    James, if modelling was useless, multibillion dollar corporations wouldn’t use them. Any other things you don’t like that are useless? The vote? Accountancy?

    JMJ

  12. JMJ,

    More fallacious thinking. One, I never said ALL models were useless, only that talk of hindcasting is a useless optimization exercise (and it is, why would you need a model to tell you what you already know? it’s prediction that matters!). Two, just because some models are valuable to some entities does not therefore mean all are automatically valuable. It doesn’t even mean that the ‘valuable’ models (to someone) are even correct. It just means that they have happened to work out well enough in their particular circumstances.

    Voting and accountancy aren’t even related to statistical modeling or prediction! What a silly direction to move the conversation.

  13. Jersey, you need to learn to present arguments. You’ve basically just said I’m wrong, a typical “is not!” response. In your second sentence you assume your premise, which is another logical fallacy called ‘begging the question’.

    With regard to the second paragraph not directed at me, I would observe that there are no “multibillion dollar corporations” that use climate models for any constructive purpose. Making stuff up doesn’t even make it up to the level of a logical fallacy. Climate models presently are unvalidated and have shown no predictive skill. The hope is, in the long run, they will be validated. However, I’m old school in the sense that my beliefs have to be based on evidence, not wishful thinking.

  14. Will, you are more optimistic than I. I have no expectation of a model ever being able to predict a chaotic system.

    JMJ, your so-called models do not even come close to fitting what CO2 is doing; they missed the mark badly in the ’90s (no upper atmospheric warming, none, not any!). Now, 20 years later, they are so far off the mark that there is no hope for them. If the so-called “climate scientists” revamped them and removed the positive feedback, they might come close, but you and I know the so-called “climate scientists” cannot do that, since if they did, we and they would have nothing to be alarmed about.

    After 40 years of working with computers I certainly recognize GIGO; too bad you don’t.

  15. A few years ago I had a look at the Nino 3-4 anomaly predictions from 2000 onwards until 2012 (I can’t remember exactly and am in a hurry) at this site:
    http://iri.columbia.edu/our-expertise/climate/forecasts/enso/current/
    There were 30+ models, physical and statistical. They were compared with the recorded results from:
    http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for
    The range of the anomaly is only ±2.5 degrees. I found some models had mean errors of up to +0.5 degree and error SDs approaching a degree after 6 months or so. I wrote my own model with about a yard of Python that looked to find a similar time in the past and then used what followed as a prediction. It considered Nino 1, 2, 3, 4 and 3.4 when judging similarity. I can’t swear there were no errors in my work, but I was in the top third of models at 4 months, falling to the bottom quarter further out.

  16. @Mark Luhman
    “… you more optimistic than I, I have no exception of a model ever being to predict a chaotic system.”

    The climate is not chaotic. It’s incredibly stable over centuries and millennia. Over the last 100 years temperature has only varied by about 0.5C. The climate system does, of course, have various chaotic elements. You’re not trying to measure the chaos. You’re trying to measure what’s not chaotic. Virtually all physical systems have chaotic elements. In fact, weather is chaotic yet we have some rather impressive weather models these days. I listened to the news a few weeks ago and the reporter informed me it would rain at 10pm. It actually rained in my area at 10:20pm. Not bad, eh?

    (Skepticism is healthy but skepticism based on ignorance is decidedly unhealthy.)

  17. Jersey: Maybe you need to check what “substantive argument” means.

    Should we ignore the increase in CO2? I don’t know. What if the O2 level increases? What if the nitrogen level increases? Does CO2 never rise unless something bad is about to happen? Do we have any evidence of this or is it based on tree rings, ice cores and government grants? CO2 is rising and temperature is not. I thought temperature was the problem.

    Multibillion dollar corporations do many foolish, expensive and useless things, generally if they don’t use their own money to pay for it. You know, like the government does. I would refer you to Dilbert for further instruction in multibillion dollar corporation behaviour, or maybe the Peter Principle if you prefer non-fiction to humor.

    Will: Can we have your weather guy? So far, for the past week’s forecast, they have gone from us reaching 50 to reaching 40 back to reaching 50 and then added that winter is going to return (no definitive date on that one—could be a year, I guess). Or maybe just a translation of what this means. The forecasts here change from the 5 o’clock to the 6 o’clock broadcasts and I have screen prints of 6 or 7 very different forecasts from agencies for one week’s time. Obviously, we need better weather guys.

  18. Forecasting in my region is extremely accurate. Not perfect. But extremely accurate. I remember back in the ’70s on my black & white TV listening (on Friday night) to the weekend weather forecast. It wasn’t fun being a weather man back in the ’70s as half the time they got their forecast wrong, so that sunny weekend they promised would often get rained out, ruining everyone’s weekend at the beach or picnic.

    So much for not being able to forecast chaotic systems and other such nonsense.

  19. Briggs,

    We only care about predictions made on observations never made known (in any way) to the model developers. […] The climatology forecast is some constant, just like in the naive regression model. It’s usually the value (mean and standard deviation used to fit a normal) over some period of time, like 30 years.

    If the policy, no matter what, is “wait and see” it makes no rational sense whatsoever to do the model at all. Just wait and see.

    If the model over some time period has this-and-such score, it is rational to suppose, outside of evidence to the contrary, that it will continue to perform similarly in the future. This is why, unless we hear of major breakthroughs in climate science, it is rational to continue doubting GCMs.

    After taking care to disambiguate model-fit and prediction skill, here you use the term “score” ambiguously. By your statements in the first block of quoted text, it will take not only news of a “major breakthrough” but on the order of decades of “skillful predictions” to convince you to not continue doubting GCMs.

    I’m not entirely sure, but I don’t think the sort of models on which the IPCC relies even have climatology predictive skill these last 20 or so years. None have persistence skill.

    For both the former supposition and the latter assertion, it comes down to what reference models were used and the specifics of the skill calculations. Which details are once again notably absent.
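Comment 15 describes an analog forecast: look for the time in the past most like the present and use what followed as the prediction. The commenter's own code is not shown, so the following is only a minimal Python/NumPy sketch of that general idea. Root-mean-square distance as the similarity measure, equal weighting of the variables (for instance several Niño indices as columns) and the function name are all assumptions.

```python
import numpy as np

def analog_forecast(series, window, horizon, k=1, target_col=0):
    """Analog forecast: find the k past stretches of length `window` most
    similar to the latest `window` observations and average what followed them.

    `series` has shape (time,) or (time, variables); with several variables,
    similarity is judged on all of them together. Returns `horizon`
    forecast values for column `target_col`.
    """
    data = np.asarray(series, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    n = data.shape[0]
    recent = data[n - window:]
    candidates = []
    # Each candidate must be followed by `horizon` values that are already observed.
    for start in range(n - window - horizon + 1):
        past = data[start:start + window]
        distance = float(np.sqrt(np.mean((past - recent) ** 2)))
        candidates.append((distance, start))
    if len(candidates) < k:
        raise ValueError("series too short for this window, horizon and k")
    candidates.sort()  # most similar first; ties go to the earliest start
    followers = [data[s + window:s + window + horizon, target_col] for _, s in candidates[:k]]
    return np.mean(followers, axis=0)
```

Scored by walk-forward evaluation against climatology and persistence, such a simple scheme is a useful benchmark for more elaborate models.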

© 2016 William M. Briggs
