The Answer to Senn will continue on Monday. Look for my Finger Lakes winery tour tasting notes Sunday!
Several readers asked me to comment on an ensemble climate forecasting post over at Anthony’s place, written by Robert G. Brown. Truthfully, quite truthfully, I’d rather not. I am sicker of climate statistics than I am of dice probabilities. But…
I agree with very little of Brown’s interpretation of statistics. The gentleman takes too literally the language of classical, frequentist statistics, and this leads him astray.
There is nothing wrong, statistically or practically, with using “ensemble” forecasts (averages or functions of forecasts as new forecasts). In weather forecasting they are often better than “plain” or lone-model predictions. The theory on which they are based is sound (the atmosphere is sensitive to initial conditions), and the statistics, while imperfect, are in the ballpark and not unreasonable.
Ignore technicalities and think of this. We have model A, written by a group at some Leviathan-funded university, model B, written by a different group at another ward of Leviathan, and so on with C, D, etc. through Z. Each of these is largely the same, but different in detail. They differ because there is no Consensus on what the best model should be. Each of these predicts temperature (for ease, suppose just one number). Whether any of these models faithfully represents the physics of the atmosphere is a different question and is addressed below (and not important here).
Let’s define the ensemble forecast as the average of A through Z. Since forecasts that give an idea of uncertainty are better than forecasts which don’t, our ensemble forecast will use the spread of these models as an idea of the uncertainty.
We can go further and say that our uncertainty in the future temperature will be quantified by (say) a normal distribution¹, which needs a central and a spread parameter. We’ll let the ensemble mean equal the central parameter and let the standard deviation of the ensemble equal the spread parameter.
This is an operational definition of a forecast. It is sane and comprehensible. The central parameter is not an estimate: we say it equals the ensemble mean. Same with the spread parameter: it is we who say what it is.
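For the concrete-minded, here is the whole operation in a few lines of Python (the model numbers below are invented for illustration only):

```python
import numpy as np
from scipy import stats

# Invented predictions (°C) from models A, B, C, ... for one future day
model_forecasts = np.array([14.2, 15.1, 13.8, 16.0, 14.9, 15.5])

# The operational definition: the normal's parameters ARE the ensemble
# mean and standard deviation. Nothing is estimated.
central = model_forecasts.mean()
spread = model_forecasts.std(ddof=1)
forecast = stats.norm(loc=central, scale=spread)

# For instance: the probability, under this forecast, that the
# temperature exceeds 16°C
print(forecast.sf(16.0))
```

Note there is no appeal to “samples” of models: we simply declare what the parameters equal.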
There is no “true” value of these parameters, which is why there are no estimates. Strike that: in one sense—perfection—there is a true value of the spread parameter, which is 0, and a true value of the central parameter, which is whatever (exactly) the temperature will be. But since we do not know the temperature in advance, there is no point to talking about “true” values.
Since there aren’t any “true” values (except in that degenerate sense), there are no estimates. Thus we have no interest in “independent and identically distributed models”, or in “random” or “uncorrelated samples” or any of that gobbledygook. There is no “abuse”, “horrendous” or otherwise, in the creation of this (potentially useful) forecast.
Listen: I could forecast tomorrow’s high temperature (quantify my uncertainty in its value) at Central Park with a normal with parameters 15°C (central) and 8°C (spread) every day forever. Just as you could thump your chest and say, every day from now until the Trump of Doom, the maximum will be 17°C (which is equivalent to central 17°C and spread 0°C).
Okay, so we have three forecasts in contention: the ensemble/normal, my unvarying normal, and your rigid normal. Whose is better?
I don’t know, and neither do you.
It’s likely yours stinks, given our knowledge of past high temperatures (they aren’t always 17°C). But this isn’t proof it stinks. We’d have to wait until actual temperatures came in to say so. My forecast is not likely much better. It acknowledges more uncertainty than yours, but it’s still inflexible.
The ensemble will probably be best. It might be, as is usually the case with ensemble forecasts, that it will evince a steady bias: say it’s on average hot by 2°C. And it might be that the spread of the ensemble is too narrow; that is, the forecast will not be calibrated (calibration has several dimensions, none of which I will discuss today; look up my pal Tilmann Gneiting’s paper on the subject).
Bias and too-narrow spread are common failings of ensemble forecasts, but they can be fixed in the sense that the ensembles themselves go into a process which attempts a correction based on past performance and which outputs (something like) another normal distribution with modified parameters. Don’t sniff at this: this kind of correction is applied all the time to weather forecasts (it’s called MOS, Model Output Statistics).
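To make the idea concrete (and only as a toy: real MOS fits regressions to heaps of past model output; every number below is invented), a crude correction might look like this:

```python
import numpy as np

# Invented past ensemble means, ensemble spreads, and verifying
# observations (°C) from previous forecast occasions
past_mean   = np.array([16.0, 17.5, 15.2, 18.1, 16.6])
past_spread = np.array([1.1, 0.9, 1.3, 1.0, 1.2])
past_obs    = np.array([12.5, 17.5, 11.8, 17.0, 13.5])

errors = past_mean - past_obs
bias = errors.mean()                   # here about 2.2: "hot by 2°C on average"
inflate = errors.std(ddof=1) / past_spread.mean()  # >1 here: widen the spread

# Adjusted forecast for today's (invented) ensemble
today_mean, today_spread = 17.0, 1.0
print(today_mean - bias, today_spread * inflate)
```

The output is (something like) another normal, now debiased and with a wider spread.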
Now, are the original or adjusted ensemble forecasts any good? If so, then the models are probably getting the physics right. If not, then not. We have to check: do the validation and apply some proper score to them. Only that would tell us. We cannot, in any way, say they are wrong before we do the checking. They are certainly not wrong because they are ensemble forecasts. They could only be wrong if they fail to match reality. (The forecasts Roy S. had up a week or so ago didn’t look like they did too well, but I only glanced at his picture.)
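For those who want to see what “apply some proper score” can look like, here is a sketch using the CRPS (one proper score; the closed form for a normal forecast appears in Gneiting’s work), with invented observations and invented ensembles, pitting the three contenders from above against one another:

```python
import numpy as np
from scipy import stats

def crps_normal(mu, sigma, y):
    """CRPS of a normal forecast N(mu, sigma^2) at observation y.
    A proper score: lower is better. With sigma = 0 (the rigid
    forecast) it reduces to the absolute error |y - mu|."""
    if sigma == 0:
        return abs(y - mu)
    z = (y - mu) / sigma
    return sigma * (z * (2 * stats.norm.cdf(z) - 1)
                    + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))

# Invented verifying temperatures (°C) and one invented ensemble per day
obs = np.array([13.9, 16.2, 14.8, 15.4, 12.7])
ensembles = [np.random.default_rng(i).normal(15, 2, size=6) for i in range(5)]

totals = {"ensemble": 0.0, "unvarying N(15, 8)": 0.0, "rigid 17": 0.0}
for y, ens in zip(obs, ensembles):
    totals["ensemble"] += crps_normal(ens.mean(), ens.std(ddof=1), y)
    totals["unvarying N(15, 8)"] += crps_normal(15, 8, y)
    totals["rigid 17"] += crps_normal(17, 0, y)

for name, total in totals.items():
    print(name, total / len(obs))   # mean CRPS; the smallest wins
```

Only an exercise like this, done on real forecasts and real observations, can tell you which forecast is best.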
Conclusion: ensemble forecasts are fine, even desirable since they acknowledge up front the uncertainty in the forecasts. Anything that gives a nod to chaos is a good thing.
Update Although it is true that ensemble forecasting makes sense, I do NOT claim that ensemble forecasts do well in practice for climate models. I also dispute the notion that we have to act before we are able to verify the models. That’s nuts. If that logic held, then we would have to act on any bizarre notion that took our fancy as long as we perceived it might be a big enough threat.
Come to think of it, that’s how politicians gain power.
Update I weep at the difficulty of explaining things. I’ve seen comments about this post on other sites. A few understood what I said; others—who I suspect want Brown to be right but aren’t bothering to be careful about the matter—did not. Don’t bother denying it. So many people say things like, “I don’t understand Brown, but I’m going to frame his post.” Good grief.
There are two separate matters here. Keep them that way.
ONE Do ensemble forecasts make statistical sense? Yes. Yes, they do. Of course they do. There is nothing in the world wrong with them. It does NOT matter whether the object of the forecast is chaotic, complex, physical, emotional, anything. All that gibberish about “random samples of models” or whatever is meaningless. There will be no “b****-slapping” anybody. (And don’t forget ensembles were invented to acknowledge the chaotic nature of the atmosphere, as I said above.)
Forecasts are statements of uncertainty. Since we do not know the future state of the atmosphere, it is fine to say “I am uncertain about it.” We might even attach a number to this uncertainty. Why not? I saw somebody say something like “It’s wrong to say our uncertainty is 95% because the atmosphere is chaotic.” That’s as wrong as when a rabid progressive says, “There is no truth.”
TWO Are the ensemble models used in climate forecasts any good? They don’t seem to be; not for longer-range predictions (and don’t forget that ensembles can have just one member). Some climate model forecasts—those for a few months ahead—seem to have skill, i.e. they are good. Why deny the obvious? The multi-year ones look like they’re too hot.
If that’s so, that means when a fervent climatologist says, “The probability the global temperature will increase by 1 degree C over the next five years is 95%” he is making a statement which is too sure of itself. But that he can make such a statement—that it makes statistical sense to do so—is certain.
If you don’t believe this, you’re not thinking straight. After all, do you not believe yourself that the climatologist is too certain? If so, then you are equivalently making a statement of uncertainty about the future atmosphere. Even saying, “Nobody knows” is making a statement of uncertainty.
See the footnote below this line and my replies to others in the comments.
¹I pick the normal because of its ubiquity, not its appropriateness. Also, probability is not a real physical thing but a measure of uncertainty. Thus nothing—as in no thing—is “normally distributed”. Rather we quantify our uncertainty in the value of a thing with a normal. We say, “Given for the sake of argument that uncertainty in this thing is quantified by a normal, with this and that value of the central and spread parameter, the probability the thing equals X is 0.”
Little joke there. The probability of the thing equaling any—as in any—value is always and forevermore 0 for any normal. Normal distributions are weird.
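If you doubt it, ask your computer (using the parameters from my Central Park forecast above):

```python
from scipy import stats

forecast = stats.norm(loc=15, scale=8)
print(forecast.pdf(15.0))                       # density: about 0.0499, positive
print(forecast.cdf(15.0) - forecast.cdf(15.0))  # probability of exactly 15: 0.0
```

The density is positive everywhere, but the probability of any exact value is nil; only intervals get positive probability.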