Why multiple climate model agreement is not that exciting

April 8th, 2008

There are several global climate models (GCMs) produced by many different groups. There are a half dozen from the USA, some from the UK Met Office, a well known one from Australia, and so on. GCMs are a truly global effort. These GCMs are of course referenced by the IPCC, and each version is known to the creators of the other versions.

Much is made of the fact that these various GCMs show rough agreement with each other. People have the sense that, since so many “different” GCMs agree, we should have more confidence that what they say is true. Today I will discuss why this view is false. This is not an easy subject, so we will take it slowly.

Suppose first that you and I want to predict tomorrow’s high temperature in Central Park in New York City (this example naturally works for anything we want to predict, from stock prices to the number of people who will vote for a certain US presidential candidate). I have a weather model called MMatt. I run this model on my computer and it predicts 66 degrees F. I then give you this model so that you can run it on your computer, but you are vain and rename the model to MMe. You make the change, run the model, and announce that MMe predicts 66 degrees F.

Are we now more confident that tomorrow’s high temperature will be 66 because two different models predicted that number?

Obviously not.

The reason is that changing the name does not change the model. Simply running the model twice, or a dozen, or a hundred times, gives us no more evidence than running it once. We reach the same conclusion if instead of predicting tomorrow’s high temperature, we use GCMs to predict next year’s global mean temperature: no matter how many times we run the model, or how many different places in the world we run it, we are no more confident of the final prediction than if we only ran the model once.

So Point One of why multiple GCMs agreeing is not that exciting is that if all the different GCMs are really the same model but each just has a different name, then we have not gained new information by running the models many times. And we might suspect that if somebody keeps telling us that “all the models agree” to imply there is greater certainty, he either might not understand this simple point or he has ulterior motives.

Are all the many GCMs touted by the IPCC the same except for name? No. Since they are not, we might hope to gain much new information from examining all of them. Unfortunately, they are not, and cannot be, that different either. We cannot here go into the details of each component of each model (books are written on these subjects), but we can draw some broad conclusions.

The atmosphere, like the ocean, is a fluid and it flows like one. The fundamental equations of motion that govern this flow are known. They cannot differ from model to model; or to state this positively, they will be the same in each model. On paper, anyway, because those equations have to be approximated in a computer, and there is neither universal agreement on, nor a proof of, the best way to do this. So the manner in which each GCM implements this approximation might be different, and these differences might cause the outputs to differ (though this is not guaranteed).

The equations describing the physics of a photon of sunlight interacting with our atmosphere are also known, but these interactions happen on a scale too small to model, so the effects of sunlight must be parameterized, which is a semi-statistical semi-physical guess of how the small scale effects accumulate to the large scale used in GCMs. Parameterization schemes can differ from model to model and these differences almost certainly will cause the outputs to differ.

And so on for the other components of the models. Already, then, it begins to look like there might be a lot of different information available from the many GCMs, so we would be right to make something of the cases where these models agree. Not quite.

The groups that build the GCMs do not work independently of one another (nor should they). They read and write for the same journals, attend the same conferences, and are familiar with each other’s work. In fact, many of the components used in the different GCMs are the same, even exactly the same, in more than one model. The same person or persons may be responsible, through some line of research, for a particular parameterization used in all the models. Computer code is shared. Thus, while there are some reasons for differing output (and we haven’t covered all of them yet), there are many more reasons that the output should agree.

Results from different GCMs are thus not independent, so our enthusiasm generated because they all roughly agree should at least be tempered, until we understand how dependent the models are.
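To put a rough number on how much dependence should temper our enthusiasm, here is a minimal sketch. It assumes, purely for illustration, that every pair of models has the same error correlation `rho` (an equicorrelation assumption that real GCMs need not satisfy) and the same error variance; the values of `rho` are invented, not estimates. Under those assumptions the variance of the average of n model errors is sigma^2 (1 + (n - 1) rho) / n, so n correlated models are worth only n / (1 + (n - 1) rho) independent ones.

```python
def effective_model_count(n_models: int, rho: float) -> float:
    """Number of independent models that carry the same information as
    n_models models whose errors all share pairwise correlation rho.

    Assumes equal error variances and a single common correlation
    (hypothetical simplifications, not properties of real GCMs).
    """
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1 in this sketch")
    return n_models / (1.0 + (n_models - 1) * rho)


if __name__ == "__main__":
    # Illustrative correlations only.
    for rho in (0.0, 0.2, 0.5, 0.9, 1.0):
        print(f"rho = {rho:.1f}: 20 models count as "
              f"{effective_model_count(20, rho):.2f} independent ones")
```

With `rho = 0` the 20 models count as 20; with `rho = 1` (the renamed MMatt case) they count as exactly one; with `rho = 0.5` they count as fewer than two.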

This next part is tricky, so stay with me. The models differ in more ways than just the physical representations previously noted. They also differ in strictly computational ways and through different hypotheses of how, for example, CO2 should be treated. Some models use a coarse grid point representation of the earth and others use a finer grid: the first method generally attempts to do better with the physics but sacrifices resolution, the second method attempts to provide a finer look at the world, while typically sacrificing accuracy in other parts of the model. While the positive feedback in temperature caused by increasing CO2 is the same in spirit for all models, the exact way it is implemented in each can differ.

Now, each climate model, as a result of the many approximations that must be made, has, if you like, hundreds (even thousands) of knobs that can be dialed to and fro. Each twist of the dial produces a difference in the output. Tweaking these dials, then, is a necessary part of the model building process. The models are tuned so that they, as closely as possible, first are able to produce climate that looks like the past, already observed, climate. Much time is spent tuning and tweaking the models so that they can, at least roughly, reproduce past climate. Thus, the fact that all the GCMs can roughly represent the past climate is again not as interesting as it first seemed. They better had, or nobody would seriously consider the model as a contender.

Reproducing past data is a necessary but not sufficient condition that the models can predict future data. Thus, it is also not at all clear how these tweakings affect the accuracy in predicting new data, which is data that was not used in any way to build the models, that is, future data. Predicting future data has several components.
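A toy illustration of why reproducing the past is necessary but not sufficient. The numbers below are invented, not climate data, and neither function is a GCM. `fit_line` has two knobs; `fit_interpolant` (a Lagrange interpolating polynomial, chosen here only because it has one knob per past observation) can be tuned to reproduce the past exactly.

```python
# Invented observations for illustration only.
past_t = [0, 1, 2, 3, 4]
past_y = [0.0, 1.2, 1.8, 3.3, 3.9]
future_t = [5, 6]
future_y = [5.1, 5.9]


def fit_line(ts, ys):
    """Two-knob model: ordinary least-squares straight line."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
             / sum((t - t_bar) ** 2 for t in ts))
    intercept = y_bar - slope * t_bar
    return lambda t: intercept + slope * t


def fit_interpolant(ts, ys):
    """Many-knob model: polynomial passing through every past point."""
    def predict(t):
        total = 0.0
        for i, (ti, yi) in enumerate(zip(ts, ys)):
            weight = 1.0
            for j, tj in enumerate(ts):
                if j != i:
                    weight *= (t - tj) / (ti - tj)
            total += yi * weight
        return total
    return predict


def mean_abs_error(model, ts, ys):
    """Average absolute difference between model output and observations."""
    return sum(abs(model(t) - y) for t, y in zip(ts, ys)) / len(ts)


line = fit_line(past_t, past_y)
interpolant = fit_interpolant(past_t, past_y)

if __name__ == "__main__":
    for name, model in (("line", line), ("interpolant", interpolant)):
        print(name,
              "past error:", round(mean_abs_error(model, past_t, past_y), 3),
              "future error:", round(mean_abs_error(model, future_t, future_y), 3))
```

On these made-up numbers the five-knob model reproduces the past with zero error yet predicts about -1.5 for t = 5, where the observation is 5.1. The two-knob line misses the past by about 0.17 on average but lands within about 0.1 of both future values.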

It might be that one of the models, say GCM1, is the best of the bunch in the sense that it most closely matches future data. If this is always the case, if GCM1 is always closest (using some proper measure of skill), then it means that the other models are not as good, they are wrong in some way, and thus they should be ignored when making predictions. The fact that they come close to GCM1 should not give us more reason to believe the predictions made by GCM1. The other models are not providing new information in this case. This argument, which is admittedly subtle, also holds if a certain group of GCMs are always better than the remainder of models. Only the close group can be considered independent evidence.

Even if you don’t follow—or believe—that argument, there is also the problem of how to quantify the certainty of the GCM predictions. I often see pictures like this:
[Figure: GCM predictions]
Each horizontal line represents the output of a GCM, say predicting next year’s average global temperature. It is often thought that the spread of the outputs can be used to describe a probability distribution over the possible future temperatures. The probability distribution is the black curve drawn over the predictions, and neatly captures the range of possibilities. This particular picture looks to say that there is about a 90% chance that the temperature will be between 10 and 14 degrees. It is at this point that people fool themselves, probably because the uncertainty in the forecast has become prettily quantified by some sophisticated statistical routines. But the probability estimate is just plain wrong.

How do I know this? Suppose that each of the eight GCMs predicted that the temperature will be 12 degrees. Would we then say, would anybody say, that we are now 100% certain in the prediction?

Again, obviously not. Nobody would believe that if all GCMs agreed exactly (or nearly so) that we would be 100% certain of the outcome. Why? Because everybody knows that these models are not perfect.
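The reductio can be sketched in a few lines. The sketch assumes the black curve is a normal distribution fitted to the model outputs (the picture does not say how it was drawn), and both sets of eight outputs are invented for illustration.

```python
from statistics import NormalDist, mean, stdev


def spread_interval(outputs, level=0.90):
    """Central interval from a normal curve fitted to the model outputs.

    This treats the spread between models as if it were the whole
    forecast uncertainty, which is the practice being questioned.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    centre = mean(outputs)
    half_width = z * stdev(outputs)
    return centre - half_width, centre + half_width


# Hypothetical outputs from eight models, in degrees.
varied = [10.4, 11.0, 11.5, 11.9, 12.1, 12.6, 13.1, 13.4]
identical = [12.0] * 8

if __name__ == "__main__":
    print("varied outputs:   ", spread_interval(varied))
    print("identical outputs:", spread_interval(identical))
```

The varied outputs give an interval of roughly 10.3 to 13.7 degrees. The identical outputs give an interval of zero width at 12 degrees, a claim of complete certainty that nobody would accept.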

The exact same situation was met by meteorologists when they tried this trick with weather forecasts (this is called ensemble forecasting). They found two things. First, the probability forecasts made by this averaging process were far too sure: the probabilities, like our black curve, were too tight and had to be made much wider. Second, the averages were usually biased, meaning that the individual forecasts should all be shifted upwards or downwards by some amount.
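A crude sketch of those two corrections follows. It is not the procedure meteorologists actually use (that is not described here); it simply estimates the bias and the error spread from a hypothetical record of past ensemble-mean forecasts and the observations that followed, then shifts and widens the interval accordingly. The function name `recalibrate` and all numbers are invented.

```python
from statistics import NormalDist, mean, stdev


def recalibrate(ensemble, past_forecasts, past_observations, level=0.90):
    """Shift an ensemble forecast by its historical bias and widen it to
    at least the historical error spread.

    past_forecasts and past_observations are paired lists: what the
    ensemble mean said, and what was then observed.
    """
    errors = [f - o for f, o in zip(past_forecasts, past_observations)]
    bias = mean(errors)          # positive means forecasts ran too warm
    error_sd = stdev(errors)     # how much forecasts scatter around truth
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(ensemble) - bias
    # Never claim more certainty than the track record supports.
    spread = max(stdev(ensemble), error_sd)
    return centre - z * spread, centre + z * spread


if __name__ == "__main__":
    ensemble = [12.0] * 8                    # all models agree exactly
    past_forecasts = [11.0, 12.0, 13.0, 12.0]
    past_observations = [10.0, 11.5, 12.0, 11.0]
    print(recalibrate(ensemble, past_forecasts, past_observations))
```

Even when every model says 12 degrees, the corrected interval is centred lower (here at 11.125, because the invented forecasts ran 0.875 too warm on average) and has non-zero width.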

This should also be true for GCMs, but the fact has not yet been widely recognized. The amount of certainty we have in future predictions should be less, but we also have to consider the bias. Right now, all GCMs are predicting warmer temperatures than are actually occurring. That means the GCMs are wrong, or biased, or both. The GCM forecasts should be shifted lower, and our certainty in their predictions should be decreased.

All of this implies that we should take the agreement of GCMs far less seriously than is often supposed. And if anything, the fact that the GCMs routinely over-predict is positive evidence of something: that some of the suppositions of the models are wrong.

Entry Filed under: General Statistics, Bad Statistics, Global warming

116 Comments

  • 1. lucia  |  April 8th, 2008 at 9:33 am

    Each horizontal line represents the output of a GCM, say predicting next year’s average global temperature.
    Vertical?

    The same person or persons may be responsible, through some line of research, for a particular parameterization used in all the models.
    Yes. This happens for many reasons. One is that when a model requires many parameterizations, each must be developed by specialists. One or two seem to work best in the limited circumstances in which they are tested. (The reason testing is limited is that it costs too much to test under any and all circumstances.) These begin to be adopted, and if you are writing a full model containing many parameterizations, in most fields, it ends up being easier to publish your paper if you stick with the parameterizations that are most commonly used: this minimizes the likelihood a reviewer will turn down your work because of the choice of parameterizations.

    If it is the great parameterization, that’s great. But if it’s not, then many models end up biased in the same way.

    In the end, the proof that the choice of parameterizations resulted in predictive ability is to test against data collected after projections were made. Simpler tests are better than more complicated ones, and right now, the predictions are on the high side.

  • 2. Gavin  |  April 8th, 2008 at 10:21 am

    Your examples are mostly fine, but your conclusion does not follow. Firstly, the models *are* different - some have similar pedigrees but in no case is the same model being run under a different name - that is just a strawman argument. You are however correct in stating that they are not completely independent and so cannot be assumed ‘a priori’ to be independent draws from some underlying distribution of all possible models.

    The models are based on similar underlying assumptions (conservation of energy, momentum, radiative transfer etc.) but which are implemented independently and with different approximations. If you ask the question, what are the consequences of the underlying assumptions that are independent of the implementation, you naturally look for metrics where the models agree. Those metrics can be taken as being reflective of the underlying physics that everyone agrees on. This is clearly not sufficient to prove it ‘true’ in any sense (there may be shared erroneous assumptions), but it clearly must be necessary.

    Think about the converse of your claim - i.e. that disagreement among models makes their results more believable. That is clearly absurd. Therefore, agreement between models should increase the believability of any result. Other reasons to think any result more credible would be clear theoretical and observational support for such an effect and a match between the various predicted amplitudes of any signal.

    You also make errors in assuming that a) ‘hundreds’ of parameters are tweaked to improve the model performances. This is nothing like the case (at most people play with maybe half a dozen); b) that there is only one metric that is useful and therefore only one model that is best. Unfortunately this is woefully simplistic. There are hundreds of interesting metrics, and no one model is the best at all of them. Instead most models are in the top 5 for some and in the bottom 5 for others. You could maybe discard 3 or 4 of them on such an analysis, but not enough to make any difference. The fact that the average of all models is a better match to the observations than any single model, implies that there is some unbiased and random component to their errors.

    Finally, you make a fundamental error in your treatment of model-data discrepancies. Any such comparison is based on multiple assumptions - that the data are valid and represent what they say they do, that the hypothesis being tested is appropriate (i.e. what is the driver of any changes in the model), that there really is a signal that can be extracted from the noise that is not part of the hypothesis and that the model is a valid representation of the real world. Thus a mismatch must perforce drive an examination of all these aspects and prior to that you cannot simply say the models are incorrect. Indeed, climate science is littered with past controversies where it turned out the data were the problem and not the models (CLIMAP, MSU etc.), or where the noise in short period data precluded a significant identification of a signal.

    In the particular case you appear to be alluding to, none of the blogosphere analyses have even looked at the real distribution of the model output on such short time periods and so their claims of dramatic mismatches (based IMO on an underestimation of the impact of intrinsic variability) are very likely to be wrong. This data is available from PCMDI and I would encourage you and others to download it and do this properly. You will find that for such short periods, the models give very varied results depending on (amongst other things) the phase of their tropical oscillations.

    You recently visited a modelling center where I’m sure any of us would have been happy to discuss the philosophy of modelling and how it really works - it’s a shame you did not avail yourself of that opportunity. You are of course welcome to return at any time.

  • 3. Bernie  |  April 8th, 2008 at 10:29 am

    I guess I don’t find the above particularly trenchant because anyone who has been around a race track or tried to predict the stock market has run into the same issues on a much simpler scale - but IMHO it is the same issue. It might be more informative to think about models of any physical process that were strongly relied upon for forecasting and were replaced by new models that were far more successful at forecasting. The question then becomes one of the nature of the changes to the model that result in dramatic improvements in their ability to forecast. I suspect, but clearly don’t know, that the differences in the two sets of models are dramatic - capturing paradigm shifts rather than greater computational precision or knob twiddling.

  • 4. Briggs  |  April 8th, 2008 at 10:56 am

    Gavin,

    Thank you very much for your comments. You make some excellent points, including, though I’m not sure you would say this, agreeing with me on most details; though certainly not in philosophy.

    We agree the models are different. I see that I did say so in the original post; the weather model example was to focus attention on the problem, and I hope we can agree that is a useful example in this sense. I say the GCMs are not very different. I think you are saying they are very different. I say they are similar enough so that they are alike in at least the sense that ensemble weather forecasts are. I believe that last example I gave is relevant, by which I mean that “averaging” the output of them and doing nothing else makes you more certain than you should be.

    I am confused where you say, in your second paragraph, “This is clearly not sufficient to prove it ‘true’ in any sense.” I am puzzled why you have the word true in quotes, for one, and I admit to not being able to follow the rest of the argument. Would you care to re-explain it?

    We disagree about the number of tweaks that can be made. I do claim hundreds. I will concede that not all tweaks are equally important. Every numerical approximation scheme, of which there are many, involves parameters or a choice of one routine over another somewhere in the code, and these are an example of the very small knobs that can be twisted. But that is neither here nor there, and I don’t think we really disagree a lot about this in the sense that we both believe that tweaking parameters can cause changes in the output. And that models are certainly tuned to fit past data. That is all I really want to show here.

    I do not, however, in any way, claim the converse of my original argument. That is, “the more the models disagree, the more we should believe them.” I’m with you 100% here: that is a silly statement. I think unlike you, and I might be wrong about what you are saying, I do believe that there is one correct model, which would be the one that always predicts the observations perfectly. Of course, we do not know what this model is. But I stand by the philosophical argument I made about “GCM1” being the best therefore the rest are not independent etc.

    You’re quite right that I presented a very simplistic example of model output. The actual output of GCMs is hugely multi-dimensional, and one model can be good in one dimension and bad in another, in just the same places where a second model is bad then good. However, while this has importance, it is not a convincing argument that we should ignore the “one” output which even the IPCC routinely highlights, and is the one I used in my examples.

    Now, you say that “the fact that the average of all models is a better match to the observations than any single model, implies that there is some unbiased and random component to their errors.” Hmmm. It is false, actually, to claim that because the average does better that there is “some unbiased” component to the errors. Each model may be biased, but the average can still be better. I hope you will trust my mathematics here, but if not, we can probably work up an example. It is true that there are “random” components to the error, but only if you believe, as I do, that “random” means “unknown.” So if you look at the output, for example, you might not be able to guess why the model went wrong (in a particular place and dimension).

    Do you mind if I hold off on your claim where you (in paragraph 5) say that I “make a fundamental error in your treatment of model-data discrepancies”? This is, I do agree, a very tricky point to understand, and I think if we went over it in the comments it would get lost. I do promise to answer, however. This is our biggest point of disagreement, and represents a truly fundamental difference of philosophy. To give you a highlight: if a model predicts 7 and the observation is 8, then I say the model is wrong. I mean, in plain English, it is wrong. But I think I understand your (classical) argument, and I want a fuller discussion of it.

    I am appreciative of your generous offer to teach me how modeling really works and so forth. I might just take you up on your offer of another visit.

    Briggs

  • 5. Andrew  |  April 8th, 2008 at 12:09 pm

    This brings me to one of the major problems I have with the models. They have different input values of very basic variables, like climate sensitivity, yet they can all be made to fit the observed changes in surface temperature. How can this be? The reason is pretty obvious, actually, that the models all did so, not because they are all correct (which is impossible) but because they were ~made~ to. Every modeler knew the answer ahead of time. They use “aerosols” and “ocean delay” as highly “adjustable” fudge factors. Natural forcings are also unknown, and can be “adjusted”. The models can match history, not because they are good models (they aren’t) but because they have been ~made~ to do so. On the other hand, if you test the models with measurements other than those they were adjusted to fit, they almost invariably fail miserably, every one of them, to match what we see there. For example:
    http://icecap.us/images/uploads/DOUGLASPAPER.pdf
    If every model agrees, it probably is because they are all doing the same thing wrong.

  • 6. Andrew  |  April 8th, 2008 at 12:41 pm

    BTW Gavin, I find your statement that measurements are frequently wrong rather than the models disturbing. Measurements can be wrong, but the reality is that no good scientist assumes that theory is right and measurements wrong. If we go down that road, we get epicycles and aether. MSU turned out to have problems, but contrary to the constant assertions of the thugs at RC, they still haven’t been brought in line with models. David Douglass (one of the authors of the above paper) and I have the same opinion of all of this: we were taught that if your theory does not match the observations, your theory is wrong. Did we get bad educations?

  • 7. Gavin  |  April 8th, 2008 at 1:01 pm

    i) does model agreement imply ‘truth’? Truth is in quotes because neither you nor I can ever define the true state of the climate (or any observed feature within it), and so every statement is about an approximation to the real state of the world. Specifically, take the situation of stratospheric chemistry in the early 1980s. Most of the chemistry involved in ozone depletion was known and all these models agreed that the decline in strat. ozone would be smooth but slow (in the absence of CFC mitigation). They were all wrong. The decline in strat ozone in the Antarctic polar vortex was fast and dramatic. The missing piece was the presence of specific reactions on the polar stratospheric clouds that enhanced by orders of magnitude the processing of the chlorine. Thus the model agreement did demonstrate the (correct) implications of the (known) underlying chemistry, but obviously did not get the outcome right because the key reactions didn’t turn out to be included by any model. Hence agreement, while necessary, is not sufficient.

    ii) models are not tweaked to reproduce all past data. They are tweaked to fit modern climatology and some intrinsic variability (like ENSO). That tweaking does not involve hundreds of parameters (consider how long it would take to do even if someone thought it useful). Read my description of the model development process (http://pubs.giss.nasa.gov/abstracts/2006/Schmidt_etal_1.html ) and see what is actually done. Specifically the models are not tweaked to produce the ‘right’ sensitivity (even if we knew what that was). That emerges from everything else. If we really played with hundreds of parameters don’t you think we’d do a better job?

    iii) IPCC has hundreds of graphs showing different model metrics, and rightly so. Most of the important impacts are in some way tied to the global mean temperature change, and so that is used as a useful shorthand. But don’t confuse iconisation of specific graphs with a real statement about importance. Would you pick the one model that has the best annual mean temperature, or the best seasonal cycle or the best interannual variability, in Europe? in N. America? in Africa? I guarantee no one model is ‘best’ on all those metrics.

    iv) It is not a priori obvious that the mean of multiple models should outperform the best of any individual one. This remains an unexplained but interesting result. The upshot is that you can treat the model ensemble like a random sample to reduce errors. Of course all of the models are biased (as is the mean, but less so) and if I suggested anything different, I apologise.

    v) if the model predicts 7 and the observations say 8, you cannot say anything about the usefulness of the model without a) an understanding of the uncertainties in the model predictions (as a function of the underlying physics, their implementation and the hypothesised driver(s)), b) the uncertainties in the observations, and c) what a naive model would have predicted (so that you can judge whether your model had any skill). For instance, if the model predictions were for 7+/-1 and the obs were 8+/-1 and my naive model (say no change) implied 0, I’d be pretty happy with the prediction. If the uncertainties were an order of magnitude smaller, then there would be a clear discrepancy, but the model would still be skillful compared to the naive prediction. Remember George Box’s admonition - all models are wrong, but some are useful.
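    The arithmetic in point v) can be sketched as follows. Two choices here are assumptions not stated in the comment: the model and observational uncertainties are treated as independent and combined in quadrature, and skill is measured with a squared-error skill score relative to the naive prediction. The function names are hypothetical.

```python
import math

def discrepancy_in_sigma(pred, pred_unc, obs, obs_unc):
    """Model-observation gap in units of the combined uncertainty.

    Assumes independent uncertainties added in quadrature.
    """
    combined = math.hypot(pred_unc, obs_unc)
    return abs(pred - obs) / combined

def skill_score(pred, naive, obs):
    """1 minus the ratio of squared errors; above 0 means better than naive."""
    return 1.0 - (pred - obs) ** 2 / (naive - obs) ** 2

pred, obs, naive = 7.0, 8.0, 0.0

# Case 1: uncertainties of +/-1 on both the model and the observation.
wide = discrepancy_in_sigma(pred, 1.0, obs, 1.0)

# Case 2: uncertainties an order of magnitude smaller.
tight = discrepancy_in_sigma(pred, 0.1, obs, 0.1)

# Skill relative to the naive "no change" prediction is the same in both cases.
skill = skill_score(pred, naive, obs)

print("gap with +/-1 uncertainties:  ", round(wide, 2), "sigma")
print("gap with +/-0.1 uncertainties:", round(tight, 2), "sigma")
print("skill vs naive:", skill)
```

    Under these assumptions the gap is well inside the combined uncertainty in the first case and far outside it in the second, while the skill score against the naive prediction is unchanged.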

  • 8. Gavin  |  April 8th, 2008 at 1:10 pm

    Andrew, if you want a serious discussion, don’t start by calling me a thug. The Douglass et al paper is fundamentally flawed as has been pointed out many times (maybe Matt can go through the argument for you - it relates to the uncertainty in the estimate of the mean being compared to any draw from the same distribution). The statement that sometimes the models have been right and the data wrong is factually accurate. Why then must we assume that any mismatch is because the theory is wrong? Instead, I suggest you should maintain an open mind and examine every point at which there might be an error - that includes the models, but it also includes the data and the hypothesis. Anything else is just dumb.
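    The parenthetical above appears to refer to the difference between the standard error of a mean and the spread of individual draws. A minimal sketch of that general statistical point, using made-up values in arbitrary units (not the trends from the paper): most members of a sample fall outside the band given by the sample mean plus or minus two standard errors, even though every one of them was drawn from that same sample.

```python
import math
import statistics

# Hypothetical ensemble of nine values, arbitrary units.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

mean = statistics.mean(values)
sd = statistics.stdev(values)            # spread of individual draws
se = sd / math.sqrt(len(values))         # uncertainty in the estimate of the mean

# Count ensemble members that fall outside each two-sided band.
outside_se = sum(1 for v in values if abs(v - mean) > 2 * se)
outside_sd = sum(1 for v in values if abs(v - mean) > 2 * sd)

print("mean", mean, "sd", round(sd, 3), "se", round(se, 3))
print("members outside mean +/- 2*se:", outside_se, "of", len(values))
print("members outside mean +/- 2*sd:", outside_sd, "of", len(values))
```

    The standard error shrinks as more members are added, so a test of a single draw against that band becomes stricter with ensemble size regardless of how well the draw fits the distribution.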

  • 9. Bernie  |  April 8th, 2008 at 1:31 pm

    Andrew:
    Gavin is essentially correct. At any instance, the measurement can be accurate or inaccurate and the model can be accurate or inaccurate. The trick is figuring out which cell you are in at any particular point in time and how to get to the sweet spot, i.e., accurate data and accurate model. One of Matt’s points is that just because you appear to be in a sweet spot at one point in time does not mean that you actually are, viz., Gavin’s Ozone example.

    This is a very courteous site and I for one would like to keep it that way.

  • 10. Bob B  |  April 8th, 2008 at 2:51 pm

    The problem I have with models is that they are only “Models” of reality. The problem I have with climate models and modelers is that they refuse to declare or define testable cases that would invalidate their pet models.

  • 11. Bob B  |  April 8th, 2008 at 3:40 pm

    Gavin will you take-on this challenge from Roger Pielke?

    The test of the dynamical core fits into these evaluations and assessment of the global climate models as prediction tools. As a necessary condition, when configured to run in a multi-decadal predictive mode they should still be used to make short-term global weather predictions in order to assess their skill at simulating the development and movement of major high and low pressure systems, including tropical cyclones. Moreover, they should be run as seasonal weather predictions using inserted sea surface temperatures at the initial time in order to see if they can skillfully predict the development of El Nino and La Nina events, as well as other circulation patterns such as the North Atlantic Oscillation. If they cannot accurately predict these short term and seasonal weather patterns, they should not be believed as valid and societally useful prediction tools on the regional (and even the global average) scale decades into the future.

  • 12. Wade  |  April 8th, 2008 at 4:43 pm

    I would normally not jump into this arena, but I think Bob is expecting too much out of these models…

    I like to think that GCMs compare to climate much as Newtonian mechanics compares to quantum mechanics and relativistic motion.

    Sure, you can model a lot of interactions that humans can see using GCMs and Newtonian mechanics, but for extreme events in either direction the model will never really be “true”.

    In the spirit of education, I would love to find info on any GCMs that “project” the next glaciation. I haven’t had time to research that many models, or maybe the MSM just doesn’t report about them, but are there any places I can find info on models that predict climate that follows the ice core data?

  • 13. Craig  |  April 8th, 2008 at 4:52 pm

    First, I would like to point out that I also fundamentally disagree with the statement “if a model predicts 7 and the observation is 8, then I say the model is wrong. I mean, in plain English, it is wrong.” I think the choice is not binary and that Gavin is correct in that as long as we understand that the model is imperfect, we can still learn useful information from it. Also, as I pointed out in the previous post, models are necessarily approximations, and judging the usefulness of an approximation is often an exercise in judgment and intuition - unfortunately these are not rigorously quantitative, but that is how science actually works. Briggs, you have promised a fuller discussion of this which I eagerly await.

    As far as the model parameter tweaking is concerned, I agree that in a literal sense there are probably hundreds of approximations and parameterizations used in a climate model (although, as previously stated, I have no direct experience in this particular field). However, the point is that many of these are not tuned at all. I would argue that, since they are not specifically tuned and since many different codes make slightly different approximations, the fact that the models yield similar results means that these are probably adequate assumptions and not terribly pertinent to the models’ output. Obviously, this is not a rigorously true statement (the fact that someone made a good guess at a parameter does not imply that the parameter is unimportant), but it seems to me to be the only way to proceed if we want to even attempt to make such models. I suppose that’s an argument someone could make - that we have no business making models in the first place - but I would strongly disagree.

  • 14. Bob B  |  April 8th, 2008 at 5:04 pm

    Wade, then I would submit with your logic that the models are therefore unsuitable for any policy decisions, such as the actions suggested by the Kyoto agreement.

  • 15. Dan Hughes  |  April 8th, 2008 at 6:07 pm

    The numbers generated by GCM calculations cannot be shown to converge, where “converge” is used in the numerical-methods sense: as the sizes of the discrete temporal and spatial increments are refined, the numbers for all dependent variables uniformly approach constant values at all spatial and temporal increments. That is, the solutions of the discrete approximations (or series expansions) approach the solutions of the continuous equations.

    Given this situation, the numbers generated by the GCMs are simply numbers and nothing more. Numbers that are unrelated to solutions of the continuous equations of the models. These approximate ‘solutions’ to the discrete equations then become the model. The continuous equations are not the model, no matter how many times ‘conservation of mass, momentum, and energy’ is repeated. If the numbers generated by the discrete approximations cannot satisfy the continuous equations, mass, momentum, and energy of the physical system are not calculated, much less conserved. Even given that the continuous equations exactly describe the physical system. We’re all in agreement that they do not. The model continuous equations do not describe conservation principles of the physical system; they are a model of the physical system.

    The real-world-application order of the discrete approximations is not known. Given that deterioration in some performance metrics is observed as the sizes of the discrete increments are refined, I suspect the order is actually less than one. Algebraic parameterizations, especially those that are functions of the independent variables, can lead to the observed behavior.
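    For readers unfamiliar with the term, the observed order of a discrete approximation can be estimated by refining the step size and watching how fast the error falls. The sketch below does this for a toy problem, forward Euler applied to dy/dt = -y, which is an assumption chosen purely for illustration and has nothing to do with any GCM code. Halving the step should roughly halve the error for a first-order method.

```python
import math

def euler_decay(n_steps, t_end=1.0):
    """Forward Euler for dy/dt = -y with y(0) = 1, integrated to t_end."""
    h = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += h * (-y)
    return y

exact = math.exp(-1.0)

# Refine the time step by factors of two.
step_counts = [100, 200, 400]
errors = [abs(euler_decay(n) - exact) for n in step_counts]

# Observed order between successive refinements: log2(error_coarse / error_fine).
observed_orders = [
    math.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)
]

for n, e in zip(step_counts, errors):
    print("steps", n, "error", e)
print("observed orders:", observed_orders)
```

    An observed order below one, or errors that grow under refinement, is the kind of behavior the comment above says it suspects; this toy problem shows only what the well-behaved case looks like.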

  • 16. Gavin  |  April 8th, 2008 at 6:34 pm

    E pur funzionano…

  • 17. Dan Hughes  |  April 8th, 2008 at 7:05 pm

    Solution meta-functionals can be calculated by any number of incorrect models and methods ‘that work’. That does measure anything at all relative to Validation and Qualification for the intended applications.

    I would hope that models/codes the results from which might impact the health and safety of people all over the planet are based on science much more fundamental and sound than “E pur funzionano”

  • 18. Andrew  |  April 8th, 2008 at 8:46 pm

    Gavin, have you fully considered the implications of considering those error estimates? No? Think hard on it, then you will see the problem.

    An anonymous post at RC doesn’t count as a real rebuttal. I believe Christy remarked that it must have been written by someone of “significant inexperience”.

    Bernie, it is okay to consider the possibility that both are wrong, but Gavin, whether he admits it or not, certainly hasn’t and won’t consider the possibility of the models being wrong. This has been going on for quite some time. The search for measurement errors can go on till the cows come home, but sooner or later you’ll have to recognize how improbable it is that the measurements, rather than the models, are wrong.

    BTW, I’m not trying to be rude, and I too would like to maintain polite discourse.

  • 19. Andrew  |  April 8th, 2008 at 8:49 pm

    BTW calling a paper “fundamentally flawed”, um, who’s closed-minded? Just curious.

  • 20. Sylvain  |  April 8th, 2008 at 10:05 pm

    Gavin, I have a few questions/comments

    When you write:

    1) “There are hundreds of interesting metrics, and no one model is the best at all of them. Instead most models are in the top 5 for some and in the bottom 5 for others.”

    Do modelers/scientists know in advance which model is better at handling/forecasting each specific metric, or is it determined after modelers/scientists are able to compare them with new data?

    2) “The models are based on similar underlying assumptions (conservation of energy, momentum, radiative transfer etc.) but which are implemented independently and with different approximations.”

    If models are based on assumptions, who is to say that they are the right assumptions?

  • 21. Mike D.  |  April 9th, 2008 at 2:41 am

    Gavin makes the excellent point, attributed to George Box, that “all models are wrong, but some are useful.” The usefulness of models falls into two broad classes: theory and prediction. Theoretical models attempt to map known physical, chemical, and biological relationships. Predictive models (sometimes called “black box”) attempt to make accurate predictions.

    There is a strong tendency to confuse or combine these utilities, and that is true in any modeling (my specialty is forest growth and yield models). Proponents of theoretical models are often adamant that their models are best (a value judgement) and insist that they be used in predictive situations. Predictive modelers, in contrast, may use crude rules of thumb that are unattractive to theoreticians, but predictive modelers emphasize that their goal is accurate prediction.

    Hence the assertion that models are wrong must also be bifurcated. Theoretical models are wrong if the theories behind them are invalid. Predictive models are wrong if they make poor predictions. It is easy (but not useful) to confuse these wrong-itudes.

    The best weather prediction models are more empirical than theoretical. They look at current conditions (fronts, pressure gradients, jet streams, etc.) as they are cadastrally arrayed across the globe, and compare those to past dates when the same or very similar arrays occurred. Then the weather outcomes of the similar past conformations are examined, and used to predict the immediate future weather. Not much theory to that, more of a data mining of the past; hence the descriptor “empirical.”

    Climate models are much more theoretical because we basically lack empirical data about past climate. Some attempts are made to use proxies, sunspot data, Milankovitch cycles, etc. but the data are sparse and time frames vary widely. In general we can predict a decline in temperatures and a return to Ice Age conditions based on fairly good evidence at long time scales, but when and how that slide will occur is imprecise at short time scales. When theoretical GHG “forcings” are included in climate models, empiricism is almost completely absent.

    So we are in a situation where theoretical climate models are being used to make short-term predictions. Further, those predictions have generated some fairly Draconian suggested measures that are extremely distasteful, at least to many people. More taxes, less freedom, “sacrifices”, economic disruptions etc. are being recommended (imposed) based on the predictions of theoretical models. Political “solutions” to fuzzy predictions from “wrong” and improperly classed models are greatly feared, and I think properly so.

    The discourse cannot help but become impolite in this situation. Neither “side” is immune. How much better it would be if we realized that we cannot predict the climate (in the short term) and instead prepared to be adaptable to whatever happens, while preserving (enhancing) as much freedom, justice, and prosperity as we possibly can.

  • 22. Bob B  |  April 9th, 2008 at 5:07 am

    I find it funny that when confronted with the task of declaring benchmarks or testable conditions the “modelers” just become silent. This is an unsupportable position and I predict sometime in the near future this will come to a head and they will be forced to answer.

  • 23. Briggs  |  April 9th, 2008 at 5:18 am


    Gentlemen,

    I was forced to do the work of my masters yesterday, and will likely have to do so today, so I did not and might not have the time to follow all the comments until tomorrow (except I might tackle Box’s popular but false statement today).

    However, there can be no more use of words like “thug”. If this sort of thing crops up, I will delete it in the future.

    Whether or not GCMs are useful in predicting the future is a matter of fact, and we should be able to decide the question without resort to uncivil language.

    Also, calling a paper or statement “fundamentally flawed” is perfectly reasonable if that paper or statement can so be shown. There is nothing inherently closed minded or ungracious about this.

    Thanks.

    Briggs

  • 24. Dan Hughes  |  April 9th, 2008 at 6:23 am

    oops, the second sentence in the first paragraph of my comment at #17 should read:

    That does not measure anything at all relative to Validation and Qualification for the intended applications.

  • 25. Gavin  |  April 9th, 2008 at 7:08 am

    There is a wide range of knowledge among the posters on this thread and a fair few mistaken statements. I’ll try and address some of the more relevant.

    First off, weather prediction models are not empirical searches for similar patterns in the past, instead they are very similar to climate models in formulation (though usually at higher resolution). The big difference is that they are run using observed initial conditions and try to predict the exact path of the specific weather situation. Climate models are run in boundary condition mode and try and see how the envelope of all weather situations is affected. The actual calculations are very similar and depending on the configuration, a climate model can do weather forecasts and weather forecasting models can do climate projections.

    Pielke’s suggestion is interesting but not a necessary condition for climate models to be useful. There is no evidence that climate sensitivity or the climatology (for instance) is correlated to performance in weather forecasts. In any case, these tests are being done. The Hadley Centre for instance uses the same model for both weather forecasts and climate.

    However, statements that climate model projections include ‘less freedom’ among their outputs are just ridiculous. There is no ‘politics’ subroutine in these models and I have no idea how the freedom-CO2 feedback would be quantified let alone coded. Confusing a scientific situation (i.e., that increasing GHGs lead to warming) with the political decision about what to do about that information is extremely unhelpful. Model outputs do not determine political decisions, politicians do, and if you have a problem with them (as I’m sure we all do) take it to them. Implying that climate models are wrong because some politicians use them to justify political decisions you don’t like is fundamentally unscientific. Radiative transfer does not care who you vote for.

    Back to actual modelling though, Dan’s point, which he makes all the time, is that because models are approximations to the real world and he can’t check every line, they can’t possibly be useful. My slightly tongue-in-cheek reply was a shorthand for explaining (again) that model outputs have been tested on hundreds of ‘out of sample’ cases (paleo climate LGM, 8.2kyr, mid-Holocene, responses to volcanoes, ENSO, 400+ papers from the PCMDI archive) and found to perform well (if not perfectly) in many of them. If the models were so impossible to make, why do they do so well? That isn’t to say they couldn’t be made better, or clearer, of course they can - but declaring that until they are perfect, they are useless, is a logical leap too far.

    Finally, I’m impressed that Matt thinks that Box’s aphorism is false. It seems self-evidently true - models are models of reality, reality is more complicated than we can ever model, therefore all models will fail to match the real world in some detail and therefore all models must, perforce, be ‘wrong’. That some are useful is shown very clearly by weather forecasting models. QED. What is relevant from this is that focusing on binary issues like right/wrong is not as worthwhile as the less rhetorically satisfying (but more constructive) quantifications of the degree of usefulness. But I look forward to the contrary argument.

  • 26. Bob B  |  April 9th, 2008 at 8:03 am

    Gavin you still have not addressed my posts about suitable test cases. If Roger Pielke’s test case is not acceptable then you propose one. In the end if climate models cannot be validated then they are not worth more than a bucket of warm spit.

  • 27. Gavin  |  April 9th, 2008 at 8:38 am

    Read it again Bob. Evaluation of models is going on all the time on ‘out of sample’ data. Where you may be having a problem is in realising that scientific predictions from a model are not limited to what has to happen in the future. They can predict consequences to changes in the past that we might not know about yet, or they can make predictions for things that might be seen in future analyses of current data. But even for future projections, the models have been shown to work pretty well - Hansen’s 1988 runs are a great example. As are the predictions (made before the fact) for the magnitude of the Pinatubo cooling (Hansen et al 1992). These evaluations will be ongoing of course, but you appear to think that we are starting from scratch here. We aren’t.

  • 28. jmrSudbury  |  April 9th, 2008 at 9:44 am

    The main problem I see is that usefulness is tightly related to correctness. The difficulty is with the long term. The farther you look into the future, the more accurate the model must be for it to provide useful data; otherwise, you will get a propagation of errors. When dealing with tenths of a degree Celsius, being out by 10% over 5 years is not so bad, but over 50 or 100 years it becomes a big problem. The absolute amount of the difference between reality and the forecasted value only grows over time.
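    A quick sketch of the arithmetic behind that paragraph. It assumes, purely for illustration, a constant linear warming trend and a model whose trend is 10% too high; the trend value and the function name are hypothetical, and real model errors need not grow linearly.

```python
def trend_gap(true_trend, relative_error, years):
    """Absolute gap between a model and reality after a number of years.

    Assumes both follow straight-line trends from the same starting point,
    with the model trend off by a fixed relative error.
    """
    model_trend = true_trend * (1.0 + relative_error)
    return abs(model_trend - true_trend) * years

true_trend = 0.02       # hypothetical warming rate, degrees Celsius per year
relative_error = 0.10   # model trend assumed 10% too high

for years in (5, 50, 100):
    gap = trend_gap(true_trend, relative_error, years)
    print(years, "years:", round(gap, 3), "degrees Celsius")
```

    With these assumed numbers the gap is a hundredth of a degree after 5 years and two tenths after 100, which is comparable to the tenths of a degree under discussion.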

    Another key difference is to whom the models are useful. Both scientists and politicians can use the models, but they use them differently. How they get used also determines their usefulness to each group.

    To a scientist, the models can be constantly tweaked and modified as new observations arise. As Gavin suggested, they can also be used to indicate other areas where data may be captured through observation. This can aid in advancing our understanding of the world around us. A scientist can hypothesize that increasing CO2 levels will cause the globe to warm. That seems to fit the data for a recent 20 year period, but not during all times. Of course, it could be that the evidence of the past is not as accurate as required, so more observations are continuously made in an effort to prove the theory. If the hypothesis is wrong, up to several thousand scientists will be affected. The net result is that scientists gain a better understanding of the relationship between CO2 levels and climate, or stock market prices and other influences during a recession.

    Politically, though, other factors come into the equation. Beyond funding the science, when it comes to making policy, cause and effect become more important. Politicians must balance the uncertainty of the models with other models, such as economic and environmental ones. We have already found that solutions to model predicted scenarios cause other hardships like rising food prices. The effects of the predictions themselves have to be realized and evaluated before solutions are determined. Politically, there is much more at stake. If the hypothesis is wrong, billions of people will be affected. The models are just predicting a small piece of the puzzle. Political solutions need much more than that. Politically, the usefulness of the models is overrated. They would have to be much more accurate to be useful to a politician than to a scientist.

    John M Reynolds

  • 29. Dan Hughes  |  April 9th, 2008 at 10:06 am

    A little clarification.

    Blog posts and comments don’t have the advantage of internal peer review and review by ‘upper management’ before thoughts are exposed to the public. Those processes are very useful for filtering out lax, imprecise, or inappropriate use of language. I have very likely used imprecise language in the heat of Blog discussions. As an aside I would sometimes insert into my reports and papers words and phrases that I knew would get the attention of ‘upper management’ so as to help them focus on something and get them distracted from my main points.

    It seems to me, after a few years of attempting to discuss real issues with the GCM software, that diversion into unstated areas is a tactic that is frequently employed. Gavin simply threw out a phrase that diverts the discussions from useful practices that are SOP for all other software. And these practices are especially applied to software that might affect public policy decisions. Other diversions are ‘it can’t be done for our software’ and ‘it costs too much’. When many know that it is done on a daily basis and the costs must be weighed against the consequences of inappropriate applications of black-box software by unqualified users.

    Gavin says: “ … but declaring that until they are perfect, they are useless, … ”

    I am aware that perfect is basically unattainable for the class of software that is under consideration. And I am also certain that it is very unlikely that I have ever used that word in this context. I have not insisted on perfection.

    I have probably used worthless, and maybe useless, but I hope that I have used those within the context of my major objections. The main objection that I have is the use of research-grade software to set policies that will affect the health and safety of the public. So far as I know this has never, and I know never is a very long time, before been done. In the case of GCMs, I view them as being based on a process-model approach to a very difficult problem resulting in research-grade software that can in fact be useful as a research tool. I frequently also object to the over-blown mother-hood and apple-pie use of ‘conservation of mass, energy, and momentum’; especially the mass and energy parts. Let’s face it, the fundamental un-approximated forms of the complete continuous equations have yet to be coded and the modified and coded equations have yet to be actually solved.

    I do state explicitly that software that has not had Independent Verification and Validation procedures and processes applied is worthless relative to being used to set such policies. I also require that such software be maintained and released for production applications under approved and audited Software Quality Assurance procedures. There are other equally important aspects of production-grade software that I would also require before release of the code for production use. Qualification of the users, for example, when doing applications that might affect public policy. A more general and detailed discussion is available here. I have specific citations to peer-reviewed publications scattered around several discussion sites and can supply these to anyone interested.

    Inter-comparison of numbers calculated by different models/codes is the least acceptable method of demonstrating ‘correctness’. The peer-reviewed literature on engineering and scientific software Verification and Validation explicitly discusses the many issues associated with this approach. Its application can provide guidance, but even then only under very limiting conditions. It is considered to be one of the seven deadly sins of software verification and validation. To propose use of this method as a best defense doesn’t buy any points in the software Verification and Validation community.

  • 30. Dan Hughes  |  April 9th, 2008 at 10:15 am

    Additional, more explicit, clarification.

    I just noticed that in his introduction Gavin says, “ … and a fair few mistaken statements.” And then when discussing my comment says, “ … - but declaring that until they are perfect, they are useless, is a logical leap too far.”

    I did not make that statement. Gavin did.

  • 31. Roger Pielke, Jr.  |  April 9th, 2008 at 10:48 am

    An interesting discussion. I too am looking forward to Matt’s discussion of model correctness versus usefulness. Here are a few cents of input on the subject.

    To say that a model is “correct” is to say that it offers up skillful predictions, where “skillful” means improvement over a naive baseline. The definitions of “improvement” (in the context of uncertainties and ignorance) and “naive baseline” (in the context of trends, and well-known relationships) are, as we have seen, contested.

    To say that a model is “useful” opens up an entirely different set of complications. One definition is that the predictions from the model shape the treatment of alternative possible courses of action before a decision maker. This could include enlarging the set of possible options or reducing that set.

    But in a wide range of contexts there is no necessary relationship between correctness and usefulness, which might be a surprising claim to some. Consider that I may come up with an astrologically-based model (i.e., grounded in myth) that will turn out to accurately predict the winner of the US presidential election this year, and based on that model I bet the farm on the outcome. Surely I will have judged that model “useful” as I count my winnings. Of course, an astrologically-based model will be far less useful in a long game of poker. But that is the point. The decision context matters a great deal.

    We discuss much of these complexities in the following book chapters, for those interested in a bit more detail:

    Pielke, Jr., R. A., 2000: Policy Responses to the 1997/1998 El Niño: Implications for Forecast Value and the Future of Climate Services. Chapter 7 in S. Changnon (ed.), The 1997/1998 El Niño in the United States. Oxford University Press: New York. 172-196.
    http://sciencepolicy.colorado.edu/admin/publication_files/2000.08.pdf

    Pielke, Jr., R.A., 2003: The role of models in prediction for decision, Chapter 7, pp. 113-137 in C. Canham and W. Lauenroth (eds.), Understanding Ecosystems: The Role of Quantitative Models in Observations, Synthesis, and Prediction, Princeton University Press, Princeton, N.J.
    http://sciencepolicy.colorado.edu/admin/publication_files/2001.12.pdf

    Pielke Jr., R. A., D. Sarewitz and R. Byerly Jr., 2000: Decision Making and the Future of Nature: Understanding and Using Predictions. Chapter 18 in Sarewitz, D., R. A. Pielke Jr., and R. Byerly Jr., (eds.), Prediction: Science Decision Making and the Future of Nature. Island press: Washington, DC.
    http://sciencepolicy.colorado.edu/admin/publication_files/resource-73-2000.06.pdf

  • 32. Mike D.  |  April 9th, 2008 at 12:24 pm

    The Fleet Numerical Meteorology and Oceanography Center (or FNMOC), known prior to 1995 as the Fleet Numerical Oceanography Center (FNOC), is a meteorological and oceanographic center located in Monterey, California. A United States Navy facility, it prepares worldwide weather and oceanographic forecasts every six hours, which are made available to the public by the National Oceanic and Atmospheric Administration. Meteorological observations use an EMPIRICAL atmospheric database which is queried for every weather prediction. Current methodologies have evolved from the global, primitive-equation model (GPEM), which used a staggered, spherical, sigma-coordinate system with real input data interpolated to the sigma surfaces, to a constant feedback loop system using REAL DATA crunched in state-of-the-art Silicon Graphics supercomputers, enabling even higher-resolution meteorological and oceanographic products that are the BEST weather predictions in the world.

  • 33. steven mosher  |  April 9th, 2008 at 2:34 pm

    re 27. Gavin. I still can’t get to the CMIP data. Any word on when they will open it up more?

    One thing would be really easy: ModelE output of GMST from 1850 to 2000. That’s simple enough, a vector of 150 numbers. Just start with the simple stuff; when CMIP opens up a bit more, then guys can amuse themselves with that and I’ll stop bugging you about it. Oh, Lief would like an answer to the question he asked you about eccentricity; he asked you over at Tamino’s.

  • 34. Dave Andrews  |  April 9th, 2008 at 2:55 pm

    It is with trepidation that I enter this discussion.

    Am I right that Gavin is a stalwart of RealClimate?

    If so there is an interesting roundtable at The Bulletin Online,
    http://www.thebulletin.org/roundtable/uncertainty-in-climate-modelling/
    which Gavin kicks off.

    He mentions that there are 20 or so climate groups around the world developing climate models and that each group makes different assumptions about the physics to include and the parameterizations. He then goes on to say of the models -

    “Thus while they are different, they are not independent in any strict statistical sense”

    Isn’t this what William was saying in his blog?

  • 35. Dan Hughes  |  April 9th, 2008 at 3:25 pm

    re: #34

    The following needs to be stuck onto the end of that URL:

    20080310.html

  • 36. Bob B  |  April 9th, 2008 at 3:30 pm

    I believe the 1988 Hansen runs were discussed on CA and shown not to be valid in a strict sense.

  • 37. Mike D.  |  April 9th, 2008 at 3:44 pm

    I do not wish to seem hard-nosed about it, but the statement that “a climate model can do weather forecasts and weather forecasting models can do climate projections” is simply factually incorrect. If this discussion is about models, we should be clear about what models we are talking about.

  • 38. jae  |  April 9th, 2008 at 4:32 pm

    My 2c, for grins: IMHO, the GCMs are just big overfitted models. Who was the famous statistician that said he could fit an elephant with 3 or 4 parameters? If you have “about” 6 tuneable parameters (as Gavin says GCMs do), I suppose you could then make the elephant dance and blow bubbles. I also note that there is a lot of disagreement about even the SIGN of the “feedbacks” that are incorporated into these models. They may ALL be using the wrong methods to get the “right” hindcasting. If so, they can’t be trusted for future predictions.
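    A toy illustration of the overfitting worry raised above, and nothing more: it uses a polynomial with one parameter per data point as a stand-in for “many knobs”, and it says nothing about how real GCMs are built or tuned. The data are made up, and `lagrange_fit` and `line_fit` are hypothetical helper names.

```python
def lagrange_fit(xs, ys):
    """Return the polynomial through every (x, y) point: one parameter per point."""
    def polynomial(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return polynomial


def line_fit(xs, ys):
    """Return the ordinary least-squares straight line: two parameters in total."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)


# Made-up "past data": a gentle upward drift with some wiggle.
years = [0, 1, 2, 3, 4, 5]
values = [0.0, 0.3, 0.1, 0.5, 0.3, 0.6]

six_knobs = lagrange_fit(years, values)
two_knobs = line_fit(years, values)

# The six-parameter curve reproduces the past exactly...
hindcast_error = max(abs(six_knobs(x) - y) for x, y in zip(years, values))
print("six-parameter hindcast error:", hindcast_error)
# ...but the two fits disagree wildly once we step outside the fitted range.
print("six-parameter value at year 8:", six_knobs(8))
print("two-parameter value at year 8:", two_knobs(8))
```

    A perfect hindcast is therefore no evidence by itself of a trustworthy forecast; whether the analogy carries over to physically based models is the question being argued in this thread.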

  • 39. Bob B  |  April 9th, 2008 at 5:27 pm

    None of the GHG forcings in Hansen’s scenarios A, B or C are valid; therefore I do not consider it to be a proper test case.

  • 40. Bernie  |  April 9th, 2008 at 7:42 pm

    Personally I would have preferred to see Gavin continue his discussion with Matt. Alas, the explicit and implicit animosity towards Gavin makes it unlikely he will think it worth the trouble. It is very unfortunate.

  • 41. Gavin  |  April 9th, 2008 at 9:55 pm

    Steve, GISS ModelE data are all available independent of PCMDI at http://data.giss.nasa.gov/modelE/transient/climsim.html (click on all forcings/Lat-Time and the global mean is in the second figure).

    But PCMDI should only require that you accept the terms of agreement - email me their response if that is not the case and I’ll make inquiries.

  • 42. Poptech  |  April 10th, 2008 at 12:13 am

    1. Why is 100% of the model(s) code used by the IPCC not available in the public domain?

    2. Has any of the model(s) code or data sets been updated since the last IPCC report? - If so it invalidates all previous model runs for that model.

    3. Can the computer models fill in the blanks for what is not known in the real world? - If not then they are irrelevant for prediction unless everything in them is 100% understood and correct.

    There is no “close enough” with computer results. Computer results can only be right or wrong. And all the models are wrong.

  • 43. Dave Andrews  |  April 10th, 2008 at 5:03 am

    Dan,

    re: #35 Thanks. For some reason that bit doesn’t show up in my browser.

  • 44. Briggs  |  April 10th, 2008 at 6:18 am

    All,

    Again I apologize for my delay (well, I suppose it only seems like a delay, given the speed at which things are expected these days; sometimes this speed causes us, me at least, to say things we later regret). I am very happy with the comments; many useful points are being made.

    I will hold off on my discussion of “what is a model and what makes a good one” until later, because the point I wish to make is philosophical, quite general, applies to models other than just GCMs, and I think it would be distracting here. For now, I will concede that “GCMs can be useful” and I’ll leave ‘useful’ vague. I am here only interested in the predictive ability of GCMs, though I of course agree that these models are useful in their explanatory power (again, leaving ‘useful’ vague). I suppose we can say, with Gavin, “E pur funzionano…tranne quando non” and leave it at that.

    Here I do want to make one major point that is getting a little buried. First some details.

    1) I am willing to weaken my argument and say that the number of knobs is small, even just one if you like (in general, the more knobs, the more the models might offer independent evidence). But this (aggregate) ‘knob’ is still tuned so that the model fits past data. Gavin is correct to point out that ‘past data’ does not mean ‘all past data’, which I accept. Still, the models are tuned to fit some past data, which, as I said before, is a necessary but not sufficient condition to ‘usefully’ predict future data. (By ‘past data’ I mean ‘data not in the future’, so this includes today’s known outcomes.)

    2) I think we all agree that different models can do better predicting outcomes at various dimensions, and so might offer less dependent evidence at some of those dimensions. One of these dimensions might be, say, the height of a certain pressure surface at a given latitude and longitude. This dimension may be rich with interest and provide many deep insights to climatologists. However, it has no direct interest to those who are making decisions based on GCM output. Those dimensions of interest are small in number, and though my main point is in no way dependent on this, this is my reason for choosing the one dimension which is in everybody’s mind, global mean temperature (GMT).

    3) I hope we can all agree that if the GCMs cannot usefully predict future data, then there is something, who knows what, wrong with them (this statement is of course still dependent on what ‘useful’ means). For a crude and unrealistic example: if all GCMs predict a GMT greater than 20 for next year but the observation is 15, then something has gone wrong, and thus the raw GCM output should not have been used to make decisions.

    4) To say we can never “define the true state of the climate” is probably true because of the enormous multidimensionality of the problem, but to say we can never define “any observed feature within it” is false, even obviously false. If we cannot identify any feature, then nobody would ever attempt to write a GCM, because how could you ever tell if the thing worked? The model has to have some concordance, however you want to measure that, with actual climatological features. I do not think anybody claims that the measurement error in observations is so large that we can never use the observations to corroborate GCM output. Still, measurement error must be accounted for, and I am very happy to have this better known: any increase in measurement error should increase our uncertainty that we are making useful statements about the climate. But for the purposes of this discussion, let us assume that measurement error is negligible for the dimensions of interest, specifically GMT (if it is not, then my point is still valid, but what follows becomes more complicated).

    5) The cartoon graph I drew above is important, because it somewhat mimics the actual state of affairs with respect to our dimension of interest. We want to find some way of taking the different outputs and combining them to make a probability forecast for the future observable. We cannot just take the raw average, because that is still just a point estimate, nor, as it has been correctly argued, is there any a priori reason to assume the average is the best method of combining. Nor can we just use the raw output and form a probability forecast, in the manner that produced the black curve. I’ll repeat: suppose all the GCMs gave the same forecast for next year. Would we be 100% sure that the actual temperature will equal that forecast? Obviously not. The width of that black line needs to be widened to match the actual uncertainty. The analogy with weather forecasts is apt, because the techniques to treat output like our dimension of interest would be the same. What happens is that each model is corrected separately for bias (which may differ from model to model, or even not exist in some models), and then these corrected versions are “added” together in such a fashion that the width of the probability distribution of the combination is wider than it would be using just the raw output. I do not claim that the methods currently used in weather models are the best statistical models, merely that they have shown some utility. These methods, or whatever new ones arise, are just what are needed to quantify the amount of independent information each GCM offers. (The comments I made about “the most useful model or groups of models” are still true, but I can see that these are a distraction here: these arguments, too, are very general and apply to models of any kind, but you needn’t take my word for it here.)

    6) Which brings me to the main point, which I still hold is true: we are too certain of the forecasts made by GCMs, either singly or in aggregate. I have seen little or no evidence that they have skill (see Pielke’s definition above). And GCMs are almost certainly not calibrated (I cannot prove this, but I have not seen much in the literature to suggest that this well-known criterion has been routinely considered for GCMs). For example, the (combined) output might claim that there is 90% certainty that the future GMT will be between 14.5 and 15.5 degrees. It is my guess that statements like these will be found to be true only around 40% of the time, which, to avoid any math, is another way of saying the forecasts are overconfident. In order for any forecast (GCM or not) to be useful in making decisions, it has to accurately quantify the uncertainty of the future observable. If it does not, then decisions made using this forecast will be sub-optimal at the least, or just plain wrong at the worst. And given the, let us say, enthusiasm to make decisions as quickly as possible re: global warming, my concern is that we are in danger of making many sub-optimal or wrong decisions.
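    The calibration check described in point 6 can be sketched roughly as follows: collect the stated 90% intervals, count how often the observation actually fell inside, and compare that fraction with 90%. The intervals and observations below are made up to reproduce the 40% guess in the text; they are not real forecasts, and `empirical_coverage` is a hypothetical helper name.

```python
def empirical_coverage(intervals, observations):
    """Fraction of observations that landed inside their forecast interval."""
    if len(intervals) != len(observations) or not observations:
        raise ValueError("need equally long, non-empty sequences")
    hits = sum(1 for (low, high), obs in zip(intervals, observations)
               if low <= obs <= high)
    return hits / len(observations)


nominal = 0.90  # the certainty the forecasts claim

# Made-up example: ten forecasts, each claiming 90% certainty of 14.5 to 15.5.
intervals = [(14.5, 15.5)] * 10
observations = [15.0, 14.2, 15.8, 14.9, 14.3, 15.6, 15.1, 14.1, 15.7, 15.3]

coverage = empirical_coverage(intervals, observations)
print(f"claimed {nominal:.0%}, observed {coverage:.0%}")
if coverage < nominal:
    print("intervals are too narrow: the forecasts are overconfident")
```

    With only a handful of forecasts the observed fraction is itself noisy, so a real check would need many forecast and observation pairs before declaring a forecast system miscalibrated.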

    I do promise to answer the questions of “model goodness” soon (‘soon’ as in ‘real life soon’ not ‘blogosphere soon’).

    Briggs
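    Point 5 of the comment above (correct each model for its own bias, then combine so the result is wider than the raw output) can be sketched under some loudly stated assumptions: bias is a constant offset estimated from past errors, the models get equal weight, and the combined variance is the average leftover error variance plus the spread between corrected forecasts. This is one simple recipe chosen for illustration, not the method any forecasting center actually uses, and all numbers and names are made up.

```python
import math
import statistics


def bias_and_residual_sd(past_forecasts, past_observations):
    """Mean past error (the bias) and the spread of errors left after removing it."""
    errors = [f - o for f, o in zip(past_forecasts, past_observations)]
    return statistics.mean(errors), statistics.pstdev(errors)


def combine(past_by_model, new_by_model, past_observations):
    """Bias-correct each model's new forecast, then pool with equal weights."""
    corrected, residual_variances = [], []
    for name, past_forecasts in past_by_model.items():
        bias, residual_sd = bias_and_residual_sd(past_forecasts, past_observations)
        corrected.append(new_by_model[name] - bias)
        residual_variances.append(residual_sd ** 2)
    center = statistics.mean(corrected)
    # Total variance = typical error of a corrected model + disagreement between models.
    total_variance = statistics.mean(residual_variances) + statistics.pvariance(corrected)
    return center, math.sqrt(total_variance)


# Made-up history: model A runs warm, model B is nearly unbiased but noisier.
past_observations = [14.0, 14.2, 14.1, 14.3]
past_by_model = {
    "A": [14.5, 14.6, 14.7, 14.8],
    "B": [13.9, 14.2, 13.9, 14.4],
}
# Both hypothetical models give the same raw forecast for next year.
new_by_model = {"A": 15.0, "B": 15.0}

raw_spread = statistics.pstdev(new_by_model.values())
center, combined_sd = combine(past_by_model, new_by_model, past_observations)
print("raw spread:", raw_spread)
print("combined forecast:", center, "+/-", combined_sd)
```

    Even though the two raw forecasts agree exactly (zero spread), the combined forecast still has a nonzero width, because each model’s past errors say how far off it typically is.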

  • 45. steven mosher  |  April 10th, 2008 at 8:28 am

    thx gavin

  • 46. jmrSudbury  |  April 10th, 2008 at 9:04 am

    If we are too certain of the forecasts made by GCMs such that we are in danger of making many sub-optimal or wrong decisions, then we are back to the usefulness topic. Trying to avoid that area, I would like to add some thoughts as to the accuracy, how it is measured and the effects of the base assumptions. This is the best way I can figure to argue the usefulness without having to define the word useful.

    Not enough past years are being matched accurately enough. We may not need to go back millennia, but we should be able to approximate the temperatures going back to the late 1800s. I say approximate since many of the temperature data are poor, especially from the oceans, due to difficulties in collecting them. Assuming that the data are close enough, how far into the future can we then predict? The longer back we match the data in the past, I would like to think that we can go a similar distance predicting into the future, but I fear that the propagation of error would be too great beyond the 20 year mark. We will end up with forecasts that are as accurate as the data collecting was 100 years ago. In other words, it will be easy to be off by half a degree for large areas of the globe.

    Models have to be proven to be accurate. That takes time. So far the GCMs have been shown to be inaccurate, so they are modified with each iteration of newly acquired yearly average data in an effort to match the new data. Each IPCC report has shown large differences in their forecasts/what-if scenarios due to these modifications to the models. We can evaluate which models have been most accurate. Have none been accurate at all over the past 20 years or have they all required significant modifications? Then again, does it automatically follow that the models that were the most accurate for the past 5 years will continue to be when predicting a system as volatile as climate? Sometimes the small modifications to the models are not enough. That they have not taken into account the PDO and AMO would mean that they could be inaccurate for a decade or so. This would require us to trust the models even though they are not currently accurate. Sometimes something new is found or created that invalidates large chunks of the models.

    There are the assumptions upon which the models are based. In all models things change. Life happens. As people adapt to the systems that are put in place, by passing new laws for example, the assumptions of the model have to change. The forest yield models will change if a new predator like the pine beetle moves in. Another example is if laws prevent logging in a nearby forest such that dead material builds up and allows a massive forest fire to spread, much farther than if the law was not put in place, that wipes out the area that was to be logged. The assumptions as to the best logging and forest fire techniques will have to change to match the new conditions. The climatology branch of science is so young. The initial guesses are being investigated and found lacking. The new data from the Aqua satellite is going to have a profound impact on the models.

    I submit that the climatology field is just too young to be relied upon to make political decisions. The models can’t be accurate enough until we learn more from all sides of the debate.

    John M Reynolds

  • 47. Frank K.  |  April 10th, 2008 at 9:37 am

    Dan Hughes said…

    “I do state explicitly that software that has not had Independent Verification and Validation procedures and processes applied is worthless relative to being used to set such policies. I also require that such software be maintained and released for production applications under approved and audited Software Quality Assurance procedures. There are other equally important aspects of production-grade software that I would also require before release of the code for production use. Qualification of the users, for example, when doing applications that might affect public policy. A more general and detailed discussion is available here. I have specific citations to peer-reviewed publications scattered around several discussion sites and can supply these to anyone interested.”

    Dan, of course, has hit the nail on the head here. The problem I have with codes like the GISS model E is that many of these codes (in particular model E) are poorly documented. If you go to the GISS website, there are no documents which tell you basic things like what differential equations are being solved, what boundary conditions are applied, how they are discretized, what numerical procedures/algorithms are being employed, stability and error analyses etc. And if you look at many of the FORTRAN subroutines in the listings provided, they are very poorly commented. This is entirely ** unacceptable ** for a code which is being used to shape public policy decisions, as there is no way anyone can do an independent verification of the procedures and algorithms embodied in the software.

    Until the GISS and others get serious about documenting model E and similar GCMs and submitting these codes to independent verification and validation procedures, I will find it hard to take the results they produce very seriously…

    PS
    For an example of good documentation, go here:

    http://www.ccsm.ucar.edu/models/atm-cam/docs/description/

  • 48. Paolo M.  |  April 10th, 2008 at 10:22 am

    Contrary to our wishes, climate is not only a boundary condition matter. It is also, at least for the time range we are dealing with, an initial condition affair.
    Initial assumptions of current climate models are, therefore, wrong.

  • 49. Why multiple climate mode…  |  April 10th, 2008 at 11:21 am

    […] http://wmbriggs.com/blog/2008/04/08/why-multiple-climate-model-agreement-is-not-that-exciting/ […]

  • 50. Mike D.  |  April 10th, 2008 at 3:25 pm

    Wow. The most recent above remarks by Briggs and Reynolds are so well-stated there is nothing to add. Thank you, gentlemen.

  • 51. Bernie  |  April 10th, 2008 at 5:18 pm

    The above criticisms of the models are far too loose IMHO. If I had spent 10 years developing a model I would see these comments as largely gratuitous with a lot of hand-waving.
    John Reynolds indicates the models are not accurate without quantifying how inaccurate they are or at a minimum providing a citation. That most models have not precisely predicted the recent global temperature trend may be true but what is the level of accuracy/inaccuracy and how is it to be measured? I don’t disagree with the overall sense that the climate models are imprecise - but my guess is just that - an intuition. I was kind of hoping for a more explicit exposition on how one evaluates the “usefulness”, “accuracy”, “validity” of these models.
    This is surely an implicit promise once Matt’s initial epistemological point is acknowledged - without knowing how to evaluate “accuracy” we have no way forward. Dr. Pielke’s challenge is one approach, though Gavin seems to have his doubts. Can someone put us back on track?

  • 52. lucia  |  April 10th, 2008 at 10:52 pm

    I was kind of hoping for a more explicit exposition on how one evaluates the “usefulness”, “accuracy”, “validity” of these models.

    Bernie: How one evaluates all these depends on what you want a tool to do.

    With regard to climate science (or any field), models are tools used to do something. But everyone has different goals.

    One of the difficulties we see is that the IPCC projections, and the way the projections are disseminated, would imply that IPCC document authors are suggesting their models and methods can be used to predict a number of features of great interest to voters and policy makers. (And yes, I use the word predict, because dictionaries recognize “predict” and “project” as synonyms.)

    One of the major features highlighted in IPCC document projections is GMST (global mean surface temperature).

    Matt seems to be discussing utility, validity and accuracy in predicting the metrics the IPCC actually discussed in detail in the published documents.

    Gavin seems to be discussing the general sorts of verifications undertaken by modelers to estimate the utility and validity for other features. He is also discussing features associated with tracking down where things went wrong. If you are trying to improve an AOGCM, it is important to know whether the mismatch between what happened on the real earth and what happened in the model earth is due to applying incorrect forcings over time, or to a problem with some sort of parameterization in the model, or other features.

    But it’s not entirely clear that’s as important when simply observing that the final result of the modeling process overall is not skillful. That process begins with estimating the forcing and ends with graphs and tables predicting (or projecting) temperatures. Either one can come up with useful projections or one cannot. Either one can come up with meaningful uncertainty intervals or not.

  • 53. Briggs  |  April 11th, 2008 at 5:57 am

    All,

    Lucia, your summary is spot on. John Reynolds’s comments are also pertinent to verification questions, and well stated. The technical comments about code accuracy and efficiency by Dan Hughes and Frank K. give an idea why verification can fail for GCMs (‘fail’ in the sense of the models not being ‘useful’).

    Bernie, I do apologize for the lack of precision. You might have been looking for statements like “The CSIRO GCM, with respect to GMT, is miscalibrated at the 90% level at 38.8%” and so forth. At any rate, something meatier about what exactly is wrong with each or any of the models. There is some of this sort of thing in the literature, but, actually, there is shockingly little.

    Although this post is about multiple models, suppose there is just one model: pick a model from Hansen’s GISS group, so you know you are getting one of the better ones (I am not being sarcastic). Look at the forecast from that model for next year’s GMT. It will be X degrees. Do you, or does anybody, have 100% confidence that the temperature will be exactly X degrees? If not, then you have my point: simply stating “X degrees” is misleading because it does not give any indication of uncertainty, and therefore of usefulness. Of course, nobody ever does believe “X degrees” but many act as if the uncertainty that does exist is trivial.

    Ok, what is ‘useful’? If you want to see some (very) technical material on this, click over to my resume page and look for any of my papers on skill, verification, scoring, measurement error, or ROC curves. And go to my friend Tilmann Gneiting’s page and look at almost any of his recent papers. None of his or my papers is easy going, they are all exceedingly mathematical, and make no attempt to explain things to a general audience. Still, what ‘useful’ means is described in great detail. I will be writing a summary for all this material, and going further, too, in talking about what a model is: this is where I will defend my claim that George Box’s “all models…” statement is false. I have a bit of that in my paper on Broccoli and Splenetic Fever, if you want a head start.

    Briggs

  • 54. Bernie  |  April 11th, 2008 at 7:20 am

    Lucia, Matt:
    Many thanks. I think you put us back on track.

  • 55. Gavin  |  April 11th, 2008 at 7:25 am

    Both Briggs and Lucia have declared that the GCMs are not skillful and that they routinely overpredict changes. Given that neither has done any analysis of what models actually show, I’m a little puzzled as to where this certainty is coming from. In fact, they are not correct - the models are skillful compared to a naive assumption of no change and they are useful for certain metrics.

    Possibly they are being led astray by the thought experiment examples they have brought up. First off, the climate can be thought of as having a component that is changing due to an external forcing a(F) - where F is forcing, and a is the (uncertain) function that calculates the change that would occur because of F.
    But there is another component - e - the internal variability - which is chaotic and depends on exactly what the weather is doing. The atmospheric component of ‘e’ is only predictable over a few days, while for the ocean part, there might be some predictability for a few months to a few years (depending on where you are).

    So for any climate metric (whether it’s GM SAT, or the heat content of the oceans, or the temperature of the lower stratosphere), its evolution is:

    C(t) = a(F(t), F(t-1) …) + e(t)

    (i.e. the temperature depends on the history of the forcing and a stochastic component - which itself depends on the past trajectory). Climate models (as they were used in AR4) only claim skill in the first, forced, part of the equation. Given that ‘e’ is not zero, the usefulness of the model for any one metric depends on the relative magnitude of ‘a’ and ‘e’. If the forced part is 10 times the size of the stochastic part, then it would be useful to have a good estimate of ‘a’. If it was the other way around, then your estimate of ‘a’ may well be very good, but it wouldn’t be useful.
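    A toy simulation of the decomposition above, under strong simplifying assumptions that are not in the comment: the forced response is taken as a made-up linear function of the current forcing only (no dependence on forcing history), and ‘e’ is independent Gaussian noise rather than chaotic, trajectory-dependent variability. It only illustrates the point about relative magnitudes: a perfect estimate of ‘a’ tracks the metric closely when the forced part is 10 times the stochastic part, and barely at all when the ratio is reversed.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def simulate(forced, noise_sd, rng):
    """C(t) = a(...) + e(t), with e(t) drawn as independent Gaussian noise."""
    return [a + rng.gauss(0.0, noise_sd) for a in forced]


rng = random.Random(42)                    # fixed seed so the run is repeatable
forcing = [0.02 * t for t in range(100)]   # hypothetical forcing history F(t)
forced = [0.8 * f for f in forcing]        # assumed linear a(F), made-up coefficient
signal_sd = statistics.pstdev(forced)

forced_dominates = simulate(forced, signal_sd / 10, rng)  # 'a' is 10x the size of 'e'
noise_dominates = simulate(forced, signal_sd * 10, rng)   # 'e' is 10x the size of 'a'

print("correlation when forced part dominates:", pearson(forced, forced_dominates))
print("correlation when noise dominates:", pearson(forced, noise_dominates))
```

    In both runs the estimate of ‘a’ is exactly right; only its usefulness for predicting the metric changes.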

    In practice, climate models cannot estimate ‘a’ without having some realisation of ‘e’ as well. Since these models do not assimilate real world data as part of their simulations, the model ‘e’ will be completely uncorrelated to the real wor