George Box said, “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.”
This is usually misremembered as the equivalent, and pithier, “All models are wrong, but some are useful.”
Both versions are false. This is because the premise “All models are wrong” is false: it is not the case that all models are false. Here is why.
First, readers should keep in mind that what follows is a philosophical argument. My burden is solely to show that there exists at least one model that is true. Whether the model I show is useful or interesting to you or not is irrelevant.
In math, if somebody starts a theorem with the statement, “For all x…” and later comes along a fellow who shows an x that, when plugged into the theorem, ends up in a different place than predicted by the theorem’s conclusion, then that theorem has been invalidated. In simpler words, it is false.
It might be that there is only one such x that invalidates the theorem, and that for all other x the theorem holds. But you cannot say, “The theorem is practically true.” Saying a theorem—or argument—is practically true is like saying, “She is practically a virgin.”
But all that is just boilerplate, because it’s going to turn out that many models are true and useful.
Suppose Model A states, “X will occur with a probability that is greater than 0 but less than 1.” And let Model B state that “X will occur”, which of course is equivalent to “X will occur with probability 1” (I’m using “probability 1” in its plain-English, and not measure-theoretic, sense).
Now, if X does not occur, Model B has been proved false. The popular way to say it is that Model B has been falsified. If X does occur, Model B has been proved true. It has been truified, if you like.
How about Model A? No matter if X occurs or not, Model A has not been falsified or truified. It is impossible for Model A to be falsified or truified.
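To make the asymmetry concrete, here is a minimal sketch (the 0.7 assigned to Model A is an arbitrary illustrative number, not anything from the argument): a model is contradicted by a single observation only if it gave that observation probability 0, which Model A, by construction, never does.

```python
# Toy check of which model a single observation can contradict.
def is_falsified(prob_x_occurs: float, x_occurred: bool) -> bool:
    """A model is falsified only if it gave probability 0 to what actually happened."""
    prob_of_what_happened = prob_x_occurs if x_occurred else 1.0 - prob_x_occurs
    return prob_of_what_happened == 0.0

model_a = 0.7   # "X will occur with probability strictly between 0 and 1" (illustrative value)
model_b = 1.0   # "X will occur", i.e. probability 1 in the plain-English sense

for x_occurred in (True, False):
    print(f"X occurred: {str(x_occurred):5s}  "
          f"Model A falsified: {is_falsified(model_a, x_occurred)}  "
          f"Model B falsified: {is_falsified(model_b, x_occurred)}")
```

Whatever happens to X, Model A never assigns probability 0 to the outcome, so no single observation can falsify (or "truify") it; Model B is falsified exactly when X fails to occur.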
It is a logical fact that if you cannot prove something true, you also cannot prove it false. We cannot prove Model A is true, therefore we also cannot prove it is false. Box’s premise assumes we have proof that all models are false. But here is a model which is not known to be true or known to be false.
Therefore, Box was wrong. We do not know that all models are false.
Many will object that I only allowed X one chance to occur. And the more training you have had in science, the stronger grows the irresistible temptation to ask for more chances: frequentist statisticians will insist on there being infinite chances for X, which requires that X be embedded in some uniquely definable sequence.
Resist this urge! There are many unique events which we can model and predict. Example: Hillary Clinton wins the 2012 presidential election. There is only one Hillary Clinton and she only has one chance to win the 2012 election (if you don’t like that, I challenge you to embed this event into a uniquely defined infinite sequence, along with proof that your sequence is the correct one).
For fun, let’s grant X more chances to occur. First know that all probability models fit the schema of Model A. (Note: there is no difference between probability and statistics models; the apparent division is artificial; however, you do not need to believe that to follow my argument. There is also no difference between probability and mathematical or physical models: models are models; you don’t have to believe this yet either.)
But it doesn’t matter how many times a probability model makes a prediction for X: it can never be proved true or proved false, except in one case. Excepting this case, granting more chances for X doesn’t change the conclusion that Box was wrong.
The exception is that a probability model—or any other kind of model—can be proved false if you can prove that the model’s premises are false. Sometimes this is easily done. For example (see the post from two days ago), we know that grade point average can only live on 0 to 4, but we use a normal distribution (premise) to model it. We know, given the properties of normal distributions, that the model is false. Yet the model might still be a useful approximation (what happens here is that we internally change the model to an approximate version, which can be true).
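For concreteness, here is a minimal sketch of that grade-point example using scipy; the mean 2.8 and standard deviation 0.6 are made-up illustrative values, not fitted to any real data. The normal premise is provably false because it puts positive probability outside 0 to 4, while the internally adjusted (truncated) version need not be.

```python
# Sketch: a normal-distribution premise for GPA is provably false but still useful.
from scipy.stats import norm, truncnorm

mu, sigma = 2.8, 0.6                      # hypothetical fitted parameters (made up)
gpa_model = norm(mu, sigma)

# The premise is false: a normal distribution puts positive mass outside [0, 4].
mass_outside = gpa_model.cdf(0.0) + gpa_model.sf(4.0)
print(f"Probability mass outside [0, 4]: {mass_outside:.2e}")

# The "internally changed" approximate model: renormalize (truncate) to [0, 4].
a, b = (0.0 - mu) / sigma, (4.0 - mu) / sigma
gpa_truncated = truncnorm(a, b, loc=mu, scale=sigma)
print(f"P(GPA > 3.5), normal vs truncated: "
      f"{gpa_model.sf(3.5):.4f} vs {gpa_truncated.sf(3.5):.4f}")
```

The two answers differ only slightly, which is why the false-premise model can still be a useful approximation.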
And some probability models have true premises. Casinos operate under this belief. So does every statistician who has ever done a simulation: what is that but a true model? Physicists are ever after true models. And so on.
If Box had said, “Most models are wrong, but some are useful,” or, “All the models that I know of are wrong, but some are useful” then what he said would have been uncontroversial. I suggest that it is these modifications that most people actually hear when presented with Box’s “theorem.” Since it’s mostly true—to statisticians playing with normal distributions, anyway—they feel it’s OK to call it completely true. However, they would never accept that lack of rigor in any of their own work.
That’s all we can do in 800 words, folks. We’ll surely come back to this topic.
Update The model for a unique-physical-measurable event has been objected to on the grounds of simplicity. These kinds of models do not have to be accepted for the main argument to be true (though I believe they are sufficient).
First recall that simulations are all true models. To add the premise “and this simulation is meant to represent a certain physical system” might make the model false. This has been discussed in depth by Bernardo and Smith (open versus closed universes, etc.), and I ask the reader to rely on those gentlemen for particulars.
However, even these extensions are not needed. Consider this: to say “all” models are false implies that a rigorous proof of this exists. So if I have a Probability Model A (for, let us suppose, a non-unique, physical, measurable, observable event), you cannot, in finite time, prove that this model is false. No observation can do so, nor can any collection of observations.
The best you might do is to say “Model A is probably false.” But to say “probably” means to admit that a proof of its falsity does not exist. And again, to say “practically false” is equivalent to say “practically a virgin.”
So again, not all models are false. And probability models in particular are not—or, at least, they cannot be proven so.
Neither can, it might interest the reader to learn, climate models. They cannot be proven false, even if their predictions do not match observations. This is why we ask for a model to demonstrate skill (see the prior post).
An old colleague of mine used to say “A model is a theory you don’t believe in.”
Box would have been better off if he had said that ‘all models that ATTEMPT TO PROVIDE INFORMATION are wrong.’ There are models that ‘are not informative’ that can be right; Model A is an example. It’s when a model tries to add information to the universe that it messes up.
Always fun and stimulating.
“It is a logical fact that if you cannot prove something true, you also cannot prove it false.”
Is that a typo? It’s clearly not true. Any provably false statement refutes it.
Hi William,
No — error rules.
The scientific method, with its characteristic Bayesian-based belief in its models/theories, works perfectly well without assuming the law of the excluded middle. And those types of models are the ones of interest here.
All scientific knowledge is based on experiment. Since we, for practical reasons, can’t experimentally demonstrate that ALL models are false, such a statement is non-scientific, put one way or the other. That is, we don’t scientifically know that there is at least one model/theory that is true.
But, of course, there is always a probability that my analysis (or any criticism of it) is in error. 🙂
George
“All models are false” has a much deeper implication: that a phenomenon can never be fully explained even if the initial conditions are fully known. IOW: science is a futile enterprise. This implication arises because, for a full explanation of a phenomenon to occur, an exact model must exist. If the statement is true, any endeavor to find the cause of discrepancies with the current model will ultimately fail regardless of any improvements made.
I’m not aware of the context surrounding Boxer’s statement. It’s occurred to me that 1) it was meant to be taken colloquially and 2) it may have been tongue-in-cheek. The colloquial meaning of all (meaning “mostly”) can be seen in the statements “grass is green” and “the sky is blue.” There are obvious contradictions to this: wheat is golden in its final stages; my own lawn has patches of brown; and the sky doesn’t look blue at night or in bad weather.
Hmmm … I see I said “Boxer” instead of “Box”. My mind works in evil ways. I was initially going to make a comment connecting the Box statement with the Boxer Rebellion which I graciously spared the readers but my twisted mind obviously wanted some of it to get out. Life is never dull if you can provide your own entertainment.
To bring this discussion full circle, Gavin’s contention (with his snarkiness removed) is that climate models can be “tested” when there is “no track record of predictions”. The method he opines could be used is to divide historic data into two sets, one to build the model and another to test it.
But this method has a flaw. Any randomly chosen subset is a mirror, a mapping if you will, of the whole set. There is no distinctly separate, unique, independent data to use for the testing.
The historical data (in whole or in part) contain zero future data. Therefore the model is NOT tested.
Take for example a horse race handicapping system (model) which includes data on every race the horses have ever run, their times, the track conditions, etc. In other words, complete information about prior races is incorporated into the model. You can purchase these tout sheets at the track, by the way.
But no matter how expertly compiled, the tout sheets have limited skill at predicting the winners of future horse races. If they did work, the touters would be rich. But they aren’t rich — they are miserably poor and forced to sell their worthless tout sheets for pennies to the gullible.
Similarly, climate models have no crystal ball filled with future data, whether or not they use past data in the model building process. Gavin claims they do NOT, btw.
[Y]ou imply that a climate model built today somehow secretly knows about the temperature and rainfall evolution of the last 30 years or more and that each year we add one more set of annual values to the models to make them better. This is a nonsense. The climate models are not statistical models trained on simplistic indices – not even close.
In other words, real world data is NOT used to build climate models, according to Gavin. Climate models are “tested” against past data, with varying results, often laughable, but they fail miserably at predicting the future. So miserable are the predictions that climate modelers eschew the word “prediction” and substitute the word “projection”.
In his defense, Gavin claims that climate modeling is not his thing:
Personally, I am not involved in any of these efforts and have yet to be convinced that they will show any useful skill…
I concur. The models have not shown skill at predicting the future, any more than horse racing tout sheets have. It would not be rational to gamble our economic future on a horse race. Similarly, it is irrational to gamble our economic future on demonstrably unskillful climate models.
It’s like saying all samples contain fewer members than the population. One could so re-define the population such that it only holds a small finite number of members that can be analyzed. But then you sort of finesse the ordinary notion of sampling.
If Box had said, “all models are incomplete” I think there would be less confusion. And to the extent that the whole point of the model is to develop a simpler, easier to understand or faster operating, subset of reality, it HAS to be “wrong” — less than contiguous at all points to reality.
Mike D.
But this method has a flaw. Any randomly chosen subset is a mirror, a mapping if you will, of the whole set. There is no distinctly separate, unique, independent data to use for the testing. The historical data (in whole or in part) contain zero future data. Therefore the model is NOT tested.
The hold-out method is a way of testing a model’s probable performance but it presupposes that the training data are representative of the population. As long as the presupposition holds, there is no philosophical difference between a set which is a partition of the data on hand and another obtained at a later time. Insisting on “future” data is unnecessarily restrictive. Any data not previously processed by the model during training is for all intents and purposes the equivalent of “future” data.
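As a toy illustration of that partition (synthetic data, nothing from this discussion): the hold-out half plays the role of “future” data only because it was never touched during fitting and is presumed representative.

```python
# Minimal sketch of the hold-out method: a random partition of the data on hand,
# one part used for fitting, the other reserved for testing.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=200)     # true line plus noise

# Randomly split into a training half and a hold-out half.
idx = rng.permutation(len(x))
train, test = idx[:100], idx[100:]

# Fit a straight line on the training half only.
slope, intercept = np.polyfit(x[train], y[train], deg=1)

# Evaluate on the hold-out half: data the model never saw during fitting.
pred = slope * x[test] + intercept
rmse = np.sqrt(np.mean((y[test] - pred) ** 2))
print(f"fit: y = {slope:.2f} x + {intercept:.2f},  hold-out RMSE = {rmse:.2f}")
```

The split only stands in for future data to the extent that the held-out half is representative of what the model will meet later, which is exactly the presupposition stated above.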
I had the good fortune to study with Dr. Box, and I’m afraid you’ve misconstrued his aphorism. You have somehow managed to conflate “All models are wrong” with “All models are false” and then went on your merry way skewering your strawman.
I can assure you from first-hand interaction with Dr. Box that “All models are wrong” means simply, “All models have error”. In the silly example you state, the “unfalsifiable” Model A isn’t really even a model. It’s freshman coffee-klatch doo-doo.
Mike B,
Box’s statement has been taken by many, even most, to mean that “all” models are in fact false. The paper I linked yesterday will show that. However, even accepting that Box really meant “All models have error”, then he is still wrong. All models do not. That is, there are some that do not. For example, any mathematical theorems with true premises and a valid conclusion are without error.
Now, if he meant that all valid probability models—which are models with true premises and uncertain conclusions—have conclusions that are uncertain, then he was stating the truth. But this is nothing more than a restatement that valid probability models have true premises and uncertain conclusions. Nothing controversial in that, of course.
To the “doo-doo.” Can you show, instead of merely claim, how my example is logically false? Any model, of a unique event, will be in the form I gave. The model will not be general because the event is not. The event can only happen once.
The difficulty, I think, is that in thinking about these topics we tend to jump right to the most complicated situation and only keep that in mind. Part of this—as always—is my fault. A model is synonymous with a theory, or a “law”, or a theorem, or even an argument.
But let me say that Box was a brilliant man and his time series work is rightly known by all. I’m not picking on Box. I am showing how one statement he has made, and has possibly been misconstrued, is false. I accept from you that Box didn’t mean it in the way that people take it.
Uncle Mike is on the right path, as always. The trouble with “hold out” data is this. People build probability models on one set and test them on the “hold out”. The model that works best is said to be most probably true. And that might even be true, for some cases, but it also means that the “hold out” data has been used in constructing the eventual model. To truly demonstrate skill, that model will have to make useful predictions of independent (not yet known) data.
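A minimal sketch of that leakage, using deliberately skill-free “models” so any apparent hold-out skill is pure selection effect (all numbers are synthetic):

```python
# When many candidates are compared on the same hold-out data, the winner's
# hold-out score is optimistic; only genuinely new data reveals its real skill.
import numpy as np

rng = np.random.default_rng(1)
holdout_truth = rng.integers(0, 2, size=100)   # outcomes in the hold-out set
future_truth = rng.integers(0, 2, size=100)    # outcomes not yet known

# 500 candidate "models", each one just a fixed set of random guesses.
candidates = rng.integers(0, 2, size=(500, 100))
holdout_scores = (candidates == holdout_truth).mean(axis=1)

best = int(np.argmax(holdout_scores))          # model selection USES the hold-out data
print(f"best hold-out accuracy among 500 skill-free models: {holdout_scores[best]:.2f}")

# The winner has no real skill, so its guesses on data not yet known
# fall straight back to coin-flip accuracy.
fresh_guesses = rng.integers(0, 2, size=100)
print(f"accuracy of the selected 'model' on new data: {(fresh_guesses == future_truth).mean():.2f}")
```

The best hold-out score comes out well above one half even though every candidate is guessing, which is why the hold-out set, once used to pick the winner, no longer demonstrates skill.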
DAV,
Right. I often—today I did not—say that “future” data is not necessarily taken from somewhere forward in time, but is data not yet known. Or not yet conditioned on.
POUNCER,
If Box had said, “All models are incomplete” he would still be wrong, and for the same reasons I gave Mike B. Remember, everybody, it’s that darn “All” that is getting him in trouble.
George,
You’re right. My shorthand has done me in. Here’s a better way of saying it: If we can prove S is true, then we have also proved ~S is false. Likewise, if we have proved S is false, we have proved ~S is true. The tilde means “not.”
George Crews,
I accept most of this except for the bit about “All scientific knowledge is based on experiment.” There’s that “All” again, and it makes the statement false. It completely rules out all a priori knowledge, for example (like axioms; and even more, which I hope to cover soon).
There are two kinds of hold-out data. Let me use a forestry example to explain.
If you cruise a stand by taking 100 plots, use 50 to build a local growth model, and then test the model with the other 50, they are all the same data. Just change your “n”; it really doesn’t change anything else. You can do fancy bootstrapping, but it’s all a chimera and a waste of time.
On the other hand, if you take 100 new plots from a completely different stand in a different county, you could theoretically use that new data to test your model. They are independent data, more or less. But then your area of inference changes. Now you have a two-stand model instead of a one-stand model. Again, the old data are the old data.
The point of a growth model is to predict future growth. That’s its utility. Past growth is no mystery. It is what it is.
So the only real way to test your growth model is to return to the stand 10 years later and remeasure. The ten years of growth are new data, and a fair test of the model.
I have done it both ways. The first way is self-delusional, and possibly other-delusional if you can get others to buy into it. The second way is intellectually, logically, and pragmatically honest.
And to throw out a sniping comment, just to be churlish, Gavin spoke at a forestry conference earlier this month. He was billed thusly:
“Dr. Gavin Schmidt is a climate modeler at the NASA Goddard Institute for Space Studies in New York and is interested in modeling past, present and future climate. He works on developing and improving coupled climate models and, in particular, is interested in how their results can be compared to paleoclimatic proxy data.”
Makes me wonder about his claim, “Personally, I am not involved in any of these [climate modeling] efforts.”
You is or you isn’t. It’s a black and white, on/off situation.
George quoted “It is a logical fact that if you cannot prove something true, you also cannot prove it false.” and Briggs replied: “You’re right. My shorthand has done me in.”
I have to quibble. If you cannot, you can not. If you cannot prove a proposition true, it may be true or false; if you can prove it false, you can’t –of course– prove it true, but if you can’t prove it false… I think, in science, this means: it’s metaphysical.
One reading of that term’s meaning is not verifiable; another is, definitional…
The hold-out method is a way of testing a model’s probable performance but it presupposes that the training data are representative of the population. You can slice up the data any way you please as long as the partitions remain representative. Training/testing with unrepresentative samples invalidates the procedure. All modelling makes this presupposition as well.
Now, if the data changes over time, such that samples taken now are possibly unrepresentative of samples taken after a time lapse, how long is enough time to test the model?
DAV, perhaps you were thinking of Boxer in “Animal Farm” whose constant complaint was, “I don’t understand.”
I’m with Mike B on this. You have redefined ‘model’ to mean any logical statement which allows you to be cute, but provides no insight at all.
To call something a model must imply that there is some reality and that you are attempting to approximate that reality. This approximation can be based on a statistical model, or on fundamental physics, or a mixture of both. But this does not have anything to do with axiom based mathematical or logical systems. No model in this sense is ‘true’ since reality is always more complex than can be encapsulated in any model system. Box was therefore correct.
(PS. Mike D., I am indeed a climate modeller, and have never suggested otherwise. You are confusing a particular experimental methodology for making short term predictions with climate modelling in general. I am not involved in the former, but plenty involved in the latter).
Gavin correctly notes that models are by nature synthetic a posteriori. Briggs incorrectly includes analytic a priori statements in his definition of models.
Gavin, Mike B,
See the original post for an update. The questions are important enough to be part of the main post.
“While there might be plenty of practical shades of use and definition, there is no logical difference between a theory and a model. The only distinctions that can be drawn certainly are between mathematical and empirical theorems. In math, axioms—which are propositions assumed without evidence to be true—enable strings of deductions to follow. Mathematical theories are these deductions, they are tautologies and, thus, are true.” Wm. Briggs
You are now arguing that the assertion “all models are wrong” is an empirical conclusion which has been reached prematurely — until all models have been tested, the conclusion is not proved true and must be treated as false.
I would suggest instead that “all models are wrong” as Box intends is a tautology. By definition a model is distinct from reality — wrong — in ways that are intended to be useful. Sometimes the usefulness fails. But if here the term “model” refers to the conclusion about all possible conclusions, then the statement is self-referential and can’t be either true or false.
POUNCER,
Sorry; in a hurry.
Not empirical at all. Take a look at the update.
William
While your demonstration is correct, and indeed the statement taken at its face value (which is what one should always do with scientific statements) is wrong, I believe that there is and has always been a SEVERE problem of vocabulary when the talk is about climate models.
I will try in the following to dispel many common misconceptions and show what the climate “models” actually do or do not do.
1) Where a climate model says prediction, you must understand simulation
2) Where a climate model says experiment, you must understand computer run
3) Where a climate model says skill, you must understand consistency
4) Where a climate model says projection, you must understand a simulation under imposed constraints and prescribed scenarios
The biggest misconception is that climate models “solve” equations. They don’t, by even the most liberal use of the word solve. That’s why they don’t need accurate initial and boundary conditions, and that’s why they cannot BY DESIGN make any deterministic predictions. What they do instead is to obey the conservation laws (mass, energy and momentum) within a given spatio-temporal discrete grid. Actually they even struggle with that and use some tricks (e.g. the unphysical viscous dissipation dear to G. Browning) to avoid divergences or amusing oddities like negative ice concentrations.
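As a toy illustration of what “obeying a conservation law on a discrete grid” means (a one-dimensional cartoon of my own, not code from any actual climate model): the update is written in flux form, so whatever leaves one cell enters a neighbour, and the grid total is conserved to rounding error however crude the physics inside the flux.

```python
# Toy flux-form update on a periodic 1-D grid: the total is conserved by construction.
import numpy as np

n_cells = 50
q = np.exp(-0.5 * ((np.arange(n_cells) - 10) / 3.0) ** 2)   # some quantity per cell
total_before = q.sum()

velocity, dt_over_dx = 1.0, 0.4          # arbitrary values, Courant number < 1
for _ in range(100):
    flux = velocity * q                   # crude upwind flux leaving each cell
    q = q + dt_over_dx * (np.roll(flux, 1) - flux)   # gains from neighbour minus losses

print(f"total before: {total_before:.12f}")
print(f"total after : {q.sum():.12f}")    # identical up to floating-point rounding
```

However wrong the flux formula, the bookkeeping guarantees conservation; getting the fluxes themselves right is a separate matter entirely.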
The second misconception is that they all use the same physics. They don’t. Some are ocean-atmosphere coupled and some are not. Some have atmospheric chemistry inside and some have not. Some are coupled to the carbon cycle and some are not. Some have high resolution (small grid cells) and some have very low resolution. Well yes, all try to conserve energy, but that’s the very least one would expect 🙂
The third misconception is that they are complex. They are not. Any student with a knowledge of fluid mechanics and thermodynamics would understand the physical principles in 2 months. The computational side and the code are of course vast, but that’s not physics or mathematics. However, as I will show below, this doesn’t imply that the people understand why the model does what it does. Actually they don’t, really.
The fourth misconception is that they can be easily “tuned”. They cannot. It’s mostly because they are just based on primitive conservation laws on a discrete grid. There’s not much room for “tuning” there. What plays the “tuning” role is the subgrid parametrization and coupling. Subgrid parametrization is the alpha and omega of climate models. Because many relevant physical processes happen at smaller scales than the grid, they are represented by specific ad hoc formulas. Of course experiments (understand: runs) are done to check what different parametrizations do, so it is a kind of “tuning”, but once it’s done it normally doesn’t change.
So now, what is all this story about verifying “predictive skills”?
Well, if you use my dictionary above, it translates to “simulation consistency” 🙂
And it is indeed exactly that.
As I said, the people do not understand why, in detail, the model does what it does. That’s why they understand even less why different models have very different behaviours.
And the very ambitious goal is to try to analyze what is the reason for the differences.
A simple mind like yours or mine would tell them that the models give different results because they use very different physics and very different resolutions.
But they have invented a very special statistical concept of “ensemble mean” that I find absurd and that would deserve a special post.
So they hope that by doing a bazillion runs with different models they will find the Holy Grail: the separation of “internal” and “external” variabilities.
This separation doesn’t make much sense either, especially when dealing with chaotic systems that are never in equilibrium, but well, I guess climate science has never heard about out-of-equilibrium systems.
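For what that bookkeeping looks like, here is a minimal sketch with entirely made-up numbers (not how any modelling group actually does it): several runs of several toy “models”, their multi-model mean, the run-to-run spread within a model (read as “internal” variability) and the spread between model means.

```python
# Toy multi-model ensemble: ensemble mean plus within-model and between-model spread.
import numpy as np

rng = np.random.default_rng(2)
n_models, n_runs, n_years = 5, 10, 30

# Each toy model gets its own offset; every model shares a common trend;
# each run adds its own noise. All values are invented for illustration.
offsets = rng.normal(0.0, 0.3, size=(n_models, 1, 1))
trend = 0.02 * np.arange(n_years)
runs = offsets + trend + rng.normal(0.0, 0.1, size=(n_models, n_runs, n_years))

ensemble_mean = runs.mean(axis=(0, 1))                       # multi-model mean series
internal_spread = runs.std(axis=1).mean()                    # run-to-run, within a model
between_model_spread = runs.mean(axis=1).std(axis=0).mean()  # between model means

print(f"ensemble mean, last year:          {ensemble_mean[-1]:.2f}")
print(f"mean internal (run-to-run) spread: {internal_spread:.2f}")
print(f"mean between-model spread:         {between_model_spread:.2f}")
```

The arithmetic is trivial; the contentious part, as argued above, is whether averaging structurally different models and labelling the two spreads “internal” and “external” variability means anything for a system that is never in equilibrium.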
In any case all that will provide work for, I estimate, several hundred people for 3 years.
Not counting as many workshops and meetings as possible in nice and entertaining places.