All Models Are Not Wrong
George Box said, “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.”
This is usually misremembered as the equivalent, and pithier, “All models are wrong, but some are useful.”
Both versions are false. This is because the premise “All models are wrong” is false: all models are not false. Here is why.
First, readers should keep in mind that what follows is a philosophical argument. My burden is solely to show that there exists at least one model that is true. Whether the model I show is useful or interesting to you or not is irrelevant.
In math, if somebody starts a theorem with the statement, “For all x…” and later comes along a fellow who shows an x that, when plugged into the theorem, ends up in a different place than predicted by the theorem’s conclusion, then that theorem has been invalidated. In simpler words, it is false.
It might be that there is only one such x that invalidates the theorem, and that for all other x the theorem holds. But you cannot say, “The theorem is practically true.” Saying a theorem, or an argument, is practically true is like saying, “She is practically a virgin.”
But all that is just boilerplate, because it’s going to turn out that many models are true and useful.
Suppose Model A states, “X will occur with a probability that is greater than 0 and less than 1.” And let Model B state that “X will occur”, which of course is equivalent to “X will occur with probability 1” (I’m using “probability 1” in its plain-English, and not measure-theoretic, sense).
Now, if X does not occur, Model B has been proved false. The popular way to say it is that Model B has been falsified. If X does occur, Model B has been proved true. It has been truified, if you like.
How about Model A? No matter if X occurs or not, Model A has not been falsified or truified. It is impossible for Model A to be falsified or truified.
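The contrast between the two models can be put in a few lines of Python. This is only a sketch: the function name `verdict` and the choice to encode a model as the single probability it gives to X are my illustrative assumptions (Model A really covers every value strictly between 0 and 1).

```python
def verdict(p_x, x_occurred):
    """Say what one observation of X proves about a model.

    p_x is the probability the model assigns to X (an illustrative
    encoding, not part of the original argument).
    """
    # Probability the model gave to what was actually observed.
    p_observed = p_x if x_occurred else 1 - p_x
    if p_observed == 0:
        return "falsified"   # the model said this could not happen
    if p_observed == 1:
        return "truified"    # the model said this must happen
    return "neither"         # the model allowed both outcomes


# Model B: "X will occur", i.e. probability 1.
print(verdict(1, True), verdict(1, False))
# Model A: any probability strictly between 0 and 1, e.g. 0.3.
print(verdict(0.3, True), verdict(0.3, False))
```

Whatever value strictly between 0 and 1 is plugged in for Model A, both outcomes come back as "neither".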
It is a logical fact that if you cannot prove something true, you also cannot prove it false. We cannot prove Model A is true, therefore we also cannot prove it is false. Box’s premise assumes we have proof that all models are false. But here is a model which is not known to be true or known to be false.
Therefore, Box was wrong. We do not know that all models are false.
Many will object that I only allowed X to have one chance to occur. And the more training you have had in science, the stronger the temptation to ask for more chances becomes: frequentist statisticians will insist on there being infinite chances for X, which requires that X be embedded in some uniquely definable sequence.
Resist this urge! There are many unique events which we can model and predict. Example: Hillary Clinton wins the 2012 presidential election. There is only one Hillary Clinton and she only has one chance to win the 2012 election (if you don’t like that, I challenge you to embed this event into a uniquely defined infinite sequence, along with proof that your sequence is the correct one).
For fun, let’s grant X more chances to occur. First know that all probability models fit the schema of Model A. (Note: there is no difference between probability and statistics models; the apparent division is artificial; however, you do not need to believe that to follow my argument. There is also no difference between probability and mathematical or physical models: models are models; you don’t have to believe this yet either.)
But it doesn’t matter how many times a probability model makes a prediction for X: it can never be proved true or be proved false, except in one case. Excepting this case, giving X more chances doesn’t change the conclusion that Box was wrong.
The exception is that a probability model, or any other kind of model, can be proved false if you can prove that the premises of that model are false. Sometimes this is easily done. For example (see the post from two days ago), we know that grade point average can only live on 0 to 4, but we use a normal distribution (premise) to model it. We know, given the properties of normal distributions, that this model is false. Yet the model might still be a useful approximation (what happens here is that we internally change the model to an approximate version, which can be true).
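The grade point example can be checked directly. A normal distribution puts positive probability on every interval of the real line, including values below 0 and above 4. Here is a small sketch using Python's standard library; the mean of 3.0 and standard deviation of 0.5 are made-up numbers for illustration only.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only to illustrate the point.
gpa_model = NormalDist(mu=3.0, sigma=0.5)

# Probability the model assigns to grade points that cannot exist.
p_below = gpa_model.cdf(0.0)         # P(GPA < 0)
p_above = 1.0 - gpa_model.cdf(4.0)   # P(GPA > 4)
p_impossible = p_below + p_above

print(f"P(GPA < 0) = {p_below:.3g}")
print(f"P(GPA > 4) = {p_above:.3g}")
print(f"Probability on impossible values = {p_impossible:.3g}")
```

The total is small but not zero, and that is enough to show the normal premise is false here, whatever parameters are chosen.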
And some probability models have true premises. Casinos operate under this belief. So does every statistician who has ever done a simulation: what is that but a true model? Physicists are ever after true models. And so on.
If Box had said, “Most models are wrong, but some are useful,” or, “All the models that I know of are wrong, but some are useful,” then what he said would have been uncontroversial. I suggest that it is these modifications that most people actually hear when presented with Box’s “theorem.” Since it’s mostly true (to statisticians playing with normal distributions, anyway), they feel it’s OK to call it completely true. However, they never would accept that lack of rigor in any of their own work.
That’s all we can do in 800 words, folks. We’ll surely come back to this topic.
Update: The model for a unique-physical-measurable event has been objected to on the grounds of simplicity. These kinds of models do not have to be accepted for the main argument to be true (though I believe they are sufficient).
First recall that simulations are true models all. To add the premise “and this simulation is meant to represent a certain physical system” might make the model false. This has been discussed in Bernardo and Smith in depth (open versus closed universes, etc.), and I ask the reader to rely on those gentlemen for particulars.
However, even these extensions are not needed. Consider this: to say “all” models are false implies that a rigorous proof of this exists. So if I have a Probability Model A (for, let us suppose, a non-unique, physical, measurable, observable event), you cannot, in finite time, prove that this model is false. No observation can do so, nor can any collection of observations.
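One way to see this: if a model gives X a probability strictly between 0 and 1 on each trial, then every finite record of outcomes has positive probability under the model, so no record contradicts it. The sketch below assumes independent trials with the same probability each time, which is my simplification and not something the argument requires. The function name is hypothetical, and exact fractions are used so that tiny probabilities do not round down to zero.

```python
from fractions import Fraction


def sequence_probability(p, outcomes):
    """Probability the model gives to an observed run of outcomes.

    Assumes (for illustration) independent trials, each with the
    same probability p that X occurs.
    """
    prob = Fraction(1)
    for occurred in outcomes:
        prob *= p if occurred else 1 - p
    return prob


# A model that says X is rare, then 500 occurrences of X in a row.
p = Fraction(1, 100)
record = [True] * 500
print(sequence_probability(p, record) > 0)
```

The record is wildly improbable under the model, but improbable is not impossible, and only an impossibility would be a proof.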
The best you might do is to say “Model A is probably false.” But to say “probably” means to admit that a proof of its falsity does not exist. And again, to say “practically false” is equivalent to saying “practically a virgin.”
So again, not all models are false. And all probability models are not—or, that is, they cannot be proven so.
Neither can, it might interest the reader to learn, climate models. They cannot be proven false, even if their predictions do not match observations. This is why we ask for a model to demonstrate skill (see the prior post).