Not all models are false. So much we have discussed before.
To say a model is false is to show that one of the premises which comprise the model is false. There are thus certainly many false models. But then there are many true ones. Casinos positively rely on them, and make, as my sister would say, beaucoup bucks doing so. It would be too tedious to rehearse the premises of dice throws and the like. Regular readers will know them by heart. But for the freshly minted, this series is pretty good at showing some true models.
In order to say a thing is false we must have proof in the form of a logically valid argument. It is no good whatsoever to say, “Ah, the model can’t be true.” This isn’t a valid argument, even though, stripped down, it is the most popular one.
One such proof would be in the form, “The third premise in the model is false because of this true condition.” If such proof isn’t forthcoming, we don’t know the model is false. And some models we can even prove are true, as just said. These start with accepted premises and move to a conclusion the probability of which can be deduced. This probability can even be 1, as in mathematical theorems.
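For the freshly minted, here is a minimal sketch of what "deducing a probability from accepted premises" can look like for dice. The premises are my assumptions for the illustration (two six-sided dice, exactly one side of each shows, and nothing in the premises favors any side), and the helper name deduced_probability is hypothetical. Given those premises, the probability of a proposition is just a count.

```python
from fractions import Fraction
from itertools import product


def deduced_probability(proposition, sides=6, dice=2):
    """Probability of `proposition` deduced from the assumed premises.

    Assumed premises (illustration only): `dice` dice, each with sides
    numbered 1..sides, exactly one side of each shows, and the premises
    give no reason to favor any side over another.
    """
    outcomes = list(product(range(1, sides + 1), repeat=dice))
    favorable = sum(1 for outcome in outcomes if proposition(outcome))
    return Fraction(favorable, len(outcomes))


# C = "the two dice sum to 7"
p_seven = deduced_probability(lambda outcome: sum(outcome) == 7)

# C = "the two dice sum to 12 or less", which the premises make certain
p_at_most_12 = deduced_probability(lambda outcome: sum(outcome) <= 12)

print(p_seven)       # 1/6
print(p_at_most_12)  # 1
```

Change the premises (say, a die with a missing face) and the deduced probability changes with them. The model is true relative to its premises, not in spite of them.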
Mathematicians, even of the statistical variety, are comfortable with proof and use it often. But not when it comes to saying this or that model is false.
A true model is not falsified when it says, “The probability of C is X” where C is some proposition and X is some number strictly between 0 and 1, because no observation of C or not-C can ever falsify that model. If a model says “The probability of C is 0” or 1, or these numbers sit at its boundaries, then that model can be falsified if it says C is impossible but C obtains. But it cannot always be falsified if it says C is certain and we don’t see C, unless C carries other conditions, such as a time by which it must occur.
A model cannot be falsified if it says “The probability of C is exceedingly small” and then C is later found to be true, because the model did not say C was impossible. It is here where many mistakes are made.
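The falsification rule in the last two paragraphs is small enough to write down. This is only a sketch of the logic as I have stated it, with a hypothetical function name. The observation is True when C was seen, False when not-C was definitively established (for example, a deadline built into C has passed), and None when we simply have not seen C yet.

```python
from typing import Optional


def is_falsified(stated_probability: float, c_observed: Optional[bool]) -> bool:
    """Return True only when an observation logically contradicts the model.

    c_observed: True  -> C was observed
                False -> not-C was definitively established
                None  -> nothing decisive observed (C merely not seen yet)
    """
    if c_observed is None:
        # Not seeing C is not the same as seeing not-C.
        return False
    if stated_probability == 0 and c_observed:
        # The model said C was impossible, yet C obtained.
        return True
    if stated_probability == 1 and not c_observed:
        # The model said C was certain, yet not-C was established.
        return True
    # Any probability strictly between 0 and 1 survives either outcome.
    return False


print(is_falsified(0.0, True))    # True
print(is_falsified(1e-12, True))  # False: "exceedingly small" is not "impossible"
print(is_falsified(1.0, None))    # False: C not yet seen, nothing established
print(is_falsified(1.0, False))   # True
```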
Now many, many models statisticians use are false. Every “normal” model is, which includes all regressions and so forth, and these comprise the bulk of workaday models. Every normal model eventually says, “The probability of C is 0” where C is almost any proposition, and where C’s “pop up” with regularity. For example, in modeling temperature a normal is often used, and this model must say, “The probability the temperature will be T is 0” for any T. Yet we will certainly see some T, and when we do we have immediately falsified the model.
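To see the temperature claim in numbers, here is a small sketch using Python's statistics.NormalDist. The central value of 20 and spread of 5 are invented for the illustration, as is the observed temperature of 21.3. The model gives positive probability to any interval around T, but that probability shrinks with the interval and is exactly 0 for T itself.

```python
from statistics import NormalDist

# Hypothetical normal model for temperature (parameters are made up).
model = NormalDist(mu=20.0, sigma=5.0)


def prob_within(t, half_width):
    """Model probability that temperature falls in [t - half_width, t + half_width]."""
    return model.cdf(t + half_width) - model.cdf(t - half_width)


observed_t = 21.3  # an invented observation
half_widths = [1.0, 0.1, 0.001, 0.0]
probs = [prob_within(observed_t, h) for h in half_widths]

for h, p in zip(half_widths, probs):
    print(f"Pr(T within {h} of {observed_t}) = {p:.6f}")
# The last line, with half-width 0, is the probability of T exactly: 0.
```

Whatever value the thermometer reports, the model said beforehand that seeing exactly that value had probability 0.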
Normal and other “continuous” models incorporate mathematical infinities yet are used on finite realities, so they are always false. This may be where the perception that “all” models are false originated.
The second portion of Box’s quip is that false models can be useful. Here is where it gets interesting. Take any normal model which is used in practice. Give its validity any thought and it will be realized the model is false. But it will still be used. Mostly because of inertia or custom, but also because of the sense that the model is “close enough.”
It is that unquantified “close enough” which is fascinating. The average user of statistical models never considers it, at least not seriously. And the statistician is usually satisfied if his math works out or if his simulations are pretty (not worrying that many of these run dangerously close to, or actually are, instances of circular logic).
It is true that sometimes “close enough” is indeed close enough. But since many models don’t get real checking on truly independent evidence, many times “close” isn’t even in the general vicinity.
This is why predictive inference is such a good idea. I often give examples using regression which show both classical frequentist and Bayesian results are “good” in the sense indicated in those theories. But when the model is used in its real-life sense—i.e. its predictive mode, making probability statements about the same observables that were modeled, which after all is the reason for the model in the first place—then it becomes glaringly obvious the models stink worse than a skunk on the side of the road in August.
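Until those examples arrive, here is a rough sketch of the contrast, not one of my worked examples. Everything in it is assumed for illustration: the data are simulated (true slope 0.1, noise standard deviation 1, 5000 points, fixed seed), and the predictive probability is a plug-in approximation that ignores uncertainty in the fitted parameters. The slope comes out many standard errors from zero, which classical theory calls "good." The predictive question is different: given one new observation at x = +1 and another at x = -1, what is the probability the first is larger?

```python
import random
from math import sqrt
from statistics import NormalDist, mean

# Simulated data: all values here are assumptions for illustration.
random.seed(1)
n = 5000
true_slope = 0.1
x = [random.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + random.gauss(0, 1) for xi in x]

# Ordinary least squares for a single predictor, computed by hand.
xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar

# Residual standard deviation and the usual t statistic for the slope.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
resid_sd = sqrt(sum(r * r for r in residuals) / (n - 2))
t_stat = slope / (resid_sd / sqrt(sxx))

# Plug-in predictive probability that a new Y at x = +1 exceeds a new Y
# at x = -1. The difference of the two new observations is modeled as
# normal with mean slope * 2 and standard deviation resid_sd * sqrt(2).
p_higher = NormalDist().cdf(slope * 2 / (resid_sd * sqrt(2)))

print(f"t statistic for slope: {t_stat:.1f}")
print(f"Pr(Y at x=+1 > Y at x=-1): {p_higher:.3f}")
```

With the parameters assumed in the simulation, the theoretical value of that predictive probability is roughly 0.56, barely better than a coin flip, however impressive the t statistic looks.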
Examples to come!