We have in our hands (via the predictive posterior) the probabilities that Susy’s GPA is less than any number between 0 and 4, given that we accept the normal model and our assumption that the sample we chose is probative. At the end, Susy will have a definite GPA. Suppose she is a star and earned straight As: a GPA of 4 (this happens). We ignore the falsification of the normal model when it said the probability for this event is 0. We instead want to judge the model by other criteria.

Suppose the model said that the probability of Susy’s GPA being 4 or less was 95% (I’ve seen real data that gives answers like this). The probability leakage, the 5% assigned to impossible GPAs above 4, is therefore not insignificant: the probability (given our other knowledge) should be 100%.
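To make the leakage concrete, here is a minimal sketch using Python's standard library. The mean of 3.0 and standard deviation of 0.608 are hypothetical numbers, picked only so that the normal predictive distribution puts about 95% of its probability at or below 4; they do not come from any real sample.

```python
from statistics import NormalDist

def leakage(mean, sd, low=0.0, high=4.0):
    """Probability a normal predictive distribution assigns to
    impossible GPAs: those below `low` and those above `high`."""
    dist = NormalDist(mean, sd)
    below = dist.cdf(low)          # mass on GPAs less than 0
    above = 1.0 - dist.cdf(high)   # mass on GPAs greater than 4
    return below, above

# Hypothetical predictive distribution (assumed numbers, not real data).
MEAN, SD = 3.0, 0.608
below, above = leakage(MEAN, SD)

print(f"P(GPA < 0) = {below:.2e}")
print(f"P(GPA > 4) = {above:.3f}")
```

With these assumed numbers nearly all of the leakage sits above 4, which is the situation described in the text.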

Ignoring the two big problems (leakage and probability 0 for a real event), how did the model do? Hard to say. Our model said the probability of Susy’s GPA being greater than 4 was 5%. If we had another model (who knows whence it came) that said that this probability was 20%, would we say that this model was better? Did we get anybody to make a bet with us on our model? We are back to the problem we had in Part I. Performance is in part a subjective measure, and where it is, logic and probability have nothing to say. We only have one model so we can’t even look at how likely this model is to be true given Susy’s observation, as we did with the E_{i} after we saw that C (George wears a hat) was true.

We do have another model, the one at the end of Part III; the finite, non-parametric, deduced model. That model would have given us a definite probability of Susy’s GPA equaling 4. And since that model is known to be true given the evidence we had, and assuming our sample was probative, we knew before we saw Susy’s GPA that the model was going to perform well. But we’re ignoring this model for the purposes of discussion.

Seems we’re stuck with just the prediction and with whatever consequences we had from the one-time use of the model. But if we’re going to use this model to predict not just Susy’s, but James’s, Katie’s, and Harry’s GPA, for a whole group of freshmen, then we have more tools at our disposal. There is a suite of measures that we can apply to the model’s predictions: we’d look for calibration, sharpness, maybe skill. We’d know that if we took a model and calibrated it, then the calibrated version would be better than its non-calibrated version no matter what decision a person made using the model. Etc., etc. All these have technical definitions which aren’t of interest to us today (but see tomorrow).
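For the curious, here is a rough sketch of what such measures can look like for a yes/no event, say "this freshman's GPA is at least 3". The ten predictions and outcomes are invented for illustration. The Brier score as the measure of error, and the sample base rate as the reference forecast for skill, are common choices that I am assuming here; they are not definitions taken from this series.

```python
from collections import defaultdict

def brier_score(probs, outcomes):
    """Mean squared difference between stated probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def skill_score(probs, reference_probs, outcomes):
    """1 - BS_model / BS_reference: positive means the model beats the reference."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(reference_probs, outcomes)

def calibration_table(probs, outcomes):
    """For each distinct stated probability, the observed relative frequency."""
    groups = defaultdict(list)
    for p, y in zip(probs, outcomes):
        groups[p].append(y)
    return {p: sum(ys) / len(ys) for p, ys in sorted(groups.items())}

# Hypothetical predictions for ten freshmen and what actually happened.
probs    = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
outcomes = [1,   1,   1,   1,   0,   0,   0,   0,   0,   1]

# Reference forecast: always state the base rate seen in this group.
base_rate = sum(outcomes) / len(outcomes)
reference = [base_rate] * len(outcomes)

bs = brier_score(probs, outcomes)
skill = skill_score(probs, reference, outcomes)
table = calibration_table(probs, outcomes)

print(f"Brier score: {bs:.3f}")
print(f"Skill versus base rate: {skill:.3f}")
for stated, observed in table.items():
    print(f"said {stated:.1f}, happened {observed:.1f} of the time")
```

A calibrated model would have the stated and observed columns agree. Here the model says 0.9 and 0.1 where the frequencies are 0.8 and 0.2, so it is somewhat overconfident, yet it still shows positive skill against the base rate.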

The point is that we can take these performance measures and ask another probability question: what are the chances that this model will perform as well or as poorly in the future? And we’d know how to answer this question. Yes, we’d have another model, but this one quantifying our uncertainty about our first model’s performance. It all works.

And if we had a suite of original models, each vying for the title of best predictor of GPAs, we’d know how to calculate the probability that each is the true model. We’d know how to combine the models to create one overall prediction, too. We’d also be able to say how probative our sample was and if another sample was better. That is, if we adopt the predictive stance advocated here, life would be good, or at least straightforward.
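A minimal sketch of that calculation, assuming a suite of just two hypothetical models, A and B, with equal prior probability, and assuming one of them is true. The numbers are made up: each "likelihood" is the probability the model gave to what was actually observed, and each "next prediction" is the probability it gives to some new event.

```python
def posterior_over_models(priors, likelihoods):
    """Bayes' theorem over a finite suite of models:
    P(model | data) is proportional to P(model) * P(data | model)."""
    joint = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

def combined_prediction(posterior, predictions):
    """Weight each model's prediction by the probability the model is true."""
    return sum(posterior[name] * predictions[name] for name in posterior)

# Hypothetical suite (assumed numbers, for illustration only).
priors = {"A": 0.5, "B": 0.5}
likelihoods = {"A": 0.05, "B": 0.20}        # probability each gave the observation
next_predictions = {"A": 0.30, "B": 0.60}   # probability each gives a new event

posterior = posterior_over_models(priors, likelihoods)
combined = combined_prediction(posterior, next_predictions)

print(posterior)
print(f"combined prediction: {combined:.2f}")
```

The posterior probabilities are conditional on the suite: they say how likely each model is given that one of the listed models is true, and nothing about models we never wrote down.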

But before we leave, there is one more avenue we haven’t explored. Some may be dissatisfied with the idea of a “true model.” I showed some examples where a true model was deduced given certain facts. The discrete, finite urn model was deduced and was true, given our knowledge of an urn with N dichotomous objects. This model said there was a certain probability that there would be a certain number of 1s in the urn, etc.
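As a reminder of what such a model looks like in practice, here is a sketch under stated assumptions: an urn of N objects, each a 0 or a 1, from which n are drawn without replacement and k turn out to be 1s. I assume every possible count of 1s starts out equally probable; that starting point is my assumption for this sketch and may differ in detail from the deduction given earlier in the series.

```python
from math import comb

def urn_posterior(N, n, k):
    """P(urn holds M ones | k ones seen in n draws without replacement),
    for M = 0..N, assuming all counts M are equally probable beforehand."""
    weights = []
    for M in range(N + 1):
        # Hypergeometric likelihood up to a constant factor;
        # comb returns 0 for counts that contradict the sample.
        weights.append(comb(M, k) * comb(N - M, n - k))
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical urn: 10 objects, 4 drawn, 3 of them 1s.
post = urn_posterior(N=10, n=4, k=3)
for M, p in enumerate(post):
    print(f"P({M} ones in urn) = {p:.3f}")

# Once every object has been seen, nothing is left uncertain.
full = urn_posterior(N=10, n=10, k=7)
```

Counts that contradict the sample get probability 0, and when the whole urn has been observed the model puts all its probability on the actual count.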

But wouldn’t a *true* model, a truly true model, have told us that there was probability 0 for all possibilities except for the actual event, i.e. for the urn with exactly the actual number of 1s and 0s? Wouldn’t a true model tell us there was 100% probability of Susy’s GPA being exactly what it turned out to be? Don’t true models predict perfectly? Before I answer that, make sure you understand that this series was not limited to textbook probability and statistics models. It was for any model—physical, mathematical, probabilistic—where we infer conclusions from given assumptions. Everything here goes for physical models, for climate models, for mathematical theorems, just as it works for probability models.

We can say that a truly true model predicts perfectly, and that if a model is anything short of perfection it is not true. Fine, if you like: it is just a definition. But we still have to have words to describe the multitude of models that do not predict perfectly, which is every model short of mathematical theorems. Even if the universe is deterministic and perfectly perfect models can exist, we humans have imperfect knowledge and must make predictions in the face of our uncertainty.

In those cases where we have fully and explicitly delineated all the information that we have (like the urn, but not like Susy’s GPA), there is a limit on the model: one will be best and true, and others will be weaker and not true. We can’t always delineate all information, as we couldn’t in Susy’s GPA example, but only because we had doubts about the sample: the structure of the finite possible GPAs was not uncertain. For these cases, it always remains a possibility that a better model exists (where by better we mean in the sense outlined above: calibration, etc.), but only because we get better at delineating the proper probative information.

**Update** How this all relates to climate models is coming tomorrow!
