The models in Part I might not have “felt right” to you. But if so, that is because your diet of probability examples has been too narrow. This is natural if you’ve received the regular course of statistical training, which consists of meals of data cooked with fixed recipes but with little chemistry (to stretch a metaphor past the breaking point) about where the recipes come from.
Recall that we wanted to know the truth of the proposition C = “George wears a hat.” We can’t know the truth of this, or any, proposition except with respect to some evidence. We had three possible models, or three sets of evidence, and these gave us three different probabilities that C was true (that they differed was coincidental; they could all have given the same probability).
But where did the models (the evidence) come from? How did they arise? I made them up. But that is nothing: the rules of probability work regardless of the provenance of the evidence or models. Logic and probability are only concerned with the connections between statements, not with the statements themselves. This is often forgotten.
Most day-to-day statistical models deal with “data” that are collected in experiments or incidentally. These data are modeled, which is a dangerous shorthand for saying that our uncertainty in the values of certain propositions related to the data is quantified given some evidence (like our E). It’s dangerous because the shorthand can be, and sometimes is, used to reify the models. People say “X is normally distributed,” which is a reification of the proper sentence, “Our uncertainty in the values of X is characterized by a normal distribution.” X is caused to take the values it did. Normal distributions do not cause anything.
Even in cases where reification is not at issue, the so-called normal model is used. Where did it come from? Well, we can always use any model we want, just as when we made up the Martian syllogism. If, for instance, our M(eta evidence) = “The normal model is true,” then (circularly, but still validly) the normal model is true. Because of habit and ease this M is used more than any other. This does not mean that EN is always the best or most useful, or that, given some other M, EN is even probable.
It is often possible to deduce a true model from simpler facts known to be true. Suppose we know that there are N balls in an urn. It’s always balls in urns; but use your imagination to substitute other examples. The N balls can be labeled only “0” or “1” (red or blue, success or failure, etc., etc.). We do not know how many of the N are 0 and how many 1; it could be that none are 0 or none are 1.
The proposition of interest (which I am making up, as we make up all propositions of interest) is C = “There are M 1s,” where I will substitute a number between 0 and N for M. I could instead choose, say, M = 1000 > N, but then I would be able to deduce, given the evidence, that C is false. Suppose I wanted C = “There are no 1s” (i.e., M = 0).
Given the evidence we have (and accepting no other: a key proscription to remember), we use the statistical syllogism and deduce that the probability C is true is 1/(N+1). Perhaps this is intuitive: there are N balls and thus N + 1 possible counts of 1s (none, just one, just two, and so on up to all N), and nothing in the evidence favors one count over another. This model is the true model given the evidence that there are N balls and that each can be labeled only 1 or 0.
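A minimal sketch of this deduction in Python, assuming only the evidence stated above (N balls, each labeled 0 or 1). The function name `prob_count_of_ones` and the urn size N = 10 are hypothetical, chosen for illustration. Exact fractions are used so that 1/(N+1) appears as written rather than as a rounded decimal, and a count outside 0..N gets probability 0, as in the M = 1000 case.

```python
from fractions import Fraction


def prob_count_of_ones(m: int, n_balls: int) -> Fraction:
    """Probability of C = "There are m 1s" given only the evidence that
    there are n_balls balls, each labeled only 0 or 1.

    The statistical syllogism gives each of the n_balls + 1 possible counts
    0, 1, ..., n_balls the same probability; any other count is impossible
    given the evidence, so its probability is 0.
    """
    if 0 <= m <= n_balls:
        return Fraction(1, n_balls + 1)
    return Fraction(0)


N = 10  # hypothetical urn size, for illustration only
print(prob_count_of_ones(0, N))     # C = "There are no 1s": prints 1/11
print(prob_count_of_ones(1000, N))  # M = 1000 > N, so C is false: prints 0
print(sum(prob_count_of_ones(m, N) for m in range(N + 1)))  # prints 1
```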
Once we are comfortable writing out the full evidence and probability statements we can use simplifications, like saying “the probability M = 1 is 1/(N+1).” Or we might say that “M has a uniform distribution” (a discrete uniform over 0, 1, …, N). For now, we stick with the original language, which is cumbrous but ever accurate.
Now suppose we take out n = n1 + n0 < N balls and notice that n1 are labeled 1 and n0 are labeled 0. This new evidence can and should modify our model of the remaining N − n balls. There’s some math involved (see this paper), but the deduced model of our uncertainty in the number of 1s left is called a negative hypergeometric or beta-binomial. If we take out still more balls, leaving some in the urn, our uncertainty in the number of 1s among the remainder is still characterized by the same model (updated to account for the labels on the newly drawn balls).
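The claim that the update lands on a beta-binomial can be checked by brute force. The sketch below assumes the uniform 1/(N+1) starting model from above, weights each possible total count of 1s by the (hypergeometric) probability of the observed draw, and compares the result with a beta-binomial over the R = N − n remaining balls with parameters n1 + 1 and n0 + 1. Those parameter values are what this particular starting model yields; the function names and the example counts are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def remaining_ones_by_enumeration(N, n1, n0):
    """P(k 1s remain in the urn), for k = 0, ..., N - n, given that the urn
    held N balls labeled 0 or 1 and a draw of n = n1 + n0 balls showed n1
    labeled 1 and n0 labeled 0.

    Starts from the uniform 1/(N+1) over the total number of 1s M and
    weights each M by the hypergeometric probability of the observed draw.
    """
    n = n1 + n0
    R = N - n  # balls still in the urn
    weights = []
    for k in range(R + 1):
        M = k + n1  # total number of 1s implied by k remaining 1s
        prior = Fraction(1, N + 1)
        likelihood = Fraction(comb(M, n1) * comb(N - M, n0), comb(N, n))
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def beta_fn(x, y):
    """Exact Beta function B(x, y) for positive integers x, y."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))


def beta_binomial_pmf(k, R, a, b):
    """Beta-binomial probability of k out of R, for integers a, b >= 1."""
    return comb(R, k) * beta_fn(k + a, R - k + b) / beta_fn(a, b)


# Hypothetical urn: 20 balls; 8 drawn, 3 labeled 1 and 5 labeled 0.
N, n1, n0 = 20, 3, 5
R = N - n1 - n0
by_enumeration = remaining_ones_by_enumeration(N, n1, n0)
closed_form = [beta_binomial_pmf(k, R, n1 + 1, n0 + 1) for k in range(R + 1)]
print(by_enumeration == closed_form)  # True
```

Taking out more balls just means calling the same function with the larger cumulative counts. Drawing every ball, e.g. `remaining_ones_by_enumeration(5, 2, 3)`, leaves a single outcome, k = 0, with probability 1.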
Once we have taken all the balls out, the probability that any remaining ones are labeled 1 or 0 is 0: this is deduced given our model. We obviously no longer need the model for future use, just as we no longer needed a model after we saw George wearing a hat.
Incidentally, those familiar with statistical lingo will note the complete absence of (continuously valued) parameters, the little Greek letters that are usually necessary to fully specify a model. Parameters weren’t needed in the Martian hat example either. We don’t need them because everything is written in terms of observable evidence. Interestingly, the urn example can show us how parameters arise.
If we let N grow large, towards the limit, then the distribution which characterizes our uncertainty in the number of remaining 1s is still beta-binomial, but suddenly parameters are present which take the place of observational evidence. We could have just said “N will be large” and used the beta-binomial with continuous-valued parameters from the start, by taking “priors” on the parameters and so forth, but these actions would be an approximation to the model we deduced as true when N was finite, and N will always be finite for real-world examples.
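Written out, and assuming (as above) that the starting model is uniform over the N + 1 possible counts, the finite model and its large-N limit look roughly like this. The symbols R (balls remaining) and θ (limiting fraction of 1s) are introduced here for illustration only; they are not part of the original argument.

```latex
% Finite urn: N balls, n = n_1 + n_0 drawn, R = N - n remain,
% starting from the uniform 1/(N+1) over the number of 1s.
\Pr(k \text{ 1s remain} \mid E, n_1, n_0) = \binom{R}{k}
  \frac{B(k + n_1 + 1,\; R - k + n_0 + 1)}{B(n_1 + 1,\; n_0 + 1)},
  \qquad k = 0, 1, \dots, R.

% Let R grow with k/R \to \theta \in (0, 1):
R \cdot \Pr(k \text{ 1s remain} \mid E, n_1, n_0) \;\longrightarrow\;
  \frac{\theta^{n_1} (1 - \theta)^{n_0}}{B(n_1 + 1,\; n_0 + 1)}

% This is the Beta(n_1 + 1, n_0 + 1) density of a continuous parameter
% \theta; with n_1 = n_0 = 0 it reduces to the uniform Beta(1, 1), which is
% the "prior" that appears only after the limit is taken.
```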
Ideally, all statistical models should begin the “urn” way: stating what is finitely observable and working out the math for the finite case, only taking the limit at the end to see whether useful approximations can be made (Jaynes warned us about the dangers of taking limits prematurely). This approach would also end arguments about the influence of “priors.” Priors wouldn’t be needed, except as they arise as the deduced natural limits of properly described observational processes.
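A minimal sketch of that recipe, assuming the uniform starting model over the N + 1 counts and a hypothetical sample of n1 = 3 ones and n0 = 5 zeros. Everything is computed exactly for finite N and only afterwards compared with the continuous Beta(n1 + 1, n0 + 1) limit. The function names are illustrative, not from the original. One thing the finite calculation shows is that the probability that the next ball drawn is a 1 comes out to (n1 + 1)/(n + 2) for every finite N, so no limit is needed to obtain it.

```python
from fractions import Fraction
from math import comb


def remaining_ones_posterior(N, n1, n0):
    """Finite-urn model: P(k 1s remain), k = 0, ..., N - n, starting from
    the uniform 1/(N+1) over the total number of 1s (exact fractions)."""
    R = N - n1 - n0
    # Unnormalized weight of k remaining 1s: C(k + n1, n1) * C(R - k + n0, n0)
    weights = [comb(k + n1, n1) * comb(R - k + n0, n0) for k in range(R + 1)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def next_ball_is_one(N, n1, n0):
    """Probability that the next ball drawn from the remaining R is a 1."""
    R = N - n1 - n0
    post = remaining_ones_posterior(N, n1, n0)
    return sum(p * Fraction(k, R) for k, p in enumerate(post))


def finite_fraction_cdf(x, N, n1, n0):
    """P(fraction of 1s among the remaining balls <= x), for finite N."""
    R = N - n1 - n0
    post = remaining_ones_posterior(N, n1, n0)
    return sum(p for k, p in enumerate(post) if k <= x * R)


def beta_cdf_integer(x, a, b):
    """Beta(a, b) CDF for positive integers a, b, via the identity
    P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


# Hypothetical sample: 3 balls labeled 1 and 5 labeled 0.
n1, n0 = 3, 5
half = Fraction(1, 2)
for N in (20, 200, 2000):
    # Second column is 2/5 in every row; third column approaches the limit.
    print(N, next_ball_is_one(N, n1, n0), float(finite_fraction_cdf(half, N, n1, n0)))
print("limit", float(beta_cdf_integer(half, n1 + 1, n0 + 1)))  # limit 0.74609375
```

The design choice is deliberate: exact fractions keep the finite model free of rounding, so any gap from the Beta curve is due to finiteness alone.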
Update: How this all relates to climate models is coming!