Recapitulation: we have Pr(p|qm) where p is a proposition of interest, q the evidence we have compiled in the form of observations and so forth, and m is the “I believe”, i.e. the model, the whole-cloth thing which says, “I believe the uncertainty of p in the presence of q is characterized by this parameterized probability distribution.”
We began with p = “Tomorrow’s high temperature will be 72F” but learned that if our “I believe” was m = “A normal distribution with parameters a and b” then the probability of p is 0 no matter what a and b were and regardless of q. Bummer. So we switched to p = “Tomorrow’s high temperature will be greater than 72F”, which might be given a non-zero probability with this m.
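To see both claims with numbers, here is a minimal sketch in Python. The values a = 75 and b = 4 are invented purely for illustration (nothing so far tells us what they are), and I am assuming a is the central parameter and b the spread. The helper pr_interval is likewise hypothetical.

```python
from statistics import NormalDist

# Hypothetical parameter values, made up for illustration only.
# Assumption: a is the central parameter (mean), b the spread (std. dev.).
a, b = 75.0, 4.0
m = NormalDist(mu=a, sigma=b)

def pr_interval(lo, hi):
    """Probability, under m, that the temperature lies between lo and hi."""
    return m.cdf(hi) - m.cdf(lo)

# Squeeze an interval around 72F: the probability shrinks toward 0,
# and the single point 72F itself gets exactly 0.
for half_width in (1.0, 0.1, 0.001, 0.0):
    print(half_width, pr_interval(72.0 - half_width, 72.0 + half_width))

# The revised p, "greater than 72F", gets a non-zero probability.
pr_greater = 1.0 - m.cdf(72.0)
print(pr_greater)
```

Change a and b and the second number changes, but the probability of the single point stays at 0, which is the whole reason for switching p.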
But we haven’t yet found joy because we know nothing about a and b, which are absolutely required if we want to stick with this m. Bayesians make one guess, frequentists another. Both do so by appending extra “I believes” to m, which justify the guesses.
Two crucial points. One, there is nothing universally or automatically wrong with adding “I believes”, for these might be true (with respect to other relevant evidence about q and p). People can guess correctly. Of course, they can also guess incorrectly. Two, regardless of the truth of the guesses (of both m and the parameters), assuming they and q are true produces the correct probability of p.
This second fact is difficult to keep in mind because critics of the probability of p will often attack the value of the probability itself, which (assuming no calculation errors, which are rare) is never wrong. What might be wrong are q, m, or m’s appendages.
Example: q = “Just a certain proportion of Martians wear hats and George is a Martian”, m = “I believe this certain proportion is 0.6”, with p = “George wears a hat”, then Pr(p|qm) = 0.6, a correct and true deduction. Nothing in the world can make this conditional probability false (and all probability is conditional).
Q or m can be false. But if they are, it has to be with respect to some other evidence, for truth, falsity, and probability are all conditional (even the most humble truth is known at least with respect to our intuitions). We could assess m by counting all Martians who wear hats and dividing by the number of Martians. This proportion will either be 0.6 or not. No sample of Martians which doesn’t include all of them can falsify or verify m, though. No observation can falsify the first clause of q, but direct measurement can tell us about the second. And then there’s the observation that there are no Martians!
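A toy enumeration makes the point about samples concrete. This is only a sketch: the population of ten Martians, the sample size of five, and the helper sample_proportions are all invented for illustration.

```python
from fractions import Fraction
from itertools import combinations

def sample_proportions(pop, k):
    """Every hat-wearing proportion that some sample of size k can show."""
    return {Fraction(sum(s), k) for s in combinations(pop, k)}

# Hypothetical population in which m is true: 6 of 10 Martians wear hats.
m_true = [1] * 6 + [0] * 4
# Hypothetical population in which m is false: 5 of 10 wear hats.
m_false = [1] * 5 + [0] * 5

# Partial samples: many proportions are possible when m is true, and a
# sample showing exactly 0.6 is also possible when m is false.
print(sorted(sample_proportions(m_true, 5)))
print(Fraction(3, 5) in sample_proportions(m_false, 5))

# Only the full count pins the proportion down.
print(sample_proportions(m_true, len(m_true)))
print(sample_proportions(m_false, len(m_false)))
```

A partial sample showing 0.6 does not prove m, and one showing something else does not refute it. Only the complete count settles the matter.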
Back to temperature. We’ve used the observations (part of q) to make guesses of a and b and used m to deduce the probability of p; say, 93%. Now this 93% applies to p, of course, but suppose upon observation that p was false. This is another condition, not q or m, but a new observation, such that Pr(p|obs) = 0; that is, the temperature was 72F or lower. The 93% was still true, but it doesn’t feel “close”.
It’s at this point we must “invert” the problem and ask what the observation says about q or m, and not about p (questions about p are trivially answered; either Pr(p|obs) = 1 or Pr(p|obs) = 0, assuming the observation is measured without error). That is, we want to know things like how true q or m are with respect to the observation. Three things are possible.
First, m (which includes the guesses of its parameters) might be true but q false or unlikely. For instance, suppose q contained measurements about temperatures in July for Tucson but we applied q and m for January in Detroit. A silly error no one would make? The opposite: misapplying the circumstance is the most common error, and also the most unrecognized. This is because q nearly always contains tacit premises, evidence which is in the investigators’ minds but which isn’t written down in the formal statistical modeling phase, and these absences cause people to forget the premises are there. This is the danger of shorthand notation.
Remember: the analysis is valid if m is true and for the kind of situations that match q. Psychologists, sociologists, educationists and the like abuse their models by forming q’s from (usually) nearby college students and then implying that q actually represents all human beings in all cultures in all times. This is why it is mandatory to fully specify q, a step which is rarely or never done. Some researchers take stabs at it, but unless their objects of study are controllable, in-the-small physical events (which don’t have volatile personalities), then they usually do a poor job of it.
Second, q might be true but m (or its parameters) might be false or unlikely. This is an easy situation because there are simple ways of verifying models, as long as the model is used regularly enough to build up evidence about its performance. Terms like “proper scores”, “calibration”, and the like come into play. Think of it as plotting instances of p against its predictions: if these line up, m is more likely true; if not, then not.
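Here is a rough sketch of what such a check can look like, using the Brier score (one example of a proper score) and a simple calibration table. The record of forecasts and outcomes is made up for illustration, and brier_score and calibration_table are hypothetical helper names, not anybody’s official method.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared gap between stated probability and what happened (0 or 1)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_table(forecasts, outcomes):
    """For each stated probability, the observed relative frequency of p."""
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    return {f: sum(obs) / len(obs) for f, obs in sorted(groups.items())}

# Made-up track record: the probability m gave for p on each occasion,
# and whether p turned out true (1) or false (0).
forecasts = [0.9] * 10 + [0.5] * 4
outcomes = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0] + [1, 0, 1, 0]

print(calibration_table(forecasts, outcomes))
print(brier_score(forecasts, outcomes))
```

In this invented record the stated probabilities line up with the observed frequencies, which counts as evidence in favor of m. A lower Brier score is better, and it is most useful for comparing one model against another on the same record.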
The funny thing is that, because most of statistical practice is psychotically fascinated with the parameters of m, this verification step is rarely done (I only ever see it in physics models). Researchers are confident that by telling you their guesses of the parameters they have told you everything you need to know, a practice which implies that m and q are (of course!) true.
Third, both m and q are false. These are difficult situations because one hardly knows where to begin. We’ll know something has gone kerplooey because the probabilities of p won’t match up to the real true states of p, but we won’t know why. We might automatically blame m, the usual step, and try to “fix” it, but since q is broken too, these efforts will largely be in vain.
Last word: all of what was said applies to all those probability models you (yes, you) use daily, and not just to those that are written up formally in some obscure journal. Every time you make any judgment of uncertainty, you are applying “probability models” with some q in mind. You just weren’t aware of it.