I hope all understand that we are not just discussing statistics and probability models: what is true here is true for all theories/models (mathematics, physics, chemistry, climate, etc.). Read Part I.
Suppose for premises we begin with Peano’s axioms (which themselves are true given the a priori), from which we can deduce the idea of a successor to a number, which allows us to define what the “+” symbol means. Thus, we can eventually hypothesize that “2 + 2 = 4”, which is true given the above premises. But the hypothesis “2 + 2 = 5” is false; that is, we have falsified that hypothesis given these premises. The word falsified means to prove to be false. There is no ambiguity in the word false: it means certainly not true.
Now suppose our premises lead to a theory/model which says that, for some system, any numerical value is possible, even though some of these values are more likely than others. This is the same as saying no value is impossible. Examples abound. Eventually, we see numerical values which we can compare with our theory. Since none of these values were impossible given the theory, no observation falsifies the theory.
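To make the deduction concrete, here is a minimal sketch in Python. The encoding is my own choice for illustration, not a standard library: zero is the empty tuple, and the hypothetical helpers `succ`, `add`, and `numeral` build everything else from the successor idea. Given only these premises, "2 + 2 = 4" comes out true and "2 + 2 = 5" comes out false.

```python
# Peano-style naturals, encoded (as an assumption for this sketch) as nested tuples:
# ZERO is the empty tuple, and succ(n) wraps n in one more tuple.
ZERO = ()

def succ(n):
    """Successor of n."""
    return (n,)

def add(a, b):
    """Addition defined only through the successor:
    a + 0 = a, and a + succ(b) = succ(a + b)."""
    if b == ZERO:
        return a
    return succ(add(a, b[0]))

def numeral(k):
    """Build the Peano numeral for an ordinary non-negative integer k."""
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n

two, four, five = numeral(2), numeral(4), numeral(5)
two_plus_two = add(two, two)

is_four = two_plus_two == four   # the hypothesis "2 + 2 = 4"
is_five = two_plus_two == five   # the hypothesis "2 + 2 = 5"
print(is_four, is_five)
```

The second hypothesis is not merely unlikely under the premises; it is ruled out by them, which is what falsified means.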
The only way a theory or model can be falsified is if that theory/model says “These observations are impossible—not just unlikely, but impossible” and then we see any of these “impossible” observations. If a model merely said a set of observations were unlikely, and these unlikely observations obtained, then that model has not been falsified.
For example, many use models based on normal distributions, which give positive probability to every interval on the real line; that is, they say that any observation on the real line is possible. Thus, no normal-distribution model can ever be falsified by any observation. Climate models generally fall into this bucket: most say that temperatures will rise, but none (that I know of) say that it is impossible that temperatures will fall. Thus, climate models cannot be falsified by any observation. This is not a weakness, but a necessary consequence of the models’ probabilistic apparatus.
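The difference between "unlikely" and "impossible" can be sketched in a few lines of Python. The function names, the uniform comparison model, and the three observations are all assumptions of mine, invented for illustration. I work with log densities because a very small normal density underflows to zero in floating point, even though the density itself is strictly positive everywhere.

```python
import math

def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log density of a normal model; finite for every finite x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def uniform_logpdf(x, low=0.0, high=1.0):
    """Log density of a uniform model; -inf outside [low, high],
    i.e. the model says such values are impossible."""
    if low <= x <= high:
        return -math.log(high - low)
    return -math.inf

def falsified_by(logpdf, observations):
    """A model is falsified only if it called some observed value impossible."""
    return any(logpdf(x) == -math.inf for x in observations)

# Made-up observations; 25.0 is wildly unlikely under a standard normal.
observations = [0.3, 0.9, 25.0]

normal_falsified = falsified_by(normal_logpdf, observations)    # False
uniform_falsified = falsified_by(uniform_logpdf, observations)  # True
print(normal_falsified, uniform_falsified)
```

The uniform model on [0, 1] said a value of 25 could not happen, so seeing it falsifies that model. The normal model said only that 25 was extraordinarily unlikely, so it survives.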
Statisticians and workers in other fields often incorrectly say that they have falsified models, but they speak loosely and improperly and abuse the words true and false (examples are easy to provide: I won’t do so here). None of these people would say they have proved, for example, a mathematical theorem false—that is, that they have falsified it—unless they could display a chain of valid deductions. But somehow they often confuse unlikely with false when speaking of empirical theories. In statistics, it is easy enough to see that this happens because of the history of the field, and its frequent use of terms like “accepting” or “rejecting” a hypothesis, i.e. “acting like” a model has been proved true or falsified. However, that kind of language is often used in physics, too, where theories which have not been falsified are supplanted wholly by newer theories.
For a concrete example, take a linear regression model with its usual assumptions (normality, etc.). No regression model can be falsified under these premises. The statistician, using prior knowledge, decides on a list of theories/models, here in the shape of regressors, the right-hand-side predictive variables; these form our premises. Of course, the prior knowledge also specifies with probability 1 the truth of the regression model; i.e. it is assumed true, just as the irascible green men were. That same prior knowledge also decides the form of these models (whether the regressors “interact”, whether they should be squared, etc.). To emphasize, it is the statistician who supplies the premises which limit the potentially infinite number of theories/models to a finite list. In this way, even frequentist statisticians act as Bayesians.
Through various mechanisms, some ad hoc, some theoretical, statisticians will winnow the list of regressors, thus eliminating several theories/models, in effect saying of the rejected variables, “I have falsified these models.” This, after all, is what p-values and hypothesis testing are meant to do: give the illusion (“acting like”) that models have been falsified. This mistake is not confined to frequentism; Bayesian statisticians mimic the same actions using parameter posterior distributions instead of p-values; the effect, of course, is the same.
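A small sketch shows why dropping a regressor is not falsification. The five data points are invented for illustration, and `gaussian_loglik` and `fit_and_score` are hypothetical helpers of my own, not any package's API. Each model is fit by least squares under the usual normal-error assumption and scored by its maximized log-likelihood.

```python
import math

def gaussian_loglik(residuals, sigma):
    """Log-likelihood of the residuals under independent Normal(0, sigma) errors."""
    n = len(residuals)
    return (-n * math.log(sigma * math.sqrt(2.0 * math.pi))
            - sum(r * r for r in residuals) / (2.0 * sigma ** 2))

def fit_and_score(x, y, use_regressor):
    """Fit y = intercept + slope * x (slope fixed at 0 if the regressor is
    dropped) and return the maximized Gaussian log-likelihood."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    if use_regressor:
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        slope = sxy / sxx
    else:
        slope = 0.0
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)  # MLE of sigma
    return gaussian_loglik(residuals, sigma)

# Invented data, roughly linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

with_x = fit_and_score(x, y, use_regressor=True)
without_x = fit_and_score(x, y, use_regressor=False)
print(with_x, without_x)
```

The model without the regressor scores far worse, yet its log-likelihood is still finite: it never said these observations were impossible. Rejecting it is a decision to act as if it were false, not a proof that it is.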
Now, it may be that the falsely falsified models are unlikely to be true, but again “unlikely” is not “false.” Recall that we can only work with stated premises, that all logic and probability are conditional (on stated premises). It could thus be that we have not supplied the premise or premises necessary to identify the true model, and that all the models under consideration are in fact false (with respect to the un-supplied premise). We thus have two paths to over-certainty: incorrect falsification, and inadequate specification. This is explored in Part III.