Model Selection and the Difficulty of Falsifying Probability Models: Part II

I hope all understand that we are not just discussing statistics and probability models: what is true here is true for all theories/models (mathematics, physics, chemistry, climate, etc.). Read Part I.

Suppose for premises we begin with Peano’s axioms (which themselves are true given the a priori), from which we can deduce the idea of a successor to a number, which allows us to define what the “+” symbol means. Thus, we can eventually hypothesize that “2 + 2 = 4”, which is true given the above premises. But the hypothesis “2 + 2 = 5” is false; that is, we have falsified that hypothesis given these premises. The word falsified means to prove to be false. There is no ambiguity in the word false: it means certainly not true.
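
As a small illustration of deduction from premises, the sketch below encodes the natural numbers with a zero and a successor function and defines addition by recursion, in the style of Peano arithmetic. The names ZERO, succ and add, and the nested-tuple encoding, are illustrative assumptions and not part of the argument above.

```python
# Natural numbers in a Peano style: ZERO is the empty tuple and
# succ(n) wraps n in one more tuple layer. (Illustrative encoding only.)
ZERO = ()

def succ(n):
    """Return the successor of n."""
    return (n,)

def add(a, b):
    """Addition by recursion on b: a + 0 = a, and a + S(b) = S(a + b)."""
    if b == ZERO:
        return a
    (pred,) = b  # b is the successor of pred
    return succ(add(a, pred))

TWO = succ(succ(ZERO))
FOUR = succ(succ(TWO))
FIVE = succ(FOUR)

# Given these premises, "2 + 2 = 4" is deduced and "2 + 2 = 5" is falsified.
two_plus_two_is_four = add(TWO, TWO) == FOUR
two_plus_two_is_five = add(TWO, TWO) == FIVE
print(two_plus_two_is_four, two_plus_two_is_five)
```

Here the falsification of "2 + 2 = 5" is a matter of deduction: the premises leave no room for it.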

Now suppose our premises lead to a theory/model which says that, for some system, any numerical value is possible, even though some of these values are more likely than others. This is the same as saying no value is impossible. Examples abound. Eventually, we see numerical values which we can compare with our theory. Since none of these values were impossible given the theory, no observation falsifies the theory.

The only way a theory or model can be falsified is if that theory/model says “These observations are impossible—not just unlikely, but impossible” and then we see any of these “impossible” observations. If a model merely said a set of observations were unlikely, and these unlikely observations obtained, then that model has not been falsified.
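
The distinction between unlikely and impossible can be made numerical. In the sketch below, a hypothetical model says an event A has probability 0.01 on each of 20 independent trials (both numbers are assumptions chosen for illustration), and we ask for the probability that A occurs on 11 or more of them. For comparison, a second hypothetical model gives A probability exactly 0.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by summing the exact pmf."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n, k = 20, 11  # assumed for illustration: A seen on 11 of 20 trials

unlikely = prob_at_least(k, n, 0.01)   # model says A is rare
impossible = prob_at_least(k, n, 0.0)  # model says A cannot happen

print(unlikely > 0.0)     # tiny, but strictly positive
print(impossible == 0.0)  # exactly zero
```

Only under the second model would seeing A count as a falsification. Under the first, the outcome is extraordinarily improbable, but the model never said it could not happen.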

For example, many use models based on normal distributions, which are probability statements which say that any observation on the real line is possible. Thus, any normal-distribution model can never be falsified by any observation. Climate models generally fall into this bucket: most say that temperatures will rise, but none (that I know of) say that it is impossible that temperatures will fall. Thus, climate models cannot be falsified by any observation. This is not a weakness, but a necessary consequence of the models’ probabilistic apparatus.
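
To see this for the normal case, it helps to look at the log of the density rather than the density itself: at extreme values the density is so small that floating-point arithmetic may round it to zero, which would be an artifact of the computer and not a statement of the model. The sketch below assumes a hypothetical normal model with mean 15 and standard deviation 1; the function name and all the numbers are illustrative.

```python
import math

def normal_log_density(x, mu, sigma):
    """Log of the normal density at x; finite for every finite x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

mu, sigma = 15.0, 1.0                      # assumed model, for illustration
observations = [15.0, 20.0, -40.0, 1.0e6]  # from ordinary to absurd

for x in observations:
    log_d = normal_log_density(x, mu, sigma)
    # A log-density of minus infinity would mean "impossible"; none is.
    print(x, log_d, math.isfinite(log_d))
```

Every observation, however absurd, gets a finite log-density, which is to say a density greater than zero.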

Statisticians and workers in other fields often incorrectly say that they have falsified models, but they speak loosely and improperly and abuse the words true and false (examples are easy to provide: I won’t do so here). None of these people would say they have proved, for example, a mathematical theorem false—that is, that they have falsified it—unless they could display a chain of valid deductions. But somehow they often confuse unlikely with false when speaking of empirical theories. In statistics, it is easy enough to see that this happens because of the history of the field, and its frequent use of terms like “accepting” or “rejecting” a hypothesis, i.e. “acting like” a model has been proved true or falsified. However, that kind of language is often used in physics, too, where theories which have not been falsified are supplanted wholly by newer theories.

For a concrete example, take a linear regression model with its usual assumptions (normality, etc.). No regression model can be falsified under these premises. The statistician, using prior knowledge, decides on a list of theories/models, here in the shape of regressors, the right-hand-side predictive variables; these form our premises. Of course, the prior knowledge also specifies with probability 1 the truth of the regression model; i.e. it is assumed true, just as the irascible green men were. That same prior knowledge also decides the form of these models (whether the regressors “interact”, whether they should be squared, etc.). To emphasize, it is the statistician who supplies the premises which limit the potentially infinite number of theories/models to a finite list. In this way, even frequentist statisticians act as Bayesians.

Through various mechanisms, some ad hoc, some theoretical, statisticians will winnow the list of regressors, thus eliminating several theories/models, in effect saying of the rejected variables, “I have falsified these models.” This, after all, is what p-values and hypothesis testing are meant to do: give the illusion (“acting like”) that models have been falsified. This mistake is not confined to frequentism; Bayesian statisticians mimic the same actions using parameter posterior distributions instead of p-values; the effect, of course, is the same.
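
A rough simulation shows what this winnowing can do. The sketch below is not any particular statistician's procedure: it assumes a single regressor with no intercept, a noise standard deviation known to be 1, a true slope of 0.2, and a two-sided z-test at the 0.05 level. All of these are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
import random
from statistics import NormalDist

def p_value_for_slope(xs, ys, sigma=1.0):
    """Two-sided p-value for 'slope = 0' in y = b*x + noise (no intercept),
    treating the noise standard deviation sigma as known."""
    sxx = sum(x * x for x in xs)
    b_hat = sum(x * y for x, y in zip(xs, ys)) / sxx  # least-squares slope
    z = b_hat / (sigma / sxx ** 0.5)                  # estimate / standard error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def fraction_dropped(true_slope=0.2, n=30, trials=2000, alpha=0.05, seed=1):
    """Share of simulated data sets in which the regressor would be dropped."""
    rng = random.Random(seed)
    dropped = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [true_slope * x + rng.gauss(0.0, 1.0) for x in xs]
        if p_value_for_slope(xs, ys) > alpha:
            dropped += 1
    return dropped / trials

frac = fraction_dropped()
print(frac)
```

By construction the regressor belongs in the model, yet with these settings the test "rejects" it in a large share of the simulated data sets. The rejected model was not false; it was merely not favored by that sample.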

Now, it may be that the falsely falsified models are unlikely to be true, but again “unlikely” is not “false.” Recall that we can only work with stated premises, that all logic and probability are conditional (on stated premises). It could thus be that we have not supplied the premise or premises necessary to identify the true model, and that all the models under consideration are in fact false (with respect to the un-supplied premise). We thus have two paths to over-certainty: incorrect falsification, and inadequate specification. This is explored in Part III.
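
The second path, inadequate specification, can be sketched the same way. Below, data are simulated from a normal distribution with mean 3, but the list of models supplied as premises contains only normal models with means 0 and 1 (unit standard deviation and equal prior weights are assumed throughout; all numbers and names are illustrative). The posterior probabilities are conditional on that list and nothing else.

```python
import math
import random

def normal_log_lik(data, mu, sigma=1.0):
    """Log-likelihood of data under a Normal(mu, sigma) model."""
    const = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return sum(const - 0.5 * ((x - mu) / sigma) ** 2 for x in data)

def posterior_over_models(data, candidate_means):
    """Posterior probabilities over the listed models, equal prior weights."""
    log_liks = [normal_log_lik(data, mu) for mu in candidate_means]
    top = max(log_liks)  # subtract the max for numerical stability
    weights = [math.exp(ll - top) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(50)]  # simulated with mean 3

candidate_means = [0.0, 1.0]                     # the true mean is not listed
posterior = posterior_over_models(data, candidate_means)
print(posterior)
```

The posterior piles nearly all its weight on the mean-1 model, even though neither listed model generated the data. The near-certainty is correct given the premises, and the premises are what is wrong.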


  1. Okay.

    A little pedantic for a note to yourself.

    Short version: variable selection in a multiple regression is a different beast than axiomatic math.

    I fear Act II put some in the audience to sleep. 🙂

  2. So how about we define “likely” with some care. Then we can say that a given model predicts some event as “likely”. Subsequent observations could lead us to conclude that the event is, in fact, “unlikely”, perhaps it fails to occur at all. If our definition was done carefully enough why haven’t we proved the model false? Or have I simply smuggled in a p-value?

  3. zzzzzzzzzzzzzzzz … hUh? Wha? Just kidding.

    I hate those cliff-hanger endings. I’m guessing it’s inadequate specification that’s the primary cause of over-certainty. In fact, I’m certain of it.

    FWIW: I think things highly unlikely rank alongside things falsified. Sure the dryer could be eating my socks but I think it far more likely the ogre living in my closet is the real culprit. If I use the latter model and ignore the former, why isn’t that the effective equivalent to falsifying the former?

  4. @ DAV –

    LOL on the sock-eating dryer.

    Believe it or not, here in my hometown of St. Louis, there was a professor emeritus (of physics, IIRC) who did a series of articles on the scientific method in the local newspaper. One of the pieces was about how the scientific method would be applied to finding out what happened to his lost socks. He went through this entire analysis (it was really a logical analysis, not a scientific analysis), and concluded that either the dryer or the washing machine must be eating his socks!

    I wrote a response asking him if he’d ever taken apart his dryer (easy) or his washer (a little harder) to test his theory, and even offered to bring my tools over to help him. No response.

    BTW, having taken apart a few dryers, they don’t eat socks. Neither do washers.

    My own theory as to the most likely location of most lost socks?

    The sock drawer.

    Parsimony rules again.

  5. I agree with Rich, but I will formulate the question more specifically.

    If we have a model that predicts that the probability of A is very low ( 50%) why is this model not falsified?

  6. My question got munged somehow.

    If we have a model that predicts that the probability of A is very low, less than 1%, but the observed probability of A is high, greater than 50%, why has this model not been falsified?

  7. Dr. B,

    Contrary to the protests of commenters above, I think your exposition is EXCELLENT! Cut-and-paste directly into your next book.

    For you all who think unlikely is the same as false, witness the investment banking crash/fiasco/recession. The models used by hedgers and derivative insurers concluded that extreme market perturbations were so unlikely as to be impossible. And now here we are.

    Tell your impossibility story to lottery winners. Their likelihood of winning gazillions was less than a gazillion to 1. Yet they sip champagne on their yachts while you still grub for a square meal.

    The unlikely can happen. The impossible cannot. That’s the key difference.

  8. @ the other Mike

    There is more to it than the difference between the impossible and the unlikely. Because models can be false yet still accurate enough to be very useful. Newtonian mechanics, for instance.

    Furthermore, if we’re only interested in “false” in the sense in which you use it, would we ever abandon ANY theory? There has to be a standard other than “impossibility” in order to conduct business, live your life, etc. Otherwise, we’d never go outside for fear of being run down by a tofu delivery truck, hit by a meteorite, or attacked by an elephant.

  9. True and False feel so . . . Classical. There was a time when we believed that an ultimate truth existed. What was measured is what was there. Modern science says that every measurement has an error; the act of measuring actually introduces errors. We don’t have true and false; we have varying degrees of uncertainty. It is so unsettling.

    Mike D,

    We always knew that extreme movements were possible. 10/19/87 was a “20 standard deviation event” (whatever that means). Over the next 20 years financial leverage multiplied. We can’t say that 2008 was likely, but perhaps it was inevitable.

    Mike B,

    There is a small black hole in the dryer that sucks the socks across the universe. There is no trace of sock remnants in the dryer. I am working on a “wettener.” It will pull socks from wherever they go back to this corner of the universe. I intend to sell unmatched socks to pay for my research expenses.

    On a related note, have you noticed that ballpoint pens have a critical mass? Get enough of them in one place, and pens stop disappearing.

  10. OK, this is making more sense.

    In Part I, I think what was missing in the middle (“The theory or model is always assumed to be true: not just likely, but certain.”) was an explicit statement of the context: a given p-value calculation is done with respect to an associated theory or model, which for purposes of the calculation is always assumed to be true.

    We are all predisposed to over-certainty; inadequate specification (in all its forms) is only part of the problem.

    I have plenty of (unfortunate) real world experience with low-probability medical events that actually happen. From where I sit, our culture’s propensity to consider “low p” as “falsified” is quite unsettling… what happens when society decides “sorry, you’re unlikely to recover” and refuses treatment?

    This topic has significant real-world implications.

  11. Dear Doug and Other Mike,

    There is a branch of statistics known as “extreme value analysis” that attempts to “model” the distant reaches of distribution tails. That methodology is useful in planning for unlikely events such as floods.

    But “utility” is not the topic of discussion. Falsification is. Probabilistic models advance theories that cannot be falsified. Reliance on p-values or other probability measures, and/or operator specifications of a model can (and do) lead to erroneous statements of certainty.

    No doubt you can be 95% confident that a flood will not wash your town away. Then along comes the 1,000-year precipitation event, and down the river you go.

    See Aesop’s fable, The Ant and the Grasshopper. The Precautionary Principle may have many flaws, but on the other hand the Boy Scout motto is Be Prepared. Don’t make the mistake of assuming your model reveals truth certain when the darn thing cannot be falsified.

  12. Mike D: “For you all who think unlikely is the same as false …” I don’t think anybody equates unlikely with false. The point was that a model that predicts that a certain event is unlikely is falsified if the event occurs frequently. But I begin to believe that a practical application of this would require a definition of “unlikely” that amounts to no more than saying, “the event occurs with a probability less than p” and it’s just a p-value really.

  13. There is indeed a vast amount of overconfidence in many papers. I get a kick out of straight lines drawn through data plots that resemble the aftermath of a tornado. Few things in life are linear. I’ve never seen any paper try to explain why the data plots are so cloudy given the ‘true’ (and usually linear) model. Many papers also seem to imply only two outcomes but rarely are real life things two-sided.

    Mike D,

    Pragmatism is the rule in life. It’s pragmatism that dictates preparedness for every possibility is a waste of time. There is really nothing wrong with taking the most probable route. I can ignore the ogre in my closet as his only interaction (so far) is his kleptomaniac behavior with my socks. Briggs is being pedantic again. Nothing wrong with that either. A welcome reminder of things far too often forgotten.

    Mike B.,

    Interesting hypothesis but my socks don’t have drawers. However the ogre in my closet does (fortunately).

  14. @Mike B: Actually, the socks are probably sticking to other clothing. I don’t lose as many socks as some people seemingly do, but when I do I eventually find them clinging inside a sweatshirt or some pants or some other large (and different-materialed) clothing that they were washed with.
