What Is A True Model? What Makes A Good One?: Part III

We have had two cases so far: arbitrary models for counterfactual Martians (Part I) and a deduced model for an urn holding dichotomous objects (Part II). The logic was identical for both, as it will be for all models. Now for an example closer to home for many users of statistics: a normal model.

Suppose we are interested in what young Susy’s grade point average will be after her first year of college. Our proposition of interest might be C = “Susy’s GPA will be G” where we can substitute for G any value between 0 and 4.

Strike that: not any value. Not all values between 0 and 4 are possible. Susy will receive a finite number of definite grades. Her grade point average, therefore, just like any freshman’s, can take only one of a finite number of values. Just what these are depends on the point system used at the college and the maximum and minimum number of classes that could be taken. For our purposes it’s only important to show that the values Susy’s GPA can take belong to a finite set which can be delineated (i.e., known or deduced).
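To make this concrete, here is a minimal sketch of such a delineation. The point system (whole points 0 through 4) and the course load (8 to 10 equally weighted classes) are made-up assumptions; plug in your school’s actual rules and the logic is the same.

```python
# A minimal sketch of delineating the possible GPAs, assuming a hypothetical
# whole-point scale {0, 1, 2, 3, 4} and 8 to 10 equally weighted classes.
from fractions import Fraction
from itertools import combinations_with_replacement

grade_points = [Fraction(g) for g in (0, 1, 2, 3, 4)]    # assumed point system
possible_gpas = set()
for n_classes in range(8, 11):                           # assumed course-load range
    for grades in combinations_with_replacement(grade_points, n_classes):
        possible_gpas.add(sum(grades) / n_classes)       # GPA is the mean grade

print(len(possible_gpas))  # a finite, delineable set of values, not a continuum
```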

However, like most other analysts we’ll ignore this information, substitute something we know is false (given the evidence we just described), and say that Susy’s GPA can take any value between 0 and 4. Given no other evidence, we are in a bind because the number of values between 0 and 4 is (uncountably!) infinite. The deduced probability of Susy’s GPA taking any particular value—given the model that any value is possible—is 0. Screwy, ain’t it? That’s math for you.

And that is the math, which shows you that the approximations and assumptions we use to make the math work out create absurdities when that math is applied to real-world problems. The number of possible values Susy’s GPA can take might be large, even very large, but it is still finite. If we had delineated, given the grade point system of the school, the exact set, we could have used the statistical syllogism to say that, given this information, the probability of C was 1/N, where N was the cardinality (size) of the set.

To explain this man-made paradox further: suppose we do have the actual set, which has N elements, e1, e2, …, eN. Given the information we have (and only this information), we deduce that the probability of Susy’s GPA equaling any ei is 1/N. But given the assumption that Susy’s GPA can take any value, the probability of any ei is 0. We conclude that the probability (given the information, etc.) of e1 is 0, of e2 is 0, …, of eN is 0. Summed, the probability of seeing that which we must see (relative to the true information) is 0. We are saying that Susy cannot have a GPA. Once we see the actual GPA printed on her report card, we have to admit that it isn’t possible that we’re seeing what we’re seeing, for the probability (given that any number is possible) that the number we see is really there is 0.

Again, that’s math for you. The good news is that once you’ve assimilated this, you’re ready for normal distributions. But before we get to them, let me tell you what should be done. We first deduce the actual set of possible GPAs. If we like, we can introduce the round-off rule, which then forms part of our model. The round-off rule states that we round all GPAs to the nearest hundredth. We do this because we know any decision we make on GPA is indifferent to numbers different by less than a hundredth. This was our choice because of the extra-logical, extra-statistical decisions we will make. We could instead round to the nearest tenth, or thousandth, or we needn’t round at all; whatever we like.

The statistical syllogism says that, in the absence of any other information (we have no other), the probability of Susy’s GPA equaling ei is 1/N for any i in 1 to N. We are done! This is the final answer.
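As a sketch of both steps at once, and assuming purely for illustration that after rounding to the nearest hundredth the possible values are 0.00, 0.01, …, 4.00 (so N = 401), the syllogism gives:

```python
# A minimal sketch of the round-off rule plus the statistical syllogism,
# assuming (for illustration only) the rounded set is 0.00, 0.01, ..., 4.00.
possible_gpas = [round(i / 100, 2) for i in range(401)]
N = len(possible_gpas)

# Statistical syllogism: with no other information, each e_i gets 1/N.
prob = {e: 1 / N for e in possible_gpas}

print(N)                   # 401 possible (rounded) values
print(prob[2.50])          # each of them, including 2.50, gets 1/401
print(sum(prob.values()))  # and together they sum to (essentially) 1
```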

Unless we want to use the information about previous college freshmen’s GPAs. We make the assumption that a sample of freshmen is “like” Susy, or that Susy is “like” them. That is, we assume the information in their GPAs is probative to Susy’s GPA. Is it? Maybe, maybe not. It is an assumption we make, part of the model we assume is true.

Perhaps the sample of GPAs is mostly from “Business” majors and Susy is training to be a biologist. Is the sample probative? We just assume that it is. Later we can learn whether this assumption was useful. But for now, it’s assumption all the way.

So let’s assume. We have the old GPAs, we have the delineated set of possible GPAs, we have the result of the statistical syllogism; we have, then, all we need. The math gives us a discrete probability distribution which, after we plug the sample in, gives us the probability that Susy’s GPA takes any of the ei. These probabilities are deduced (as all probabilities are). They are true given our assumptions. The only iffy assumption we made is that the sample we used is probative.
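The formula itself isn’t spelled out here, so treat the following only as one hedged illustration of “plugging the sample in”: a simple add-one count over the delineated set, which gives extra weight to values like those in the sample while leaving every possible ei above zero. The actual math may differ; the shape of the procedure (sample plus delineated set in, a probability for each ei out) is the point.

```python
# A hedged sketch of turning a sample of past GPAs into a probability for each
# e_i. The sample values and the add-one (Laplace-style) counting rule are
# illustrative assumptions, not necessarily the exact formula meant in the post.
from collections import Counter

possible_gpas = [round(i / 100, 2) for i in range(401)]  # the delineated, rounded set
sample = [3.12, 2.87, 3.45, 2.87, 3.90, 3.12, 2.50]      # hypothetical past freshmen

counts = Counter(round(g, 2) for g in sample)
total = len(sample) + len(possible_gpas)                 # add-one denominator

prob = {e: (counts[e] + 1) / total for e in possible_gpas}

print(prob[2.87])  # a value seen twice in the sample gets more weight...
print(prob[1.23])  # ...but an unseen value remains possible
print(round(sum(prob.values()), 12))  # the probabilities sum to 1
```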

No parameters are needed, so no “priors” are needed nor are “posteriors” given. There is no test or p-value. We have only probabilities for each ei, and nothing else. It’s probability all the way, from simple assumptions to deduced answers. We got just what we wanted: the probability of Susy’s actual GPA (possibly rounded) taking any value (in the set of ei). Isn’t probability as logic great?

On to normal distributions!

Update: How this all relates to climate models is coming!

9 Comments

  1. Ken

    RE: “How this all relates to climate models is coming!”

    COMMENT: The National Review On-Line has a nice summary of what the “Climategate” e-mails are showing … including a synopsis of just how few “climate scientists” are actually involved–a rather small, tiny even, group that has a documented pattern of selective use of data, etc.: http://www.nationalreview.com/articles/284137/scientists-behaving-badly-jim-lacey?pg=1

    While one is naturally duty-bound to ensure the models one develops or consults are accurate, or at least the uncertainties & shortfalls are appreciated, having some insight into the modelers themselves provides clues to model flaws that might not otherwise be apparent.

  2. JH

… the probability of Susy’s GPA equaling ei is 1/N for any i in 1 to N.

    What is your N? Mine is 4,000,001 because I write my program to display 6 digits after the decimal point. Wow, our estimates of the above probability are quite different!

    To avoid the round-off problem, in this case, what people should do is estimate the probability of “her GPA is less than or equal to a number” instead of the probability of “her GPA equaling a number.”

    So, nonparametric modeling using an empirical distribution! No parameters, no priors.

  3. Marty

    Douglas Adams used the same argument in “The Hitchhiker’s Guide to the Galaxy” to prove that the population of the universe is zero (approximately).

  4. Briggs

    JH,

    Your comment about writing a “program to display 6 digits” does not appear to be relevant. I offered a proof that no matter what grading system you or any professor develops, it will necessarily be finite and discrete. Once you tell us your grading scheme, whatever finite/discrete form it takes, we cannot disagree. No disagreement is possible if we both accept as evidence that your grades can only be one of a certain set.

  5. Briggs

    Gary,

    This comment grew so long that I might make it its own post.

    If Brown does not give GPAs, then obviously this example does not apply to Brown. If they only print out “satisfactory” or “no-credit” then we are in the urn model, actually.

    The only point of this example is to demonstrate discreteness and finiteness and how it might be modeled.

    Susy will have a systolic blood pressure, even if no GPA. Nowadays this will be measured by (say) an electronic sphygmomanometer. Assume it is error free. No matter the cleverness of the electronic engineers, this device will only give us measurements that are discrete and belong to a finite set of possible values. It will have a minimum (probably 0), a maximum, and a number of values in between. These values are necessarily finite and discrete. In particular, it doesn’t matter how many “significant digits” the device allows: the number of these digits will be finite.

    It is no objection to say that, yes, they may be finite and discrete, but their number is so large that we may as well treat the measurements as belonging to the continuum. This objection, you will note, admits all I wanted to prove: that the measurements are finite and discrete. It is an entirely separate question of how to handle our uncertainty in these values.

    And then I have outlined a way in which we can handle our uncertainty. If it seems to you that the cardinality of the set of possible SBP is too large, that is nothing. The method I have developed works regardless of size.

    We must not be confused about what probability is and what we want to do with it. Suppose the sphygmomanometer spits out measurements to “6 significant digits” (after the decimal): 120.000001 mmHg, 120.000002 mmHg, etc.

    Given this knowledge, and the knowledge of the device limits (0 mmHg, some maximum mmHg), we then model the probability that C = “Susy’s SBP takes the value 120.010184 mmHg” or whatever C is of interest to us. Remember: we make up the C. It doesn’t matter that this C appears absurd: if that is so it is only because we know that few would be interested in blood pressures to this level. To say that this C is absurd is certainly no objection to the probability model. It all works.

    When we create this C, we are saying, “Yes, dammit, to me there is a huge decisionable difference between 120.010183 mmHg and 120.010185 mmHg; that’s why I want this C.” To this person, the difference is important and that is that. The probability model still works.

    Of course, most of us would probably introduce the round-off rule as part of our model, which in this case rounds SBP to the nearest (say) 5 mmHg. That is, anybody making a decision about SBP is indifferent to two values separated by less than 5 mmHg. If the device maximum is about 300 mmHg (I haven’t seen any even this high), then with the round-off rule we only have 60 or so discrete, finite values to model.
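    For concreteness, a minimal sketch of that count, assuming a hypothetical 0 to 300 mmHg device range and rounding to the nearest 5 mmHg:

    ```python
    # A minimal sketch of the 5 mmHg round-off rule, assuming a hypothetical
    # device range of 0 to 300 mmHg.
    step = 5                                         # indifference threshold in mmHg
    possible_sbp = list(range(0, 300 + step, step))  # 0, 5, 10, ..., 300

    print(len(possible_sbp))  # 61 values: the "60 or so" discrete, finite values to model
    ```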

    In other words, all models should be tailored for the decisions which they will inform.

  6. Rich

    I barely read the barely readable paper. When I got to the word, “obviously” I knew I was doomed.

  7. JH

    Does “program to display 6 digits” capture the point I was trying to make in my comment to you? My point was to demonstrate that discretizing a theoretically continuous variable runs into the rounding problem, and to offer a solution.

    Of course, I knew Susy lied when she told me that her math GPA is 3 2/3, because it’s a rational number and there are infinitely many of those (though countably many)… and it’s impossible based on the probability defined in this post.

    The reason the probability that a continuous variable will equal a specific value is DEFINED to be zero is just like the reason that 0! is defined to be one. And it doesn’t mean that a specific value can’t occur.

    Is my viewpoint only relevant if it disagrees with you?
