Read Part I, Part Paradox, Part II
Recapitulation: we have Pr(p|qm) where p is a proposition of interest, q the evidence we have compiled in the form of observations and so forth, and m is the “I believe”, i.e. the model, the whole-cloth thing which says, “I believe the uncertainty of p in the presence of q is characterized by this parameterized probability distribution.”
We began with p = “Tomorrow’s high temperature will be 72F” but learned that if our “I believe” was m = “A normal distribution with parameters a and b” then the probability of p is 0 no matter what a and b are and regardless of q (a continuous distribution assigns zero probability to any single exact value). Bummer. So we switched to p = “Tomorrow’s high temperature will be greater than 72F”, which might be given a non-zero probability with this m.
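A minimal sketch of that zero-versus-nonzero point, using made-up values a = 68F and b = 5F standing in for whatever guesses one might make (nothing here comes from the original posts):

```python
# Illustrative only: a = 68F and b = 5F are invented values for the normal "I believe".
from scipy.stats import norm

m = norm(loc=68, scale=5)   # the "I believe": a normal with guessed a and b

# Probability of the high falling within ever-tighter windows around 72F:
for eps in (1.0, 0.1, 0.001):
    print(eps, m.cdf(72 + eps) - m.cdf(72 - eps))   # shrinks toward 0

# But the exceedance proposition gets a respectable probability:
print(m.sf(72))             # Pr(high > 72F) is about 0.21 under these made-up a and b
```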
But we haven’t yet found joy because we know nothing about a and b, which are absolutely required if we want to stick with this m. Bayesians make one guess, frequentists another. Both do so by appending m with extra “I believes” which justify the guesses.
Two crucial points. One, there is nothing universally or automatically wrong with adding “I believes”, for these might be true (with respect to other relevant evidence about q and p). People can guess correctly. Of course, they can also guess incorrectly. Two, regardless of the truth of the guesses (of both m and the parameters), assuming they and q are true produces the correct probability of p.
This second fact is difficult to keep in mind because critics of the probability of p will often attack the value of the probability itself, which (assuming no calculation errors, which are rare) is never wrong. What might be wrong are q, m, or m’s appendages.
Example: q = “Just a certain proportion of Martians wear hats and George is a Martian”, m = “I believe this certain proportion is 0.6”, with p = “George wears a hat”, then Pr(p|qm) = 0.6, a correct and true deduction. Nothing in the world can make this conditional probability false (and all probability is conditional).
Either q or m can be false. But if they are it has to be with respect to some other evidence, for truth, falsity, and probability are all conditional (even the most humble truth is known at least with respect to our intuitions). We could assess m by counting all Martians who wear hats and dividing by the number of Martians. This proportion will either be 0.6 or not. No sample of Martians that doesn’t include all of them can falsify or verify m, though. No observation can falsify the first clause of q, but direct measurement can tell us about the second. And then there’s the observation that there are no Martians!
Back to temperature. We’ve used the observations (part of q) to make guesses of a and b and used m to deduce the probability of p; say, 93%. Now this 93% applies to p, of course, but suppose upon observation that p was false—this is another condition, not q or m, but a new observation, such that Pr(p|obs) = 0; that is, the temperature was less than 72F. The 93% was still true, but it doesn’t feel “close”.
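To see where a number like 93% could come from, here is a minimal sketch with invented past highs standing in for the observations in q. Both the frequentist-style plug-in guess and one standard noninformative-prior Bayesian predictive are shown; neither is the calculation behind the original post’s 93%, just an illustration of the kind of arithmetic involved.

```python
# Made-up past high temperatures in place of the real observations in q.
import numpy as np
from scipy.stats import norm, t

obs = np.array([72, 85, 78, 88, 75, 83, 79, 86, 74, 80])  # hypothetical q
n = obs.size
a_hat, b_hat = obs.mean(), obs.std()          # frequentist-style plug-in guess (MLE)

# Plug-in: Pr(tomorrow's high > 72F | q, m, guessed a and b)
print(norm(loc=a_hat, scale=b_hat).sf(72))    # roughly 0.94 with this made-up data

# One Bayesian alternative (noninformative prior): the predictive is a
# Student-t with n-1 degrees of freedom, centered at the sample mean.
s = obs.std(ddof=1)
predictive = t(df=n - 1, loc=a_hat, scale=s * np.sqrt(1 + 1 / n))
print(predictive.sf(72))                      # roughly 0.90: a different guess, a different number
```

Nothing in that arithmetic changes when the observed high comes in under 72F; the question of what went wrong belongs to q and m, which is the inversion discussed next.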
It’s at this point we must “invert” the problem and ask what the observation says about q or m, and not about p (questions about p are trivially answered; either Pr(p|obs) = 1 or Pr(p|obs) = 0, assuming the observation is measured without error). That is, we want to know things like how true q or m are with respect to the observation. Three things are possible.
First, m (which includes the guesses of its parameters) might be true but q false or unlikely. For instance, suppose q contained measurements about temperatures in July for Tucson but we applied q and m for January in Detroit. A silly error no one would make? The opposite: misapplying the circumstance is the most common error, and also the most unrecognized. This is because q nearly always contains tacit premises, evidence which is in the investigators’ minds but which isn’t written in the formal statistical modeling phase, and these absences cause people to forget it’s there. This is the danger of shorthand notation.
Remember: the analysis is valid only if m is true and only for the kinds of situations that match q. Psychologists, sociologists, educationists and the like abuse their models by forming q’s from (usually) nearby college students and then implying that q actually represents all human beings in all cultures in all times. This is why it is mandatory to fully specify q, a step which is rarely or never done. Some researchers take stabs at it, but unless their objects of study are controllable, in-the-small physical events (which don’t have volatile personalities), they usually do a poor job of it.
Second, q might be true but m (or its parameters) might be false or unlikely. This is an easy situation because there are simple ways of verifying models, as long as the model is used regularly enough to build up evidence about its performance. Terms like “proper scores”, “calibration”, and the like come into play. Think of it as plotting instances of p against the predictions made for them: if these line up, m is more likely true; if not, then not.
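As a rough sketch of what that verification might look like, with invented forecast probabilities and outcomes (not data from any real model):

```python
# Made-up forecasts of p and made-up observed outcomes, for illustration only.
import numpy as np

probs = np.array([0.93, 0.80, 0.10, 0.65, 0.95, 0.30, 0.70, 0.20, 0.85, 0.55])
outcomes = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 1])  # 1 = p turned out true

# Brier score: mean squared gap between forecast and outcome (lower is better).
brier = np.mean((probs - outcomes) ** 2)
print(f"Brier score: {brier:.3f}")

# Crude calibration check: within each forecast bin, does the observed
# frequency of p line up with the average forecast probability?
bins = np.digitize(probs, [0.0, 0.25, 0.5, 0.75, 1.01])
for b in np.unique(bins):
    mask = bins == b
    print(f"bin {b}: mean forecast {probs[mask].mean():.2f}, "
          f"observed freq {outcomes[mask].mean():.2f}  (n={mask.sum()})")
```

If the observed frequencies track the mean forecasts bin by bin, m looks more plausible; if the 0.9-ish forecasts verify only half the time, something in m (or q) is off.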
The funny thing is that, because most of statistical practice is psychotically fascinated with the parameters of m, this verification step is rarely done (I only ever see it in physics models). Researchers are confident that by telling you their guesses of the parameters they have told you everything you need to know, a practice which implies that m and q are—of course!—true.
Third, both m and q are false. These are difficult situations because one hardly knows where to begin. We’ll know something has gone kerplooey because the probabilities of p won’t match up to the real true states of p, but we won’t know why. We might automatically blame m, the usual step, and try to “fix” it, but since q is broken too, these efforts will largely be in vain.
Last word: all of what was said applies to all those probability models you—you—use daily, and not just in those that are written up formally in some obscure journal. Every time you make any judgment of uncertainty, you are applying “probability models” with some q in mind. You just weren’t aware of it.
Yes, statistics uses probabilities of the form Pr(p|q), just like English uses 26 letters and Chinese uses radicals. The critical question is how to establish the value of Pr(p|q) with the data available. That is, HOW do you find the probability values of 0.6 and 0.93 based on your evidence? Is it true deduction? It’s definitely not as straightforward as one sees in die or coin toss examples.
What is statistics? As stated in Section 1.1 of almost every intro stat course, it is the science of collecting, organizing, analyzing, presenting and interpreting data. The tools include, but are not limited to, basic descriptive statistics that simply describe the sample data, e.g., graphics (no probability involved, but they can be helpful in postulating a probability model) as demonstrated here, and frequency distributions. They can also be complicated inferential statistics involving probabilistic modeling, e.g., this paper explains how a model is postulated for a particular data set. I don’t think Dave has done it by guessing.
This reminds me of the story about the scientist and the statistician. The statistician thinks it is the scientist who knows the data to be normally distributed. The scientist thinks it is the statistician!
Where am I going wrong?
Isn’t there a 4th possibility? That q and m are both true and the falsity of p is part of the 7%?
Leo,
If q and m are true, they must be true with respect to something. That something, whatever it is, may have nothing to do with p, hence the probability assigned to p may still be screwy.
JH,
Exactly right. The textbooks do say those kinds of things. Which is exactly why they start so badly: the misdirection is there from the beginning. Good point.
Oh, no, Mr. Briggs, I don’t think the textbooks start badly and misdirect at all. Statistics is indeed the science of collecting, organizing, analyzing, presenting and interpreting data. After all, you claim the following in this post.
As I say, the best analysis often is no analysis at all: simple counts, tables, and pictures give a good feel of the situation and are less likely to lead to over-certainty.
So, doesn’t it then make sense to start with simple data collection methods and data organization via pictures and simple counts and tables in an introductory statistics course?
BTW, I think Mayo, a philosopher of probability and statistics, is correct in saying that
You can prove me wrong by showing me how you derived the probability 0.8 of a Romney win based on whatever data you had!!!
The reason fields like sociology don’t express how their study population differs from the general population, or even (*shudder*) test their models, is that they have no accountability. If they were developing models for, say, the stock market and stood to lose money, things would likely change.
Many years ago, when I took an entry-level cognitive psychology course, the one thing that struck me as odd was the apparent lack of interest in what was being done and how it related to the field of study. Instead, the entire focus seemed to be on who and when. More of a social club than a serious endeavor.
Many of the papers you rail against come from similar fields which have no real interest in genuine research. Until people in these fields have their feet held to the fire — an unlikely occurrence — things won’t change.
Link to part 1 not working.
Gary,
Thanks. It’s fixed.