William M. Briggs

Statistician to the Stars!


Teaching Journal: Day 7

The joke is old and hoary and so well known that I risk the reader’s ire for repeating it. But it contains a damning truth.

Most academic statistical studies are like a drunk searching for his keys under a streetlight. He looks there not because that is where he lost his keys, but because that is where the light is.

To prove this come these four quotations from Jacqueline Stevens, professor of political science at Northwestern University (original source):

In 2011 Lars-Erik Cederman, Nils B. Weidmann and Kristian Skrede Gleditsch wrote in the American Political Science Review that “rejecting ‘messy’ factors, like grievances and inequalities,” which are hard to quantify, “may lead to more elegant models that can be more easily tested…”

Professor Tetlock’s main finding? Chimps randomly throwing darts at the possible outcomes would have done almost as well as the experts…

Research aimed at political prediction is doomed to fail. At least if the idea is to predict more accurately than a dart-throwing chimp…

I look forward to seeing what happens to my discipline and politics more generally once we stop mistaking probability studies and statistical significance for knowledge.

If our only evidence is that “Some countries which face economic injustice go to war and Country A is a country which faces economic injustice” then given this the probability that “Country A goes to war” is some number between 0 and 1. And not only is this the best we can do, but it is all we can do. It becomes worse when we realize the vagueness of the term “economic injustice.”

I mean, if we cannot even agree on the implicit (there, but hidden) premise “Economic injustice is unambiguously defined as this and such” we might not even be sure that Country A actually suffers economic injustice.

But supposing we really want to search for the answer to the probability that “Country A goes to war”, what we should not do is to substitute quantitative proxies just to get some equations to spit out numbers. This is no different than a drunk searching under the streetlight.

The mistake is in thinking not only that all probabilities are quantifiable (which they are not), but that all probabilities should be quantified, which leads to false certainty. And bad predictions.

Incidentally, Stevens also said, “Many of today’s peer-reviewed studies offer trivial confirmations of the obvious and policy documents filled with egregious, dangerous errors.”

Modeling, which we begin today in a formal sense, is no different than what we have been doing up until now: identifying propositions which we want to quantify the uncertainty of, then identifying premises which are probative of this “conclusion.” As the cautionary tale by Stevens indicates, we must not seek quantification just for the sake of quantification. That is the fundamental error.

A secondary error we saw developed at the end of last week: substituting knowledge about parameters of probability models as knowledge of the “conclusions.” This error is doubled when we realize that the probability models should often not be quantified in the first place. We end up with twice the overconfidence.

Now, if our model and data are that “Most Martians wear hats and George is a Martian” the probability of “George wears a hat” is greater than 1/2 but less than 1. That is the best we can do. And even that relies on the implicit assumption about the meaning of the English word “Most” (of course, there are other implicit assumptions, including definitions of the other words and knowledge of the rules of logic).

This ambiguity (the answer is a very wide interval) is intolerable to many, which is why probability has come to seem subjective to some and why others will quite arbitrarily insert a quantifiable probability model in place of “Most…”

It’s true that both these groups are free to add to the premises such that probabilities of the conclusions do become hard-and-fast numbers. We are all free to add any premises we like. But this makes the models worse in the sense that they match reality at a rate far less than the more parsimonious premises. That, however, is a topic for another day.

Homework

Read about all this. More is to come. In another hurry today. Get your data in hand by end of the day. Look for typos.

Teaching Journal: Day 6

(I’m assuming you have been reading previous posts. If not, do so.)

We still want this:

     (1) Pr (Distance > 1 meter | normal with m and s specified) = something

Actually, we don’t; not really. We want somebody to tell us (1) or something like it. The customer doesn’t really care that it was a normal distribution that was used. What we really want is the exact list of premises which allow us to say

     (2) Pr (Distance > 1 meter | oracular premises) = 0 or 1

or, that is, we want the oracular premises which tell us the precise distance the boule will be from the cochonette. We want this:

     (2′) Pr (Distance = x meters | oracular premises) = 1

where the x is filled in. But oracular premises don’t exist for most of life. We have to content ourselves with something less. This is why we can live with the premise that our uncertainty in the distance is quantified by a normal (or some other) distribution.

We can of course say, “It isn’t really a normal distribution” but this is a conclusion from a probability argument, and as we recall all probability propositions are conditional on premises. What are the premises which tell us “It isn’t really a normal distribution” is true? Well, these are easy: we have them (look in the book; Chapter 4). Call this list NN (for “not normal”). That is, given NN, it is true that “It isn’t really a normal distribution.”

But we do not list NN in (1), (2), or (2′). If we did, we could not compute any numbers. The premises would be self-negating. Just as we do not add the premise “There are no Martians” to the argument “All Martians wear hats and George is a Martian.” Well, we could add it of course. It is up to us, as adding any premise to a list in an argument is always up to us. But the point is this: Given just the original “All Martians…” the conclusion “George wears a hat” is deduced (and has probability 1). And given just the “We use a normal with a specified m and s” the probability of “Distance > 1 meter” is deduced (and is some number).

Incidentally, both the “All Martians…” and the “We use a normal…” are therefore models. So we can see that the word “model” is just another way to say “list of premises.”

When last we left our customer, he had just met a frequentist and a classical Bayesian to whom he had put (1). Both the frequentist and the Bayesian declined to answer (1). Instead, the pair started going on about the value of m (and maybe s, too) by discussing “confidence” and “credible” intervals. None of which is of the least interest to the customer, who still wants to know (1). Or questions like (1), questions that have to do with actual distances of actual balls.

The frequentist declines to help, but if pressed might utter something about a “null” hypothesis that “m isn’t 0.” We’ll figure that out later. The classical Bayesian, if he can be jarred awake, can help. What he can do is to say, “Given the data and that I used a normal distribution, and given the assumptions which provide me the same numerical answers as the frequentist, I can say that although I don’t know the precise value of the pair (the pair, I say) of (m,s), I can take my uncertainty of them into account to answer (1).”

What this now-modern Bayesian does is to say (m,s) = (m-value 1, s-value 1) with some probability, that (m,s) = (m-value 2, s-value 2) with some probability, and so on for each possible value that (m,s) can take. He knows these from the credible intervals he just calculated. Now for each of these values, he plugs in the guess of (m,s) and calculates (1). Then he takes all the possible values of (1) and weights them by the probability that (m,s) takes each of these values. In the end he produces

     (3) Pr (Distance > 1 meter | normal and past data) = the answer.

There is no more talk of m and s, which are of no interest to anybody, most specifically the customer. There is only the answer to the question the customer wanted. Notice that this answer is still conditional on the “model”, the normal distribution. It is also conditional on the past data, which is no surprise.
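The plug-in-and-weight recipe can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the distances are made up, the function name is invented, and the way plausible (m,s) pairs are drawn assumes the standard "flat" prior, which is the one that makes the Bayesian's interval match the frequentist's. Nothing here is claimed to be the exact calculation used in class.

```python
import random
from statistics import NormalDist, mean, stdev


def predictive_prob_exceeds(data, threshold, draws=20_000, seed=0):
    """Approximate Pr(Distance > threshold | normal and past data)."""
    rng = random.Random(seed)
    n = len(data)
    xbar, sd = mean(data), stdev(data)
    total = 0.0
    for _ in range(draws):
        # Draw one plausible (m, s) pair given the data (flat-prior assumption).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)  # chi-square, n - 1 degrees of freedom
        s = sd * ((n - 1) / chi2) ** 0.5
        m = rng.gauss(xbar, s / n ** 0.5)
        # Plug the pair into (1) and accumulate the result.
        total += 1.0 - NormalDist(m, s).cdf(threshold)
    # A plain average is the weighted sum: pairs are drawn in proportion
    # to their probability, so each draw carries equal weight.
    return total / draws


# Made-up distances in meters, for illustration only.
past_distances = [0.4, 1.2, 0.7, 0.9, 0.3, 1.5, 0.6, 0.8]
print(predictive_prob_exceeds(past_distances, threshold=1.0))
```

Notice that the function returns a single number about the distance itself; m and s appear only inside the loop and are gone by the end.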

But this means that if we had originally assumed the premise, “Our uncertainty in the distance is quantified by a gamma distribution” the answer to (3) would be different. Just as it would be different if we began with a Weibull (say) or any other mathematical probability distribution.

Which probability distribution is the “right” one? Well, that is a conclusion to a probability argument. Which premises will we supply to ascertain the probability that the normal, or gamma, or whatever, is the “right” one? That again is up to us. We’ll talk more about this in detail at another time. But for now first suppose we have the evidence/premises, “I have three probability models, normal, gamma, and Weibull. Just one of these is the right one to quantify uncertainty in distance.” Given just this information, the probability that any is right is 1/3.

We could then take this information and compute a (3) for each model, then weight the three answers (the three numerical answers to (3)) to produce this

     (4) Pr (Distance > 1 meter | assumptions about distributions and past data) = better answer.

Notice that there is no talk about which distributions make up (4). They disappeared just as the m and s disappeared when we went from (1) to (3).
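The weighting that produces (4) is simple arithmetic. A minimal sketch, assuming three invented answers to (3) and using the 1/3 probabilities deduced from the premise that exactly one of the three models is right; the function and variable names are hypothetical, and how the past data might shift those weights is left for the later discussion promised above.

```python
def average_over_models(answers, weights):
    """Weight each model's answer to (3) by the probability that model is the right one."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("model probabilities must sum to 1")
    return sum(weights[name] * answers[name] for name in answers)


# Hypothetical answers to (3) under each model; these numbers are invented.
answers_to_3 = {"normal": 0.33, "gamma": 0.29, "weibull": 0.31}

# Premise: just one of the three is right, and nothing else is known.
model_probs = {name: 1 / 3 for name in answers_to_3}

better_answer = average_over_models(answers_to_3, model_probs)
print(f"(4) = {better_answer:.3f}")
```

The returned number carries no label saying which distribution produced it, which is the sense in which the distributions "disappear."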

The point: every statistical problem the modern Bayesian does is just like this. He attempts to answer the actual questions real customers ask him.

Homework

Check for typos.

Wine tour today.

Also, have your spreadsheets ready for tomorrow.

Another Try With A New Look

Unless there is a general revolt, this is it. Tweaks can of course be made—fonts darkened or lightened, background colors shaded, some widgets shifted. But this is it.

One of the big reasons I had to switch is because the old format was difficult to use on phones, tablets, and the like. This one looks fantastic on my HTC, and from what I can tell, soars on iPads. It is also swell on screen. And all is automatic. I mean, there shouldn’t have to be any “pinching” or “tapping” to have the words show properly.

There certainly isn’t anything fancy about this theme, but then we don’t really do fancy. Focus is still on the words and the occasional graphic.

I want to put a guide at the bottom of the comments to show the allowable tags. The blockquote is annoyingly always italicized. Within a post, I don’t like the arrows to previous and more recent posts. The images on the right bar with rounded edges have the wrong color for their edges. Things like that. They’ll get fixed.

This would have all been done earlier, but the class is taking all my time.

Speaking of the class, has anybody collected any data?

Teaching Journal: Day 5

Let’s make sure we grasped yesterday’s lesson. Emails and comments suggest we have not. These concepts are hardest for those who have only had classical training.

We want to know something like this: what is the probability the boule will land at least 1 meter from the cochonette? Notice that this is an observable, measurable, tangible question. A natural question, immediately understandable, not requiring a degree in statistics to comprehend. Of course, it needn’t be “1 meter”, it could be “2 meters” or 3 or any number which is of interest to us.

Now, as the rules of logic admit, I could just assume-for-the-sake-of-argument premises which specify a probability distribution for the distance the boule will be from the cochonette. Or I could assume the uncertainty in this distance is quantified by a normal distribution. Why not? Everybody uses these creatures, right or wrong. We may as well, too.

A normal distribution requires two parameters, m and s. They are NOT, I emphasize again, the “mean” and “standard deviation.” They are just two parameters which, when given, fully specify the normal and let us make calculations. The mean and standard deviation are instead functions of data. Everybody knows what the mean function looks like (add all the numbers, divide by the number of numbers). It isn’t of the slightest interest to us what the standard deviation function is. If you want to know, search for it.

Since I wanted to use a normal (and this is just a premise I assumed; I repeat, and you should memorize, that this is just a premise I assumed), since, I say, I want to use a normal, I must specify m and s. There is nothing in the world wrong with also assuming values for these parameters. After all (you just memorized this), I just assumed the normal and I am getting good at assuming.

With m and s in hand, I can calculate this:

     (1) Pr (Distance > 1 meter | normal with m and s specified) = something

The “something” will depend on the m and s I choose. If I choose different m and s then the “something” will change. Obviously.
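To watch the “something” in (1) move with the premises, here is a minimal Python sketch. The function name and the example values of m and s are invented for illustration; they are assumptions, not numbers from any real game.

```python
from statistics import NormalDist


def prob_distance_exceeds(threshold, m, s):
    """Pr(Distance > threshold | normal with parameters m and s)."""
    # NormalDist.cdf gives Pr(Distance <= threshold); take the complement.
    return 1.0 - NormalDist(mu=m, sigma=s).cdf(threshold)


# Hypothetical parameter choices (in meters), assumed for illustration only.
for m, s in [(0.8, 0.4), (1.0, 0.4), (0.8, 0.2)]:
    p = prob_distance_exceeds(1.0, m, s)
    print(f"m={m}, s={s}: Pr(Distance > 1 meter) = {p:.3f}")
```

Each (m, s) pair is a different list of premises, so each yields a different deduced probability for the same question.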

The question now becomes: what do statisticians do? They keep the arbitrary premise “The normal quantifies my uncertainty in the distance” but then add to it these premises, “I observed in game 1 the distance D1. In game 2 I observed the distance D2 and so on.”

These “observational” premises are uninteresting by themselves. They are not useful, unless we add to them the premise, the quite arbitrary premise, “I use these observations to estimate m and s via the mean and standard deviation.” This is all we need to answer (1). That is, we needed a normal distribution with the m and s specified and any way we guess m and s gives us values for m and s (right?). It matters naught to (1) how m and s are specified. But without the m and s specified, (1) CANNOT be calculated. Notice the capitals.

Here is what the frequentist will do. She will calculate the mean (and standard deviation; but ignore this) and then report the “95% confidence interval” for this guess. We saw yesterday the interpretation of this strange object. But never mind that today. The point is the frequentist statistician ignores equation (1) and instead answers a question that was not asked. She contents herself with saying “The mean of the distances was this number; the confidence interval is this and such.”

And this quirky behavior is accepted by the customer. He forgets he wanted to know (1) or assumes the statement he just received is a sort of approximate answer to (1). Very well.

Here is what the classical Bayesian will do. The same thing as the frequentist. In this case, at least. The calculations the Bayesian does and the calculation the frequentist does, though they begin at different starting points, end at the same place.

The classical Bayesian will also compute the mean and he will also say “The mean is my best guess for m.” And he will also compute the exact same confidence interval but he will instead call it a credible interval. And this in fact represents a modest improvement, even though the numbers of the interval are identical. It is an improvement because the classical Bayesian can then say things like this, “There is a 95% chance the true value of m lies inside the credible interval” whereas the frequentist can only repeat the curious tongue twister we noted yesterday.

The classical Bayesian, proud of this improvement and pleased the numbers match his frequentist sister’s, also forgets (1). Ah well, we can’t have everything.

There is one more small thing. The classical Bayesian also recognizes that his numbers will not always match his frequentist sister’s. If for instance the frequentist and classical Bayesian attack a “binomial” problem, the numbers won’t match. But when normal distributions are used, as they were here and as they are in ordinary linear regression, statisticians are one big happy family. And isn’t that all that matters?

No.

Homework

You should have been collecting your data by now. If not, start. We’ll only be doing ordinary linear regression according to the modern slogan: Regression Right Or Wrong!

Teaching Journal: Day 4

Today is the quietest day, a time when all is still, a moment when nary a voice is raised and, quite suddenly, appointments are remembered, people have to be seen, the room empties. Because this is the day I introduce the classical confidence interval, a creation so curious that I have yet to have a frequentist stick around to defend it.

Up until now we have specified the evidence, or premises, we used (“All Martians wear hats…”) and this evidence has let us deduce the probabilities of the conclusions (which we have also specified, and always will, and always must; e.g. “George wears a hat”).

But sometimes we are not able to use the premises (data, evidence) in a direct way. We still follow the rules and dictates of logic, of course, but sometimes the evidence is not as clear as it was when we learned that “Most Martians wear hats.”

The game of petanque is played by drawing a small circle into which one steps, keeping both feet firmly planted. A small wooden ball called a cochonette is tossed 6 to 10 meters downstream. Then opposing teams take turns throwing manly steel balls, or boules, towards the cochonette trying to get as close as possible to it. It is not unlike the Italian game of bocce, which uses meek wooden balls.

Now I am interested in the distance the boule will be from the cochonette. I do not know, before I throw, what this distance will be. I therefore want to use probability to quantify my uncertainty in this distance. I needn’t do this in any formal way. I can, as all people do, use my experience in playing and make rough guesses. “It’s pretty likely, given all the games I have seen, the boule will be within 1 meter of the cochonette.” Notice the clause “given all the games I have seen”, a clause which must always appear in any judgment of certainty or uncertainty, as we have already seen.

But I can do this more formally and use a store-bought probability distribution to quantify my uncertainty. How about the normal? Well, why not. Everybody else uses it, despite its many manifest flaws. So we’ll use it too. That I’m using it and accepting it as a true representation of my uncertainty is just another premise which I list. Since we always must list such premises, there is nothing wrong so far.

The normal distribution requires two parameters, two numbers which must be plugged in else we cannot do any calculations. These are the “m = central parameter” and the “s = spread parameter.” Sometimes these are mistakenly called the “mean” and “standard deviation.” These latter two objects are not parameters, but are functions of other numbers. For example, everybody knows how to calculate a numerical mean; that is just a function of numbers.

Now I can add to my list of premises values for m and s. Why not? I already, quite arbitrarily, added the normal distribution to the list. Might as well just plug in values for m and s, too. That is certainly legitimate. Or you can act like a classical statistician and go out and “collect data.”

This would be in the form of actual measurements of actual distances. Suppose I collect three such measurements: 4cm, -15cm, 1cm. This list of measurements is just another premise, added to the list. A frequentist statistician would say to himself, “Well, why don’t I use the mean of these numbers as my guess for m?” And of course he may do this. This becomes another premise. He will then say, “As long as I’m at it, why don’t I use the standard deviation of these numbers as my guess for s?” Yet another premise. And why, I insist, not.

We at least see how the mistake arises from calling the parameters by the names of their guesses. Understandable. Anyway, once we have these guesses (and any will do) we can plug them into our normal distribution and calculate probabilities. Well, only some probabilities. The normal always—as in always—gives 0 probabilities for actual observable (singular) events. But skip that. We have our guesses and we can calculate.

The frequentist statistician then begins to have pangs of (let us say) conscience. He doubts whether m really does equal -3.3cm (as it does here) and whether s really does equal 10.2cm (as it does here). After all, three data points isn’t very many. Collecting more data would probably (given his experience) change these guesses. But he hasn’t collected more data: he just has these three. So he derives a statement of the “uncertainty” he has in the guesses as estimates of the real m and s. He calls this statement a “95% confidence interval.” That 95% has been dictated by God Himself. It cannot be questioned.
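The two guesses are easy to check. A minimal sketch, assuming the usual sample form of the standard deviation (the one that divides by n - 1), which is the form that reproduces the figures just quoted:

```python
from statistics import mean, stdev

# The three measurements from the text, in centimeters.
distances_cm = [4, -15, 1]

m_guess = mean(distances_cm)   # add the numbers, divide by how many there are
s_guess = stdev(distances_cm)  # sample standard deviation (n - 1 divisor)

print(f"guess for m: {m_guess:.1f} cm")
print(f"guess for s: {s_guess:.1f} cm")
```

Both results are functions of the three numbers and nothing more; they become guesses for m and s only because we add the premise that they should.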

Now the confidence interval is just another function of the data, the form of which is utterly uninteresting. In this example, it gives us (-10cm to 3.3cm). What you must never say, what is forbidden by frequentist theory, is to say anything like this, “There is a 95% chance (or so) that the true value of m lies in this confidence interval.” No, no, no. This is disallowed. It is anathema. The reason for this proscription has to do with the frequentist definition of probability, which always involves limits.

The real definition of the CI is this: if I were to repeat this experiment (where I measured three numbers) an infinite number of times, and for each repetition I calculated a guess for m and a confidence interval for this guess, and then I kept track of all these confidence intervals (all of them), then 95% of them (after I got to infinity) would “cover”, or contain, the real value of m. Stop short of infinity, then I can say nothing.
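The repeated-experiment definition can be imitated, though never completed, on a computer. The sketch below is an assumption-laden illustration: it uses the textbook t-based interval for three measurements (the form of the interval is not specified here), invents "real" values for m and s, and stops at a finite number of repetitions, which is exactly the point at which strict theory says nothing can be claimed.

```python
import random
from statistics import mean, stdev

T_CRIT_DF2 = 4.303  # 97.5th percentile of Student's t with 2 degrees of freedom


def t_interval(sample):
    """95% t-based confidence interval for m from a sample of size 3."""
    center = mean(sample)
    half_width = T_CRIT_DF2 * stdev(sample) / len(sample) ** 0.5
    return center - half_width, center + half_width


def coverage(true_m, true_s, repetitions, seed=0):
    """Fraction of repeated three-measurement experiments whose interval contains true_m."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_m, true_s) for _ in range(3)]
        low, high = t_interval(sample)
        hits += low <= true_m <= high
    return hits / repetitions


# Hypothetical "real" parameters; in practice these are exactly what we do not know.
print(coverage(true_m=0.0, true_s=10.0, repetitions=10_000))
```

The printed fraction describes the collection of simulated intervals, not any single one of them.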

The only thing I am allowed to say about the confidence interval I actually do have (that -10cm to 3.3cm) is this: “Either the real value of m is in this interval or it isn’t.” That, dear reader, is known as a tautology. It is always true. It is true even (in this case) for the interval (100 million cm, 10 billion cm). It is true for any interval.

The interval we have then, at least according to strict frequentist theory, has no meaning. It cannot be used to say anything about the uncertainty for the real m we have in front of us. Any move in this direction is verboten. Including finite experiments to measure the “width” of these intervals (let he who readeth understand).

Still, people do make these moves. They cannot help but say something like, “There is (about) a 95% chance that m lies in the interval.” My dear ones, these are all Bayesian interpretations. This is why I often say that everybody is a Bayesian, even frequentists.

And of course they must be.

Homework

Typo patrol away!

Find, in real life, instances where the normal has been used with confidence intervals. Just you see if whoever used the interval interpreted it wrong.


© 2014 William M. Briggs
