William M. Briggs

Statistician to the Stars!

Another Try With A New Look

Unless there is a general revolt, this is it. Tweaks can of course be made—fonts darkened or lightened, background colors shaded, some widgets shifted. But this is it.

One of the big reasons I had to switch is that the old format was difficult to use on phones, tablets, and the like. This one looks fantastic on my HTC, and from what I can tell, soars on iPads. It is also swell on screen. And all is automatic. I mean, there shouldn’t have to be any “pinching” or “tapping” to have the words show properly.

There certainly isn’t anything fancy about this theme, but then we don’t really do fancy. Focus is still on the words and the occasional graphic.

I want to put a guide at the bottom of the comments to show the allowable tags. The blockquote is annoyingly always italicized. I don’t like the arrows to previous and more recent posts that appear while in a post. The images on the right bar with rounded edges have the wrong color for their edges. Things like that. They’ll get fixed.

This would have all been done earlier, but the class is taking all my time.

Speaking of the class, has anybody collected any data?

Teaching Journal: Day 5

Let’s make sure we grasped yesterday’s lesson. Emails and comments suggest we have not. These concepts are hardest for those who have only had classical training.

We want to know something like this: what is the probability the boule will land at least 1 meter from the cochonette? Notice that this is an observable, measurable, tangible question. A natural question, immediately understandable, not requiring a degree in statistics to comprehend. Of course, it needn’t be “1 meter”, it could be “2 meters” or 3 or any number which is of interest to us.

Now, as the rules of logic admit, I could just assume-for-the-sake-of-argument premises which specify a probability distribution for the distance the boule will be from the cochonette. Or I could assume the uncertainty in this distance is quantified by a normal distribution. Why not? Everybody uses these creatures, right or wrong. We may as well, too.

A normal distribution requires two parameters, m and s. They are NOT, I emphasize again, the “mean” and “standard deviation.” They are just two parameters which, when given, fully specify the normal and let us make calculations. The mean and standard deviations are instead functions of data. Everybody knows what the mean function looks like (add all the numbers, divide by the number of numbers). It isn’t of the slightest interest to us what the standard deviation function is. If you want to know, search for it.

Since I wanted to use a normal—and this is just a premise I assumed—I repeat and you should memorize that this is just a premise I assumed—since, I say, I want to use a normal, I must specify m and s. There is nothing in the world wrong with also assuming values for these parameters. After all (you just memorized this), I just assumed the normal and I am getting good at assuming.

With m and s in hand, I can calculate this:

     (1) Pr (Distance > 1 meter | normal with m and s specified) = something

The “something” will depend on the m and s I choose. If I choose different m and s then the “something” will change. Obviously.
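For those who like to see the arithmetic, here is a minimal sketch of (1) in Python. The function name and the three pairs of m and s are assumptions made up for illustration; nothing in the argument fixes them. The only mathematics used is the standard upper-tail formula for a normal, written with the complementary error function.

```python
import math

def prob_distance_exceeds(threshold, m, s):
    """Pr(Distance > threshold | normal with m and s specified)."""
    # Upper tail of a normal with central parameter m and spread parameter s.
    z = (threshold - m) / (s * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

# Hypothetical values of m and s (in meters), assumed purely for illustration.
for m, s in [(0.5, 0.5), (1.0, 0.5), (0.5, 1.0)]:
    p = prob_distance_exceeds(1.0, m, s)
    print(f"m={m}, s={s}: Pr(Distance > 1 meter) = {p:.3f}")
```

Change the assumed m and s and the printed "something" changes with them, which is all the paragraph above claims.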

The question now becomes: what do statisticians do? They keep the arbitrary premise “The normal quantifies my uncertainty in the distance” but then add to it these premises, “I observed in game 1 the distance D1. In game 2 I observed the distance D2 and so on.”

These “observational” premises are uninteresting by themselves. They are not useful, unless we add to them the premise, the quite arbitrary premise, “I use these observations to estimate m and s via the mean and standard deviation.” This is all we need to answer (1). That is, we needed a normal distribution with the m and s specified and any way we guess m and s gives us values for m and s (right?). It matters naught to (1) how m and s are specified. But without the m and s specified, (1) CANNOT be calculated. Notice the capitals.

Here is what the frequentist will do. She will calculate the mean (and standard deviation; but ignore this) and then report the “95% confidence interval” for this guess. We saw yesterday the interpretation of this strange object. But never mind that today. The point is the frequentist statistician ignores equation (1) and instead answers a question that was not asked. She contents herself with saying “The mean of the distances was this number; the confidence interval is this and such.”

And this quirky behavior is accepted by the customer. He forgets he wanted to know (1) or assumes the statement he just received is a sort of approximate answer to (1). Very well.

Here is what the classical Bayesian will do. The same thing as the frequentist. In this case, at least. The calculations the Bayesian does and the calculation the frequentist does, though they begin at different starting points, end at the same place.

The classical Bayesian will also compute the mean and he will also say “The mean is my best guess for m.” And he will also compute the exact same confidence interval but he will instead call it a credible interval. And this in fact represents a modest improvement, even though the numbers of the interval are identical. It is an improvement because the classical Bayesian can then say things like this, “There is a 95% chance the true value of m lies inside the credible interval” whereas the frequentist can only repeat the curious tongue twister we noted yesterday.

The classical Bayesian, proud of this improvement and pleased the numbers match his frequentist sister’s, also forgets (1). Ah well, we can’t have everything.

There is one more small thing. The classical Bayesian also recognizes that his numbers will not always match his frequentist sister’s. If for instance the frequentist and classical Bayesian attack a “binomial” problem, the numbers won’t match. But when normal distributions are used, as they were here and as they are in ordinary linear regression, statisticians are one big happy family. And isn’t that all that matters?



You should have been collecting your data by now. If not, start. We’ll only be doing ordinary linear regression according to the modern slogan: Regression Right Or Wrong!

Teaching Journal: Day 4

Today is the quietest day, a time when all is still, a moment when nary a voice is raised and, quite suddenly, appointments are remembered, people have to be seen, the room empties. Because this is the day I introduce the classical confidence interval, a creation so curious that I have yet to have a frequentist stick around to defend it.

Up until now we have specified the evidence, or premises, we used (“All Martians wear hats…”) and this evidence has let us deduce the probabilities of the conclusions (which we have also specified, and always will, and always must; e.g. “George wears a hat”).

But sometimes we are not able to use the premises (data, evidence) in a direct way. We still follow the rules and dictates of logic, of course, but sometimes the evidence is not as clear as it was when we learned that “Most Martians wear hats.”

The game of petanque is played by drawing a small circle into which one steps, keeping both feet firmly planted. A small wooden ball called a cochonette is tossed 6 to 10 meters downstream. Then opposing teams take turns throwing manly steel balls, or boules, towards the cochonette trying to get as close as possible to it. It is not unlike the Italian game of bocce, which uses meek wooden balls.

Now I am interested in the distance the boule will be from the cochonette. I do not know, before I throw, what this distance will be. I therefore want to use probability to quantify my uncertainty in this distance. I needn’t do this in any formal way. I can, as all people do, use my experience in playing and make rough guesses. “It’s pretty likely, given all the games I have seen, the boule will be within 1 meter of the cochonette.” Notice the clause “given all the games I have seen”, a clause which must always appear in any judgment of certainty or uncertainty, as we have already seen.

But I can do this more formally and use a store-bought probability distribution to quantify my uncertainty. How about the normal? Well, why not. Everybody else uses it, despite its many manifest flaws. So we’ll use it too. That I’m using it and accepting it as a true representation of my uncertainty is just another premise which I list. Since we always must list such premises, there is nothing wrong so far.

The normal distribution requires two parameters, two numbers which must be plugged in else we cannot do any calculations. These are the “m = central parameter” and the “s = spread parameter.” Sometimes these are mistakenly called the “mean” and “standard deviation.” These latter two objects are not parameters, but are functions of other numbers. For example, everybody knows how to calculate a numerical mean; that is just a function of numbers.

Now I can add to my list of premises values for m and s. Why not? I already, quite arbitrarily, added the normal distribution to the list. Might as well just plug in values for m and s, too. That is certainly legitimate. Or you can act like a classical statistician and go out and “collect data.”

This would be in the form of actual measurements of actual distances. Suppose I collect three such measurements: 4cm, -15cm, 1cm. This list of measurements is just another premise, added to the list. A frequentist statistician would say to himself, “Well, why don’t I use the mean of these numbers as my guess for m?” And of course he may do this. This becomes another premise. He will then say, “As long as I’m at it, why don’t I use the standard deviation of these numbers as my guess for s?” Yet another premise. And why, I insist, not.

We at least see how the mistake arises from calling the parameters by the names of their guesses. Understandable. Anyway, once we have these guesses (and any will do) we can plug them into our normal distribution and calculate probabilities. Well, only some probabilities. The normal always—as in always—gives 0 probabilities for actual observable (singular) events. But skip that. We have our guesses and we can calculate.

The frequentist statistician then begins to have pangs of (let us say) conscience. He doubts whether m really does equal -3.3cm (as it does here) and whether s really does equal 10.2cm (as it does here). After all, three data points isn’t very many. Collecting more data would probably (given his experience) change these guesses. But he hasn’t collected more data: he just has these three. So he derives a statement of the “uncertainty” he has in the guesses as estimates of the real m and s. He calls this statement a “95% confidence interval.” That 95% has been dictated by God Himself. It cannot be questioned.
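The two guesses can be checked with a few lines of Python. This is only a sketch: it assumes the usual sample standard deviation (the one with the n - 1 divisor), and the 10cm threshold in the last step is a number picked for illustration, not one taken from the class.

```python
import math
import statistics

measurements_cm = [4, -15, 1]  # the three measurements from the text

m_guess = statistics.mean(measurements_cm)   # guess for m
s_guess = statistics.stdev(measurements_cm)  # guess for s (n - 1 divisor)

print(f"guess for m: {m_guess:.1f} cm")
print(f"guess for s: {s_guess:.1f} cm")

def prob_exceeds(threshold, m, s):
    """Pr(Distance > threshold | normal with m and s specified)."""
    return 0.5 * math.erfc((threshold - m) / (s * math.sqrt(2.0)))

# Plug the guesses into the normal. The 10 cm threshold is an assumption
# chosen only to have something to calculate.
p = prob_exceeds(10.0, m_guess, s_guess)
print(f"Pr(Distance > 10 cm | normal, guessed m and s) = {p:.3f}")
```

Any other way of guessing m and s would serve the last step equally well; the normal only needs the two numbers.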

Now the confidence interval is just another function of the data, the form of which is utterly uninteresting. In this example, it gives us (-10cm to 3.3cm). What you must never say, what is forbidden by frequentist theory, is to say anything like this, “There is a 95% chance (or so) that the true value of m lies in this confidence interval.” No, no, no. This is disallowed. It is anathema. The reason for this proscription has to do with the frequentist definition of probability, which always involves limits.

The real definition of the CI is this: if I were to repeat this experiment (where I measured three numbers) an infinite number of times, and for each repetition I calculated a guess for m and a confidence interval for this guess, and then I kept track of all these confidence intervals (all of them), then 95% of them (after I got to infinity) would “cover”, or contain, the real value of m. Stop short of infinity, then I can say nothing.
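The repetition story can be imitated, though only imitated, on a computer. The sketch below assumes the interval is the textbook Student-t interval for a sample of three (the post does not say which formula is used), assumes "real" values of m and s that we of course never know in practice, and stops well short of infinity, so strictly speaking it licenses nothing. The names and the number of repetitions are all made up for illustration.

```python
import random
import statistics

# 97.5% quantile of Student's t with 2 degrees of freedom (sample size 3).
T_CRIT_DF2 = 4.302652729911275

def interval_covers(true_m, true_s, rng, n=3):
    """Draw n distances, build the 95% t interval, report whether it contains true_m."""
    sample = [rng.gauss(true_m, true_s) for _ in range(n)]
    center = statistics.mean(sample)
    half_width = T_CRIT_DF2 * statistics.stdev(sample) / n ** 0.5
    return center - half_width <= true_m <= center + half_width

def coverage(true_m, true_s, repetitions, seed=1):
    """Fraction of repeated experiments whose interval covers true_m."""
    rng = random.Random(seed)
    hits = sum(interval_covers(true_m, true_s, rng) for _ in range(repetitions))
    return hits / repetitions

# Assumed "real" parameters, in cm, for illustration only.
print(coverage(true_m=0.0, true_s=10.0, repetitions=20000))
```

The printed fraction should land near 0.95. Notice that it is a statement about the whole collection of intervals and says nothing about any one of them.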

The only thing I am allowed to say about the confidence interval I actually do have (that -10cm to 3.3cm) is this: “Either the real value of m is in this interval or it isn’t.” That, dear reader, is known as a tautology. It is always true. It is true even (in this case) for the interval (100 million cm, 10 billion cm). It is true for any interval.

The interval we have then, at least according to strict frequentist theory, has no meaning. It cannot be used to say anything about the uncertainty for the real m we have in front of us. Any move in this direction is verboten. Including finite experiments to measure the “width” of these intervals (let he who readeth understand).

Still, people do make these moves. They cannot help but say something like, “There is (about) a 95% chance that m lies in the interval.” My dear ones, these are all Bayesian interpretations. This is why I often say that everybody is a Bayesian, even frequentists.

And of course they must be.


Typo patrol away!

Find, in real-life, instances where the normal has been used with confidence intervals. Just you see if whoever used the interval interpreted it wrong.

Teaching Journal: Day 3

In the real, physical class we learned to count yesterday. Elementary combinatorics, I mean. Figured out what “n!” and “n choose k” and the like meant and how that married with probability and produced a binomial distribution. Pure mechanics. (We begin Chapter 4 today.)

I trust, dear reader, if you don’t already know these things you can read over the class notes to learn, or can find one of hundreds of internet sites which have this sort of information. It is of some interest and we will later use the binomial for this and that. If my Latex renderer is working, you should be able to see this formula for the binomial (if not, even Wikipedia gets this one right):

     Binomial = {n\choose k} p^k (1-p)^{n-k}

Now the thing of interest for us is that we must have some evidence, or list of premises, or propositions taken “for the sake of argument”, that we assume are true and which state, E = “There is some X which can and must take one of two states, a success or a failure, and the chance X is a success is always p; plus, X can be a success anywhere from k = 0, 1, 2, …, n times.” Then, given this E, the formula above gives the probability that in n attempts we see k successes.

An example of an X can be X = “A side shows 6.” Now given the evidence (we have seen many times before) Ed = “There is a six-sided object to be tossed, just one side is labeled ‘6’, and just one side can show” we know that

     Pr(X | Ed) = 1/6 = p.

Notice that if n = k = 1, then the binomial formula just reproduces p.
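A short Python sketch makes the mechanics concrete. The function name is made up, and exact fractions are used only so the output is easy to read; the formula is the binomial given above, evaluated with p = 1/6 from Ed.

```python
from fractions import Fraction
from math import comb

def binomial_probability(k, n, p):
    """Probability of k successes in n attempts, given E and chance p of success."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = Fraction(1, 6)  # Pr(X | Ed)

# With n = k = 1 the formula just reproduces p.
print(binomial_probability(1, 1, p))

# The probabilities for k = 0, 1, ..., n must sum to 1 (here n = 5, an assumed value).
n = 5
distribution = [binomial_probability(k, n, p) for k in range(n + 1)]
print(sum(distribution))
```

Nothing here refers to a physical die. The code, like the formula, only works out what follows from the p we handed it.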

Here is where it gets tricky, and where mistakes are made. Notice that Ed is evidence we assume is true. Whether it really is true (with respect to any other external evidence) is immaterial, irrelevant. Also notice—and here is the juice; pay attention—Ed says nothing about a real, physical die. We are still in the realm of pure logic. And logic is just a study of the relation between propositions: it is silent on the nature of the propositions themselves.

So for instance, let Em = “Just one-sixth of all Martians wear a hat” and let Y = “The next Martian to pass by wears a hat.” Thus

     Pr(Y | Em) = 1/6 = p.

For the next n Martians that pass by, we could calculate the probability that k = 0, or k = 1, or … k = n of them wear a hat. Even though, of course, given our observational evidence that there are no Martians, no Martian will ever pass by.

(If you are sweating over this, remove “Just one-sixth of” from Em; Pr(Y|Em) = 1 and then we can still use the binomial to calculate the probability that the next k of n Martians wear hats.)

Probability, like all logical statements, is a measure of information, and information between propositions. The propositions do not have to represent real, physical objects. Pick up any book of introductory logic to convince yourself of this.

Where people go wrong in statistics is in not starting with the reminder that probability is logical, a branch of logic. Thus they confuse Ed with saying something about real dice. They ask questions like, “How do we know the die isn’t weighted? How do we know how it’s tossed? How much spin is imparted? What kind of surface is the die tossed onto? What about the gravitational field into which the die is tossed? Is there a strong breeze?”

All of those (and many, many more) are excellent questions to ask about real dice, but all of them are absolutely irrelevant to Ed and to our absolute, deduced knowledge that Pr(X | Ed) = 1/6.

It is only later, after we learn the formal rules of probability, some basic mechanics, but more importantly after we have fully assimilated the interpretation of probability, that we invert things and ask questions like, “Given that we have seen so many real-life tosses of this real-life die, in this certain real-life situation, what is the probability we will see a 6 on the very next roll?”


Make sure you see the difference between that last question and the one above using the binomial using Ed or Em.

Read over Chapter 2 and be sure you understand the four most basic rules of probability (the mechanics stuff).

Correct any typos in this post.

Teaching Journal: Day 2

As might have been obvious from yesterday, the truth, falsity, or the somewhere-in-betweenness of any conclusion-hypothesis-proposition can only be assessed with reference to a list of assumptions, premises, data. That is, you cannot know the status of any proposition without some list of premises. Different premises lead to different statuses.

In particular, this means that you cannot ask, “What is the probability of X?”, where X is some proposition. For example, you cannot ask, X = “What is the probability that I roll a six on a die?” This probability does not exist. Similarly, you cannot ask, Y = “What is the probability that Socrates is mortal?” This probability also does not exist. There are no unconditional probabilities, no unconditional arguments of any kind.

If I assume that E = “All men are mortal and Socrates is a man” then I can claim that “It is certain, given E, that Y”, or that “If I assume it is true, regardless whether or not it is or that I can know it is, that all men are mortal and Socrates is a man, then the probability that Socrates is mortal is 1.” Or I can write:

    Pr( Y | E ) = 1.

But I cannot write:

    Pr( Y ) = something,

for that is forever unknown. There just is no such thing as an unconditional probability, just as there are no such things as unconditional logical arguments, just as there are no such things as unconditional mathematical theorems, and so on. If you find yourself disagreeing, have a go at creating the probability of some hypothesis that does not make reference to any assumptions/premises/data.

(It’s true that in most textbooks probability is written as if it were unconditional. While this makes life easier for the author and for typesetters, it ends up producing confusion about the nature of probability.)

It’s important to understand that E is only that which we assume is true. It matters not one whit whether E—with respect to some other set of premises—really is true, or false, or somewhere in between. Logic concerns itself only with the connections between premises and conclusions. The premises and conclusions are something exterior, something given to us.

Another hoary example. Let Ed = “A six-sided object, just one side of which is labeled 6, will be tossed and only one side can show.” Then if X = “A 6 shows”,

    Pr( X | Ed ) = 1/6.

We have deduced—just as we do with all probabilities—the probability that X will be true. Notice that this says nothing about real dice in any real situation. This is just a logical argument, no different in nature from the premise “All Martians wear hats and George is a Martian” which lets us deduce that “George wears a hat.” This conclusion with respect to this evidence is true, its probability is 1; and this is so even though we know, with respect to observational evidence, that there are no Martians.
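One rough way to mimic this kind of deduction in Python is to write the evidence down as a list of the states it allows and count. This is an assumption-laden sketch, not a general logic engine: the encoding of Ed as six sides, each recording only whether it is the one labeled 6, is a choice made here, and the function name is made up. It can therefore only answer questions about what the evidence actually mentions.

```python
from fractions import Fraction

def probability(proposition, states):
    """Pr(proposition | evidence), with the evidence encoded as its allowed states."""
    favorable = sum(1 for state in states if proposition(state))
    return Fraction(favorable, len(states))

# Ed: six sides, just one labeled 6, only one side can show.
# Each state records only whether the showing side is the one labeled 6.
ed_states = [True, False, False, False, False, False]
print(probability(lambda shows_six: shows_six, ed_states))

# "All Martians wear hats and George is a Martian": the evidence allows one state.
george_states = ["wears a hat"]
print(probability(lambda state: state == "wears a hat", george_states))
```

The first line printed is 1/6 and the second is 1. No dice and no Martians were consulted, only the premises as written.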

Now if we write X = “A Buick shows”, we can write

    0 < Pr( X | Ed ) < 1

We are stuck because our evidence says nothing about a Buick. There may be a Buick on one of the other five sides, there may not. The evidence is mostly mute on this subject. Except if we suppose there is an implicit call to the contingent nature of this object being tossed. If we assume that, then we can at least say the probability is not 0 and not 1, but it may be anywhere in between. But we can also make the argument that Ed should be interpreted more strictly. If it is the case, then the best we can do is this:

    Pr( X | Ed ) = unknown.

Probability cannot be relative frequency. For example, “Half of all Martians wear hats and George is a Martian” lets us deduce that the probability of “George wears a hat” is 0.5. But there is no relative frequency of this “experiment.” This one counter-example is enough to show that the relative frequency interpretation of probability is false (it doesn’t show it has things backwards; for that, read the book, paying attention to the references).

Probability cannot be subjective in the following sense. If we accept that “All men are mortal and Socrates is a man” then the probability that “Socrates is mortal” is 1. Even if we don’t want it to be. And above the probability that “George wears a hat” cannot be anything but 0.5. Probability only appears to be subjective in some instances because we often are bad at listing the premises we hold when assessing probabilities. However, if we agree on the exact list of premises (and on the rules of logic) then we must agree on the probabilities deduced.

Probability cannot always be quantified. If we accept that “Some men are mortal and Socrates is a man” then the probability that “Socrates is mortal” is something between 0 and 1, but the exact number cannot be deduced.


Why doesn’t adding “the six-sided object is weighted” or “the six-sided object is fair” to Ed change the probability that Pr( X | Ed ) = 1/6?

List the exact premises you hold in your Pr (“Barack Obama will be re-elected” | your premises).

Change the evidence “All men are mortal and Socrates is a man” so that the probability of “Socrates is mortal” is bounded between two fixed numbers.

In a tearing hurry today. Did not have time to check for typos!

© 2014 William M. Briggs