# Teaching Journal: Day 5

Let’s make sure we grasped yesterday’s lesson. Emails and comments suggest we have not. These concepts are hardest for those who have only had classical training.

We want to know something like this: what is the probability the *boule* will land at least 1 meter from the *cochonette*? Notice that this is an observable, measurable, tangible question. A natural question, immediately understandable, not requiring a degree in statistics to comprehend. Of course, it needn’t be “1 meter”, it could be “2 meters” or 3 or any number which is of interest to us.

Now, as the rules of logic admit, I could just assume-for-the-sake-of-argument premises which specify a probability distribution for the distance the *boule* will be from the *cochonette*. Or I could assume the uncertainty in this distance is quantified by a normal distribution. Why not? Everybody uses these creatures, right or wrong. We may as well, too.

A normal distribution requires two parameters, m and s. They are *NOT*, I emphasize again, the “mean” and “standard deviation.” They are just two parameters which, when given, fully specify the normal and let us make calculations. The mean and standard deviation are instead functions of data. Everybody knows what the mean function looks like (add all the numbers, divide by the number of numbers). It isn’t of the slightest interest to us what the standard deviation function is. If you want to know, search for it.

Since I wanted to use a normal—and this is just a premise I assumed—I repeat and you should memorize that this is just a premise I assumed—since, I say, I want to use a normal, I must specify m and s. There is nothing in the world wrong with also assuming values for these parameters. After all (you just memorized this), I just assumed the normal and I am getting good at assuming.

With m and s in hand, I can calculate this:

(1) Pr (Distance > 1 meter | normal with m and s specified) = something

The “something” will depend on the m and s I choose. If I choose different m and s then the “something” will change. Obviously.
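To make (1) concrete, here is a minimal Python sketch that uses only the standard library's `statistics.NormalDist`. It takes the normal premise as given. The function name `pr_distance_exceeds` and the (m, s) pairs are hypothetical, chosen only to show that the “something” moves when m and s move.

```python
from statistics import NormalDist


def pr_distance_exceeds(threshold: float, m: float, s: float) -> float:
    """Pr(Distance > threshold | normal with parameters m and s).

    m and s are simply the two parameters we assumed; nothing here
    estimates them from data.
    """
    return 1.0 - NormalDist(mu=m, sigma=s).cdf(threshold)


# Hypothetical parameter choices: change m or s and the "something" changes.
for m, s in [(0.0, 1.0), (0.5, 1.0), (1.0, 0.5)]:
    p = pr_distance_exceeds(1.0, m, s)
    print(f"m={m}, s={s}: Pr(Distance > 1 meter) = {p:.4f}")
```

Nothing in the function says where m and s come from. It only needs them to be supplied.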

The question now becomes: what do statisticians do? They keep the *arbitrary* premise “The normal quantifies my uncertainty in the distance” but then add to it these *premises*, “I observed in game 1 the distance D_{1}. In game 2 I observed the distance D_{2} and so on.”

These “observational” premises are uninteresting by themselves. They are not useful unless we add to them the premise, the quite arbitrary premise, “I use these observations to estimate m and s via the mean and standard deviation.” This is all we need to answer (1). That is, we need a normal distribution with m and s specified, and any way we guess m and s gives us values for m and s (right?). It matters naught to (1) how m and s are specified. But without m and s specified, (1) *CANNOT* be calculated. Notice the capitals.
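Here is a minimal sketch of that arbitrary premise in Python. It assumes the standard library's `mean` and `stdev` (the sample standard deviation, with an n − 1 denominator) serve as the guesses for m and s. The distances are hypothetical numbers made up for illustration, with a negative value standing for a signed distance. `plug_in_pr_exceeds` is likewise a hypothetical name.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence

# Hypothetical observed distances in meters (D_1, D_2, ...), made up for
# illustration; the negative value stands for a signed distance.
distances = [-0.2, 0.4, 0.9, 1.3, 0.6]


def plug_in_pr_exceeds(data: Sequence[float], threshold: float) -> float:
    """Answer (1) under the arbitrary premise: guess m with the sample mean
    and s with the sample standard deviation, then compute the probability."""
    m_hat = mean(data)
    s_hat = stdev(data)  # sample standard deviation (n - 1 denominator)
    return 1.0 - NormalDist(mu=m_hat, sigma=s_hat).cdf(threshold)


print(f"m guess = {mean(distances):.3f}, s guess = {stdev(distances):.3f}")
print(f"Pr(Distance > 1 meter) = {plug_in_pr_exceeds(distances, 1.0):.3f}")
```

Any other rule for guessing m and s would slot into the same two lines. Equation (1) itself does not care.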

Here is what the frequentist will do. She will calculate the mean (and standard deviation; but ignore this) and then report the “95% confidence interval” for this guess. We saw yesterday the interpretation of this strange object. But never mind that today. The point is the frequentist statistician *ignores* equation (1) and instead answers a question *that was not asked.* She contents herself with saying “The mean of the distances was this number; the confidence interval is this and such.”
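For comparison, here is a sketch of what the frequentist reports. It uses the usual textbook t-interval, mean ± t × standard deviation / √n, on hypothetical distances made up for illustration. The t critical value for 4 degrees of freedom is hard-coded from standard tables so the sketch needs only the standard library. `t_confidence_interval` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical observed distances in meters, made up for illustration.
distances = [-0.2, 0.4, 0.9, 1.3, 0.6]

# 97.5% quantile of Student's t with n - 1 = 4 degrees of freedom,
# taken from standard tables.
T_975_DF4 = 2.7764451


def t_confidence_interval(data, t_crit):
    """Classical 95% confidence interval for m: mean +/- t * sd / sqrt(n)."""
    n = len(data)
    center = mean(data)
    half_width = t_crit * stdev(data) / sqrt(n)
    return center - half_width, center + half_width


lo, hi = t_confidence_interval(distances, T_975_DF4)
print(f"mean = {mean(distances):.3f}, 95% confidence interval = ({lo:.3f}, {hi:.3f})")
```

Notice that no line of this computes Pr(Distance > 1 meter).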

And this quirky behavior is accepted by the customer. He forgets he wanted to know (1) or assumes the statement he just received is a sort of approximate answer to (1). Very well.

Here is what the classical Bayesian will do. *The same thing as the frequentist.* In this case, at least. The calculations the Bayesian does and the calculations the frequentist does, though they begin at different starting points, end at the *same place.*

The classical Bayesian will also compute the mean and he will also say “The mean is my best guess for m.” And he will also compute *the exact same confidence interval* but he will instead call it a *credible interval.* And this in fact represents a modest improvement, even though the numbers of the interval are *identical*. It is an improvement because the classical Bayesian can then say things like this, “There is a 95% chance the true value of m lies inside the credible interval” whereas the frequentist can only repeat the curious tongue twister we noted yesterday.
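Here is a Monte Carlo sketch that checks the claim the numbers are identical. It assumes the classical Bayesian uses the common reference prior p(m, s) ∝ 1/s; the post does not say which prior. Under that prior, the posterior for m is a Student t centered at the sample mean with scale sd/√n, so its 95% credible interval coincides with the t confidence interval. The data and the name `posterior_draws_of_m` are hypothetical. The two printed intervals should agree up to simulation noise.

```python
import random
from math import sqrt
from statistics import mean, quantiles, stdev

distances = [-0.2, 0.4, 0.9, 1.3, 0.6]  # hypothetical data, made up
T_975_DF4 = 2.7764451  # 97.5% t quantile, 4 degrees of freedom, from tables


def posterior_draws_of_m(data, n_draws=100_000, seed=1):
    """Draw m from its posterior under the prior p(m, s) proportional to 1/s.

    s^2 | data  ~ (n - 1) * sd^2 / chi-square(n - 1)
    m | s, data ~ Normal(sample mean, s / sqrt(n))
    """
    rng = random.Random(seed)
    n, xbar, sd = len(data), mean(data), stdev(data)
    draws = []
    for _ in range(n_draws):
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)  # chi-square, n - 1 df
        s = sqrt((n - 1) * sd**2 / chi2)
        draws.append(rng.gauss(xbar, s / sqrt(n)))
    return draws


cuts = quantiles(posterior_draws_of_m(distances), n=40)
credible = (cuts[0], cuts[-1])  # 2.5% and 97.5% posterior quantiles

half = T_975_DF4 * stdev(distances) / sqrt(len(distances))
confidence = (mean(distances) - half, mean(distances) + half)
print("credible interval:  ", [round(x, 3) for x in credible])
print("confidence interval:", [round(x, 3) for x in confidence])
```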

The classical Bayesian, proud of this improvement and pleased the numbers match his frequentist sister’s, also forgets (1). Ah well, we can’t have everything.

There is one more small thing. The classical Bayesian also recognizes that his numbers will not always match his frequentist sister’s. If for instance the frequentist and classical Bayesian attack a “binomial” problem, the numbers won’t match. But when normal distributions are used, as they were here and as they are in ordinary linear regression, statisticians are one big happy family. And isn’t that all that matters?
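Here is a sketch of the binomial mismatch, using hypothetical data (7 successes in 10 trials). It assumes the frequentist uses the textbook Wald interval and the classical Bayesian uses a flat Beta(1, 1) prior. Other choices, such as Clopper-Pearson or a Jeffreys prior, would give yet other numbers. The posterior interval is estimated by simulation with the standard library's `random.betavariate`.

```python
import random
from math import sqrt
from statistics import quantiles

# Hypothetical binomial data: k successes in n trials (made up).
k, n = 7, 10

# Frequentist: the textbook (Wald) 95% interval around the estimate k / n.
p_hat = k / n
half = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - half, p_hat + half)

# Classical Bayesian with an assumed flat Beta(1, 1) prior: the posterior is
# Beta(k + 1, n - k + 1); its 95% interval is estimated by Monte Carlo.
rng = random.Random(2)
draws = [rng.betavariate(k + 1, n - k + 1) for _ in range(100_000)]
cuts = quantiles(draws, n=40)
credible = (cuts[0], cuts[-1])

print("point estimates:", p_hat, (k + 1) / (n + 2))  # 0.7 vs. about 0.667
print("Wald interval:    ", [round(x, 3) for x in wald])
print("credible interval:", [round(x, 3) for x in credible])
```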

No.

**Homework**

You should have been collecting your data by now. If not, start. We’ll only be doing ordinary linear regression according to the modern slogan: Regression Right Or Wrong!

Here’s something I snarfed from Bruce Schneier’s security blog:

> The basic problem is the average haul from a bank job: for the three-year period, it was only £20,330.50 (~$31,613). And it gets worse, as the average robbery involved 1.6 thieves. So the authors conclude, “The return on an average bank robbery is, frankly, rubbish. It is not unimaginable wealth. It is a very modest £12,706.60 per person per raid.” “Given that the average UK wage for those in full-time employment is around £26,000, it will give him a modest life-style for no more than 6 months,” the authors note. If a robber keeps hitting banks at a rate sufficient to maintain that modest lifestyle, by a year and a half into their career, odds are better than not they’ll have been caught. “As a profitable occupation, bank robbery leaves a lot to be desired.”
>
> Worse still, the success of a robbery was a bit like winning the lottery, as the standard deviation on the £20,330.50 was £53,510.20. That means some robbers did far better than average, but it also means that fully a third of robberies failed entirely.

Would you care to comment on the final sentence?

(From here: http://www.schneier.com/blog/archives/2012/06/economic_analys.html)

In the example yesterday, you have a negative distance?

This concept of modeling distances in your petanque game with a normal distribution gets me. There are so many other distributions that I would choose before that. Perhaps in some future post you could explain how to choose the correct distribution to model the problem at hand.

Well, when you describe a normal distribution, those parameters are its mean and standard deviation. Nothing to it.

However, I understand what you are saying. You’d like to limit the use of “mean” to a *statistic*, a number calculated from the sample, and not use it for a parameter of a normal distribution! In the context of Bayesian analysis, e.g., the normal-normal model, we assign the parameter a normal prior with a specific mean value, and then obtain a normal posterior whose mean is a weighted average of two numbers. After all, that’s why it’s called *statistical* analysis. Heck, you can even call it a “central parameter”; just know that it means a different thing in quantile regression.
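The weighted average the commenter mentions can be written out in a few lines. This sketch assumes the textbook conjugate case in which s is known. The prior Normal(0, 1), s = 0.5, the data and the name `normal_normal_posterior` are all hypothetical. The weights are the precisions (inverse variances) of the prior and of the sample mean.

```python
def normal_normal_posterior(prior_mean, prior_sd, data, s):
    """Posterior for m when the data are normal with known s and the prior
    on m is Normal(prior_mean, prior_sd). Returns (posterior mean, sd)."""
    n = len(data)
    xbar = sum(data) / n
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / s**2  # precision of the sample mean
    post_precision = prior_precision + data_precision
    # Precision-weighted average of the prior mean and the sample mean.
    w = prior_precision / post_precision
    post_mean = w * prior_mean + (1 - w) * xbar
    return post_mean, post_precision ** -0.5


# Hypothetical prior, data and known s.
post_mean, post_sd = normal_normal_posterior(
    0.0, 1.0, [-0.2, 0.4, 0.9, 1.3, 0.6], s=0.5
)
print(round(post_mean, 4), round(post_sd, 4))
```

With more data, the data precision grows and the posterior mean moves toward the sample mean.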

You see, you need to explain how Bayesian analysis is done, step by step at least, if you can’t go into technical details. Then what you say will make more sense.

Hmmm. “Why not, everybody uses…” I thought only my teenage girls used this kind of reasoning.

So a normal distribution can capture your judgment about what you’ll see!

Since in theory the distance in this example would have a lower bound (or an upper bound, depending on how a negative distance is computed), offhand I might consider a Weibull distribution. Logical? I don’t know, as not everyone uses it.

R: Bayesian inference on a normal mean with a normal prior

http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=Bolstad:normnp

The idea that any competent statistician would assume a normal distribution for the *distances* in a bocce game is ludicrous. What might be more believable is that they would assume the *coordinates* of the initial boule are normally distributed with means matching those of the target (or perhaps slightly off it if we allow the possibility of a systematic targeting error).
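The commenter's alternative is easy to check by simulation. Assume the x and y errors are independent normals centered on the target with the same hypothetical spread s and no systematic offset. Then the distance is never negative and follows a Rayleigh distribution, Pr(distance > r) = exp(−r²/(2s²)). The Rayleigh is a Weibull with shape 2, which connects to the Weibull suggestion above. With a systematic offset it becomes a Rice distribution instead. The function names here are hypothetical.

```python
import math
import random


def rayleigh_survival(r, s):
    """Pr(distance > r) when x and y offsets are independent Normal(0, s)."""
    return math.exp(-r**2 / (2 * s**2))


def simulated_survival(r, s, n_draws=200_000, seed=3):
    """Monte Carlo check: throw boules with normal x, y errors around the target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x, y = rng.gauss(0.0, s), rng.gauss(0.0, s)
        if math.hypot(x, y) > r:
            hits += 1
    return hits / n_draws


s = 0.8  # hypothetical per-coordinate spread, in meters
print(rayleigh_survival(1.0, s), simulated_survival(1.0, s))
```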

Actually, on finding the example in your book, I see that you are using “distance” from the boule to mean the (signed) amount of overshoot (and I see that you kind of implied that in the previous blog post by giving a negative number for one of the data points). And if the distances from the target are generally small compared to the length of the shot, then I might be more inclined to consider the implications of a normal model for the amount of overshoot. So I grant you that it might indeed be a common assumption – but not one made without due consideration (both of the appearance of the actual data and of the nature of the sources of randomness).