# Teaching Journal: Day 6

(I’m assuming you have been reading previous posts. If not, do so.)

We still want this:

(1) Pr (Distance > 1 meter | normal with m and s specified) = something

Actually, we don’t; not really. We want somebody to tell us (1) or something like it. The customer doesn’t really care that it was a normal distribution that was used. What we really want is the *exact* list of premises which allows us to say

(2) Pr (Distance > 1 meter | oracular premises) = 0 or 1

or, that is, we want the oracular premises which tell us the precise distance the *boule* will be from the *cochonette*. We want this:

(2′) Pr (Distance = x meters | oracular premises) = 1

where the x is filled in. But oracular premises don’t exist for most of life. We have to content ourselves with something less. This is why we can live with the premise that our uncertainty in the distance is quantified by a normal (or some other) distribution.

We can of course say, “It isn’t really a normal distribution” but this is a conclusion from a probability argument, and as we recall all probability propositions are conditional on premises. What are the premises which tell us “It isn’t really a normal distribution” is true? Well, these are easy: we have them (look in the book; Chapter 4). Call this list NN (for “not normal”). That is, given NN, it is true that “It isn’t really a normal distribution.”

But we do not list NN in (1), (2), or (2′). If we did, we could not compute any numbers. The premises would be self-negating. Just as we do not add the premise “There are no Martians” to the argument “All Martians wear hats and George is a Martian.” Well, we could add it of course. It is up to us, as adding any premise to a list in an argument is always up to us. But the point is this: Given just the original “All Martians…” the conclusion “George wears a hat” is *deduced* (and is probability 1). And given just the “We use a normal with a specified m and s” the probability the “Distance > 1 meter” is *deduced* (and is some number).
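To make “deduced (and is some number)” concrete, here is a minimal sketch of (1) using Python’s standard library. The values of m and s are hypothetical, picked only for illustration; nothing in the premises fixes them, and the function name is ours.

```python
from statistics import NormalDist

def prob_distance_exceeds(threshold, m, s):
    """Pr(Distance > threshold | normal with parameters m and s specified)."""
    return 1.0 - NormalDist(mu=m, sigma=s).cdf(threshold)

# Hypothetical premises: m = 0.8 meters, s = 0.3 meters (not from the post).
m, s = 0.8, 0.3
p = prob_distance_exceeds(1.0, m, s)
print(f"Pr(Distance > 1 meter | normal, m={m}, s={s}) = {p:.3f}")
```

Change the premises (different m or s) and the deduced number changes with them; the number is a consequence of the premises and nothing else.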

Incidentally, both the “All Martians…” and the “We use a normal…” are therefore *models*. So we can see that the word “model” is just another way to say “list of premises.”

When last we left our customer, he had just met a frequentist and a classical Bayesian to whom he had put (1). Both the frequentist and the Bayesian declined to answer (1). Instead, the pair started going on about the value of m (and maybe s, too) by discussing “confidence” and “credible” intervals. None of this is of the least interest to the customer, who still wants to know (1). Or questions like (1), questions that have to do with actual distances of actual balls.

The frequentist declines to help, but if pressed might utter something about a “null” hypothesis that “m isn’t 0.” We’ll figure that out later. The classical Bayesian, if he can be jarred awake, *can* help. What he can do is to say, “Given the data and that I used a normal distribution, and given the assumptions which provide me the same numerical answers as the frequentist, I can say this: although I don’t know the precise value of the pair (the *pair*, I say) of (m,s), I can take my uncertainty in them into account to answer (1).”

What this now-modern Bayesian does is to say (m,s) = (m-value 1, s-value 1) with some probability, that (m,s) = (m-value 2, s-value 2) with some probability, and so on for each possible value that (m,s) can take. He knows these from the credible intervals he just calculated. Now for each of these values, he plugs in the guess of (m,s) and calculates (1). Then he takes all the possible values of (1) and weights them by the probability that (m,s) takes each of these values. In the end he produces

(3) Pr (Distance > 1 meter | normal and past data) = the answer.

There is no more talk of m and s, which are of no interest to anybody, most specifically the customer. There is only the answer to the question the customer wanted. Notice that this answer is still conditional on the “model”, the normal distribution. It is also conditional on the past data, which is no surprise.
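The weighting just described can be sketched on a coarse grid of (m,s) pairs. Everything specific here is an assumption made for illustration: the past distances are invented, the grid ranges are arbitrary, and every pair on the grid gets the same prior weight. It is a sketch of the idea, not the calculation any particular Bayesian would run.

```python
import math
from statistics import NormalDist

# Hypothetical past distances in meters (made up for illustration).
data = [0.6, 0.9, 1.2, 0.7, 1.0, 0.8, 1.1, 0.5]

def log_likelihood(m, s, xs):
    """Normal log-likelihood of xs, dropping the constant that cancels on normalizing."""
    return sum(-math.log(s) - 0.5 * ((x - m) / s) ** 2 for x in xs)

# Candidate values for the pair (m, s); equal prior weight per pair is an assumption.
m_grid = [0.02 * i for i in range(101)]        # 0.00 to 2.00
s_grid = [0.05 + 0.02 * j for j in range(50)]  # 0.05 to 1.03
pairs = [(m, s) for m in m_grid for s in s_grid]

# Probability of each (m, s) pair given the data and the normal premise.
log_w = [log_likelihood(m, s, data) for m, s in pairs]
max_lw = max(log_w)
weights = [math.exp(lw - max_lw) for lw in log_w]
total = sum(weights)
weights = [w / total for w in weights]

# For each pair compute (1), then weight it by the probability of that pair.
predictive = sum(
    w * (1.0 - NormalDist(m, s).cdf(1.0)) for w, (m, s) in zip(weights, pairs)
)
print(f"Pr(Distance > 1 meter | normal and past data) is roughly {predictive:.3f}")
```

Note that `predictive` is a single number: m and s were summed out and appear nowhere in the result.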

But this means that if we had originally assumed the premise, “Our uncertainty in the distance is quantified by a gamma distribution” the answer to (3) would be different. Just as it would be different if we began with a Weibull (say) or any other mathematical probability distribution.

Which probability distribution is the “right” one? Well, that is a conclusion to a probability argument. Which premises will we supply to ascertain the probability that that normal, or gamma, or whatever, is the “right” one? That again is up to us. We’ll talk more about this in detail at another time. But for now first suppose we have the evidence/premises, “I have three probability models, normal, gamma, and Weibull. Just one of these is the right one to quantify uncertainty in distance.” Given just this information, the probability that any is right is 1/3.

We could then take this information and compute a (3) for each model, then weight the three answers (the three numerical answers to (3)) to produce this

(4) Pr (Distance > 1 meter | assumptions about distributions and past data) = better answer.

Notice that there is no talk about which distributions make up (4). They disappeared just as the m and s disappeared when we went from (1) to (3).
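A deliberately simplified sketch of the weighting in (4) follows. To keep it short, each model’s answer is computed from fixed, hypothetical parameters rather than from a full (3), and the weights stay at the 1/3 deduced from the premise “just one of these three is right”; revising those weights in light of the data is left out. The gamma tail uses the closed form that holds only for an integer shape parameter, which is another simplifying assumption.

```python
import math
from statistics import NormalDist

def normal_tail(x, m, s):
    """Pr(X > x) under a normal with parameters m and s."""
    return 1.0 - NormalDist(m, s).cdf(x)

def gamma_tail(x, shape, scale):
    """Pr(X > x) under a gamma with integer shape (Erlang closed form)."""
    z = x / scale
    return math.exp(-z) * sum(z ** n / math.factorial(n) for n in range(shape))

def weibull_tail(x, shape, scale):
    """Pr(X > x) under a Weibull with the given shape and scale."""
    return math.exp(-((x / scale) ** shape))

# Stand-ins for each model's answer to (3); all parameters are hypothetical.
answers = {
    "normal": normal_tail(1.0, 0.85, 0.25),
    "gamma": gamma_tail(1.0, 12, 0.07),
    "weibull": weibull_tail(1.0, 4.0, 0.94),
}

# Given only "one of these three is right", each model has probability 1/3.
model_prob = {name: 1.0 / 3.0 for name in answers}

averaged = sum(model_prob[name] * answers[name] for name in answers)
print(f"Pr(Distance > 1 meter | three candidate models) is roughly {averaged:.3f}")
```

The result is again one number, with no model name attached to it.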

The point: every statistical problem the modern Bayesian does is just like this. He attempts to answer the actual questions real customers ask him.

**Homework**

Check for typos.

Wine tour today.

Also, have your spreadsheets ready for tomorrow.

“He attempts to answer the actual questions real customers ask him.”

Good golly, no! The real customer very rarely knows what questions are the right ones to ask. 90 percent of the job is figuring out what the customer really wants (needs) to know and explaining to the customer why.

Unless you just want to humor him and take his money. In which case it doesn’t matter whether you are a frequentist or a Bayesian. Take the money and run.

Or you could drill down to the customer’s real problems, which are rarely cut and dried. In which case you better be an analyst in multiple disciplines.

I’ll bet the customer really wants to know the distance and not its probability, which amounts to saying “I guess it will be X”, and ~~he~~ it** is likely floored that you need to guess at your guess: “(4) Pr (Distance > 1 meter | assumptions about distributions and past data) = better answer.” I know what you’re doing but the customer probably doesn’t want to hear it.

JH,

That’s true in engineering also. At CMU, if you didn’t explain your answers in English, you got a big fat 0. My problem was learning English since I grew up in Pittsburgh. It wasn’t until much later I discovered that wasn’t a requirement at many schools — at least not from the reactions I got from recent grads.

** I could have used the contraction of she-he-it but sheit in some parts of the USA means pretty much the same thing as duh-yam.

—

Are you still in NY? I’ve heard that wine tours in Manhattan have something to do with travel to Miami. They sound expensive both in time and money.

—

Somewhat OT: are subscripts and superscripts allowed? I can’t remember. If the “**” came out superscript then my question is answered.

Also, just a suggestion: could you make the edit box taller say by reducing the size of the ID area? This one only shows about 3 sentences at a time. It’s like looking through a slit.

And one more: if you ever reinstate the list of allowable codes, try to make them buttons that will insert in-out pairs to minimize runaway formatting.

Sorry, meant “Uncle Mike” and not “JH”

So the right answer will involve integrating three probability distributions with multiple parameters over the range +- infinity. For every distance of interest.

When the alternative is to consult a table, “Values of the Normal distribution”, once, perhaps we have a reason why people prefer frequentism?

DAV, I miss you too. ^_^

(3) Pr (Distance > 1 meter | normal and past data) = the answer.

No one can prevent you or practitioners or anyone else from assuming normality. Assuming normality for the sake of being able to calculate the probability is a grave mistake, imo.

Statistics starts from propositions on data/evidence. The key question is how to determine a probability distribution (PD) using available data, although the choice may not be unique. The choice is also subject to the rules of logic and the axioms of probability. For example, if the distance takes only non-negative values, the probability rules dictate that P(distance <0) = 0. For a skillful player, I imagine the distance would have a skewed-right distribution, so I probably would choose a PD that can accommodate the skewness.

What’s an empirical probability? Simply estimate the probability by the sample proportion of data that are greater than (or equal to) 1. What can be more frequentist than this? Is it not as good as a Bayesian estimate? It’s straightforward and easy to understand and calculate. No worries about assuming wrong PDs on the prior and the likelihood function. An answer to Rich’s question of why people prefer frequentist methods.

If you want to assume a PD, you can also plug in frequentist estimates of the parameters, say, sample mean and standard deviation, and calculate the probability accordingly.
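Both suggestions fit in a few lines of standard-library Python. This is only a sketch: the data are invented for illustration, and the function names are ours.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical past distances in meters (made up for illustration).
data = [0.6, 0.9, 1.2, 0.7, 1.0, 0.8, 1.1, 0.5]

def empirical_tail(xs, threshold):
    """Sample proportion of observations greater than or equal to threshold."""
    return sum(x >= threshold for x in xs) / len(xs)

def plug_in_normal_tail(xs, threshold):
    """Normal tail probability with sample mean and standard deviation plugged in."""
    return 1.0 - NormalDist(mean(xs), stdev(xs)).cdf(threshold)

print("empirical proportion:", empirical_tail(data, 1.0))
print("plug-in normal:      ", round(plug_in_normal_tail(data, 1.0), 3))
```

With this data the two numbers disagree noticeably: one leans entirely on eight observations, the other on the normal premise.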