# The Lady Tasting Tea: Bayes Versus Frequentism; Part IV (finally the end!)

*Read Part I, Part II, Part III. This is the missing Part, which was promised a year ago. We’re all tired of this subject, and there are so many other things to talk about, so today is the last installment. I’ll even try and make it interesting.*

It is finally time to reveal what happened. Our good lady guessed M = N = 8 cups: she got them all right. (Though some reports claim she got M = 6 right, missing one milk-first and one tea-first guess.) Remember our goal: we want to know whether or not she “has the ability.” Repeat that before reading further.

**Frequentist answer**

The frequentist calculates this:

(4) Pr ( T(M,N) ≥ t(M,N) | she does not have ability),

where T(M,N) and t(M,N) are the same mathematical function of the data, but where t(M,N) is the value of the statistic *we actually observed* and T(M,N) is the value of the statistic in repetitions of the trial, where these repetitions are embedded in an infinite sequence of trials.

T(M,N) and t(M,N) are called “statistics”; they are not unique; their use is not deduced. Indeed, for this experiment we have (at least) our choice of the binomial and Fisher’s exact statistics. For the former, (4) = 0.0039 and for the latter (4) = 0.014. We could have easily expanded this list to other popular test statistics, each providing different solutions to equation (4). Fishing around for a test statistic which gives pleasing results is a popular pastime (we want the statistic or statistics which give 0.05 or less for (4), this being a magic number).
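The two p-values quoted above can be reproduced with a few lines of arithmetic. What follows is only a sketch: the function names are made up for illustration, the binomial version assumes each cup is an independent 50/50 guess, and the Fisher version assumes the lady knows exactly half the cups are milk-first and labels exactly half that way.

```python
from math import comb


def binomial_p_value(m: int, n: int, p: float = 0.5) -> float:
    """Pr(T >= m) when each of n cups is an independent guess with chance p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def fisher_p_value(m: int, n: int) -> float:
    """Pr(T >= m) when she knows n/2 cups are milk-first and picks n/2 of them.

    If k of her milk-first picks are truly milk-first, she gets 2*k cups
    right in total, and k follows a hypergeometric distribution.
    """
    half = n // 2
    k_min = (m + 1) // 2  # smallest k with 2*k >= m
    ways = sum(comb(half, k) * comb(half, half - k) for k in range(k_min, half + 1))
    return ways / comb(n, half)


print(round(binomial_p_value(8, 8), 4))  # 0.0039
print(round(fisher_p_value(8, 8), 4))    # 0.0143
```

Same data, same "null", two different answers: the choice of statistic is an input to (4), not something the data supply.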

Which of these is *the* correct test statistic? Neither. Fisher’s test could be used if the lady knew she was getting exactly 4 cups of each mixture, the binomial could be used if she didn’t; but other choices exist. (It is the lady’s evidence that matters, not yours.) In any case, we have two p-values. Can they help answer our original question? They cannot. Equation (4) is not equation (3), which again is:

(3) Pr ( “She has the ability” | “M = N” & “Experimental set up”).

In no way is (4) a proxy for (3); it is even *forbidden* in frequentist theory to suppose that it is. Classical theory merely says that if (4) is less than the publishable limit we “reject” the theory “she does not have the ability”. That is, we claim that “she does not have the ability” is false, which necessarily makes “she has the ability” true.

But recall that “she has the ability” had multiple interpretations. Which of these is the frequentist saying is the right one? Well, none of them and all of them. Actually, the answer the frequentist will give when posed this question is usually a variant of, “Is that the bus? I must run.” However, there is still the “agnostic” model; see below.

Incidentally, if she got two wrong, (4) is 0.24 for Fisher’s and 0.14 for the binomial.
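For anyone who wants to check those two numbers by hand, the arithmetic for M = 6 of N = 8 can be written out as below. It assumes the same two set-ups as before: independent 50/50 guesses for the binomial, and exactly four cups of each kind (known to the lady) for Fisher's test, where six right means at least three of the four milk-first cups were identified.

```latex
\begin{align*}
\Pr\nolimits_{\text{binomial}}(T \ge 6)
  &= \frac{\binom{8}{6} + \binom{8}{7} + \binom{8}{8}}{2^{8}}
   = \frac{28 + 8 + 1}{256}
   = \frac{37}{256} \approx 0.14, \\[4pt]
\Pr\nolimits_{\text{Fisher}}(T \ge 6)
  &= \frac{\binom{4}{3}\binom{4}{1} + \binom{4}{4}\binom{4}{0}}{\binom{8}{4}}
   = \frac{16 + 1}{70}
   = \frac{17}{70} \approx 0.24.
\end{align*}
```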

**Bayesian answer**

The Bayesian cannot answer (3) without first deciding what “She has the ability” means. If he decides, in advance, it means “She always guesses correctly” then as long as M = N this theory has probability 1, i.e. (3) = 1. If M < N then (3) = 0. And that is that.

If we decide it means “She always guesses at least N/2 correctly” then as long as M ≥ N/2, (3) = 1, else it is 0. And similarly for any other interpretation.

That means that if we have one fixed interpretation and are willing to entertain no other, then as long as the observations are consistent with this theory, we must continue to believe it is *certainly* true. And if the evidence is not consistent, we will have falsified our interpretation and thus we must believe it is *certainly* false. But if we have falsified it, this does *not* mean we have given a boost to some other theory because, of course, we have already said that there were no other theories.

Please pause here and ensure you understand this. It is a serious and fundamental point.

In order to have non-extreme probabilities attached to a model’s truth, we must have more than one model in contention. One model alone is either true or false: this is a tautology, which is why it does not provide additional evidence (a tautology attached to the premises of any argument does not, and cannot, change the probability of the conclusion).

So suppose we have decided that “has the ability” means either M_{1} = “always guesses correctly” or M_{2} = “guesses at least N/2 correctly”. Good arguments, after all, can be made for both. Before we see the experiment, based on these arguments, we must assign a probability that each is true. If our evidence is *only* that we have these two to pick from, then we would assign probability 1/2 to each (this can be derived through the symmetry of individual constants, a subject for another day).

Now if we see M = N – 1 (which is > N/2) then we have still falsified M_{1}; this necessarily makes the probability of M_{2} equal to 1. And if we see M < N/2 we have falsified both, *leaving no alternative*. But if M = N then, since this evidence is consonant with both models, we have not changed the probability that either is true.
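The bookkeeping in the last few paragraphs can be sketched in code. This is a minimal illustration, not a general Bayesian engine: it treats each interpretation purely as a logical constraint that the data either satisfy or contradict, exactly as the text does, and assigns no graded likelihoods. The names `MODELS` and `posterior` are invented for the example.

```python
from fractions import Fraction

# Each interpretation of "has the ability" is a yes/no check on (M, N).
MODELS = {
    "M1: always guesses correctly": lambda m, n: m == n,
    "M2: guesses at least N/2 correctly": lambda m, n: 2 * m >= n,
}


def posterior(m, n, priors=None):
    """Probability of each model after seeing m right out of n.

    Falsified models drop to 0 and the survivors are renormalised.
    Returns None when every model in contention has been falsified.
    """
    if priors is None:
        # Only evidence: these are the models to pick from, so equal weights.
        priors = {name: Fraction(1, len(MODELS)) for name in MODELS}
    weights = {
        name: priors[name] if is_consistent(m, n) else Fraction(0)
        for name, is_consistent in MODELS.items()
    }
    total = sum(weights.values())
    if total == 0:
        return None  # nothing left to believe in
    return {name: w / total for name, w in weights.items()}


print(posterior(8, 8))  # both models stay at 1/2
print(posterior(7, 8))  # M1 falsified, M2 goes to 1
print(posterior(3, 8))  # None: both falsified
```

Drop one entry from `MODELS` and the only possible outputs are probability 1 or `None`, which is the single-model case described earlier.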

This is it; this is *the* answer no matter how many interpretations we initially consider.

**Agnostic model**

The one possibility left is the agnostic model of Part III. Suppose the lady got M = 0 right in N = 40 cups (say). Would you say she “has the ability”? Sort of: she appears to be a perfect negative barometer. If you knew somebody who was always wrong about picking stocks, he would be as useful to you as somebody who was always right.

So we leave ourselves agnostic about her ability and say it could be guessing anything from 0 to N, as described in Part III. At the end, we remain agnostic but we are able to predict how well she will do in N’ *new* trials. This is important because even if we are agnostic, there are different forms of agnosticism. That is, we are assuming uniform agnosticism, but it may be that a better model might be one which allows different performance for milk-first and tea-first cups (as described in Part I). And it could be that milk-first and tea-first cups differ, but her palate fatigues after W cups. And so on and on for all the other possible models.
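To make "predict how well she will do in N’ new trials" concrete, here is one way the uniform-agnostic prediction could be computed. It rests on an assumption that is mine and may not match Part III exactly: that uniform agnosticism amounts to a flat prior on an unknown per-cup chance of a correct call, which makes every count from 0 to N equally likely beforehand and gives a beta-binomial prediction afterwards. The function name and arguments are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def predictive(k: int, n_new: int, m: int, n: int) -> Fraction:
    """Pr(k right in n_new new cups | m right in n old cups).

    Assumes a flat prior on the per-cup chance of a correct call
    (beta-binomial posterior predictive), computed exactly.
    """
    numerator = (
        comb(n_new, k)
        * factorial(m + k)
        * factorial(n - m + n_new - k)
        * factorial(n + 1)
    )
    denominator = factorial(n + n_new + 1) * factorial(m) * factorial(n - m)
    return Fraction(numerator, denominator)


# After 8 right out of 8, the chance she gets the next 8 all right:
print(predictive(8, 8, 8, 8))  # 9/17
# And the chance she gets the very next cup right:
print(predictive(1, 1, 8, 8))  # 9/10
```

These are statements about observable future cups, so they can be checked against new data, and a rival agnostic model (separate milk-first and tea-first performance, say) would produce different numbers to check.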

Do you see? Being agnostic has *not* excused us from formulating a model—which we can test and verify on new data. This is natural in Bayes and not so in frequentism (see the papers linked in Part III). But enough is enough. On to something new tomorrow!

one milk-first and one two-first guesses

one milk-first and one tea-first guess

there use is not deduced

their use is not deduced?

Sorry, better go take my medicine…

Big Mike,

My enemies are out in force this morning.

Good thing I waited to re-ask my question. Why isn’t (3):

Pr(has ability|E) + Pr(not has ability|E) = 1

or

Pr(has ability|E) = 1 – Pr(not has ability|E)

and Pr(not has ability|M=N & experimental setup) = 1/(2^N) (or at least something non-zero)

so Pr(has ability|M=N & experimental setup) = 1 – 1/(2^N)

or is there another assumption in (3):

Pr(has ability|M=N & experimental setup & has ability) = 1

so that the possibility that M=N arose from something other than ability can be ignored.

After re-reading, I’ll attempt to answer my own question. The Bayesian is no longer answering the original question. While it’s reasonable to say that the lady has ability iff she always guesses correctly, it’s not true that she always guesses correctly iff she has the ability. So by redefining the question, it’s possible to say that Pr(ability)=1, even though we’re no longer talking about tasting tea, we’re talking about guessing.

mt,

Not quite.

Pr(has the ability | E) = 1

and so

Pr(not has the ability | E) = 0.

as long as E contains information about the experiment and the knowledge that we only have one interpretation of “has the ability”. The number 1/2^N is the answer to an entirely different question (or questions), one of which is

Pr(M = N | E, no causal-path guessing).

The key is this: if you only consider one interpretation, i.e. one model, then either you will confirm that model (it remains probability 1) or you falsify it (it has, conditional on the observations, probability 0).

Remember also: “has the ability or not has the ability” is a tautology.

Remind me why we stopped considering “she just guessed” as an alternative explanation.

I think my issue is with the model, which is being hidden by redefining “has the ability”. I think E in this case is really:

M=N & experimental setup & if all guesses in the experiment are correct then the person has the ability.

Which means the Bayesian doesn’t care that Pr(guess correct|no ability) > 0; the simplifying assumption allows him to be absolutely certain of his result, regardless of whether or not it’s correct in reality.

mt,

I see the confusion. See tomorrow’s (Thursday’s) post. I’ll put up a supplement.

Just checking: Prof Briggs wrote:

“That is, we claim that “she does have the ability” is false, which necessarily makes “she has the ability” true.”

There is a “not” missing in this sentence, isn’t there?

The null is “she does not have ability” which we reject if we are sufficiently satisfied (95%!) – which necessarily makes “she has the ability” true.