All Probability Is Conditional: An Answer To Senn; Part IV

A strictly incorrect way of writing Bayes’s theorem.

Read Part III.

Still with me? Hope so, because we’re only on the second page of Senn’s article (but don’t fret; we’ll be skipping most of it).

Review: in logical-probability Bayes (as in all Aristotelian logic), we begin with a list of premises (data, observations, evidence, or other synonymous term) and a proposition which is hoped to be related to the premises; from the premises we deduce the probability the proposition is true. Not all premises are sufficient to guarantee a numerical value, nor any value, nor any precise number: the probability could be stated merely in words, nonexistent, an interval, or a precise number.

Senn writes “we let \Pr(A) stand for the so-called marginal probability of an ‘event’, ‘statement’ or ‘hypothesis’ A and we let \Pr(B|A) stand for the conditional probability…of B given A.” (Note: I have edited the notation ever so slightly so that it will render well on a web page; I have not changed any meaning.)

As before, I already disagree. There just is no such thing as unconditional probability, or probability without respect to any evidence, thus it never makes sense to write “\Pr(A).” We can write (say) \Pr(A|E) which is the probability of A (a proposition) given the evidence E (also a proposition, albeit possibly a complex one which includes data observations).

Example: A = “A ’6′ shows” and E = “We have a Martian bleen, etc.” (see Part II for an explanation). But it makes no sense to ask, “What is the probability a ’6′ appears?” without reference to something, whether it be a bleen, a die, or something else.
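A minimal sketch of the point, offered as an illustration only: the same proposition gets a different probability, or none, depending on the evidence supplied. The function name and the dictionary encoding of the evidence are hypothetical, invented here, and the labels listed are assumed to be distinct.

```python
from fractions import Fraction

def probability_shows(label, evidence):
    """Deduce Pr(label shows | evidence) using only the listed evidence.

    `evidence` is a hypothetical encoding: a dict whose "states" entry lists
    the labels the device can show, exactly one of which must appear.
    Returns None when the evidence does not fix a numerical value.
    """
    states = evidence.get("states")
    if not states:
        return None  # no premises about possible outcomes: no number follows
    if label not in states:
        return Fraction(0)  # the evidence rules this outcome out
    return Fraction(1, len(states))

six_state_device = {"states": ["1", "2", "3", "4", "5", "6"]}
twenty_state_device = {"states": [str(k) for k in range(1, 21)]}
no_evidence = {}

print(probability_shows("6", six_state_device))     # 1/6
print(probability_shows("6", twenty_state_device))  # 1/20
print(probability_shows("6", no_evidence))          # None
```

The proposition “A ’6′ shows” never changes across the three calls; only the evidence does, and with it the answer.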

Failure to recognize this creates another stumbling block in understanding probability. Probability is usually introduced as unconditional, and the complexity of conditioning follows some time later: but this is a mistake. There just is no such thing as unconditional probability, just as there is no unconditional truth: to judge a proposition true, we always at least refer to our intuition or faith, if not to a list of premises.

Of course, inside a given problem, once we have E in hand, and it’s agreed to by all and always understood to be there, and for the simple ease of notation, there is no harm in writing \Pr(A). But this should not be done until one is well used to logical probability, else it seems like probability is a thing and not a measure of knowledge.

Don’t think this is a big deal? Oh, boy, is it ever, as we’ll see.

Senn introduces H_i for a hypothesis, i.e. some proposition, indexed by i, as there may be more than one proposition which is true in some situation. He adds a superscript T to indicate we believe H_i is true, but I’ll skip this complication. Whenever we see shorthand like \Pr(H_i|E) it means “The probability H_i is true given E.”

Senn then writes (with my change in notation) “We suppose that we have some evidence E. If we are Bayesians we can assign a probability to any hypothesis H_i and, indeed, to the conjunction of this truth, H_i \& E, with evidence E.”

Subjective Bayesians would agree: logical probability Bayesians do not. Subjective Bayesians are as free with numerical probabilities as politicians are with other people’s money. Subjectivists “feel” probability is a matter of emotion because they fail to write down the conditioning premises. They may arrive at a number (and bet using it: subjectivists are inveterate gamblers; at least in theory), but this does not mean the numbers they produce have any bearing on the probabilities unless it can be demonstrated their (unwritten) premises imply these (and no other) numerical values. There is much more to the errors subjectivists make, but given that they usually only make them in theory and not in many problems, where they usually agree with LPBs, we’ll let these go until another day.

LPBs are more Socratic and admit their ignorance. The original series proves the LPBs are right: we cannot always discover a probability, and so we cannot always discover a numerical value for a probability. We can still manipulate the symbols until we get the form of an answer, but that doesn’t make the answer right. I believe Senn would agree with that.

Thus we can always write Bayes’s theorem, which Senn does:

\Pr(H_i | E) = \frac{\Pr(H_i) \Pr(E | H_i )}{\Pr(E)}.

Spot the trouble? What could it possibly mean to say \Pr(E)? or even \Pr(H_i)? If we knew \Pr(H_i) then we would know \Pr(H_i) and thus we wouldn’t have to bother with any kind of experiment or other evidence or indeed anything. We’d just write the answer down!

We’re in just as much trouble with \Pr(E). How can we ask about the probability of evidence we just witnessed? It has to have a probability of 1, right? Otherwise it wouldn’t have happened!

The problem does not lie in Bayes’s theorem, which nobody disputes (how could they?), but in the way it is written and what the symbols mean. Senn is right that Subjective Bayesians can (and do) say anything, but that doesn’t mean what they say has any bearing on reality (I’ll let you provide the politician comparison).
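One hedged way to see what is missing, in the notation used above: write K for the background evidence that is always present. K is a label introduced here purely for illustration; it appears neither in Senn’s paper nor earlier in this post. Conditioning every term on K gives

\Pr(H_i | E \& K) = \frac{\Pr(H_i | K) \Pr(E | H_i \& K)}{\Pr(E | K)}.

Read this way, \Pr(E | K) is the probability of the evidence deduced from what was known before the evidence was seen, which need not be 1, and \Pr(H_i | K) is not the answer we are after but the probability of H_i given K alone.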

About that, more next time.


Probability Isn’t “Fair”: An Answer To Senn; Part III

Varying degrees of “fairness.”

A quick reminder that we’re trying to unpack the meaning of the “is fair” in the proposition “This die is fair,” and trying to deduce the probability this proposition is true given (and only given) the evidence “This die has been rolled five times and showed five ’6′s.” See the previous installment for why.

“Is fair” can take one of several definitions. Our predicament arises from not being clear which, and by mixing versions at different stages of the problem.

Meaning 1: In any finite number of tosses, the proportion of observed tosses will match the probabilities deduced from the first example; i.e., the observed proportions will show 1/6 ’1′s, 1/6 ’2′s, and so on, or whatever is closest to these if the number of tosses is not divisible by six.

Assuming Meaning 1, and given our evidence, we deduce the probability the proposition is true is 0; it is false. If the proposition were true, we should have seen some combination of five numbers with one missing (e.g. ’6′, ’3′, ’5′, ’1′, ’4′); the missing could have been any number between ’1′ and ’6.’ (I keep the quotes around the outcomes to help us recall these are labels and not numbers.)
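Meaning 1 can be checked mechanically. The sketch below assumes one reading of “whatever is closest”: every label’s count must be the floor or the ceiling of n/6. The function name is hypothetical.

```python
from collections import Counter

FACES = ["1", "2", "3", "4", "5", "6"]

def fair_under_meaning_1(tosses):
    """True when every face's count is as close to n/6 as whole tosses allow."""
    n = len(tosses)
    counts = Counter(tosses)
    low, high = n // 6, -(-n // 6)  # floor and ceiling of n/6
    return all(low <= counts[face] <= high for face in FACES)

print(fair_under_meaning_1(["6"] * 5))                  # False
print(fair_under_meaning_1(["6", "3", "5", "1", "4"]))  # True
```

With five tosses the floor is 0 and the ceiling is 1, so only five different labels pass, which is the combination described above.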

Meaning 2: In any finite number of tosses, the proportion of observed tosses will approximately equal the probabilities deduced from the first example; i.e., the proportions will approximately show 1/6 ’1′s, 1/6 ’2′s, and so on.

Assuming Meaning 2, and given our evidence, we deduce the probability the proposition is true is not calculable. The probability is unknown—because “approximately” is not defined. If “approximately” means (and I do not jest) “Leave me alone, I’m tired of playing dice” then the proposition is true, because the observed frequencies are more than close enough for somebody who doesn’t give a damn about dice. If you fail to appreciate this example, you are in for tough times ahead; so pause here and make sure this sinks in.

If “approximately” means “not varying more than 5% from” then the proposition is deduced to be false because, of course, the observed proportions have differed by more than 5%. But if “approximately” means “not varying more than 90% from” then the proposition is deduced to be true, because the observed variations are within this bound.
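The two tolerances can be checked against the five observed ’6′s. This sketch assumes that “varying more than 5% from” means an absolute gap, in proportion, between an observed frequency and 1/6; other readings of the phrase are possible, which is rather the point. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

FACES = ["1", "2", "3", "4", "5", "6"]

def max_departure(tosses):
    """Largest absolute gap between an observed proportion and 1/6."""
    n = len(tosses)
    counts = Counter(tosses)
    return max(abs(Fraction(counts[face], n) - Fraction(1, 6)) for face in FACES)

def fair_under_meaning_2(tosses, tolerance):
    """True when no face's proportion departs from 1/6 by more than `tolerance`."""
    return max_departure(tosses) <= tolerance

five_sixes = ["6"] * 5
print(max_departure(five_sixes))                            # 5/6
print(fair_under_meaning_2(five_sixes, Fraction(5, 100)))   # False
print(fair_under_meaning_2(five_sixes, Fraction(90, 100)))  # True
```

The largest departure is 5/6, about 83%: far outside a 5% tolerance, inside a 90% one.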

Who gets to decide what “approximately” means? Well, you do; as does Senn; as do I. Fights start over things like this. What is the one and only definition of “approximately”? There isn’t one! It depends on the situation. As we saw, for some it could mean “Leave me alone”, for others, say casinos, it would be much tighter.

Think this ambiguity bad? It’s even worse than this.

Meaning 3: In any finite number of tosses greater than or equal to 6, the proportion of observed tosses will equal the probabilities deduced from the first question; i.e., the proportions will be 1/6 ’1′s, 1/6 ’2′s, and so on, or whatever is closest to that if the number of tosses is not divisible by six.

Given this and our evidence, the proposition is not true or false (1 or 0) but somewhere in between because we haven’t yet reached the limit of 6 tosses. Kind of. If the die were tossed just one more time (for 6 in total), then there is no way the observed proportions could equal the deduced probabilities. The proposition would then be false. But the die hasn’t been tossed just one more time. It could be tossed 100 more times. Who knows? But we still have the feeling, based on the observations, that the future tosses won’t bring the final proportions in line with the deduced probabilities (I keep repeating deduced to remind us these are not subjective guesses nor are they estimates).

Our evidence and assumed definition aren’t proof the proposition is false, especially if we consider it with respect to Meaning 4, which is the same as 3 but with “approximately” put in the usual place. Nor is the proposition true. But we also don’t seem to be in a strong position to quantify the probability. Nothing in the world wrong with that. Not all probability is quantifiable. See the original series for why.

If we insist on asking the original question, we’re left trying to understand what “is fair” could mean. We need to settle on a definite, unambiguous meaning before we can progress. And then even if we do we’re going to be left with all kinds of nagging questions about real dice.

Real dice have weight distributed unevenly. There’s no way to create perfect balance. We can prove this easily: displaying the numbers, which are of different shape, creates an imbalance, however minuscule. It might be possible to engineer a die down to the level of a quark, so that each side is precisely the same number of quarks across, and that the mass of the die is uniform at the Planck scale (except for the surface where the displays are). In practice, for macro-scale dice, this is impossible. But maybe some physicist will figure it out for some tiny thing. Even then, he won’t be sure that the strings which comprise the quarks are the “same length” everywhere and uniformly (if that even makes sense to say).

But even supposing we have this toy, we have the problem of tossing it. How? Onto what kind of surface? From what height? How much spin? With what downward force? In what gravitational field? After all, if we want to discuss tossing a “fair” die, all these things have to be considered. Tossing is part of “fairness.”

It is at this point it dawns on us that we’re on a fool’s errand. If the die were perfect, as we imagine (and as a logical die is in effect), and if the environmental conditions and forces were known precisely, then we’d know—before tossing—exactly what the outcome would be. Indeed, if the forces did not vary, the die would land the same way each time.

Point is, just by our knowledge of physics we know that any real die and its tossing environment isn’t “fair” in any complete physical sense. There’s no point to the original question. No real die (or its tosses) is “fair” in this sense. The proposition is contingent.

We’re asking the wrong question. What we really want to know is if the die is “fair enough”, and to answer that requires, as above, knowing what decisions we want to make regarding the die.

What we can do is to deduce the probability of seeing any arrangement of observations, either before seeing any observations whatsoever, or conditional on our initial knowledge of six-sided (logical) objects supplemented by a set of observations specified by the evidence. (We do this using Bayes’s theorem: see the next Parts.)

In other words, we can then make statements like this, “Given our evidence about six-sided objects and the old observations, the probability of seeing departures of future observed proportions at least as great as X% from the deduced probabilities is Y.” If Y exceeds a threshold, then we act as if the die is not “fair”, but if it is less than this threshold, we say it is. The threshold varies depending on the application. For the person sick to death of dice, X is unimportant and Y is quite low. Casinos want a small X and large Y for obvious reasons.

We’ll never have 100% certainty that any real die is “fair” in this (final) sense that Y = 0 (for vanishingly small X), because we knew before we started that the question dealt with a contingent matter, and we are never 100% certain of contingent matters (though we can be 1 – ε certain).

And you’ll notice that nowhere did we confuse the observed proportions (i.e. the relative frequencies) with probabilities. We knew the probabilities and used them to discern whether the relative frequencies were in line with them; this is what we meant by “fairness.”

We have proved what we set out to show. That we don’t, at least for the kinds of examples that Senn provided, need two kinds of probability. The one kind—probability as logic—was enough.

Yet there is still more to understand. Stick around!

Update: We could also form statements like this: “Given our evidence and old observations, in the next n throws, there is probability Y of seeing X ’1′s” and so forth. In other words, this and the previous example are predictions, statements of uncertainty of the future (or of that which is as yet unseen).
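For the curious, here is a rough sketch of how such a predictive statement might be computed. It rests on a loud assumption that is not taken from this post or from Senn: before any throws, the six-sided premises are represented by a uniform Dirichlet starting point (one pseudo-count per label). The later Parts may derive the numbers differently; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def rising(x, k):
    """Rising factorial x (x+1) ... (x+k-1); equals 1 when k is 0."""
    result = 1
    for step in range(k):
        result *= x + step
    return result

def predictive(k, n, seen, total_seen, faces=6):
    """Pr(exactly k of the next n throws show a given label), when that label
    was seen `seen` times in `total_seen` earlier throws.

    ASSUMPTION: a uniform Dirichlet starting point, one pseudo-count per label.
    """
    a = 1 + seen                           # pseudo-count for this label
    b = (faces - 1) + (total_seen - seen)  # pseudo-count for all other labels
    numerator = comb(n, k) * rising(a, k) * rising(b, n - k)
    return Fraction(numerator, rising(a + b, n))

# After five throws that all showed '6':
print(predictive(1, 1, seen=5, total_seen=5))   # 6/11: the next throw shows '6'
print(predictive(1, 1, seen=0, total_seen=5))   # 1/11: the next throw shows '1'
print(predictive(0, 10, seen=0, total_seen=5))  # 1/2: no '1's in the next ten throws
```

Under that assumption the statements take exactly the form “in the next n throws, there is probability Y of seeing X ’1′s”; change the starting assumption and the numbers change with it.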

There Is Only One Kind Of Probability: An Answer To Senn; Part II

What are the chances a green die will land on top?

Read Part I. Some of this material is explained in detail in this series.

Just after the introduction, Senn starts his argument by claiming an “important distinction between two types of probabilities: direct and inverse.”

The distinction is simply explained by an example. The probability that five rolls of a fair die will show five sixes is an example of a direct probability—it is a probability from model to data. The probability that a die is fair given that it has been rolled five times to show five sixes is an inverse probability: it is a probability from data to model.

If we accept this distinction and example as written, we are already lost; all the standard confusions are there.

If probability is all of one sort, then there is no distinction between “direct” and “inverse” kinds. Our candidate is logical probability, in which, as in just-plain-logic, there is only evidence (equivalently, premises), a proposition to be considered with respect to that evidence, and a probability this proposition is true deduced from the evidence.

Let’s begin by rewriting the examples. The evidence is what? Trouble starts with the words “fair die”. This is taken to mean that we have a real, physical, tangible object which must, when tossed, result in an equal chance of any side landing face up. This is asserted and not proved. It is a dictate. It sets in the mind a view of an actual die, of the kind that cannot (or at least does not) exist. Once this die is imagined, objections immediately arise: what if it isn’t “fair”? Can real dice be “fair”? What about imperfections? The confusion between asserting a probability and wondering whether the asserted probability equals the “real” probability, i.e. the long-run frequency of tosses, is already ineradicable. It becomes impossible to keep in mind what the real question is.

Start over rewriting all as a logical argument. “We have a six-sided (logical) object, just one side of which is labeled ’1′, just one side of which is labeled ’2′, and so on up to ’6′, which when tossed must show just one of these sides.” No physical, real die is implied, though because of the ubiquity of dice-like examples, people usually think one is. So if you find yourself unable to imagine a logical, i.e. non-physical, six-sided object, change it to a six-state Martian bleen, a device which is activated by tentacle and displays each time it is activated on a screen one and only one of the figures (translated into English) ’1′, ’2′, etc. There is no hint—as in no hint—of the workings of this device. All—as in all—we know is that the device when activated can show one of ’1′ through ’6′; how it does so is a mystery.

I stress again (and again) that since there are no Martians, there are no bleens. Any imperfections we imagine in a bleen are our own creations and are not part of the evidence supplied. The key to LPB is that we must—as in must—use only the evidence supplied, and all of it, in our deductions of probability. What is not directly implied from the given evidence must—as in, well, you get the idea—be ignored.

Now using the statistical syllogism (which itself can be deduced from simpler principles), we deduce the probability a ’6′ shows on one activation of a bleen, just as we can deduce the probability of five ’6′ activations. Or we can deduce anything which can happen in any (for now stick to finite) number of activations.
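As a small illustration of these deductions, the sketch below counts label sequences. It assumes the statistical syllogism may be applied to whole sequences of activations, i.e. that the evidence gives no reason to favor any one sequence of labels over another; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

LABELS = ["1", "2", "3", "4", "5", "6"]

def probability(event, activations):
    """Share of the equally supported label sequences for which `event` holds."""
    sequences = list(product(LABELS, repeat=activations))
    favorable = sum(1 for seq in sequences if event(seq))
    return Fraction(favorable, len(sequences))

print(probability(lambda seq: seq[0] == "6", 1))               # 1/6
print(probability(lambda seq: all(s == "6" for s in seq), 5))  # 1/7776
```

Any other proposition about a finite number of activations can be passed in as `event` in the same way.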

We are done with the first example, which ends at a conditional probability; i.e. a probability deduced from given, fixed evidence. All probability is likewise conditional. If you think not, see the series linked above for examples, or see Part III tomorrow for more on this.

Notice that I do not use the word “model”. It isn’t needed. Not here, and in far fewer cases than usually thought.

Senn’s second (“inverse”) example is also confusing. This asks for the probability that the following proposition is true: “This die is ‘fair’.” The only written evidence is “This die has been rolled five times and has shown five ’6′s.” That we are dealing with a real, physical die is implied from the words, but it is never stated. But suppose this is wrong and Senn meant a logical die or a bleen: then where would we be?

Right where we started. If this is the logical “die” or bleen, then we start by knowing the chance each number is displayed is 1/6. We end there, too. We have deduced “fairness.”

So we must be talking of a physical, real-life die. Our task is to interpret this proposition with regard to the given observations.

This evidence is easy and means just what it says: five rolls, five ’6′s of some real die. The proposition is less clear. The subject makes sense: “This die” means some real, actual physical die. The difficulty is with the verb: “is fair.”

Ah, fairness. From youth we are told that there is nothing finer! Indeed, fairness is so fine that we discuss it next time.


Bayes Is More Than Probably Right: An Answer To Senn; Part I

Stephen Senn very kindly answered a post I wrote on p-values (Unsignificant Statistics: Or Die P-Value, Die Die Die) by sending me his “You May Believe You Are a Bayesian But You Are Probably Wrong” (in Rationality, Markets and Morals).

Since I will be teaching at Cornell these two weeks, and the topics are the same, I will use part of this time to answer his paper in depth.

It would be best to start here: Subjective Versus Objective Bayes (Versus Frequentism): Part I, since that series explains matters in greater detail.

Probability

Senn went wrong before he even began, with his title: “You May Believe You Are a Bayesian But You Are Probably Wrong.” If you are only “probably wrong” about your belief then you also might be right. And if you were certainly wrong, then we would have a proof which says so. A proof is a string of deductions, i.e. a valid and sound argument, which begins with obviously true premises (agreed to by all) and ends at a proposition we must believe—even if we don’t want to.

Senn does not have, nor does he claim to have, a proof which shows being a Bayesian is certainly wrong. It is only his best guess that this philosophy is wrong. Probably wrong. So here we are, already at probability. What could Senn mean by his probabilistic statement “probably wrong”? (Besides the pun, I mean.) It can’t be any kind of frequentist statement, as in “I’ve collected a ‘random sample’ of Bayesian philosophies, itself embedded in an infinite sequence of such philosophies, and the mean of this sample (considering errors in theory equal to zero) tends towards zero.” That makes no kind of sense, as I’m sure Senn would agree, but it would have to if probability was frequentism.

Bayesian philosophy, at best, comes in a finite number of flavors. It could be that some of these are false (I agree subjectivism, as it is usually understood, is), but in no way can we imagine any individual theory as being embedded in an infinite sequence of theories, which is required for frequentist theory to hold. No: either we can prove each theory true or false, or our evidence is not (yet?) sufficient, and thus we are only probably sure each theory is true or false. This sounds like a Bayesian statement, no? (If so, do we fail because of self-reference? Well, no, because we can build this theory from simpler propositions.)

It could be that Senn took a subjective Bayesian tack when he formed his title, or perhaps he took a logical probability, or objective, Bayesian one. (Incidentally, I’ll call this latter theory LPB for short.) Or he could have meant some as yet unknown (or at least unidentified) theory. Whatever it was, it couldn’t have been frequentism, as shown.

His leading candidate is eclecticism (Senn is not frequentist), which is one of two things. One is no belief at all. It means “I’ll do whatever I want whenever it seems good to me.” There is no theory here to disprove, nor prove. To say “I’m an eclectic” this way means “I don’t want to argue for anything, just against things.” Since we go nowhere engaging with this “theory”, we pass on to number two. This is to say, “I’ll take a little of that, some of this, and some of the other.” Here we have several sub-theories. As such, this kind of eclecticism is actually a whole theory (the compilation of sub-theories) which might be true or false. Thus Senn might have used Bayes for his title and he might use frequentism for (say) dice tosses.

Senn recalls that Fisher himself was “skeptical” of attempts to unify probability. Hacking, another Big Cheese, in line with other well-aged curds, is of the same opinion. Why should we have a theory? Why not many? The obvious answer to this is that there is that which is true and that which is false and we should seek the truth. If it turns out a theory of probability works for all kinds of uncertainty, we’re stuck with it. If it must be that several theories are true, then we must accept them all. But it’s wrong to use desire or suspicion as proof there are many and not one theory.

Senn himself proved that frequentism is out (and forever) as a complete theory of probability because it cannot handle propositions like his “probably wrong.” But this isn’t proof that Bayes is everywhere right; not yet. Senn’s later examples might be sufficient to show all versions of Bayes are wrong, in which case some other theory must be true.

But we’ll have to see next time, because we’re already out of space, and because the next topic isn’t simple.