# William M. Briggs

### Statistician to the Stars!


Another Wikipedia picture that draws the eye but says little.

Class is almost over! Just three days (including today) left. I’m way behind reading comments.

All logic, of which probability is an example, begins with a fixed list of premises from which we attempt to write a conclusion which either follows from these premises or which is as true as can be under these same premises. This is direct probability.

Thus, given “All men are mortal and Socrates is a man” the conclusion which is true with respect to these is “Socrates is a mortal.” This has probability 1. Or given “Most Martians wear hats and George is a Martian” the conclusion which has the highest probability is “George wears a hat.” It has the probability, assuming also as a premise that most means greater than half up to all, greater than half and up to one. Not to mince about, this is an interval and not a number.

The key to direct probability: the premises are fixed, unflinching and unchangeable, and accepted as is by all. Except for the exception mentioned next, there are rarely disagreements (other than academic ones) over its uses.

The other kind of direct probability is when the premises are just as fixed but the proposition at the end, the conclusion as was, is variable. So we could ask, if desired, the probability “Socrates has a happy marriage” with regard to the first premises—or even the second. Not much would come of it, of course, but the thing could be done. And is done. This is the origin of several popular fallacies.

Then there is inverse probability, which starts with the “conclusion”, the desired or desirable proposition, and seeks premises which make it true or probable. This can always be done. The proof is trivial. Suppose we want to know the truth of “Socrates has a happy marriage”. By supplying the premise “Socrates has a happy marriage” the conclusion has probability one.

This won’t be a satisfying maneuver for most, but it hasn’t stopped many from using it. For those in the habit of producing circular arguments the practice is to word the “premise” differently but equivalently and to pad it with extraneous words. For example, “Through my long years of reading history I, even I, have determined that that most well known philosopher was united in domestic bliss.” Therefore, etc.

More fallacies enter via inverse probability than by direct probability, and more arguments are started over it. Keep our example. Which are the best premises? Well, how much do we know of Xanthippe? If Plato is any guide, we know she had to be led away from Socrates’s prison wailing and weeping. Sounds like the behavior of a loving wife. But then we have the man’s own words, “By all means, marry. If you get a good wife, you’ll become happy; if you get a bad one, you’ll become a philosopher.”

Just which of these, and the many others we can consider, are the right premises? Scholars differ. But we must keep in mind that we can always find a set of premises which makes the conclusion/proposition as true or false as we like. The game then becomes arguing over the list.

Some of the premises which appear in the list will also make appearances in lists of other arguments, arguments which could be related to the proposition at hand. Knowledge, it has been said, is a tangled web. When the same premises are found all about the same village, as it were, it strengthens their support for inclusion. But this is because we are considering a meta-argument with conclusion “This premise should be included.”

People excel at discovering confirmatory premises. But they stink at admitting negative ones. Note admitting and not discovering. It’s often easy enough to find negative premises, but it’s painful to allow them into the list.

And then even if we have all the right premises, we’re not too good at logically tying them together, except in the narrowest of circumstances. Like mathematicians tackling theorems or chemists chasing how much of a compound is enough.

An excellent example of inverse probability is a criminal trial. The proposition/conclusion is agreed to by all. “The defendant is guilty.” The hope is, people being what they are and admitting their soaring skills at confirmation bias, that because there will be those in favor of and against the proposition, that all the relevant premises will be discovered.

Because of the vast complexity of these premises, we can only thank God that the probability the conclusion is true is not only not required to be quantified, but it is forbidden to be so.

Lazy eights everywhere.

From his page 55 (as before slightly edited for HTML/LaTeX):

Consider the case of a binary event where the two outcomes are success, S, or failure F and we suppose that we have an unknown probability of success $\Pr(S) = \theta$. Suppose that we believe every possible value of $\theta$ is equally likely, so that in that case, in advance of seeing the data, we have a probability density function for $\theta$ of the form $f(\theta) = 1$.

And $\theta$ lives on 0 to 1. “Suppose we consider now the probability that two independent trials will produce two successes. Given the value of $\theta$ this probability is $\theta^2$. Averaged over all possible values of $\theta$” this is 1/3 (that is, $\int_0^1 \theta^2 \, d\theta = 1/3$).

A simple argument of symmetry shows that the probability of two failures must likewise be 1/3 from which it follows that the probability of one success and one failure in any order must be 1/3 also and so that the probability of success followed by failure is 1/6 and of failure followed by success is also 1/6.
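The arithmetic in the quoted passage can be checked exactly. What follows is a minimal sketch, assuming the flat density $f(\theta) = 1$ on 0 to 1 and using the standard result that the integral of $\theta^a (1-\theta)^b$ over that range is $a!\,b!/(a+b+1)!$. The function name `sequence_probability` is illustrative and does not come from Senn.

```python
from fractions import Fraction
from math import factorial


def sequence_probability(successes: int, failures: int) -> Fraction:
    """Probability of one specific ordered sequence under the flat prior.

    Given theta, the sequence has probability theta**s * (1 - theta)**f.
    Averaging over f(theta) = 1 on [0, 1] gives s! f! / (s + f + 1)!.
    """
    return Fraction(
        factorial(successes) * factorial(failures),
        factorial(successes + failures + 1),
    )


p_ss = sequence_probability(2, 0)  # success, success
p_ff = sequence_probability(0, 2)  # failure, failure
p_sf = sequence_probability(1, 1)  # success then failure
p_fs = sequence_probability(1, 1)  # failure then success

print(p_ss, p_ff, p_sf, p_fs)     # 1/3 1/3 1/6 1/6
print(p_ss + p_ff + p_sf + p_fs)  # 1
```

The four ordered outcomes sum to one, and none of them equals the 1/4 that would hold if $\theta$ were known to be 1/2.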

This is a contradiction or paradox and a glaring one which causes subjective Bayesians to cower (rightly). (I skip over the difficulties covered before with the idea of “independent trials”.) Where does the fault lie? Here:

Suppose that we believe every possible value of $\theta$ is equally likely…

What could that possibly mean? Nothing. Sure, it’s easy to write down a mathematical answer, but this does not make it a true or useful answer. First: how many numbers are there between 0 and 1? Uncountably many. It is impossible for any being short of God to assign a probability to each of these. Second: even if somebody could, because there are uncountably many answers, it is impossible that any should be the right one. Recall the probability of seeing any actual observation with any continuous (i.e. infinity-beholden) distribution is always 0, a daily absurdity to which we always shut our eyes.

We have jumped the infinity shark. Jaynes warned us about this (in his Chapter 15; though he didn’t always obey his own injunction). I think his caution goes unheeded because the calculus is so easy to demonstrate and to work with. What’s easier than integrating a constant?

As shown in the original series, we must begin with a real-world finite conception of each problem and only after we’ve sorted out what is what can we take a limit, and only then for the sake of ease and approximation. We must not fall prey to the temptation of reifying infinity.
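Here is one way "finite first, limit afterwards" might look for Senn's example. This is only a sketch under an illustrative assumption, and not necessarily the construction used in the original series: suppose $\theta$ can take only the $N+1$ values $0, 1/N, \ldots, 1$, each equally probable. The function name `prob_two_successes` is hypothetical.

```python
from fractions import Fraction


def prob_two_successes(n: int) -> Fraction:
    """P(two successes) when theta is one of 0, 1/n, ..., 1, equally weighted."""
    weight = Fraction(1, n + 1)
    return sum(
        (Fraction(k, n) ** 2 * weight for k in range(n + 1)),
        Fraction(0),
    )


# The exact finite answer, and its decimal value, for ever finer grids.
for n in (1, 2, 4, 10, 100, 1000):
    p = prob_two_successes(n)
    print(n, p, float(p))
```

For every finite $N$ the sum works out to $(2N+1)/(6N)$: it is 1/2 when $N = 1$, 5/12 when $N = 2$, and it approaches 1/3 only as $N$ grows large. The integral is the approximation; the finite sum is the thing approximated.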

(If there is sufficient interest, I’ll show the solution for Senn’s example another day: it’s a simple extension of the problem in the original series.)

Jaynes himself should have followed his own advice in the derivation of a (two-dimensional) normal distribution. He began with a premise (something like this; I don’t have the book to hand): when measuring a star’s position, errors are possible in any direction. But he took “any direction” to mean a continuum of directions. This isn’t possible.

Suppose all we have to measure a star’s position (on a plane) is a compass which points only in the cardinal directions. Then our measured error can only be a finite number of possibilities. There would be nothing Gaussian about the probability distribution we use to quantify our uncertainty in this error. Right?

Next suppose we double the precision of our compass, so that it points eight directions. Still nothing Gaussian. Finally suppose we set the precision to whatever is the precision of today’s finest instrument. This would still be finite and non-Gaussian. We have nothing, and will never have anything, which can measure to infinite precision in finite time. This goes for stars’ positions, salaries, ages, weight, and anything else you can think of. We’re always limited in our ability to see.

Acknowledging this “solves”—actually does away with—the long-standing problem of putting “flat priors” on (unobservable) parameters of distributions like the normal. These are called “improper” priors because they aren’t real probabilities, they’re only mathematical objects to which we assign an improper meaning. Since they aren’t real probabilities you’d guess people would abandon them. You’d guess wrong.

The other problem with infinite probabilities is measurement units: probabilities can change just by a change in unit, say from feet to centimeters, an absurdity if probability has a constant meaning. This problem also disappears when we remain this side of infinity.

Anyway, time to stop. Logical probability Bayes always lands on its feet. Plenty of mistakes enter with subjective Bayes, it’s true, or even in LPB when people (wrongly) insist on quantifying the unquantifiable. There are many misunderstandings when toying with infinity.

Wikipedia chart trying to say something about probability. Nice colors, but it’s screwy.

We’re almost done. Only one more after this.

There are examples without number of the proper use of Bayes’s Theorem: the probability you have cancer given a positive test and prior information is a perennial favorite. You can look these up yourself.
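For those who would rather see one worked than look it up, here is a minimal sketch of the cancer-test calculation. All three numbers (the prior, the sensitivity, and the false-positive rate) are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical premises, for illustration only.
prior = Fraction(1, 100)           # Pr(cancer | prior information)
sensitivity = Fraction(9, 10)      # Pr(positive | cancer)
false_positive = Fraction(5, 100)  # Pr(positive | no cancer)

# Total probability of a positive test under these premises.
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes's Theorem: Pr(cancer | positive test and prior information).
posterior = sensitivity * prior / p_positive

print(posterior, float(posterior))  # 2/13, roughly 0.154
```

Change any of the three premises and the answer changes with them; the theorem only reports what the premises already imply.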

But be cautious about the bizarre lingo, like “random variable”, “sample space”, “partition”, and other unnecessarily complicated words. “Random variable” means “proposition whose truth we haven’t ascertained”, “sample space” means “that which can happen” and so on. But too often these technical meanings are accompanied by mysticism. It is here the deadly sin of reification finds its purchase. Step lightly in your travels.

Let’s stick with the dice example, sick of it as we are (Heaven will be the place where I never hear about the probability of another dice throw). If we throw a die—assuming we do not know the full physics of the situation etc.—the probability a ‘6’ shows is 1/6 given the evidence about six-sided objects, etc. If we throw the same die again, what is the probability a second ‘6’ shows? We usually say it’s the same.

But why? Short answer is that we do so when we cannot (or will not) imagine any causal path between the first and second throws.

Let’s use Bayes’s theorem. Write $E_D$ for the standard premises about dice (“Six-sided objects, etc.”), $T_1$ for “A ‘6’ on the first throw”, and $T_2$ for “A ‘6’ on the second throw”. Thus we might be tempted to write:

$\Pr(T_2 | T_1 E_D) = \frac{\Pr(T_2 | E_D) \Pr(T_1 | T_2 E_D )}{\Pr(T_1 | E_D)}$.

In this formula (which is written correctly), we know $\Pr(T_1 | E_D) = 1/6$ and say $\Pr(T_2 | E_D) = 1/6$. Thus it must be (if this formula holds) that $\Pr(T_2 | T_1 E_D) = \Pr(T_1 | T_2 E_D )$. This says given what we know about six-sided objects and assuming we saw a ‘6’ on the first throw, the probability of a ‘6’ on the second is the same as the probability of a ‘6’ on the first toss assuming there was a ‘6’ on the second toss. Can these be anything but 1/6, given $\Pr(T_1 | E_D) = \Pr(T_2 | E_D) = 1/6$? Well, no, they cannot.
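The same arithmetic can be laid out by enumeration. This is a sketch, assuming $E_D$ is taken to imply that all 36 ordered pairs of faces are equally probable, which is exactly the assumption questioned next. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Under E_D (as assumed here) every ordered pair of faces gets weight 1/36.
pairs = list(product(range(1, 7), repeat=2))
weight = Fraction(1, len(pairs))

p_t1 = sum(weight for a, b in pairs if a == 6)                # Pr(T_1 | E_D)
p_t2 = sum(weight for a, b in pairs if b == 6)                # Pr(T_2 | E_D)
p_both = sum(weight for a, b in pairs if a == 6 and b == 6)   # Pr(T_1 T_2 | E_D)

p_t1_given_t2 = p_both / p_t2  # Pr(T_1 | T_2 E_D)

# Bayes's Theorem as written above.
p_t2_given_t1 = p_t2 * p_t1_given_t2 / p_t1  # Pr(T_2 | T_1 E_D)

print(p_t1, p_t2, p_t1_given_t2, p_t2_given_t1)  # 1/6 1/6 1/6 1/6
```

Everything comes out 1/6 because the equal weighting of pairs was put in by hand at the start.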

But there’s something bold in the way we wrote this formula. It assumes what we wanted to predict, and as such it’s circular. It’s strident to say $\Pr(T_2 | E_D) = 1/6$. This assumes, without proof, that knowledge of the first toss does not change our knowledge of the second. Is that wrong? Could the first toss change our knowledge of the second? Yes, of course.

There is some wear and stress on the die from first to second throw. This is indisputable. Casinos routinely replace “old” dice to forestall or prevent any kind of deviation (of the observed relative frequencies from the probabilities deduced with respect to $E_D$). Now if we suspect wear, we are in the kind of situation where we suspect a die may be “loaded.” We solved that earlier. Bayes’s Theorem is still invoked in these cases, but with additional premises.

Bayes as we just wrote it exposes the gambler’s fallacy: that because we saw many or few ‘6’s does not imply the chance of the next toss being a ‘6’ is different than the first. This is deduced because we left out, or ignored, how previous tosses could have influenced the current one. Again: we leave this information out of our premises. That is, we have (as we just wrote) the result of the previous toss in the list of premises, but $E_D$ does not provide any information on how old tosses affect new ones.

This is crucial to understand. It is we who change $E_D$ to the evidence of $E_D$ plus that which indicates a die may be loaded or worn. It is always us who decides which premises to keep and which to reject.
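To see how an added premise changes the answer, here is a sketch with a made-up premise, one not taken from the earlier posts: the die is either fair, or loaded so that a ‘6’ shows half the time, and we have no reason to favor either possibility.

```python
from fractions import Fraction

# Hypothetical added premise: two possible dice, equally probable a priori.
# Each entry is (probability of this die, chance of a '6' on any one throw).
dice = {
    "fair": (Fraction(1, 2), Fraction(1, 6)),
    "loaded": (Fraction(1, 2), Fraction(1, 2)),
}

# Given which die it is, the throws are treated as unrelated to one another.
p_t1 = sum(prior * p6 for prior, p6 in dice.values())
p_t1_and_t2 = sum(prior * p6 * p6 for prior, p6 in dice.values())

# Probability of a second '6' given a first '6' and the enlarged premises.
p_t2_given_t1 = p_t1_and_t2 / p_t1

print(p_t1)           # 1/3
print(p_t2_given_t1)  # 5/12
```

With this premise in the list, a first ‘6’ raises the probability of a second from 1/3 to 5/12, because the first throw is now evidence about which die we hold. Nothing about the die changed; only the premises did.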

Think: in every problem, there are always an infinite number of premises we reject.

If it’s difficult to think of what premises to use in a dice example, how perplexing is it in “real” problems, i.e. experiments on the human body or squirrel mating habits? It is unrealistic to ask somebody to quantify their uncertainty in matters which they barely understand. Yet it’s done and people rightly suspect the results (this is what makes Senn suspicious of Bayes). The solution would be to eschew quantification and rely more on description until such time as we have sufficient understanding of the proper premises that quantification is possible. Yet science papers without numbers aren’t thought of as proper science papers.

Conclusion: bad uses of probability do not invalidate the true meaning of probability.

Next—and last—time: the Trials of Infinity.

Medals for everybody!

A few years ago the boss of Guinness toured American micro-breweries and congratulated them for their enterprise, but he also gave them a spot of advice, which, paraphrased, was that they should concentrate on making just one great beer rather than on many indifferent ones.

Hosmer Winery, the first of our three stops along Cayuga Lake, had available for tasting at least two dozen varieties, and they produce a couple more. Knapp Winery & Vineyard Restaurant listed 38 wines and spirits. And Lucas Vineyards listed 24.

These statistics are worth mentioning because none of these wineries are major concerns: all exist on minimal acreage, so it’s a wonder how this many wines can be produced. After the tastings, somebody explained this by starting the undoubtedly scurrilous rumor that the native-grown grapes are supplemented with water and high fructose corn syrup.

And then none of the wines they sell are cheap. The least expensive bottles were $8.99 at Lucas, but the average is around fifteen bucks, topping out around thirty. Obviously, these places subsist on the tourist trade. Tastings are three to five bucks, so they’re breaking even there. But each shop sells tchotchkes or they have small restaurants. And everybody buys a bottle or two, just for the fun of it.

In short, and with exceptions, you’re not going to these wineries for the wine. Instead, the trip is ventured for the sake of the trip, for the beautiful vistas on a gorgeous day. And to see the shining gold and silver glint in the sunlight. These reflections are provided by the multitudinous medals lining the walls. Since these wineries are by Ithaca—which Utne Magazine once called the “Most Enlightened City in America”—every wine is a winner. Each goes home with a prize and a hug.

Following is a selection of my tasting notes.

#### Hosmer

A small barn with a vineyard not too much bigger. Specializes in the sweet stuff, especially Raspberry Bounce, a Faygo Redpop simulacrum.

2012 Dry Riesling. Smells like cheddar cheese from the supermarket. Sour. Except for the smell, indistinguishable from the 2011 Riesling, the “Double Gold winner.” $15.

2009 Lemberger. Dusty, sweet scent. Drank, but taste disappeared instantly. Immediately forgettable. $18.

2010 Cabernet Franc. Cheap barbershop cologne. Awful. Oh My God. Awful. $18.

Estate Red. Thin. Not sweet. Reasonable plonk. $10.50. (I bought two bottles; shared them out on bus.)

#### Knapp

The only place we visited with a distillery. Tasting room nicely decorated with barrels. We had their barbecue of overcooked chicken. They like it sweet too, advertising Jammin’ Strawberry which will “flood your palate and bombard your senses.” I believed them and didn’t try it.

Cabernet Sauvignon ’11. Almost no smell? Sour, thin; dries the mouth. “King of reds.” $18.95.

Sangiovese ’10. What is this? Aha! Nail polish remover. Tastes of day old apple cider made with peels. $16.95.

Meritage ’11. Like flat, not-too-sweet root beer. $22.95.

Pasta Red Reserve. Smells like road construction. Too sweet. $10.95.

Brandy. Fumes good replacement for nose hair trimmer. Stings the tongue. Couldn’t swallow. Aged what? Two days? $24.95.

Serenity. Passable. Tasted like a bin red wine. $12.95. (Bought bottle, shared out over lunch.)

#### Lucas

For no apparent reason, a nautical-themed winery (it’s nowhere near any water). Sorority hangout? The picture of medals is from here. The wines were pre-selected for us.

Miss Chevious. My Grandma Briggs would have liked this: but she never paid more than two dollars a bottle. Sour as vinegar. $8.99. (Apparently if you buy some, you win a sticker “I got Naughtie at Lucas”. Several bridesmaids’ parties had these. “Gold Medal Winner!”)

Blues 2010. Cheap. God. Muck. Undrinkable. $8.99. (Nobody in our party could finish.)

Semi-Dry Riesling 2010. Compared with neighbors and, yes, Off! Smells just like the bug spray. Didn’t dare taste. $13.99. (“Gold Medal Winner!”)

Butterfly. Smells like one of those junior artist’s paint sets; the kind which have ten paints in little joined plastic pots. Tastes exactly like Play Doh. $8.99. (“Gold Medal Winner!”)

Tug Boat Red. Smells and tastes like a red bank sucker, the kind tellers used to hand out to children. $8.99. (“Gold Medal Winner!”)

Cabernet Franc Limited Reserve 2009. Puts me in mind of what a diet alcohol would taste like. (This wasn’t on the scheduled flight. I asked pourer if we could try something that wasn’t sweet. I asked for boldest, best red. He suggested this. “Gold Medal Winner!”)

I didn’t buy anything from Lucas, but took a nap in their grass out front while waiting for our bus.