# William M. Briggs

### Statistician to the Stars!


Wikipedia chart trying to say something about probability. Nice colors, but it’s screwy.

We’re almost done. Only one more after this.

There are examples without number of the proper use of Bayes’s Theorem: the probability you have cancer given a positive test and prior information is a perennial favorite. You can look these up yourself.

But be cautious about the bizarre lingo, like “random variable”, “sample space”, “partition”, and other unnecessarily complicated words. “Random variable” means “proposition whose truth we haven’t ascertained”, “sample space” means “that which can happen” and so on. But too often these technical meanings are accompanied by mysticism. It is here the deadly sin of reification finds its purchase. Step lightly in your travels.

Let’s stick with the dice example, sick of it as we are (Heaven will be the place where I never hear about the probability of another dice throw). If we throw a die—assuming we do not know the full physics of the situation etc.—the probability a ‘6’ shows is 1/6 given the evidence about six-sided objects, etc. If we throw the same die again, what is the probability a second ‘6’ shows? We usually say it’s the same.

But why? Short answer is that we do so when we cannot (or will not) imagine any causal path between the first and second throws.

Let’s use Bayes’s Theorem. Write $E_D$ for the standard premises about dice (“Six-sided objects, etc.”), $T_1$ for “A ‘6’ on the first throw”, and $T_2$ for “A ‘6’ on the second throw”. Thus we might be tempted to write:

$\Pr(T_2 | T_1 E_D) = \frac{\Pr(T_2 | E_D) \Pr(T_1 | T_2 E_D )}{\Pr(T_1 | E_D)}$.

In this formula (which is written correctly), we know $\Pr(T_1 | E_D) = 1/6$ and say $\Pr(T_2 | E_D) = 1/6$. Thus it must be (if this formula holds) that $\Pr(T_2 | T_1 E_D) = \Pr(T_1 | T_2 E_D )$. This says that, given what we know about six-sided objects and assuming we saw a ‘6’ on the first throw, the probability of a ‘6’ on the second is the same as the probability of a ‘6’ on the first toss assuming there was a ‘6’ on the second toss. Can these be anything but 1/6, given $\Pr(T_1 | E_D) = \Pr(T_2 | E_D) = 1/6$? Well, no, they cannot.
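For readers who like to check such things, here is a small Python sketch that simply counts outcomes. It assumes, as its stand-in for $E_D$, that all 36 ordered pairs of two throws carry equal weight. That assumption already rules out any causal path from the first throw to the second, which is exactly the point questioned in the next paragraph. Under it, both sides of Bayes’s formula, and $\Pr(T_1 | T_2 E_D)$, come out to 1/6.

```python
from fractions import Fraction
from itertools import product

# Assumption (our stand-in for E_D): two throws of a six-sided die, with all
# 36 ordered outcomes given equal weight. Nothing here links throw 1 to throw 2.
OUTCOMES = list(product(range(1, 7), repeat=2))

def pr(event, given=lambda o: True):
    """Pr(event | given), counted over the equally weighted outcomes."""
    cond = [o for o in OUTCOMES if given(o)]
    hits = [o for o in cond if event(o)]
    return Fraction(len(hits), len(cond))

t1 = lambda o: o[0] == 6  # T_1: a '6' on the first throw
t2 = lambda o: o[1] == 6  # T_2: a '6' on the second throw

lhs = pr(t2, given=t1)                     # Pr(T_2 | T_1 E_D)
rhs = pr(t2) * pr(t1, given=t2) / pr(t1)   # right-hand side of Bayes's Theorem
print(lhs, rhs, pr(t1, given=t2))          # 1/6 1/6 1/6
```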

But there’s something bold in the way we wrote this formula. It assumes what we wanted to predict, and as such it’s circular. It’s strident to say $\Pr(T_2 | E_D) = 1/6$. This assumes, without proof, that knowledge of the first toss does not change our knowledge of the second. Is that wrong? Could the first toss change our knowledge of the second? Yes, of course.

There is some wear and stress on the die from first to second throw. This is indisputable. Casinos routinely replace “old” dice to forestall or prevent any kind of deviation (of the observed relative frequencies from the probabilities deduced with respect to $E_D$). Now if we suspect wear, we are in the kind of situation where we suspect a die may be “loaded.” We solved that earlier. Bayes’s Theorem is still invoked in these cases, but with additional premises.
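A minimal sketch of what “additional premises” can do. It assumes, hypothetically, that the die is either fair or worn, that a worn die shows a ‘6’ with probability 1/3, that we start 90% sure the die is fair, and that throws are independent once we know which die we have. These numbers are illustrative only, and this is not necessarily the treatment from the earlier post on loaded dice. With these premises, a ‘6’ on the first throw nudges the probability of a ‘6’ on the second above 1/6.

```python
from fractions import Fraction as F

# Hypothetical extra premises added to E_D: the die is either fair or worn,
# and a worn die shows '6' with probability 1/3. All numbers are illustrative.
HYPOTHESES = {
    "fair": {"prior": F(9, 10), "p_six": F(1, 6)},
    "worn": {"prior": F(1, 10), "p_six": F(1, 3)},
}

def pr_sixes(n):
    """Pr('6' on each of n throws), assuming throws are independent given the die's state."""
    return sum(h["prior"] * h["p_six"] ** n for h in HYPOTHESES.values())

# Pr(T_2 | T_1, E_D + wear premises) = Pr(T_1 T_2) / Pr(T_1)
p_second_given_first = pr_sixes(2) / pr_sixes(1)
print(p_second_given_first)  # 13/66, a bit more than 1/6 = 11/66
```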

Bayes as we just wrote it exposes the gambler’s fallacy: having seen many or few ‘6’s does not imply that the chance of the next toss being a ‘6’ differs from that of the first. This is deduced because we left out, or ignored, how previous tosses could have influenced the current one. Again: we leave this information out of our premises. That is, we have (as we just wrote) the result of the previous toss in the list of premises, but $E_D$ does not provide any information on how old tosses affect new ones.

This is crucial to understand. It is we who change $E_D$ to the evidence of $E_D$ plus that which indicates a die may be loaded or worn. It is always we who decide which premises to keep and which to reject.

Think: in every problem, there are always an infinite number of premises we reject.

If it’s difficult to think of what premises to use in a dice example, how perplexing is it in “real” problems, e.g. experiments on the human body or squirrel mating habits? It is unrealistic to ask somebody to quantify their uncertainty in matters which they barely understand. Yet it’s done, and people rightly suspect the results (this is what makes Senn suspicious of Bayes). The solution would be to eschew quantification and rely more on description until such time as we understand the proper premises well enough to make quantification possible. Yet science papers without numbers aren’t thought of as proper science papers.

Conclusion: bad uses of probability do not invalidate the true meaning of probability.

Next—and last—time: the Trials of Infinity.

Medals for everybody!

A few years ago the boss of Guinness toured American micro-breweries and congratulated them for their enterprise, but he also gave them a spot of advice, which, paraphrased, was that they should concentrate on making just one great beer rather than on many indifferent ones.

Hosmer Winery, the first of our three stops along Cayuga Lake, had available for tasting at least two dozen varieties, and they produce a couple more. Knapp Winery & Vineyard Restaurant listed 38 wines and spirits. And Lucas Vineyards listed 24.

These statistics are worth mentioning because none of these wineries are major concerns: all exist on minimal acreage, so it’s a wonder how this many wines can be produced. After the tastings, somebody explained this by starting the undoubtedly scurrilous rumor that the native-grown grapes are supplemented with water and high fructose corn syrup.

And then none of the wines they sell are cheap. The least expensive bottles were $8.99 at Lucas, but the average is around fifteen bucks, topping out around thirty. Obviously, these places subsist on the tourist trade. Tastings are three to five bucks, so they’re breaking even there. But each shop sells tchotchkes or they have small restaurants. And everybody buys a bottle or two, just for the fun of it.

In short, and with exceptions, you’re not going to these wineries for the wine. Instead, the trip is ventured for the sake of the trip, for the beautiful vistas on a gorgeous day. And to see the shining gold and silver glint in the sunlight. These reflections are provided by the multitudinous medals lining the walls. Since these wineries are near Ithaca, which Utne Magazine once called the “Most Enlightened City in America”, every wine is a winner. Each goes home with a prize and a hug.

Following is a selection of my tasting notes.

Hosmer

A small barn with a vineyard not too much bigger. Specializes in the sweet stuff, especially Raspberry Bounce, a Faygo Redpop simulacrum.

2012 Dry Riesling. Smells like cheddar cheese from the supermarket. Sour. Except for the smell, indistinguishable from the 2011 Riesling, the “Double Gold winner.” $15.

2009 Lemberger. Dusty, sweet scent. Drank, but taste disappeared instantly. Immediately forgetful. $18.

2010 Cabernet Franc. Cheap barbershop cologne. Awful. Oh My God. Awful. $18.

Estate Red. Thin. Not sweet. Reasonable plonk. $10.50. (I bought two bottles; shared them out on the bus.)

Knapp

The only place we visited with a distillery. Tasting room nicely decorated with barrels. We had their barbecue of overcooked chicken. They like it sweet too, advertising Jammin’ Strawberry, which will “flood your palate and bombard your senses.” I believed them and didn’t try it.

Cabernet Sauvignon ’11. Almost no smell? Sour, thin; dries the mouth. “King of reds.” $18.95.

Sangiovese ’10. What is this? Aha! Nail polish remover. Tastes of day-old apple cider made with peels. $16.95.

Meritage ’11. Like flat, not-too-sweet root beer. $22.95.

Pasta Red Reserve. Smells like road construction. Too sweet. $10.95.

Brandy. Fumes good replacement for nose hair trimmer. Stings the tongue. Couldn’t swallow. Aged what? Two days? $24.95.

Serenity. Passable. Tasted like a bin red wine. $12.95. (Bought bottle, shared out over lunch.)

Lucas

For no apparent reason, a nautical-themed winery (it’s nowhere near any water). Sorority hangout? The picture of medals is from here. The wines were pre-selected for us.

Miss Chevious. My Grandma Briggs would have liked this, but she never paid more than two dollars a bottle. Sour as vinegar. $8.99. (Apparently if you buy some, you win a sticker: “I got Naughtie at Lucas”. Several bridesmaids’ parties had these. “Gold Medal Winner!”)

Blues 2010. Cheap. God. Muck. Undrinkable. $8.99. (Nobody in our party could finish.)

Semi-Dry Riesling 2010. Compared with neighbors and, yes, Off! Smells just like the bug spray. Didn’t dare taste. $13.99. (“Gold Medal Winner!”)

Butterfly. Smells like one of those junior artist paint sets, the kind with ten paints in little joined plastic pots. Tastes exactly like Play-Doh. $8.99. (“Gold Medal Winner!”)

Tug Boat Red. Smells and tastes like a red bank sucker, the kind tellers used to hand out to children. $8.99. (“Gold Medal Winner!”)

Cabernet Franc Limited Reserve 2009. Puts me in mind of what a diet alcohol would taste like. (This wasn’t on the scheduled flight. I asked the pourer if we could try something that wasn’t sweet. I asked for the boldest, best red. He suggested this. “Gold Medal Winner!”)

I didn’t buy anything from Lucas, but took a nap in their grass out front while waiting for our bus.

Somebody please feed this poor woman

One thing that is admirable about the well-attired man is that if he were to time travel, he would be at home in nearly any era. He may have to make some adjustments to better fit in (perhaps cast off the jacket and leave only the vest; or manipulate his tie into more of a cravat), but he will have the tools at his disposal that he needs to make a good impression on short notice. Sure, he might draw the odd glance from Henry VIII, but, on the whole, many of his items of clothing would be recognized as functional.

The same cannot be said of a woman, because “well-attired” has many nuances, especially in the summer months. Imagine a woman wearing a “hi-lo” skirt (which is the sartorial equivalent of the mullet, and perhaps can be best described as “party in the front and all business in the back”) and flip-flops hitching a ride in a time machine to 1860. Our poor time traveler will stick out like the proverbial sore thumb, and her mission is unlikely to end well.

One of the misfortunes of the hi-lo skirt, or any garment that reveals a woman’s knees, is that a woman’s knees are revealed. Women’s knees, as a whole, are not beautiful. I am sorry to be a bearer of bad news, especially as the temperatures are soaring and as the winter’s tights and woolens are being cast off, but this is an incontrovertible fact.

Part of the problem with the knee itself is the anatomy of a woman’s leg. The basic shape is an inverted triangle, with the point of the triangle buried somewhere in the region of the mid-calf. The knee, in real life, does not at all resemble the knee of a mannequin propped up in a store window. I have never been in a designer’s atelier, but I have seen dressmaker’s dummies, which are torso shapes affixed to some sort of pole, with no legs. Any fashion that can be dreamed up is going to look much better on a leg-less dummy than on a flesh-and-blood human being who has to make her way in the world hobbling around on a pair of inverted triangles.

The poets support this view. They are silent on the shining beauty of Cleopatra’s knees and they are entirely mute when it comes to the lower limb joints of the fair Helen. Fortunately for them both, they had the sense to cover them up in mixed company, or at least in the presence of poets.

It took millennia for hemlines to rise to the level they are now. My mother, a daughter of the 1950s, was a firm believer that a proper hemline for the female of the species was “two inches below the knee”. And my mother was literal in her interpretation of “knee” in that it was the horizontal center of that particular joint. “Two inches below” fell, in my view, right below the knee, and to be “two inches below” would require an additional two inches.

In the space between my mother’s accounting of the “two inches” and mine, there was still enough of the knee’s characteristics on display to make the legs look their unflattering worst.

As a result, I was perhaps the only teenager who cried to my mother to lower my hems. I felt, and still feel, that the most attractive hem for a female is mid-calf, just where the flesh swells.

Many women declare that they admire the style of Grace Kelly, Audrey Hepburn, and Lauren Bacall, but then they go to their closets and come out looking like a second-string actress heading for rehab.

The problem isn’t necessarily with the women themselves, but more with what’s on offer. The shop windows are full of skirts that can be characterized as “eight inches above the knee.” By not offering a variety of hemlines, manufacturers and retailers are doing a grave disservice to women who want to spare an unsuspecting public the shock of having to look at their puckered, wrinkled, discolored, but otherwise very useful knees.*

*Please don’t mention that they should wear trousers. Trousers could prove problematic for time travel.

Robert Brown

The Answer to Senn will continue on Monday. Look for my Finger Lakes winery tour tasting notes Sunday!

Several readers asked me to comment on an ensemble climate forecasting post over at Anthony’s place, written by Robert G. Brown. Truthfully, quite truthfully, I’d rather not. I am sicker of climate statistics than I am of dice probabilities. But…

I agree with very little of Brown’s interpretation of statistics. The gentleman takes too literally the language of classical, frequentist statistics, and this leads him astray.

There is nothing wrong, statistically or practically, with using “ensemble” forecasts (averages or functions of forecasts as new forecasts). In weather forecasting they are often better than “plain” or lone-model predictions. The theory on which they are based is sound (the atmosphere is sensitive to initial conditions), and the statistics, while imperfect, are in the ballpark and not unreasonable.

Ignore technicalities and think of this. We have model A, written by a group at some Leviathan-funded university, model B, written by a different group at another ward of Leviathan, and so on with C, D, etc. through Z. Each of these is largely the same, but different in detail. They differ because there is no Consensus on what the best model should be. Each of these predicts temperature (for ease, suppose just one number). Whether any of these models faithfully represents the physics of the atmosphere is a different question and is addressed below (and not important here).

Let’s define the ensemble forecast as the average of A through Z. Since forecasts that give an idea of uncertainty are better than forecasts which don’t, our ensemble forecast will use the spread of these models as an idea of the uncertainty.

We can go further and say that our uncertainty in the future temperature will be quantified by (say) a normal distribution¹, which needs a central and a spread parameter. We’ll let the ensemble mean equal the central parameter and let the standard deviation of the ensemble equal the spread parameter.

This is an operational definition of a forecast. It is sane and comprehensible. The central parameter is not an estimate: we say it equals the ensemble mean. Same with the spread parameter: it is we who say what it is.
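The operational definition can be written in a few lines of Python. The model outputs below are made up, and using the sample standard deviation (rather than `statistics.pstdev`) is an assumption; the point is only that the parameters are set, not estimated, and that the resulting normal then answers any probability question we put to it.

```python
import statistics

# Hypothetical point forecasts of one temperature (deg C) from models A, B, C, ...
# The values are made up purely for illustration.
ensemble = [14.1, 15.3, 13.8, 16.0, 15.2, 14.7]

# Operational definition: we *set* the parameters; nothing is being estimated.
central = statistics.mean(ensemble)
spread = statistics.stdev(ensemble)  # sample SD; pstdev would also be a defensible choice

forecast = statistics.NormalDist(mu=central, sigma=spread)
print(f"central={central:.2f} C, spread={spread:.2f} C")
print(f"Pr(temperature > 16 C) = {1 - forecast.cdf(16.0):.3f}")
```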

There is no “true” value of these parameters, which is why there are no estimates. Strike that: in one sense—perfection—there is a true value of the spread parameter, which is 0, and a true value of the central parameter, which is whatever (exactly) the temperature will be. But since we do not know the temperature in advance, there is no point to talking about “true” values.

Since there aren’t any “true” values (except in that degenerate sense), there are no estimates. Thus we have no interest in “independent and identically distributed models”, or in “random” or “uncorrelated samples” or any of that gobbledygook. There is no “abuse”, “horrendous” or otherwise, in the creation of this (potentially useful) forecast.

Listen: I could forecast tomorrow’s high temperature (quantify my uncertainty in its value) at Central Park with a normal with parameters 15 °C (central) and 8 °C (spread) every day forever. Likewise, you could thump your chest and say, every day from now until the Trump of Doom, the maximum will be 17 °C (which is equivalent to central 17 °C and spread 0 °C).

Okay, so we have three forecasts in contention: the ensemble/normal, my unvarying normal, and your rigid normal. Whose is better?

I don’t know, and neither do you.

It’s likely yours stinks, given our knowledge of past high temperatures (they aren’t always 17 °C). But this isn’t proof it stinks. We’d have to wait until actual temperatures came in to say so. My forecast is not likely much better. It acknowledges more uncertainty than yours, but it’s still inflexible.

The ensemble will probably be best. It might be, as is usually the case with ensemble forecasts, that it will evince a steady bias: say it runs too hot by 2 °C on average. And it might be that the spread of the ensemble is too narrow; that is, the forecast will not be calibrated (calibration has several dimensions, none of which I will discuss today; look up my pal Tilmann Gneiting’s paper on the subject).

Bias and too-narrow spread are common failings of ensemble forecasts, but these can be fixed in the sense that the ensembles themselves go into a process which attempts a correction based on past performance and which outputs (something like) another normal distribution with modified parameters. Don’t sniff at this: this kind of correction is applied all the time to weather forecasts (it’s called MOS).
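A crude sketch of such a correction, assuming a hypothetical record of past ensemble means and verifying temperatures. It shifts the central parameter by the average past error and refuses to let the spread be narrower than the past root-mean-square error of the bias-corrected means. Operational MOS typically fits regressions on many predictors; this toy is not MOS, only an illustration of “correction based on past performance” that outputs another normal.

```python
import math
import statistics

# Hypothetical record of past ensemble means and the temperatures that verified (deg C).
past_means = [16.2, 18.9, 15.1, 20.4, 17.7, 19.3]
past_observed = [13.5, 17.6, 13.9, 17.2, 15.9, 16.8]

# Bias: how hot the ensemble mean has run, on average.
errors = [m - o for m, o in zip(past_means, past_observed)]
bias = statistics.mean(errors)

# RMS error of the bias-corrected means, used as a floor on the spread in case the
# ensemble has proven too narrow. (A crude choice for illustration, not MOS proper.)
rms = math.sqrt(statistics.mean((e - bias) ** 2 for e in errors))

def corrected_forecast(ensemble_mean, ensemble_sd):
    """Normal shifted by the past bias, with spread no narrower than the past RMS error."""
    return statistics.NormalDist(mu=ensemble_mean - bias, sigma=max(ensemble_sd, rms))

today = corrected_forecast(ensemble_mean=18.0, ensemble_sd=0.6)
print(f"bias={bias:.2f} C, corrected central={today.mean:.2f} C, spread={today.stdev:.2f} C")
```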

Now, are the original or adjusted ensemble forecasts any good? If so, then the models are probably getting the physics right. If not, then not. We have to check: do the validation and apply some proper score to them. Only that would tell us. We cannot, in any way, say they are wrong before we do the checking. They are certainly not wrong because they are ensemble forecasts. They could only be wrong if they fail to match reality. (The forecasts Roy S. had up a week or so ago didn’t look like they did too well, but I only glanced at his picture.)
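To make “apply some proper score” concrete, here is a sketch using the continuous ranked probability score (CRPS), one proper score among several; the post does not say which score it has in mind. For a normal forecast with central $\mu$, spread $\sigma$, and observed value $y$, the CRPS has the closed form $\sigma\left[z\,(2\Phi(z)-1) + 2\varphi(z) - 1/\sqrt{\pi}\right]$ with $z=(y-\mu)/\sigma$, and it reduces to $|y-\mu|$ when $\sigma = 0$. Lower is better. The observations and ensemble values are invented, so the ranking says nothing about real forecasts; it only shows how the three forecasts in contention (ensemble, unvarying, rigid) would be compared once actual temperatures came in.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, supplies phi (pdf) and Phi (cdf)

def crps_normal(mu, sigma, y):
    """CRPS of a N(mu, sigma^2) forecast for outcome y. Lower is better.
    sigma == 0 is a point forecast, whose CRPS is the absolute error."""
    if sigma == 0:
        return abs(y - mu)
    z = (y - mu) / sigma
    return sigma * (z * (2 * STD.cdf(z) - 1) + 2 * STD.pdf(z) - 1 / math.sqrt(math.pi))

# Hypothetical verification data: observed highs and the ensemble's (central, spread) each day.
observed = [12.0, 18.5, 21.0, 9.5, 15.0]
ensemble = [(11.0, 1.5), (17.5, 2.0), (22.0, 1.8), (10.5, 1.2), (14.0, 2.5)]

scores = {
    "ensemble/normal": [crps_normal(m, s, y) for (m, s), y in zip(ensemble, observed)],
    "unvarying N(15, 8)": [crps_normal(15.0, 8.0, y) for y in observed],
    "rigid 17 C": [crps_normal(17.0, 0.0, y) for y in observed],
}
for name, s in scores.items():
    print(f"{name:20s} mean CRPS = {sum(s) / len(s):.2f}")
```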

Conclusion: ensemble forecasts are fine, even desirable since they acknowledge up front the uncertainty in the forecasts. Anything that gives a nod to chaos is a good thing.

Update: Although it is true that ensemble forecasting makes sense, I do NOT claim that ensembles do well in practice for climate models. I also dispute the notion that we have to act before we are able to verify the models. That’s nuts. If that logic held, then we would have to act on any bizarre notion that took our fancy as long as we perceived it might be a big enough threat.

Come to think of it, that’s how politicians gain power.

Update: I weep at the difficulty of explaining things. I’ve seen comments about this post on other sites. A few understand what I said; others, who I suspect want Brown to be right but aren’t bothering to be careful about the matter, did not. Don’t bother denying it. So many people say things like, “I don’t understand Brown, but I’m going to frame his post.” Good grief.

There are two separate matters here. Keep them that way.

ONE: Do ensemble forecasts make statistical sense? Yes. Yes, they do. Of course they do. There is nothing in the world wrong with them. It does NOT matter whether the object of the forecast is chaotic, complex, physical, emotional, anything. All that gibberish about “random samples of models” or whatever is meaningless. There will be no “b****-slapping” anybody. (And don’t forget ensembles were invented to acknowledge the chaotic nature of the atmosphere, as I said above.)

Forecasts are statements of uncertainty. Since we do not know the future state of the atmosphere, it is fine to say “I am uncertain about it.” We might even attach a number to this uncertainty. Why not? I saw somebody say something like “It’s wrong to say our uncertainty is 95% because the atmosphere is chaotic.” That’s as wrong as when a rabid progressive says, “There is no truth.”

TWO: Are the ensemble models used in climate forecasts any good? They don’t seem to be; not for longer-range predictions (and don’t forget that ensembles can have just one member). Some climate model forecasts, those for a few months ahead, seem to have skill, i.e. they are good. Why deny the obvious? The multi-year ones look like they’re too hot.

If that’s so, that means when a fervent climatologist says, “The probability the global temperature will increase by 1 degree C over the next five years is 95%,” he is making a statement which is too sure of itself. But that he can make such a statement (that it makes statistical sense to do so) is certain.
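Such a statement is just a calculation from a forecast distribution. A sketch, assuming purely for illustration normal forecasts of the five-year change with central 1.6 °C and spreads of 0.36 °C and 1 °C, and reading “increase by 1 degree C” as “increase by at least 1 degree C”: the narrower forecast yields about 95%, the wider one noticeably less. Neither set of numbers comes from any actual climate model.

```python
from statistics import NormalDist

# Hypothetical normal forecasts for the five-year change in global temperature (deg C).
# Parameters are illustrative only; they are not anyone's published forecast.
confident = NormalDist(mu=1.6, sigma=0.36)
humbler = NormalDist(mu=1.6, sigma=1.0)

# "Increase by 1 degree C" read here as "increase by at least 1 degree C".
for name, f in [("confident", confident), ("humbler", humbler)]:
    print(f"{name}: Pr(increase >= 1 C) = {1 - f.cdf(1.0):.2f}")
```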

If you don’t believe this, you’re not thinking straight. After all, do you not believe yourself that the climatologist is too certain? If so, then you are equivalently making a statement of uncertainty about the future atmosphere. Even saying, “Nobody knows” is making a statement of uncertainty.

See the notes below this line and in my comments to others in the text.

——————————————————————————————-

¹ I pick the normal because of its ubiquity, not its appropriateness. Also, probability is not a real physical thing but a measure of uncertainty. Thus nothing (as in no thing) is “normally distributed”. Rather, we quantify our uncertainty in the value of a thing with a normal. We say, “Given for the sake of argument that uncertainty in this thing is quantified by a normal, with this and that value of the central and spread parameter, the probability the thing equals X is 0.”

Little joke there. The probability of the thing equaling any—as in any—value is always and forevermore 0 for any normal. Normal distributions are weird.
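The joke can be checked numerically. Taking the unvarying forecast from the post (central 15, spread 8) and an arbitrary value, the probability of landing within $h$ of that value shrinks toward 0 as $h$ shrinks, and at $h = 0$ it is exactly 0, even though the density there is positive.

```python
from statistics import NormalDist

# Illustrative parameters: central 15, spread 8 (the unvarying forecast above).
forecast = NormalDist(mu=15.0, sigma=8.0)
x = 17.0  # any value will do

# Pr(x - h <= X <= x + h) shrinks toward 0 as the window h shrinks;
# at h = 0 the interval is a single point and the probability is exactly 0.
for h in [1.0, 0.1, 0.01, 0.001, 0.0]:
    p = forecast.cdf(x + h) - forecast.cdf(x - h)
    print(f"h = {h:<6} Pr = {p:.6f}")

print(f"density at x = {forecast.pdf(x):.4f} (positive, yet Pr(X = x) = 0)")
```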