Bayes Is More Than Probably Right: An Answer To Senn; Part I

Stephen Senn very kindly answered a post I wrote on p-values (Unsignificant Statistics: Or Die P-Value, Die Die Die) by sending me his “You May Believe You Are a Bayesian But You Are Probably Wrong” (in Rationality, Markets and Morals).

Since I will be teaching at Cornell these two weeks, and the topics are the same, I will use part of this time to answer his paper in depth.

It would be best to start with Subjective Versus Objective Bayes (Versus Frequentism): Part I, since that series explains matters in greater detail.


Senn went wrong before he even began, with his title: “You May Believe You Are a Bayesian But You Are Probably Wrong.” If you are only “probably wrong” about your belief then you also might be right. And if you were certainly wrong, then we would have a proof which says so. A proof is a string of deductions, i.e. a valid and sound argument, which begins with obviously true premises (agreed to by all) and ends at a proposition we must believe—even if we don’t want to.

Senn does not have, nor does he claim to have, a proof which shows being a Bayesian is certainly wrong. It is only his best guess that this philosophy is wrong. Probably wrong. So here we are, already at probability. What could Senn mean by his probabilistic statement “probably wrong”? (Besides the pun, I mean.) It can’t be any kind of frequentist statement, as in “I’ve collected a ‘random sample’ of Bayesian philosophies, itself embedded in an infinite sequence of such philosophies, and the mean of this sample (considering errors in theory equal to zero) tends towards zero.” That makes no kind of sense, as I’m sure Senn would agree, but it would have to if probability were frequentism.

Bayesian philosophy, at best, comes in a finite number of flavors. It could be that some of these are false (I agree subjectivism, as it is usually understood, is), but in no way can we imagine any individual theory as being embedded in an infinite sequence of theories, which is required for frequentist theory to hold. No: either we can prove each theory true or false, or our evidence is not (yet?) sufficient, and thus we are only probably sure each theory is true or false. This sounds like a Bayesian statement, no? (If so, do we fail because of self-reference? Well, no, because we can build this theory from simpler propositions.)

It could be that Senn took a subjective Bayesian tack when he formed his title, or perhaps he took a logical probability, or objective, Bayesian one. (Incidentally, I’ll call this latter theory LPB for short.) Or he could have meant some as yet unknown (or at least unidentified) theory. Whatever it was, it couldn’t have been frequentism, as shown.

His leading candidate is eclecticism (Senn is not frequentist), which is one of two things. One is no belief at all. It means “I’ll do whatever I want whenever it seems good to me.” There is no theory here to disprove, nor prove. To say “I’m an eclectic” this way means “I don’t want to argue for anything, just against things.” Since we go nowhere engaging with this “theory”, we pass on to number two. This is to say, “I’ll take a little of that, some of this, and some of the other.” Here we have several sub-theories. As such, this kind of eclecticism is actually a whole theory (the compilation of sub-theories) which might be true or false. Thus Senn might have used Bayes for his title and he might use frequentism for (say) dice tosses.

Senn recalls that Fisher himself was “skeptical” of attempts to unify probability. Hacking, another Big Cheese, in line with other well-aged curds, is of the same opinion. Why should we have a theory? Why not many? The obvious answer to this is that there is that which is true and that which is false and we should seek the truth. If it turns out a theory of probability works for all kinds of uncertainty, we’re stuck with it. If it must be that several theories are true, then we must accept them all. But it’s wrong to use desire or suspicion as proof there are many and not one theory.

Senn himself proved that frequentism is out (and forever) as a complete theory of probability because it cannot handle propositions like his “probably wrong.” But this isn’t proof that Bayes is everywhere right; not yet. Senn’s later examples might be sufficient to show all versions of Bayes are wrong, in which case some other theory must be true.

But we’ll have to see next time, because we’re already out of space, and because the next topic isn’t simple.

Cornell Teaching Sojourn: Probability, Stats, & R

Time for the annual migration to Ithaca via a well-accoutered golden coach (complete with undergraduates feeding professors grapes grown at Cornell’s orchards). There I will linger for two weeks, ruling as benign and loving dictator over ILRST 5150, i.e. Statistical Research Methods in ILR’s MPS program.

The class works by me holding forth with dulcet but brief pontifications followed by intense questioning of the students, as a cop might grill a suspect. “What did I just say? What in the dark-mattered universe do you think I meant by that? Have you signed up for the wine tour yet?”

The wine tour—completely unofficial and off the books—ends Week One with a journey to several Finger Lakes wineries to sample their wares. To be cruelly honest, many of these are poor. If the wines aren’t sour and vinegary, they are so sweet you could stand a teaspoon up in them. One unbearable vineyard (the name of which is hidden in a riddle) produces nothing but pinkish paint thinner. But everywhere the wines are wet and contain (among other chemicals) ethanol, which is welcome after five full days of statistics statistics statistics and with another week of the same to come.

(But there are dangers, too. At one stop on the wine trail, I was once nearly abducted by a bachelorette party and had to be rescued by one of my students.)

The class contains almost no math and certainly no memorization of formulas. I figure the computer can do those things for you, and that time spent proving things mathematically removes time spent in understanding what probability is and learning the strengths and limitations of statistics. As regular readers know, the latter are many, nefarious, and ubiquitous.

I have only one or two canned examples. The rest have to be provided by the students themselves. This eliminates having to figure out a whole new field and its data and how to describe its uncertainty. Besides, textbook examples are far too neat, even coy. Better to see how messy, compromising, and ambiguous collecting data is. Gives a far better appreciation of the ease of making mistakes and the resultant over-confidence.

I teach R; successfully, too. Yes, it is a programming language, but that is its great advantage. I was able to teach R to a man who did not know what a spreadsheet was and could not type. He did not own a computer. This wasn’t because of my ability, but because learning the rudiments of any logical programming language is something almost anybody can do. (I do not include SAS in this list; it is an appalling language.)

Following my custom, for the next two weeks posts will reflect, broadly or in detail, what is going on in the class. I won’t have time to do anything more. Feel free to ask questions, but understand I might not be able to get to all of them.

Update A good joke.

Tobacco Ads Could Lead To Cancer Cure

A sure cure for cancer?
Today’s headline is true. True means that which is certain, without the possibility of error; that which is not false; that which accords with reality. It means that which is so even if you don’t want it to be; even, that is, if you have attended a sacred Raising Awareness ceremony about the evils of tobacco.

Yes: it is true that the next tobacco ad you see could lead to a cure for cancer. How?

I haven’t the slightest idea. I don’t have to know how, either. The headline is still true as long as somebody, sometime, somewhere could describe how, even if the description is only “in theory.”

My powers of imagination are weak, so I’ll rely on you to divine the path from tobacco ad to cancer cure. What I’ll do instead is distract you from thinking about this difficulty and talk about the glories of a cancer-free world.

Hey! No more cancer! Now that would be a fine thing. Right? No more pain, no suffering, no tears, grief, misery. No more mothers burying their blighted-by-disease daughters.

Like Sally Q. Evalston, 42, a Pinewood, Illinois elementary school teacher, beloved three-time winner of Teacher of the Year, who was carried away before her time by capital-C Cancer (which she “battled”). Just you think about her. Look at her picture, feel for her mother, weep with her students.

This is the sort of tragedy that could be avoided thanks to our truthful headline. Admit it: you feel good thinking about this, don’t you? Isn’t it nice to be part of the cure for cancer, albeit in small proportion? Maybe you can email your Congressman (or woman!) and let him know you’re on his side, that you’d support him if he voted to increase funding for tobacco advertising. You could at least frown with severe disapproval at the next person you meet who suggests he’d rather not see more tobacco ads.

Assimilated all that? Then here is another true headline, “Tobacco Ads Could Lead to Daily Teen Smoking for Kids 14 and Under”.

Wait. Didn’t we just say that tobacco ads could cure cancer, and now a rival claim says these same ads could cause kiddies to smoke? We did: both headlines are true. And so it is true that “Tobacco Ads Could Lead To More Cancer”. Just as it is true that “Tobacco Ads Could Lead To Mars Mission” or “Tobacco Ads Could Cause Nancy Pelosi To Stop Speaking Gibberish.”

The magic happens in could. Adding it—or might, may, possibly or the like—turns any proposition about the contingent into a truth. (Contingent = not logically necessary.) Anything contingent could or might be true; that is the nature of contingency. So adding a word like could to a contingent proposition merely makes the proposition tautological, and all tautologies are true.

Headlines like today’s are cheap journalist tricks; one of the most common, too. “Could Lead To” headlines and ledes betray the reporter’s prejudices and desires and make at best weak claims about reality. And the following articles usually fall prey to the standard human failing of searching only for supportive evidence, assuming that contradictory theories are the first refuge of scoundrels and “deniers.” No idea of the uncertainty in the claim of the headline ever appears.

Just for fun, I did a search on “Could Lead To” (surrounded by quotes; try this yourself). “Repetitive soccer ball ‘heading’ could lead to brain injury”, “10 nail deformities that could lead to bigger health problems”, “Heavy rain could lead to explosion in mosquito population”, “NYCHA Budget Cuts Could Lead To 500 Jobs Lost”, “Crowdfunding help could lead to a sandwich named after you”, “NHS changes could lead to hospital being sponsored by junk food firms.” An endless, ever-increasing stream.

And isn’t it curious that all of these, tacitly or directly, argue for government intervention?

Unsignificant Statistics: Or Die P-Value, Die Die Die

“My p-value was this big.”
Must…resist…quoting… from Stephen Ziliak’s gorgeous invective “Unsignificant Statistics” (where I stole today’s title) in the Financial Post.

Well, just a little (all emphasis mine and joyfully placed):

Statistical significance is junk science, and its big piles of nonsense are spoiling the research of more than particle physicists…

But here is something you can believe, and will want to: Statistical significance stinks

The null hypothesis test procedure is not the only test of significance but it is the most commonly used and abused of all the tests. From the get go, the test of statistical significance asks the wrong question

In framing the quantitative question the way they do, the significance-testing scientists have unknowingly reversed the fundamental equation of statistics. Believe it or not, they have transposed their hypothesis and data, forcing them to grossly distort the magnitudes of probable events…

They have fallen for a mistaken logic called in statistics the “fallacy of the transposed conditional.”

And that’s just the first part. I couldn’t finish the second because my eyes were overflowing with happy tears.

Ziliak and pal Deirdre McCloskey, incidentally, co-authored the must-read The Cult of Statistical Significance.

Cult, they say. Cult because there is an initiation at high price. Cult because statistical “significance” is invoked by occult incantations, the meaning of which has been lost in the mists of time. Cult because these things cannot be questioned!

The p-value is a mysterious, magical threshold, an entity which lives, breathes, and gazes sternly over spreadsheets; a number gifted to us by the great, mysterious god Stochastikos[1]. It was he who decreed that great saying, “Oh-point-oh-five and thrive; Oh-point-oh-six and nix.”

Adepts know the meaning of this shorthand. So 0.050000001 is sufficient to cast a result outside the gates where there is weeping and gnashing of teeth. Yet 0.04999999 produces bliss of the kind had when the IRS decides not to audit.
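The decree is easy to automate. Here is a minimal sketch in Python (not the R of the classroom, but the point survives translation), using the ordinary two-sided p-value of a z statistic under a standard normal null. Two test statistics differing only in the fourth decimal place land on opposite sides of the great divide:

```python
from math import erf, sqrt

def p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Two hypothetical experiments whose test statistics differ by a hair.
# The 5% critical value is about 1.95996, so one thrives and one is nixed.
for z in (1.9599, 1.9601):
    p = p_value(z)
    verdict = "thrive" if p < 0.05 else "nix"
    print(f"z = {z}: p = {p:.7f} -> {verdict}")
```

Nothing about the evidence differs in any way that matters; only the incantation's outcome does.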

Members cannot be identified by dress but by their manner of speaking. Clues are evasiveness and glib over-confidence. They say, “The probability my hypothesis is true is Amen” when what they mean is “Given my hypothesis is false, here is the value of an obscure function—one of many I could have picked—applied to the data assuming the model which quantifies its uncertainty is certainly true and that one of its parameters is set to zero and assuming I could regather my data in the same manner but randomly different ad infinitum.”
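That mouthful can be unpacked with a simulation. A sketch in Python, with made-up data: the p-value is found by pretending the null is exactly true, regenerating the data endlessly, and counting how often the statistic comes out at least as extreme as the one actually seen. Notice that the probability the hypothesis itself is true appears nowhere.

```python
import random
import statistics

random.seed(42)

def t_stat(sample):
    """One-sample t statistic against a null mean of zero."""
    n = len(sample)
    return statistics.mean(sample) / (statistics.stdev(sample) / n ** 0.5)

# Hypothetical observations, invented for illustration.
observed = [0.8, -0.2, 1.1, 0.4, 0.9, -0.1, 0.7, 0.5]
t_obs = abs(t_stat(observed))

# The p-value is a statement about data never collected: the fraction of
# hypothetical replications, generated assuming the null is exactly true,
# whose statistic is at least as extreme as the one actually observed.
reps = 20_000
extreme = sum(
    abs(t_stat([random.gauss(0, 1) for _ in range(len(observed))])) >= t_obs
    for _ in range(reps)
)
print(f"simulated p-value: {extreme / reps:.3f}")
```

The "ad infinitum" of the quotation is the `reps` loop; the "obscure function—one of many I could have picked" is `t_stat`, which could just as well have been the median, the maximum, or anything else.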

In the hands of a master, more significant p-values can be squeezed out of a set of data than donations Al Sharpton can secure by marching into an all-white corporation’s board room.
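The squeezing is merely a matter of asking enough questions. A hypothetical Python sketch: correlate pure noise with pure noise a thousand times, and "significance" arrives right on schedule, about one time in twenty. (The cutoff 0.279 is roughly the two-sided 5% critical correlation at a sample size of 50.)

```python
import random

random.seed(1)

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

n = 50
outcome = [random.gauss(0, 1) for _ in range(n)]

# Try 1,000 "predictors" that are pure noise. At n = 50, any correlation
# with |r| > 0.279 (roughly) clears the two-sided p < 0.05 bar.
hits = 0
for _ in range(1_000):
    predictor = [random.gauss(0, 1) for _ in range(n)]
    if abs(corr(outcome, predictor)) > 0.279:
        hits += 1
print(f"'significant' discoveries among pure noise: {hits}")
```

Report only the hits, forget the misses, and the master has his publishable p-values.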

“Statistically significant” does not imply true nor useful nor even interesting. “Significance” is a fog which emanates from a computerized thurible, thick and pungent. It obscures and conceals. It woos and insinuates. It distracts. It is a mathematical sleight-of-hand, a trick. It takes the eye from the direct evidence at hand and refocuses it on the pyrotechnics of p-values. So delighted is the audience at seeing wee p-values that all memory of the point of a study vanishes.

Statistical significance is so powerful that it can prove both a hypothesis and its contrary simultaneously. One day it pronounces broccoli the awful cause of splenic fever and tomorrow it asserts unequivocally that broccoli is the only sane cure for the disease.

Both results will be accepted and believed, especially by those manning (and womanning!) bureaucracies and press rooms. Journalists won’t tell you about the deadly effect of either until 10 p.m. Government minions will latch gratefully on to anything “significant” as proof their budget (and therefore power) should be increased.

Time for statistical significance to be slain, its bones cremated, and its ashes scattered in secret. No trace should remain lest the infection re-spread. The only word of it should appear in Latin in tomes guarded by monks charged with collecting man’s (and woman’s!) intellectual follies.

Update Wuhahaha!


Thanks to Steve E for finding Ziliak’s piece.

[1] I didn’t think of this; I recall the name from the old Usenet days.