William M. Briggs

Statistician to the Stars!

Pascal’s Pensées, A Tour: IV

Since our walk through Summa Contra Gentiles is going so well, why not do the same with Pascal’s sketchbook on what we can now call Thinking Thursdays? We’ll use the Dutton Edition, freely available at Project Gutenberg. (I’m removing that edition’s footnotes.)

Previous post.

9 When we wish to correct with advantage, and to show another that he errs, we must notice from what side he views the matter, for on that side it is usually true, and admit that truth to him, but reveal to him the side on which it is false. He is satisfied with that, for he sees that he was not mistaken, and that he only failed to see all sides. Now, no one is offended at not seeing everything; but one does not like to be mistaken, and that perhaps arises from the fact that man naturally cannot see everything, and that naturally he cannot err in the side he looks at, since the perceptions of our senses are always true.1

1Good advice for perpetual arguers like yours truly, who sometimes forget it in the joy of battle, because what’s wrong with most everyday arguments is false or incomplete premises. Pascal’s dictum doesn’t work for every argument, of course. Some are so wrong, or the desire to believe a false conclusion so strong, that nothing short of divine grace will free a person from error. Do you really think that, unaided, you’ll bring a chiropractor or, say, an academic feminist to see what’s wrong with her stance? These folks are so far from the promised land that it isn’t even on their maps. This doesn’t make Pascal wrong, but winning people to the Truth is hard, brutal labor.

10 People are generally better persuaded by the reasons which they have themselves discovered than by those which have come into the mind of others.2

2This is why we have some sympathy whenever an educator rediscovers the truism that kids better grasp ideas they work out for themselves. Yet any kindly disposition we have is blown away the second the educator insists that all learning follow this regimen. If the child (or adult) isn’t provided with a solid foundation (a memorized one), almost no learning can follow. The student won’t have the tools to work things out and he won’t know when something is right and when it is wrong. And we are back to the first point.

11 All great amusements are dangerous to the Christian life; but among all those which the world has invented there is none more to be feared than the theatre. It is a representation of the passions so natural and so delicate that it excites them and gives birth to them in our hearts, and, above all, to that of love, principally when it is represented as very chaste and virtuous. For the more innocent it appears to innocent souls, the more they are likely to be touched by it. Its violence pleases our self-love, which immediately forms a desire to produce the same effects which are seen so well represented; and, at the same time, we make ourselves a conscience founded on the propriety of the feelings which we see there, by which the fear of pure souls is removed, since they imagine that it cannot hurt their purity to love with a love which seems to them so reasonable.

So we depart from the theatre with our heart so filled with all the beauty and tenderness of love, the soul and the mind so persuaded of its innocence, that we are quite ready to receive its first impressions, or rather to seek an opportunity of awakening them in the heart of another, in order that we may receive the same pleasures and the same sacrifices which we have seen so well represented in the theatre.3

3Initially, Pascal’s complaint reads like those of the nuns who schooled my father and who said the worst things kids did was to chew gum or get out of line in the halls. We are at the point where we long for the good old days where love was “represented as very chaste and virtuous.” So far into the wilderness are we that not one in five hundred could today share his fear of theater. It passes all imagination to see oneself marching down Broadway with a “Down with Tartuffe!” sign.

On the other hand, swap in television and movies for theatre and we see Pascal nails it. In particular, those of certain sexual natures who choose to violate natural law are portrayed as extraordinarily loving, wholly sympathetic creatures, superior to the rest of us (don’t think so? Read this). The situations into which these protagonists are thrust are so ludicrous that it would take an audience with a heart of stone not to root for them. No consequences are ever seen and anything that smacks of reality is expunged. Viewers are discouraged from thinking and tricked, manipulated into feeling, only feeling.

Art has always been recognized as dangerous.

Improper Language About Priors


A Christmas distribution of posteriors. Image source.

Suppose you decided (almost surely by some ad hoc rule) that the uncertainty in some thing (call it y) is best quantified by a normal distribution with central parameter θ and spread 1. Never mind how any of this comes about. What is the value of θ? Nobody knows.

Before we go further, the proper answer to that question almost always should be: why should I care? After all, our stated goal was to understand the uncertainty in y, not θ. Besides, θ can never be observed; but y can. How much effort should we spend on something which is beside the point?

If you answered “oodles”, you might consider statistics as a profession. If you thought “some” was right, stick around.

Way it works is that data is gathered (old y) which is then used to say things, not about new y, but about θ. Turns out Bayes’s theorem requires an initial guess of the values of θ. The guess is called “a prior” (distribution): the language that is used to describe it is the main subject today.

Some insist that the prior express “total ignorance”. What can that mean? I have a proposition (call it Q) about which I tell you nothing (other than it’s a proposition!). What is the probability Q is true? Well, given your total ignorance, there is none. You can’t, consistent with the evidence, say to yourself anything like, “Q has got to be contingent, therefore the probability Q is true is greater than 0 and less than 1.” Who said Q had to be contingent? You are in a state of “total ignorance” about Q: no probability exists.

The same is not true, and cannot be true, of θ. Our evidence positively tells us that “θ is a central parameter for a normal distribution.” There is a load of rich information in that proposition. We know lots about “normals”; how they give 0 probability to any observable, how they give non-zero probability to any interval on the real line, that θ expresses the central point and must be finite, and so on. It is thus impossible—as in impossible—for us to claim ignorance.

This makes another oft-heard phrase “non-informative prior” odd. I believe it originated from nervous once-frequentist recent converts to Bayesian theory. Frequentists hated (and still hate) the idea that priors could influence the outcome of an analysis (themselves forgetting nearly the whole of frequentist theory is ad hoc) and fresh Bayesians were anxious to show that priors weren’t especially important. Indeed, it can even be proved that in the face of rich and abundant information, the importance of the prior fades to nothing.
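
To see that fading at work, here is a minimal sketch using the textbook conjugate result for a normal likelihood with known spread 1 and a normal prior on θ. The two prior centers (-5 and 5), the prior variance of 1, and the fixed sample mean of 2 are all assumptions picked for illustration. None of them come from any real problem.

```python
def posterior_mean(prior_mean, prior_var, ybar, n, data_var=1.0):
    """Posterior mean of theta: normal likelihood with known variance,
    normal prior on theta (standard conjugate formula)."""
    prior_precision = 1.0 / prior_var
    data_precision = n / data_var
    return (prior_precision * prior_mean + data_precision * ybar) / (
        prior_precision + data_precision
    )

ybar = 2.0  # assumed sample mean, held fixed as n grows (illustration only)
gaps = {}
for n in (1, 10, 100, 10_000):
    a = posterior_mean(prior_mean=-5.0, prior_var=1.0, ybar=ybar, n=n)
    b = posterior_mean(prior_mean=5.0, prior_var=1.0, ybar=ybar, n=n)
    gaps[n] = abs(a - b)  # disagreement caused purely by the prior
    print(n, round(a, 4), round(b, 4), round(gaps[n], 4))
```

In this setup the disagreement between the two analysts works out to 10/(1 + n), so it shrinks toward zero as the old y pile up. With only a handful of observations, the prior still matters a good deal.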

Information, alas, isn’t always abundant; thus the prior can matter. And why shouldn’t it? More on that question in a moment. But because some think the prior should matter as little as possible, it is often suggested that the prior on θ should be “uniform”. That means that, just like the normal itself, the probability θ takes any value is zero, the probability of any interval is non-zero; it also means that all intervals of the same length have the same probability.

But this doesn’t work. Actually, that’s a gross understatement. It fails spectacularly. The uniform prior on θ is no longer a probability, proved easily by taking the integral of the density (which equals 1) over the real line, which turns out to be infinite. That kind of maneuver sends out what philosopher David Stove called “distress signals.” Those who want uniform priors are aware that they are injecting non-probability into a probability problem, but still want to retain “non-informativity” so they call the result an “improper prior”. “Prior” makes it sound like it’s a probability, but “improper” acknowledges it isn’t. (Those who use improper priors justify them saying that the resultant posteriors are often, but not always, “proper” probabilities. Interestingly, “improper” priors in standard regression give identical results, though of course interpreted differently, to classical frequentism.)
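
Spelled out, with c standing for any constant height the flat "density" might take (c is my notation, introduced only for this sketch):

```latex
\int_{-\infty}^{\infty} c \, d\theta
  \;=\; \lim_{a \to \infty} \int_{-a}^{a} c \, d\theta
  \;=\; \lim_{a \to \infty} 2ac
  \;=\; \infty \qquad \text{for every } c > 0 .
```

Taking c = 0 gives 0 instead, so no constant makes the total come out to 1, which is what a probability requires.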

Why shouldn’t the prior be allowed to inform our uncertainty in θ (and eventually in y)? The only answer I can see is the one I already gave: residual frequentist guilt. It seems obvious that whatever definite, positive information we have about θ should be used, the results following naturally.

What definite information do we have? Well, some of that has been given. But all that ignores whatever evidence we have about the problem at hand. Why are we using normal distributions in the first place? If we’re using past y to inform about θ, that means we know something about the measurement process. Shouldn’t information like that be included? Yes.

Suppose the unit in which we’re measuring y is inches. Then suppose you have to communicate your findings to a colleague in France, a country which strangely prefers centimeters. Turns out that if you assumed, like the normal, θ was infinitely precise (i.e. continuous), the two answers—inches or centimeters—would give different probabilities to different intervals (suitably back-transformed). How can it be that merely changing units of measurement changes probabilities! Well, that’s a good question. It’s usually answered with a blizzard of mathematics (example), none of which allays the fears of Bayesian critics.

The problem is that we have ignored information. The yardstick we used is not infinitely precise, but has, like any measuring device anywhere, limitations. The best, as in best, that we can do is to measure y from some finite set. Suppose this is to the nearest 1/16 of an inch. That means we can’t (or rather must) differentiate between 0″ and something less than 1/16″; it further means that we have some upper and lower limit. However we measure, the only possible results will fall into some finite set in any problem. Suppose this is 0″, 1/16″, 2/16″,…, 192/16″ (one foot; the exact units or set constituents do not matter, only that they exist does).

Well, 0″ = 0 cm, and 1/16″ = 0.15875 cm, and so on. Thus if the information was that any of the set were possible (in our next measurement of y), the probability of (say) 111/16″ is exactly the same as the probability of 17.6213 cm (we’ll always have to limit the number of digits in any number; thus 1/3 might in practice equal 0.333333 where the 3’s eventually end). And so on.

It turns out that if you take full account of the information, the units of measurement won’t matter! Notice also that the “prior” in this case was deduced from the available evidence; there was nothing ad hoc or “non-informative” about it at all (of course, other premises are possible leading to other deductions).
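
A minimal sketch of that bookkeeping, assuming the only premise is that each member of the measurement set is possible, so each gets the same probability. That equal weighting is one deduction among the others the premises might support. The names `prob_in`, `prob_cm` and `interval_prob` are mine, invented for the illustration.

```python
from fractions import Fraction

INCH_TO_CM = Fraction(254, 100)  # 1 inch = 2.54 cm exactly

# The finite measurement set: 0", 1/16", 2/16", ..., 192/16"
inches = [Fraction(k, 16) for k in range(193)]

# Assumed premise: every member is possible and nothing favors one over another
prob_in = {x: Fraction(1, len(inches)) for x in inches}

# Changing units only relabels the members; the probabilities ride along
prob_cm = {x * INCH_TO_CM: p for x, p in prob_in.items()}

def interval_prob(dist, lo, hi):
    """Probability that the next measurement lands in [lo, hi]."""
    return sum(p for value, p in dist.items() if lo <= value <= hi)

x = Fraction(111, 16)
print(float(x * INCH_TO_CM))                     # 17.62125
print(prob_in[x] == prob_cm[x * INCH_TO_CM])     # True

p_in = interval_prob(prob_in, Fraction(2), Fraction(5))
p_cm = interval_prob(prob_cm, 2 * INCH_TO_CM, 5 * INCH_TO_CM)
print(p_in, p_cm)                                # 49/193 49/193
```

Exact fractions are used so that no rounding sneaks in: the interval from 2″ to 5″ and the same interval expressed in centimeters contain the same 49 members of the set, hence the same probability.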

But then, with this information, we’re not really dealing with normal distributions. No parameters either: there is no θ in this setup. Ah. Is that so bad? We’ve given up the mathematical convenience continuity brings, but our reward is accuracy, and we never wander away from probability. We can still quantify the uncertainty in future (not yet seen) values of y given the old observations and knowledge of the measurement process, albeit at the price of more complicated formulas (which seem more complicated than they really are, at least because fewer people have worked on problems like these).

And we don’t really have to give up on continuity as an approximation. Here’s how it should work. First solve the problem at hand—quantifying the uncertainty in new (unseen) values of y given old ones and all the other premises available. I mean, calculate that exact answer. It will have some mathematical form, part of which will be dependent on the size or nature of the measurement process. Then let the number of elements in our measurement set grow “large”, i.e. take that formula to the limit (as recommended by, inter alia, Jaynes). Useful approximations will result. It will even be true that in some cases, the old stand-by, continuous-from-the-start answers will be rediscovered.
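
Here is a toy version of the recipe, deliberately stripped of any old observations: the only premise is that y is equally likely to be any point of the measurement grid on a 12-inch stick. That is an assumption for illustration and not the full problem described above. The exact finite answer for an interval is computed first, then the grid is refined and compared with the familiar continuous-uniform shortcut.

```python
from fractions import Fraction

def discrete_interval_prob(steps_per_inch, lo, hi, length=12):
    """Exact P(lo <= y <= hi) when y is equally likely to be any grid point
    0, 1/s, 2/s, ..., length, where s = steps_per_inch."""
    n_points = length * steps_per_inch + 1
    inside = sum(
        1 for k in range(n_points)
        if lo <= Fraction(k, steps_per_inch) <= hi
    )
    return Fraction(inside, n_points)

# Continuous shortcut: uniform density on [0, 12], interval [2, 5]
continuous = Fraction(5 - 2, 12)

errors = []
for s in (16, 160, 1600):
    exact = discrete_interval_prob(s, 2, 5)
    errors.append(abs(exact - continuous))
    print(s, float(exact), float(errors[-1]))
```

At sixteenths of an inch the exact answer is 49/193, a little above the continuous 1/4. As the grid gets finer the gap shrinks, which is the sense in which the continuous formula is an approximation to the finite one and not the other way round.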

Best of all, we’ll have no distracting talk of “priors” and (parameter) “posteriors”. And we wouldn’t have to pretend continuous distributions (like the normal) are probabilities.

I Also Declare The Bayesian vs. Frequentist Debate Over For Data Scientists


LSMFT! What’s the probability Santa prefers Luckies?

I stole the title, adding the word “also”, from an article by Rafael Irizarry at Simply Stats (tweeted by Diego Kuonen).

First, brush clearing. Data scientists. Sounds like galloping bureaucratic title inflation has struck again, no? Skip it.

Irizarry says, “If there is something Roger, Jeff and I agree on is that this debate is not constructive. As Rob Kass suggests it’s time to move on to pragmatism.” (Roger Peng and Jeff Leek co-run the blog; Rob Kass is a named person in statistics. Top men all.)

Pragmatism is a failed philosophy; as such, it cannot be relied on for anything. It says “use whatever works”, which has a nice sound to it (unlike “data scientist”), until you realize you’ve merely pushed the problem back one level. What does works mean?

No, really. However you form an answer will be philosophical at base. So we cannot escape having to have a philosophy of probability after all. There has to be some definite definition of works, thus also of probability, else the results we provide have no meaning.


Applied statisticians help answer questions with data. How should I design a roulette so my casino makes $? Does this fertilizer increase crop yield?…[skipping many good questions]… To do this we use a variety of techniques that have been successfully applied in the past and that we have mathematically shown to have desirable properties. Some of these tools are frequentist, some of them are Bayesian, some could be argued to be both, and some don’t even use probability. The Casino will do just fine with frequentist statistics, while the baseball team might want to apply a Bayesian approach to avoid overpaying for players that have simply been lucky.

Suppose a frequentist provides an answer to a casino. How does the casino interpret it? They must interpret it somehow. That means having a philosophy of probability. Same thing with the baseball team. Now this philosophy can be flawed, as many are, but it can be flawed in such a way that not much harm is done. That’s why it seems frequentism does not produce much harm for casinos and why the same is true for Bayesian approaches in player pay scales.

It’s even why approaches which “don’t even use probability” might not cause much harm. Incidentally, I’m guessing by “don’t use probability” Irizarry means some mathematical algorithm that spits out answers to given inputs, a comment I based on his use of “mathematically…desirable properties”. But this is to mistake mathematics for or as probability. Probability is not math.

There exists a branch of mathematics called probability (really measure theory) which is treated like any other branch; theorems proved, papers written, etc. But it isn’t really probability. The math only becomes probability when it’s applied to questions. At that point an interpretation, i.e. a philosophy, is needed. And it’s just as well to get the right one.

Why is frequentism the wrong interpretation? Because to say we can’t know any probability until the trump of doom sounds—a point in time which is theoretically infinitely far away—is silly. Why is Bayes the wrong interpretation? Well, it isn’t; not completely. The subjective version is.

Frequency can and should inform probability. Given the evidence, or premises, “In this box are six green interocitors and four red ones. One interocitor will be pulled from the box” the probability of “A green interocitor will be pulled” is 6/10. Even though there are no such things as interocitors. Hence no real relative frequencies.
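
The deduction needs nothing but the premises, which a few lines can mimic. This is a sketch, assuming the premises give no reason to favor any one of the ten interocitors over another. The helper `probability` is a made-up name for the illustration.

```python
from fractions import Fraction

# The premises: six green interocitors and four red ones, one to be pulled
box = ["green"] * 6 + ["red"] * 4

def probability(proposition, outcomes):
    """Share of the possible outcomes for which the proposition holds,
    treating each outcome alike (the assumed symmetry)."""
    favourable = sum(1 for outcome in outcomes if proposition(outcome))
    return Fraction(favourable, len(outcomes))

p_green = probability(lambda color: color == "green", box)
print(p_green)  # 3/5, i.e. 6/10
```

No interocitor was pulled, observed, or counted in a long run of trials; the number follows from the stated evidence alone.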

Subjectivity is dangerous in probability. A subjective Bayesian could, relying on the theory, say, “I ate a bad burrito. The probability of pulling a green interocitor is 97.121151%”. How could you prove him wrong?

Answer: you cannot. Not if subjectivism is right. You cannot say his guess doesn’t “work”, because why? Because there are no interocitors. You can never do an “experiment.” Ah, but why would you want to? Experiments only work with observables, which are the backbone of science. But who said probability only had to be used in science? Well, many people do say it, at least by implication. That’s wrong, though.

The mistake is not only to improperly conflate mathematics with probability, but to confuse probability models with reality. We need be especially wary of the popular fallacy of assuming the parameters of probability models are reality (hence the endless consternation over “priors”). Although one should, as Irizarry insists, be flexible with the method one uses, we should always strive to get the right interpretation.

What’s the name of this correct way? Well, it doesn’t really have one. Logic, I suppose, à la Laplace, Keynes, Jaynes, Stove, etc. I’ve used this in the past, but come to think it’s limiting. Maybe the best name is probability as argument.

NYC Protesters: ‘What Do We Want? Dead Cops!’ What They’ll Get Is Something Else

“Comedy” (or perhaps “farce”) is a good word to describe the human predicament given events like Sony (the hacked company which revealed emails with lame jokes) studio head and noted progressive Amy Pascal having to “run to Al Sharpton next week to beg for forgiveness”.

Run to Al Sharpton next week to beg for forgiveness! AL SHARPTON!

Now if you don’t find that hilarious, then your imagination is stunted, your sense of history and proportion is sadly narrow, and your educational upbringing largely constituted of propaganda. You’re probably also a danger to yourself; at the least your mental well being is suspect, and you might, if you are exceptionally far gone, even be a danger to society.

Incidentally, it’s one thing to pretend to be “shocked”, “sickened”, “horrified”, suicidal or worse by these emails. So much is expected because of the certifiably insane, really quite lunatic, way race relations are in this country. But if you really are “shocked” etc., then baby, you are a — — — — —. I can’t say the word in an open forum. But you know damn well what it is. We’re raising a nation of mollycoddled half-witted hyper-sensitive snivelling weepy perpetual children. Sticks and stones may break their bones, but microaggressive names will plunge them into sickening (to witnesses) despair. Idiots.

Remember when we discussed herd immunity and the waning of Christianity? How plenty of folks, particularly young ones, safely ensconced in the remnants of Christian structures but ignorant of that fact, think everything will be fine, nay superior, once we throw off the final “religious shackles” and judge morality based on enlightened voting? And how, because few have any memory what Western civilization was like without the civilizing influence of Christianity (never being taught any history that wasn’t ideological), everybody is in for a rude surprise in, say, thirty to fifty years?

Same sort of thing applies to the idiot children (some of them fully grown) who marched through my backyard Saturday night chanting, “What do we want? Dead cops!” Dead cops, mark you. They also, further cementing the poop-filled diaper metaphor (or should I say reality?), chanted that they wanted it “now”. (Another contingent of mentally challenged did similar things in Berkeley. It’s unclear whether they received class credit for it.)

Some of them meant that “now.” Some officers were assaulted. The tweet above (linked here if you can’t see it) shows one bag of confiscated hammers, which some homicidal maniac (I speak literally) brought to use on the skulls of police (see this).

See if you can put yourself in the shoes of the cop who has just fingered the collar of the bloodlusting fool with the deadly weapons. The miscreant is probably foaming at the mouth and resisting arrest, probably shouting some standard activist slogan like “Kill! Kill! Kill!”. You get the idea.

If I were that cop and I thought nobody was looking, I’d be severely tempted to show that young man the kind of damage hammers can inflict. I’d call it a science demonstration. Tempted. I don’t think I’d do it unless the contemptible halfwit actually struck me and I felt I was defending myself.

Multiply this temptation across the hundreds to thousands of cops at these many “demonstrations” (demonstrating what? sublime unteachable ignorance?). Are you not as surprised as I that nobody has yet received what was coming to him? If the demonstrations continue, it’s bound to happen, and when it does the media will trumpet it as “brutality.” This will prove beyond any doubt any civilized person might yet cherish that journalists are among the least of us. But skip that.

Suppose these mental mole hills are granted their wish and we have a few dead cops and nobody does anything about it (like our self-styled communist mayor). Further suppose that cops, having had “sensitivity” grilled into them, pull back from black neighborhoods and stop the arrests. What will happen?

For whatever reasons, blacks commit far more crime than any other group (see the FBI stats here and here). This is just the raw statistics talking—and they’re talking an order of magnitude for violent crime, more for lesser crimes. Stop arresting criminals and the crime rate climbs. Where it peaks nobody knows.

The “demonstrators” don’t know now how good they have it. Herd immunity. The police have done these past few decades a marvelous albeit imperfect job (name one profession which is perfect). Remove police, lower their morale, let crime increase, and what happens? Nothing good.

One day these silly children are going to go too far and some authority is going to, as C. Northcote Parkinson colorfully put it, “have the moral courage” to fire into the crowd. It’s at that point the “demonstrators” will get the exact opposite of their “demands.”

Update The fellow behind the hammers is—are you ready?—a CUNY professor and “poet”. What rhymes with (his words) “F*** the police”?

Update Although the points made here have nothing to do with it, let’s acknowledge racism does exist in America, and it sure is ugly. For instance this and this from a NYC official who said “Racist NYPD *******”.


© 2014 William M. Briggs
