William M. Briggs

Statistician to the Stars!


Manly Things, Northern Michigan Edition

This post is one that has been restored after the hacking. All original comments were lost.

My number one son sent me the video above (linked here). Can you imagine being that happy over catching a fish?

Never mind that. Can you even put yourself in a small booth set over floating ice the whole day long, considering that lake ice might be unsafe! You could slip and end up with a concussion, or develop frost burn, or even hypothermia!

Remember when I told you of the time I brought a knife to school? Not a penknife or something goofy like a Swiss Army knife (that knife may be why that army often chooses neutrality). No, sir. A real knife. A knife capable of wounding, killing, eviscerating man and beast. Yes, and I wasn’t the only one. A score of us weapons platforms charged through school halls terrorizing…precisely no one.

No one thought anything of it. Boys with knives? In northern Michigan this was as unusual as an EPA bureaucrat inventing a new regulation. Now this was in 1980 and…wait. Boys? What about girls!?

Well, we surely had girls—I remember them fine—but I don’t recall any carrying knives. Besides, the reason we were armed was that we had just returned from a winter survival course. We had to trudge into the woods in groups of three boys—no girls—and stay alive by building our own shelters, making our own fires, and so forth.

Not one of us died, and my memory tells me none were maimed, either. Maybe we came back scuffed up a bit. We certainly stank. But the point is this. Can you imagine any high school allowing such a thing today? There must be one somewhere. But if so, it’s only because it hasn’t been caught yet.

We also had shop class. Welding was my favorite, though there were subtle pleasures to be had chopping things up with a blowtorch. You probably have no idea of the range of items such a tool can melt. But we did. We experimented freely. We brazed. We used machines to fold thick metals. Drill presses were pressed into service. Enormous band saws sang as the deadly clawed blade zipped by at enormous speeds! These machines of destruction were even used to cut wood and, rumor had it, the last teacher’s thumb clean off. Blood, we heard, was everywhere.

One teacher, not too bright, decided to show us gasoline was not as dangerous as claimed. So he had a bucket of it brought over. He lit a match and quickly plunged his hand (yes, holding the lit match) into the liquid. I’m still here so you know how the experiment ended. I don’t recall any being brave enough to repeat the stunt.

The reason I wasn’t keen on this was because we regularly burned our trash, especially the wood from the deconstruction we were doing on our house. One fall afternoon my dad, dad’s dad, and I had the pit by the railroad tracks loaded, an especially big pile. My dad thought an entire can of gas would do the trick to get the blaze going properly.

The wood was soaked. We stood at the pit’s edge. My dad lit the match and tossed it in. Now what happened was the oddest thing. It was as if all sound stopped, like we were in a vacuum. The light from the flames which engulfed us was pleasant and warm. I only felt a wind as the flames subsided.

Our eyebrows and exposed hair were burnt. Awful smell. My dad’s first words were, “Don’t tell your mother.” My grandpa was more phlegmatic. “You used too much gas.” I’m not sure trash burning is still “allowed.”

We heated with wood in those days. Which meant going out and chopping it up. And that meant chain saws. Big ones. Don’t let the chain get too loose. The wood, dear reader, came home in the form of logs which had to be split. That meant sledges and wedges. Metal on metal. Flying shrapnel. The danger of crushed toes. My dad wore glasses to see, but I never did and, no, never goggles either.

You already know about the ever-present guns. We were allowed to trudge off into the woods, unsupervised—no cell phones! No way of finding us! We’d be gone for hours and hours!—and to kill animals with guns! No one ever got shot. Well, not really. We did used to play war with pellet and BB guns. That doesn’t count because the danger of real death was pretty small.


Pascal’s Pensées, A Tour: IV

Since our walk through Summa Contra Gentiles is going so well, why not let’s do the same with Pascal’s sketchbook on what we can now call Thinking Thursdays. We’ll use the Dutton Edition, freely available at Project Gutenberg. (I’m removing that edition’s footnotes.)

Previous post.

9 When we wish to correct with advantage, and to show another that he errs, we must notice from what side he views the matter, for on that side it is usually true, and admit that truth to him, but reveal to him the side on which it is false. He is satisfied with that, for he sees that he was not mistaken, and that he only failed to see all sides. Now, no one is offended at not seeing everything; but one does not like to be mistaken, and that perhaps arises from the fact that man naturally cannot see everything, and that naturally he cannot err in the side he looks at, since the perceptions of our senses are always true.1

1Good advice for perpetual arguers like yours truly, who sometimes forgot it in the joy of battle, because what’s wrong with most everyday arguments are false or incomplete premises. Pascal’s dictum doesn’t work for every argument, of course. Some are so wrong, or the desire to believe a false conclusion so strong, that nothing short of divine grace will free a person from error. Do you really think that, unaided, you’ll bring a chiropractor or, say, an academic feminist to see what’s wrong with her stance? These folks are so far from the promised land that it isn’t even on their maps. This doesn’t make Pascal wrong, but winning people to the Truth is hard, brutal labor.

10 People are generally better persuaded by the reasons which they have themselves discovered than by those which have come into the mind of others.2

2This is why we have some sympathy whenever an educator rediscovers the truism that kids better grasp ideas they work out for themselves. Yet any kindly disposition we have is blown away the second the educator insists that all learning follow this regimen. If the child (or adult) isn’t provided with a solid foundation (a memorized one), almost no learning can follow. The student won’t have the tools to work things out and he won’t know when something is right and when it is wrong. And we are back to the first point.

11 All great amusements are dangerous to the Christian life; but among all those which the world has invented there is none more to be feared than the theatre. It is a representation of the passions so natural and so delicate that it excites them and gives birth to them in our hearts, and, above all, to that of love, principally when it is represented as very chaste and virtuous. For the more innocent it appears to innocent souls, the more they are likely to be touched by it. Its violence pleases our self-love, which immediately forms a desire to produce the same effects which are seen so well represented; and, at the same time, we make ourselves a conscience founded on the propriety of the feelings which we see there, by which the fear of pure souls is removed, since they imagine that it cannot hurt their purity to love with a love which seems to them so reasonable.

So we depart from the theatre with our heart so filled with all the beauty and tenderness of love, the soul and the mind so persuaded of its innocence, that we are quite ready to receive its first impressions, or rather to seek an opportunity of awakening them in the heart of another, in order that we may receive the same pleasures and the same sacrifices which we have seen so well represented in the theatre.3

3Initially, Pascal’s complaint reads like those of the nuns who schooled my father and who said the worst things kids did was to chew gum or get out of line in the halls. We are at the point where we long for the good old days where love was “represented as very chaste and virtuous.” So far into the wilderness are we that not one in five hundred could today share his fear of theater. It passes all imagination to see oneself marching down Broadway with a “Down with Tartuffe!” sign.

On the other hand, swap in television and movies for theatre and we see Pascal nails it. In particular, those of certain sexual natures who choose to violate natural law are portrayed as extraordinarily loving, wholly sympathetic creatures, superior to the rest of us (don’t think so? Read this). The situations into which these protagonists are thrust are so ludicrous that it would take an audience with a heart of stone not to root for them. No consequences are ever seen and anything that smacks of reality is expunged. Viewers are discouraged from thinking and tricked, manipulated into feeling, only feeling.

Art has always been recognized as dangerous.


Improper Language About Priors

A Christmas distribution of posteriors. Image source.

Suppose you decided (almost surely by some ad hoc rule) that the uncertainty in some thing (call it y) is best quantified by a normal distribution with central parameter θ and spread 1. Never mind how any of this comes about. What is the value of θ? Nobody knows.

Before we go further, the proper answer to that question almost always should be: why should I care? After all, our stated goal was to understand the uncertainty in y, not θ. Besides, θ can never be observed; but y can. How much effort should we spend on something which is beside the point?

If you answered “oodles”, you might consider statistics as a profession. If you thought “some” was right, stick around.

Way it works is that data is gathered (old y) which is then used to say things, not about new y, but about θ. Turns out Bayes’s theorem requires an initial guess of the values of θ. The guess is called “a prior” (distribution): the language that is used to describe it is the main subject today.

Some insist that the prior express “total ignorance”. What can that mean? I have a proposition (call it Q) about which I tell you nothing (other than it’s a proposition!). What is the probability Q is true? Well, given your total ignorance, there is none. You can’t, consistent with the evidence, say to yourself anything like, “Q has got to be contingent, therefore the probability Q is true is greater than 0 and less than 1.” Who said Q had to be contingent? You are in a state of “total ignorance” about Q: no probability exists.

The same is not true, and cannot be true, of θ. Our evidence positively tells us that “θ is a central parameter for a normal distribution.” There is a load of rich information in that proposition. We know lots about “normals”; how they give 0 probability to any observable, how they give non-zero probability to any interval on the real line, that θ expresses the central point and must be finite, and so on. It is thus impossible—as in impossible—for us to claim ignorance.
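Two of those known facts about normals can be checked numerically. What follows is only a sketch: the helper names `normal_cdf` and `interval_prob` are made up for illustration, the spread is fixed at 1 as in the setup above, and the value θ = 0 is an arbitrary choice.

```python
import math

def normal_cdf(x, theta=0.0):
    """P(y <= x) for a normal with central parameter theta and spread 1."""
    return 0.5 * (1.0 + math.erf((x - theta) / math.sqrt(2.0)))

def interval_prob(a, b, theta=0.0):
    """P(a < y <= b) under the same normal."""
    return normal_cdf(b, theta) - normal_cdf(a, theta)

# A single point is an interval of zero length: probability 0.
print(interval_prob(1.0, 1.0))

# Any interval of positive length gets positive probability.
print(interval_prob(-1.0, 1.0))
print(interval_prob(5.0, 6.0))
```

One caution: for intervals very far from θ the floating-point answer can underflow to 0 even though the mathematical probability is positive. That is a limit of the arithmetic, not of the normal.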

This makes another oft-heard phrase “non-informative prior” odd. I believe it originated from nervous once-frequentist recent converts to Bayesian theory. Frequentists hated (and still hate) the idea that priors could influence the outcome of an analysis (themselves forgetting nearly the whole of frequentist theory is ad hoc) and fresh Bayesians were anxious to show that priors weren’t especially important. Indeed, it can even be proved that in the face of rich and abundant information, the importance of the prior fades to nothing.
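A small sketch of that fading, under assumptions that are mine and not part of the argument above: the prior on θ is itself taken to be normal (the usual conjugate choice), the two rival prior centers of -5 and 5 are arbitrary, and the old data are summarized by a made-up sample mean of 2. The function names are hypothetical.

```python
def posterior(prior_mean, prior_var, ybar, n, sigma2=1.0):
    """Conjugate normal update for theta when the data spread sigma2 is known.

    Returns the posterior mean and variance of theta after n observations
    whose sample mean is ybar.
    """
    precision = 1.0 / prior_var + n / sigma2
    mean = (prior_mean / prior_var + n * ybar / sigma2) / precision
    return mean, 1.0 / precision

def prior_gap(n, ybar=2.0):
    """Distance between posterior means under two very different priors."""
    mean_a, _ = posterior(-5.0, 1.0, ybar, n)
    mean_b, _ = posterior(5.0, 1.0, ybar, n)
    return abs(mean_a - mean_b)

# The disagreement caused by the priors shrinks as the data pile up.
for n in (1, 10, 100, 10_000):
    print(n, round(prior_gap(n), 4))
```

With these particular numbers the gap works out to 10/(1 + n), so it vanishes only in the limit. With a handful of observations the two analysts still disagree noticeably.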

Information, alas, isn’t always abundant, and thus the prior can matter. And why shouldn’t it? More on that question in a moment. But because some think the prior should matter as little as possible, it is often suggested that the prior on θ should be “uniform”. That means that, just like the normal itself, the probability θ takes any value is zero, the probability of any interval is non-zero; it also means that all intervals of the same length have the same probability.

But this doesn’t work. Actually, that’s a gross understatement. It fails spectacularly. The uniform prior on θ is no longer a probability, proved easily by taking the integral of the density (which equals 1) over the real line, which turns out to be infinite. That kind of maneuver sends out what philosopher David Stove called “distress signals.” Those who want uniform priors are aware that they are injecting non-probability into a probability problem, but still want to retain “non-informativity”, so they call the result an “improper prior”. “Prior” makes it sound like it’s a probability, but “improper” acknowledges it isn’t. (Those who use improper priors justify them saying that the resultant posteriors are often, but not always, “proper” probabilities. Interestingly, “improper” priors in standard regression give identical results, though of course interpreted differently, to classical frequentism.)
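In symbols, a sketch of both claims for the normal setup with spread 1. The symbols c (the constant height of the flat “density”, which the text takes to be 1), n (the number of old observations) and ȳ (their mean) are introduced here only for illustration.

```latex
% The flat "density" cannot be normalized:
\int_{-\infty}^{\infty} c \, d\theta = \infty \qquad \text{for every } c > 0 .

% Yet multiplying it by the normal likelihood gives something that can be:
p(\theta \mid y_1, \dots, y_n)
  \;\propto\; c \prod_{i=1}^{n} \exp\!\left( -\tfrac{1}{2} (y_i - \theta)^2 \right)
  \;\propto\; \exp\!\left( -\tfrac{n}{2} (\theta - \bar{y})^2 \right) .
```

The last expression is the kernel of a normal with center ȳ and variance 1/n, which integrates to a finite number whenever n ≥ 1. That is one of the “often, but not always” cases where an improper prior leads to a proper posterior.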

Why shouldn’t the prior be allowed to inform our uncertainty in θ (and eventually in y)? The only answer I can see is the one I already gave: residual frequentist guilt. It seems obvious that whatever definite, positive information we have about θ should be used, the results following naturally.

What definite information do we have? Well, some of that has been given. But all that ignores whatever evidence we have about the problem at hand. Why are we using normal distributions in the first place? If we’re using past y to inform about θ, that means we know something about the measurement process. Shouldn’t information like that be included? Yes.

Suppose the unit in which we’re measuring y is inches. Then suppose you have to communicate your findings to a colleague in France, a country which strangely prefers centimeters. Turns out that if you assumed, like the normal, θ was infinitely precise (i.e. continuous), the two answers—inches or centimeters—would give different probabilities to different intervals (suitably back-transformed). How can it be that merely changing units of measurement changes probabilities! Well, that’s a good question. It’s usually answered with a blizzard of mathematics (example), none of which allays the fears of Bayesian critics.

The problem is that we have ignored information. The yardstick we used is not infinitely precise, but has, like any measuring device anywhere, limitations. The best, as in best, that we can do is to measure y from some finite set. Suppose this is to the nearest 1/16 of an inch. That means we can’t (or rather must) differentiate between 0″ and something less than 1/16″; it further means that we have some upper and lower limit. However we measure, the only possible results will fall into some finite set in any problem. Suppose this is 0″, 1/16″, 2/16″,…, 192/16″ (one foot; the exact units or set constituents do not matter, only that they exist does).

Well, 0″ = 0 cm, and 1/16″ = 0.15875 cm, and so on. Thus if the information was that any of the set were possible (in our next measurement of y), the probability of (say) 111/16″ is exactly the same as the probability of 17.6213 cm (we’ll always have to limit the number of digits in any number; thus 1/3 might in practice equal 0.333333 where the 3’s eventually end). And so on.
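Here is a minimal sketch of that relabeling. It assumes, for illustration only, that the evidence gives every member of the measurement set the same probability; the variable names are hypothetical and exact fractions are used so that no rounding sneaks in.

```python
from fractions import Fraction

CM_PER_INCH = Fraction(254, 100)  # 1 inch = 2.54 cm exactly

# The finite measurement set: 0", 1/16", ..., 192/16" (one foot).
inches = [Fraction(k, 16) for k in range(193)]

# Assumed for illustration: each possible measurement is equally probable.
prob_in = {x: Fraction(1, len(inches)) for x in inches}

# Changing units only relabels the same 193 outcomes.
prob_cm = {x * CM_PER_INCH: p for x, p in prob_in.items()}

x = Fraction(111, 16)
print(float(x * CM_PER_INCH))                    # the same point in centimeters
print(prob_in[x] == prob_cm[x * CM_PER_INCH])    # same probability either way

# Intervals agree too: "at most 1 inch" versus "at most 2.54 cm".
p_in = sum(p for v, p in prob_in.items() if v <= 1)
p_cm = sum(p for v, p in prob_cm.items() if v <= CM_PER_INCH)
print(p_in == p_cm)
```

Nothing here depends on the probabilities being equal. Any assignment over the finite set survives the change of units, because the conversion is one-to-one on the set.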

It turns out that if you take full account of the information, the units of measurement won’t matter! Notice also that the “prior” in this case was deduced from the available evidence; there was nothing ad hoc or “non-informative” about it at all (of course, other premises are possible leading to other deductions).

But then, with this information, we’re not really dealing with normal distributions. No parameters either: there is no θ in this setup. Ah. Is that so bad? We’ve given up the mathematical convenience continuity brings, but our reward is accuracy, and we never wander away from probability. We can still quantify the uncertainty in future (not yet seen) values of y given the old observations and knowledge of the measurement process, albeit at the price of more complicated formulas (which seem more complicated than they really are, at least because fewer people have worked on problems like these).

And we don’t really have to give up on continuity as an approximation. Here’s how it should work. First solve the problem at hand—quantifying the uncertainty in new (unseen) values of y given old ones and all the other premises available. I mean, calculate that exact answer. It will have some mathematical form, part of which will be dependent on the size or nature of the measurement process. Then let the number of elements in our measurement set grow “large”, i.e. take that formula to the limit (as recommended by, inter alia, Jaynes). Useful approximations will result. It will even be true that in some cases, the old stand-by, continuous-from-the-start answers will be rediscovered.
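A toy version of the recipe, and only a toy: there are no old observations in it, and it assumes equal probability for every point of the measurement set on a one-foot yardstick. The function name `prob_at_most` and its arguments are hypothetical. It computes the exact finite answer first, then lets the grid grow.

```python
from fractions import Fraction

def prob_at_most(limit_inches, ticks_per_inch, max_inches=12):
    """Exact P(y <= limit_inches) on the grid 0, 1/t, 2/t, ..., max_inches.

    limit_inches must be a whole number of inches so that it sits on the grid.
    Every grid point is assumed equally probable.
    """
    total = max_inches * ticks_per_inch + 1       # number of grid points
    favorable = limit_inches * ticks_per_inch + 1  # points from 0 up to the limit
    return Fraction(favorable, total)

# Exact finite answers first; the continuous answer 3/12 appears only in the limit.
for ticks in (16, 160, 1600, 16000):
    print(ticks, float(prob_at_most(3, ticks)))
```

At sixteenths of an inch the exact answer is 49/193, a little above one quarter. The familiar continuous value is an approximation to it, not the other way around.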

Best of all, we’ll have no distracting talk of “priors” and (parameter) “posteriors”. And we wouldn’t have to pretend continuous distributions (like the normal) are probabilities.


I Also Declare The Bayesian vs. Frequentist Debate Over For Data Scientists

LSMFT! What’s the probability Santa prefers Luckies?

I stole the title, adding the word “also”, from an article by Rafael Irizarry at Simply Stats (tweeted by Diego Kuonen).

First, brush clearing. Data scientists. Sounds like galloping bureaucratic title inflation has struck again, no? Skip it.

Irizarry says, “If there is something Roger, Jeff and I agree on is that this debate is not constructive. As Rob Kass suggests it’s time to move on to pragmatism.” (Roger Peng and Jeff Leek co-run the blog; Rob Kass is a named person in statistics. Top men all.)

Pragmatism is a failed philosophy; as such, it cannot be relied on for anything. It says “use whatever works”, which has a nice sound to it (unlike “data scientist”), until you realize you’ve merely pushed the problem back one level. What does works mean?

No, really. However you form an answer will be philosophical at base. So we cannot escape having to have a philosophy of probability after all. There has to be some definite definition of works, thus also of probability, else the results we provide have no meaning.


Applied statisticians help answer questions with data. How should I design a roulette so my casino makes $? Does this fertilizer increase crop yield?…[skipping many good questions]… To do this we use a variety of techniques that have been successfully applied in the past and that we have mathematically shown to have desirable properties. Some of these tools are frequentist, some of them are Bayesian, some could be argued to be both, and some don’t even use probability. The Casino will do just fine with frequentist statistics, while the baseball team might want to apply a Bayesian approach to avoid overpaying for players that have simply been lucky.

Suppose a frequentist provides an answer to a casino. How does the casino interpret it? They must interpret it somehow. That means having a philosophy of probability. Same thing with the baseball team. Now this philosophy can be flawed, as many are, but it can be flawed in such a way that not much harm is done. That’s why it seems frequentism does not produce much harm for casinos and why the same is true for Bayesian approaches in player pay scales.

It’s even why approaches which “don’t even use probability” might not cause much harm. Incidentally, I’m guessing by “don’t use probability” Irizarry means some mathematical algorithm that spits out answers to given inputs, a comment I based on his use of “mathematically…desirable properties”. But this is to mistake mathematics for or as probability. Probability is not math.

There exists a branch of mathematics called probability (really measure theory) which is treated like any other branch; theorems proved, papers written, etc. But it isn’t really probability. The math only becomes probability when it’s applied to questions. At that point an interpretation, i.e. a philosophy, is needed. And it’s just as well to get the right one.

Why is frequentism the wrong interpretation? Because to say we can’t know any probability until the trump of doom sounds—a point in time which is theoretically infinitely far away—is silly. Why is Bayes the wrong interpretation? Well, it isn’t; not completely. The subjective version is.

Frequency can and should inform probability. Given the evidence, or premises, “In this box are six green interocitors and four red ones. One interocitor will be pulled from the box” the probability of “A green interocitor will be pulled” is 6/10. Even though there are no such things as interocitors. Hence no real relative frequencies.

Subjectivity is dangerous in probability. A subjective Bayesian could, relying on the theory, say, “I ate a bad burrito. The probability of pulling a green interocitor is 97.121151%”. How could you prove him wrong?

Answer: you cannot. Not if subjectivism is right. You cannot say his guess doesn’t “work”, because why? Because there are no interocitors. You can never do an “experiment.” Ah, but why would you want to? Experiments only work with observables, which are the backbone of science. But who said probability only had to be used in science? Well, many people do say it, at least by implication. That’s wrong, though.

The mistake is not only to improperly conflate mathematics with probability, but to confuse probability models with reality. We need be especially wary of the popular fallacy of assuming the parameters of probability models are reality (hence the endless consternation over “priors”). Although one should, as Irizarry insists, be flexible with the method one uses, we should always strive to get the right interpretation.

What’s the name of this correct way? Well, it doesn’t really have one. Logic, I suppose, à la Laplace, Keynes, Jaynes, Stove, etc. I’ve used this in the past, but come to think it’s limiting. Maybe the best name is probability as argument.


© 2015 William M. Briggs
