William M. Briggs

Statistician to the Stars!


Improper Language About Priors

A Christmas distribution of posteriors.

Suppose you decided (almost surely by some ad hoc rule) that the uncertainty in some thing (call it y) is best quantified by a normal distribution with central parameter θ and spread 1. Never mind how any of this comes about. What is the value of θ? Nobody knows.

Before we go further, the proper answer to that question almost always should be: why should I care? After all, our stated goal was to understand the uncertainty in y, not θ. Besides, θ can never be observed; but y can. How much effort should we spend on something which is beside the point?

If you answered “oodles”, you might consider statistics as a profession. If you thought “some” was right, stick around.

Way it works is that data is gathered (old y) which is then used to say things, not about new y, but about θ. Turns out Bayes’s theorem requires an initial guess of the values of θ. The guess is called “a prior” (distribution): the language that is used to describe it is the main subject today.

Some insist that the prior express “total ignorance”. What can that mean? I have a proposition (call it Q) about which I tell you nothing (other than it’s a proposition!). What is the probability Q is true? Well, given your total ignorance, there is none. You can’t, consistent with the evidence, say to yourself anything like, “Q has got to be contingent, therefore the probability Q is true is greater than 0 and less than 1.” Who said Q had to be contingent? You are in a state of “total ignorance” about Q: no probability exists.

The same is not true, and cannot be true, of θ. Our evidence positively tells us that “θ is a central parameter for a normal distribution.” There is a load of rich information in that proposition. We know lots about “normals”; how they give 0 probability to any observable, how they give non-zero probability to any interval on the real line, that θ expresses the central point and must be finite, and so on. It is thus impossible—as in impossible—for us to claim ignorance.

This makes another oft-heard phrase “non-informative prior” odd. I believe it originated from nervous once-frequentist recent converts to Bayesian theory. Frequentists hated (and still hate) the idea that priors could influence the outcome of an analysis (themselves forgetting nearly the whole of frequentist theory is ad hoc) and fresh Bayesians were anxious to show that priors weren’t especially important. Indeed, it can even be proved that in the face of rich and abundant information, the importance of the prior fades to nothing.
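To see that fading numerically, here is a minimal Python sketch. It assumes, purely for illustration, a conjugate normal prior on θ; the prior means of -10 and 10, the prior variance of 1, the "true" θ of 3 and the simulated data are all made up, not taken from anything above. With the spread of y fixed at 1, the posterior mean of θ has a closed form, and the gap between two analysts who start from these two priors works out to 20/(1+n).

```python
import random

def posterior(prior_mean, prior_var, ys, data_var=1.0):
    """Conjugate update for the centre of a normal with known spread.

    Returns (posterior mean, posterior variance) of theta.
    """
    n = len(ys)
    post_precision = 1.0 / prior_var + n / data_var
    post_mean = (prior_mean / prior_var + sum(ys) / data_var) / post_precision
    return post_mean, 1.0 / post_precision

random.seed(0)
true_theta = 3.0  # hypothetical value, used only to simulate "old y"
ys = [random.gauss(true_theta, 1.0) for _ in range(10_000)]

def prior_gap(n):
    """Distance between the posterior means under two very different priors."""
    mean_a, _ = posterior(-10.0, 1.0, ys[:n])
    mean_b, _ = posterior(10.0, 1.0, ys[:n])
    return abs(mean_a - mean_b)

for n in (1, 10, 100, 10_000):
    print(n, prior_gap(n))
```

With one observation the two analysts disagree by 10 units; with ten thousand they disagree by about 0.002. When n is small the choice of prior is doing real work, which is the situation the next paragraph takes up.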

Information, alas, isn’t always abundant; thus the prior can matter. And why shouldn’t it? More on that question in a moment. But because some think the prior should matter as little as possible, it is often suggested that the prior on θ should be “uniform”. That means that, just like the normal itself, the probability θ takes any value is zero, the probability of any interval is non-zero; it also means that all intervals of the same length have the same probability.

But this doesn’t work. Actually, that’s a gross understatement. It fails spectacularly. The uniform prior on θ is no longer a probability, proved easily by taking the integral of the density (which equals 1) over the real line, which turns out to be infinite. That kind of maneuver sends out what philosopher David Stove called “distress signals.” Those who want uniform priors are aware that they are injecting non-probability into a probability problem, but still want to retain “non-informativity” so they call the result an “improper prior”. “Prior” makes it sound like it’s a probability, but “improper” acknowledges it isn’t. (Those who use improper priors justify them saying that the resultant posteriors are often, but not always, “proper” probabilities. Interestingly, “improper” priors in standard regression give identical results, though of course interpreted differently, to classical frequentism.)
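The two claims in that paragraph can be written out in a line each. This is a sketch under assumptions not stated above: n independent old observations y_1, ..., y_n, each normal with centre θ and spread 1, and \bar{y} for their average (both symbols are introduced here only for illustration).

```latex
\[
\int_{-\infty}^{\infty} 1 \, d\theta
  \;=\; \lim_{a \to \infty} \int_{-a}^{a} 1 \, d\theta
  \;=\; \lim_{a \to \infty} 2a
  \;=\; \infty ,
\]
\[
p(\theta \mid y_1, \dots, y_n)
  \;\propto\; 1 \cdot \prod_{i=1}^{n} e^{-\frac{1}{2}(y_i - \theta)^2}
  \;\propto\; e^{-\frac{n}{2}(\theta - \bar{y})^2} .
\]
```

The first line shows the flat "density" cannot be normalized. The second shows that, once at least one observation is in hand, the product is proportional to a normal density in θ with centre \bar{y} and variance 1/n, which does integrate to 1. That is the sense in which the posterior can be "proper" even though the prior is not.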

Why shouldn’t the prior be allowed to inform our uncertainty in θ (and eventually in y)? The only answer I can see is the one I already gave: residual frequentist guilt. It seems obvious that whatever definite, positive information we have about θ should be used, the results following naturally.

What definite information do we have? Well, some of that has been given. But all that ignores whatever evidence we have about the problem at hand. Why are we using normal distributions in the first place? If we’re using past y to inform about θ, that means we know something about the measurement process. Shouldn’t information like that be included? Yes.

Suppose the unit in which we’re measuring y is inches. Then suppose you have to communicate your findings to a colleague in France, a country which strangely prefers centimeters. Turns out that if you assumed, like the normal, θ was infinitely precise (i.e. continuous), the two answers—inches or centimeters—would give different probabilities to different intervals (suitably back-transformed). How can it be that merely changing units of measurement changes probabilities! Well, that’s a good question. It’s usually answered with a blizzard of mathematics (example), none of which allays the fears of Bayesian critics.

The problem is that we have ignored information. The yardstick we used is not infinitely precise, but has, like any measuring device anywhere, limitations. The best—as in best—that we can do is to measure y from some finite set. Suppose this is to the nearest 1/16 of an inch. That means we can’t (or rather must) differentiate between 0″ and something less than 1/16″; it further means that we have some upper and lower limit. However we measure, the only possible results will fall into some finite set in any problem. Suppose this is 0″, 1/16″, 2/16″,…, 192/16″ (one foot; the exact units or set constituents do not matter, only that they exist does).

Well, 0″ = 0 cm, and 1/16″ = 0.15875 cm, and so on. Thus if the information was that any of the set were possible (in our next measurement of y), the probability of (say) 111/16″ is exactly the same as the probability of 17.6213 cm (we’ll always have to limit the number of digits in any number; thus 1/3 might in practice equal 0.333333 where the 3’s eventually end). And so on.
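The relabeling can be checked with exact arithmetic. The Python sketch below assumes one particular premise, that each of the 193 values in the set is equally possible; the names prob_inches, prob_cm and prob_at_most are invented for the example. Converting to centimeters only renames the outcomes, so every point and every interval keeps its probability.

```python
from fractions import Fraction

CM_PER_INCH = Fraction(254, 100)  # 1 inch is exactly 2.54 cm

# The measurement set from the text: 0", 1/16", 2/16", ..., 192/16"
inches = [Fraction(k, 16) for k in range(193)]

# Assumed premise for this sketch: every value in the set is equally possible.
prob_inches = {x: Fraction(1, len(inches)) for x in inches}

# Changing units relabels each outcome; the probability attached to it is untouched.
prob_cm = {x * CM_PER_INCH: p for x, p in prob_inches.items()}

def prob_at_most(dist, limit):
    """Probability that the measurement is no larger than `limit`."""
    return sum(p for x, p in dist.items() if x <= limit)

x_in = Fraction(111, 16)
x_cm = x_in * CM_PER_INCH           # 17.62125 cm exactly
print(float(x_cm))
print(prob_inches[x_in] == prob_cm[x_cm])
print(prob_at_most(prob_inches, 1) == prob_at_most(prob_cm, CM_PER_INCH))
```

Because the conversion is one-to-one on a finite set, no density or Jacobian enters the picture. A different premise about the set would give different numbers, but they would still agree across units.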

It turns out that if you take full account of the information, the units of measurement won’t matter! Notice also that the “prior” in this case was deduced from the available evidence; there was nothing ad hoc or “non-informative” about it at all (of course, other premises are possible leading to other deductions).

But then, with this information, we’re not really dealing with normal distributions. No parameters either: there is no θ in this setup. Ah. Is that so bad? We’ve given up the mathematical convenience continuity brings, but our reward is accuracy—and we never wander away from probability. We can still quantify the uncertainty in future (not yet seen) values of y given the old observations and knowledge of the measurement process, albeit at the price of more complicated formulas (which seem more complicated than they really are, at least because fewer people have worked on problems like these).

And we don’t really have to give up on continuity as an approximation. Here’s how it should work. First solve the problem at hand—quantifying the uncertainty in new (unseen) values of y given old ones and all the other premises available. I mean, calculate that exact answer. It will have some mathematical form, part of which will be dependent on the size or nature of the measurement process. Then let the number of elements in our measurement set grow “large”, i.e. take that formula to the limit (as recommended by, inter alia, Jaynes). Useful approximations will result. It will even be true that in some cases, the old stand-by, continuous-from-the-start answers will be rediscovered.
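Here is a toy illustration of taking a finite answer to the limit, sketched in Python. It is only a stand-in: it assumes every point of the measurement set is equally probable (one possible premise, not the full predictive formula described above), and the function name interval_prob and the interval from 3 to 6 inches are made up for the example.

```python
from fractions import Fraction

def interval_prob(lo, hi, steps_per_inch, length=12):
    """P(lo <= y <= hi) when y is equally likely to be any grid point in [0, length].

    The grid is 0, 1/steps_per_inch, 2/steps_per_inch, ..., length (in inches).
    """
    n_points = length * steps_per_inch + 1
    inside = sum(
        1 for k in range(n_points)
        if lo <= Fraction(k, steps_per_inch) <= hi
    )
    return Fraction(inside, n_points)

# Exact finite answers first, then watch them as the grid is refined.
for steps in (16, 160, 1600):
    p = interval_prob(3, 6, steps)
    print(steps, p, float(p))
```

At sixteenths of an inch the exact answer is 49/193. As the grid is refined the answers approach 3/12, the value a continuous uniform on one foot would give, so in this toy case the continuous answer appears as an approximation to the finite one rather than as the starting point.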

Best of all, we’ll have no distracting talk of “priors” and (parameter) “posteriors”. And we wouldn’t have to pretend continuous distributions (like the normal) are probabilities.

I Also Declare The Bayesian vs. Frequentist Debate Over For Data Scientists

LSMFT! What's the probability Santa prefers Luckies?

I stole the title, adding the word “also”, from an article by Rafael Irizarry at Simply Stats (tweeted by Diego Kuonen).

First, brush clearing. Data scientists. Sounds like galloping bureaucratic title inflation has struck again, no? Skip it.

Irizarry says, “If there is something Roger, Jeff and I agree on is that this debate is not constructive. As Rob Kass suggests it’s time to move on to pragmatism.” (Roger Peng and Jeff Leek co-run the blog; Rob Kass is a named person in statistics. Top men all.)

Pragmatism is a failed philosophy; as such, it cannot be relied on for anything. It says “use whatever works”, which has a nice sound to it (unlike “data scientist”), until you realize you’ve merely pushed the problem back one level. What does works mean?

No, really. However you form an answer will be philosophical at base. So we cannot escape having to have a philosophy of probability after all. There has to be some definite definition of works, thus also of probability, else the results we provide have no meaning.

Irizarry:

Applied statisticians help answer questions with data. How should I design a roulette so my casino makes $? Does this fertilizer increase crop yield?…[skipping many good questions]… To do this we use a variety of techniques that have been successfully applied in the past and that we have mathematically shown to have desirable properties. Some of these tools are frequentist, some of them are Bayesian, some could be argued to be both, and some don’t even use probability. The Casino will do just fine with frequentist statistics, while the baseball team might want to apply a Bayesian approach to avoid overpaying for players that have simply been lucky.

Suppose a frequentist provides an answer to a casino. How does the casino interpret it? They must interpret it somehow. That means having a philosophy of probability. Same thing with the baseball team. Now this philosophy can be flawed, as many are, but it can be flawed in such a way that not much harm is done. That’s why it seems frequentism does not produce much harm for casinos and why the same is true for Bayesian approaches in player pay scales.

It’s even why approaches which “don’t even use probability” might not cause much harm. Incidentally, I’m guessing by “don’t use probability” Irizarry means some mathematical algorithm that spits out answers to given inputs, a comment I based on his use of “mathematically…desirable properties”. But this is to mistake mathematics for or as probability. Probability is not math.

There exists a branch of mathematics called probability (really measure theory) which is treated like any other branch; theorems proved, papers written, etc. But it isn’t really probability. The math only becomes probability when it’s applied to questions. At that point an interpretation, i.e. a philosophy, is needed. And it’s just as well to get the right one.

Why is frequentism the wrong interpretation? Because to say we can’t know any probability until the trump of doom sounds—a point in time which is theoretically infinitely far away—is silly. Why is Bayes the wrong interpretation? Well, it isn’t; not completely. The subjective version is.

Frequency can and should inform probability. Given the evidence, or premises, “In this box are six green interocitors and four red ones. One interocitor will be pulled from the box” the probability of “A green interocitor will be pulled” is 6/10. Even though there are no such things as interocitors. Hence no real relative frequencies.
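The deduction takes one line of arithmetic. A small Python sketch, with the dictionary of premises and the function name prob_color invented for the example:

```python
from fractions import Fraction

def prob_color(box, color):
    """Probability deduced from the premises alone: the counts in the box, one item pulled."""
    return Fraction(box[color], sum(box.values()))

premises = {"green": 6, "red": 4}
print(prob_color(premises, "green"))  # 3/5, i.e. 6/10
```

Nothing was sampled and no interocitor needs to exist; change the premises and the probability changes with them.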

Subjectivity is dangerous in probability. A subjective Bayesian could, relying on the theory, say, “I ate a bad burrito. The probability of pulling a green interocitor is 97.121151%”. How could you prove him wrong?

Answer: you cannot. Not if subjectivism is right. You cannot say his guess doesn’t “work”, because why? Because there are no interocitors. You can never do an “experiment.” Ah, but why would you want to? Experiments only work with observables, which are the backbone of science. But who said probability only had to be used in science? Well, many people do say it, at least by implication. That’s wrong, though.

The mistake is not only to improperly conflate mathematics with probability, but to confuse probability models with reality. We need be especially wary of the popular fallacy of assuming the parameters of probability models are reality (hence the endless consternation over “priors”). Although one should, as Irizarry insists, be flexible with the method one uses, we should always strive to get the right interpretation.

What’s the name of this correct way? Well, it doesn’t really have one. Logic, I suppose, à la Laplace, Keynes, Jaynes, Stove, etc. I’ve used this in the past, but come to think it’s limiting. Maybe the best name is probability as argument.

Summary Against Modern Thought: Nothing Is Predicated Univocally Of God & Other Things

This may be proved in three ways. The first...

See the first post in this series for an explanation and guide of our tour of Summa Contra Gentiles. All posts are under the category SAMT.

Perhaps a simple way of summarizing this chapter is that (what is not surprising) God is unlike anything else, and our language, properly used, necessarily must reflect this. I’m trying out the new footnote style here, too.

Chapter 32: That Nothing Is Predicated Univocally Of God And Other Things

1 FROM the above it is clear that nothing can be predicated univocally[i] of God and other things. For an effect which does not receive the same form specifically as that whereby the agent acts, cannot receive in a univocal sense the name derived from that form: for the sun and the heat generated from the sun are not called hot univocally. Now the forms of things whereof God is cause do not attain to the species of the divine virtue, since they receive severally and particularly that which is in God simply and universally.[1] It is evident therefore that nothing can be said univocally of God and other things.[ii]

[i] Univocally: having only one unambiguous meaning. Aristotle: “A man and an ox are both animal, and these are univocally so named, inasmuch as not only the name, but also the definition, is the same in both cases: for if a man should state in what sense each is an animal, the statement in the one case would be identical with that in the other.”

This is contrasted with equivocally; e.g. “he’s tall” and “that’s a tall order” use tall equivocally. Don’t argue with me, this article is an argument!

[ii] It’s obvious the sun and the heat we feel from it, though both are hot, are not the same thing. We must use hot equivocally. As we’ll see below, the things that we can say about God we can’t say about things which aren’t God.

Here is St Thomas on the same subject in Summa Theologica (emphasis mine):

…In the same way, as said in the preceding article, all perfections existing in creatures divided and multiplied, pre-exist in God unitedly. Thus when any term expressing perfection is applied to a creature, it signifies that perfection distinct in idea from other perfections; as, for instance, by the term “wise” applied to man, we signify some perfection distinct from a man’s essence, and distinct from his power and existence, and from all similar things; whereas when we apply it to God, we do not mean to signify anything distinct from His essence, or power, or existence. Thus also this term “wise” applied to man in some degree circumscribes and comprehends the thing signified; whereas this is not the case when it is applied to God; but it leaves the thing signified as incomprehended, and as exceeding the signification of the name. Hence it is evident that this term “wise” is not applied in the same way to God and to man. The same rule applies to other terms. Hence no name is predicated univocally of God and of creatures.


4 Again. That which is predicated univocally of several things is more simple than either of them, at least in our way of understanding. Now nothing can be more simple than God, either in reality or in our way of understanding. Therefore nothing is predicated univocally of God and other things.[iii]

[iii] Don’t forget simple is a technical term here, meaning not made of parts, with no potential, etc. Whatever we can discover to say univocally about God can’t be said of anything else in that same way.


5 Further. Whatever is predicated univocally of several things belongs by participation to each of the things of which it is predicated: for the species is said to participate the genus, and the individual the species. But nothing is said of God by participation, since whatever is participated is confined to the mode of a participated thing, and thus is possessed partially and not according to every mode of perfection. It follows therefore that nothing is predicated univocally of God and other things.[iv]

[iv] A gross simplification, or perhaps a good parlor game: see if you can name a thing predicated of God which cannot be predicated of anything else. For example, “God is being-itself” which predicates the being-itselfness of God. But other things certainly have being (you do, I do), though nothing else is being-itself. Inasmuch as we can comprehend this term (which is a long way short of the fullness of it), we can only say this univocally of God and of nothing else.


6 [THIS ARGUMENT MAY BE SKIPPED] Again. That which is predicated of several things according to priority and posteriority is certainly not predicated of them univocally, since that which comes first is included in the definition of what follows, for instance substance in the definition of accident considered as a being. If therefore we were to say being univocally of substance and accident, it would follow that substance also should enter into the definition of being as predicated of substance: which is clearly impossible.[v] Now nothing is predicated in the same order of God and other things, but according to priority and posteriority: since all predicates of God are essential, for He is called being because He is very essence, and good because He is goodness itself: whereas predicates are applied to others by participation; thus Socrates is said to be a man, not as though he were humanity itself, but as a subject of humanity. Therefore it is impossible for any thing to be predicated univocally of God and other things.[vi]

[v] It would be circular.

[vi] A last emphasis. If we can speak univocally of God, whatever term we happen to use it would then be impossible to use it univocally of any other thing.


———————————————————————————

[1] Chs. xxviii., xxix.
[2] Ch. xxiii.
[3] Chs. xxiv., xxv.
[4] Ch. xxiii.

The Reimaginings Of Exodus And Noah Inspire Moviemakers—And You!

John Nolte doesn’t like Ridley Scott’s Exodus: Gods and Kings.

…Scott makes a fool of himself. DeMille used the ancient biblical tale to tell a universal story about human liberty. Where Charlton Heston’s Moses demanded that Ramses “Let my people go!”, Bale’s Moses — and this is no joke — demands that Ramses pay his slaves a living wage and make them — again, no joke — citizens. DeMille’s Moses was a liberator. Scott’s Moses is a community organizer agitating for executive action on the minimum wage and amnesty.

Now Darren Aronofsky’s Noah was just as tuned to our post-Christian ever-so-delicate sensibilities. In his review of that movie, Nolte writes:

The sins of idolatry, blasphemy, dishonesty, adultery, and treating your parents with disrespect have absolutely nothing to do with why God wants to flood the earth and start over. “Noah” isn’t even interested in Jesus’ commandment to love one another as you love yourself.

Aronofsky’s “God” is only disappointed, disgusted and ready to be rid of man for the single sin of hurting the environment. And hurting the environment is defined in the film as strip-mining, eating animal flesh, hunting, and even plucking a flower no bigger than a dime because “it’s pretty.”

…Every glimpse of those God will wipe out shows these “sinners” exploiting Mother Nature. They butcher meat, tear live animals to pieces, hunt, mine, and cut trees. According to Aronofsky, that is all these people are guilty of and that is enough to justify the coming biblical genocide.

If “God” can destroy the world for the mortal sin of pressing pretty flowers, what sort of hell, with your enormous carbon “footprint”, do you think awaits you, you climate denier, you?

But Noah made money, and early forecasts are that Exodus will do at least okay. The Bible is in and profitable. It is a rich source of moral stories that has barely begun to be mined for movie material. So, our job today is to help filmmakers with suggestions of which tales we’d like to see.

Sanitized tales, of course. We don’t want to offend anybody. Feelings must not be hurt. We—us blessed folks living on the right side of history—know more than our unenlightened ancestors. Obviously, we cannot present the Bible as it is written and must tweak it a bit. Here are some of my ideas (in which I exercise artistic license). What are yours?

Sodom and Gomorrah: The Pride And The Glory

Two mysterious strangers approach the desert twin cities of Sodom and Gomorrah. A festival—a parade, wine, food, circus acts—is in progress. The innkeeper Lot sees them approach. He curses under his breath, “This is all I need.” Flashback: Lot kicking out tenants who could not pay, kicking in the teeth of a man who will later be seen in the festival parade, and kicking around cats and small children.

The strangers ask for a room. Lot, seeing they are rich, boots other guests to make space. He offers his daughter and wife to the aggrieved as recompense, throwing all into the street. Seeing this, the incensed crowd reacts and tries to bring Lot to justice. The man with the kicked teeth shouts, “You inhospitable brute!”

Lot escapes into the night through a hidden back door. Meanwhile, the strangers, who have been at the wine, realize what has happened. Turns out they are members of the Yogic Guardians, mythic creatures from the planet Cron charged with meting out punishment throughout the seven hundred worlds! The strangers go into Down Dog Supreme in order to blast Lot to smithereens. But as they are inebriated, their aim is off and the towns are destroyed instead.

Lot sees this and his heart softens. He weeps and vows to mend his ways. We end with an elderly Lot (who now runs a bathhouse in Nineveh) sharing a lovely sunset on his porch with his beloved goat Franklin.

The Sharing (A Lifetime movie)

After a long day community organizing, a weary Jesus wants nothing but to rest and eat. But his followers bring a woman to him saying, “This woman has said, ‘All lives matter’.” Jesus knew they were testing him and said, “That’s racist.” He directed that the woman be re-educated by trained experts.

His followers were many and Jesus was worried there would not be enough food. But an apostle reminded that good man that all his followers were members of the Green Organic Cooperative and that they had plenty and were willing to share. “That’s a miracle,” said Jesus.

Later, all sit around a single organic candle and sing songs celebrating how nice it was to be nice to one another.

Confession: I claim no originality. Minus the cinematic details, these plots are directly from modern theologians.

© 2014 William M. Briggs
