William M. Briggs

Statistician to the Stars!

Free Will and God’s Providence: An Introduction—Guest Post by Bob Kurland

Two robots have a very brief discussion of free will.


Starting this Friday and continuing for the next three weeks, a series on Free Will by our old friend Bob Kurland.

“Of course I believe in free will. I have no choice.” Isaac Bashevis Singer, The Salon Interview, 1987.

Introduction

In the past decades we have seen outrage after outrage committed by religious terrorists, gangs, members of drug cartels—the murder of Christian, Jewish and Arab children, the rape of Christian nuns, the trafficking of women and girls, “knockout” beatings of whites, the teaching of hate. I’ll not discuss how these villains attempt to justify their acts on the basis of religion or deprived socio-economic status. Rather, I want to address the following questions:

  • Do the terrorists commit these deeds freely, as we understand Free Will?
  • If they do act freely, how is it possible, for us as Christians, to forgive them?
  • Whether or not their actions be free, is there a way to see this evil as compatible with or proceeding from God’s Foreknowledge?

“‘Free Will’ is a philosophical term of art for a particular sort of capacity of rational agents to choose a course of action from among various alternatives.” Timothy O’Connor, Stanford Encyclopedia of Philosophy

Timothy O’Connor’s definition above of Free Will sets the stage for stating the problem, although one important adverb has been omitted from his definition: rather than “agents to choose” I would write “agents to choose freely”. One might also add “after due deliberation and reflection”.

What are the objections to Free Will as thus defined?

  • First, if the universe is deterministic, playing out according to set physical laws, there can be only one future and there can be no free choices. Moreover, special relativity suggests that there is a particular past, present and future for each reference frame, so that all is encompassed in a block universe: everything is laid out before us, independent of our actions.
  • Second, if our genes determine our personality, character and intelligence, how can there be different ways for us to choose, and thus to be free?
  • Third, if, on the other hand, we are formed by economic and social circumstances that mold our morals and attitudes, what ethical options are then open?

Or, if as some would have it, the randomness of quantum mechanics governs our decisions, how can this randomness be reconciled with conscious deliberation and free choice? Where is the entity within us, the soul, that can act freely?

The Problem of Divine Providence

“By His providence God protects and governs all things which he has made…even those things which are yet to come into existence through the free action of creatures.” First Vatican Council, Dei Filius

As a Catholic, I believe in a transcendent, omnipotent and omniscient God. “Omnipotent” means God can do what He wills, all that does not contradict the laws of logic or of necessary truths—God can’t and wouldn’t make 2+2=5 or a four-sided triangle. “Omniscient” means God knows what has happened, is happening, and will happen. God is eternal, so that past, present and future (in any frame of reference) are in His ken. (Not all theologians agree with this last dictum.) Such is Divine Providence, God’s omnipotence and His omniscience, including His Foreknowledge, the knowledge of the future.

Thus God knows whether I will do my daily prayer, sleep late and miss Mass tomorrow, get angry at the slow driver in front of me next week… But if God does know all my actions, past and future, where is my freedom to do differently? Supposedly God has given me free will to choose, but if He knows what I will choose, am I truly free, even if I think I am? That is the problem of reconciling Free Will and God’s Foreknowledge.

The Problem of Moral Responsibility

If we do not have free will, can we be held morally responsible for evil acts? Insanity, the lack of knowledge of the moral implications of our acts, is a defense against a charge of murder, and claims of “irresistible impulse” have been used to deny guilt.

The Catechism of the Catholic Church (1857) states, “Mortal sin is sin whose object is grave matter and which is also committed with full knowledge and deliberate consent.” The phrase “deliberate consent” implies free-will consent, so we might ask whether addiction, genetic predisposition, or socio-psychological factors could be considered mitigating factors. The theologians are not in total agreement, but some do propose that addiction and other conditions negating free will mitigate the gravity of sin. Or, as the Jets proclaim to Officer Krupke in West Side Story, “It’s just our upbringing that gets us out of hand”.

Some References

  • Robert Kane, “Reflections on Free Will and Determinism”.
  • John Martin Fischer, Robert Kane, Derk Pereboom, and Manuel Vargas, Four Views on Free Will.
  • Timothy O’Connor, “Free Will”, Stanford Encyclopedia of Philosophy.
  • Eliezer J. Sternberg, My Brain Made Me Do It: The Rise of Neuroscience and the Threat to Moral Responsibility.
  • Alfred Freddoso, “Molinism”.
  • St. Augustine, On Grace and Free Will.
  • The Information Philosopher, “The Block Universe of Special Relativity”.

Pascal’s Pensées, A Tour: IV

Since our walk through Summa Contra Gentiles is going so well, why not do the same with Pascal’s sketchbook on what we can now call Thinking Thursdays. We’ll use the Dutton Edition, freely available at Project Gutenberg. (I’m removing that edition’s footnotes.)

Previous post.

9 When we wish to correct with advantage, and to show another that he errs, we must notice from what side he views the matter, for on that side it is usually true, and admit that truth to him, but reveal to him the side on which it is false. He is satisfied with that, for he sees that he was not mistaken, and that he only failed to see all sides. Now, no one is offended at not seeing everything; but one does not like to be mistaken, and that perhaps arises from the fact that man naturally cannot see everything, and that naturally he cannot err in the side he looks at, since the perceptions of our senses are always true.1

1Good advice for perpetual arguers like yours truly, who sometimes forget it in the joy of battle, because what’s wrong with most everyday arguments is false or incomplete premises. Pascal’s dictum doesn’t work for every argument, of course. Some are so wrong, or the desire to believe a false conclusion so strong, that nothing short of divine grace will free a person from error. Do you really think that, unaided, you’ll bring a chiropractor or, say, an academic feminist to see what’s wrong with her stance? These folks are so far from the promised land that it isn’t even on their maps. This doesn’t make Pascal wrong, but winning people to the Truth is hard, brutal labor.


10 People are generally better persuaded by the reasons which they have themselves discovered than by those which have come into the mind of others.2

2This is why we have some sympathy whenever an educator rediscovers the truism that kids better grasp ideas they work out for themselves. Yet any kindly disposition we have is blown away the second the educator insists that all learning follow this regimen. If the child (or adult) isn’t provided with a solid foundation (a memorized one), almost no learning can follow. The student won’t have the tools to work things out and he won’t know when something is right and when it is wrong. And we are back to the first point.


11 All great amusements are dangerous to the Christian life; but among all those which the world has invented there is none more to be feared than the theatre. It is a representation of the passions so natural and so delicate that it excites them and gives birth to them in our hearts, and, above all, to that of love, principally when it is represented as very chaste and virtuous. For the more innocent it appears to innocent souls, the more they are likely to be touched by it. Its violence pleases our self-love, which immediately forms a desire to produce the same effects which are seen so well represented; and, at the same time, we make ourselves a conscience founded on the propriety of the feelings which we see there, by which the fear of pure souls is removed, since they imagine that it cannot hurt their purity to love with a love which seems to them so reasonable.

So we depart from the theatre with our heart so filled with all the beauty and tenderness of love, the soul and the mind so persuaded of its innocence, that we are quite ready to receive its first impressions, or rather to seek an opportunity of awakening them in the heart of another, in order that we may receive the same pleasures and the same sacrifices which we have seen so well represented in the theatre.3

3Initially, Pascal’s complaint reads like those of the nuns who schooled my father and who said the worst thing kids did was to chew gum or get out of line in the halls. We are at the point where we long for the good old days when love was “represented as very chaste and virtuous.” So far into the wilderness are we that not one in five hundred could today share his fear of theater. It passes all imagination to see oneself marching down Broadway with a “Down with Tartuffe!” sign.

On the other hand, swap in television and movies for theatre and we see Pascal nails it. In particular, those of certain sexual natures who choose to violate natural law are portrayed as extraordinarily loving, wholly sympathetic creatures, superior to the rest of us (don’t think so? Read this). The situations into which these protagonists are thrust are so ludicrous that it would take an audience with a heart of stone not to root for them. No consequences are ever seen and anything that smacks of reality is expunged. Viewers are discouraged from thinking and tricked, manipulated into feeling, only feeling.

Art has always been recognized as dangerous.


Improper Language About Priors

A Christmas distribution of posteriors.


Suppose you decided (almost surely by some ad hoc rule) that the uncertainty in some thing (call it y) is best quantified by a normal distribution with central parameter θ and spread 1. Never mind how any of this comes about. What is the value of θ? Nobody knows.

Before we go further, the proper answer to that question almost always should be: why should I care? After all, our stated goal was to understand the uncertainty in y, not θ. Besides, θ can never be observed; but y can. How much effort should we spend on something which is beside the point?

If you answered “oodles”, you might consider statistics as a profession. If you thought “some” was right, stick around.

Way it works is that data is gathered (old y) which is then used to say things, not about new y, but about θ. Turns out Bayes’s theorem requires an initial guess of the values of θ. The guess is called “a prior” (distribution): the language that is used to describe it is the main subject today.
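To make that step concrete, here is a minimal sketch (my toy construction, not anything from the post) of the conjugate normal case: old y’s plus a normal prior guess for θ yield, via Bayes’s theorem, a normal posterior for θ.

```python
def normal_posterior(ys, prior_mean, prior_var, data_var=1.0):
    """Conjugate update: prior theta ~ N(prior_mean, prior_var), data
    y_i ~ N(theta, data_var). Returns the posterior mean and variance of theta."""
    n = len(ys)
    post_var = 1.0 / (1.0 / prior_var + n / data_var)
    post_mean = post_var * (prior_mean / prior_var + sum(ys) / data_var)
    return post_mean, post_var

# Three old y's and a vague prior guess centered at 0:
old_y = [1.2, 0.8, 1.0]
m, v = normal_posterior(old_y, prior_mean=0.0, prior_var=10.0)
print(m, v)  # posterior mean pulled close to the sample mean of 1.0
```

Note this says things about θ, not about any new y, which is exactly the complaint made above.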

Some insist that the prior express “total ignorance”. What can that mean? I have a proposition (call it Q) about which I tell you nothing (other than that it’s a proposition!). What is the probability Q is true? Well, given your total ignorance, there is none. You can’t, consistent with the evidence, say to yourself anything like, “Q has got to be contingent, therefore the probability Q is true is greater than 0 and less than 1.” Who said Q had to be contingent? You are in a state of “total ignorance” about Q: no probability exists.

The same is not true, and cannot be true, of θ. Our evidence positively tells us that “θ is a central parameter for a normal distribution.” There is a load of rich information in that proposition. We know lots about “normals”; how they give 0 probability to any observable, how they give non-zero probability to any interval on the real line, that θ expresses the central point and must be finite, and so on. It is thus impossible—as in impossible—for us to claim ignorance.

This makes another oft-heard phrase “non-informative prior” odd. I believe it originated from nervous once-frequentist recent converts to Bayesian theory. Frequentists hated (and still hate) the idea that priors could influence the outcome of an analysis (themselves forgetting nearly the whole of frequentist theory is ad hoc) and fresh Bayesians were anxious to show that priors weren’t especially important. Indeed, it can even be proved that in the face of rich and abundant information, the importance of the prior fades to nothing.
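That fading can be checked numerically. A minimal sketch, assuming the same toy conjugate-normal setup with known spread 1 (my construction, not the author’s): two priors centered 100 units apart give nearly identical posterior means once the data are abundant.

```python
def post_mean(ys, prior_mean, prior_var, data_var=1.0):
    """Posterior mean for theta in the conjugate normal-normal model."""
    n = len(ys)
    post_var = 1.0 / (1.0 / prior_var + n / data_var)
    return post_var * (prior_mean / prior_var + sum(ys) / data_var)

ys = [1.0] * 1000  # abundant data, all equal to 1 for simplicity

# Two wildly different priors, centered 100 units apart:
a = post_mean(ys, prior_mean=-50.0, prior_var=1.0)
b = post_mean(ys, prior_mean=50.0, prior_var=1.0)
print(a, b)  # both very close to 1: the prior has faded
```

With only a handful of observations the same two priors would disagree badly, which is the point of the next paragraph.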

Information, alas, isn’t always abundant, thus the prior can matter. And why shouldn’t it? More on that question in a moment. But because some think the prior should matter as little as possible, it is often suggested that the prior on θ should be “uniform”. That means that, just as with the normal itself, the probability θ takes any particular value is zero and the probability of any interval is non-zero; it also means that all intervals of the same length have the same probability.

But this doesn’t work. Actually, that’s a gross understatement. It fails spectacularly. The uniform prior on θ is not a probability at all, proved easily by integrating its constant density over the real line, which turns out to be infinite rather than 1. That kind of maneuver sends out what philosopher David Stove called “distress signals.” Those who want uniform priors are aware that they are injecting non-probability into a probability problem, but still want to retain “non-informativity”, so they call the result an “improper prior”. “Prior” makes it sound like it’s a probability, but “improper” acknowledges it isn’t. (Those who use improper priors justify them by saying that the resultant posteriors are often, but not always, “proper” probabilities. Interestingly, “improper” priors in standard regression give results identical to classical frequentism, though of course interpreted differently.)
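The failure is elementary to verify. A sketch (mine): whatever constant density c > 0 you pick, its integral over (−L, L) is 2Lc, which exceeds 1 and then grows without bound as L grows, so no constant yields a total probability of 1.

```python
# A "uniform prior" over the whole real line would need a constant density
# c > 0, but the integral of c over (-L, L) is 2*L*c, which grows without
# bound as L does: no choice of c gives a total probability of 1.
def total_mass(c, L):
    return 2 * L * c

c = 1e-9  # however small the constant...
for L in [1e6, 1e8, 1e10, 1e12]:
    print(L, total_mass(c, L))  # ...the mass eventually exceeds 1 and keeps growing
```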

Why shouldn’t the prior be allowed to inform our uncertainty in θ (and eventually in y)? The only answer I can see is the one I already gave: residual frequentist guilt. It seems obvious that whatever definite, positive information we have about θ should be used, the results following naturally.

What definite information do we have? Well, some of that has been given. But all that ignores whatever evidence we have about the problem at hand. Why are we using normal distributions in the first place? If we’re using past y to inform about θ, that means we know something about the measurement process. Shouldn’t information like that be included? Yes.

Suppose the unit in which we’re measuring y is inches. Then suppose you have to communicate your findings to a colleague in France, a country which strangely prefers centimeters. It turns out that if you assumed, as with the normal, that θ was infinitely precise (i.e. continuous), the two answers, inches or centimeters, would give different probabilities to different intervals (suitably back-transformed). How can it be that merely changing units of measurement changes probabilities? Well, that’s a good question. It’s usually answered with a blizzard of mathematics, none of which allays the fears of Bayesian critics.

The problem is that we have ignored information. The yardstick we used is not infinitely precise, but has, like any measuring device anywhere, limitations. The best—as in best—that we can do is to measure y from some finite set. Suppose this is to the nearest 1/16 of an inch. That means we can’t differentiate between 0″ and anything less than 1/16″; it further means that we have some upper and lower limit. However we measure, the only possible results will fall into some finite set in any problem. Suppose this is 0″, 1/16″, 2/16″,…, 192/16″ (one foot; the exact units or set constituents do not matter, only that they exist).

Well, 0″ = 0 cm, and 1/16″ = 0.15875 cm, and so on. Thus if the information was that any of the set were possible (in our next measurement of y), the probability of (say) 111/16″ is exactly the same as the probability of 17.6213 cm (we’ll always have to limit the number of digits in any number; thus 1/3 might in practice equal 0.333333 where the 3’s eventually end). And so on.
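A small sketch of this invariance, using exact fractions (my illustration; the uniform assignment is purely for definiteness, any assignment on the set works the same way):

```python
from fractions import Fraction

# The measurement set: 0", 1/16", 2/16", ..., 192/16" (one foot).
inches = [Fraction(k, 16) for k in range(193)]

# Any probability assignment on the finite set will do; uniform here:
p = {x: Fraction(1, len(inches)) for x in inches}

# Relabel every point in centimeters (1" = 2.54 cm exactly); the
# probabilities ride along unchanged:
p_cm = {x * Fraction(254, 100): prob for x, prob in p.items()}

x = Fraction(111, 16)  # 111/16"
print(p[x] == p_cm[x * Fraction(254, 100)])  # True: same point, same probability
```

Changing units merely relabels the elements of the finite set, so no interval probability can shift.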

It turns out that if you take full account of the information, the units of measurement won’t matter! Notice also that the “prior” in this case was deduced from the available evidence; there was nothing ad hoc or “non-informative” about it at all (of course, other premises are possible leading to other deductions).

But then, with this information, we’re not really dealing with normal distributions. No parameters either: there is no θ in this setup. Ah. Is that so bad? We’ve given up the mathematical convenience continuity brings, but our reward is accuracy—and we never wander away from probability. We can still quantify the uncertainty in future (not yet seen) values of y given the old observations and knowledge of the measurement process, albeit at the price of more complicated formulas (which seem more complicated than they really are, at least because fewer people have worked on problems like these).

And we don’t really have to give up on continuity as an approximation. Here’s how it should work. First solve the problem at hand—quantifying the uncertainty in new (unseen) values of y given old ones and all the other premises available. I mean, calculate that exact answer. It will have some mathematical form, part of which will be dependent on the size or nature of the measurement process. Then let the number of elements in our measurement set grow “large”, i.e. take that formula to the limit (as recommended by, inter alia, Jaynes). Useful approximations will result. It will even be true that in some cases, the old stand-by, continuous-from-the-start answers will be rediscovered.
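One way to see that limiting process in action (my toy sketch, not the author’s exact formula): assign probability masses proportional to the normal density over a finite grid, compute an interval’s probability, and watch it approach the continuous answer as the grid spacing shrinks.

```python
import math

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def discrete_interval_prob(a, b, h, lo=-12.0, hi=12.0):
    """Probability of (a, b) when mass at each grid point (spacing h) is
    proportional to the standard normal density there."""
    pts = [lo + i * h for i in range(int((hi - lo) / h) + 1)]
    w = [normal_pdf(x) for x in pts]
    total = sum(w)
    return sum(wi for x, wi in zip(pts, w) if a < x < b) / total

exact = normal_cdf(1) - normal_cdf(-1)  # continuous answer, about 0.6827
for h in [0.5, 0.1, 0.01]:
    print(h, discrete_interval_prob(-1, 1, h))  # approaches the exact value
```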

Best of all, we’ll have no distracting talk of “priors” and (parameter) “posteriors”. And we wouldn’t have to pretend continuous distributions (like the normal) are probabilities.

I Also Declare The Bayesian vs. Frequentist Debate Over For Data Scientists

LSMFT! What's the probability Santa prefers Luckies?


I stole the title, adding the word “also”, from an article by Rafael Irizarry at Simply Stats (tweeted by Diego Kuonen).

First, brush clearing. Data scientists. Sounds like galloping bureaucratic title inflation has struck again, no? Skip it.

Irizarry says, “If there is something Roger, Jeff and I agree on is that this debate is not constructive. As Rob Kass suggests it’s time to move on to pragmatism.” (Roger Peng and Jeff Leek co-run the blog; Rob Kass is a big name in statistics. Top men all.)

Pragmatism is a failed philosophy; as such, it cannot be relied on for anything. It says “use whatever works”, which has a nice sound to it (unlike “data scientist”), until you realize you’ve merely pushed the problem back one level. What does works mean?

No, really. However you form an answer will be philosophical at base. So we cannot escape having to have a philosophy of probability after all. There has to be some definite definition of works, thus also of probability, else the results we provide have no meaning.

Irizarry:

Applied statisticians help answer questions with data. How should I design a roulette so my casino makes $? Does this fertilizer increase crop yield?…[skipping many good questions]… To do this we use a variety of techniques that have been successfully applied in the past and that we have mathematically shown to have desirable properties. Some of these tools are frequentist, some of them are Bayesian, some could be argued to be both, and some don’t even use probability. The Casino will do just fine with frequentist statistics, while the baseball team might want to apply a Bayesian approach to avoid overpaying for players that have simply been lucky.

Suppose a frequentist provides an answer to a casino. How does the casino interpret it? They must interpret it somehow. That means having a philosophy of probability. Same thing with the baseball team. Now this philosophy can be flawed, as many are, but it can be flawed in such a way that not much harm is done. That’s why it seems frequentism does not produce much harm for casinos and why the same is true for Bayesian approaches in player pay scales.

It’s even why approaches which “don’t even use probability” might not cause much harm. Incidentally, I’m guessing by “don’t use probability” Irizarry means some mathematical algorithm that spits out answers to given inputs, a guess based on his use of “mathematically…desirable properties”. But this is to mistake mathematics for probability. Probability is not math.

There exists a branch of mathematics called probability (really measure theory) which is treated like any other branch: theorems proved, papers written, etc. But it isn’t really probability. The math only becomes probability when it’s applied to questions. At that point an interpretation, i.e. a philosophy, is needed. And it’s just as well to get the right one.

Why is frequentism the wrong interpretation? Because to say we can’t know any probability until the trump of doom sounds—a point in time which is theoretically infinitely far away—is silly. Why is Bayes the wrong interpretation? Well, it isn’t; not completely. The subjective version is.

Frequency can and should inform probability. Given the evidence, or premises, “In this box are six green interocitors and four red ones. One interocitor will be pulled from the box” the probability of “A green interocitor will be pulled” is 6/10. Even though there are no such things as interocitors. Hence no real relative frequencies.
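The deduction is nothing but counting on the stated premises, as a two-line sketch shows (no real interocitors required):

```python
from fractions import Fraction

# Premises: six green interocitors and four red ones in a box; one is pulled.
box = ["green"] * 6 + ["red"] * 4

# The probability is deduced from the premises by counting, nothing more:
p_green = Fraction(box.count("green"), len(box))
print(p_green)  # 3/5, i.e. 6/10
```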

Subjectivity is dangerous in probability. A subjective Bayesian could, relying on the theory, say, “I ate a bad burrito. The probability of pulling a green interocitor is 97.121151%”. How could you prove him wrong?

Answer: you cannot. Not if subjectivism is right. You cannot say his guess doesn’t “work”, because why? Because there are no interocitors. You can never do an “experiment.” Ah, but why would you want to? Experiments only work with observables, which are the backbone of science. But who said probability only had to be used in science? Well, many people do say it, at least by implication. That’s wrong, though.

The mistake is not only to improperly conflate mathematics with probability, but to confuse probability models with reality. We need to be especially wary of the popular fallacy of assuming the parameters of probability models are reality (hence the endless consternation over “priors”). Although one should, as Irizarry insists, be flexible with the method one uses, we should always strive to get the right interpretation.

What’s the name of this correct way? Well, it doesn’t really have one. Logic, I suppose, à la Laplace, Keynes, Jaynes, Stove, etc. I’ve used this in the past, but come to think it’s limiting. Maybe the best name is probability as argument.


© 2014 William M. Briggs
