Because I am jet-lagged (and fundamentally lazy) here is a classic post, modified to include a homework assignment. Don’t be like me: do the work. This post contains some fundamental derivations from Bayes’s Theorem which are of great interest, and not just in “proving” the existence of God (I use scare quotes because if your “proof” only has God as “probable” then it is not a proof). Anxious readers may start at Notation. This originally appeared November, 2012.
Because my ignorance is vast—and, as many readers will argue, probably increasing—I only just yesterday heard of Swinburne’s “P-inductive” and “C-inductive” arguments for the existence of God. I had heard of Richard Swinburne, but I thought the name historical, a contemporary of John Calvin or perhaps Jonathan Edwards. Well, it sounds Eighteenth Century Protestanty, doesn’t it?
Boy, was I off base.
Anyway, let’s make up for lost time. This post is merely a sketch. I want to explore his probability language only; we won’t today use his arguments to prove God’s existence.
David Stove (in The Rationality of Induction) showed us how easy it is to skip gleefully down the wrong path by misusing words. How misuse causes misunderstanding, and misunderstanding becomes the seed of “paradoxes” and philosophical “problems”, such as the supposed “problem of induction.” This is an academic belief which says that induction is not “justified” or rational. Yet, of course, every academic who proclaims that induction is irrational also uses it.
We all do, and must use, induction. Now, an inductive argument is an invalid one. That is, the conclusion of an inductive argument does not follow infallibly from its premises. Example (from Hume): (premise) all the many flames I have seen before have been hot, therefore (conclusion) this flame will be hot. Induction is also why tenured faculty do not leap from tall buildings and expect to live. It might be that this flame won’t be hot and that holding our palm over it will turn it to ice instead of cooking our flesh. But there isn’t anybody who would make a bet on this “might.” Also, we can’t say we have deduced this flame will be hot, so the argument is invalid, but we have induced it will be.
So let’s be careful with Swinburne’s language and talk of P and C probability arguments. His notation: all new arguments begin with some conclusion or hypothesis we want to judge. Call this H. All arguments, including probability arguments, are conditional on premises or evidence. Thus all arguments must have a list of premises, or evidence. Call new evidence E and old evidence or knowledge K.
We write the probability H is true given K as Pr( H | K ). Let K = “We have a two sided object which will be tossed once, one side of which is labeled ‘head’, and when tossed only one side can show.” Let H = “A ‘head’ shows.” Then Pr( H | K ) = 1/2. Note very carefully that we constructed H. We could have let H = “Barbra Streisand votes for Romney” and then Pr( H | K ) = unknown.
We know K but suppose we learn E = “A magician will toss the object, and when he tosses it the object always comes up ‘head’.” Then Pr( H | E & K ) = 1: it is certain a ‘head’ will show. Notice that we added E to the list of premises, i.e. to K. We got to this equation through the use of Bayes’s theorem, which goes like this:
Pr( H | E & K ) = Pr( E | H & K ) * Pr( H | K ) / Pr( E | K )
(see the classic posts link for the statistics teaching articles for why this is so). We already know Pr( H | K ) = 1/2 and that Pr( H | E & K ) = 1. But what of Pr( E | H & K ) and Pr( E | K )? Well, Pr( E | H & K ) / Pr( E | K ) must equal 2 (because 2 * 1/2 = 1).
Let’s be careful: Pr( E | H & K ) says given we know K and we know a ‘head’ showed, what is the probability that a magician with this talent exists? I haven’t any idea what this number would be except to say it can’t be 1, because if it were it means we deduced E is true given (only) H & K, which of course we can’t. Now Pr( E | K ) says, given K, what is the probability a magician with this talent exists? Again, I don’t know, except to say that this probability is between 0 and 1.
But we can deduce that Pr( E | H & K ) = 2 * Pr( E | K ). That is, after learning a ‘head’ did show (and knowing K), we know that E is twice as likely as before we knew the outcome.
This is a great example, because it shows that not all probability is precisely quantifiable, but that we can sometimes bound it. And that it is futile to search for answers where none can be found. Even in situations that seem trivially easy, as this one.
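A short numeric sketch may make the bounding concrete. Nothing above supplies a value for Pr( E | K ), so the candidate values tried below are purely hypothetical; the joint table is built only from Pr( H | K ) = 1/2 and Pr( H | E & K ) = 1. Whatever admissible value is tried, the ratio comes out as 2, and the one constraint that falls out is that Pr( E | K ) cannot exceed 1/2, since otherwise Pr( E | H & K ) would exceed 1.

```python
from fractions import Fraction

def magician_table(p_e):
    """Joint probabilities (given K) over E and H for the coin example.

    p_e is a hypothetical value for Pr(E | K); the post does not supply one.
    """
    p_h = Fraction(1, 2)             # Pr(H | K), from the post
    if not 0 < p_e <= p_h:
        raise ValueError("Pr(E | K) must be in (0, 1/2] for this example")
    return {
        ("E", "H"): p_e,             # Pr(H | E & K) = 1, so all of E lies inside H
        ("E", "~H"): Fraction(0),
        ("~E", "H"): p_h - p_e,
        ("~E", "~H"): 1 - p_h,
    }

def likelihood_ratio(p_e):
    """Return Pr(E | H & K) / Pr(E | K) computed from the joint table."""
    t = magician_table(p_e)
    p_h = t[("E", "H")] + t[("~E", "H")]
    p_e_given_h = t[("E", "H")] / p_h
    p_e_total = t[("E", "H")] + t[("E", "~H")]
    return p_e_given_h / p_e_total

for p in (Fraction(1, 1000), Fraction(1, 10), Fraction(1, 2)):
    print(p, likelihood_ratio(p))    # the ratio is 2 for every admissible p
```

Exact fractions are used instead of floats so that the ratio is exactly 2 and not 1.9999999.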
We need one more piece of notation. H is our conclusion or hypothesis. Saying “H” is shorthand for “H is true”. We need a way to say “H is false.” How about “~H”? Eh, not perfect, kinda ugly, but it’s common enough. Thus, Pr(H | K) + Pr(~H | K) = 1, right?
Swinburne (modified) says a P-probability argument makes its conclusion more probable than not. That means,
Pr( H | E & K ) > 1/2
and the inequality is strict. This implies that a “good” P-probability argument starts with Pr( H | K ) <= 1/2. In other words, adding evidence E pushes us from “maybe not” into the realm of “maybe so.” The example we used above is a good P-probability argument.
Swinburne (modified) says a C-probability argument raises the probability of its conclusion. That means
Pr( H | E & K ) > Pr( H | K )
where again the inequality is strict. A C-probability argument is thus weaker than a P-one, because it could be that
1/2 > Pr( H | E & K ) > Pr( H | K ).
A C-probability argument can sometimes be a P-probability argument, but only when Pr( H | E & K ) > 1/2. Our example is also a good C-probability argument.
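The two definitions are simple enough to write down as a pair of comparisons. The sketch below is only an illustration: the function name classify is made up, and every number other than the 1/2 and 1 of the coin example is hypothetical.

```python
from fractions import Fraction

def classify(prior, posterior):
    """Label an argument by the two definitions in the text.

    prior is Pr(H | K), posterior is Pr(H | E & K).
    """
    return {
        "P": posterior > Fraction(1, 2),   # more probable than not
        "C": posterior > prior,            # probability raised by E
    }

# The coin-and-magician example from the post: 1/2 -> 1.
print(classify(Fraction(1, 2), Fraction(1)))
# Hypothetical numbers: raised by E, but still below 1/2.
print(classify(Fraction(1, 10), Fraction(3, 10)))
# Hypothetical numbers: still above 1/2, yet E lowered the probability.
print(classify(Fraction(9, 10), Fraction(6, 10)))
```

The last case passes the bare P test without E having helped at all, which is presumably why a “good” P-probability argument is taken to start at or below 1/2.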
The point of both these (unnecessary?) complications is to examine arguments which increase the probability of H after adding evidence E. When does that happen? Look again at Reverend Bayes, rearranged:
Pr( H | E & K ) / Pr( H | K ) = Pr( E | H & K ) / Pr( E | K ).
If
Pr( H | E & K ) > Pr( H | K ),
then, dividing both sides by Pr( H | K ),
Pr( H | E & K ) / Pr( H | K ) > 1,
and so, by the equality above,
Pr( E | H & K ) / Pr( E | K ) > 1.
In other words, Pr( H | E & K ) > Pr( H | K ) when the evidence (or premise) E becomes more likely if we assume H is true. And this is true for both P-probability and C-probability arguments. We already saw that this was true for our example.
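For the doubtful, the rearranged form of Bayes can be checked by brute force. The sketch below assumes small integer weights for the four combinations of E and H (the weights are hypothetical and chosen only to keep every denominator nonzero) and confirms that the two ratios are always equal, so one exceeds 1 exactly when the other does.

```python
from fractions import Fraction
from itertools import product

def check_table(eh, e_nh, ne_h, ne_nh):
    """Given integer weights for the four (E, H) cells, return
    (posterior / prior, Pr(E | H & K) / Pr(E | K)) as exact fractions."""
    total = eh + e_nh + ne_h + ne_nh
    p_h = Fraction(eh + ne_h, total)        # Pr(H | K)
    p_e = Fraction(eh + e_nh, total)        # Pr(E | K)
    p_h_given_e = Fraction(eh, eh + e_nh)   # Pr(H | E & K)
    p_e_given_h = Fraction(eh, eh + ne_h)   # Pr(E | H & K)
    return p_h_given_e / p_h, p_e_given_h / p_e

# Hypothetical weights 1..4 in every cell keep all denominators nonzero.
for cells in product(range(1, 5), repeat=4):
    lhs, rhs = check_table(*cells)
    assert lhs == rhs                       # Bayes, rearranged
    assert (lhs > 1) == (rhs > 1)           # so H gains exactly when E does
print("checked", 4 ** 4, "tables")
```

A finite grid is not a proof, of course, but it is a cheap way to catch an algebra slip.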
There is one more result to be had. It isn’t as easy to get to. Ready?
Two rules of probability let us write:
Pr(E | K) = Pr(E & H | K) + Pr(E & ~H | K) [total probability]
Pr(E & H | K) = Pr(E | H & K) * Pr(H | K); [conditional prob.]
Pr(E & ~H | K) = Pr(E | ~H & K) * Pr(~H | K).
We already proved that if H is more probable after knowing E, then Pr( E | H & K ) / Pr( E | K ) > 1. Thus, substituting the three lines above for Pr(E | K) and multiplying both sides by this substitution, we have
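Pr( E | H & K ) > Pr(E | H & K) * Pr(H | K) + Pr(E | ~H & K) * Pr(~H | K).
Moving the first term on the right over to the left gives
Pr( E | H & K ) - Pr(E | H & K) * Pr(H | K) > Pr(E | ~H & K) * Pr(~H | K).
Gathering terms on the left gives Pr( E | H & K ) * (1 - Pr(H | K)), and because (1 - Pr(H | K)) = Pr(~H | K), we can divide both sides by Pr(~H | K), provided it is greater than 0, and conclude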
Pr( E | H & K ) > Pr(E | ~H & K),
and that this must hold when the probability of H increases when adding E. In other words, the probability that E is true given we assume H is true is larger than the probability E is true given we assume H is false. In other, other words, the probability the evidence is true is larger assuming H true than assuming H false. For our example this says it is more likely a tricky magician exists if we see a ‘head’ than if we do not see it, which I hope makes sense.
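This last result can also be checked numerically. The sketch below takes a grid of hypothetical values for Pr( H | K ), Pr( E | H & K ), and Pr( E | ~H & K ), builds Pr( E | K ) from the total probability and conditional rules, and confirms that H gains from E exactly when E is more probable under H than under ~H. The function name and the grid are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def posterior(h, a, b):
    """Pr(H | E & K) from h = Pr(H | K), a = Pr(E | H & K), b = Pr(E | ~H & K)."""
    p_e = a * h + b * (1 - h)     # total probability plus the conditional rule
    return a * h / p_e            # Bayes's theorem

grid = [Fraction(k, 10) for k in range(1, 10)]   # 1/10 .. 9/10, hypothetical values
for h, a, b in product(grid, repeat=3):
    assert (posterior(h, a, b) > h) == (a > b)
print("H gains from E exactly when Pr(E | H & K) > Pr(E | ~H & K)")
```

The grid stays strictly between 0 and 1 on purpose: with Pr( H | K ) equal to 0 or 1 no evidence can move it, and the division in the derivation would not be allowed.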
There is no conclusion yet, except for these mathematical consequences (well known to readers of Jaynes, for example). We’ll have to return to these results when we look at Swinburne’s implementation of them. But not today.
There is no understanding without doing, which should be the only reason for homework. Your task: propose a K, H, and one or more E such that the probability of H increases on E. Bonus points for non-quantifiable K, H, and E (let him that readeth understand).