*Because I am jet-lagged (and fundamentally lazy) here is a classic post, modified to include a homework assignment. Don’t be like me: do the work. This post contains some fundamental derivations from Bayes’s Theorem which are of great interest, and not just in “proving” the existence of God (I use scare quotes because if your “proof” only has God as “probable” then it is not a proof). Anxious readers may start at Notation. This originally appeared November, 2012.*

Because my ignorance is vast—and, as many readers will argue, probably increasing—I only just yesterday heard of Swinburne’s “P-inductive” and “C-inductive” arguments for the existence of God. I had heard of Richard Swinburne, but I thought the name historical, a contemporary of John Calvin or perhaps Jonathan Edwards. Well, it sounds Eighteenth Century Protestanty, doesn’t it?

Boy, was I off base.

Anyway, let’s make up for lost time. This post is merely a sketch. I want to explore his probability language only; we won’t today use his arguments to prove God’s existence.

David Stove (in *The Rationality of Induction*) showed us how easy it is to skip gleefully down the wrong path by misusing words: misuse causes misunderstanding, and misunderstanding becomes the seed of “paradoxes” and philosophical “problems”, such as the supposed “problem of induction.” This is an academic belief which says that induction is not “justified” or rational. Yet, of course, every academic who proclaims that induction is irrational also uses it.

We all do, and must use, induction. Now, an inductive argument is an invalid one. That is, the conclusion of an inductive argument does not follow infallibly from its premises. Example (from Hume): (premise) all the many flames I have seen before have been hot, therefore (conclusion) *this* flame will be hot. Induction is also why tenured faculty do not leap from tall buildings and expect to live. It *might* be that this flame won’t be hot and that holding our palm over it will turn it to ice instead of cooking our flesh. But there isn’t anybody who would make a bet on this “might.” Also, we can’t say we have *deduced* this flame will be hot, so the argument is invalid, but we have induced it will be.

**Notation**

So let’s be careful with Swinburne’s language and talk of P and C probability arguments. His notation: all new arguments begin with some conclusion or hypothesis we want to judge. Call this H. All arguments, including probability arguments, are conditional on premises or evidence. Thus all arguments *must* have a list of premises, or evidence. Call new evidence E and old evidence or knowledge K.

We write the probability H is true given K as Pr( H | K ). Let K = “We have a two sided object which will be tossed once, one side of which is labeled ‘head’, and when tossed only one side can show.” Let H = “A ‘head’ shows.” Then Pr( H | K ) = 1/2. Note very carefully that *we* constructed H. We could have let H = “Barbara Streisand votes for Romney” and then Pr( H | K ) = unknown.

We know K but suppose we learn E = “A magician will toss the object, and when he tosses, the object always comes up ‘head’.” Then Pr( H | E & K ) = 1: it is certain a ‘head’ will show. Notice that we added E to the list of premises, i.e. to K. These probabilities are connected through Bayes’s theorem, which goes like this:

Pr( H | E & K ) = Pr( E | H & K ) * Pr( H | K ) / Pr( E | K )

(see the classic posts link for the statistics teaching articles for why this is so). We already know Pr( H | K ) = 1/2 and that Pr( H | E & K ) = 1. But what of Pr( E | H & K ) and Pr( E | K )? Well, Pr( E | H & K ) / Pr( E | K ) must equal 2 (because 2 * 1/2 = 1).

Let’s be careful: Pr( E | H & K ) says given we know K *and* we *know* a ‘head’ showed, what is the probability that a magician with this talent exists? I haven’t any idea what this number would be except to say it can’t be 1, because if it were it means we *deduced* E is true given (only) H & K, which of course we can’t. Now Pr( E | K ) says, given K, what is the probability a magician with this talent exists? Again, I don’t know, except to say that this probability is between 0 and 1.

But we *can* deduce that Pr( E | H & K ) = 2 * Pr( E | K ). That is, after learning a ‘head’ did show (and knowing K), we know that E is twice as likely as before we knew the outcome.

This is a great example, because it shows that not all probability is precisely quantifiable, but that we can sometimes bound it. And that it is futile to search for answers where none can be found. Even in situations that seem trivially easy, as this one.
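A small numerical sketch of this bound may help. It assumes a few hypothetical values for Pr( E | K ), since the argument fixes none of them, and shows that whatever the value, Bayes's theorem forces Pr( E | H & K ) to be twice as large. It follows that Pr( E | K ) can be at most 1/2, and strictly less than 1/2 if Pr( E | H & K ) cannot equal 1.

```python
from fractions import Fraction

PRIOR_H = Fraction(1, 2)    # Pr(H | K) from the two-sided object
POSTERIOR_H = Fraction(1)   # Pr(H | E & K): the magician always tosses 'head'

def likelihood_of_e_given_h(pr_e_given_k):
    """Solve Bayes's theorem for Pr(E | H & K), given a value for Pr(E | K)."""
    return POSTERIOR_H * pr_e_given_k / PRIOR_H

# Hypothetical values of Pr(E | K); the argument itself fixes none of them.
for q in (Fraction(1, 100), Fraction(1, 10), Fraction(2, 5)):
    like = likelihood_of_e_given_h(q)
    # Plugging back into Bayes's theorem recovers the posterior of 1.
    posterior = like * PRIOR_H / q
    print(f"Pr(E|K) = {q}, Pr(E|H&K) = {like}, Pr(H|E&K) = {posterior}")
```

Exact fractions are used so that the factor of 2 is not blurred by floating-point rounding.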

We need one more piece of notation. H is our conclusion or hypothesis. Saying “H” is shorthand for “H is true”. We need a way to say “H is false.” How about “~H”? Eh, not perfect, kinda ugly, but it’s common enough. Thus, Pr(H | K) + Pr(~H | K) = 1, right?

**P-probability arguments**

Swinburne (modified) says a P-probability argument makes its conclusion more probable than not. That means,

Pr( H | E & K ) > 1/2

and the inequality is strict. This implies that a “good” P-probability argument starts with Pr( H | K ) <= 1/2. In other words, adding evidence E pushes us from “maybe not” into the realm of “maybe so.” The example we used above is a good P-probability argument.

**C-probability arguments**

Swinburne (modified) says a C-probability argument raises the probability of its conclusion. That means

Pr( H | E & K ) > Pr( H | K )

where again the inequality is strict. A C-probability argument is thus weaker than a P-one, because it could be that

1/2 > Pr( H | E & K ) > Pr( H | K ).

A C-probability argument can sometimes be a P-probability argument, but only when Pr( H | E & K ) > 1/2. Our example is also a good C-probability argument.
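The two definitions can be written as a pair of tiny checks. This is only a sketch: the function names are made up for illustration, and the second set of numbers is a hypothetical case, not one from Swinburne.

```python
from fractions import Fraction

def is_c_argument(prior, posterior):
    """C-probability: the evidence strictly raises the probability of H."""
    return posterior > prior

def is_p_argument(posterior):
    """P-probability: H is strictly more probable than not given E & K."""
    return posterior > Fraction(1, 2)

# The magician example: Pr(H | K) = 1/2 and Pr(H | E & K) = 1.
print(is_c_argument(Fraction(1, 2), Fraction(1)), is_p_argument(Fraction(1)))

# A hypothetical argument that is C but not P: 1/2 > 3/10 > 1/10.
print(is_c_argument(Fraction(1, 10), Fraction(3, 10)), is_p_argument(Fraction(3, 10)))
```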

**Increasing probabilities**

The point of both these (unnecessary?) complications is to examine arguments which increase the probability of H after adding evidence E. When does that happen? Look again at Reverend Bayes, rearranged:

Pr( H | E & K ) / Pr( H | K ) = Pr( E | H & K ) / Pr( E | K ).

Now if

Pr( H | E & K ) > Pr( H | K ),

then

Pr( H | E & K ) / Pr( H | K ) > 1,

and thus

Pr( E | H & K ) / Pr( E | K ) > 1.

In other words, Pr( H | E & K ) > Pr( H | K ) when the evidence (or premise) E becomes more likely if we assume H is true. And this is true for both P-probability and C-probability arguments. We already saw that this was true for our example.
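To see the rearranged equation at work with numbers, here is a sketch using a made-up joint table for H and E given K. The four probabilities are assumptions chosen only so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical joint probabilities Pr(H-state & E-state | K); they sum to 1.
joint = {
    ("H", "E"): Fraction(3, 10),
    ("H", "~E"): Fraction(1, 10),
    ("~H", "E"): Fraction(2, 10),
    ("~H", "~E"): Fraction(4, 10),
}

pr_h = joint[("H", "E")] + joint[("H", "~E")]   # Pr(H | K)
pr_e = joint[("H", "E")] + joint[("~H", "E")]   # Pr(E | K)
pr_h_given_e = joint[("H", "E")] / pr_e         # Pr(H | E & K)
pr_e_given_h = joint[("H", "E")] / pr_h         # Pr(E | H & K)

hypothesis_ratio = pr_h_given_e / pr_h          # left side of the equation
evidence_ratio = pr_e_given_h / pr_e            # right side of the equation
print(hypothesis_ratio, evidence_ratio)
```

Both ratios come out the same and exceed 1, so in this made-up table E raises the probability of H and H raises the probability of E by the same factor.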

There is one more result to be had. It isn’t as easy to get to. Ready?

Two rules of probability let us write:

Pr(E | K) = Pr(E & H | K) + Pr(E & ~H | K) [total probability]

and that

Pr(E & H | K) = Pr(E | H & K) * Pr(H | K); [conditional prob.]

also

Pr(E & ~H | K) = Pr(E | ~H & K) * Pr(~H | K).

We already proved that if H is more probable after knowing E, then Pr( E | H & K ) / Pr( E | K ) > 1. Thus substituting for Pr(E | K) and multiplying both sides by this substitution, we have

Pr( E | H & K ) > Pr(E | H & K) * Pr(H | K) + Pr(E | ~H & K) * Pr(~H | K).

From which

Pr( E | H & K ) – Pr(E | H & K) * Pr(H | K) > Pr(E | ~H & K) * Pr(~H | K).

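Gathering terms, using (1 – Pr(H | K)) = Pr(~H | K), and dividing both sides by Pr(~H | K) (which must be positive, or H was already certain and nothing could raise it), we conclude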

Pr( E | H & K ) > Pr(E | ~H & K),

and that this must hold when the probability of H increases when adding E. In other words, the probability that E is true given we assume H is true is larger than the probability E is true given we assume H is false. In other, other words, the probability the evidence is true is larger assuming H true than assuming H false. For our example this says it is more likely a tricky magician exists if we see a ‘head’ than if we do not see it, which I hope makes sense.
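The result can be checked by brute force. The sketch below assumes a hypothetical grid of values for Pr(H | K), Pr(E | H & K) and Pr(E | ~H & K), all strictly between 0 and 1, and confirms that on that grid the probability of H goes up exactly when E is more likely under H than under ~H. It also checks the converse, which the derivation above did not spell out.

```python
from fractions import Fraction
from itertools import product

def posterior(prior, like_h, like_not_h):
    """Pr(H | E & K) via Bayes's theorem with the total-probability expansion."""
    pr_e = like_h * prior + like_not_h * (1 - prior)
    return like_h * prior / pr_e

# Hypothetical grid of values strictly between 0 and 1.
grid = [Fraction(i, 10) for i in range(1, 10)]

def equivalence_holds():
    """True if 'posterior > prior' matches 'like_h > like_not_h' on the whole grid."""
    for prior, like_h, like_not_h in product(grid, repeat=3):
        raised = posterior(prior, like_h, like_not_h) > prior
        if raised != (like_h > like_not_h):
            return False
    return True

print(equivalence_holds())
```

A grid check is no proof, of course; the algebra above is the proof. The code is only a way to catch a slip in the algebra.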

**Conclusion**

There is no conclusion yet, except for these mathematical consequences (well known to readers of Jaynes, for example). We’ll have to return to these results when we look at Swinburne’s implementation of them. But not today.

**Homework**

There is no understanding without doing, which should be the only reason for homework. Your task: propose a K, H, and one or more E such that the probability of H increases on E. Bonus points for non-quantifiable K, H, and E (let him that readeth understand).

1 November 2012 at 11:20 pm

I met Prof. Swinburne once at a cocktail reception whilst a student of Philosophy and Theology at Oxford. He was retired at that point and aged but ever the gentleman and (after drinks and years) sharper than me (at my youngest and most sober). We don’t produce many like him anymore.

2 November 2012 at 1:29 am

Self awareness of one’s increasing ignorance comes with wisdom.

I think that English is the best language. I also think that it is quite poor.

2 November 2012 at 2:42 am

“Barbara Streisand votes for Romney” sounds like K. Has always, is now and will always be gushing effusives regarding Romney. Or if not K what possible knowledge could we have that would have any “!” with H.

Psyto… nad.

It might have been interesting to have another psychotic as President but I think Obama will serve a whole second term.

2 November 2012 at 11:08 am

Flames that are produced while some material is burning are hot because the burning process releases lots of energy. And lots of flames you see are the result of that burning process, and that is the reason they are not. That is not induction, but prediction.

It is of course quite possible that you can make something cold that looks like a flame, or even find a burning process that generates very little energy, and will therefore have quite cool flames.

Nevertheless, being burned is not a popular experience so there is no harm in being careful around flames.

2 November 2012 at 12:03 pm

If probability is regarded as the degree of belief or confirmation, basically, this post explains the following intuitions using probability calculus.

An extra argument/evidence/information (E) will have no influence on, or strengthen, or weaken the degree of your belief in a certain argument/event (H).

E is called a C-probability argument if it strengthens your belief in H, and a P-probability argument if it increases your degree of belief in H to more than 50%.

Loosely speaking, in this case, we know E as positive evidence for H. In a way, E and H are *positively* correlated. So, given you know H is true, your degree of belief in E will increase, and vice versa. Given ~H (non H) is true, it will weaken your belief in E. Furthermore, your degree of belief in E given H is true is larger than the one given H is not true.

My 2 cents.

2 November 2012 at 12:07 pm

“that is the reason they are not. ” >> that is the reason they are hot.

Sorry.

2 November 2012 at 12:24 pm

@JH

In case of the magician, as soon as he throws his first tails, it is clear he’s not a magician. Belief is then not lessened, but shattered. And the interesting bit is that, before the throw of the tail, belief was apparently increasing.

Regarding Pr(E| H & K), it is possible to compute this. Nobody would believe you to be a magician, unless you have already thrown quite a lot of heads and no tails at all. Like nobody would believe you if you said a coin with two sides, one of which would be head. If the magician was to supply his own coin, and that coin would have two sides with ‘head’, then it would be very easy to prove one was a magician.

So instead of having just a magician, you have a magician with a verified number of heads and no tails on record. The chance of such a person being a magician is easy to calculate, (1/2) to the power of N.

And it would be a good idea to check the coin after each throw, if the other side still is not ‘head’.

2 November 2012 at 1:08 pm

Sander van der Wal,

Indeed, the negative evidence that the magician throws a tail will destroy the belief that he/she is a magician as defined in H by Briggs. In other words, the additional negative evidence has weakened the degree of belief in H (before the throw of tails) to zero.

2 November 2012 at 3:18 pm

Once proposition E is introduced.

P(H|K)>1/2

P(H|K)=P(H|K&E)+P(H|K&~E)

P(H|K&~E)=1/2

I am not sure it really matters if P(E) is small but it is distracting me from following the rest of the argument.

2 November 2012 at 3:23 pm

Doug M,

Your second line is a mistake. You *can* say:

Pr(H|K) = Pr(H & E | K) + Pr(H & ~E | K),

but

Pr(H|K) = Pr(H | K & E) + Pr(H | K & ~E)

is false in general.
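A quick counterexample, sketched with made-up joint probabilities (assumed only for illustration), shows the difference between the two sums:

```python
from fractions import Fraction

# Hypothetical joint probabilities given K, chosen only for illustration.
pr_h_and_e = Fraction(3, 10)
pr_h_and_not_e = Fraction(1, 10)
pr_not_h_and_e = Fraction(2, 10)
pr_not_h_and_not_e = Fraction(4, 10)
assert pr_h_and_e + pr_h_and_not_e + pr_not_h_and_e + pr_not_h_and_not_e == 1

pr_h = pr_h_and_e + pr_h_and_not_e               # total probability: correct
pr_e = pr_h_and_e + pr_not_h_and_e               # Pr(E | K)
pr_h_given_e = pr_h_and_e / pr_e                 # Pr(H | K & E)
pr_h_given_not_e = pr_h_and_not_e / (1 - pr_e)   # Pr(H | K & ~E)

print(pr_h)                              # 2/5
print(pr_h_given_e + pr_h_given_not_e)   # 4/5, not equal to Pr(H | K)
```

The conditionals have to be weighted by Pr(E | K) and Pr(~E | K) before they are added, which is just the first sum again.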

2 November 2012 at 4:58 pm

Correction noted.

3 November 2012 at 2:18 am

And when Jesus had finished eating he turned to his disciples and said, “Brethren, I should have passed on the refried beans. Because I’ve got a fart the size of Palestine about to explode out of my ass.’–Jesus Christ, as told to Kirk Cameron.

3 November 2012 at 8:38 am

All,

I left L.W. Dickel’s comment intact, only to show a representative example of what happens when you marry self-esteem and ignorance. Coincidentally, this is the slogan of our modern-day educational system.

4 November 2012 at 3:52 am

“When you marry self esteem and ignorance.”

Isn’t that how religions are formed?

Only you might add: deluded, retarded and prone to believing in superstitious bullshit.

Cheers!

3 February 2013 at 12:46 pm

Briggs,

My God, how could you think that increasing ignorance isn’t a very good sign. It means at the least that you are paying attention.

I have very little use for people whose ignorance is not increasing – it usually means that they simply aren’t trying.

3 February 2013 at 4:27 pm

define “god”

3 February 2013 at 5:42 pm

Not an example, more of a refutation.

Instead of assuming that the guy throwing a coin is a magician, let’s assume we think he is a fraud throwing a coin with heads only. Now, each throw that turns up a head increases the chance he is a fraud, in exactly the same way as in the belief that the guy is a magician.

But you cannot be both a magician and a fraud at the same time, these hypotheses are mutually exclusive.

But from the measurements, it is impossible to say which of the two hypotheses is correct, as the same measurement makes both hypotheses equally more likely.

4 February 2013 at 11:44 am

Early on you say that because the coin has two sides and only one will appear that P(H|K)=1/2. However, this probability does not follow from the premise K. We simply don’t have enough information to give P(H|K) a value.

4 February 2013 at 12:30 pm

Charles, why not? If you have a two sided object of which only one side can show when tossed, then each side has a 1/2 chance of showing. If I add more evidence I can change my 1/2 probability to be more refined. Isn’t this the principle of indifference in action?

4 February 2013 at 1:06 pm

Would this work as an answer to the homework?

With knowledge K “I have a shuffled deck of cards, and one card will be drawn from the top.”

The conclusion H is “The card drawn will be an Ace.”

I add evidence E “This is a pinochle deck.”

If an Ace comes up on the first draw, my probability that I’m dealing with a pinochle deck has increased greater than if an Ace did not come up.

I can’t necessarily quantify it since regular decks have 52 cards, but pinochle decks have 48 face cards & aces, and I don’t know how many cards there are in the deck.

Does this make sense?

4 February 2013 at 1:12 pm

@Sander, I don’t think that’s what’s being demonstrated by your example.

He may be a fraud, but it doesn’t matter. Logically, the probability of a magician being the one that tossed the coin given that we saw a heads is greater than the probability of a magician being the one that tossed the coin if a tails comes up. We’re not trying to find the correctness of the tosser being a magician. We simply know that if a heads comes up, we are more confident in the person being a magician than if a tails came up.

4 February 2013 at 2:49 pm

If you can think of only one hypothesis that explains the reason why this guy is only throwing heads, then you will become more confident.

But as soon as you think of a second hypothesis, you need to add that second hypothesis to the equation. You get a combined hypothesis, the guy is either a magician, or a fraud, but not both. Now the guy throws heads, and we become more confident that the guy is either a magician, or a fraud, but not both.

And then we think of a new hypothesis, the guy is just very lucky and his first tail will come after 10*2000 throws.

Which means we are now testing a new hypothesis again, magician, fraud or very lucky. And after a new heads we are even more confident of the guy being a magician, a fraud or just very lucky.

And a fourth hypothesis, the guy has now run out of his luck and the next throw is tails. The testable hypothesis now is, magician, fraud, lucky or out of luck. And he throws tails, so again we have become more confident in our combined hypothesis, magician, fraud, lucky, or out of luck, because the guy has thrown a tail.

Clearly, exor-ing hypotheses do not work.

If you have two or more hypotheses you must test them separately, and as long as the experiments do not contradict the hypotheses your confidence in all of them will grow, and grow, and grow. But how is that possible if these hypotheses are mutually exclusive, i.e. P(H1) + P(H2) + … + P(Hn) = 1, or 0, the sum being 1 if one of these hypotheses is indeed true, or 0 if none of the proposed hypotheses is true?

Clearly if you sum over all possible hypotheses, one of them is true so the sum is 1, and we do become more confident of the true hypothesis. And if we are lucky and have formulated the true hypothesis among our finite number of hypotheses we will become more confident in that one too if we run enough experiments. But if we have not formulated the true hypothesis, then our confidence will grow in all the hypotheses that have not yet been proven wrong, knowing all the time that at most one of these is true indeed. The only confidence that is growing is that one of them might be true.

Hence the idea that you need to find an experiment that will disprove a hypothesis as quickly as possible. So you inspect the coin after each throw, to see if it is a proper one, with one head and one tails.
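One way to look at the competing hypotheses is to update all of them together. The sketch below assumes three hypotheses that exhaust the possibilities (magician, fraud with a two-headed coin, ordinary fair coin) and hypothetical starting probabilities that are not taken from this thread. After ten heads the magician and fraud probabilities have both grown and are still equal to each other, the fair-coin probability has shrunk, and the total stays at 1.

```python
from fractions import Fraction

# Pr('head' on a toss | hypothesis); the three hypotheses are assumed exhaustive.
PR_HEAD = {"magician": Fraction(1), "fraud": Fraction(1), "fair": Fraction(1, 2)}

def update_on_head(beliefs):
    """One Bayesian update of the hypothesis probabilities after seeing a 'head'."""
    weights = {h: p * PR_HEAD[h] for h, p in beliefs.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical starting probabilities, not taken from the thread.
beliefs = {
    "magician": Fraction(1, 100),
    "fraud": Fraction(1, 100),
    "fair": Fraction(98, 100),
}

for toss in range(1, 11):
    beliefs = update_on_head(beliefs)

for name, p in beliefs.items():
    print(f"{name}: {float(p):.4f}")
print("sum:", sum(beliefs.values()))
```

Heads alone never separate the magician from the fraud, which is why an experiment that can tell them apart, such as inspecting the coin, is needed.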

4 February 2013 at 3:38 pm

Followup to Nate: All we know is the object has two sides and one side will appear. Therefore P(H|K) is between 0 and 1 inclusive. Also P(T|K)=1-P(H|K). That’s all we know. Anything else represents investigator bias.

For instance, we can apply a “minimax” idea: if we select P(T|K)=p, then we are wrong by at most the larger of p and 1-p. The minimum over p of the max(p,1-p) says choose p=0.5.

In practice, this may be a perfectly reasonable way of dealing with our uncertainty, but WMB is discussing a problem of logic. IMHO, he can’t claim Bayesian probability is logical and apply arbitrary criteria for eliminating the uncertainty.

E.g., if we knew the object was a fairly ordinary looking coin, then a Bayesian might reasonably assume P(H|K) = 0.5, or perhaps assume a beta distribution centered on 0.5 for P(H|K). These assumptions are reasonable because we’ve flipped many coins in the past and “know” P(H|K) is approximately 0.5 for a great many coins, so therefore it is likely to be 0.5 for this coin. But we have no such additional information in this problem.

4 February 2013 at 4:08 pm

So basically, @Sander, you are saying that Pr( E | H & K ) > Pr( E | ~H & K ) is true, but not very useful because we only have one possible hypothesis, and only the set of evidence given? What if you can’t easily find evidence that will easily falsify something?

I’m confused as to your terminology. Briggs posted that K is our knowledge (coin with heads n tails), H is our hypothesis (coin shows heads) and E is our evidence. Sure, in a thought experiment we can invent all sorts of E, but aren’t we just trying to improve our credence of E? And when a heads is thrown, we are more confident in E, no matter what the other E could be? But don’t we all become more confident differently? If I can’t think of another reason than that he’s a magician, and can’t find more evidence, I may make a bet after he throws heads 20 times in a row that he’s a magician. But if you think that the coin is weighted and the man is a fraud, or he could be a magician, you might not bet at all, since your confidence that he’s one of the two has gone up, but you need evidence that allows you to distinguish them.

I’m not sure we disagree, I’m just not quite understanding your point – that we may end up overconfident if we only allow one piece of evidence to enter our minds?

4 February 2013 at 4:48 pm

Charles,

So you’re saying that all we know is:

0 <= Pr(H|K) <= 1

0 <= Pr(T|K) <= 1

Pr(H|K) + Pr(T|K) = 1

So I guess my question is:

We don't know anything except that one of these two outcomes may occur. If I don't assign equal probabilities to Pr(H|K) and Pr(T|K), then I'm assigning non-equal probabilities to them, and I don't have any evidence to do that. I have to have a degree of belief in Pr(H|K) and Pr(T|K) – given that I know nothing except that one of two outcomes appears, shouldn't I, based on the evidence, believe in either one equally?

4 February 2013 at 5:14 pm

Nate, you say you should choose either one equally “based on the evidence”. There is no evidence supporting any probability choice. Any value, whether 0.00001, 0.2354, or 0.5, or any other number would represent your bias, not evidence.

The point WMB is making is that all probabilities are dependent on the premises you make. The point I’m making is the premises have to be sufficiently strong to justify assigning numbers to those probabilities.

This is a fundamental problem in arguing Bayesian probability is an extension of ordinary logic. In almost every problem one can think of, there is no amount of empirical evidence that allows one to assign probabilities uniquely (so that every knowledgeable person would assign the same probabilities).

At some point, the practitioner asserts his bias, as you do arguing we should choose probabilities “equally”.

5 February 2013 at 4:50 am

@Nate

I am saying that having a single hypothesis is a bad idea, especially when you are only looking for experimental outcomes which agree with the hypothesis. The equations do show that the chance the hypothesis is true becomes a bit bigger. But what the equation does not show is that the same is true for all other competing hypotheses. You have good reason to be a bit less unconfident about a theory that has not been disproven by more experimental data, but no reason at all to be more confident. This is a glass being almost empty.

5 February 2013 at 1:51 pm

Sander & Charles, thank you for your detailed answers to my questions.

5 February 2013 at 2:38 pm

well, if Dickel can get a personal response from Briggs, so can I.

at the very least, it increases the probability of getting a personal response from Briggs to > 0

5 February 2013 at 2:45 pm

Charles Boncelet,

We do in fact have enough information. The probability is assigned via the “statistical syllogism”. Click on my “Who” and navigate down to the list of papers on assigning probabilities.

anona,

Hi there.

7 February 2013 at 4:47 pm

I’ve read your paper on assigning probabilities (“On the non-arbitrary assignment of probability”). Admittedly, I need to read it again, but …

I find the statistical syllogism argument just as arbitrary as any other (indifference, ignorance, etc). All these mechanisms (you reject) assign equal probability but they are bad, while statistical syllogism assigns the same equal probabilities but it is good.

May I suggest the argument might be more compelling if you could provide examples when statistical syllogism assigns different–and better–probabilities than the methods you reject.

Viewed as a branch of mathematics, probability is agnostic to the assignment of actual values, i.e., any value of p works. The equations are self-consistent.

What makes probability interesting is its connection to real-life experiments. I.e., probability is predictive. It tells us something about what might happen. We can even make strong statements in many experiments (e.g., the various laws of large numbers).

I maintain that we have no basis for assigning a numerical value to P(H|K) absent some physical understanding of the object and/or absent some knowledge of past history of the experiment.

Arbitrarily assigning P(H|K)=0.5 can result in terrible performance, in, say, a data compression system or a drug testing regimen.

Just because we want to assign a value to P(H|K) doesn’t mean we should.

(BTW, I get far less worked up on the “arbitrariness” or “subjectivity” of Bayesian priors than others do. As long as the prior is “reasonable” the system should quickly adapt to data and the resulting posterior probabilities should depend weakly on the prior.)
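A rough sketch of that last point, assuming Beta priors on the chance of a ‘head’ (for which the posterior mean has a simple closed form) and hypothetical data in which 70% of tosses show ‘head’. The three priors and their labels are made up for illustration.

```python
from fractions import Fraction

def posterior_mean(a, b, heads, tosses):
    """Mean of the Beta(a + heads, b + tails) posterior for the chance of 'head'."""
    return Fraction(a + heads, a + b + tosses)

# Hypothetical Beta(a, b) priors and hypothetical data with 70% heads.
priors = {"flat": (1, 1), "centered on 1/2": (5, 5), "leaning tails": (2, 8)}

for tosses in (10, 100, 1000):
    heads = tosses * 7 // 10
    means = {
        name: posterior_mean(a, b, heads, tosses)
        for name, (a, b) in priors.items()
    }
    spread = max(means.values()) - min(means.values())
    rounded = {name: round(float(m), 3) for name, m in means.items()}
    print(tosses, rounded, "spread", round(float(spread), 3))
```

The spread between the posterior means shrinks as the number of tosses grows, so the choice among these priors matters less and less.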

12 February 2013 at 11:45 am

—

Swinburne (modified) says a P-probability argument makes its conclusion more probable than not. That means,

Pr( H | E & K ) > 1/2

and the inequality is strict. This implies that a “good” P-probability argument starts with Pr( H | K ) <= 1/2. [...] Pr( E | H & K ) > Pr(E | ~H & K). This says that the chance of some guy being a magician is bigger if he throws heads than if he throws tails, using a proper coin.

Pr(E | ~H & K) = Pr(E & ~H & K) / Pr(~H & K). So the chance that you are a magician *if* you have thrown tails using a proper coin is equal to the chance you are a magician *after* you have thrown tails using a proper coin, divided by the chance that a proper coin comes up with tails.

By definition you are not a magician if you throw tails, so Pr(E & ~H & K) == zero. Pr(~H & K ) is a proper number, 1/2, and zero divided by 1/2 is zero.

The statement therefore says that the chance you are a magician *if* you throw tails using a proper coin is zero too. Which is again what we would expect.

Let’s throw again. We have more knowledge: K1 = H & K. We have the same hypothesis E, and a new throw H1.

Pr( H1 | E & K1 ) > 1/2

expanding

Pr( H1 | E & H & K ) > 1/2

And indeed, the chance that a magician will throw two heads in a row must be bigger than 1/2.

Pr( E | H1 & K1 ) > Pr(E | ~H1 & K1).

expanding

Pr(E | H1 & H & K) > Pr(E | ~H1 & H & K).

Which becomes

Pr(E | ~H1 & H & K) = Pr(E & ~H1 & H & K) / Pr(~H1 & H & K). So the chance that you are a magician *if* you throw heads first, then tails using a proper coin, is the chance that you are a magician *after* you have thrown heads, then tails, using a proper coin. And that chance is zero too.

Adding more tests doesn’t matter. You only increase the knowledge K, and every new H is as big a hurdle as the first.

Ok, so what about adding the outcome to E: E1 = E & H:

Pr( H1 | E1 & K ) > 1/2

expanding

Pr(H1 | E & H & K) > 1/2

which can be written as

Pr(H1 | E & K1) > 1/2

and we have already shown what happens then.