We’ve done this before in a different form, but it hasn’t stuck; plus, we need this for reference.
Not all probability is quantifiable. The proof is simple: all that must be demonstrated is one probability that cannot be made into a unique number. I’ll do this in a moment, but first it is interesting to recall that in its infancy it wasn’t clear probability could or should be represented numerically. (See Jim Franklin’s terrific The Science of Conjecture: Evidence and Probability Before Pascal.) It is only obvious that probability is numerical when you’ve grown up subsisting solely on a diet of numbers, a condition true of any working scientist.
The problem is that, because some probabilities are numerical, a probability only feels real, scientific and weighty if it is stated numerically. Nobody wants to make a decision based on mere words, not when figures can be used. Result? Over-certainty.
In his 1933 Foundations of the Theory of Probability, Kolmogorov gave axioms which put probability on a firm footing. Problem is, the first axiom said, or seemed to say, “probability is a number”, and so did the second (the third gave a rule for manipulating these numbers). The axioms also require a good dose of mathematical training to comprehend, which contributed to the idea that probabilities are numbers.
Different, not-so-rigorous, but nevertheless appealing axioms were given by Cox in 1961. Their appeal was their statement in plain English and their concordance with common sense. (Cox’s lack of mathematical rigor was subsequently fixed by several authors.¹) These axioms yield two interesting results. First, probability is always conditional. We can never write (in standard symbols) Pr(A), which reads “the probability of proposition A”, but must write Pr(A|B), “the probability of A given the premise or evidence B.” This came as no shock to logicians, who knew that the conclusion of any argument must be “conditioned on” premises or evidence of some kind, even if this evidence is just our intuition. Nor did the result shock anybody else, but only because it is rarely remembered: another victim of treating probability exclusively mathematically.
The second result sounds like numbers. Certainty has probability 1 and falsity probability 0, just as expected. And, given some evidence B, the probability of some A plus the probability that A is false must equal 1: that is, it is certain (given B) that either A or not-A is true. Numbers, but only sort of, because there is no proof that, for every A and B, Pr(A|B) will be a number. And indeed there can be no such proof, as you’ll discover. In short: Cox’s proofs are not constructive.
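For reference, the rules just described can be written compactly. This is only a restatement of the paragraph above, using ¬A for not-A; note that nothing in it guarantees that Pr(A|B) is a number for any particular A and B.

```latex
\Pr(A \mid B) = 1 \ \text{if } B \text{ makes } A \text{ certain}, \qquad
\Pr(A \mid B) = 0 \ \text{if } B \text{ makes } A \text{ false},

\Pr(A \mid B) + \Pr(\neg A \mid B) = 1 .
```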
Cox’s axioms (and their many variants) are known, or better to say, followed by only a minority of physicists and Bayesian statisticians. They are certainly not as popular as Kolmogorov’s, even though following Cox’s trail can and usually does lead to Kolmogorov. Which is to say, to mathematics, i.e. numbers.
Here’s our example of a numberless probability: B = “A few Martians wear hats” and A = “The Martian George wears a hat.” There is no unique Pr(A|B) because there is no unique map from “a few” to any number. The only way to generate a unique number is to modify B. Say B’ = “A few, where ‘a few’ means 10%, Martians wear hats.” Then Pr(A|B’) = 0.1. Or B″ = “A few, where ‘a few’ means never more than one-half…” Then 0 < Pr(A|B″) ≤ 0.5, which is still not a unique number. It should be obvious that B is neither B’ nor B″ (if it isn’t, you’re in deep kimchi). More examples are had by changing “a few” to “some”, “most”, “a bunch”, “not so many” and on and on, none of which lead to a unique probability. This is all true even though, in each case, Pr(A|B) + Pr(not-A|B) = 1. (Why? Because that formula is a tautology.)
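A toy sketch of the Martian example, under a loudly stated assumption: the encoding of premises below (an exact Fraction, a pair of bounds, or None for plain “a few”) and the function names pr_hat and pr_not_hat are hypothetical, invented for illustration. The point it shows is that the answer can be no more precise than the premise, and that the complement rule holds in every case without ever manufacturing a number.

```python
from fractions import Fraction

# Hypothetical encoding (an assumption, not from the post): a premise about
# the proportion of Martians who wear hats is an exact Fraction, a
# (lower, upper) pair of bounds, or None when it fixes no number ("a few").


def pr_hat(premise):
    """Pr(A | premise) for A = "The Martian George wears a hat".

    George is known only to be a Martian, so the answer is exactly what the
    premise says about the proportion: no more precise, no less.
    """
    return premise  # Fraction, bounds, or None: nothing is added to B


def pr_not_hat(premise):
    """Pr(not-A | premise) = 1 - Pr(A | premise), in whatever form Pr(A) has."""
    p = pr_hat(premise)
    if p is None:
        return None  # the tautology still holds, but there is no number
    if isinstance(p, tuple):
        lo, hi = p
        return (1 - hi, 1 - lo)  # complementing bounds swaps them
    return 1 - p


B = None                            # B:   "A few Martians wear hats"
B1 = Fraction(1, 10)                # B':  "a few" means 10%
B2 = (Fraction(0), Fraction(1, 2))  # B'': bounds from "never more than one-half"

for name, prem in [("B", B), ("B'", B1), ("B''", B2)]:
    print(name, pr_hat(prem), pr_not_hat(prem))
```

Only B’ yields a single number, and only because B’ is not B.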
It turns out most probability isn’t quantifiable, because most judgments of uncertainty cannot be and are not stated numerically. “Scientific” propositions, many of which can be quantified, are very rare in human discourse. Consider this example, from which you will see it is easy to generate endless others. Take B (spoken by Bill) = “I might go over to Bob’s” as the sole premise for A = “Bill will go to Bob’s”. Note very carefully that this is your premise, not Bill’s. It is your uncertainty in A given B that is of interest. The only way to come to a definite number is by adding to B, perhaps with your knowledge of Bill’s habits. But if you were a bystander who overheard the conversation, you wouldn’t know how to add to B, unless you did so from subtle hints in Bill’s dress, his mannerisms, and things like that. Anyway, all these change B, and make it into something which is not B. That’s cheating. If asked for Pr(A|B), one must provide Pr(A|B) and not Pr(A|B’) or anything else.
This seemingly trivial rule is astonishingly difficult to remember or to heed if one is convinced probability is numerical. It would never be violated when working through a syllogism, say, or calculating a mathematical proof, where blatant additions to specified evidence are rejected out of hand. A professor would never let a student change the problem so that the student can answer it. Not so with probabilities. People will change the problem to make it more amenable. “Subjective” Bayesians make a career out of it.
Why is the rule so hard? No sooner do you ask somebody what Pr(A|B) is than they say, “Well, there’s lots of factors to consider…” There are not. There is only one, and that is B’s logical relation to A. Anything else, however interesting, is not relevant. Unless, that is, one wants to change the problem and discover the plausible evidence B’ which gives A its most extreme probability (nearest to 0 or 1). The modifier “plausible” is needed because it is always possible to create evidence which makes A true or false (e.g. B = “A is impossible”). Plausibility comes from fitting the evidence into a larger scheme of propositions. This is a large topic, skipped here because it is incidental.
Lots of detail left out here, which you have to fill in. See the classic posts page for how.
Update 2 Fixed the d*&^%^*&& typo that one of my enemies placed in the equation below. Rats!
Update An algebraic analogy. “If y = 1 and x + y < 7, solve for x.” Not enough information is provided to derive a unique value for x. It would thus be absurd, and obviously so, to say, “Well, I feel most x are positive; I mean, if I were to bet. And I’ve seen a lot of them around 3, though I’ve come across a few 4s too. I’m going with 3.”
Precision is often denied us. As silly as this example is, we see its equivalent occur in probability all the time.
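Worked through, the algebraic analogy shows exactly how far the premises go and no further:

```latex
y = 1,\quad x + y < 7 \;\Longrightarrow\; x + 1 < 7 \;\Longrightarrow\; x < 6 .
```

Every x below 6 satisfies the premises, so they fix a set of values, not a unique one; settling on 3 means adding a premise that was never given.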
¹ See, inter alia, Dupré and Tipler, 2009. New Axioms for Rigorous Bayesian Probability. Bayesian Analysis, 3, 599–606.