*We’ve done this before in different form. But it hasn’t stuck; plus we need this for reference.*

Not all probability is quantifiable. The proof of this is simple: all that must be demonstrated is one probability that cannot be made into a unique number. I’ll do this in a moment, but first it is interesting to recall that in its infancy it wasn’t clear probability could or should be represented numerically. (See Jim Franklin’s terrific *The Science of Conjecture: Evidence and Probability Before Pascal*.) It is only obvious probability is numerical when you’ve grown up subsisting solely on a diet of numbers, a condition true of any working scientist.

The problem is that because *some* probabilities are numerical, probability only feels real, scientific, and weighty when it is stated numerically. Nobody wants to make a decision based on mere words, not when figures can be used. Result? Over-certainty.

**Axiomatically**

Kolmogorov, in his 1933 *Foundations of the Theory of Probability*, gave us axioms which put probability on a firm footing. The problem is, the first axiom said, or seemed to say, “probability is a number”, and so did the second (the third gave a rule for manipulating these numbers). The axioms also require a good dose of mathematical training to comprehend, which contributed to the idea that probabilities are numbers.

Different, not-so-rigorous, but nevertheless appealing axioms were given by Cox in 1961. Their appeal was their statement in plain English and their concordance with common sense. (Cox’s lack of mathematical rigor was subsequently fixed by several authors.^{1}) Now these axioms yield two interesting results. First is that probability is *always* conditional. We can never write (in standard symbols) Pr(A), which reads “The probability of proposition A”, but must write Pr(A|B), “The probability of A given the premise or evidence B.” This came as no shock to logicians, who knew that the conclusion of any argument must be “conditioned on” premises or evidence of some kind, even if this evidence is just our intuition. This result didn’t shock anybody else either, because it’s rarely remembered: another victim of treating probability exclusively mathematically.

The second result sounds like numbers. Certainty has probability 1, falsity probability 0, just as expected. And, given some evidence B, the probability of some A plus the probability that A is false must equal 1: that is, it is a certainty (given B) that either A or not-A is true. Numbers, but only sort of, because there is no proof that for any A or B, Pr(A|B) will be a number. And indeed, there can be no proof, as you’ll discover. In short: Cox’s proofs are not constructive.

Cox’s axioms (and their many variants) are known, or better to say followed, by only a minority of physicists and Bayesian statisticians. They are certainly not as popular as Kolmogorov’s, even though following Cox’s trail can and usually does lead to Kolmogorov. Which is to say, to mathematics, i.e. numbers.

**Numberless probability**

Here’s our example of a numberless probability: B = “A few Martians wear hats” and A = “The Martian George wears a hat.” There is no unique Pr(A|B) because there is no unique map from “a few” to any number. The only way to generate a unique number is to modify B. Say B’ = “A few, where ‘a few’ means 10%, Martians wear hats.” Then Pr(A|B’) = 0.1. Or B” = “A few, where ‘a few’ means never more than one-half…” Then 0 < Pr(A|B”) < 0.5. It should be obvious that B is not B’ nor B” (if it isn’t, you’re in deep kimchi). More examples are had by changing “a few” to “some”, “most”, “a bunch”, “not so many” and on and on, none of which lead to a unique probability. This is all true even though, in each case, Pr(A|B) + Pr(not-A|B) = 1. (Why? Because that formula is a tautology.)
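The Martian example can be put in code, if only to show where the numbers stop. A toy Python sketch (the function name and the mapping from quantifiers to values are my own illustrative assumptions, not anything derivable from the words themselves):

```python
# Toy sketch: vague quantifiers do not map to unique numbers.
# Only a premise that *defines* the quantifier yields a number or interval;
# the bare words "a few", "some", "most" yield nothing unique.

def pr_george_wears_hat(premise: str):
    """Return Pr(A|B): a number, an interval, or None when no unique map exists."""
    if premise == "a few, meaning 10%":
        return 0.1                      # B': a unique number
    if premise == "a few, meaning never more than half":
        return (0.0, 0.5)               # B'': only an open interval
    if premise in ("a few", "some", "most", "a bunch", "not so many"):
        return None                     # B: no unique number exists
    raise ValueError("unknown premise")

print(pr_george_wears_hat("a few, meaning 10%"))  # 0.1
print(pr_george_wears_hat("a few"))               # None
```

Note that `None` here is not a probability of zero; it marks the absence of any unique number, which is the whole point.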

It turns out most probability isn’t quantifiable, because most judgments of uncertainty cannot be and are not stated numerically. “Scientific” propositions, many of which can be quantified, are very rare in human discourse. Consider this example, from which you will see it is easy to generate endless others. B (spoken by Bill) = “I might go over to Bob’s” as the *sole* premise for A = “Bill will go to Bob’s”. Note very carefully that this is *your* premise, not Bill’s. It is *your* uncertainty in A given B that is of interest. The only way to come to a definite number is by *adding to* B; perhaps by your knowledge of Bill’s habits. But if you were a bystander and overheard the conversation, you wouldn’t know how to add to B, unless you did so by subtle hints of Bill’s dress, his mannerisms, and things like that. Anyway, all these *change* B, and make it into something which is not B. That’s cheating. If asked for Pr(A|B) one must provide Pr(A|B) and not Pr(A|B’) or anything else.

This seemingly trivial rule is astonishingly difficult to remember or to heed *if* one is convinced probability is numerical. It would never be violated when working through a syllogism, say, or calculating a mathematical proof, where blatant additions to specified evidence are rejected out of hand. A professor would never let a student change the problem so that the student can answer it. Not so with probabilities. People *will* change the problem to make it more amenable. “Subjective” Bayesians make a career out of it.

Why is the rule so hard? No sooner do you ask somebody what Pr(A|B) is than they’ll say, “Well, there’s lots of factors to consider…” There are not. There is only one, and that is B’s logical relation to A. Anything else, however interesting, is not relevant. *Unless* one wants to change the problem and discover the plausible evidence B’ which gives A its most extreme probability (nearest to 0 or 1). The modifier “plausible” is needed, because it is always possible to create evidence which makes A true or false (e.g. B = “A is impossible”). The plausibility comes from fitting the evidence into a larger scheme of propositions. This is a large topic, skipped here, because it is incidental.

*Lots of detail left out here, which you have to fill in. See the classic posts page for how.*

**Update 2** Fixed the d*&^%^*&& typo that one of my enemies placed in the equation below. Rats!

**Update** An algebraic analogy. “If y = 1 and x + y < 7, solve for x.” There isn’t enough information provided to derive a unique value for x. It thus would be absurd, and obviously so, to say, “Well, I feel most x are positive; I mean, if I were to *bet*. And I’ve seen a lot of them around 3, though I’ve come across a few 4s too. I’m going with 3.”

Precision is often denied us. As silly as this example is, we see its equivalent occur in probability all the time.
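The analogy can be checked mechanically. A minimal Python sketch (the candidate values below are my own, picked purely for illustration; any real number under 6 would serve):

```python
# The premises "y = 1" and "x + y < 7" pin down an interval, not a value.
y = 1

def satisfies(x: float) -> bool:
    """True when x is consistent with the stated premises."""
    return x + y < 7

# Illustrative candidates only: many satisfy the premises, so no unique x.
candidates = [-100.0, 0.0, 3.0, 5.999, 6.0, 10.0]
consistent = [x for x in candidates if satisfies(x)]
print(consistent)  # [-100.0, 0.0, 3.0, 5.999]
```

Every x strictly below 6 is consistent with the premises; picking “3” is pure invention, exactly as the update says.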

—————————————————————-

^{1}See *inter alia* Dupré and Tipler, 2009. “New Axioms for Rigorous Bayesian Probability,” *Bayesian Analysis*, 4(3), 599–606.

September 15, 2013 at 9:49 am

I can’t help but think that this is all semantics, i.e. a debate over the definition of probability. Whereas I would say that there is insufficient information to calculate a probability, you insist in calling it a non-quantifiable probability. A tempest in a teapot? A diet of numbers versus a Diet of Worms?

September 15, 2013 at 10:52 am

This also proves that most numbers aren’t numbers, either.

For example, consider “the number of martians who wear hats”. We’re told this is “a few”, but “a few” isn’t a number. Most expressions in everyday use are like this, and so, therefore, most numbers are not numbers.

A question – can you always write P(A|U)? Where U is the universal event, the sample space itself, the set of all the things that can happen. What does this mean?

September 15, 2013 at 11:20 am

Scotian,

I think his point is that for some problems there will never be enough information to quantify the probability.

September 15, 2013 at 1:26 pm

NIV,

The trick is not to write it in symbols. Symbols can be a useful shorthand, but they can just as easily obfuscate.

In other words, better to define “the set of all things that can happen”, which sounds complicated.

September 15, 2013 at 1:36 pm

MattS,

Will never be? What does that mean? In any case, I don’t see how this changes things.

September 15, 2013 at 1:42 pm

Ah! You mean like writing numbers without symbols like 1, 2, 3, 4, … ?

September 15, 2013 at 1:56 pm

Scotian,

Nope.

Here’s another: B = “Between half and two thirds of X are F” and A = “(a new) x is F”. An interval there, not a unique number.

Besides, when you say “insufficient info” it’s like saying there’s always info which would lead to an extreme answer (0 or 1). Why not? Why stop short at, say, 32.76%?

But you see what I mean. The lure is too strong. You just will have to change the B into something it isn’t. The provided evidence—the only evidence you get for the stated problem at hand—isn’t good enough.

September 15, 2013 at 1:59 pm

NIV,

Nope. I mean explaining your creature so that it is unambiguously comprehensible.

September 15, 2013 at 2:12 pm

Strange. I often find the symbols less ambiguous and more comprehensible than the words.

I’m always a bit suspicious when people say they can’t explain things in symbols. It’s often a sign of fuzzy thinking. 🙂

September 15, 2013 at 2:16 pm

NIV,

Ah, but the symbols are always symbolic *of something*. Trusting too deeply in symbols is what leads to the deadly sin of reification. And I still haven’t any idea what your U is.

September 15, 2013 at 2:34 pm

This is one of the rare occasions where you have misunderstood me, Briggs. It seems to me that you have changed the definition of probability as commonly used and then have complained that others are misusing the term. You could have just as easily said that the probability is known to lie between one half and two thirds.

September 15, 2013 at 2:59 pm

“Not all probability is quantifiable.”

Try telling that to the EPA. They are masters at quantifying the unknown. They were able to determine that exposure to environmental tobacco smoke causes cancer by quantifying people’s answers to a questionnaire asking if they had been exposed to environmental tobacco smoke.

September 15, 2013 at 3:04 pm

Oh, I don’t *trust* in symbols – they’re only a tool. The point about symbols is that they only contain what you put into them. They don’t drag along a whole baggage of unexamined associations and assumptions with them. You write down the argument using the symbols to make sure nothing has sneaked through, and *only then* do you translate it back into words.

If someone is claiming that their argument can’t be expressed in symbols, it usually means it relies on something unstated that they don’t want to talk about. At least when it comes to arguing about mathematics, I’d say.

I suppose I’m still mildly surprised you don’t know what U is, especially as you implicitly used it in your argument above, but given that you claim not to know what P(A) means either, not very. Different people look at the world differently. As you said in an earlier post, one weeps for the difficulty of explaining things, but at the same time, it makes the world more interesting.

September 15, 2013 at 3:47 pm

It’s the vibe.

September 15, 2013 at 5:01 pm

I do not believe Briggs has changed the definition of probability. He said if you can’t know, you cannot quantify. The problem is that if the question is ill-posed, it is worse to supply your own information.

In risk assessment there is estimation by eliciting expert opinion. Ask three or more experts, average their answers, and you have the best available answer. Overconfidence of experts, and experts supplying information not in the question, does not a risk assessment make. Worse is defining an expert. Often experts have little actual experience in the area of their alleged expertise. They are experts because they served on committees or managed a group that has done work in the area.

As Briggs said, “I mean explaining your creature so that it is unambiguously comprehensible.” The onus is on the one asking the question. If the question cannot be answered as posed, then no quantitative answer is possible. There are too many short-cuts and tricks-of-the-trade being used in risk assessment (Is coffee good for us this week?), and answering ill-posed questions results only in ill.

September 15, 2013 at 5:35 pm

Everyone keeps telling me what Briggs means. What a strange thing to do. As to the value of expert opinion I like this quote from Feynman:

“The man who replaced me on the commission said, “That book was approved by sixty-five engineers at the Such-and-such Aircraft Company!”

I didn’t doubt that the company had some pretty good engineers, but to take sixty-five engineers is to take a wide range of ability–and to necessarily include some pretty poor guys! It was once again the problem of averaging the length of the emperor’s nose, or the ratings on a book with nothing between the covers. It would have been far better to have the company decide who their better engineers were, and to have them look at the book. I couldn’t claim that I was smarter than sixty-five other guys–but the average of sixty-five other guys, certainly!”

Couldn’t have said it better myself.

September 16, 2013 at 4:44 am

All,

See the addendum (suggested by comments) to the original post.

September 16, 2013 at 7:06 am

Great post Bob…when is your book on all this going to be published?

September 16, 2013 at 7:07 am

That should be great post Matt… sorry, Freudian ego slip.

September 16, 2013 at 7:13 am

There is a problem with your update, Briggs, as x = 6.

September 16, 2013 at 8:51 am

If we are quoting Feynman,

“It’s a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty–a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid–not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked–to make sure the other fellow can tell they have been eliminated.

“Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can–if you know anything at all wrong, or possibly wrong–to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.

“In summary, the idea is to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgement in one particular direction or another.”

I believe that this quote comes close to what I understand Briggs as saying. While the actual quote concerns the end of an experiment, it applies to the beginning, also. A well-posed question, Pr(A|B), requires an explicit description of B. A kind of leaning over backwards.

It is necessary to tell us why x is not equal to 6. Briggs, did you mean x + y + z = 7? There I go supplying information.

September 16, 2013 at 12:14 pm

Mr Briggs, you have me scratching my head with the “algebraic analogy”. What are the other possible values of x? I guess it’s been a while since Algebra I for me.

By the way, have you ever read “Probability, Statistics, and Truth” by R. von Mises? (Not L. von Mises!) First edition was 1928, although I read a 1981 edition. He defines probability as the (limiting value of the) frequency of encountering a certain attribute in a certain collective/population, and rejects the definition of probability as a measurement of our own uncertainty about a thing. “[I]f we know nothing about a thing, we cannot say anything about its probability”. It is an argument like yours, against assigning made-up quantities in order to force unquantifiable probability to work with quantitative tools (i.e. statistics).

September 16, 2013 at 2:15 pm

“He defines probability as the (limiting value of the) frequency of encountering a certain attribute in a certain collective/population, and rejects the definition of probability as a measurement of our own uncertainty about a thing.”

That’s the Frequentist definition of probability! Briggs will be offended!

At the risk of joining the chorus of those saying ‘what Briggs means’, there’s a different perspective that probably won’t help. (I’ve explained this before, but nobody seems to have understood, so I don’t suppose this attempt will fare any better.)

There are three separate domains in which we might define probability. There is:

1) Actual objective reality.

2) A mathematical model of reality.

3) The mind of an observer of random events.

We have no way of knowing how reality actually works. Is it deterministic? Is randomness real, or an illusion? We don’t know.

All we can do is build mental models of reality. These are approximations, idealisations, neat and perfect toy universes where we can make up the rules and do the impossible (e.g. represent quantities as real numbers of infinite precision). This is the universe that obeys Kolmogorov’s axioms. This is the world of sigma algebras and measures and functions and sets. This is the place where Frequentist and Propensity models of probability are defined. And this is where the absolute probabilities P(A) etc. live.

This mathematical model also contains conditional probabilities P(A|B) as well, that are related to P(A and B) and P(B) and so on by Bayes theorem. We call all of these ‘Bayesian probabilities’.

The mathematical world also contains mathematical observers, who have no access to the true probabilities that the model is set up to use, but can only see the *outcomes* of random experiments. They might sometimes be able to argue for a particular absolute probability on symmetry grounds, but they have no way to determine the *actual* probability of anything experimentally.

All they can do is run experiments and observe the outcomes. The information that provides is always incomplete, and subject to error. You can never determine P(A); you can only make some observations b of a random variable B, use your knowledge of physics to figure out P(B|A) for each possible A, and then calculate a quantity P(A|B=b) that is *probably* close to P(A) for a big enough amount of evidence b. Such a quantity is called a ‘belief’.

Probabilities are what the mathematical model actually *does*, and *cannot* be determined by observers. Beliefs are what the observers in the mathematical model can possibly *know* about the probabilities, their best guess given their observations. The two are not the same kind of thing, although they mostly follow the same mathematical rules.

‘Bayesian belief’, the form of belief that follows the rules of Bayesian probability, is the only quantity observable to modeled observers. Bayesian probability, the actual absolute and conditional probabilities in the model, can never be observed. They can only be defined when setting up the model.

And Bayesian belief is what Briggs is calling “probability”, and he is denying the existence/validity of what I have described above as Bayesian probability. It’s a sort of understandable position since the only things we can actually see/touch/calculate are the beliefs, but it results in a very peculiar ontology in which beliefs no longer model probabilities, but are left hanging, pointing at nothing, referring to nothing.

And this too is what I think Scotian meant: – that Briggs has changed the definition of probability to that of belief, rejected the original definition of probability, and then complained that people are misusing the term.

Briggs is quite right that the belief/probability distinction is confusing, widely confused, and that what statisticians are actually calculating from experiments are in most cases actually beliefs. Scotian is quite right that Briggs is changing the definition of probability from what most mathematicians understand by the term to his own private non-standard definition, and then moaning about how difficult that is to explain to people.

And the point of Briggs’ present post was to say that experimental results and observations are often partial and ambiguous, and therefore the only sort of probabilities people can calculate (i.e. beliefs) are likewise partial and ambiguous. Unlike absolute probabilities, beliefs are often a bit fuzzy around the edges.

September 16, 2013 at 3:38 pm

Joe Clark,

That’s because there was a typo placed there by my enemies while I was soaring through the air.

Arrrgggggggh. Typos will be the death of me.

It’s now fixed. OUR KNOWLEDGE of x is that it is less than, but not equal to 6, and possibly any value all the way down.

Probability is logic, and like logic, it makes statements of our knowledge. If you think it’s real, mail me a bucket of it and I’ll send you as many dollars as real probabilities you send me.

And now…jet lag. Be back tomorrow.

September 17, 2013 at 1:51 pm

FYI, I’m not saying that the von Mises book declares there to be no such thing as “probability as a measure of uncertainty”, rather, he’s insisting that the different concepts of probability not be called by the same name. Remember, his book is about the relationship between empirical/frequentist probability and statistics. What I like about it, and I think you would like too, is that he is making clear the distinction between the different conceptions of probability, and emphatically stating that readers should NOT take statistical methods intended for the first definition of probability and fudge the numbers to apply them to the second. It’s philosophically clear, and I liked that.

September 17, 2013 at 8:37 pm

Joe Clark,

There are plenty of instances, like in the Martian example, where there just is no limiting relative frequency. And, anyway, to call the lrf “probability” is never any practical value, for, as Keynes long ago told us, in the long run (there the lrf lives) we shall all be dead.

There’s a terrific paper listing 17 (I think; maybe it was 13) proofs showing lrf cannot be probability. The Martian-like example is one of them. It’s in the philosophical literature and not statistics, so it’s not as well known. Hayek? Or some name like that. I’m blanking on it, and far too lazy to look it up.

Anyway, once I lay holt of it (as my dad would say) I’ll go over the points on the blog. Should be fun.

September 18, 2013 at 1:01 am

Do you mean Hajek?

http://philrsss.anu.edu.au/people-defaults/alanh/papers/what_cp_couldnt_be.pdf

September 18, 2013 at 8:50 am

NIV,

I was close! That’s the guy. Although in the paper I agree with his conclusion “that conditional probability should be taken as the fundamental notion in probability theory” you’ll notice he never refers to Cox. Cox is really only known via Jaynes, and that means known early on in physics, and only latterly in probability.

We should do some of Hájek’s papers here.

September 18, 2013 at 1:33 pm

In that case, this might be (the other half of?) the paper you were thinking about.

http://philrsss.anu.edu.au/people-defaults/alanh/papers/fifteen.pdf

Exercises for the student for the most part, I thought. Entertaining, anyway.

September 18, 2013 at 1:37 pm

NIV,

Yep, one of them. Actually, he references the earlier, fuller one right in the beginning.

UpdateForgot to say thanks. Thanks!