
Most Probabilities Aren’t Quantifiable

Look at those colorful numbers!

We’ve done this before in different form. But it hasn’t stuck; plus we need this for reference.

Not all probability is quantifiable. The proof of this is simple: all that must be demonstrated is one probability that cannot be made into a unique number. I’ll do this in a moment, but first it is interesting to recall that in probability’s infancy it wasn’t clear the subject could or should be represented numerically. (See Jim Franklin’s terrific The Science of Conjecture: Evidence and Probability Before Pascal.) It is only obvious probability is numerical when you’ve grown up subsisting solely on a diet of numbers, a condition true of any working scientist.

The problem is that because some probabilities are numerical, a probability only feels real, scientific, and weighty when it is stated numerically. Nobody wants to make decisions based on mere words, not when figures can be used. Result? Over-certainty.

Axiomatically

Kolmogorov, in 1933’s Foundations of the Theory of Probability, gave us axioms which put probability on a firm footing. Problem is, the first axiom said, or seemed to say, “probability is a number”, and so did the second (the third gave a rule for manipulating these numbers). The axioms also require a good dose of mathematical training to comprehend, which contributed to the idea that probabilities are numbers.

Different, not-so-rigorous, but nevertheless appealing axioms were given by Cox in 1961. Their appeal was their statement in plain English and their concordance with common sense. (Cox’s lack of mathematical rigor was subsequently fixed by several authors.1) Now these axioms yield two interesting results. First is that probability is always conditional. We can never write (in standard symbols) Pr(A), which reads “the probability of proposition A”, but must write Pr(A|B), “the probability of A given the premise or evidence B.” This came as no shock to logicians, who knew that the conclusion of any argument must be “conditioned on” premises or evidence of some kind, even if this evidence is just our intuition. It didn’t shock anybody else, either, because it is rarely remembered: another victim of treating probability exclusively mathematically.

The second result sounds like numbers. Certainty has probability 1, falsity probability 0, just as expected. And, given some evidence B, the probability of some A plus the probability that A is false must equal 1: that is, it is a certainty (given B) that either A or not-A is true. Numbers, but only sort of, because there is no proof that for any A or B, Pr(A|B) will be a number. And indeed, there can be no proof, as you’ll discover. In short: Cox’s proofs are not constructive.
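
In symbols, using the notation above: for any A and evidence B,

Pr(A|B) + Pr(not-A|B) = 1,

with Pr(A|B) = 1 when B guarantees A and Pr(A|B) = 0 when B rules A out. Nothing in the proofs, though, guarantees that Pr(A|B) takes a definite numerical value in between.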

Cox’s axioms (and their many variants) are known to, or better to say, followed by, only a minority of physicists and Bayesian statisticians. They are certainly not as popular as Kolmogorov’s, even though following Cox’s trail can and usually does lead to Kolmogorov. Which is to say, to mathematics, i.e. to numbers.

Numberless probability

Here’s our example of a numberless probability: B = “A few Martians wear hats” and A = “The Martian George wears a hat.” There is no unique Pr(A|B) because there is no unique map from “a few” to any number. The only way to generate a unique number is to modify B. Say B’ = “A few, where ‘a few’ means 10%, Martians wear hats.” Then Pr(A|B’) = 0.1. Or B” = “A few, where ‘a few’ means never more than one-half…” Then 0 < Pr(A|B”) < 0.5. It should be obvious that B is neither B’ nor B” (if it isn’t, you’re in deep kimchi). More examples can be had by changing “a few” to “some”, “most”, “a bunch”, “not so many”, and on and on, none of which lead to a unique probability. This is all true even though, in each case, Pr(A|B) + Pr(not-A|B) = 1. (Why? Because that formula is a tautology.)
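
To make the point concrete, here is a minimal sketch in Python (the function names and the quantifier-to-number mappings are mine, invented for illustration; nothing in B itself supplies them):

    # A sketch of why Pr(A|B) has no unique value when B contains "a few".
    # The mappings below are stipulations: they create B' and B'', not B.

    def pr_given_b_prime():
        # B' = "A few, where 'a few' means 10%, Martians wear hats."
        # The stipulation yields a unique number.
        return 0.10

    def pr_given_b_double_prime():
        # B'' = "A few, where 'a few' means never more than one-half..."
        # The stipulation yields only an interval: 0 < Pr(A|B'') < 0.5.
        return (0.0, 0.5)

    def pr_given_b():
        # B = "A few Martians wear hats", with no mapping stipulated.
        # There is no unique map from "a few" to a number, hence no
        # unique Pr(A|B) to return.
        raise ValueError("no unique map from 'a few' to a number")

Only by changing B into B’ or B”, that is, can the functions return anything at all.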

It turns out most probability isn’t quantifiable, because most judgments of uncertainty cannot be and are not stated numerically. “Scientific” propositions, many of which can be quantified, are very rare in human discourse. Consider this example, from which you will see it is easy to generate endless others. B (spoken by Bill) = “I might go over to Bob’s” as the sole premise for A = “Bill will go to Bob’s”. Note very carefully that this is your premise, not Bill’s. It is your uncertainty in A given B that is of interest. The only way to come to a definite number is by adding to B, perhaps with your knowledge of Bill’s habits. But if you were a bystander who overheard the conversation, you wouldn’t know how to add to B, unless you did so by subtle hints of Bill’s dress, his mannerisms, and things like that. Anyway, all these additions change B and make it into something which is not B. That’s cheating. If asked for Pr(A|B), one must provide Pr(A|B) and not Pr(A|B’) or anything else.

This seemingly trivial rule is astonishingly difficult to remember or to heed if one is convinced probability is numerical. It would never be violated when working through a syllogism, say, or a mathematical proof, where blatant additions to specified evidence are rejected out of hand. A professor would never let a student change the problem so that the student can answer it. Not so with probabilities. People will change the problem to make it more amenable. “Subjective” Bayesians make a career out of it.

Why is the rule so hard to follow? No sooner do you ask somebody for Pr(A|B) than they’ll say, “Well, there’s lots of factors to consider…” There are not. There is only one, and that is B’s logical relation to A. Anything else, however interesting, is not relevant. Unless, that is, one wants to change the problem and discover the plausible evidence B’ which gives A its most extreme probability (nearest to 0 or 1). The modifier “plausible” is needed because it is always possible to create evidence which makes A true or false (e.g. B = “A is impossible”). Plausibility means the evidence must fit into a larger scheme of propositions. This is a large topic, skipped here, because it is incidental.

Lots of detail left out here, which you have to fill in. See the classic posts page for how.


Update 2 Fixed the d*&^%^*&& typo that one of my enemies placed in the equation below. Rats!

Update An algebraic analogy. “If y = 1 and x + y < 7, solve for x.” Not enough information is provided to derive a unique value for x. It would thus be absurd, and obviously so, to say, “Well, I feel most x are positive; I mean, if I were to bet. And I’ve seen a lot of them around 3, though I’ve come across a few 4s too. I’m going with 3.”
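
Worked out: substituting y = 1 into x + y < 7 gives x < 6, and nothing more. Our knowledge of x is an open-ended interval, not a unique number.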

Precision is often denied us. As silly as this example is, we see its equivalent occur in probability all the time.

—————————————————————-

1See inter alia Dupré and Tipler, 2009. “New Axioms for Rigorous Bayesian Probability,” Bayesian Analysis, 4(3), 599–606.

Categories: Philosophy, Statistics

Replies

  1. I can’t help but think that this is all semantics, i.e. a debate over the definition of probability. Whereas I would say that there is insufficient information to calculate a probability, you insist in calling it a non-quantifiable probability. A tempest in a teapot? A diet of numbers versus a Diet of Worms?

  2. This also proves that most numbers aren’t numbers, either.

    For example, consider “the number of Martians who wear hats”. We’re told this is “a few”, but “a few” isn’t a number. Most expressions in everyday use are like this, and so, therefore, most numbers are not numbers.

    A question: can you always write P(A|U), where U is the universal event, the sample space itself, the set of all the things that can happen? What does this mean?

  3. Scotian,

    I think his point is that for some problems there will never be enough information to quantify the probability.

  4. NIV,

    The trick is not writing it in symbols, which can be a useful shorthand but which can just as easily obfuscate.

    In other words, better to define “the set of all the things that can happen”, which sounds complicated.

  5. Scotian,

    Nope.

    Here’s another: B = “Between half and two thirds of X are F” and A = “(a new) x is F”. Interval there.
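
    (In that case all B licenses is 1/2 <= Pr(A|B) <= 2/3: an interval, not a unique number.)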

    Besides, when you say “insufficient info” it’s like saying there’s always info which would lead to an extreme answer (0 or 1). Why not? Why stop short at, say, 32.76%?

    But you see what I mean. The lure is too strong. You will just have to change the B into something it isn’t. The provided evidence—the only evidence you get for the stated problem at hand—isn’t good enough.

  6. Strange. I often find the symbols less ambiguous and more comprehensible than the words.

    I’m always a bit suspicious when people say they can’t explain things in symbols. It’s often a sign of fuzzy thinking. 🙂

  7. NIV,

    Ah, but the symbols are always symbolic of something. Trusting too deeply in symbols is what leads to the deadly sin of reification.

    And I still haven’t any idea what your U is.

  8. This is one of the rare occasions where you have misunderstood me, Briggs. It seems to me that you have changed the definition of probability as commonly used and then have complained that others are misusing the term. You could have just as easily said that the probability is known to lie between one half and two thirds.

  9. “Not all probability is quantifiable.”
    Try telling that to the EPA. They are masters at quantifying the unknown. They were able to determine that exposure to environmental tobacco smoke causes cancer by quantifying people’s answers to a questionnaire asking if they had been exposed to environmental tobacco smoke.

  10. Oh, I don’t *trust* in symbols – they’re only a tool. The point about symbols is that they only contain what you put into them. They don’t drag along a whole baggage of unexamined associations and assumptions with them. You write down the argument using the symbols to make sure nothing has sneaked through, and *only then* do you translate it back into words.

    If someone is claiming that their argument can’t be expressed in symbols, it usually means it relies on something unstated that they don’t want to talk about. At least when it comes to arguing about mathematics, I’d say.

    I suppose I’m still mildly surprised you don’t know what U is, especially as you implicitly used it in your argument above, but given that you claim not to know what P(A) means either, not very. Different people look at the world differently. As you said in an earlier post, one weeps for the difficulty of explaining things, but at the same time, it makes the world more interesting.

  11. I do not believe Briggs has changed the definition of probability. He said if you cannot know, you cannot quantify. The problem is that if the question is ill-posed, it is worse to supply your own information.

    In risk assessment there is estimation by eliciting expert opinion. Ask three or more experts, average their answers, and, so the story goes, you have the best available answer. But overconfidence of experts, and expert supply of information not in the question, does not a risk assessment make. Worse is defining an expert. Often experts have little actual experience in the area of their alleged expertise. They are experts because they served on committees or managed a group that has done work in the area.

    As Briggs said, “I mean explaining your creature so that it is unambiguously comprehensible.” The onus is on the one asking the question. If the question cannot be answered as posed, then no quantitative answer is possible. There are too many short-cuts and tricks-of-the-trade being used in risk assessment (Is coffee good for us this week?); answering ill-posed questions results only in ill.

  12. Everyone keeps telling me what Briggs means. What a strange thing to do. As to the value of expert opinion I like this quote from Feynman:

    “The man who replaced me on the commission said, “That book was approved by sixty-five engineers at the Such-and-such Aircraft Company!”
    I didn’t doubt that the company had some pretty good engineers, but to take sixty-five engineers is to take a wide range of ability–and to necessarily include some pretty poor guys! It was once again the problem of averaging the length of the emperor’s nose, or the ratings on a book with nothing between the covers. It would have been far better to have the company decide who their better engineers were, and to have them look at the book. I couldn’t claim that I was smarter than sixty-five other guys–but the average of sixty-five other guys, certainly!”

    Couldn’t have said it better myself.

  13. If we are quoting Feynman,

    “It’s a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty–a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid–not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked–to make sure the other fellow can tell they have been eliminated.

    “Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can–if you know anything at all wrong, or possibly wrong–to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.

    “In summary, the idea is to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgement in one particular direction or another.”

    I believe that this quote comes close to what I understand Briggs to be saying. While the actual quote concerns the end of an experiment, it applies to the beginning, also. A well-posed question, Pr(A|B), requires an explicit description of B. A kind of leaning over backwards.

    It is necessary to tell us why x is not equal to 6. Briggs, did you mean x + y + z = 7? There I go supplying information.

  14. Mr Briggs, you have me scratching my head with the “algebraic analogy”. What are the other possible values of x? I guess it’s been a while since Algebra I for me.

    By the way, have you ever read “Probability, Statistics, and Truth” by R. von Mises? (Not L. von Mises!) First edition was 1928, although I read a 1981 edition. He defines probability as the (limiting value of the) frequency of encountering a certain attribute in a certain collective/population, and rejects the definition of probability as a measurement of our own uncertainty about a thing. “[I]f we know nothing about a thing, we cannot say anything about its probability”. It is an argument like yours, against assigning made-up quantities in order to force unquantifiable probability to work with quantitative tools (i.e. statistics).

  15. “He defines probability as the (limiting value of the) frequency of encountering a certain attribute in a certain collective/population, and rejects the definition of probability as a measurement of our own uncertainty about a thing.”

    That’s the Frequentist definition of probability! Briggs will be offended!

    At the risk of joining the chorus of those saying ‘what Briggs means’, there’s a different perspective that probably won’t help. (I’ve explained this before, but nobody seems to have understood, so I don’t suppose this attempt will fare any better.)

    There are three separate domains in which we might define probability. There is:
    1) Actual objective reality.
    2) A mathematical model of reality.
    3) The mind of an observer of random events.

    We have no way of knowing how reality actually works. Is it deterministic? Is randomness real, or an illusion? We don’t know.

    All we can do is build mental models of reality. These are approximations, idealisations, neat and perfect toy universes where we can make up the rules and do the impossible (e.g. represent quantities as real numbers of infinite precision). This is the universe that obeys Kolmogorov’s axioms. This is the world of sigma algebras and measures and functions and sets. This is the place where Frequentist and Propensity models of probability are defined. And this is where the absolute probabilities P(A) etc. live.

    This mathematical model contains conditional probabilities P(A|B) as well, which are related to P(A and B) and P(B) and so on by Bayes’ theorem. We call all of these ‘Bayesian probabilities’.
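
    (In symbols: P(A|B) = P(A and B)/P(B), whenever P(B) > 0.)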

    The mathematical world also contains mathematical observers, who have no access to the true probabilities that the model is set up to use, but can only see the *outcomes* of random experiments. They might sometimes be able to argue for a particular absolute probability on symmetry grounds, but they have no way to determine the *actual* probability of anything experimentally.

    All they can do is run experiments and observe the outcomes. The information that provides is always incomplete, and subject to error. You can never determine P(A); you can only make some observations b of a random variable B, use your knowledge of physics to figure out P(B|A) for each possible A, and then calculate a quantity P(A|B=b) that is *probably* close to P(A) for a big enough amount of evidence b. Such a quantity is called a ‘belief’.
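
    For concreteness, here is a minimal sketch of that calculation in Python (the coin example, the candidate biases, and the uniform prior are all invented for illustration):

        # An observer cannot see the model's true P(heads); it can only
        # update a belief P(A|B=b) from observed outcomes b.
        candidates = [0.25, 0.50, 0.75]  # hypotheses A for the true bias
        belief = {p: 1.0 / len(candidates) for p in candidates}  # uniform prior

        flips = [1, 1, 0, 1]  # b: the observed flips, 1 = heads

        for flip in flips:
            # Multiply by the likelihood P(flip|A)...
            belief = {p: w * (p if flip == 1 else 1.0 - p)
                      for p, w in belief.items()}
            # ...and renormalize so the beliefs sum to 1.
            total = sum(belief.values())
            belief = {p: w / total for p, w in belief.items()}

        # 'belief' is now P(A|B=b): the observer's best guess at the
        # unobservable P(A), not P(A) itself.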

    Probabilities are what the mathematical model actually *does*, and *cannot* be determined by observers. Beliefs are what the observers in the mathematical model can possibly *know* about the probabilities, their best guess given their observations. The two are not the same kind of thing, although they mostly follow the same mathematical rules.

    ‘Bayesian belief’, the form of belief that follows the rules of Bayesian probability, is the only quantity observable to modeled observers. Bayesian probability, the actual absolute and conditional probabilities in the model, can never be observed. They can only be defined when setting up the model.

    And Bayesian belief is what Briggs is calling “probability”, and he is denying the existence/validity of what I have described above as Bayesian probability. It’s a sort of understandable position since the only things we can actually see/touch/calculate are the beliefs, but it results in a very peculiar ontology in which beliefs no longer model probabilities, but are left hanging, pointing at nothing, referring to nothing.

    And this too is what I think Scotian meant: that Briggs has changed the definition of probability to that of belief, rejected the original definition of probability, and then complained that people are misusing the term.

    Briggs is quite right that the belief/probability distinction is confusing, widely confused, and that what statisticians are actually calculating from experiments are in most cases actually beliefs. Scotian is quite right that Briggs is changing the definition of probability from what most mathematicians understand by the term to his own private non-standard definition, and then moaning about how difficult that is to explain to people.

    And the point of Briggs’ present post was to say that experimental results and observations are often partial and ambiguous, and therefore the only sort of probabilities people can calculate (i.e. beliefs) are likewise partial and ambiguous. Unlike absolute probabilities, beliefs are often a bit fuzzy around the edges.

  16. Joe Clark,

    That’s because there was a typo placed there by my enemies while I was soaring through the air.

    Arrrgggggggh. Typos will be the death of me.

    It’s now fixed. OUR KNOWLEDGE of x is that it is less than, but not equal to 6, and possibly any value all the way down.

    Probability is logic, and like logic, it makes statements of our knowledge. If you think it’s real, mail me a bucket of it and I’ll send you as many dollars as real probabilities you send me.

    And now…jet lag. Be back tomorrow.

  17. FYI, I’m not saying that the von Mises book declares there to be no such thing as “probability as a measure of uncertainty”, rather, he’s insisting that the different concepts of probability not be called by the same name. Remember, his book is about the relationship between empirical/frequentist probability and statistics. What I like about it, and I think you would like too, is that he is making clear the distinction between the different conceptions of probability, and emphatically stating that readers should NOT take statistical methods intended for the first definition of probability and fudge the numbers to apply them to the second. It’s philosophically clear, and I liked that.

  18. Joe Clark,

    There are plenty of instances, like in the Martian example, where there just is no limiting relative frequency. And, anyway, to call the lrf “probability” is never of any practical value, for, as Keynes long ago told us, in the long run (where the lrf lives) we shall all be dead.

    There’s a terrific paper listing 17 (I think; maybe it was 13) proofs showing the lrf cannot be probability. The Martian-like example is one of them. It’s in the philosophical literature and not statistics, so it’s not as well known. Hayek? Or some name like that. I’m blanking on it, and far too lazy to look it up.

    Anyway, once I lay holt of it (as my dad would say) I’ll go over the points on the blog. Should be fun.

  19. NIV,

    I was close! That’s the guy. Although I agree with the paper’s conclusion “that conditional probability should be taken as the fundamental notion in probability theory”, you’ll notice he never refers to Cox. Cox is really only known via Jaynes, and that means known early on in physics, and only latterly in probability.

    We should do some of Hájek’s papers here.

  20. NIV,

    Yep, one of them. Actually, he references the earlier, fuller one right in the beginning.

    Update Forgot to say thanks. Thanks!
