Yep, Probability Is Not As Simple As You Think

Couple of folks (Mike W, Dan Hughes) asked me to comment on the article “The concept of probability is not as simple as you think” by Nevin Climenhaga.

Three popular theories analyse probabilities as either frequencies, propensities or degrees of belief. Suppose I tell you that a coin has a 50 per cent probability of landing heads up. These theories, respectively, say that this is:

  • The frequency with which that coin lands heads;
  • The propensity, or tendency, that the coin’s physical characteristics give it to land heads;
  • How confident I am that it lands heads.

These are three of the biggies, it’s true. Climenhaga gives examples of the quirks and difficulties of these definitions.

Climenhaga’s first example starts with “Adam flips a fair coin that self-destructs after being tossed four times.” He then shows how confusion enters, but I don’t think he saw how he introduced some confusion with that “fair”, for it follows that saying a coin is “fair” is saying it has a probability of coming up heads 50% in the propensity definition; that is, “fair” implies propensity.

Yet he also takes it that, since the coin came up heads three out of four times, propensity implies the probability is 3/4. But if that were so, then had it come up heads zero times out of four, or even zero out of one, the probability would be 0.

There is a mathematical definition of frequency, which involves limits of subsequences embedded in infinite sequences, which we need not detail. The math is fine, but do such sequences or even subsequences exist in actuality rather than potentially? No. That makes using probability-as-frequency impossible because no unique sequences exist for anything (especially coin flips).

The other difficulty with frequency is this. Suppose we have four beings in a room, A-D, and all are the same species. A-C are Martians. What is D? Martian, too. Logic demands this and nobody disputes that. But there is no frequency because there are no Martians. Yet we can still do logic, and we can still do probability. Change the first premise to four beings, A-D, three Martians and one Venusian. Then what is the probability D is a Martian? Simple: 3/4. But there is no frequency again, because there is no sequence, and there are no aliens.
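
The 3/4 can be reached by deduction from the premises alone, with no sequence of observations. Here is a minimal sketch in Python. It assumes that the premise gives no reason to favour any one being as the Venusian, so each consistent assignment gets equal weight; that symmetry assumption, and the names `beings`, `worlds` and `probability`, are illustrative additions and not part of the premise.

```python
from fractions import Fraction

# Premise: four beings A-D, exactly three Martians and one Venusian.
beings = ["A", "B", "C", "D"]

# Each "world" consistent with the premise names which being is the Venusian.
# Assumption (not stated in the premise): nothing favours one world over
# another, so each world gets equal weight.
worlds = [
    {b: ("Venusian" if b == v else "Martian") for b in beings}
    for v in beings
]

def probability(proposition, worlds):
    """Share of premise-consistent worlds in which the proposition holds."""
    favourable = sum(1 for w in worlds if proposition(w))
    return Fraction(favourable, len(worlds))

p_d_martian = probability(lambda w: w["D"] == "Martian", worlds)
print(p_d_martian)  # 3/4
```

Nothing was counted in the world here; the only inputs are the premises. Change the premises and the number changes with them.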

Confidence is closer to the mark, but it has whiffs of subjectivism, which would mean the amount of beans you had in your stomach changes the probability.

Climenhaga and I agree on what probability is: degree-of-support. Which is to say, logic. My own views and proofs and full arguments are in this award eligible book. Here’s Climenhaga:

Here, probabilities are understood as relations of evidential support between propositions. ‘The probability of X given Y’ is the degree to which Y supports the truth of X. When we speak of ‘the probability of X’ on its own, this is shorthand for the probability of X conditional on any background information we have. When Beth says that there is a 50 per cent probability that the coin landed heads, she means that this is the probability that it lands heads conditional on the information that it was tossed and some information about its construction (for example, it being symmetrical).

Yep. This also implies, as I believe, there is no such thing as “The probability of X” on its own; i.e. there is no unconditional probability.

Because they turn probabilities into different kinds of entities, our four theories offer divergent advice on how to figure out the values of probabilities. The first three interpretations (frequency, propensity and confidence) try to make probabilities things we can observe — through counting, experimentation or introspection. By contrast, degrees of support seem to be what philosophers call ‘abstract entities’ — neither in the world nor in our minds.

We disagree about that, for I say in our minds is exactly where probability is.

Suppose we’re on a jury. How are we supposed to figure out the probability that the defendant committed the murder, so as to see whether there can be reasonable doubt about his guilt?…

…Here we are concerned with the probability of a cause (the defendant committing the murder) given an effect (his fingerprints being on the murder weapon). Bayes’s theorem lets us calculate this as a function of three further probabilities: the prior probability of the cause, the probability of the effect given this cause, and the probability of the effect without this cause.

I’d state this differently. I say that if we knew the cause, then we don’t need the probability; which is to say, the probability is easy and extreme, either 0 or 1, depending on the proposition (“He did it”, “He’s innocent”).

And, of course, we don’t need Bayes per se. It’s a handy tool, but it’s not strictly necessary. If you want to go hard into this, you can read about presuming innocence and Bayesian priors.
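
For readers who do want to see the tool, the calculation Climenhaga describes can be sketched in a few lines of Python. The function name `posterior` and the three input numbers are hypothetical, chosen only for illustration; the article supplies no actual figures for a trial.

```python
def posterior(prior, p_effect_given_cause, p_effect_given_no_cause):
    """Bayes's theorem: probability of the cause given the effect."""
    # Total probability of seeing the effect, with or without the cause.
    p_effect = (p_effect_given_cause * prior
                + p_effect_given_no_cause * (1 - prior))
    return p_effect_given_cause * prior / p_effect

# Hypothetical inputs, for illustration only (not from the article):
prior = 0.01                  # he did it, before considering the fingerprints
p_prints_if_guilty = 0.9      # his prints on the weapon, given he did it
p_prints_if_innocent = 0.001  # his prints on the weapon, given he did not

print(posterior(prior, p_prints_if_guilty, p_prints_if_innocent))
```

Each of the three inputs is itself conditional on whatever evidence was assumed in setting it, so the output is conditional on that evidence too.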

Let me clarify: probability is a measure of our understanding of cause. Cause has four aspects: there is the formal cause, material, efficient, and final. The more we know of these the closer the probability is to 1. The less we know, the fuzzier the probability. As homework, do the Martian/Venusian example with respect to cause (hint: authors cause).

There’s a whole collection of articles on the subject here.

19 Thoughts

  1. You need the number and degree of probable causes.
    I get, “What’s the chance my car will make it…?”
    Good stuff. Sorry I’m not up to the math.

  2. “We disagree about that, for I say in our minds is exactly where probability is.”

    I’m not sure you disagree. Probability is in our minds in the sense that the probability we assign to proposition x depends on what we think is true (or, perhaps, what we just _suppose_ to be true for the sake of argument). But the degree to which some propositions support others is not a matter of what’s in my mind, and that’s Climenhaga’s point. “Degree of support” is an abstract concept, similar to logic; it’s an objective-but-non-physical relation between propositions. That sounds similar to your own view, but I might be wrong.

  3. Hello Dr. Briggs,

    I tried to post a comment several times and got error messages because your blog thought it was spam. Then I tried to E-mail you at the address listed in the error message and got a reply saying the message could not be delivered.

    Perhaps you can fish the comment out.

    Thanks,
    Nevin

  4. “if we knew the cause, then we don’t need the probability”

    I wonder. Consider coincidences. As Maritain explains, coincidences are events that lie at
    the intersection of multiple causal chains. The coincidence itself does not have a cause.
    Not everything has a cause. Only things that possess a certain unity have causes.
    But coincidences being coincidences don’t have causes.
    So, the assumption that Briggs often makes that randomness is mere ignorance of causes is not adequate. Something can be known to be random only when we know the true causal chains.
    Only then can we judge that the event is a coincidence.
    An example is provided by Maritain. A man feels thirsty and, finding no water at home, goes out to a well and is killed by a gang of robbers lying in ambush there.
    There are multiple causal chains that happen to intersect at a point but the point itself has no cause.

  5. Where would you get your propensities and degrees of confidence from except from
    observation of frequencies?

  6. Mactoul,

    Deduction. E.g., a six-sided object would have a probability of 1/6 for any side appearing on the next roll assuming all you knew was that it was six-sided and can be rolled.

  7. This is posted on behalf of Nevin Climenhaga. For no good reason I can discover, the spam filter eats his comments.

    Thanks for this post. A few thoughts:

    (1) You’re right that the concept of a “fair” coin brings with it a lot of baggage. Like you, I would see it as communicating information about the propensity of the coin to land heads being equal to the propensity of the coin to land tails, relative to the kind of information we usually have about the set-up of a coin toss (roughly, some basic macro-level facts about the environment in which it is tossed, but not micro-level facts which would allow us to deduce the outcome). You seemed to have some worries about this relating to my saying that P(first toss landed heads | 3 of the 4 tosses landed heads) = 3/4, but I didn’t follow your discussion on that point.

    (2) Here’s a thought experiment to suggest that probabilities cannot be in our minds. Imagine that at the beginning of the universe, God created two urns, one of which he would draw a ball out of. The color of the ball would determine whether or not God would create conscious minds. God flipped a coin to choose between the urns. If the coin landed heads, he drew a ball out of the urn with 2 black balls and 1 white ball (U1). If it landed tails, he drew a ball out of the urn with 1 black ball and 2 white balls (U2). If God then drew a white ball, he set things up for conscious minds to evolve. If he instead drew a black ball, he set things up so that conscious minds wouldn’t evolve. Call this information K, and call the proposition that conscious minds exist Minds.

    We can consider the following probability: P(U1|~Minds&K). Bayes’ Theorem lets us determine that this probability is 2/3 (proof omitted). But how can this probability exist only in our minds? The “premises” of the probability are incompatible with the existence of minds. (If you think God is a conscious mind in the relevant sense, alter the background information K accordingly.) So they are not something anyone could ever have as evidence. This probability would remain 2/3 even if minds had never existed. So it looks to me like it exists independently of us.

    (3) Other than that, our views on probability are quite close, as you say. You might be interested in my views on when Bayes’ Theorem should be applied to help us determine the values of probabilities. Like you, I think the order of learning is irrelevant and that Bayes’ Theorem has nothing to do with “updating.” But I think that we can codify the situations under which Bayes’ Theorem helps us break down some probabilities into more basic probabilities, and when other theorems will serve us better. Roughly, it has to do with the explanatory relations between our propositions — if H is explanatorily prior to E (e.g., H is a possible cause of E), we should apply Bayes’ Theorem. The full story is in my Structure of Epistemic Probabilities manuscript (currently under review at a journal) available here: https://sites.google.com/site/nevinclimenhaga/research.
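
Two of the numbers in this comment, the 2/3 in (2) and the 3/4 in (1), can be checked by enumeration. The sketch below is one way to do it in Python; it is not Climenhaga's own proof, which the comment omits. It assumes the coin that picks the urn is weighted 1/2 each way, that each ball in the chosen urn is equally likely to be drawn, and that all sixteen four-toss sequences are weighted equally. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# (2) The urns: joint weight of each (urn, ball colour) pair given K.
half = Fraction(1, 2)
joint = {
    ("U1", "black"): half * Fraction(2, 3),
    ("U1", "white"): half * Fraction(1, 3),
    ("U2", "black"): half * Fraction(1, 3),
    ("U2", "white"): half * Fraction(2, 3),
}
# ~Minds corresponds to a black ball having been drawn.
p_black = sum(w for (urn, colour), w in joint.items() if colour == "black")
p_u1_given_no_minds = joint[("U1", "black")] / p_black
print(p_u1_given_no_minds)  # 2/3

# (1) The four-toss coin: keep only sequences with exactly three heads,
# then ask in what share of them the first toss is heads.
sequences = list(product("HT", repeat=4))
three_heads = [s for s in sequences if s.count("H") == 3]
first_is_heads = [s for s in three_heads if s[0] == "H"]
p_first_heads = Fraction(len(first_is_heads), len(three_heads))
print(p_first_heads)  # 3/4
```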

  8. Like you, I view probability as a form of logic (I’d love to get your reaction to my use of this concept to resolve the Carter Catastrophe https://unobtainabol.wordpress.com/2018/09/14/the-probability-of-the-unknown/) but I’d like to suggest that such probabilities can’t be in the mind because abstract objects are timeless while mental objects exist in time (and physical objects exist in both time and space).

    Consider that you and I are each thinking about the same object, say the Empire State Building. There are some theories of metaphysics that would say that the object of our thoughts is entirely in our mind (because how can we bring a physical thing into the mind to think about it), but the problem with this theory is that when we talk about the Empire State Building, we are thinking about the same building. If we were only thinking about something in our minds, we would each be thinking about a different thing, our own mental state. The situation is similar for abstract objects: they must exist separately from minds or we could never both be thinking about the same object.

    You could try to come up with some relationship that holds between the object in your mind and the object in my mind that would justify talk of both thinking about the same thing even though they were different, but in this case, are you talking about the relationship in your mind or the relationship in my mind? At some point, you need an abstract, objective entity that we can both talk about.

    Oh, and a note about the propensity interpretation: the propensity interpretation tries to turn probabilities into physical properties of the sort that can figure into laws of nature, and the very idea of laws of nature fails for similar reasons that the propensity interpretation fails: that it doesn’t actually describe what happens. Nancy Cartwright (the philosopher, not the voice actress) wrote extensively about this if you are interested.

  9. Using Climenhaga’s words, one of the underlying themes of this blog appears to be “frequencies and probabilities are (not) the same thing”. I admit that after following along here for several years, and even having a go at “Uncertainty”, I just don’t get it. I still can’t separate the two in my mind.

    Climenhaga’s example of the self-destructing coin doesn’t help me. I think of this problem by assigning scores to heads and tails, and consider the mean of those scores. With only four coin flips, the estimate of the mean is going to have a large variance, large enough that I would say that the mean is indeterminate for any practical purpose. Applying Alder’s Razor, unless I can find more self-destructing coins to experiment with, discussing probability is pointless.

    Climenhaga’s jury example doesn’t help me either. How do you apply probability to a single event? If someone offered me triple-or-nothing on a coin flip, I would take the bet all day long. But if the stakes were raised to my retirement savings, I would decline – probability is no longer a relevant consideration, since I can only lose the bet once. If I could determine when the game ended, and I could write an IOU when I lose, then I’m back in the game again.

    My mind just can’t seem to make any sense of probability divorced from frequency. What are the odds of that?

  10. Milton, suppose Monty Hall shows you three doors and tells you that behind one of them is a great prize. You can pick a door, and if you pick the one with the prize, you get the prize. This only happens once, so there is no frequency associated with it, but still, it is a fact that your probability of picking the right door is 1/3.

    This is a physically meaningful statement, not an arbitrary assignment of a number. To see why, consider that the chance that you picked the wrong door is 2/3. You pick door number 1, and then Monty says, “Wait, before we open door number 1, let me offer you another option. Instead of taking door number 1, I’ll give you the choice of taking both of the other doors. If the prize is behind either one of them, you can have it, but if it is behind door number 1, you can’t have it.” Assuming Monty doesn’t cheat and was always going to offer you this alternative deal, what should you do? Clearly, the logical choice is to choose the other two doors because it doubles your chance of getting a prize.

    This is the sort of problem that you can solve with logical probability but not with frequentist probability, because it is a one-off event. There is no frequency. However, here is why the frequentist explanation is so appealing: if I try to justify to you that this choice is correct, the only way I have to do so is to appeal to potential frequency or virtual frequency. Obviously, I can’t state that you will win the prize if you follow my strategy, so what do I mean exactly by saying that it is better to choose the two doors? Well, what I mean is that if we were to carry out this experiment many times, then you would be better off choosing the two doors.

    So arguably, the frequentist theory of probability is a sort of model underlying logical probability, but it is potential frequency, not physical frequency.

    (Note that in the Monty Hall Problem, Monty distracts you by showing you what is behind one of the other two doors before offering you the alternative deal–and he always shows you a door with nothing behind it–but that’s just to confuse you. What is happening in fact is that he is offering you the two doors instead of the one door).
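
The 1/3 versus 2/3 claim can be checked without any repeated trials, simply by listing the cases. A minimal sketch in Python, assuming the prize is equally likely to be behind each door given what the contestant has been told, and that the offer to switch is always made; `doors`, `p_stay` and `p_switch` are illustrative names.

```python
from fractions import Fraction

doors = [1, 2, 3]
first_pick = 1  # the contestant's initial choice, as in the comment

# One case per possible prize location, each weighted 1/3.
weight = Fraction(1, len(doors))

# Staying wins only if the prize is behind the door first picked.
p_stay = sum(weight for prize in doors if prize == first_pick)
# Taking both other doors wins whenever the prize is not behind the first pick.
p_switch = sum(weight for prize in doors if prize != first_pick)

print(p_stay, p_switch)  # 1/3 2/3
```

No game is played here, not even once; the two numbers follow from the stated set-up alone.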

  11. David – that example helps. I like your term “potential” frequency. But I am still stuck mentally, unable to absorb the concept of probability divorced from frequency or potential frequency.

    I am an engineer – we think in terms of failure rate (when we ship a number of copies of a product) or MTBF (when we create a very few copies of something, such as a special or a test station). I am weak on the concepts behind MTBF calculations – maybe the answer lies there.

  12. Milton,

    Try assessing a problem that a Frequentist can’t answer. Like: what is the probability the Lakers will win their next game? Since no two games are identical, there is no frequency to consider. The answer is purely a level of confidence.

    Yes, it’s possible to look for assumed similarities to previous games but that’s changing the problem.

  13. DAV – I’ll have to ponder on that example. I want to ask myself “why would I care?”, and my answers to that question lead me back to frequency.

    This brings to mind the old Star Trek series, where every other episode Spock says something along the lines of “Indeed, Captain, I estimate our chances of success at one in (a large but oddly precise number).” I guess probability works differently in space; it must be some sort of relativistic effect.

  14. “I admit that after following along here for several years, and even having a go at “Uncertainty”, I just don’t get it. I still can’t separate the two in my mind.”

    I would say it’s because you see the reality and not the mathematics.

    The reason is that more refined probability is only slightly better. Guilty of the same basic error.

    The future is unknowable. Proving that one gives less uncertainty than the other is just an exercise in forgetting.

    To a lay person (me), tossing a coin over and over again to show that the chances are 1/2 is not a redo of time and space. It is simply the experience of repeated coin tosses, with a small number of outcomes to narrow and simplify the problem, which remains unsolved.

    You or I or anyone has no certainty, just an idea whether the coin will be heads or tails (given the fair coin proviso).
    People are fooling themselves. Not just the frequentISTS.

    One of the biggest wind ups is straining a very tiny brain to understand why something that’s patently wrong is so. To be told it’s wrong by a fat brain and then told that fat brains are needed to tell tiny brains what they already knew! Fat brains are better at sophistry.

    No brains were hurt in the writing of this comment. Even if you’ve got a fat brain.

  15. Milton,
    I want to ask myself “why would I care?”, and my answers to that question lead me back to frequency.

    Sorry but I’m not going to hunt for a problem in which you care. Also not sure why caring leads back to frequency.

    In the Lakers example, suppose you somehow and magically conclude they have a 40% chance of winning their next game. That certainly doesn’t have any frequency since the next game along with its circumstances (players, crowd size, whatever) will occur only once. There certainly won’t be an infinite succession of identical games, which a frequentist outlook requires.

    Say the reason for evaluating the outcome is to place a bet. You can’t say you will win that bet 40% of the time (what is 40% of one?). But you could say you are certain enough in the outcome to bet 40% of your money.

    Another example: what is the probability that a manned mission to Mars will be attempted next year? Suppose you conclude it is X%. Whatever would that mean in terms of frequency?

    It’s best to think of probability as a measure of certainty instead of frequency.

  16. There are lots of similar examples.

    What is the probability Trump will win the 2020 presidential election?
    What is the probability Beto O’Rourke will?
    What is the probability X will?
    How would one express those as frequencies?
    There will be only one 2020 presidential election and it will never be repeated.

    If probability is a level of certainty then one can say Trump will win with probability of W%, which would be a statement of certainty in outcome. Certainty in outcome always makes sense, while frequency of outcome not so much.

  17. Indeed, Captain, I estimate our chances of success at one in [N].

    That’s just expressing probability as odds. You might want to think of odds as a ratio of outcome counts, but they could just as easily be a statement of fair value, meaning: if you bet $1 you should expect to receive $N for a fair bet (or more precisely $N+1 since you shouldn’t lose your $1 bet).
