Chain Of Argument. Ithaca Teaching Journal, Day 3

One thing can be proven with near certainty: do not stay up late the night before you must write an article on a subtle philosophical topic. I cannot prove that such a course will always be deleterious, but I can say with certainty that it sometimes is.

The truth of that sentence (where I moved from near certainty to certainty itself), like that of all sentences, follows from a chain of argument. The conclusion is that a dissipated night will sometimes lead to an inability to write coherently. That conclusion is certain, but only with respect to the evidence: a particular observation that such a night resulted in at least one morning in which a blog author’s writing was strained.

All statements of knowledge, hence of probability, are conditional. That is, no statement is true except with respect to premises which support it. No statement (or event) has a probability without conditioning that statement on some evidence.

Thus, the truth of “Socrates is mortal” depends on what evidence supports or does not support that conclusion. If my evidence is that “All men are mortal and Socrates is a man,” then the statement is true. But if my evidence is “Some men are mortal and Socrates is a man,” the statement is no longer true but only probable.

It does not matter whether the evidence itself is true—and it can only be true with respect to other premises or evidence—it only matters that we accept or assume the evidence is true. This makes the fun with counterfactuals possible.

A counterfactual is a statement which is known (or assumed) to be false conditional on certain evidence. The counterfactual statement itself is assumed true as evidence in another chain of argument. For example, I might argue that “If Hitler did not invade Russia, Germany would have won World War II.” The counterfactual statement “Hitler did not invade Russia” is known to be false conditional on the premise that “Hitler invaded Russia.”

That last statement itself is true conditional on other evidence: books, memories, etc. And we accept the evidence (we assume this evidence true) of those books and memories based on yet other premises or evidence, which in turn are true based on earlier premises, leading in a chain all the way back to our most fundamental beliefs which are true conditional only on our intuitions (see the Day 1 post for more on this).

Thus, if I ask you what the probability is that a die will land a six, you cannot answer unless you first assign or assume true certain evidence. This is usually “I have a six-sided object, just one side of which is labeled six, which will be tossed, only one side showing when tossed.” Given that, the conclusion that the “object will land with a six showing” is still not true, but only probably true, a probability we can quantify with the number 1/6.

You will often see probability treated as if it were unconditional, as if certain events had a fixed numerical value without respect to any evidence. This cannot be. For example, if I change a portion of my evidence to “just two sides of which are labeled six” the probability the conclusion is true changes to 2/6. Change the evidence and you change the probability, just as the truth of “Socrates is mortal” changed when we changed the evidence about the mortality of men.
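
The die example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `probability` and the encoding of the evidence as a list of side labels are hypothetical, and each side is given equal weight because the stated evidence offers no reason to favor one side over another.

```python
from fractions import Fraction
from typing import Optional, Sequence


def probability(statement: int, evidence: Optional[Sequence[int]]) -> Fraction:
    """Pr("the side showing is labeled `statement`" | evidence).

    `evidence` lists the labels on the sides of the tossed object,
    exactly one of which shows. Equal weight per side is an assumption
    made for this sketch.
    """
    if not evidence:
        # No premises, no probability: the question cannot be answered.
        raise ValueError("a probability needs evidence to condition on")
    favorable = sum(1 for label in evidence if label == statement)
    return Fraction(favorable, len(evidence))


one_six = [1, 2, 3, 4, 5, 6]    # "just one side of which is labeled six"
two_sixes = [1, 2, 3, 4, 6, 6]  # "just two sides of which are labeled six"

print(probability(6, one_six))    # 1/6
print(probability(6, two_sixes))  # 1/3, that is, 2/6 in lowest terms
```

The statement being assessed is the same in both calls; only the evidence argument differs, and the answer changes with it. Calling the function with no evidence raises an error rather than returning a number.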

Nothing could be more obvious in logic, where we never ask whether an isolated statement is true or false without explicitly detailing our premises, but conditionality is often surprising in probability, perhaps because it is often thought that probability is a separate subject from logic. It is not. Probable statements are no different than logical statements, except that the former have quantifications (of being true) different than 1 or 0.

In advanced books on probability there will usually come a section called “Conditional Probability” which implies that all the probabilities that came before are “unconditional.” Again, this is not so, but there are certain quirky cases that require sophisticated mathematical techniques that are called, in a technical sense, “conditional probability.” These quirks are never met in real life, just in the arcane mathematical world of uncountable infinities.

But even some basic books treat probability as unconditional to some extent. As another proof that this cannot be, consider the probability of this sentence (or others like it), “A certain device takes a given number of states.” What is the probability that “This device takes state UUON”? You cannot give an explicit quantitative answer without first conditioning on more premises which relate to the states the device takes.

This is why it is better to write probability like this:

Pr( S | E )

which reads, “The Probability that S is true given the evidence (or premises) E” and where E might not be true with respect to other premises, but is assumed true for this calculation.

The bar, “|”, just means “given”; it is often left out in elementary texts, but doing so leads to confusion and error.
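
As a worked illustration, the examples from this post can be written in the same notation. The labels E_1 and E_2 are introduced here only for convenience: E_1 stands for the evidence “a six-sided object, just one side of which is labeled six, will be tossed, only one side showing,” E_2 for the same evidence with “just two sides” labeled six, and S for “the object will land with a six showing.”

```latex
\Pr(S \mid E_1) = \tfrac{1}{6}, \qquad \Pr(S \mid E_2) = \tfrac{2}{6}

\Pr(\text{Socrates is mortal} \mid \text{All men are mortal and Socrates is a man}) = 1
```

The statement to the left of the bar is unchanged between the first two expressions; only what sits to the right of the bar differs. The third line is the ordinary logical case, where the quantification is exactly 1.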

Categories: Statistics

16 replies »

  1. Probability is made easier for the learner if the teacher is precise in his English. So rather than “If Hitler did not invade Russia..” I’d say “If Hitler had not invaded Russia….”.

  2. It is good to draw attention to the importance of evidence but I would be interested to know what you think of the comment by Ken in a previous post that probability concerns the future and statistics the past. I think he has a good point. For example, while Socrates was alive, the conclusion that Socrates is mortal because he is a man and all men are mortal was logically correct but it was a prediction, and its truth was dependent, as you say, on the truth of the premises i.e. on the evidence available. In contrast, after his death, the status of Socrates’ mortality changed from being a probability to being a statistic – i.e. Socrates was mortal because he died – a statement which is dependent on other evidence quite independent of the original premises, and no logic is needed. So does it really make sense to talk of the probability of past events? – either they happened or they did not.

  3. The probability that “what goes up must come down” is true was 1.0 for thousands of years based on evidence. Then we launched the Voyager 1 satellite. Based on the new evidence the probability that “what goes up must come down” is slightly (very slightly) less than one.

    How would you characterize changes in the evidence based probability that “the sun rotates around the earth” over time? Was it ever 1.0?

  4. StephenPickering:

    I think that Briggs would argue that your assumptions have simply changed from:

    P( S. is Mortal | S. is a man, Men are mortal ) = 1

    to…

    P( S. is Mortal | S. died ) = 1

    Obviously, you could include the original premises, but they’re not really necessary once he went and did his mortal deed. It’s still a probability, even though the value is trivially 1.

  5. Matt:

    The probabilities could indeed be written as you suggest, but I think this would be stretching the meaning of probability a bit too far. As you say, its value is trivially =1. Briggs said, and I agree, that probability is a branch of logic. However, the restatement based on the historical evidence of Socrates’ mortality does not involve any logical argumentation, and so I would not think of it as a probability of the kind used within logic.

  6. Statistics involves collecting, summarizing, and making inferences from ascertained data. (One can find this definition in almost all introductory books.)

    It’s correct that either something happened or it didn’t. However, when making statistical inferences/generalizations/predictions about the population of interest, the future, or some unknown properties based on the ascertained evidence, one would rely on the probability and statistical modeling!

  7. Speed,

    The premise “What goes up must go down” is hopelessly vague. What is going up? What is up? What is down? If you say that your premise is
    “All objects above the surface of the earth at a specified distance and in air at a specified density and with no wind and with no obstructions between the object and surface fall to the surface. An apple in Central Park is an object which meets these criteria” then the conclusion “The apple falls to the surface” has probability 1. This is so even if we discover wormholes, roving black holes, and on and on.

    Your duty is to always list specific premises, not vague ones which can be multiply interpreted. What you have done is changed the premises midstream and demanded the probabilities remained constant (which they won’t except coincidentally).

    Matt/Stephen,

    I agree with Matt’s second probability.

  8. P( S. is Mortal | S. died ) = 1

    This is simply the definition of mortal.

    P(Briggs will die | Professor Briggs is a Man) = 1

    Is only slightly more interesting, and rests on the premise that all men are mortal.

    P(there exist immortal men) * P(Briggs is a member of the immortals | immortals exist) < e

    Where e is arbitrarily small.

  9. A probability is a ratio, and the correct choice of the denominator is not always an easy matter. I understood Speed’s comment to be a question about what evidence is appropriate for use as the denominator. This is a very good question – observations of apples in Central Park are of no relevance in predicting the performance of Voyager 1, for example.

    Similarly, what body of evidence should we be prepared to accept in support of the premise ‘all men are mortal’? The complete public records of births and deaths in the US would certainly yield a large amount of data, but is it relevant to the case of Socrates who lived 2500 years ago in Greece? In fact, if we are willing to entertain the possibility that some (very few) men might be immortal, then the most accurate body of information relevant to the case of Socrates is that comprising just Socrates himself. What has the mortality of other people to do with the mortality of Socrates? It would be begging the question to answer that all men are equal concerning their mortality, and moreover if that were the case then it would suffice to observe just 1 case of mortality (e.g. Socrates) to conclude that: Socrates is a man, S is mortal, therefore all men are mortal.

    All knowledge is conditional, but above all it is conditional on the appropriate selection of information for use as the reference class. Speed had a good point and I think Briggs has not addressed it properly.

  10. Stephen,

    Probabilities are sometimes ratios, but not always.

    The premise “All men are mortal” need not be true, but only assumed true. For example, “Just half of all Martians are bald. Joe is a Martian.” The probability of “Joe is bald” given these—and no other—premises is 0.5. Which is also not a fraction: you have no denominator, and there are no Martians. But I can only say “there are no Martians” because I accept other premises which lead to this belief.

    We do not need to prove the truth of the premises when we accept them as true.

  11. Briggs:

    I would have thought that “half” as in “half of all Martians are bald” is a fraction i.e. 1/2 of them are bald. You say 0.5 is not a fraction, but I don’t see how you can obtain such a decimal probability value without dividing a numerator by a denominator i.e. 1/2. I would maintain that a denominator is necessary to calculate the (decimal) probability from the information provided, and therefore that it is not correct to say that there is no denominator. The lack of Martians has no bearing on the matter – I suspect you are being disingenuous.

  12. StephenP,

    Here is what I think Mr. Briggs is saying.

    The question is whether probabilities of conditionals are conditional probabilities. Based on the familiar calculus of probability, the definition of a conditional probability involves a numerator and a denominator, and is the ratio of the joint probability over the marginal probability.

    For example, let premise= “Just half of all Martians are bald. Joe is a Martian,” then
    P(Joe is bald | premise) = P(Joe is bald and premise) / P(premise).

    However, the definition is only well-defined provided that the P(premise) is not zero.

    If you agree that P(premise) = 0, (there are no Martians since we don’t live in the Total-Recall world), then the ratio definition can’t be applied to derive the probability P(Joe is bald | premise). Hence, not a ratio.

    I could be wrong here since I am not Mr. Briggs.

  13. StephenPickering:

    I suspect Briggs’ rejection of the use of “fraction” or “ratio” was a knee-jerk anti-frequentist remark rather than a computational statement. Even Bayesians use division, IIRC. 🙂

  14. Probability is a ratio?
    Probability is sometimes a ratio?
    There are 48 men and 52 women.
    The ratio of men to women is 0.92.
    The probability that a selected person in the group is a man is 0.48.

  15. “A probability is a ratio, and the correct choice of the denominator is not always an easy matter.”

    The interval [0,1] contains irrational numbers. Given the continuum, a probability need not be a ratio. At least from a theoretical standpoint.
