William M. Briggs

Statistician to the Stars!


Lesson Five or Six: Abnormality

Say! What happened to lessons three through four or five? Who knows. This morning, I’m dreadfully rushed, so just a sketch. I do not expect anybody to be convinced this fine day.

Where were we?

Suppose I’m interested in the ages (in whole years) of my blog readers. Now, except for about three or four exceptions, I don’t know these ages, do I? Which means I’m uncertain, and thus I’ll use some kind of probability model to quantify my uncertainty in these numbers.

In some cases, I can supply premises (evidence, information) that allow me to deduce the probability model that represents my uncertainty in some observable. This applies to most casino games of chance.

But most times I cannot find such evidence. That is, there do not exist plausible premises that allow me to say that a certain probability model is the probability model that should be used. What to do? Why, just assume for the sake of argument that I do know which probability model should be used! Problem solved.

Most times, for anything that resembles a number (like ages), a normal distribution is used. This is usually done through laziness, custom, or because other choices are unknown. Before I can describe just what assuming a probability model does, we should understand what a normal distribution is.

It is the bell-shaped curve you’ve heard of, and it gives the probability of every number. And every number is just that: every number. How many are every? Well, from all the way out to negative infinity, progressing through zero, and shooting off towards positive infinity. And in between these infinities, are infinite other numbers. Why, even between the interval 0 and 1 there are an infinite number of numbers.

Because of this quirk of mathematics, when using the normal to quantify probability, the probability of any single number is zero in all problems (not just ages). That is, given we accept a normal distribution, the probability of seeing an age of (say) 40 is precisely zero. The probability of seeing 41 is zero, as is the probability of seeing 42, 43, and so on.

As said, this isn’t just for ages: the probability of any number anywhere in any situation is zero when using normals. But even though the probability of anything happening is zero, we can still (bizarrely) calculate the probability of intervals of numbers. For example, we can say that, given a normal, the chance of seeing ages between 40 and 45 is some percent; even though each of the numbers in that interval can’t happen.
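A minimal sketch of that oddity in Python. The mean of 45 and standard deviation of 15 are invented for illustration (nothing above fixes them), and `normal_cdf` is a hypothetical helper name, built on the standard library's `math.erf`.

```python
import math

def normal_cdf(x, mean, sd):
    """Pr(X <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Assumed (made-up) parameters for reader ages.
MEAN_AGE, SD_AGE = 45.0, 15.0

# Probability of an interval of ages: a definite, positive number.
p_40_to_45 = normal_cdf(45, MEAN_AGE, SD_AGE) - normal_cdf(40, MEAN_AGE, SD_AGE)

# Probability of exactly 40: an "interval" of zero width.
p_exactly_40 = normal_cdf(40, MEAN_AGE, SD_AGE) - normal_cdf(40, MEAN_AGE, SD_AGE)

print(f"Pr(40 <= age <= 45) = {p_40_to_45:.4f}")
print(f"Pr(age == 40)       = {p_exactly_40}")
```

Under these assumed parameters the interval gets a positive probability, while the single age of 40 gets exactly zero.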

Somewhat abnormal, no? It’s still worse because, as said, normals give probability to every interval, including the interval from negative infinity to zero. Which in our case translates to a definite probability of ages less than 0. It also means that we have positive probability to ages greater than, say, 130. An example later will make this all clearer.
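As a rough illustration of those impossible ages, the sketch below computes the two tail probabilities in Python. The mean (45) and standard deviation (15) are again made-up values, and the helper names are hypothetical; a different choice of parameters changes the sizes of the numbers but not the fact that both are greater than zero.

```python
import math

def normal_cdf(x, mean, sd):
    """Pr(X <= x) under a normal distribution."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

def normal_upper_tail(x, mean, sd):
    """Pr(X > x) under a normal distribution (erfc keeps tiny tails accurate)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

# Assumed (made-up) parameters for reader ages.
MEAN_AGE, SD_AGE = 45.0, 15.0

p_below_zero = normal_cdf(0, MEAN_AGE, SD_AGE)          # ages less than 0
p_above_130 = normal_upper_tail(130, MEAN_AGE, SD_AGE)  # ages greater than 130

print(f"Pr(age < 0)   = {p_below_zero:.6f}")
print(f"Pr(age > 130) = {p_above_130:.3e}")
```

Both results are small, but neither is zero, which is the complaint.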

The main point: the normal stinks as a vehicle to describe uncertainty. So why is it used? Because mathematicians love mathematics, and because of a misunderstanding of what statisticians call the central limit theorem. That theorem says that, for numbers drawn independently from a common source with finite variance, their suitably standardized averages converge to a normal distribution as the sample size grows to infinity.

This theorem is correct; it’s mathematics precise and true. But not all mathematical constructions have any real-life applicability. Anyway, the central limit theorem is a theorem about averages, not actual observations.

Plus we have the problem that we’re not interested in averages of the ages, but in the ages themselves. Another problem: I don’t (sad to say) have infinite numbers of readers.
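The gap between averages and observations can be seen in a small simulation. This is only a sketch: the list of ages is a hypothetical, deliberately lopsided readership, and the sample size of 100 and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical readership: whole-year ages, deliberately not bell-shaped.
ages = [random.choice([18, 19, 20, 21, 22, 35, 50, 65, 80]) for _ in range(10_000)]

def sample_average(n):
    """Average age of n readers picked (with replacement) from the list."""
    return statistics.fmean(random.choices(ages, k=n))

# Many averages, each over 100 readers.
averages = [sample_average(100) for _ in range(2_000)]

print("distinct individual ages:", len(set(ages)))
print("mean of ages:            ", round(statistics.fmean(ages), 2))
print("mean of the averages:    ", round(statistics.fmean(averages), 2))
print("sd of ages:              ", round(statistics.pstdev(ages), 2))
print("sd of the averages:      ", round(statistics.pstdev(averages), 2))
```

The averages bunch tightly around the overall mean, as the theorem suggests they should, yet the individual ages remain a handful of whole numbers that look nothing like a bell curve.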

Yet it is inescapable that normal distributions are used all the time everywhere, and it is said that they can sometimes give reasonable approximations. Both statements are true. They are ubiquitous (I almost wrote iniquitous). And they can give reasonable approximations. It’s just that they often do not.

We have to understand what is meant by “approximation”. This is tricky; almost as tricky as viewing probability as logic for the first time.

Now, based on my knowledge that ages are in whole years, and that nobody can be less than 0, and that nobody can be of Methuselahian age, the probability that any pronouncement I make using a normal distribution about ages is true is exactly 0; which is to say, it is false. This means I know with certainty that I will be talking gibberish when I use a normal.

Unless I add a premise which goes something like, “All pronouncements will be roughly correct; but none will be exactly correct.” And what does that imply? Well, we shall see.

(Fisher, incidentally, knew of the problems of normals and warned users to be cautious. But like his warning about over-reliance on p-values, the warning was quickly forgotten.)


Lesson Two Again and Again: Logic is Not Opinion

Everybody: thanks for the emails. I do see them. I’ll be answering all on the weekend.

Would it surprise you to learn that to graduate with a degree in statistics—BS, MS, or PhD—you are nowhere required to take courses in epistemology, logic, mathematical logic, and so on? Did you know that this is also true for graduates of physics, chemistry, biology, and for most other sciences?

Oh, there are minor exceptions, but usually in the form of ethics classes, and only required of those who deal in subjects like medicine. Too, some voluntarily take a class or two in these areas. But the philosophy the majority of scientists assimilate is by osmosis; and that consisting of generally misunderstood or misremembered fragments of Popperian falsifiability, the ability to identify (usually) common fallacies, and recognizing that bit about Socrates.

Which is this:

      (P1) All men are mortal
      (P2) Socrates is a man
      (C) Socrates is mortal

Yesterday, we said that given (P1) and (P2), the probability of (C) is 1. The conclusion is only true with respect to the information provided in the premises. Thus, logic is, as was suggested in comments, a measure of information.

We are not free to dispense with either of the given premises, nor can we implicitly use others that are not given. Think of the argument like a mathematical theorem (which, in a real sense, it is): we would never dispense with or add to the conditions of a theorem, and we cannot do so here. Logic is, of course, the study of why arguments like this are valid; and why arguments like “Given (P2) alone, (C) is true” are not.

We can put this old argument in a more abstract form, which (somewhat against intuition) might help.

      (P1) All F are G
      (P2) X is an F
      (C) X is G

Looks a lot more like math now, doesn’t it? (C) follows in a now familiar, analytical sense from (P1) and (P2). Further, we can see that we cannot dispense with either premise. And we are free to substitute other F, G, and X! It turns out (this is not of main interest here) that we have to put restrictions on what’s allowable for F, G, and X—but this kind of thing is also true in mathematics, where we often start theorems restricting the universe to certain kinds of numbers.

Stating the argument in this form allows us to remove distractions: it is drier and less emotional.

But let’s look at that argument a little more closely; specifically, examine (P1), which is “All men are mortal”. Is that true? Well, nothing can be true or false without reference to something. Given our experience, (P1) has been true so far. But will it always be true? Saying so is to make an inductive argument; and based on our knowledge of logic, we know that all inductive arguments give us conclusions that are not certain; which is, that they do not have conclusions with probabilities of 1 or 0, but somewhere in between.

(P1) is contingent: it might be that sometime in the future, the universe might be such that (P1) is false. But regardless whether (P1) is true, false, or something in between, accepting it is true (“for the sake of argument”) our conclusion (C) follows. Get it? It doesn’t matter whether (P1) is true or false or merely probable with respect to other information: all that matters is that we accept it as true in our argument.

Now, if we change (P1) to align with our experience, then we have

      (P1) So far, all men have been mortal
      (P2) Socrates is a man
      (C) Socrates is mortal.

The conclusion no longer follows deductively; it is now an inductive argument, and the conclusion is only probably true. This also happens if we change our “analytic” form:

      (P1) So far, all F have been G
      (P2) X is an F
      (C) X is G.

Once more, (C) does not follow deductively; it is only probably true.

Logic is a measure of information, a special case where probabilities are 0 or 1. Probability, then, is also a measure of information. We can modify our argument again:

      (P1) Most F are G
      (P2) X is an F
      (C) X is G

We can now see that the probability (C) is true given (P1) and (P2) is not 0, nor is it 1; it is somewhere in between. Further, because of our knowledge of the word “most”, we can say that the probability (C) is true given (P1) and (P2) is greater than 0.5 but less than 1.
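A small sketch of that bound in Python. It adds an assumption the premises do not contain, namely a definite count `N` of F's (10 here, chosen arbitrarily), and it reads “most” as “more than half but fewer than all”.

```python
from fractions import Fraction

N = 10  # hypothetical number of F's; not given by the premises

# "Most" read as: more than half, but fewer than all, of the F are G.
possible = [Fraction(k, N) for k in range(N + 1) if N / 2 < k < N]

for p in possible:
    print(f"Pr(X is G | {p.numerator * N // p.denominator} of {N} F are G) = {p}")
```

Whatever the count of G's turns out to be under this reading, the resulting probability sits strictly between 1/2 and 1.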

As before, we can’t let every conceivable X, F, or G, into the picture; but the schema is still useful for its illustrative purposes.

Quick update: Try substituting for X, F, and G, “Bob”, “Winged Horses”, and “Clever”.


Lesson Two Redux: More Mysticism

Is it written into sports announcers’ contracts that they shall speak in nothing but cliché?

Since there is always great confusion about why premises about “fairness” or “randomness” are not needed, we had better cover it in a main post.


    (P1) I have a six-sided object, just one side of which is labeled ‘6’.
    (P2) Upon tossing, only one side will show
    (C) A ‘6’ will show.

The conclusion is not certain with respect to the premises. But we can say

    Pr( C | P1 & P2 ) = 1/6.
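One way to picture the deduction is as a count over the sides named in the premises. The sketch below is an illustration, not a derivation: the labels on the other five sides are placeholders, since the premises say nothing about them.

```python
from fractions import Fraction

# (P1) A six-sided object, just one side of which is labeled '6'.
# The other five labels are placeholders: the premises are silent about them.
sides = ["6", "not-6", "not-6", "not-6", "not-6", "not-6"]

# (P2) Only one side will show, so the possible outcomes are exactly the sides.
favorable = sum(1 for side in sides if side == "6")
pr_six = Fraction(favorable, len(sides))

print(pr_six)  # 1/6
```

Nothing in the code mentions fairness, symmetry, or randomness; only the information in (P1) and (P2) is used.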

Many would like to add this premise:

    (P3) The die is “fair.”

But that is equivalent to

    (P3) Each side of the die is equally likely to show,

which is the same as saying

    (P3′) The probability of a ‘6’ is 1/6.

I can write

    Pr( A ‘6’ will show | P1 & P2 & The probability of a ‘6’ is 1/6) = 1/6.

and since I do not now need (P1) or (P2), I can write

    Pr( A ‘6’ will show | The probability of a ‘6’ is 1/6) = 1/6.

Or in plain English, “The probability a ‘6’ will show given the probability a ‘6’ will show is 1/6 is 1/6.”

This is a fine argument. It is valid. That means the conclusion has been deduced with certainty from the premises. Arguments which have conclusions that are deduced are the strongest there are, and isn’t that glorious.

But the argument is circular: its premises contain the conclusion, the thing we wanted to prove. Aristotle showed us that this argument is the same as saying, “A, therefore A,” where A is any statement.

So give up on (P3), because it is not needed. Instead, think carefully about (P1) and (P2); think hard about what they do not tell us. What words are there in (P1) that tell us that the sides are symmetric, that they are weighted equally, that they are of the same substance, that they have the same friction, or the same of any other probative factor?

Not one word. Too, there is not a shred of evidence that any of these things—and of an infinite number of other things—are such that we should consider them.

There isn’t any word in (P2) about the gravitational field in which the six-sided object—we do not know it is a cube, nor that it is a die—will be tossed, nor about the amount of spin or other force imparted to the object, nor about viscosity of the air in which it will be tossed, nor do we know that there will even be air! We do not know about the elasticity of the surface upon which the object will land, nor about that surface’s roughness. Nor do we know about an infinite number of other things, each of which could influence the object showing a ‘6’.

If there are 12 Johnstons and 18 Freihaufs in a room and you will select the one closest to the door, then, given this evidence and no other, there is a 60% chance that you will select a Freihauf. Just as above, we do not need to add any evidence that the people in the room are, perhaps by polkaing, distributed “randomly.”

Given only the evidence we have—and no other tacked on imaginatively—we have no idea where anybody in the room is. It is the lack of information that makes the outcome “random”, which is to say, unknown.
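The arithmetic behind the 60% is short enough to write out. A minimal sketch, using only the two counts given in the room example; the variable names are made up.

```python
from fractions import Fraction

johnstons, freihaufs = 12, 18

# With no information about where anybody stands, the only evidence is the counts.
pr_freihauf = Fraction(freihaufs, johnstons + freihaufs)

print(pr_freihauf, float(pr_freihauf))  # 3/5 0.6
```

No premise about how the people came to be arranged appears anywhere in the calculation.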

The problem we have with these types of arguments is that we cannot help but add premises: the stated ones are never enough; we are always greedy for more. Adding (unstated) premises is like a mathematician changing axioms mid-proof to suit him so that he gets the result he desires. This behavior might be because the above arguments sound like situations in which we have a lot of experience. But here is another example, adapted from Stove, which shows that we should consider arguments as given.

    (P1) Just half of all winged horses are yellow.
    (P2) Bob is a winged horse.
    (C) Bob is yellow.

Here, everybody always agrees that the probability that “Bob is yellow” given just (P1) and (P2) is 50%. We don’t feel the need to talk about “opaque bags” of winged horses, with half of an infinite amount of them yellow, and with the other half some other color or colors. We don’t have the gut feeling that if we were to know that that probability is 50%, that we need to repeat a “random” experiment with winged horses an infinite number of times.

This, obviously, is because there are no winged horses that we know of, yellow or not. Thus, we are able to tackle the argument in its intended sense.

“OK, Briggs. Maybe. But what about real dice?” Well, as to that, stick around.


Lesson 2: Evidence and the Mysticism of Randomness

We made rather merry at the Chapter House last night, an annual tradition. Class begins in two hours. I can still hear the vuvus. Sheesh.

Statements about the unknown made with reference to the known are common enough. The example David Hume gave us is, since all the many flames I have observed before have been hot, I expect the next flame will be hot.

That kind of inference is inductive. The outcome itself—the future flame being hot—is not guaranteed by the universe to be true. By which I mean, there is no set of true premises known to us that allows us to deduce the statement “the next flame will be hot.” It could be, for example, that there exists some mystery physics which will come into play with the next flame, causing it not to be hot.

A deductive inference known to everybody (from yesterday) starts with the premises, “All men are mortal, and Socrates is a man.” We supply the conclusion of interest. If we supply, “Socrates is mortal”, then our intuitions and the given premises tell us the probability of the conclusion is one.

But if we supplied “Socrates the immortal man has the flu” then given just our stated premises we cannot say anything about the probability of this conclusion, because, based on our knowledge of common English words and names, the conclusion is not related to the premises.

In this situation, it is easy to see that the conclusion and premises are unrelated; but of course it isn’t always so easy to gauge the interrelatedness of premises and conclusions. Just think of politics to know what I mean.

Supplying reasonable premises—which are nothing but data, models, and judgments by another name—and relevant conclusions is the basis of all probability modeling. Two people with different premises (models, data) but the same conclusion can come to different probabilities of the truth of that conclusion.

Both of those probabilities are correct, given the premises. It is always the premises—the evidence—that are in dispute.

The important thing to take away is that the truth, falsity, or probability of any statement cannot be made without reference to something. We need some evidence, premises, information upon which to condition any statement.

The probability of a die showing a ‘6’ is not “1/6”; but the probability of a die showing a ‘6’ given that the die is a six-sided object with just one side labeled ‘6’ and that only one side will show, is 1/6.

This probability, given these premises, has been deduced. If you want to bring up questions about this or that real, physical die, well, then you are adding premises, different evidence. If all—which means everything—you know is that you have a real, physical die with six sides, just one of which is painted with six dots, then the probability of the conclusion is 1 in 6.

If you’re interested whether this next toss will result in a six, well, that’s the same thing. If you’re interested in how many times a six will show in the next dozen tosses—its relative frequency—then that is easy to compute, too (we’ll learn how tomorrow).
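For the impatient, one standard way to do the dozen-toss computation is the binomial formula. That choice is an assumption, since the method is not named here, and it carries an extra premise: the same evidence applies to every toss, and learning the result of one toss does not change the probability for another.

```python
from fractions import Fraction
from math import comb

N_TOSSES = 12
P_SIX = Fraction(1, 6)  # deduced from the premises about the six-sided object

def pr_sixes(k):
    """Probability of exactly k sixes in N_TOSSES tosses (binomial formula)."""
    return comb(N_TOSSES, k) * P_SIX**k * (1 - P_SIX) ** (N_TOSSES - k)

distribution = {k: pr_sixes(k) for k in range(N_TOSSES + 1)}

for k, p in distribution.items():
    print(f"{k:2d} sixes: {float(p):.4f}")
```

The probabilities over all thirteen possible counts sum to exactly 1, and under these assumptions the single most probable count is two sixes.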

If you add the premise that “Something might be wrong with this die; something which subtracts from its symmetry”, then this is equivalent to “The die is asymmetric or it is not.” That is a tautology and is always true, no matter what the state of the universe and no matter whether your real die is symmetric or not.

Tautologies add no information; they cannot change the probability of any conclusion. Pause and think hard about this, because this important fact confuses. If we want to learn about this die we have to change the argument, especially the conclusion. We also need to add premises in the form of observations of actual die tosses. We’ll learn how to do that later, too.

As we have long discussed, we do not need to add any premises about “randomness” or “fair dice” to our premises. Random only means “unknown” and nothing more. Of course the conclusion is not certain: we cannot deduce a ‘6’ will show (without adding premises). The outcome is uncertain, which means it has a probability greater than zero but less than one.

Adding words about a “fair” die is equivalent to adding “the probability of a ‘6’ is 1/6” to the list of premises. This is not wrong, but it makes the argument circular. It puts the conclusion in the premises, from which we can deduce the conclusion.


© 2015 William M. Briggs
