# Lesson 1: Limitations and Logic

The internet connectivity here at the Statler almost reaches the category of stinks. Responses will be even slower than usual in appearing.

Act like a Freshman should and crack the spine of any introductory statistics and probability (or probability and statistics) textbook. The one mandated at one location at which I taught was enormous (as all college texts are nowadays), but this book was embossed with a pair of faded blue jeans onto which, if memory serves, was embroidered a normal distribution.

Presumably, the publisher and the committee who approved this work felt that students would appreciate learning about statistics, or would at least attack their studies more zealously, as long as the book they used featured the same poor clothing choice that its readers routinely make.

This book, and every other I have ever seen, begins with the topic “Measures of central tendency” or “Ways to summarize data” or something similarly named. The student is, before anything else, told how important data are, that a mean is a numerical average and that a median usually is not.

This is an enormous mistake. Not only are data not of paramount importance to statistics, but there is no justifiable reason to begin talking about them until you have first discovered a reason why data matter at all. Just why is it important to calculate a mean? Why collect data in the first place? Can you do probability and statistics without data?

Of course you can.

But there came to be an apparent—not a real—division between “probability” and “statistics”. Probability was thought to be entirely mathematical, and statistics meant “working with data.” This division is arbitrary and unhelpful.

Mathematics has been useful to probability and statistics, but neither of those fields is, or should be, primarily mathematical. Not at the operational level, at least. It is well enough to create a specialized, academic probability which is nothing but mathematics, but this creation must not then be used to argue that all of statistics is best seen as mathematics.

There are four points to this digression. (1) There is no division between probability and statistics. They are one and should be treated as such. I will call the subject “probability.” I leave the term “statistics” to mean particular calculations and numbers, its original definition.

(2) Probability is the working branch of epistemology, not mathematics. Epistemology is about how we know, about what constitutes knowledge. Since not all we know is certain, we need a mechanism to qualify—note that I do not say quantify—this uncertainty.

(3) Treating the subject exclusively mathematically gives results an aura of certainty that is not warranted, as we shall see.

(4) Finally, it is easier to teach probability as logic than as mathematics. It is not that the math should be eliminated or skimped, but the reason for all the equations will finally make sense.

The a priori is that stuff we know by recourse only to intuition. Axioms in mathematics are like this: statements which appear true, which we believe—and desire, but this emotion is not a requirement—to be true, but which we cannot prove to be true.

Proof means deduction from a set of acknowledged premises (or evidence) to a conclusion which follows from those premises. The connection between evidence and conclusion is another of those things we take as a given, as is the validity of the steps used in the proof. I mean, their validity is assumed to be true in an a priori sense.

For example, with the premises “All men are mortal” and “Socrates is a man” we might form the conclusion “Socrates is mortal.” Given we accept the premises—and both are contingent and therefore are not necessarily true—then the conclusion has probability one: it is certain. It has been deduced.

We can also form the conclusion “Socrates has the flu”—we can form any conclusion we wish! But this one isn’t connected, is it? Why not? Well, we use our knowledge of the meanings of words—which we accept as we do the original premises: as contingent statements that are not necessarily true, but assumed true for the sake of the argument—and we also form the belief that we are and that we are not insane (two more premises always lurking), and we conclude that this new statement is unrelated to our main premises.

In this case, the best we can say is that the conclusion might be true or it might be false: we have no information of any kind, except our premises and the knowledge that the conclusion is a contingent statement. Therefore, the best that can be said is that the probability is greater than zero and less than one.

There are arguments against the a priori, which are either skeptical or empirical. Skepticism is an academic-only belief that says nothing can be known, or even that nothing is. Ravings of this sort are designed purely for tenure committees and to induce the adulation of conspecifics. They have nothing to do with reality, which, in any case, is denied to even exist.

Empiricism is the belief that all can be proved by recourse to actual observations. Always left unanswered is how we can tie an observation to a conclusion or statement which we would like to prove. That is, the validity of argument is assumed—not empirically verified—to be true. In other words, the a priori lurks, even here.

Before we leave off, here are two examples of probability as logic.

Premises: There are 12 Johnstons and 18 Freihaufs in a room, from which you will select the one closest to the door. The conclusion, “A Freihauf is selected,” has, conditional on these and our hidden premises, a probability of 18 / (12 + 18) = 0.6, or 60%.
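The arithmetic in this first example can be sketched in a few lines of Python. This is only an illustration and not part of the lesson proper: the function name `probability_of_group` is made up here, and the code spells out one of the hidden premises as an explicit assumption, namely that the counts are the only relevant evidence, so each of the 30 people is treated as equally likely to be the one closest to the door.

```python
from fractions import Fraction


def probability_of_group(count_in_group: int, count_of_others: int) -> Fraction:
    """Probability that the selected person belongs to the group.

    Assumption (a hidden premise made explicit): the counts are the only
    evidence, so every person in the room is equally likely to be selected.
    """
    total = count_in_group + count_of_others
    if total == 0:
        raise ValueError("the room must contain at least one person")
    return Fraction(count_in_group, total)


# Premises: 18 Freihaufs and 12 Johnstons.
p_freihauf = probability_of_group(18, 12)
print(p_freihauf, float(p_freihauf))  # 3/5 0.6
```

Change the premises (the counts) and the probability of the same conclusion changes with them, which is the sense in which the number is conditional on the evidence.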

Premises: There are more Freihaufs than Johnstons in a room, from which you will select the one closest to the door. The conclusion, “A Freihauf is selected,” has, conditional on these and our hidden premises, a probability greater than 50%. And that is the best we can say. Importantly: we cannot assign any less fuzzy numerical value to the probability.
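To see why only a bound comes out of the second set of premises, here is a rough sketch that lists every head count consistent with “more Freihaufs than Johnstons” and computes the probability each one would give. The helper name `admissible_probabilities` and the cap of 20 people per family are hypothetical, chosen only to keep the enumeration finite; the sketch also assumes there is at least one Johnston and, as before, that each person is equally likely to be closest to the door.

```python
from fractions import Fraction


def admissible_probabilities(max_per_group: int) -> list:
    """Distinct values of P(a Freihauf is selected) over all head counts
    with more Freihaufs than Johnstons.

    Assumptions: at least one Johnston, at most `max_per_group` people in
    either family, and every person equally likely to be selected.
    """
    values = set()
    for johnstons in range(1, max_per_group):
        for freihaufs in range(johnstons + 1, max_per_group + 1):
            values.add(Fraction(freihaufs, freihaufs + johnstons))
    return sorted(values)


values = admissible_probabilities(20)
lowest, highest = values[0], values[-1]
print(lowest, highest)  # 20/39 20/21
print(all(v > Fraction(1, 2) for v in values))  # True
```

Every admissible head count gives a probability above one half, but the values spread over most of the interval from one half to one, and nothing in the stated premises singles out any one of them.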

1. DAV says:

Ahhh! But it’s well known that Friehaufs tend to stick together and prefer window seats — about as far from the door as one can get. So much for your hidden premise that the two groups are more or less uniformly distributed throughout the room. The “real” probability of picking a Friehauf is essentially zero.

“No division between probability and statistics”? Well maybe but my two are constantly bickering. Must be nice to have well-behaved children.

2. DAV says:

I don’t remember exactly when but I once observed that I am a Skeptic. Does the observation then make me an Empirical Skeptic? If so, which one represents my feminine side?

3. I hope you were wearing embroidered jeans when you gave the lecture. You need to connect with the students.

4. Ron DeWitt says:

Nice start. I look forward to the following episodes, and I envy your students.

5. Smoking Frog says:

“Premises: There are more Freihaufs than Johnstons in a room, from which you will select the one closest to the door. The conclusion, “A Freihauf” is selected has, conditional on these and our hidden premises, a probability greater than 50%. And that is the best we can say. Importantly: we cannot assign any less fuzzy numerical value to the probability.”

I think the probability is ever so slightly greater than 75%, since all Freihauf fractions greater than 1/2 (including 1) are equiprobable.

6. SteveBrooklineMA says:

Consider: “There are an unequal number of Freihaufs than Johnstons in a room, from which you will select the one closest to the door.” Does the conclusion, “a Freihauf is selected” have probability 1/2? Such a conclusion seems to contradict the premise. I call this “Rafe’s Paradox,” named after one of your readers.

7. Smoking Frog says:

SteveBrooklineMA: “Consider: ‘There are an unequal number of Freihaufs than Johnstons in a room, from which you will select the one closest to the door.’ Does the conclusion, ‘a Freihauf is selected’ have probability 1/2? Such a conclusion seems to contradict the premise. I call this ‘Rafe’s Paradox,’ named after one of your readers.”

I don’t understand that. Why would anyone think it was 1/2? Are you talking about lack of a Bayesian prior forcing the prior 1/2, or what?