
Category: Philosophy

The philosophy of science, empiricism, a priori reasoning, epistemology, and so on.

May 16, 2008 | 9 Comments

Stats 101: Chapter 3

Three is ready to go.

I should re-emphasize one of the goals of this book. It is meant to be for that large host of unfortunates who are forced—I mean required—to take a statistics course and, importantly, do not want to. This is why a lot of formulas and methods do not make their traditional appearance. Understanding—and not rote—is paramount.

The material is enough to cover in one typical semester. The student will not learn how to handle many different kinds of data, but he damn well will comprehend what somebody is saying when they make a probability statement about data.

Face it. The vast majority of students who sit through statistics classes never again compute their own regression models, factor analyses, etc., etc. But they often read these kinds of results prepared by others. I want them, as their eyes meet a p-value, to say to themselves, “Aha! Here is one of those p-value things Stats 101 warned me about! Sure enough, it is being misused yet again. I don’t know the right answer in this study, but I do know what is being claimed is too certain.”

If I can do that, then I will be a happy man.

(The contents of Chapter 3 now follow. If you use Firefox > version 2.0, then you will be able to see all the characters on your screen. Else some of the content may be a little screwy. I apologize for this. If you can’t read everything below, consider this a tease for the real thing. You can always download the chapter and print it out.)

How to Count

1. One, two, three…

Youtube has a video at this URL

The important part is that “v=wcCw9RHI5mc” business at the end, which essentially means “this is video number wcCw9RHI5mc“. This video is, of course, different than number wcCw9RHI5md, and number wcCw9RHI5me and so on. We can notice that the video number contains 11 different slots (count them), each of which is filled with a number or upper or lower case Latin letter, which means the number is case sensitive; A differs from a. The question is, how many different videos can Youtube host given this numbering scheme? Are they going to run out of numbers anytime soon?

That problem is hard, so we’ll start on a simpler one. Suppose the video numbering scheme only allowed one slot, and that this slot could only contain a single-digit number, chosen from 0-9. Then how many videos could they host? They’d have v=0, v=1 and so on. Ten, right? Now how about if they allowed two slots chosen from 0-9? Just 10 for the first, and 10 for each of the 10 of the first, a confusing way of saying 10 × 10. For three slots it’s 10 × 10 × 10. But you already knew how to do this kind of counting, didn’t you?

Suppose the single slot is allowed only to be the lower case letters a,…,z? This is v=a, v=b, etc. How many in two such slots? Just 26 × 26 = 676. Which is the same way we got 100 in two slots of the numbers 0-9.

So if we allow any number, plus any lower or upper case letter in any slot, we have 10 + 26 + 26 = 62 different possibilities per slot. That means that with 11 slots we have 62 × 62 × ⋯ × 62 = 62^11 ≈ 5 × 10^19, or 50 billion billion different videos that Youtube can host.
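The slot-by-slot counting above can be checked with a few lines of Python. This is only a sketch under the text's assumption of a 62-symbol alphabet (digits plus lower and upper case letters); the function name `count_ids` is made up for illustration.

```python
import string


def count_ids(symbols_per_slot: int, slots: int) -> int:
    """Number of distinct identifiers: one factor of the alphabet size per slot."""
    return symbols_per_slot ** slots


# Assumed alphabet, as described in the text: 10 digits + 26 lower + 26 upper.
alphabet = string.digits + string.ascii_lowercase + string.ascii_uppercase

print(len(alphabet))                          # symbols available per slot
print(count_ids(10, 2))                       # two slots of digits
print(count_ids(26, 2))                       # two slots of lower case letters
print(f"{count_ids(len(alphabet), 11):.1e}")  # eleven slots, full alphabet
```

Python integers do not overflow, so the 11-slot count is exact; the last line only rounds it for display.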

2. Arrangements

How many ways are there of arranging things? In 1977, George Thorogood remade that classic John Lee Hooker song, “One Bourbon, One Scotch, and One Beer.” This is because George is, of course, the spiritous counterpart of an oenophile; that is, he is a connoisseur of fine spirits and regularly participates in tastings. Further, George, who is way past 21, is not an idiot and never binge drinks, which is about the most moronic of activities that a person could engage in. He very much wants to arrange his coming week, where he will taste, each night, one bourbon (B), one scotch (S), and one beer (R). But he wants to be sure that the order in which he tastes these drinks doesn’t influence his personal ratings. So each night he will sip them in a different order. How many different nights will this take him? Write out what will happen: Night 1, BSR; night 2, BRS; night 3, SBR; night 4, SRB; night 5, RBS; night 6, RSB. Six nights! Luckily, this still leaves Sunday free for contemplation.

Later, George decides to broaden his tasting horizons by adding Vernors (the tasty ginger ale aged in oak barrels that can’t be bought in New York City) to his line up. How many nights does it take him to taste things in different order now? We could count by listing each ordering, but there’s an easier way. If you have n items and you want to know how many different ways they could be ordered, the general formula is:

n! = n × (n − 1) × (n − 2) × ⋯ × 2 × 1

The term on the left, n!, reads “n factorial.” With 4 beverages, this is 4 × 3 × 2 × 1 = 24 nights, which is over three weeks! Good thing that George is dedicated.
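If you would rather let a computer write out the nights, here is a small sketch using Python's standard library. The one-letter labels follow the text; “V” for Vernors is my own label.

```python
from itertools import permutations
from math import factorial

drinks = ["B", "S", "R"]  # bourbon, scotch, beer

# Every possible tasting order, one per night.
orders = ["".join(p) for p in permutations(drinks)]
print(orders)
print(len(orders), factorial(len(drinks)))

# Add Vernors ("V" is an illustrative label, not from the text).
with_vernors = drinks + ["V"]
nights = len(list(permutations(with_vernors)))
print(nights, factorial(len(with_vernors)))
```

Listing the orderings and computing n! agree, which is the whole point of the formula: it saves you the listing.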

3. Being choosy

It’s the day before Thanksgiving and you are at school, packing your car for the drive home. You would have left a day earlier, but you didn’t want to miss your favorite class: statistics. It turns out that you have three friends who you know need a ride: Larry, Curly, and Moe. Lately, they have been acting like a bunch of stooges, so you decide to tell them that your car is just too full to bring them along. The question is, how many different ways can you arrange your friends to drive home with you when you plan to bring none of them? This is not a trick question; the answer is as easy as you think. Only one way: that is, with you driving alone.

But, they are your friends, and you love them, so you decide to take just one. Now how many ways can you arrange your friends so that you take just one? Since you can take Larry, Curly, or Moe, and only one, then it’s obviously three different ways, just by taking only Larry, or only Curly, or only Moe. What if you decide to take two, then how many ways? That’s trickier. You might be tempted to think, given there are 3 of them, that the answer is 3! = 6, but that’s not quite right. Write out a list of the groupings: you can take Larry & Curly, Larry & Moe, or Moe & Curly. That’s three possibilities. The grouping “Curly & Larry,” for example, is just the same as the grouping “Larry & Curly.” That is, the order of your friends doesn’t matter: this is why the answer is 3 instead of 6. Finally, all these calculations have made you so happy that you soften your heart and decide to take all three. How many different groupings taking all of them are possible? Right. Only one.

You won’t be surprised to learn that there is a formula to cover situations like this. If you have n friends and you want to count the number of possible groupings of k of them when the order does not matter, then the formula is

(see the book)

The term on the left is read “n choose k”. By definition (via some fascinating mathematics) 0! = 1. Here are all the answers for the Thanksgiving problem:

(see the book)

There are some facts about this combinatorial function that are useful to know. The first is that n choose 0 always equals 1. This means, out of n things, you take none; or it means there is only one way to arrange no things, namely no arrangement at all. n choose n is also always 1, regardless of what n equals. It means, out of n things, you take all. n choose 1 always equals n, and so does n choose n − 1: these are the number of ways of choosing just 1 or just n − 1 things. As long as n > 3, n choose 2 > n choose 1, which makes sense, because you can make more groups of 2 than of 1.
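The displayed formula is left to the book, so the sketch below assumes the standard definition, n choose k = n! / (k! × (n − k)!), and checks it against a brute-force listing of the Thanksgiving groupings. The helper name `choose` is mine; `math.comb` is Python's built-in version (Python 3.8 or later).

```python
from itertools import combinations
from math import comb, factorial


def choose(n: int, k: int) -> int:
    """n choose k via the standard formula n! / (k! * (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


friends = ["Larry", "Curly", "Moe"]
n = len(friends)

for k in range(n + 1):
    # combinations() ignores order, so ("Larry", "Curly") appears only once.
    groups = list(combinations(friends, k))
    assert len(groups) == choose(n, k) == comb(n, k)
    print(f"{n} choose {k} = {choose(n, k)}: {groups}")
```

Note that 0! = 1 is what makes the k = 0 and k = n cases come out to 1 without any special handling.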

4. Counting meets probability: The Binomial distribution

We started the Thanksgiving problem by considering it from your point of view. Now we take the perspective of Larry, Moe, and Curly, who are waiting in their dorm room for your call. They don’t yet know which, if any, of them will get a ride with you. Because they do not know, they want to quantify their uncertainty and they do so using probability. We are now entering a different realm, where counting meets probability. Take your time here, because the steps we follow will be the same in every probability problem we ever do.

Moe, reminiscent, recalls an incident wherein he was obliged to poke you in the eyes, and guesses that, since you were somewhat irked at the time, the probability that you take any one of the gang along is only 10%. That is, it is his judgment that the probability that you take him, Moe, is 10%, which is the same as the probability that you would also (independently) take Curly, and so on. So the boys want to figure out the probability that you take none of them, take one of them, take two of them, or take all three of them.

Start with taking all three. We want the probability that you take Larry and Moe and Curly, where the probability of taking each is 10%. Remember probability rule #2? Those “ands” become “times”: so the probability of taking all three is 0.1 × 0.1 × 0.1 = 0.001, or 1 in 1000. Keep in mind: this is from their perspective, not yours. This is their guess of the chances; you may already have made up your mind, but they don’t know that.

What about taking none of them? This is the chance that you do not take Larry and you do not take Moe, and you do not take Curly. The key word is still “and,” which makes the probability (1 − 0.1) × (1 − 0.1) × (1 − 0.1) = 0.9^3 ≈ 0.73, since the probability of not taking Larry etc. is one minus the probability of taking him etc. It is, too, because you can either take Larry or not; these are the only two things that can happen, so the probability of taking Larry or not must be 1. We can write this using our notation: let A = “Take Larry”, then A_F = “Don’t take him”. Then Pr(A ∨ A_F | E) = Pr(A|E) + Pr(A_F|E) = 1, using probability rule #1. So if Pr(A|E) = 0.1, then Pr(A_F|E) = 1 − Pr(A|E) = 0.9. In this case, E is the information dictated by Moe (who is the leader), which led him to say Pr(A|E) = 0.1.

How about taking just one? Well, you can take Larry, not take Moe, and not take Curly, and the chance of that is (using rules #1 and #2 together) 0.1 × (1 − 0.1) × (1 − 0.1) ≈ 0.08; but you could just as easily have taken Moe and not the other two, or Curly and not the other two, and the chance you do either of these is just the same as you taking Larry and not the other two. For shorthand, write M as “Take Moe” and so on, and M_F as “Don’t take Moe” and so on. Thus you could “L M_F C_F or L_F M C_F or L_F M_F C.” Using probability rule #1, we break up this statement into three pieces (such as “L M_F C_F”), and then use probability rule #2 on each piece (“ands” turn to times), then add the whole thing up.

You could do all that, but there is an easier way. You could notice there are three different ways to take just one, which we remember from our choosing formula, eq. (10). This makes the probability (3 choose 1) × 0.08 = 3 × 0.08 = 0.24. Since we already know the probability of taking one of those combinations, we just multiply it by the number of times we see it. We could have also written the answer like this:

(3 choose 1) × 0.1^1 × (1 − 0.1)^2 ≈ 0.24.

And we could also have written the first situation (taking all of them) in the same way:

(3 choose 3) × 0.1^3 × (1 − 0.1)^0 = 0.001.

where you must remember that a^0 = 1 (for any a you will come across).

You see the pattern by now. This means we have another formula to add to our collection. This one is called the binomial and it looks like this:

(see book)

There is a subtle shift in notation with this formula, made to conform with tradition. “k” is shorthand for the statement, in this instance, K = “You take k people.” For general situations, k is the number of “successes”: or, K = “The number of successes is k”. Everything to the right of the “|” is still information that we know. So n is shorthand for N = “There are n possibilities for success”, or in your case, N = “There are three brothers which could be taken.” The p means, P = “The probability of success is p”. We already know E_B, written here with a subscript to remind us we are in a binomial situation. This new notation can be damn convenient because, naturally, most of the time statisticians are working with numbers, and the small letters mean “substitute a number here,” and if statisticians are infamous for their lack of personality, at least we have plenty of numbers. This notation can cause grief, too. Just how that is so must wait until later.

Don’t forget this: in order for us to be able to use a binomial distribution to describe our uncertainty, we need three things. (1) The definition of a success: in the Thanksgiving example, a success was a person getting a ride. (2) The probability of a success is always the same. (3) The number of chances for successes is fixed.
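The binomial formula itself is left to the book, so the sketch below assumes the standard form, Pr(k | n, p, E_B) = (n choose k) × p^k × (1 − p)^(n − k), which matches the worked cases for taking none, one, and all three. The function name `binomial_pmf` is mine, not the book's.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n chances, each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Moe's judgment: three friends, each taken (independently) with probability 0.1.
n, p = 3, 0.1
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, pr in enumerate(probs):
    print(f"take {k}: {pr:.3f}")

# The four cases cover everything that can happen, so they should add to 1.
print(f"total: {sum(probs):.3f}")
```

The only case not worked by hand in the text is taking exactly two, which the same function supplies.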

May 12, 2008 | 8 Comments

Stats 101: Chapter 2

Chapter 2 is now ready for downloading—it can be found at this link.

This chapter is all about basic probability, with an emphasis on understanding and not on mechanics. Because of this, many details are eliminated which are usually found in standard books. If you already know combinatorial probability (taught in every introductory class), you will probably worry your favorite distribution is missing (“What, no Poisson? No negative binomial? No This One or That One?”). I leave these out for good reason.

In the whole book, I only teach two distributions, the binomial and the normal. I hammer home how these are used to quantify uncertainty in observable statements. Once people firmly understand these principles, they will be able to understand other distributions when they meet them.

Besides, the biggest problem I have found is that people, while they may be able to memorize half a dozen distributions or formulas, do not understand the true purpose of probability distributions. There is also no good reason to do calculations by hand now that computers are ubiquitous.

Comments are welcome. The homework section (like in every other chapter) is unfinished. I will be adding more homework as time goes on, especially after I discover what areas are still confusing to people.

Once again, the book chapter can be downloaded here.

May 3, 2008 | 23 Comments

Stats 101: Chapter 1

UPDATE: If you downloaded the chapter before 6 am on 4 May, please download another copy. An older version contained fonts that were not available on all computers, causing it to look like random gibberish when opened. It now just looks like gibberish.

I’ve been laying aside a lot of other work, and instead finishing some books I’ve started. The most important one is (working title only) Stats 601, a professional explanation of logical probability and statistics (I mean the modifier to apply to both fields). But nearly as useful will be Stats 101, the same sort of book, but designed for a (guided or self-taught) introductory course in modern probability and statistics.

I’m about 60% of the way through 101, but no chapter except the first is ready for public viewing. I’m not saying Chapter 1 is done, but it is mostly done.

I’d post the whole thing, but it’s not easy to do so because of the equations. Those of you who use Linux will know of latex2html, which is a fine enough utility, but since it turns all equations into images, documents don’t always end up looking especially beautiful or easy to work with.

So below is a tiny excerpt, with all of Chapter 1 available at this link. All questions, suggestions for clarifications, or queries about the homework questions are welcome.


1. Certainty & Uncertainty

There are some things we know with certainty. These things
are true or false given some evidence or just because they are
obviously true or false. There are many more things about which
we are uncertain. These things are more or less probable given
some evidence. And there are still more things of which nobody
can ever quantify the uncertainty. These things are nonsensical or

First I want to prove to you there are things that are true,
but which cannot be proved to be true, and which are true based
on no evidence. Suppose some statement A is true (A might be
shorthand for “I am a citizen of Planet Earth”; writing just ‘A’ is
easier than writing the entire statement; the statement is
everything between the quotation marks). Also suppose some statement
B is true (B might be “Some people are frightfully boring”). Then
this statement: “A and B are true”, is true, right? But also true is
the statement “B and A are true”. We were allowed to reverse the
letters A and B and the joint statement stayed true. Why? Why
doesn’t switching make the new statement false? Nobody knows.
It is just assumed that switching the letters is valid and does not
change the truth of the statement. The operation of switching
does not change the truth of statements like this, but nobody will
ever be able to prove or explain why switching has this property.
If you like, you can say we take it on faith.

That there are certain statements which are assumed true
based on no evidence will not be surprising to you if you have
ever studied mathematics. The basis of all mathematics rests on
beliefs which are assumed to be true but cannot be proved to
be true. These beliefs are called axioms. Axioms are the base;
theorems, lemmas, and proofs are the bricks which build upon
the base using rules (like the switching statements rule) that are
also assumed true. The axioms and basic rules cannot, and can
never, be proved to be true. Another way to say this is, “We hold
these truths to be self-evident.”

Here is one of the axioms of arithmetic: For all natural
numbers x and y, if x = y, then y = x. Obviously true, right? It is just
like our switching statements rule above. There is no way to prove
this axiom is valid. From this axiom and a couple of others, plus
acceptance of some manipulation rules, all of mathematics arises.
There are other axioms (two, actually) that define probability.
Here, due to Cox (1961), is one of those axioms: The probability
of a statement on given evidence determines the probability of its
contradictory on the same evidence. I’ll explain these terms as we

It is the job of logic, probability, and statistics to quantify
the amount of certainty any given statement has. An example
of a statement which might interest us: “This new drug improves
memory in Alzheimer patients by at least ten percent.” How
probable is it that that statement is true given some specific evidence,
perhaps in the form of a clinical trial? Another statement: “This
stock will increase in price by at least two dollars within the next
thirty days.” Another: “Marketing campaign B will result in more
sales than campaign A.” In order to specify how probable these
statements are, we need evidence, which usually comes in the form
of data. Manipulating data to provide coherent evidence is why
we need statistics.

Manipulating data, while extremely important, is in some
sense only mechanical. We must always keep in mind that our
goal is to make sense of the world and to quantify the uncertainty
we have in given problems. So we will hold off on playing with data
for several chapters until we understand exactly what probability
really means.

2. Logic

We start with simple logic. Here is a classical logical argument,
slightly reworked:

All statistics books are boring.
Stats 101 is a statistics book.
_______________________________
Therefore, Stats 101 is boring.

The structure of this argument can be broken down as follows.
The two statements above the horizontal line are called premises;
they are our evidence for the statement below the line, which is
the conclusion. We can use the words “premises” and “evidence”
interchangeably. We want to know the probability that the conclusion
is true given these two premises. Given the evidence listed,
it is 1 (probability is a number between, and including, 0 and 1).
The conclusion is true given these premises. Another way to say
this is the conclusion is entailed by the premises (or evidence).

You are no doubt tempted to say that the probability of the
conclusion is not 1, that is, that the conclusion is not certain,
because, you say to yourself, statistics is nothing if not fun. But
that would be missing the point. You are not free to add to the
evidence (premises) given. You must assess the probability of the
conclusion given only the evidence provided.

This argument is important because it shows you that there
are things we can know to be true given certain evidence. Another
way to say this, which is commonly used in statistics, is that the
conclusion is true conditional on certain evidence.

(To read the rest, Chapter 1 is available at this link.)

February 29, 2008 | 10 Comments

The tyranny and hubris of experts

Today, another brief (in the sense of intellectual content) essay, as I’m still working on the Madrid talk, the Heartland conference is this weekend, and I have to, believe it or not, do some work my masters want.

William F. Buckley, Jr. has died, God rest his soul. He famously said, “I’d rather be governed by the first 2000 names in the Boston phone book than by the dons of Harvard.” I can’t usefully add to the praise of this great man that has begun appearing since his death two days ago, but I can say something interesting about this statement.

There are several grades of pine “2 by 4’s”, the studs that make up the walls and ceilings of your house. Superior grades are made for exterior walls; lesser grades are useful for external projects, such as temporary bracing. A carpenter would never think of using a lesser grade to build your roof’s trusses, for example. Now, if you were to run into a Home Depot and grab the first pine studs you came to (along with the book How to Build a Wall), thinking you could construct a sturdy structure on your own, you might be right. But you’re more likely to be wrong. So you would not hesitate to call in an expert, like my old dad, to either advise you of the proper materials or to build the thing himself.

Building an entire house, or even just one wall, is not easy. It is a complicated task requiring familiarity with a great number of tools, knowledge of various building techniques and materials, and near memorization of the local building codes. But however intricate a carpenter’s task is, we can see that it is manageable. Taken step by step, we can predict to great accuracy exactly what will happen when we, say, cut a board a certain way and nail it to another. In this sense, carpentry is a simple system.

There is no shortage of activities like this: for example baking, auto mechanics, surgery, accounting, electronic engineering, and even statistics. Each of these diverse occupations is similar in the sense that when we are plying that trade, we can pull a lever and we usually or even certainly know which cog will engage and therefore what output to expect. That is, once one has become an expert in that field. If we are not an expert and we need the services of one of these trades, we reach for the phone book and find somebody who knows what he’s doing.

But there are other areas which are not so predictable. One of these is governance, which is concerned with controlling and forecasting the activity and behavior of humans. As everybody knows, it is impossible to reliably project what even one person will do on a consistent basis, let alone say what a city or country full of people will be like in five years. Human interactions are horribly, unimaginably complex and chaotic, and impossible to consistently predict.

Of course, not everyone thinks so. There is an empirically-observed relationship that says the more institutionalized formal education a person has, the more likely it is that that person believes he can predict human behavior. We call these persons academics. These are the people who make statements (usually in peer-reviewed journals) like, “If we eliminate private property, then there will be exact income equality” and “We can’t let WalMart build a store in our town because WalMart is a corporation.” (I cleaned up the language a bit, since this is a PG-rated blog.)

It is true, and it is good, that everybody has opinions on political matters, but most people, those without the massive institutionalized formal education, are smart enough to realize the true value of their opinions. Not so the academics, who are usually in thrall to a theory whose tenets dictate that if you pull this one lever, this exact result will always obtain. Two examples, “If we impose a carbon tax, global warming will cease” and “If the U.S.A. dismantles its nuclear weapons, so too will the rest of the world, which will then be a safer place.”

Political and economic theories are strong stuff and even the worst of them is indestructible. No amount of evidence or argument can kill them because they can always find refuge among the tenured. The academics believe in these theories ardently and often argue that they should be given the chance (because they are so educated and we are not) to implement them. They think, quite modestly of course, that because they are so smart and expert, they can decide what is best for those not as smart and expert. Their hero is Plato, who desired a country run by philosophers, the best of the best thinkers. In other words, people like them.

The ordinary, uneducated man is more likely to just want to be left alone in most matters and would design his laws accordingly. He would in general opt for freedom over guardianship. He is street-smart enough to know that his decisions often have unanticipated outcomes, and is therefore less lofty in his goals. And this is why Buckley would choose people from the phone book rather than from the campus.