Stats 101: Chapter 5

Update: 21 May 4:45 am. I forgot to actually upload the file until right this moment. Thanks to Mike and Harry for the reminder.

Chapter 5 is ready to go.

This is purely a mechanical chapter, introducing R. Thrilling reading, it is not. But it’s necessary to learn in order to be able to carry out the analysis in later chapters. The book website is not fully up; only the datasets are there. To learn to install R, just look on the R website.

I’ll be posting Chapters 6 and 7 in short order and then we finally get to the good stuff.

R

1. R

R is a fantastic, hugely supported, rapidly growing, infinitely extensible, operating-system agnostic, free and open source statistical software platform. Nearly everybody who is anybody uses R, and since I want you to be somebody, you will use it, too. Some things in R are incredibly easy to do; other tasks are bizarrely difficult. Most of what makes R hard for the beginner is the same stuff that makes any piece of software hard; that is, getting used to expressing your statistical desires in computerese. As such an environment can be strange and perplexing at first, some students experience a kind of peculiar stress that is best described by example. Here is a video from Germany showing a young statistics student who experienced trouble understanding R:

http://youtube.com/watch?v=PbcctWbC8Q0

Be sure that this doesn’t happen to you. Remember what Douglas Adams said: Don’t panic.

The best way to start is by going to r-project.org and clicking CRAN under the Download heading. You can’t miss it. After that, you have to choose a mirror, which means one of the hundreds of computers around the world that host the software. Obviously, pick a site near you. Once that’s done, choose your platform (your operating system, like Linux or one of the others), and then choose the base package. Step-by-step instructions are at this book’s website: wmbriggs.com/book. It is no more difficult to install than any other piece of software.

This is not the place to go over all the possibilities of R; just the briefest introduction will be given, because there are far better places available online (see the book website for links). But there are a few essential commands that you should not do without.

These are:

Command            Description
help(command)      Does the obvious: always scroll down to the bottom of the help to see examples of the command.
?command           Same as help().
apropos('string')  If you cannot remember the name of a command (and I always forget) but remember it started with co-something, then just type apropos('co') and you’ll get a complete list of commands that have co anywhere in their names.
c()                This is the concatenation function: typing c(1,2) concatenates a 2 to 1, or sticks the number 2 on the end of 1, so that we have a vector of numbers.
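To see how these feel in practice, here is what a first session at the console might look like (all of these are built-in R commands):

```r
# Read the help page for plot; scroll to the bottom for the examples:
help(plot)    # ?plot does exactly the same thing

# List every command with 'co' somewhere in its name:
apropos('co')

# Stick numbers together into a vector:
x <- c(1, 2)
x             # [1] 1 2
```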

The Appendix gives a fuller list of R commands.

It is important to understand that R is a command-line language, which we may interpret as meaning that all commands in R are functions which must be typed into the console. These are objects that are a command name plus a left and right parenthesis, with variables (called arguments) stuck in between, thus: plot(x,y). Remember that you are dealing with computers, which are literal, intolerant creatures (much like the people who want to ban smoking), and which cannot abide even the slightest deviation from their expectations. That means, if instead of plot(x,y) you type lot(x,y), or plot x,y), or plot(,y), or plot(x,y, things will go awry. R will try to give you an idea of what went wrong by giving you an error message. Except in cases like that last typo, which will cause you to develop stress lines, because all you’ll see is this

+

and every attempt of yours to type anything new, or hit enter 100 times, will not do a thing except give you more lines of + or other screwy errors. Because why? Because you typed plot(x,y; that is, you typed a left parenthesis (right before the x) and you never “closed” it with a right parenthesis, and R will simply wait forever for you to type one in.

The solution is to enter a right parenthesis, or hit

ctrl+c

which means the control key plus the c key simultaneously, which “breaks” the current computation.

Using R means that you have to memorize (!) and type in commands instead of using a graphical user interface (GUI), which is the standard point-and-click screen with which you are probably familiar. It is my experience that students who are not used to computers start freaking out at this point; however, there is no need to. I have made everything very, very easy and all you have to do is copy what you see in the book to the R screen. All will be well. I promise.

GUIs are very nice things, incidentally, and R has one that you can download and play with. It is called the R Commander. Like all GUIs, some very basic functionality is included that allows you to, well, point and click and get a result. Problem is, the very second you want to do something different than what is available from the GUI, you are stuck. With statistics, we often want to do something differently, so we will stick with the command line.

2. R binomially

By now, you are eagerly asking yourself: “Can R help us with those binomial calculations like in the Thanksgiving example?” Let’s type apropos('bino') and see, because, after all, 'bino' is something like binomial. The most likely function is called binomial, so let’s type ?binomial and see how it works. Uh oh. Weird words about “family objects” and the function glm(), and that doesn’t sound right. What about one of the functions like dbinom()? Jackpot. We’ll look at these in detail, since it turns out that this structure of four functions is the same for every distribution. The functions are in this table:

dbinom   The probability density function: given the size, or n, and prob, or p, this calculates the probability that we see x successes; this is equation (11).
pbinom   The distribution function, which calculates the probability that the number of successes is less than or equal to some a.
qbinom   This is the “quantile” function, which calculates, given a probability from the distribution function, which value of q it is associated with. This will be made clear with some examples with the normal distribution later.
rbinom   This generates a “random” binomial number; and since random means unknown, this means it generates a number that is unknown in some sense; we’ll talk about this later.
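As a quick taste before the details, here are all four at work with the Thanksgiving numbers, size = 3 and prob = 0.1:

```r
dbinom(1, size = 3, prob = 0.1)    # Pr(x = 1): 0.243
pbinom(1, size = 3, prob = 0.1)    # Pr(x <= 1): 0.729 + 0.243 = 0.972
qbinom(0.9, size = 3, prob = 0.1)  # smallest x with Pr(x <= x) at least 0.9: here, 1
rbinom(5, size = 3, prob = 0.1)    # five "random" binomial numbers, each between 0 and 3
```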

Let’s go back to the Thanksgiving example, which used a binomial. Moe can calculate, given n = size = 3, p = prob = 0.1,
his probabilities using R:

dbinom(0,3,.1)

which gives the probability of taking nobody along for the ride. The answer is [1] 0.729. The [1] in front of the number just means that you are only looking at line number 1. If you asked for dozens of probabilities, for example, R would space them out over several lines. Let’s now calculate the probability of taking just 0, just 1, etc.

dbinom(c(0,1,2,3),3,.1)

where we have “nested” two functions into one: the first is the concatenation function c(), where we have stuck the numbers 0 through 3 together, which shows you that the dbinom() function can calculate more than one probability at a time. What pops out is

[1] 0.729 0.243 0.027 0.001;

that is, the exact values we got above for taking 0 or 1 or 2 etc. along for the ride. Now we can look at the distribution function:

pbinom(c(0,1,2,3),3,.1);

and we get

[1] 0.729 0.972 0.999 1.000.

This is the probability of taking 0 or less, 1 or less, 2 or less, and 3 or less. The last probability very obviously has to be 1, and will always be 1 for any binomial (as long as the last value in the function c(0,1,…,n) equals n).
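Incidentally, the distribution function is nothing more than a running sum of the probabilities from dbinom, which you can check for yourself with the built-in cumsum function:

```r
probs <- dbinom(0:3, 3, .1)   # 0.729 0.243 0.027 0.001
cumsum(probs)                 # 0.729 0.972 0.999 1.000, the same as pbinom(0:3, 3, .1)
```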

There turns out to be a shortcut to typing the concatenation function for simple numbers, and here it is:

c(0,1,2,…,n) = 0:n.

So we can rewrite the first function as dbinom(0:3,3,.1) and get the same results.

We can nest functions again and make pretty pictures

plot(dbinom(0:3,3,.1))

And that’s it for any binomial function. Isn’t that simple? (The answer is yes.) The commands never change for any binomial you want to do.

3. R normally

Can R do normal distributions as well? Can it! Let’s type in apropos('normal') and see what we get. A lot of gibberish, that’s what. Where’s the normal distribution? Well, it turns out that computer programmers are a lazy bunch, and they often do not use all the letters of a word to name a function (too much typing). Let’s try apropos('norm') instead (which no matter what should give us at least as many results, right? This is a question of logic, not computers.). Bingo. Among all the rest, we see dnorm and pnorm etc., just like with the binomial. Now type ?dnorm so we can learn about our fun function. Same layout as the binomial; the only difference being we need to supply a “mean” and “sd” (the m and s). Sigh. This is an example of R being naughty and misusing the terminology that I earlier forbade: m and s are not a mean and standard deviation. It’s a trap too many fall into. We’ll work with it, but just remember “mean” and “sd” actually imply our parameters m and s.

You will recall from our discussion of normals that we cannot compute a probability of seeing a single number (and if you don’t remember, shame on you: go back and read Chapter 4). The function dnorm does not give you this number, because that probability is always 0; instead, it gives you a “density”, which means little to us. But we can calculate the probability of values being in some interval using the pnorm function. For example, to calculate Pr(x < 10|m = 10, s = 20, EN), use pnorm(10,10,20) and you should see [1] 0.5. But you already knew that would be the answer before you typed it in, right? (Right?)

Let’s try a trickier one: Pr(x < 0|m = 10, s = 20, EN); type pnorm(0,10,20) and get [1] 0.3085375. So what is this probability: Pr(x > 0|m = 10, s = 20, EN) (x greater than 0)? Think about it. x can either be less than or greater than 0; the probability it is one or the other is 1. So Pr(x < 0|m = 10, s = 20, EN) + Pr(x > 0|m = 10, s = 20, EN) = 1. Thus Pr(x > 0|m = 10, s = 20, EN) = 1 − Pr(x < 0|m = 10, s = 20, EN). We can get that in R by typing 1-pnorm(0,10,20), and you should get [1] 0.6914625, which is 1 − 0.3085375 as expected. By the way, if you are starting to feel the onset of a freak out, and wonder “Why, O why, can’t he give us a point and click way to do this!”, it is because, dear reader, a point and click way to do this does not exist. Stop worrying so much. You’ll get it.

What is Pr(15 < x < 18|m = 15, s = 5, EN) (which might be reasonable numbers for the temperature example)? Any interval splits the data into three parts: the part less than the lower bound (15), the interval itself (15-18), and the part larger than the upper bound (18). We already know how to get Pr(x < 15|m = 15, s = 5, EN), which is pnorm(15,15,5), and which equals 0.5. We also know how to get Pr(x > 18|m = 15, s = 5, EN), which is 1-pnorm(18,15,5), and which equals 0.2742531. This means that Pr(x < 15 or x > 18|m = 15, s = 5, EN), using probability rule number 1, is 0.5 + 0.2742531 = 0.7742531. Finally, 0.7742531 is the probability of not being in the interval, so the probability of being in the interval must be one minus this, or 1 − 0.7742531 = 0.2257469. A lot of work. We could have jumped right to it by typing

pnorm(18,15,5)-pnorm(15,15,5).

This is the way you write the code to compute the probability of any interval, remembering to input your own m and s of course!
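If you get tired of typing out the subtraction, you can wrap it in a little function of your own. The name interval_prob here is my own invention, not something built into R:

```r
# Pr(a < x < b) for a normal with parameters m and s
interval_prob <- function(a, b, m, s) pnorm(b, m, s) - pnorm(a, m, s)

interval_prob(15, 18, 15, 5)   # [1] 0.2257469, as before
```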

4. Advanced

You don’t need to do this section, because it is somewhat more complicated. Not much, really, but enough that you have to think more about the computer than you do the probability.

Our goal is to plot the picture of a normal density. The function dnorm(x,15,5) will give you the value of the normal density, with an m = 15 and s = 5, for some value of x. To picture the normal, which is a picture of densities for a range of x, we somehow have to specify this range. Unfortunately, there is no way to know in advance which range you want to plot, so getting the exact picture you want takes some work. Here is one way:

x = seq(-4,4,.01)

which gives us a sequence of numbers from -4 to 4 in increments of 0.01. Thus, x = −4.00, −3.99, −3.98, …, 4. Calculating the density of each of these values of x is easy:

dnorm(x)

where you will have noticed that I did not type an m or s. Type ?dnorm again. It reads dnorm(x, mean = 0, sd = 1, log = FALSE). Ignoring the log = FALSE bit, we can see that R helpfully supplies default values of the parameters. They are defaults, because if you are happy with the values chosen, you do not have to type in your own. In this case, m = 0 and s = 1, which is called a standard normal. Anyway, to get the plot is now easy:

plot(x,dnorm(x),type='l')

This means, for every value of x, plot the value of dnorm at that value. I also changed the plot type to a line with type='l', which makes the graph prettier. Try doing the plot without this argument and see what you get.
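To plot a normal with your own parameters, say the m = 15 and s = 5 from the temperature example, the recipe is the same; the only extra work is picking a range of x wide enough to show the curve (here I picked 0 to 30, which covers m plus or minus three s):

```r
x <- seq(0, 30, .01)                 # a range that comfortably covers m = 15, s = 5
plot(x, dnorm(x, 15, 5), type = 'l') # the peak of the curve sits at x = 15
```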

Stats 101: Chapter 4

Chapter 4 is ready to go.

This is where it starts to get weird. The first part of the chapter introduces the standard notation of “random” variables, and then works through a binomial example, which is simple enough.

Then come the so-called normals. However, they are anything but. For probably most people, it will be the first time that they hear about the strange creatures called continuous numbers. It will be more surprising to learn that not all mathematicians like these things or agree with their necessity, particularly in problems like quantifying probability for real observable things.

I use the word “real” in its everyday, English sense of something that is tangible or that exists. This is because mathematicians have co-opted the word “real” to mean “continuous”, which in an infinite amount of cases means “not real” or “not tangible” or even “not observable or computable.” Why use these kinds of numbers? Strange as it might seem, using continuous numbers makes the math work out easier!

Again, what is below is a teaser for the book. The equations and pictures don’t come across well, and neither do the footnotes. For the complete treatment, download the actual Chapter.

Distributions

1. Variables

Recall that random means unknown. Suppose x represents the number of times the Central Michigan University football team wins next year. Nobody knows what this number will be, though we can, of course, guess. Further suppose that the chance that CMU wins any individual game is 2 out of 3, and that (somewhat unrealistically), a win or loss in any one game is irrelevant to the chance that they win or lose any other game. We also know that there will be 12 games. Lastly, suppose that this is all we know. Label this evidence E. That is, we will ignore all information about who the future teams are, what the coach has leaked to the press, how often the band has practiced their pep songs, what students will fail their statistics course and will thus be booted from the team, and so on. What, then, can we say about x?

We know that x can equal 0, or 1, or any number up to 12. It’s unlikely that CMU will lose or win every game, but they’ll probably win, say, somewhere around 2/3s, or 6-10, of them. Again, the exact value of x is random, that is, unknown.

Now, if last chapter you weren’t distracted by texting messages about how great this book is, this situation might feel a little familiar. If we instead let x (instead of k; remember these letters are place holders, so whichever one we use does not matter) represent the number of classmates you drive home, where the chance that you take any of them is 10%, we know we can figure out the answer using the binomial formula. Our evidence then was EB. And so it is here, too, when x represents the number of games won. We’ve already seen the binomial formula written in two ways, but yet another (and final) way to write it is this:

x|n, p, EB ~ Binomial(n, p).

This (mathematical) sentence reads “Our uncertainty in x, the number of games the football team will win next year, is best represented by the Binomial formula, where we know n, p, and our information is EB.” The “~” symbol has a technical definition: “is distributed as.” So another way to read this sentence is “Our uncertainty in x is distributed as Binomial where we know n, etc.” The “is distributed as” is longhand for “quantified.” Some people leave out the “Our uncertainty in”, which is OK if you remember it is there, but is bad news otherwise. This is because people have a habit of imbuing x itself with some mystical properties, as if “x” itself had a “random” life. Never forget, however, that it is just a placeholder for the statement X = “The team will win x games”, and that this statement may be true or false, and it’s up to us to quantify the probability of it being true.

In classic terms, x is called a “random variable”. To us, who do not need the vague mysticism associated with the word random, x is just an unknown number, though there is little harm in calling it a “variable,” because it can vary over a range of numbers. However, all classical, and even much Bayesian, statistical theory uses the term “random variable”, so we must learn to work with it.

Above, we guessed that the team would win about 6-10 games. Where do these numbers come from? Obviously, based on the knowledge that the chance of winning any game was 2/3 and there’d be twelve games. But let’s ask more specific questions. What is the probability of winning no games, or X = “The team will win x = 0 games”; that is, what is Pr(x = 0|n, p, EB)? That’s easy: from our binomial formula, this is (see the book) ≈ 2 in a million. We don’t need to calculate n choose 0 because we know it’s 1; likewise, we don’t need to worry about 0.67^0 because we know that’s 1, too. What is the chance the team wins all its games? Just Pr(x = 12|n, p, EB). From the binomial, this is (see the book) ≈ 0.008 (check this). Not very good!

Recall we know that x can take any value from zero to twelve. The most natural question is: what number of games is CMU most likely to win? Well, that’s the value of x that makes (see the book) the largest, i.e. the most probable. This is easy for a computer to do (you’ll learn how next Chapter). It turns out to be 8 games, which has about a one in four chance of happening. We could go on and calculate the rest of the probabilities, for each possible x, just as easily.
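If you have already read the R material above (it is officially introduced next chapter), here is one way to let the computer find the most probable number of wins:

```r
probs <- dbinom(0:12, size = 12, prob = 2/3)  # Pr(x = 0), Pr(x = 1), ..., Pr(x = 12)
(0:12)[which.max(probs)]                      # the most probable number of wins: 8
max(probs)                                    # about 0.24: a one in four chance
```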

What is the most likely number of games the team will win is the most natural question for us, but in pre-computer classical statistics, there turns out to be a different natural question, and this has something to do with creatures called expected values. That term turns out to be a terrible misnomer, because we often do not, and cannot, expect any of the values that the “expected value” calculations give us. The reason expected values are of interest has to do with some mathematics that are not of especial interest here; however, we will have to take a look at them because it is expected of one to do so.


Stats 101: Chapter 3

Three is ready to go.

I should re-emphasize one of the goals of this book. It is meant to be for that large host of unfortunates who are forced—I mean required—to take a statistics course and, importantly, do not want to. This is why a lot of formulas and methods do not make their traditional appearance. Understanding—and not rote—is paramount.

The material is enough to cover in one typical semester. The student will not learn how to handle many different kinds of data, but he damn well will comprehend what somebody is saying when they make a probability statement about data.

Face it. The vast majority of students who sit through statistics classes never again compute their own regression models, factor analyses, etc., etc. But they often read these kinds of results prepared by others. I want them, as their eyes meet a p-value, say to themselves, “Aha! Here is one of those p-value things Stats 101 warned me about! Sure enough, it is being misused yet again. I don’t know the right answer in this study, but I do know what is being claimed is too certain.”

If I can do that, then I will be a happy man.

(The contents of Chapter 3 now follow. If you use Firefox > version 2.0, then you will be able to see all the characters on your screen. Else some of the content may be a little screwy. I apologize for this. If you can’t read everything below, consider this a tease for the real thing. You can always download the chapter and print it out.)

How to Count

1. One, two, three…

Youtube.com has a video at this URL

http://www.youtube.com/watch?v=wcCw9RHI5mc

The important part is that “v=wcCw9RHI5mc” business at the end, which essentially means “this is video number wcCw9RHI5mc”. This video is, of course, different than number wcCw9RHI5md, and number wcCw9RHI5me and so on. We can notice that the video number contains 11 different slots (count them), each of which is filled with a number or upper or lower case Latin letter, which means the number is case sensitive; A differs from a. The question is, how many different videos can Youtube host given this numbering scheme? Are they going to run out of numbers anytime soon?

That problem is hard, so we’ll start on a simpler one. Suppose the video numbering scheme only allowed one slot, and that this slot could only contain a single-digit number, chosen from 0-9. Then how many videos could they host? They’d have v=0, v=1 and so on. Ten, right? Now how about if they allowed two slots chosen from 0-9. Just 10 for the first, and 10 for each of the 10 of the first, a confusing way of saying 10 × 10. For three slots it’s 10 × 10 × 10. But you already knew how to do this kind of counting, didn’t you?

What if the single slot is allowed only to be the lower case letters a,…,z? This is v=a, v=b, etc. How many in two such slots? Just 26 × 26 = 676. Which is the same way we got 100 in two slots of the numbers 0-9.

So if we allow any number, plus any lower or upper case letter, in any slot, we have 10 + 26 + 26 = 62 different possibilities per slot. That means that with 11 slots we have 62 × 62 × ⋯ × 62 = 62^11 ≈ 5 × 10^19, or 50 billion billion different videos that Youtube can host.
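If you don’t trust my arithmetic, R (introduced in the Chapter 5 material above) will happily do it for you:

```r
10 * 10    # two slots of digits: 100
26 * 26    # two slots of lower case letters: 676
62^11      # eleven slots of 62 possibilities each: about 5.2e+19
```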

2. Arrangements

How many ways are there of arranging things? In 1977, George Thorogood remade that classic John Lee Hooker song, “One Bourbon, One Scotch, and One Beer.” This is because George is, of course, the spiritous counterpart of an oenophile; that is, he is a connoisseur of fine spirits and regularly participates in tastings. Further, George, who is way past 21, is not an idiot and never binge drinks, which is about the most moronic of activities that a person could engage in. He very much wants to arrange his coming week, where he will taste, each night, one bourbon (B), one scotch (S), and one beer (R). But he wants to be sure that the order he tastes these drinks doesn’t influence his personal ratings. So each night he will sip them in a different order. How many different nights will this take him? Write out what will happen: Night 1, BSR; night 2, BRS; night 3, SBR; night 4, SRB; night 5, RBS; night 6, RSB. Six nights! Luckily, this still leaves Sunday free for contemplation.

Later, George decides to broaden his tasting horizons by adding Vernors (the tasty ginger ale aged in oak barrels that can’t be bought in New York City) to his line up. How many nights does it take him to taste things in a different order now? We could count by listing each combination, but there’s an easier way. If you have n items and you want to know how many different ways they could be grouped or ordered, the general formula is:

n! = n × (n − 1) × (n − 2) × ⋯ × 2 × 1

The term on the left, n!, reads “n factorial.” With 4 beverages, this is 4 × 3 × 2 × 1 = 24 nights, which is over three weeks! Good thing that George is dedicated.
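R (see the Chapter 5 material above) has this formula built in as the function factorial:

```r
factorial(3)   # BSR, BRS, SBR, SRB, RBS, RSB: 6 nights
factorial(4)   # after adding Vernors: 24 nights
```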

3. Being choosy

It’s the day before Thanksgiving and you are at school, packing your car for the drive home. You would have left a day earlier, but you didn’t want to miss your favorite class: statistics. It turns out that you have three friends who you know need a ride: Larry, Curly, and Moe. Lately, they have been acting like a bunch of stooges, so you decide to tell them that your car is just too full to bring them along. The question is, how many different ways can you arrange your friends to drive home with you when you plan to bring none of them? This is not a trick question; the answer is as easy as you think. Only one way: that is, with you driving alone.

But, they are your friends, and you love them, so you decide to take just one. Now how many ways can you arrange your friends so that you take just one? Since you can take Larry, Curly, or Moe, and only one, then it’s obviously three different ways, just by taking only Larry, or only Curly, or only Moe. What if you decide to take two, then how many ways? That’s trickier. You might be tempted to think that, given there are 3 of them, that the answer is 3! = 6, but that’s not quite right. Write out a list of the groupings: you can take Larry & Curly, Larry & Moe, or Moe & Curly. That’s three possibilities. The grouping “Curly & Larry,” for example, is just the same as the grouping “Larry & Curly.” That is, the order of your friends doesn’t matter: this is why the answer is 3 instead of 6. Finally, all these calculations have made you so happy that you soften your heart and decide to take all three. How many different groupings taking all of them are possible? Right. Only one.

You won’t be surprised to learn that there is a formula to cover situations like this. If you have n friends and you want to count the number of possible groupings of k of them when the order does not matter, then the formula is

(see the book)

The term on the left is read “n choose k”. By definition (via some fascinating mathematics) 0! = 1. Here are all the answers for the Thanksgiving problem:

(see the book)

There are some helpful facts about this combinatorial function that are useful to know. The first is that n choose 0 always equals 1. This means, out of n things, you take none; or it means there is only one way to arrange no things, namely no arrangement at all. n choose n is also always 1, regardless of what n equals. It means, out of n things, you take all. n choose 1 always equals n, and so does n choose n − 1: these are the number of ways of choosing just 1 or just n − 1 things. As long as n > 2, n choose 2 is greater than n choose 1, which makes sense, because you can make more groups of 2 than of 1.
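R computes “n choose k” with the built-in function choose, which makes these facts easy to verify:

```r
choose(3, 0:3)                # the Thanksgiving answers: 1 3 3 1
choose(5, 0)                  # n choose 0 is always 1
choose(5, 1)                  # n choose 1 is always n: here, 5
choose(5, 2) > choose(5, 1)   # TRUE whenever n > 2
```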

4. Counting meets probability: The Binomial distribution

We started the Thanksgiving problem by considering it from your point of view. Now we take Larry, Moe, and Curly’s perspective, who are waiting in their dorm room for your call. They don’t yet know which of them, if any, will get a ride with you. Because they do not know, they want to quantify their uncertainty, and they do so using probability. We are now entering a different realm, where counting meets probability. Take your time here, because the steps we follow will be the same in every probability problem we ever do.

Moe, reminiscent, recalls an incident wherein he was obliged to poke you in the eyes, and guesses that, since you were somewhat irked at the time, the probability that you take any one of the gang along is only 10%. That is, it is his judgment that the probability that you take him, Moe, is 10%, and the same (independently) for Larry and Curly. So the boys want to figure out the probability that you take none of them, take one of them, take two of them, or take all three of them.

Start with taking all three. We want the probability that you take Larry and Moe and Curly, where the probability of taking each is 10%. Remember probability rule #2? Those “ands” become “times”: so the probability of taking all three is 0.1 × 0.1 × 0.1 = 0.001, or 1 in a 1000. Keep in mind: this is from their perspective, not yours. This is their guess of the chances, because you may already have made up your mind; they don’t know that.

What about taking none of them? This is the chance that you do not take Larry, and you do not take Moe, and you do not take Curly. The key word is still “and,” which makes the probability (1 − 0.1) × (1 − 0.1) × (1 − 0.1) = 0.9^3 ≈ 0.73, since the probability of not taking Larry etc. is one minus the probability of taking him etc. It is, too, because you can either take Larry or not; these are the only two things that can happen, so the probability of taking Larry or not must be 1. We can write this using our notation: let A = “Take Larry”, then AF = “Don’t take him”. Then Pr(A or AF|E) = Pr(A|E) + Pr(AF|E) = 1, using probability rule #1. So if Pr(A|E) = 0.1, then Pr(AF|E) = 1 − Pr(A|E) = 0.9. In this case, E is the information dictated by Moe (who is the leader), which led him to say Pr(A|E) = 0.1.

How about taking just one? Well, you can take Larry, not take Moe, and not take Curly, and the chance of that is (using rules #1 and #2 together) 0.1 × (1 − 0.1) × (1 − 0.1) ≈ 0.08; but you could just as easily have taken Moe and not Larry, or Curly and not Larry, and the chance you do either of these is just the same as you taking Larry and not the other two. For shorthand, write M as “Take Moe” and so on, and MF as “Don’t take Moe” and so on. Thus you could have “L MF CF or LF M CF or LF MF C.” Using probability rule #1, we break up this statement into three pieces (like “L MF CF”), and then use probability rule #2 on each piece (“ands” turn to times), then add the whole thing up.

You could do all that, but there is an easier way. You could notice there are three different ways to take just one, which we remember from our choosing formula, eq. (10). This makes the probability 3 × 0.08 ≈ 0.24. Since we already know the probability of taking one of those combinations, we just multiply it by the number of times we see it. We could have also written the answer like this:

(3 choose 1) × 0.1^1 × (1 − 0.1)^2 = 0.24.

And we could also have written the first situation (taking all of them) in the same way:

(3 choose 3) × 0.1^3 × (1 − 0.1)^0 = 0.001,

where you must remember that a^0 = 1 (for any a you will come across).

You see the pattern by now. This means we have another formula to add to our collection. This one is called the binomial and it looks like this:

(see book)

There is a subtle shift in notation with this formula, made to conform with tradition. “k” is shorthand for the statement, in this instance, K = “You take k people.” For general situations, k is the number of “successes”: or, K = “The number of successes is k”. Everything to the right of the “|” is still information that we know. So n is shorthand for N = “There are n possibilities for success”, or in your case, N = “There are three brothers which could be taken.” The p means, P = “The probability of success is p”. We already know EB , written here with a subscript to remind us we are in a binomial situation. This new notation can be damn convenient because, naturally, most of the time statisticians are working with numbers, and the small letters mean “substitute a number here,” and if statisticians are infamous for their lack of personality, at least we have plenty of numbers. This notation can cause grief, too. Just how that is so must wait until later.

Don’t forget this: in order for us to be able to use a binomial distribution to describe our uncertainty, we need three things. (1) The definition of a success: in the Thanksgiving example, a success was a person getting a ride. (2) The probability of a success is always the same. (3) The number of chances for successes is fixed.
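If you have already read the R material above, you can check these by-hand Thanksgiving calculations against dbinom, which computes the whole binomial formula in one go:

```r
choose(3, 1) * 0.1^1 * (1 - 0.1)^2   # taking exactly one friend: 0.243
dbinom(1, size = 3, prob = 0.1)      # the same thing, done for us: 0.243
choose(3, 3) * 0.1^3 * (1 - 0.1)^0   # taking all three: 0.001
dbinom(3, size = 3, prob = 0.1)      # again the same: 0.001
```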

Stats 101: Chapter 2

Chapter 2 is now ready for downloading—it can be found at this link.

This chapter is all about basic probability, with an emphasis on understanding and not on mechanics. Because of this, many details are eliminated which are usually found in standard books. If you already know combinatorial probability (taught in every introductory class), you will probably worry your favorite distribution is missing (“What, no Poisson? No negative binomial? No This One or That One?”). I leave these out for good reason.

In the whole book, I only teach two distributions, the binomial and the normal. I hammer home how these are used to quantify uncertainty in observable statements. Once people firmly understand these principles, they will be able to understand other distributions when they meet them.

Besides, the biggest problem I have found is that people, while they may be able to memorize half a dozen distributions or formulas, do not understand the true purpose of probability distributions. There is also no good reason to do calculations by hand now that computers are ubiquitous.

Comments are welcome. The homework section (like in every other chapter) is unfinished. I will be adding more homework as time goes on, especially after I discover what areas are still confusing to people.

Once again, the book chapter can be downloaded here.