Author: Briggs

May 22, 2008 | 3 Comments

Stats 101: Chapter 6

It was one of those days yesterday. I got two chapters up, but did not give anybody a way to get them! Here it is

These are the last two “basics” Chapters. 6 first, and it is a little thin, so I’ll probably expand it later. It’s sort of a transition between probability where we know everything to statistics where we don’t. And by “everything” I mean the parameters of probability models. I want the reader to build up a little intuition before it starts to get rough.

The most important part of 6 is the homework, which I usually spend a lot of time with in class.

In a couple of days we start the good stuff. Book link.

CHAPTER 6

Normalities & Oddities

1. Standard Normal

Suppose x | m, s, E_N ~ N(m, s); then there turns out to be a trick that can make x easier to work with, especially if you have to do any calculations by hand (which, nowadays, will be rare). Let z = (x − m)/s; then z | m, s, E_N ~ N(0, 1). It works for any m and s. Isn’t that nifty? Lots of fun facts about z can be found in any statistics textbook that weighs over 1 pound (these tidbits are usually in the form of impenetrable tables located in the back of the books).

What makes this useful is that Pr(z > 2 | 0, 1, E_N) ≈ Pr(z > 1.96 | 0, 1, E_N) = 0.025 and Pr(z < −2 | 0, 1, E_N) ≈ Pr(z < −1.96 | 0, 1, E_N) = 0.025; or, in words, the probability that z is bigger than 2 or less than negative 2 is about 0.05, which is a magic (I mean real voodoo) value in classical statistics. We already learned how to do this in R last Chapter.

In Chapter 4, a homework question explained the rules of petanque, which is a game more people should play. Suppose the distance the boule lands from the cochonette is x centimeters. We do not know what x will be in advance, and so we (approximately) quantify our uncertainty in it using a normal distribution with parameters m = 0 cm and s = 10 cm. If x > 0 cm it means the boule lands beyond the cochonette, and if x < 0 cm it means the boule lands in front of the cochonette. You are out on the field playing, far from any computer, and the urge comes upon you to discover the probability that x > 30 cm. First thing to do is to calculate z, which equals (30 cm − 0 cm)/10 cm = 3 (the cm cancel). What is Pr(z > 3 | 0, 1, E_N)? No idea; well, some idea. It must be less than 0.025, since we have all memorized that Pr(z > 2 | 0, 1, E_N) ≈ 0.025. The larger z is, the more improbable it becomes (right?). Let’s say as a guess 1%. When you get home, you can open R and plug in 1-pnorm(3) and see that the actual probability is about 0.1%, so we were off by an order of magnitude (a power of 10), which is a lot, and which proves once again that computers are better at math than we are.
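Once you are back at a computer, the petanque calculation can be checked in a few lines. This is only a sketch using the numbers from the example (m = 0 cm, s = 10 cm, x = 30 cm); the variable names are mine, not the book's.

```r
# Parameters from the petanque example
m <- 0    # central parameter, in cm
s <- 10   # spread parameter, in cm
x <- 30   # distance of interest, in cm

# Standardize: the cm cancel, leaving a unitless z
z <- (x - m) / s

# Pr(z > 3) using the standard normal (pnorm defaults to m = 0, s = 1)
p_standard <- 1 - pnorm(z)

# The same probability without standardizing first
p_direct <- 1 - pnorm(x, m, s)

print(z)
print(p_standard)
print(p_direct)
```

The two probabilities agree, which is the whole point of the trick: standardizing changes the bookkeeping, not the answer.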

2. Nonstandard Normal

The standard normal example is useful for developing your probabilistic intuition. Since normal distributions are used so often, we will spend some more time thinking about some consequences of using them. Doing this will give you a better feel for how to quantify uncertainty.

Below is a picture of two normal distributions. The one with the solid line has m1 = 0 and s1 = 1; the dashed line has m2 = 0.5 and also s2 = 1. In other words, the two distributions differ only in their central parameter; they have the same variance parameter. Obviously, large values are more likely according to distribution 2, and smaller values are more likely given distribution 1, as a simple consequence of m2 > m1. However, once we get to values of about x = 4 or so, it doesn’t look like the distributions are that different. (Cue the spooky music.) Or are they?

Under the main picture are two others. The one on the left is exactly like the main picture, except that it focuses only on the range of x = 3.5 to x = 5. If we blow it up like this, we can see that it is still more likely to see large values of x using distribution 2.

How much more likely? The picture on the right divides the probabilities of seeing x or larger with distribution 2 by distribution 1, and so shows how much more likely it is to see larger values with distribution 2 than 1. For example, pick x = 4. It is about 7.5 times more likely to see an x = 4 or larger with distribution 2. That’s a lot! By the time we get out to x = 5, we are 12 times more likely to see values this large with distribution 2. The point is that even very small changes in the central parameters lead to large differences in the probabilities of “extreme” values of x.
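If you do not have the picture in front of you, the ratios can be recomputed directly. This is a sketch that assumes the parameter values quoted in the text (m1 = 0, m2 = 0.5, s1 = s2 = 1); it uses pnorm with lower.tail = FALSE, which returns the probability of seeing a value larger than x.

```r
# Two normals that differ only in the central parameter
m1 <- 0;   s1 <- 1
m2 <- 0.5; s2 <- 1

x <- c(4, 5)

# Probability of seeing x or larger under each distribution
p1 <- pnorm(x, m1, s1, lower.tail = FALSE)
p2 <- pnorm(x, m2, s2, lower.tail = FALSE)

# How many times more likely distribution 2 makes these large values
ratio <- p2 / p1
print(ratio)
```

The ratios come out at roughly 7 and 12, in line with what is read off the right-hand panel.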

(see the book)

This next picture again shows two different distributions, this time with m1 = m2 = 0 and with s1 = 1 and s2 = 1.1. In other words, both distributions have the same central parameters, but distribution 2 has a variance parameter that is slightly larger. The normal density plots do not look very different, do they? The dashed line, which is still distribution 2, has a peak slightly under distribution 1’s, but the difference looks pretty small.

(see the book)

The bottom panels are the same as before. The one on the left blows up the area where x > 3.5 and x < 5. A big difference still exists. And the ratio of probabilities is still very large. It’s not shown, but the plot on the right would be duplicated (or mirrored, actually) if we looked at x > −5 and x < −3.5. It is more probable to see extreme events in either direction (positive or negative) using distribution 2. The surprising consequence is that very small changes in either the central parameter or the variance parameter can lead to very large differences at the extremes. Examples of these phenomena are easily found in real life, but my heightened political sensitivity precludes me from publicly pointing any of these out.
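The same kind of check works for a change in the variance parameter. The sketch below assumes the values from the text (m = 0 for both, s1 = 1, s2 = 1.1) and looks at both tails, so the mirroring described above can be seen in the numbers; the variable names are my own.

```r
m  <- 0
s1 <- 1
s2 <- 1.1

x <- c(4, 5)

# Upper tail: probability of seeing x or larger, distribution 2 over distribution 1
upper_ratio <- pnorm(x, m, s2, lower.tail = FALSE) /
               pnorm(x, m, s1, lower.tail = FALSE)

# Lower tail: probability of seeing -x or smaller, distribution 2 over distribution 1
lower_ratio <- pnorm(-x, m, s2) / pnorm(-x, m, s1)

print(upper_ratio)
print(lower_ratio)
```

Both ratios are well above 1 and grow as x moves further out, and the lower-tail ratios match the upper-tail ones.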

3. Intuition

We have learned probability and some formal distributions, but we have not yet moved to statistics. Before we do so, let us try to develop some intuition about the kinds of problems and solutions we will see before getting to technicalities. There are a number of concepts that will be important, but I don’t want to give them a name, because there is no need to memorize jargon, while it is incredibly important that you develop a solid understanding of uncertainty.

The well-known Uncle Ted Nugent’s chain of Kill ’em and Grill ’em Venison Burger restaurants sells both Coke and Pepsi, and its internal audit shows it sells about an equal amount of each. The busy Times Square branch of the chain has about 5000 customers a day, while the store in tiny Gaylord, Michigan sees only about 100 customers. Which location is more likely to sell, on any given day, at least 2 times more Pepsi than Coke?

A useful technique for solving questions like this is exaggeration. For instance, the question is asking about a difference in location. What differs between those places? Only one thing, the number of customers. One site gets about 5000 people a day, the other only 100. Let’s exaggerate that difference and solve a simpler problem. For example, suppose Times Square still gets 5000 a day, but Gaylord only gets 1 a day. The information is that the probability of selling a Coke is roughly equal to the probability of selling a Pepsi. This means that, at Gaylord, to that 1 customer on that day, they will either sell 1 Coke or 1 Pepsi. If they sell a Pepsi, Gaylord has certainly sold more than 2 times as much Pepsi as Coke. The chance of that happening is 50%. What is two times as much Pepsi as Coke at Times Square? A lot more Pepsi, certainly. So it’s far more likely for Gaylord to sell a greater proportion of Pepsi because they see fewer customers. The lesson is that when the “sample size” is small, we are more likely to see extreme events.
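The exaggeration argument can also be checked with the binomial. The sketch below rests on assumptions of mine that the text does not state: every customer buys exactly one drink, Coke or Pepsi, each with probability 0.5, and customers do not influence each other. The function name prob_twice_pepsi is made up for illustration.

```r
# Probability that a store sells at least twice as much Pepsi as Coke in a day,
# assuming n customers who each independently buy one drink, Pepsi with chance p
prob_twice_pepsi <- function(n, p = 0.5) {
  # Pepsi >= 2 * Coke, with Coke = n - Pepsi, means Pepsi >= 2n/3
  k_min <- ceiling(2 * n / 3)
  # Pr(Pepsi >= k_min) is the same as Pr(Pepsi > k_min - 1)
  pbinom(k_min - 1, n, p, lower.tail = FALSE)
}

customers <- c(1, 100, 5000)
print(prob_twice_pepsi(customers))
```

With 1 customer the chance is 50%, as argued above; with 100 customers it is already tiny, and with 5000 it is vanishingly small.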

What is the length of the first Chinese Emperor Qin Shi Huangdi’s nose? You don’t know? Well, you can make a guess. How likely is it that your guess is correct? Not very likely. Suppose that you decide to ask everybody you know to also guess, and then average all the answers together in an attempt to get a better guess. How likely is it that this averaged-guess is perfectly correct? No more likely. If you haven’t a clue about the nose, and nobody else does either, then averaging ignorance is no better than single ignorance. The lesson is that just because a large group of people agree on an opinion, it is not necessarily more probable that that opinion, or average of opinions, is correct. Uninformed opinion of a large group of people is not necessarily more likely to be correct than the opinion of the lone nut job on the corner. Think about this the next time you hear the results of a poll or survey.

You already possess other probabilistic intuition. For example, suppose, given some evidence E, the probability of A is 0.0000001 (A is something that might be given many opportunities to happen, e.g. winning the lottery). How often will A happen? Right. Not very often. But if you give A a lot of chances to occur, will A eventually happen? It’s very likely to.
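To put numbers on that intuition, here is a small sketch. It assumes (my assumption, not stated in the text) that each chance is irrelevant to every other and that the probability of A is the same each time, so that the probability A never happens in n chances is (1 − p)^n.

```r
p <- 0.0000001                        # probability of A on any one chance
chances <- c(1, 1e3, 1e6, 1e7, 1e8)   # how many opportunities A is given

# Pr(A happens at least once) = 1 - Pr(A never happens)
p_at_least_once <- 1 - (1 - p)^chances

print(data.frame(chances, p_at_least_once))
```

With one chance A is hopeless; with a hundred million chances it is nearly certain to turn up at least once.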

Every player in petanque gets to throw three boules. What are the chances that I get all three within 5 cm? This is a compound problem, so let’s break it apart. How do we find out how likely it is to be within 5 cm of the cochonette? Well, that means the boule can be 5 cm in front of the cochonette, right near it, or up to 5 cm beyond it. The chance of this happening is Pr(−5 cm < x < 5 cm | m = 0 cm, s = 10 cm, E_N). We learned how to calculate the probability of being in an interval last chapter: pnorm(5,0,10)-pnorm(-5,0,10). This equals about 0.38, which is the chance that one boule lands within 5 cm, in front or beyond, of the cochonette. What is the chance that all of them land that close? Well, that means the first one does and the second one and the third. What probability rule do we use now? The second, which tells us to multiply the probabilities together, which is 0.38^3 ≈ 0.055. The important thing to recall, when confronted with problems of this sort: do not panic. Try to break apart the complex problem into bite-size pieces.
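The two bite-size pieces translate into two lines of R. This sketch uses the parameters from the example (m = 0 cm, s = 10 cm) and takes for granted, as the multiplication rule requires, that one throw is irrelevant to the next.

```r
m <- 0    # cm
s <- 10   # cm

# Piece 1: one boule lands within 5 cm of the cochonette
p_one <- pnorm(5, m, s) - pnorm(-5, m, s)

# Piece 2: all three boules do, multiplying the probabilities together
p_all_three <- p_one^3

print(p_one)
print(p_all_three)
```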

Thanks to a hot tip from Lucia, over at the Diet Diary, I have become wiser about spam. I installed the wp-spamfree plug-in and we’ll see how that works.

OLD “I have been getting an enormous amount of spam over the past week (1000s of postings a day; all caught by the spam filter), so I am shutting off comments for 24 hours in the hope this will get me off some spam lists. Sorry for the inconvenience. “

May 20, 2008 | 3 Comments

Stats 101: Chapter 5

Update: 21 May 4:45 am. I forgot to actually upload the file until right this moment. Thanks to Mike and Harry for the reminder.

Chapter 5 is ready to go.

This is purely a mechanical chapter, introducing `R`. Thrilling reading, it is not. But it’s necessary to learn in order to be able to carry out the analysis in later chapters. The book website is not fully up; only the datasets are there. To learn to install R, just look on the `R` website.

I’ll be posting Chapters 6 and 7 in short order and then we finally get to the good stuff.

R

1. R

R is a fantastic, hugely supported, rapidly growing, infinitely extensible, operating-system agnostic, free and open source statistical software platform. Nearly everybody who is anybody uses R, and since I want you to be somebody, you will use it, too. Some things in R are incredibly easy to do; other tasks are bizarrely difficult. Most of what makes R hard for the beginner is the same stuff that makes any piece of software hard; that is, getting used to expressing your statistical desires in computerese. As such an environment can be strange and perplexing at first, some students experience a kind of peculiar stress that is best described by example. Here is a video from Germany showing a young statistics student who experienced trouble understanding R:

Be sure that this doesn’t happen to you. Remember what Douglas Adams said: Don’t panic.

The best way to start is by going to r-project.org and clicking the CRAN link under the Download heading. You can’t miss it. After that, you have to choose a mirror, which means one of the hundreds of computers around the world that host the software. Obviously, pick a site near you. Once that’s done, choose your platform (your operating system, like Linux or one of the others), and then choose the base package. Step-by-step instructions are at this book’s website: wmbriggs.com/book. It is no more difficult to install than any other piece of software.

This is not the place to go over all the possibilities of R; just the briefest introduction will be given, because there are far better places available online (see the book website for links). But there are a few essential commands that you should not do without.

These are

Command             Description
help(command)       Does the obvious: always scroll down to the bottom of the help to see examples of the command.
?command            Same as help().
apropos("string")   If you cannot remember the name of a command (and I always forget) but remember it started with co-something, then just type apropos("co") and you’ll get a complete list of commands that have co anywhere in their names.
c()                 This is the concatenation function: typing c(1,2) concatenates a 2 to 1, or sticks the number 2 on the end of 1, so that we have a vector of numbers.

The Appendix gives a fuller list of R commands.
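Here is a short session that tries the essential commands. It is only a sketch; the variable names v, w, and hits are mine, and what apropos returns depends on which packages you have loaded.

```r
# c() sticks values together into a vector
v <- c(1, 2)
print(v)

# c() also accepts vectors, so you can keep sticking numbers on the end
w <- c(v, 3)
print(w)

# apropos() returns the names of everything with "co" somewhere in the name
hits <- apropos("co")
print(head(hits))

# help(c) and ?c both open the help page for c(); they are left as comments
# here because they open a separate help viewer.
```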

It is important to understand that R is a command-line language, which we may interpret as meaning that all commands in R are functions which must be typed into the console. These are objects that are a command name plus a left and right parenthesis, with variables (called arguments) stuck in between, thus: plot(x,y). Remember that you are dealing with computers, which are literal, intolerant creatures (much like the people who want to ban smoking), and so cannot abide even the slightest deviation from their expectations. That means, if instead of plot(x,y), you type lot(x,y), or plot x,y), or plot(,y), or plot(x,y, things will go awry. R will try to give you an idea of what went wrong by giving you an error message. Except in cases like that last typo, which will cause you to develop stress lines, because all you’ll see is this

+

and every attempt of yours to type anything new, or hit enter 100 times, will not do a thing except give you more lines of + or other screwy errors. Because why? Because you typed plot(x,y; that is, you typed a left parenthesis (right before the x) and you never “closed” it with a right parenthesis, and R will simply wait forever for you to type one in.

The solution is to enter a right parenthesis, or hit

ctrl+c

which means the control key plus the c key simultaneously, which “breaks” the current computation.

Using R means that you have to memorize (!) and type in commands instead of using a graphical user interface (GUI), which is the standard point-and-click screen with which you are probably familiar. It is my experience that students who are not used to computers start freaking out at this point; however, there is no need to. I have made everything very, very easy and all you have to do is copy what you see in the book to the R screen. All will be well. I promise.

GUIs are very nice things, incidentally, and R has one that you can download and play with. It is called the R Commander. Like all GUIs, some very basic functionality is included that allows you to, well, point and click and get a result. Problem is, the very second you want to do something different than what is available from the GUI, you are stuck. With statistics, we often want to do something differently, so we will stick with the command line.

2. R binomially

By now, you are eagerly asking yourself: “Can R help us with those binomial calculations like in the Thanksgiving example?” Let’s type apropos('bino') and see, because, after all, 'bino' is something like binomial. The most likely function is called binomial, so let’s type ?binomial and see how it works. Uh oh. Weird words about “family objects” and the function glm(), and that doesn’t sound right. What about one of the functions like dbinom()? Jackpot. We’ll look at these in detail, since it turns out that this structure of four functions is the same for every distribution. The functions are in this table:

dbinom   The probability (or density) function: given the size, or n, and prob, or p, this calculates the probability that we see x successes; this is equation (11).
pbinom   The distribution function, which calculates the probability that the number of successes is less than or equal to some a.
qbinom   This is the “quantile” function, which calculates, given a probability from the distribution function, which value of q it is associated with. This will be made clear with some examples with the normal distribution later.
rbinom   This generates a “random” binomial number; and since random means unknown, this means it generates a number that is unknown in some sense; we’ll talk about this later.

Let’s go back to the Thanksgiving example, which used a binomial. Moe can calculate, given n = size = 3, p = prob = 0.1, his probabilities using R:

dbinom(0,3,.1)

which gives the probability of taking nobody along for the ride. The answer is [1] 0.729. The [1] in front of the number just tells you that this line of output starts with value number 1. If you asked for dozens of probabilities, for example, R would space them out over several lines, each labeled with the position of its first value. Let’s now calculate the probability of taking just 0, just 1, etc.

dbinom(c(0,1,2,3),3,.1)

where we have “nested” two functions into one: the first is the concatenation function c(), where we have stuck the numbers 0 through 3 together, and which shows you the dbinom() function can calculate more than one probability at a time. What pops out is

[1] 0.729 0.243 0.027 0.001;

that is, the exact values we got above for taking 0 or 1 or 2 etc. along for the ride. Now we can look at the distribution function:

pbinom(c(0,1,2,3),3,.1);

and we get

[1] 0.729 0.972 0.999 1.000.

This is the probability of taking 0 or less, 1 or less, 2 or less, and 3 or less. The last probability very obviously has to be 1, and will always be 1 for any binomial (as long as the last value in the function c(0,1,…,n) equals n).

There turns out to be a shortcut to typing the concatenation function for simple numbers, and here it is:

c(0,1,2,…,n) = 0:n.

So we can rewrite the first function as dbinom(0:3,3,.1) and get the same results.

We can nest functions again and make pretty pictures

plot(dbinom(0:3,3,.1))

And that’s it for any binomial function. Isn’t that simple? (The answer is yes.) The commands never change for any binomial you want to do.
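One wrinkle with the picture: when plot() is handed a single vector, it puts the positions 1, 2, 3, 4 on the horizontal axis rather than the counts 0 through 3. A slightly fuller sketch, with axis labels and variable names of my own choosing, supplies the counts explicitly:

```r
size <- 3     # n, from the Thanksgiving example
prob <- 0.1   # p, from the Thanksgiving example

k <- 0:size                      # possible numbers taken along for the ride
probs <- dbinom(k, size, prob)   # probability of each

# type = "h" draws a vertical line at each k, one per possible value
plot(k, probs, type = "h",
     xlab = "Number taken along for the ride", ylab = "Probability")

# Adding up the probabilities as we go reproduces the distribution function
cumulative <- cumsum(probs)
print(cumulative)
```

The running total matches what pbinom(0:3,3,.1) gave above.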

3. R normally

Can R do normal distributions as well? Can it! Let’s type in apropos('normal') and see what we get. A lot of gibberish, that’s what. Where’s the normal distribution? Well, it turns out that computer programmers are a lazy bunch, and they often do not use all the letters of a word to name a function (too much typing). Let’s try apropos('norm') instead (which no matter what should give us at least as many results, right? This is a question of logic, not computers.). Bingo. Among all the rest, we see dnorm and pnorm etc., just like with the binomial. Now type ?dnorm so we can learn about our fun function. Same layout as the binomial; only difference being we need to supply a “mean” and “sd” (the m and s). Sigh. This is an example of R being naughty and misusing the terminology that I earlier forbade: m and s are not a mean and standard deviation. It’s a trap too many fall into. We’ll work with it, but just remember “mean” and “sd” actually imply our parameters m and s.

You will recall from our discussion of normals that we cannot compute a probability of seeing a single number (and if you don’t remember, shame on you: go back and read Chapter 4). The function dnorm does not give you this number, because that probability is always 0; instead, it gives you a “density”, which means little to us. But we can calculate the probability of values being in some interval using the pnorm function. For example, to calculate Pr(x < 10 | m = 10, s = 20, E_N), use pnorm(10,10,20) and you should see [1] 0.5. But you already knew that would be the answer before you typed it in, right? (Right?) Let’s try a trickier one: Pr(x < 0 | m = 10, s = 20, E_N); type pnorm(0,10,20) and get [1] 0.3085375. So what is this probability: Pr(x > 0 | m = 10, s = 20, E_N) (x greater than 0)? Think about it. x can either be less than or greater than 0; the probability that it is one or the other is 1. So Pr(x < 0 | m = 10, s = 20, E_N) + Pr(x > 0 | m = 10, s = 20, E_N) = 1. Thus, Pr(x > 0 | m = 10, s = 20, E_N) = 1 − Pr(x < 0 | m = 10, s = 20, E_N). We can get that in R by typing 1-pnorm(0,10,20) and you should get [1] 0.6914625, which is 1 − 0.3085375 as expected.

By the way, if you are starting to feel the onset of a freak out, and wonder “Why, O why, can’t he give us a point and click way to do this!”: because, dear reader, a point and click way to do this does not exist. Stop worrying so much. You’ll get it.

What is Pr(15 < x < 18 | m = 15, s = 5, E_N) (which might be reasonable numbers for the temperature example)? Any interval splits the data into three parts: the part less than the lower bound (15), the part of the interval itself (15-18), and the part larger than the upper bound (18). We already know how to get Pr(x < 15 | m = 15, s = 5, E_N), which is pnorm(15,15,5), and which equals 0.5. We also know how to get Pr(x > 18 | m = 15, s = 5, E_N), which is 1-pnorm(18,15,5), and which equals 0.2742531. This means that Pr(x < 15 or x > 18 | m = 15, s = 5, E_N), using probability rule number 1, is 0.5 + 0.2742531 = 0.7742531. Finally, 0.7742531 is the probability of not being in the interval, so the probability of being in the interval must be one minus this, or 1 − 0.7742531 = 0.2257469. A lot of work. We could have jumped right to it by typing

pnorm(18,15,5)-pnorm(15,15,5).

This is the way you write the code to compute the probability of any interval, remembering to input your own m and s of course!
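If you find yourself computing many intervals, the pattern can be wrapped up once. The helper prob_interval below is hypothetical (my own name, not an R built-in and not from the book); the sketch also redoes the long way from the temperature example to confirm that both routes agree.

```r
# Hypothetical helper: Pr(lower < x < upper | m, s)
prob_interval <- function(lower, upper, m, s) {
  pnorm(upper, m, s) - pnorm(lower, m, s)
}

# The long way: one minus the probability of being outside the interval
p_below <- pnorm(15, 15, 5)       # Pr(x < 15)
p_above <- 1 - pnorm(18, 15, 5)   # Pr(x > 18)
long_way <- 1 - (p_below + p_above)

# The shortcut
short_way <- prob_interval(15, 18, m = 15, s = 5)

print(long_way)
print(short_way)
```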

You don’t need to do this section, because it is somewhat more complicated. Not much, really, but enough that you have to think more about the computer than you do the probability.

Our goal is to plot the picture of a normal density. The function dnorm(x,15,5) will give you the value of the normal density, with an m = 15 and s = 5, for some value of x. To picture the normal, which is a picture of densities for a range of x, we somehow have to specify this range. Unfortunately, there is no way to know in advance which range you want to plot, so getting the exact picture you want takes some work. Here is one way:

x = seq(-4,4,.01)

which gives us a sequence of numbers from -4 to 4 in increments of 0.01. Thus, x = −4.00, −3.99, −3.98, . . . , 4. Calculating the density of each of these values of x is easy:

dnorm(x)

where you will have noticed that I did not type an m or s. Type ?dnorm again. It reads dnorm(x, mean=0, sd=1, log = FALSE). Ignoring the log = FALSE bit, we can see that R helpfully supplies default values of the parameters. They are default, because if you are happy with the values chosen, you do not have to type in your own. In this case, m = 0 and s = 1, which is called a standard normal. Anyway, to get the plot is now easy:

plot(x,dnorm(x),type='l')

This means, for every value of x, plot the value of dnorm at that value. I also changed the plot type to a line with type='l', which makes the graph prettier. Try doing the plot without this argument and see what you get.
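The same recipe works for a normal that is not standard, such as the m = 15 and s = 5 mentioned at the start of this section. The only extra work is picking the range. One possible choice, and it is just my rule of thumb rather than anything from R or the book, is to go four s either side of m:

```r
m <- 15
s <- 5

# Range to plot: four s below m to four s above m (a rule-of-thumb assumption)
x <- seq(m - 4 * s, m + 4 * s, length.out = 801)

# Density at each x, drawn as a line
plot(x, dnorm(x, m, s), type = "l", ylab = "density")
```

If the picture cuts off something you care about, widen the range and plot again.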

May 18, 2008 | 4 Comments

Stats 101: Chapter 4

Chapter 4 is ready to go.

This is where it starts to get weird. The first part of the chapter introduces the standard notation of “random” variables, and then works through a binomial example, which is simple enough.

Then come the so-called normals. However, they are anything but. For probably most people, it will be the first time that they hear about the strange creatures called continuous numbers. It will be more surprising to learn that not all mathematicians like these things or agree with their necessity, particularly in problems like quantifying probability for real observable things.

I use the word “real” in its everyday, English sense of something that is tangible or that exists. This is because mathematicians have co-opted the word “real” to mean “continuous”, which in an infinite amount of cases means “not real” or “not tangible” or even “not observable or computable.” Why use these kinds of numbers? Strange as it might seem, using continuous numbers makes the math work out easier!

Again, what is below is a teaser for the book. The equations and pictures don’t come across well, and neither do the footnotes. For the complete treatment, download the actual Chapter.

Distributions

1. Variables

Recall that random means unknown. Suppose x represents the number of times the Central Michigan University football team wins next year. Nobody knows what this number will be, though we can, of course, guess. Further suppose that the chance that CMU wins any individual game is 2 out of 3, and that (somewhat unrealistically), a win or loss in any one game is irrelevant to the chance that they win or lose any other game. We also know that there will be 12 games. Lastly, suppose that this is all we know. Label this evidence E. That is, we will ignore all information about who the future teams are, what the coach has leaked to the press, how often the band has practiced their pep songs, what students will fail their statistics course and will thus be booted from the team, and so on. What, then, can we say about x?

We know that x can equal 0, or 1, or any number up to 12. It’s unlikely that CMU will lose or win every game, but they’ll probably win, say, somewhere around 2/3, or 6-10, of them. Again, the exact value of x is random, that is, unknown.

Now, if last chapter you weren’t distracted by texting messages about how great this book is, this situation might feel a little familiar. If we instead let x (instead of k; remember these letters are placeholders, so whichever one we use does not matter) represent the number of classmates you drive home, where the chance that you take any of them is 10%, we know we can figure out the answer using the binomial formula. Our evidence then was E_B. And so it is here, too, when x represents the number of games won. We’ve already seen the binomial formula written in two ways, but yet another (and final) way to write it is this:

x | n, p, E_B ~ Binomial(n, p).

This (mathematical) sentence reads “Our uncertainty in x, the number of games the football team will win next year, is best represented by the Binomial formula, where we know n, p, and our information is E_B.” The “~” symbol has a technical definition: “is distributed as.” So another way to read this sentence is “Our uncertainty in x is distributed as Binomial where we know n, etc.” The “is distributed as” is longhand for “quantified.” Some people leave out the “Our uncertainty in”, which is OK if you remember it is there, but is bad news otherwise. This is because people have a habit of imbuing x itself with some mystical properties, as if “x” itself had a “random” life. Never forget, however, that it is just a placeholder for the statement X = “The team will win x games”, and that this statement may be true or false, and it’s up to us to quantify the probability of it being true.

In classic terms, x is called a “random variable”. To us, who do not need the vague mysticism associated with the word random, x is just an unknown number, though there is little harm in calling it a “variable,” because it can vary over a range of numbers. However, all classical, and even much Bayesian, statistical theory uses the term “random variable”, so we must learn to work with it.

Above, we guessed that the team would win about 6-10 games. Where do these numbers come from? Obviously, based on the knowledge that the chance of winning any game was 2/3 and there’d be twelve games. But let’s ask more specific questions. What is the probability of winning no games, or X = “The team will win x = 0 games”; that is, what is Pr(x = 0 | n, p, E_B)? That’s easy: from our binomial formula, this is (see the book) ≈ 2 in a million. We don’t need to calculate n choose 0 because we know it’s 1; likewise, we don’t need to worry about 0.67^0 because we know that’s 1, too. What is the chance the team wins all its games? Just Pr(x = 12 | n, p, E_B). From the binomial, this is (see the book) ≈ 0.008 (check this). Not very good!
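For readers who want to “check this” on a computer, here is a sketch in R, using the dbinom function that the next Chapter introduces. The values n = 12 and p = 2/3 come from the example; the variable names are mine.

```r
n <- 12    # number of games
p <- 2/3   # chance of winning any one game

# Pr(x = 0 | n, p, E_B): the team wins no games
p_none <- dbinom(0, n, p)

# Pr(x = 12 | n, p, E_B): the team wins every game
p_all <- dbinom(n, n, p)

print(p_none)
print(p_all)
```

The first is the same as (1/3)^12 and the second the same as (2/3)^12, since in both cases the “n choose” part of the formula equals 1.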

Recall we know that x can take any value from zero to twelve. The most natural question is: what number of games is CMU most likely to win? Well, that’s the value of x that makes (see the book) the largest, i.e. the most probable. This is easy for a computer to do (you’ll learn how next Chapter). It turns out to be 8 games, which has about a one in four chance of happening. We could go on and calculate the rest of the probabilities, for each possible x, just as easily.
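Finding the most probable number of wins amounts to computing the probability of every possible x and picking the largest. A sketch, again assuming n = 12 and p = 2/3 from the example and using dbinom from the next Chapter:

```r
n <- 12
p <- 2/3

wins <- 0:n                  # every possible number of wins
probs <- dbinom(wins, n, p)  # probability of each

# The value of x with the largest probability, and that probability
most_likely <- wins[which.max(probs)]
print(most_likely)
print(max(probs))
```

This gives 8 games, with a probability a little under one in four.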

What is the most likely number of games the team will win is the most natural question for us, but in pre-computer classical statistics, there turns out to be a different natural question, and this has something to do with creatures called expected values. That term turns out to be a terrible misnomer, because we often do not, and cannot, expect any of the values that the “expected value” calculations give us. The reason expected values are of interest has to do with some mathematics that are not of especial interest here; however, we will have to take a look at them because it is expected of one to do so.