Stats 101: Chapter 6

It was one of those days yesterday. I got two chapters up, but did not give anybody a way to get them! Here it is.

These are the last two “basics” chapters. Chapter 6 comes first; it is a little thin, so I’ll probably expand it later. It’s sort of a transition between probability, where we know everything, and statistics, where we don’t. And by “everything” I mean the parameters of probability models. I want the reader to build up a little intuition before it starts to get rough.

The most important part of 6 is the homework, which I usually spend a lot of time with in class.

In a couple of days we start the good stuff. Book link.

CHAPTER 6

Normalities & Oddities

1. Standard Normal

Suppose x|m, s, EN ~ N(m, s). Then there turns out to be a trick that can make x easier to work with, especially if you have to do any calculations by hand (which, nowadays, will be rare). Let z = (x − m)/s; then z|m, s, EN ~ N(0, 1). It works for any m and s. Isn’t that nifty? Lots of fun facts about z can be found in any statistics textbook that weighs over 1 pound (these tidbits are usually in the form of impenetrable tables located in the back of the book).
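A quick numerical check of the standardization trick, sketched in Python with the standard library's `statistics.NormalDist` as a stand-in for R's `pnorm`. The (m, s, cutoff) triples are arbitrary illustrative values; the point is only that computing Pr(x > a) directly and computing Pr(z > (a − m)/s) under N(0, 1) give the same number.

```python
from statistics import NormalDist

def upper_tail(a, m, s):
    """Pr(x > a) when uncertainty in x is quantified by N(m, s)."""
    return 1 - NormalDist(mu=m, sigma=s).cdf(a)

def upper_tail_standardized(a, m, s):
    """Same probability, computed by first converting a to z = (a - m)/s
    and then using the standard normal N(0, 1)."""
    z = (a - m) / s
    return 1 - NormalDist().cdf(z)

# Arbitrary (m, s, a) triples; the two columns should agree.
for m, s, a in [(0, 10, 30), (5, 2, 8), (-3, 0.5, -2)]:
    print(m, s, a, upper_tail(a, m, s), upper_tail_standardized(a, m, s))
```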

What makes this useful is that Pr(z > 2|0, 1, EN) ≈ Pr(z > 1.96|0, 1, EN) = 0.025 and Pr(z < −2|0, 1, EN) ≈ Pr(z < −1.96|0, 1, EN) = 0.025. In words: the probability that z is bigger than 2 or less than negative 2 is about 0.05, which is a magic (I mean real voodoo) value in classical statistics. We already learned how to do this in R last chapter.

In Chapter 4, a homework question explained the rules of petanque, which is a game more people should play. Suppose the distance the boule lands from the cochonette is x centimeters. We do not know what x will be in advance, and so we (approximately) quantify our uncertainty in it using a normal distribution with parameters m = 0 cm and s = 10 cm. If x > 0 cm, the boule lands beyond the cochonette; if x < 0 cm, it lands in front of the cochonette.

You are out on the field playing, far from any computer, and the urge comes upon you to discover the probability that x > 30 cm. The first thing to do is to calculate z, which equals (30 cm − 0 cm)/10 cm = 3 (the cm cancel). What is Pr(z > 3|0, 1, EN)? No idea; well, some idea. It must be less than 0.025, since we have all memorized that Pr(z > 2|0, 1, EN) ≈ 0.025, and the larger z is, the more improbable it becomes (right?). Let’s say, as a guess, 1%. When you get home, you can open R, type 1-pnorm(3), and see that the actual probability is about 0.1% (0.00135, to be exact). We were off by an order of magnitude (a power of 10), which is a lot, and which proves once again that computers are better at math than we are.
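The same numbers can be reproduced in a few lines. This is a sketch in Python using `statistics.NormalDist` (standard library) in place of R's `pnorm`; the petanque parameters m = 0 cm and s = 10 cm are the ones from the text.

```python
from statistics import NormalDist

std = NormalDist()  # the standard normal, N(0, 1)

# The "magic" values: each tail beyond 1.96 holds about 0.025.
right_196 = 1 - std.cdf(1.96)
both_tails_2 = (1 - std.cdf(2)) + std.cdf(-2)  # Pr(z > 2) + Pr(z < -2)

# Petanque: x ~ N(0 cm, 10 cm); standardize the 30 cm cutoff.
m, s = 0.0, 10.0
z = (30.0 - m) / s               # the cm cancel, leaving z = 3
pr_beyond_30 = 1 - std.cdf(z)    # same quantity as R's 1-pnorm(3)

print(f"Pr(z > 1.96)  = {right_196:.4f}")
print(f"Pr(|z| > 2)   = {both_tails_2:.4f}")
print(f"Pr(x > 30 cm) = {pr_beyond_30:.5f}")
```

Run as written, this prints 0.0250, 0.0455, and 0.00135: the two-sided tail beyond ±2 is a little under 0.05, and the 30 cm question comes out near 0.1%, not 1%.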

2. Nonstandard Normal

The standard normal example is useful for developing your probabilistic intuition. Since normal distributions are used so often, we will spend some more time thinking about some consequences of using them. Doing this will give you a better feel for how to quantify uncertainty.

Below is a picture of two normal distributions. The one with the solid line has m1 = 0 and s1 = 1; the dashed line has m2 = 0.5 and also s2 = 1. In other words, the two distributions differ only in their central parameter; they have the same variance parameter. Obviously, large values are more likely according to distribution 2, and smaller values are more likely according to distribution 1, as a simple consequence of m2 > m1. However, once we get out to values of about x = 4 or so, it doesn’t look like the distributions are that different. (Cue the spooky music.) Or are they?

Under the main picture are two others. The one on the left is exactly like the main picture, except that it focuses only on the range of x = 3.5 to x = 5. If we blow it up like this, we can see that it is still more likely to see large values of x using distribution 2.

How much more likely? The picture on the right divides the probability of seeing x or larger under distribution 2 by the same probability under distribution 1, and so shows how much more likely it is to see large values with distribution 2 than with distribution 1. For example, pick x = 4. It is about 7.5 times more likely to see an x of 4 or larger with distribution 2. That’s a lot! By the time we get out to x = 5, we are 12 times more likely to see values this large with distribution 2. The point is that even very small changes in the central parameter lead to large differences in the probabilities of “extreme” values of x.
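Since the figure lives in the book, here is a rough way to recompute the right-hand panel. It is a Python sketch using `statistics.NormalDist`; the grid of x values is an arbitrary choice. It divides the upper-tail probability under distribution 2 (m2 = 0.5, s2 = 1) by the one under distribution 1 (m1 = 0, s1 = 1).

```python
from statistics import NormalDist

dist1 = NormalDist(mu=0.0, sigma=1.0)   # solid line
dist2 = NormalDist(mu=0.5, sigma=1.0)   # dashed line: central parameter shifted by 0.5

def tail_ratio(x):
    """Pr(X >= x | dist2) divided by Pr(X >= x | dist1)."""
    return (1 - dist2.cdf(x)) / (1 - dist1.cdf(x))

for x in (3.5, 4.0, 4.5, 5.0):
    print(f"x = {x}: ratio = {tail_ratio(x):.1f}")
```

Computed this way the ratio is about 7.3 at x = 4 and about 11.9 at x = 5, in line with the values read off the plot, and it keeps growing the further out you go.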

(see the book)

This next picture again shows two different distributions, this time with m1 = m2 = 0, s1 = 1, and s2 = 1.1. In other words, both distributions have the same central parameter, but distribution 2 has a variance parameter that is slightly larger. The normal density plots do not look very different, do they? The dashed line, which is still distribution 2, has a peak slightly under distribution 1’s, but the difference looks pretty small.

(see the book)

The bottom panels are the same as before. The one on the left blows up the area where x > 3.5 and x < 5. A big difference still exists, and the ratio of probabilities is still very large. It’s not shown, but the plot on the right would be duplicated (mirrored, actually) if we looked at x > −5 and x < −3.5. It is more probable to see extreme events in either direction (positive or negative) using distribution 2. The surprising consequence is that very small changes in either the central parameter or the variance parameter can lead to very large differences at the extremes. Examples of these phenomena are easily found in real life, but my heightened political sensitivity precludes me from publicly pointing any of these out.
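A sketch of the same calculation for the wider distribution, again in Python with `statistics.NormalDist` and an arbitrary grid of cutoffs. It computes the ratio in the upper tail and in the mirror-image lower tail, which should match because both distributions are centered at 0.

```python
from statistics import NormalDist

dist1 = NormalDist(mu=0.0, sigma=1.0)
dist2 = NormalDist(mu=0.0, sigma=1.1)   # same center, slightly larger spread

def upper_ratio(x):
    """Pr(X >= x | dist2) / Pr(X >= x | dist1): the right-hand tail."""
    return (1 - dist2.cdf(x)) / (1 - dist1.cdf(x))

def lower_ratio(x):
    """Pr(X <= -x | dist2) / Pr(X <= -x | dist1): the mirror-image left-hand tail."""
    return dist2.cdf(-x) / dist1.cdf(-x)

for x in (3.5, 4.0, 4.5, 5.0):
    print(f"x = {x}: upper {upper_ratio(x):.2f}, lower {lower_ratio(x):.2f}")
```

The ratios come out above 4 at x = 4 and above 9 at x = 5 in both tails, from a variance parameter only 10% larger.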

3. Intuition

We have learned probability and some formal distributions, but we have not yet moved to statistics. Before we do so, let us try to develop some intuition about the kinds of problems and solutions we will see before getting to technicalities. There are a number of concepts that will be important, but I don’t want to give them names, because there is no need to memorize jargon, while it is incredibly important that you develop a solid understanding of uncertainty.

The well-known Uncle Ted Nugent’s chain of Kill ’em and Grill ’em Venison Burger restaurants sells both Coke and Pepsi, and its internal audit shows it sells about an equal amount of each. The busy Times Square branch of the chain has about 5000 customers a day, while the store in tiny Gaylord, Michigan, sees only about 100. Which location is more likely to sell, on any given day, at least 2 times more Pepsi than Coke?

A useful technique for solving questions like this is exaggeration. For instance, the question is asking about a difference in location. What differs between those places? Only one thing: the number of customers. One site gets about 5000 people a day, the other only 100. Let’s exaggerate that difference and solve a simpler problem. For example, suppose Times Square still gets 5000 a day, but Gaylord only gets 1 a day. The information is that the probability of selling a Coke is roughly equal to the probability of selling a Pepsi. This means that, at Gaylord, they will sell that 1 customer on that day either 1 Coke or 1 Pepsi. If they sell a Pepsi, Gaylord has certainly sold more than 2 times as much Pepsi as Coke. The chance of that happening is 50%. What is two times as much Pepsi as Coke at Times Square? A lot more Pepsi, certainly: if each of the 5000 customers buys one drink, it takes at least 3,334 Pepsis against at most 1,666 Cokes. So it’s far more likely for Gaylord to sell a greater proportion of Pepsi, because it sees fewer customers. The lesson is that when the “sample size” is small, we are more likely to see extreme events.
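The exaggeration argument can be checked exactly for all three store sizes. This Python sketch assumes (these are assumptions, not part of the original example) that every customer buys exactly one drink, that Coke and Pepsi each have probability 1/2, and that customers choose independently, so the Pepsi count is binomial. The function name `pr_pepsi_at_least_double` is hypothetical.

```python
from math import comb

def pr_pepsi_at_least_double(n):
    """Pr(Pepsi >= 2 * Coke) when n customers each buy one drink,
    Coke or Pepsi with probability 1/2 apiece, independently (assumed)."""
    # Pepsi count P satisfies P >= 2 * (n - P) exactly when P >= ceil(2n/3).
    k_min = -(-2 * n // 3)
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2**n  # exact integer ratio, converted to a float at the end

for n in (1, 100, 5000):
    print(n, pr_pepsi_at_least_double(n))
```

With one customer the probability is exactly 0.5, as in the text; with 100 customers it is well under 1%; with 5000 it is astronomically small. Exact binomial counting is used here instead of simulation so the tiny Times Square value is not just reported as zero.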

What is the length of the first Chinese Emperor Qin Shi Huangdi’s nose? You don’t know? Well, you can make a guess. How likely is it that your guess is correct? Not very likely. Suppose that you decide to ask everybody you know to also guess, and then average all the answers together in an attempt to get a better guess. How likely is it that this averaged guess is perfectly correct? No more likely. If you haven’t a clue about the nose, and nobody else does either, then averaging ignorance is no better than single ignorance. The lesson is that just because a large group of people agree on an opinion, it is not necessarily more probable that that opinion, or average of opinions, is correct. The uninformed opinion of a large group of people is not necessarily more likely to be correct than the opinion of the lone nut job on the corner. Think about this the next time you hear the results of a poll or survey.

You already possess other probabilistic intuition. For example, suppose, given some evidence E, the probability of A is 0.0000001 (A is something that might be given many opportunities to happen, e.g. winning the lottery). How often will A happen? Right. Not very often. But if you give A a lot of chances to occur, will A eventually happen? It’s very likely to.
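To put numbers on “a lot of chances”, here is a Python sketch that assumes (an assumption beyond the text) that each chance is independent and carries the same probability p = 0.0000001. Then Pr(A at least once in n chances) = 1 − (1 − p)^n, computed with `math.log1p` and `math.expm1` so the tiny p does not get lost to rounding. The function name `pr_at_least_once` is hypothetical.

```python
import math

def pr_at_least_once(p, n):
    """Pr(A happens at least once in n independent chances, each with probability p).
    Equals 1 - (1 - p)**n, computed via log1p/expm1 to keep precision when p is tiny."""
    return -math.expm1(n * math.log1p(-p))

p = 1e-7
for n in (1, 1_000, 1_000_000, 10_000_000, 50_000_000):
    print(f"{n:>11,} chances: {pr_at_least_once(p, n):.4f}")
```

With ten million chances A has roughly a 63% chance of having happened at least once, and with fifty million chances it is over 99%.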

Every player in petanque gets to throw three boules. What are the chances that I get all three within 5 cm of the cochonette? This is a compound problem, so let’s break it apart. How do we find out how likely it is for one boule to land within 5 cm of the cochonette? Well, that means the boule can be up to 5 cm in front of the cochonette, right near it, or up to 5 cm beyond it. The chance of this happening is Pr(−5 cm < x < 5 cm|m = 0 cm, s = 10 cm, EN). We learned how to calculate the probability of being in an interval last chapter: pnorm(5,0,10)-pnorm(-5,0,10). This equals about 0.38, which is the chance that one boule lands within ±5 cm of the cochonette. What is the chance that all of them land that close? Well, that means the first one does, and the second one does, and the third one does. What probability rule do we use now? The second, which tells us to multiply the probabilities together: 0.38³ ≈ 0.055. The important thing to recall, when confronted with problems of this sort: do not panic. Try to break apart the complex problem into bite-size pieces.
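The two bite-size pieces translate directly into code. This is a Python sketch with `statistics.NormalDist` standing in for R's `pnorm`; multiplying the three probabilities treats the throws as independent, which is what the multiplication rule used here assumes.

```python
from statistics import NormalDist

# x ~ N(0 cm, 10 cm), as in the earlier petanque example.
landing = NormalDist(mu=0.0, sigma=10.0)

# Piece 1: one boule lands within 5 cm either side of the cochonette.
p_one = landing.cdf(5) - landing.cdf(-5)   # R: pnorm(5,0,10)-pnorm(-5,0,10)

# Piece 2: all three do; treating the throws as independent, multiply.
p_all_three = p_one ** 3

print(f"one boule:    {p_one:.3f}")
print(f"three boules: {p_all_three:.3f}")
```

This prints 0.383 and 0.056; the small difference from cubing the rounded 0.38 comes only from rounding.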


3 Comments

Mike D.

May 23, 2008, 10:48 pm

Difficult to get away from the math! I like this chapter, though. You are getting there. There is a Law of Large Numbers, but small numbers are almost complete anarchists.
Briggs

May 24, 2008, 4:40 am

Plus, the math is by far the most important thing in applying probability to real-world applications. The math is easy, but understanding is difficult.

Too, the traditional method of concentrating on the math has led many astray and is what accounts for people saying ridiculous things like “the data is normal” or they ask “how can you tell your data is normal?” No data is “normal”. You can only — approximately — quantify your uncertainty in something using a normal distribution.

I’ll come to this more later.

Chapter 8 will probably appear on Tuesday.
Mike D.

May 24, 2008, 9:17 pm

Right now I’m working with those extreme values out in the tails. R is great for that: log pearson3, GEV, L-moments. But there is no getting away from the fact that the data are sparse, and anything can happen out there. Short of extraterrestrials landing in Times Square. Although, many do not rule that out.
