Archive for May, 2008

Starting to lose you: Stats 101 Chapter 9

Here is the link.

The going started getting tough last Chapter. It doesn’t get any easier here. But stick with it, because once you finish with this Chapter you will see the difference between classical/Bayesian and modern statistics.

Here is the gist:

  1. Start with quantifying your uncertainty in an observable using a probability distribution
  2. The distribution will have parameters which you do not know
  3. Quantify your uncertainty in the parameters using probability
  4. Collect observable data, which will give you updated information about the parameters, which you still do not know and which still have to be quantified by a probability distribution
  5. Since you do not care about the parameters, and you do care about future observables, you quantify your uncertainty in these future observables given the uncertainty you still have in the parameters (through the information present in the old data).

If you stop at the parameters, step 4, then you are a regular Bayesian, and you will be too certain of yourself.

This Chapter shows you why. The computer code mentioned in the homework won’t be on-line for a week or so. Again, some of you won’t be able to see all Greek characters, and none of the pictures are given. You have to download the chapter. Here is the link.

CHAPTER 9

Estimating and Observables

1. Binomial estimation

In the 2007-2008 season, the Central Michigan football team won 7 out of 12 regular season games. How many games will they win in the 2008-2009 season? In Chapter 4, we learned to quantify our uncertainty in this number using a binomial distribution, but we assumed we knew p, the probability of winning any single game. If we do not know p, we can use the old data from last season to help us make a guess about its value. It helps to think of this old data as a string of wins and losses. So, for the old x, we saw x1 = 0, x2 = 1, ..., x12 = 1, which we can summarize by k = x1 + x2 + ... + x12, where k = 7 is the total number of wins in n = 12 games.

Here’s the binomial distribution written with an unknown parameter

(see the book)

where θ is the success parameter and k the number of successes we observed out of n chances.

How do we estimate θ? Two ways again, a classical and a modern. The classical consists of picking some function of the observed data and calling it our guess of θ, and then forming a confidence interval. In R we can get both at once with this function:

binom.test(7,12)

where you will see, among other things (ignore those other things for now),

95 percent confidence interval:
0.2766697 0.8483478
sample estimates:
probability of success
0.5833333

This means that our guess is θ = 0.58 = 7/12, so again the estimate is just the arithmetic mean. The 95% confidence interval is 0.28 to 0.85. Easy. This confidence interval has the same interpretation as the one for μ, which means you cannot say there is a 95% chance that θ is in this interval. You can only say, “either θ is in this interval or it is not.”
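If you would rather pull these numbers out of R than read them off the printout, here is a minimal sketch. It uses only base R; the variable names (fit, theta_hat, ci, ci_by_hand) are mine and not the book's. The second half rebuilds the limits from beta quantiles (the Clopper-Pearson construction), which is where binom.test gets its interval.

```r
# Classical guess and 95% confidence interval for 7 wins in 12 games
fit <- binom.test(7, 12)
theta_hat <- unname(fit$estimate)   # the arithmetic mean, 7/12
ci <- as.numeric(fit$conf.int)      # lower and upper confidence limits

# The same limits from beta quantiles (the Clopper-Pearson construction)
ci_by_hand <- c(qbeta(0.025, 7, 12 - 7 + 1), qbeta(0.975, 7 + 1, 12 - 7))

print(theta_hat)
print(ci)
print(ci_by_hand)
```

The two intervals should agree, and neither licenses a probability statement about θ.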

Here is Bayes’s theorem again, written as functions like we did for the normal distribution

(see the book)

We know p(k|n, θ, EB) (this is the binomial distribution), but we need to specify p(θ|EB), which describes what we know about the success parameter before we see any data, given only EB (p(k|n, EB) will pop out using the same mathematics that gave us p(x|EN) in equation (17)). We know that θ can be any number between 0 and 1; we also know that it cannot be exactly 0 or 1 (see the homework). Since it can be any number between 0 and 1, and we have no a priori knowledge which number is more likely than any other, it may be best to suppose that each possible value is equally likely. This is the flat prior again. (As before, there are other choices for this prior distribution, but given even a modest sample size, the differences they make to the distribution of future observables are negligible.) Again, technically EB should be modified to contain this information. After we take the data, we can plot p(θ|k, n, EB) and see the entire uncertainty in θ, or we can pick a “best” value, which is (roughly) θ = 0.58 = 7/12, or we can say that there is a 95% chance that θ is in the (approximate) interval 0.28 to 0.85. I say “roughly” and “approximate” here, because the classical approximation to the exact Bayesian solution isn’t wonderful for the binomial distribution when the sample size is small. The homework will show you how to compute the precise answers using R.
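As a rough preview, and not the homework's code, here is a sketch of the flat-prior calculation. It relies on a standard result: a flat prior on θ combined with k successes in n binomial trials gives a Beta(k + 1, n - k + 1) distribution for θ. The names a, b, post_mode and cred_int are my own.

```r
k <- 7; n <- 12
a <- k + 1        # Beta shape parameters after seeing the data,
b <- n - k + 1    # assuming a flat (uniform) prior on theta

post_mode <- (a - 1) / (a + b - 2)          # most probable theta; equals k/n
cred_int  <- qbeta(c(0.025, 0.975), a, b)   # central 95% interval for theta

print(post_mode)
print(cred_int)
```

The interval this prints is somewhat narrower than the classical one, which is why the classical numbers are only an approximation here.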

2. Back to observables

In our hot little hands, we now have an estimate of θ which equals about 0.58. Does this answer the question we started with?

That question was: how many games will CMU win in the 2008-2009 season? Knowing that θ equals something like 0.58 does not answer this. Knowing that there is a 95% chance that θ is some number between 0.28 and 0.85 also does not answer the question. This question is not about the unobservable parameter θ, but about the future (in the sense of not yet seen) observable data. Now what? This is one of the key sections in this entire book, so take a steady pace here.

Suppose θ was exactly equal to 0.58. Then how many games will CMU win? We obviously don’t know the exact number even if we knew θ, but we could calculate the probability of winning 0 through 12 games using the binomial distribution, just as we did in Chapters 3 and 4. We could even draw the picture of the entire probability distribution given that θ was exactly equal to 0.58. But θ might not be 0.58, right? There is some uncertainty in its value, which is quantified by p(θ|k_old, n_old, EB), where now I have put the subscript “old” on the old data values to make it explicit that we are talking about the uncertainty in θ given previously observed data. The parameter might equal, say, 0.08, and it also might equal 0.98, or any other value between 0 and 1. In each of these cases, given that θ exactly equalled these numbers, we could draw a probability distribution for future games won, or k_new, given n_new = 12 (12 games next season) and given the value of θ.

Let us draw the probability distribution expressing our uncertainty in k_new given n_new = 12 (and EB) for three different possible values of θ.

(see the book)

If θ does equal 0.08, we can see that the most likely number of games won next season is 1. But if θ equals 0.58, the most likely number of games won is 7; while if θ equals 0.98, then CMU will most likely win all their games.
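The three pictures are not reproduced here, but the most likely number of wins under each fixed θ is easy to check. This is a small sketch in base R; the names thetas, probs and most_likely are mine.

```r
n_new  <- 12
thetas <- c(0.08, 0.58, 0.98)

# One column per theta: probability of 0, 1, ..., 12 wins
probs <- sapply(thetas, function(th) dbinom(0:n_new, n_new, th))

# which.max returns a 1-based row index, so subtract 1 to get a win count
most_likely <- apply(probs, 2, which.max) - 1
names(most_likely) <- thetas
print(most_likely)
```

Calling barplot(probs[, 2], names.arg = 0:n_new) would draw the middle picture.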

This means that the picture on the far left describes our uncertainty in k_new if θ = 0.08. What is the probability that θ = 0.08? We can get it from equation (19), from p(θ|k_old = 7, n_old = 12, EB). The chance of θ = 0.08 is about 1 in 100 million (we’ll learn how the computer does these calculations in the homework). Not very big! This means that we are very very unlikely to have our uncertainty quantified by the picture on the left. What is the chance that θ = 0.98? About 3 in a trillion! Even less likely. How about 0.58? About 3 in 10,000. Still not too likely, but far more likely than either of those other values.

We could go through the same exercise for all the other values that θ could take, each time drawing a picture of the probability distribution of k_new. Each one of these would have a certain probability of being the correct probability distribution for the future data, given that its value of θ was the correct value. But since we don’t know the actual value of θ, but we do know the chance that θ takes any value, we can take a weighted sum of these individual probability distributions to produce one overall probability distribution that completely specifies our uncertainty in k_new given all the possible values of θ. This will leave us with

(see the book)

Stare at equation (20) for two minutes without blinking. This, in words, is the probability distribution that tells us everything we need to know about future observables k_new given that we know there will be n_new chances for success this year, also given that we have seen the past observables k_old and n_old, and assuming EB is true. Think about this. You do not know what future values of k will be, do you? You do know what the past values are, right? So this is the way to describe your uncertainty in what you do not know given what you do know, taking full account of the uncertainty in θ, which is not of real interest anyway.
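The weighted sum described above can be imitated by brute force. What follows is only a sketch and not the book's code. It assumes the flat prior, so that the weight for each θ comes from a Beta(k_old + 1, n_old - k_old + 1) density, and it replaces "all possible values of θ" with a fine grid. The grid spacing and the names theta, w and pred are my choices.

```r
k_old <- 7; n_old <- 12; n_new <- 12

theta <- seq(0.0005, 0.9995, by = 0.001)          # grid of candidate theta values
w <- dbeta(theta, k_old + 1, n_old - k_old + 1)   # how probable each theta is (flat prior assumed)
w <- w / sum(w)                                   # turn densities into weights summing to 1

# For each possible k_new, add up the binomial probabilities weighted by w
pred <- sapply(0:n_new, function(k) sum(w * dbinom(k, n_new, theta)))
names(pred) <- 0:n_new
print(round(pred, 3))
```

Every θ on the grid contributes its own binomial picture, and the weights decide how much each picture counts.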

The way to get to this equation uses math that is beyond what we can do in this class, but that is unimportant, because the software can handle it for you. This picture shows you what happens. The solid lines are the probability distribution in equation (20). The circles plotted over it are the probability distribution of a regular binomial assuming θ exactly equals 0.58. The key thing to notice is that the distribution shown by the circles, which assumes θ ≡ 0.58, is too tight, too certain. It says the center values of 6 to 8 are more certain than is warranted (their probability is higher than in the actual distribution). It agrees, coincidentally only, with the probability that the future number of wins will be 5 or 9, but then gives too little probability for wins less than 5 or greater than 9.

The actual distribution of future observable data (20) will always be wider, more diffuse and spread out, less certain, than any distribution with a fixed θ. This means we must account for uncertainty in the parameter. If we do not, we will be too certain. And if all we do is focus on the parameter, using classical or Bayesian estimates, and we do not think about the future observables, we will be far, far more certain than we should be.
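To see the extra spread in numbers, here is a sketch that compares the two distributions directly. It assumes the flat prior, in which case the distribution of future wins has a known closed form (the beta-binomial) that base R can evaluate with choose() and beta(). That formula is my stand-in for equation (20), which is not reproduced in this post. The helper sd_of is hypothetical.

```r
k_old <- 7; n_old <- 12; n_new <- 12
a <- k_old + 1; b <- n_old - k_old + 1   # flat-prior Beta shapes (an assumption)
k <- 0:n_new

# Probability of k future wins, with the uncertainty in theta accounted for
predictive <- choose(n_new, k) * beta(k + a, n_new - k + b) / beta(a, b)

# Probability of k future wins if theta is fixed at the plug-in guess 7/12
plug_in <- dbinom(k, n_new, k_old / n_old)

# Standard deviation of a distribution over the values in k
sd_of <- function(p) sqrt(sum(p * k^2) - sum(p * k)^2)
print(c(predictive = sd_of(predictive), plug_in = sd_of(plug_in)))
```

The plug-in distribution has the smaller standard deviation and puts more probability on 7 wins than the predictive one does.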

3. Even more observables

Let’s return to the petanque example and see if we can do the same thing for the normal distribution that we just did for the binomial. The classical guess of the central parameter was μ = −1.8 cm, which matches the best guess Bayesian estimate. The confidence/credible interval was -6.8 cm to 2.8 cm. In modern statistics, we can say that there is a 95% chance that μ is in this interval. We also have a guess for σ, and a corresponding interval, but I didn’t show it; the software will calculate it. We do have to think about σ as well as μ, however—both parameters are necessary to fully specify the normal distribution.

As in the binomial example, we do not know what the exact value of (μ, σ) is. But we have the posterior probability distribution p(μ, σ|x_old, EN) to help us make a guess. For every particular possible value of (μ, σ), we can draw a picture of the probability distribution for future x given that that particular value is the exact value.

(see the book)

The picture shows the probability densities for x_new for three possible values of (μ, σ). If (μ = −6.8 cm, σ = 4.4 cm), the most likely values of x_new are around −10 cm, with most probability given to values from −20 cm to 0 cm. On the other hand, if (μ = 2.8 cm, σ = 8.4 cm), the most likely values of new x are a little larger than 0 cm, but with most probability for values between −20 cm and 30 cm. If (μ = −1.8 cm, σ = 6.4 cm), future values of x are intermediate between those of the other two guesses. These three pictures were drawn (using the Advanced code from Chapter 5) assuming that the values of (μ, σ) are the correct ones. Of course, they might be the right values, but we do not know that. Instead, each of these three guesses, and every other possible combination of (μ, σ), has a certain probability, given x_old, of being true.
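Since the pictures are missing from this post, here is a sketch that summarizes the same three normal distributions numerically. It is not the Advanced code from Chapter 5. The data frame guesses and its column names are mine, and the 95% central range is just one convenient way to say "most probability".

```r
# The three (mu, sigma) guesses from the text, in cm
guesses <- data.frame(mu = c(-6.8, -1.8, 2.8), sigma = c(4.4, 6.4, 8.4))

# Central 95% range of x_new if each guess were exactly right
guesses$lower <- qnorm(0.025, guesses$mu, guesses$sigma)
guesses$upper <- qnorm(0.975, guesses$mu, guesses$sigma)

# Height of the density at its peak; a larger sigma gives a flatter curve
guesses$peak_density <- dnorm(guesses$mu, guesses$mu, guesses$sigma)

print(round(guesses, 3))
```

To draw one of the curves, something like curve(dnorm(x, -1.8, 6.4), from = -30, to = 30) will do.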

Given the old data, we can calculate the probability that (μ, σ) equals each of these guesses (and equals every other possible combination of values). We can then weight each of the new x distributions according to these probabilities and draw a picture of the distributions of new values given old ones (and the evidence EN ) like we just did for the binomial distribution. This is

(see the book)

Here is a picture of this distribution (generated by the computer, of course)

(see the book)

The solid line is equation (21), and the dashed line is a normal distribution with (μ = −1.8 cm, σ = 6.4 cm). The two distributions do not look very different, but they certainly are, especially for very large or very small values of x_new. The dashed line is too narrow, giving too much probability for too narrow a range of x_new. In fact, under distribution (21), values greater than 10 cm are twice as likely as under the normal distribution where we plugged in a single guess of (μ, σ); values greater than 20 cm are six times as likely. The same thing is repeated for values less than −10 cm, or less than −20 cm, and so on. Go back and read Chapter 6 to refamiliarize yourself with the fact that very small changes in the central or variance parameter can cause large changes in the probability of extreme numbers.
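Here is a hedged sketch of this kind of tail comparison. It makes two assumptions that are not in this post. First, the number of past throws n is not given, so n = 9 below is purely hypothetical. Second, it uses the standard noninformative prior for (μ, σ), under which new values follow a Student t distribution with n - 1 degrees of freedom, centered at the sample mean and scaled by s × sqrt(1 + 1/n); the book's equation (21) may be built differently. The ratios it prints therefore depend on those choices and need not match the figures quoted above.

```r
n     <- 9      # ASSUMPTION: number of past throws, not stated in the text
xbar  <- -1.8   # plug-in guess for mu (cm)
s     <- 6.4    # plug-in guess for sigma (cm)
scale <- s * sqrt(1 + 1 / n)   # predictive scale under the assumed prior

# How many times more probable is x_new > cut under the predictive
# distribution than under the plug-in normal?
tail_ratio <- function(cut) {
  p_pred <- pt((cut - xbar) / scale, df = n - 1, lower.tail = FALSE)
  p_plug <- pnorm(cut, mean = xbar, sd = s, lower.tail = FALSE)
  p_pred / p_plug
}

ratios <- sapply(c(10, 20), tail_ratio)
print(ratios)
```

Whatever n you try, the ratio grows as the cut moves further out, which is the point about extreme values.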

The point again, like in the binomial example, is that using the plug-in normal distribution, the one where you assume you know the exact value of (μ, σ), leads you to be far more certain than you really should be. You need to take full account of the uncertainty in your guesses of (μ, σ); only then will you be able to fully quantify the uncertainty in the future values x_new.

4 comments May 30th, 2008

Stats 101: Chapter 8

Here is the link.

This is where it starts to get complicated, this is where old school statistics and new school start diverging. And I don’t even start the new new school.

Parameters are defined and then heavily deemphasized. Nearly the entire purpose of old and new school statistics is devoted to unobservable parameters. This is very unfortunate, because people go away from a parameter analysis far, far too certain about what is of real interest. Which is to say, observable data. New new school statistics acknowledges this, but not until Chap 9.

Confidence intervals are introduced and fully disparaged. Few people can remember that a confidence interval has no meaning; which is a polite way of saying they are meaningless. In finite samples of data, that is, which are the only samples I know about. The key bit of fun is summarized. You can only make one statement about your confidence interval, i.e. the interval you created using your observed data, and it is this: this interval either contains the true value of the parameter or it does not. Isn’t that exciting?

Some, or all, of the Greek letters below might not show up on your screen. Sorry about that. I haven’t the time to make the blog posting look as pretty as the PDF file. Consider this, as always, a teaser.

For more fun, read the chapter: Here is the link.

CHAPTER 8

Estimating

1. Background

Let’s go back to the petanque example, where we wanted to quantify our uncertainty in the distance x the boule landed from the cochonnet. We approximated this using a normal distribution with parameters m = 0 cm and s = 10 cm. With these parameters in hand, we could easily quantify uncertainty in questions like X = “The boule will land at least 17 cm away” with the formula Pr(X|m = 0 cm, s = 10 cm, EN) = Pr(x > 17 cm|m = 0 cm, s = 10 cm, EN). R even gave us the number with 1-pnorm(17,0,10) (about 4.5%). But where did the values of m = 0 cm and s = 10 cm come from?

I made them up.
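Made up or not, once m and s are fixed the calculation is mechanical. A small sketch, with my own variable name p_far, showing the same number computed with the upper tail requested directly:

```r
m <- 0    # central parameter (cm), made up
s <- 10   # spread parameter (cm), made up

# Probability that x exceeds 17 cm; identical to 1 - pnorm(17, 0, 10)
p_far <- pnorm(17, mean = m, sd = s, lower.tail = FALSE)
print(round(100 * p_far, 1))   # as a percentage
```

Change m or s and the answer changes with them, which is why it matters where they come from.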

It was easy to compute the probability of statements like X when we knew the probability distribution quantifying its uncertainty and the value of that distribution’s parameters. In the petanque example, this meant knowing that EN was true and also knowing the values of m and s. Here, knowing means just what it says: knowing for certain. But most of the time we do not know EN is true, nor do we know the values of m and s. In this Chapter, we will assume we do in fact know EN is true. We won’t question that assumption until a few Chapters down the road. But, even given EN is true, we still have to discern the values of its parameters somehow.

So how do we learn what these values are? There are some situations where we are able to deduce either some or all of the parameters’ values, but these situations are shockingly few in number. Nearly all the time, we are forced to guess. Now, if we do guess (and there is nothing wrong with guessing when you do not know), it should be clear that we will not be certain that the values we guessed are the correct ones. That is to say, we will be uncertain, and when we are uncertain what do we do? We quantify our uncertainty using probability.

At least, that is what we do nowadays. But then-a-days, people did not quantify their uncertainty in the guesses they made. They just made the guesses, said some odd things, and then stopped. We will not stop. We will quantify our uncertainty in the parameters and then go back to what is of main interest, questions like what is the probability that X is true? X is called an observable, in the sense that it is a statement about an observable number x, in this case an actual, measurable distance. We do not care about the parameter values per se. We need to make a guess at them, yes, otherwise we could not get the probability of X. But the fact that a parameter has a particular value is usually not of great interest.

It isn’t of tremendous interest nowadays, but again, then-a-days, it was the only interest. Like I said, people developed a method to guess the parameter values, made the guess, then stopped. This has led people to be far too certain of themselves, because it’s easy to get confused about the values of the parameters and the values of the observables. And when I tell you that then-a-days was only as far away as yesterday, you might start to be concerned.

Nearly all of classical statistics, and most of Bayesian statistics, is concerned with parameters. The advantage the latter method has over the former is that Bayesian statistics acknowledges the uncertainty in the parameter guesses and quantifies that uncertainty using probability. Classical statistics, still the dominant method in use by non-statisticians, makes some bizarre statements in order to avoid directly mentioning uncertainty. Since classical statistics is ubiquitous, you will have to learn these methods so you can understand the claims people (attempt to) make.

So we start with making guesses about parameters in both the old and new ways. After we finish with that, we will return to reality and talk about observables.

2. Parameters and Observables

Here is the situation: you have never heard of petanque before and do not know a boule from a bowl from a hole in the ground. You know that you have to quantify x, which is some kind of distance. You are assuming that EN is true, and so you know you have to specify m and s before you can make a guess about any value of x.

Before we get too far, let’s set up the problem. When we know the values of the parameters, like we have so far, we write them in Latin letters, like m and s for the Normal, or p for the binomial. We always write unknown and unobservable parameters as Greek letters, usually μ and σ for the normal and θ for the binomial. Here is the normal distribution (density function) written with unknown parameters:

(see the book)

where μ is the central parameter, and σ² is the variance parameter, and where the equation is written as a function of the two unknowns, N(μ, σ). This emphasizes that we have a different uncertainty in x for every possible value of μ and σ (it makes no difference if we talk of σ or σ²; one is just the square root of the other).

You may have wondered what was meant by that phrase “unobservable parameters” last paragraph (if not, you should have wondered). Here is a key fact that you must always remember: not you, not me, not anybody, can ever measure the value of a parameter (of a probability distribution). They simply cannot be seen. We cannot even see the parameters when we know their values. Parameters do not exist in nature as physical, measurable entities. If you like, you can think of them as guides for helping us understand the uncertainty of observables. We can, for example, observe the distance the boule lands from the cochonnet. We cannot, however, observe the m even if we know its value, and we cannot observe μ either. Observables, the reason for creating the probability distributions in the first place, must always be of primary interest for this reason.

So how do we learn about the parameters if we cannot observe them? Usually, we have some past data, past values of x, that we can use to tell us something about that distribution’s parameters. The information we gather about the parameters then tells us something about data we have not yet seen, which is usually future data. For example, suppose we have gathered the results of hundreds, say 200, of past throws of boules. What can we say about this past data? We can calculate the arithmetic mean of it, the median, the various quantiles and so on. We can say this many throws were greater than 20 cm, this many less. We can calculate any function of the observed data we want (means and medians etc. are just functions of the data), and we can make all these calculations never knowing, or even needing to know, what the parameter values are. Let me be clear: we can make just about any statement we want about the past observed data and we never need to know the parameter values! What possible good are they if all we wanted to know was about the past data?
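To make that concrete, here is a minimal sketch in R. The 200 "throws" are simulated stand-ins (I have no real petanque data to hand), and the name x_old is just a label I made up for this example. Notice that no parameter of any distribution is needed to compute any of these summaries.

```r
# Simulated stand-in for 200 past throws (distances in cm); not real data
set.seed(1)
x_old <- abs(rnorm(200, mean = 20, sd = 8))

mean(x_old)                        # arithmetic mean of the old data
median(x_old)                      # median of the old data
quantile(x_old, c(0.1, 0.5, 0.9))  # a few quantiles

n_over  <- sum(x_old > 20)   # how many throws were greater than 20 cm
n_under <- sum(x_old <= 20)  # how many were not
c(n_over, n_under)
```

Every line is a function of the observed data only.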

There is only one reason to learn anything about the parameters. This is to make statements about future data (or to make statements about data that we have not yet seen, though that data may be old; we just haven’t seen it yet; say archaeological data; all that matters is that the data is unknown to you; and what does “unknown” mean?). That is it. Take your time to understand this. We have, in hand, a collection of data xold, and we know we can compute any function (mean etc.) we want of it, but we know we will, at some time, see new data xnew (data we have not yet seen), and we want to now say something about this xnew. We want to quantify our uncertainty in xnew, and to do that we need a probability distribution, and a probability distribution needs parameters.

The main point again: we use old data to make statements about data we have not yet seen.

3. Classical guess

We need to find some way to map our evidence E and the past values of x into information about the parameters. There are lots of different ways to guess at parameter values, some easy and some hard, and these all fall into two broad classifications: yes, a classical and a modern.

We have past values of x and we want to know about future, or at least other, unknown values of x. Our evidence is E, which at least means that we know the probability distribution (Normal, say) of the observables. In this book we will assume that E also means that knowledge of each individual observation is irrelevant to knowing what each other observation will be. We have to find a way to guess, or estimate, these unknown and unobservable parameters given E and the old data xold.

The classical way to do this is to pick an ad hoc function of the old data and label it f(xold) = μ̂, where that “hat” indicates that the value of μ̂ is only a guess. Most classical estimates have the goal that the estimate is “unbiased”, or Ex(μ − μ̂) = Ex(μ − f(xold)) = 0, meaning that the expected distance between the actual value of μ and the guess μ̂ is 0. Sounds like a nice thing to have, unbiasedness, and it surely isn’t a bad idea, but it turns out to cause a lot of problems, most of which I cannot tell you about without introducing a lot of math. However, this criterion is not compelling because of that expected value business. Expected value with respect to what? Well, with respect to an infinite number of future (not yet observed) data x…which is just the data that we are trying to quantify the uncertainty of. Anyway, in R, to estimate the parameters of a normal distribution classically is easy, and you already know how to do it! If x is our old, previously observed data, x1, x2, …, xn, then

μ̂ = mean(x)
σ̂ = sd(x)

The mean you already know how to calculate. It is often written bar(x), and called “x bar”. When you see a data value with a bar over it, you know it is a mean. The observed variance of old data is (see the book), and the observed standard deviation of old data is the square root of that. Look at the formula and notice that the standard deviation is a measure of how far, on average, the old data values are away from the observed mean. The square is taken, (xi − bar(x))^2, so that data values that were lower than the observed mean are treated the same as data values that were higher. (If you have missing data in x, recall Chapter 7, where we had to modify the function like this: mean(x, na.rm=TRUE); same for the sd function.)
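Here is a small sketch of those classical guesses in R, using simulated data as a stand-in for real old data. The names mu_hat, sigma_hat and sd_by_hand are mine, chosen for this example. One thing to note: R's sd function divides the sum of squares by n − 1; I am assuming nothing about which divisor the formula in the book uses.

```r
set.seed(2)
x <- rnorm(50, mean = 10, sd = 3)  # simulated "old" data, not real

mu_hat    <- mean(x)  # classical guess of the central parameter
sigma_hat <- sd(x)    # classical guess of the spread parameter

# The standard deviation "by hand": square the distances from the mean,
# add them up, divide by n - 1 (what R's sd does), take the square root
n <- length(x)
sd_by_hand <- sqrt(sum((x - mean(x))^2) / (n - 1))
all.equal(sd_by_hand, sigma_hat)

# With a missing value, both functions need na.rm = TRUE
x_missing <- c(x, NA)
mean(x_missing)                # NA
mean(x_missing, na.rm = TRUE)  # same as mu_hat
sd(x_missing, na.rm = TRUE)    # same as sigma_hat
```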

We’ll never calculate the observed standard deviation by hand. But it’s pretty convenient to have the observed mean stand in for our guess of μ. Unfortunately, because μ̂ = mean, a lot of people have taken to calling μ (without the hat) the mean, which it most assuredly is not. μ is an unobservable parameter, while the mean is just the weighted sum of a bunch of data we have already observed. This is a subject that I’ll return to later.

4. Confidence intervals

OK, it might have been hard to understand all this so far, but it’s about to get weird, so be steady. The value μ̂ we got before was precise; it is a known, observed number (it is the mean). But do we really believe, given the data and other evidence, that the exact, all-time, incorruptible, immutable value of μ is, to as many decimal places as you like, equal to μ̂? You may have guessed, by the subtle way I’ve asked that question, that the answer is “no.” And you’d be right! Suppose μ̂ = 5.41. Maybe μ is 5.41, but it might also be, say, 5.40, or 5.39, or other values close by, mightn’t it? This is a fancy way to state that we are uncertain what the value of μ is. How do we express this uncertainty? Use probability? No. It is forbidden to use probability to quantify the uncertainty of parameter values in classical statistics.

Instead, classical statisticians use something called a confidence interval, which is an interval on the order of μ̂ ± c(n), where c(n) is some number that usually depends on the number n of your data points and on the old data itself. Bigger c(n) lead to wider intervals; smaller c(n) lead to narrower ones. So you might expect that when you say that “I think μ is 5.41 plus or minus 4” you have a better chance of being right than when you say “I think μ is 5.41 plus or minus 1”, because the former interval allows you greater scope of covering the actual (unobservable) value of μ. And, classically, you’d be dead wrong.

Which is why confidence intervals are one of the screwiest things to come out of the classical tradition, in that they fail utterly to do what they set out to do. But their use is so ubiquitous (not to say iniquitous) that I’m afraid you are going to have to learn to interpret them. And they are one of the most important things you must learn in this book, because you will see confidence intervals everywhere; thus it is imperative you learn what they are and what they are not.

Part of the problem is that you simply cannot learn what a confidence interval is by reading most introductory statistics books. Take, for example, the very typical book Statistics: Informed Decisions Using Data by Sullivan (2007, pp. 448-449), often used in Stats 101 courses. He officially defines a confidence interval for an unknown parameter as “an interval of numbers” (p. 449), which is as pure a tautology as you’re ever likely to meet, and being a tautology, it is therefore, of course, true, but of no help (it says the confidence interval is an interval). But a page earlier, we find Sullivan implying that smaller intervals give us less confidence in the value of the parameter than larger intervals. This implication is, as I said above, false, and is no part of the actual, mathematical definition of a confidence interval.

Maybe something like this is more accurate:

[A] 95% level of confidence…implies that, if 100 different confidence intervals are constructed…we will expect 95 of the intervals to include the parameter and 5 to not include the parameter [p. 449].

Actually, we can expect nothing like this. And though this definition is closer to the truth, it is still false (to find out why, keep reading). Incidentally, classical theory lets you calculate confidence intervals at any level you want, but the only one you ever really see is the 95% interval, so that one is all I will talk about.

Here’s the actual definition. Suppose you gather some data and construct a confidence interval using the formula C1 = {μ̂ ± c(n)} (the actual formula is not of much interest to us; the software will give us the interval automatically). That is, C1 is the interval calculated using the data we collected. Now imagine (incidentally, this is all you can do) that you re-collect your data in exactly the same way, where every physical thing is exactly the same as it was when you collected it the first time. That is, the state of the universe has to be identical to where it was when you first collected your data. Except that it must be “randomly” different, or different in ways that you know nothing about. Very well, you now have a second data set equal in every way to the first, except that it is “randomly” different, whatever that means. You then construct a new confidence interval C2 using the exact same formula on this second set of data (which is also the same size, n). Now do it all again and construct C3, and again for C4, and again and again an infinite number of times. When you are done, 95% of those intervals will cover the actual value of μ.

(see the book)

This is shown in the picture for the first eight confidence intervals (this is all simulated data). The true value of μ is indicated by the solid line. Some of the intervals “cover”, i.e. contain, the true value of μ, and some do not. More than that, we cannot say. Our confidence interval, the bottom bold one, is the only confidence interval we’ll actually see; the others are hypothesized entities that are conjured into existence if confidence intervals are properly interpreted.

I only showed the first 8 (out of an infinite number of) confidence intervals (that must exist for every problem you ever do). If you only repeat your experiment a finite number of times, and therefore only have a finite number of confidence intervals, say, 1,000,000, then it is false that we expect any number of them will cover the true value of μ: stopping constructing confidence intervals at any finite value invalidates the interpretation that 95% of intervals will cover the actual value of μ.
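We cannot run an infinite number of repetitions, but a computer can pretend to run a lot of them. The sketch below is my own illustration, not the code behind the picture in the book: it fixes a "true" μ (something you can only do in a simulation), draws many data sets of the same size, builds the usual 95% interval from each with t.test, and counts how often the interval covers μ. All the names and numbers are assumptions made for the example. Being a finite run, the fraction lands near 0.95 but need not equal it.

```r
set.seed(3)
mu_true    <- 5      # known only because this is a simulation
sigma_true <- 2
n          <- 30     # size of every data set
reps       <- 10000  # a large but finite number of repetitions

covers <- replicate(reps, {
  x  <- rnorm(n, mean = mu_true, sd = sigma_true)  # "re-collect" the data
  ci <- t.test(x)$conf.int                         # 95% interval for this data set
  ci[1] <= mu_true && mu_true <= ci[2]             # did it cover the true value?
})

mean(covers)  # fraction of intervals that covered mu_true
```

Any single one of those intervals either covered mu_true or it did not; the 0.95 belongs to the whole collection.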

Yes, this is the actual definition, but saying it this way leaves a bad taste in one’s mouth, especially because of that bit about “infinite” numbers of repetitions. Statisticians, feeling uneasy about infinities, and their physical impossibility, usually resort to the euphemism “long run” to describe the number of repetitions needed. They know very well that, mathematically, long run equals infinite, but saying “in the long run” gives the comfortable impression that all you need is a lot, and not an infinite number, of repetitions.

By now you are thinking, “OK, I get it. So what? What you’re saying is just a quibble. Who cares about infinities or long runs, anyway. Give me some information I can use! What do you do with your confidence interval, the one you just constructed? What does it mean?”

Nothing. Not a thing. It certainly does not mean that you are 95% sure that your interval contains the actual value of μ. That is, you cannot, under any circumstances, say that “There is a 95% chance that the true value of μ lies in the 95% confidence interval I have constructed.” This statement is a direct probabilistic statement about the interval you have just created. Recall our key rule: it is forbidden in classical statistics to make direct probability statements about unobservable parameters. Memorize this. Your confidence interval only has an interpretation as part of an infinite set of other confidence intervals.

We have just hit upon the dirtiest open secret of classical statistics. There is no interpretation of your confidence interval other than this: the best you can say is that your interval either contains the actual value of μ or it does not, a statement which is a tautology, and, again therefore, true, but of no help (incidentally, Sullivan (2007) finally acknowledges this on p. 500). So what do you do with the interval you have just created? Why even bother, since it has no direct relation to the problem at hand? It’s even worse. Pick any two different numbers, say, 12 and 42. It is a true statement to say that this interval either contains μ or it does not for any statistical problem done by anybody with any data any time whatsoever (make sure you understand that before reading further).

The guy that invented confidence intervals, Dzerzij (Jerzy) Neyman, a statistician, knew about the interpretational problems of confidence intervals, and was concerned. But he was even more concerned about something called inductive arguments. An example due to Stove (1986): All the flames I have observed before have been hot (the premise); therefore, this flame will be hot (the conclusion). Neyman, and many other influential 20th century statisticians, rejected inductive arguments as a basis for probability. They felt arguments like these were “groundless” or that inductive arguments were fallible because of the true statement that, for the flames example, there was nothing in the universe guaranteeing that this flame will be hot [2]. Inductive arguments are needed to make direct probabilistic statements about things like confidence intervals. If you reject them, then you cannot use probability. So Neyman, and those who followed him (which was nearly everybody), tried to take refuge in arguments like this: “Well, you cannot say that there is a 95% chance that the actual value of the parameter is in your interval; but if statisticians everywhere were to use confidence intervals, then in the long run, 95% of their intervals will contain their actual values.” Thirty-two extra credit points to those who can show the obvious flaw in this argument (see the homework).

The flaw in that argument was so obvious that it was evident to Neyman himself. And so, with nowhere else to turn, Neyman recommended a dodge and said this: “The statistician…may be recommended…to state that the value of the parameter μ is within (the just calculated interval)” merely by an act of will (Neyman (1937), quoted in Franklin (2001a)).

What you would like to be able to say is that “I have 95% (or whatever) confidence that this interval covers the true value of μ.” But you can never do this in classical statistics.

In R, to get the confidence interval of a normal distribution classically is a little more work than just getting the estimates, but it isn’t really that hard. This is for the appendicitis data, the White.Blood.Count (don’t forget to read the data in and attach it):

confint(glm(White.Blood.Count ~ 1))

The function confint calculates 95% confidence intervals. The inside function glm, with that funny argument ~1, basically says, “The uncertainty in the variable should be quantified by a normal distribution.” Just take my word for it now; we’ll see this function later and this notation will become clear then. Anyway, after you run the command you will see something like this:

2.5 % 97.5 %
9.991874 10.818126

Ignore the word (Intercept), it is actually White.Blood.Count (this is because this function works for any variable name you care to enter). The 2.5 % and 97.5 % are like the quantiles; subtract 2.5 from 97.5 and you get the confidence level of the interval, which is 97.5% − 2.5% = 95%.
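If you are curious what the software is doing, here is a sketch of the usual textbook 95% interval for a normal central parameter, built by hand as μ̂ ± c(n). The data are simulated stand-ins for white blood count (the name wbc and the numbers are mine, not the appendicitis data), and I use lm and t.test for the comparison because I am certain they give this t-based interval; the glm call in the text may differ from it slightly.

```r
set.seed(4)
wbc <- rnorm(100, mean = 10.4, sd = 2.1)  # simulated stand-in, not the real data

n <- length(wbc)
# c(n): depends on the number of data points and on the old data itself
half_width <- qt(0.975, df = n - 1) * sd(wbc) / sqrt(n)
ci_by_hand <- c(mean(wbc) - half_width, mean(wbc) + half_width)

ci_lm <- confint(lm(wbc ~ 1))  # same interval from a fitted model
ci_t  <- t.test(wbc)$conf.int  # and again from t.test

ci_by_hand
ci_lm
ci_t
```

All three are the same interval, which shows that c(n) here is nothing more than a t quantile times the observed standard deviation divided by the square root of n.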

We could use another R function and compute the confidence interval for σ, but it is not of great interest because later, we’ll see how to do all these things more or less automatically. Besides, we want to concentrate on what these intervals mean. If you’ve already forgotten, then go back and read this section from the beginning. One thing that is certain is that confidence intervals say nothing about the observables, the data x. If they say anything, they say something about the unobservable parameters. But what? The interval we computed for white blood count was about [10, 11]. This is an interval about the estimated central parameter μ and not about the mean. We know the mean (it is…? find it in R). The confidence interval is an attempt to put a measure of precision on the guess μ̂. It says nothing about the mean, and nothing about actual values of white blood count. Never forget this.

5. Bayesian way

The idea behind modern statistics is that you quantify any and all uncertainty you have in anything using probability. We’ve already seen how to quantify uncertainty using probability for observables; that is, for actual data. That turns out to be done the same way classically and Bayesianly. This is what we did the first few Chapters, was it not? We wrote down some probability distribution, with known parameters, and made probability statements about observable data. Classical and Bayesian statistics begin to diverge when we start to talk about unknown parameters and how to make guesses about these parameters.

We made guesses classically by specifying some ad hoc function of the data, giving us θ̂; afterwards, we created a confidence interval for this guess. I stressed, heavily, that this confidence interval is not designed to express any actual uncertainty in θ, because that goes against the classical philosophy, which is that you cannot directly express uncertainty in unobservable parameters using probability.

In Bayesian statistics, you can, and must, express uncertainty in unobservable parameters using probability. How this works might sound complicated, and some of it is, but once you get how it works for, say, normal distributions, you will then know how it works for every other statistics problem in the world. This is not so for classical statistics, where you have to memorize a new set of ad hoc functions for every problem. In this way, Bayesian statistics is a vast simplification; however, before you can reach this simplification plateau, you initially have to climb up a steeper hill than you do classically. However, the good news is that there is only one hill to climb.

Let’s recall the normal probability distribution (density function):

(see the book)

written here as a function of x, or p(x|μ, σ, EN) (we could have used N() as before; the actual letter does not matter). Do you remember probability rule number 4, or Bayes’s rule? If not, go back and re-read Chapter 2. Pay special attention to equation (6). I’ll wait here until you’re done.

Back? OK, let’s write equation (6) using different letters, so that

(see the book)

becomes

(see the book)

where B is now (μ, σ) and A is x. Remember, (μ, σ) is shorthand for the statement “The value of the central parameter is μ and the value of the variance parameter is σ”, and x is shorthand for the statement X = “The value of the observed data is x.” We already know how to write p(x|μ, σ, EN ) mathematically. Our goal is to discover how to write the left-hand side, which is the probability distribution of (μ, σ) given the data and EN . This quantifies our uncertainty in (μ, σ) given what we learned in the data (and considering the evidence EN ). In order to calculate the left-hand side, we then also need to know p(μ, σ|EN ). We also need p(x|EN ), but once we know p(μ, σ|EN ), it automatically pops out because of some math that need not concern us here.

What is p(μ, σ|EN )? Well, it quantifies our uncertainty in (μ, σ) before seeing any data, that is, it is only conditional on EN . p(μ, σ|EN ) is a probability distribution that you have to specify before you can get to the probability p(μ, σ|x, EN ). It also has an official name, which is the prior, because it’s what you know about (μ, σ) prior to adding in information in the data. Not surprisingly, then, p(μ, σ|x, EN ) is called the posterior, which is the probability distribution expressing everything we know, all our uncertainty, about (μ, σ) after having seen some data x.

How about the value of p(μ, σ|EN)? Well, it turns out to be a complicated situation, but the gist of it is that p(μ, σ|EN) gives the probability of each possible value of (μ, σ), and since we initially know very little of (μ, σ), every possible value of (μ, σ) is more or less equally probable. This situation is called assigning a flat prior, the “flat” describing the shape of the probability distribution picture (i.e., a flat line). (There is more than one prior that you can use besides this “flat” one, but the difference it makes in the posteriors is minimal. Another problem is that the parameters are usually assumed to be continuous numbers, and if you recall the discussion from Chapter 4, you know these can be a problem. We will ignore all these difficulties in this book.) Once you have the prior and p(x|μ, σ, EN), you can then calculate the posterior using equation (17). Technically, since we are saying (μ, σ) has a certain probability distribution, this is also information that we should keep note of, but we’ll append this on EN so that it now means “The uncertainty in the observable is quantified by a normal distribution and the prior on the parameters is ‘flat’.” If we need to be careful about this, and sometimes we do (not in this book), we can expand the notation to indicate the exact kind of prior we use.

Now here is another little secret: for very simple situations, the Bayesian results are the same as the classical results! No new calculations have to be learned or done!

After we take some old data, we can calculate our full uncertainty in (μ, σ) by drawing pictures of the probability distributions (we’ll do this later). If we are forced to pick just one “best” value, we would pick the arithmetic mean and standard deviation, exactly like in classical statistics. If we wanted to express our uncertainty a little more fully than just using one number (for each parameter), we could give the best number and an interval, some plus/minus bound on how certain that best value actually matches the true value of (μ, σ). Here is the best part: the confidence interval, which was meaningless before, is this interval, and is now called a credible interval. It has the natural interpretation that there is a 95% chance that the true value of the parameter lies in this interval. Isn’t that wild?

Before you start thinking, “Hey, if the results are the same, why did you go on and on and on about how confidence intervals are meaningless? All you did was to give them a new name! Big deal. You are wasting my time and trying to confuse me.” Hold on a minute, though. The Bayesian results are the same as the classical ones, but only for simple situations. The good news for you is that in Stats 101, you hardly move beyond these very simple situations. Once you do move into the great statistical beyond, like using Binomial instead of normal distributions, the Bayesian methods really come into their own, and then you cannot assume the classical computations give you the correct answer. I’ll talk about these techniques as we move along.
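For the curious, here is a rough sketch of why the credible interval and the confidence interval match in the simple normal case. It rests on an assumption of mine: I take "flat" to mean the standard noninformative prior p(μ, σ²) ∝ 1/σ², which may not be exactly the prior the book has in mind. Under that prior the posterior can be simulated in two steps: draw σ² from a scaled inverse chi-square, then draw μ from a normal around the observed mean. The data are simulated and every name below is hypothetical.

```r
set.seed(5)
x <- rnorm(40, mean = 10, sd = 2)  # simulated old data, not real
n <- length(x); xbar <- mean(x); s2 <- var(x)

# Posterior draws, ASSUMING the prior p(mu, sigma^2) proportional to 1/sigma^2
draws  <- 200000
sigma2 <- (n - 1) * s2 / rchisq(draws, df = n - 1)      # sigma^2 given the data
mu     <- rnorm(draws, mean = xbar, sd = sqrt(sigma2 / n))  # mu given sigma^2, data

credible  <- quantile(mu, c(0.025, 0.975))  # 95% credible interval for mu
classical <- as.numeric(t.test(x)$conf.int) # 95% confidence interval

credible
classical
```

The two intervals agree up to simulation noise, but only the first licenses the statement that there is a 95% chance μ lies inside it (given the data, the normal assumption, and that prior).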

[1] I mean those people who were not formally trained in the mathematical subjects of probability and statistics. The vast numbers of people who compute statistics have not had this training beyond, say, a class given in a Psychology department by a professor who himself was not so trained, etc.

[2] To which you can argue: OK, if you doubt it, stick your hand into this flame.

May 27th, 2008

Stats 101: Chapter 7

Update #2. I moronically uploaded a blank document. I have no idea how. It’s all better now.
Update. I idiotically forgot to put a link. Here it is.

Chapter 7 is Reality. This is usually Chapter 1 in most intro stats books. Those other books invariably start students with topics like “measures of central tendency” and “kinds of experiments” etc. Nothing necessarily wrong with any of this, but the student usually has no idea why he should care about “central tendency” in the first place. Why memorize formulas for means and (population or other) standard deviations? What use are these things in understanding how to quantify uncertainty?

So I put these topics off until the reader realizes that understanding uncertainty is paramount. The whole chapter is nuts and bolts about how to read data into R and do some elementary manipulations. Like Chapter 5, it’s not thrilling reading, but necessary. The homework for 7 asks readers to download a set of R functions at http://wmbriggs.com/book/Rcode.R, but it’s not there yet because I’m still polishing the code.

Some of the formatting is off in the Latex source, but I won’t fix that until I’m happy with the final text. No pictures are here; all are in the book.

CHAPTER 7

Reality

1. Kinds of data

Somewhere, sometime, somehow, somebody is going to ask you to create some kind of data set (that time is sooner than you think; see the homework). Here is an example of such a set, written as you might see it in a spreadsheet (a good, free open-source spreadsheet is Open Office, www.openoffice.org):

Q1, …, Sex, Income, Nodules, Ridiculous
rust, …, M, 10, 7 , Y
taupe, …, F, , 3 , N
….
ochre, …, F, 12, 2 , Y

This data is part of a survey asking people their favorite colors (Q1), while recording their sex, annual income, the number of sub-occipital nodules on their brain, and whether or not the interviewee thought the subject ridiculous or not. There is a lot we can learn from this simple fragment.

The first is to always use full, readable, English names for the variables. What about Q1, which was indeed the first question on the survey? Why not just call it “Q1”? “Q1” is a lot easier to type than “favorite color”. Believe me, two weeks after you store this data, you will not, no matter how much you swear you will, remember that Q1 was favorite color. Neither will anybody else. And nobody will be able to guess that Q1 means favorite color.

Can you suggest a better name? How about “favcol”, which has fewer letters than “favorite color”, and is therefore easier to type? What are you, lazy? You can’t type a few extra letters to save yourself a lot of grief later on?

How about just “favorite color”? Well, not so good either, because why? Because of that space between “favorite” and “color”; most software cannot handle spaces in names. Alternatives are to put an underscore or a period between the words: “favorite_color”, or “favorite.color”. Some people like to cram the words together camel style, like “favoriteColor” (the occasional bump of capital letters is supposed to look like a camel: I didn’t name it). Whichever style you choose, be consistent! In any case, nobody will have any trouble understanding that “favoriteColor” means “favorite color”.
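You can see how R itself treats a name with a space in it. This little sketch uses a made-up, two-column fragment of the survey; by default read.csv repairs the header with make.names, which swaps the space for a period.

```r
# What R does to a name containing a space
make.names("favorite color")

# The same repair happens to column headers when a file is read in
survey <- read.csv(text = "favorite color,Sex\nrust,M\ntaupe,F")
names(survey)
```

Better to pick the name yourself than to let the software pick it for you.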

Notice, too, that the colors entered under “Q1” use the full English name for the color. Spaces are OK in the actual data, just not in variable names: for example, “burnt orange” is fine. Do not do what many sad people do and use a code for the colors. For example, 1=taupe, 2=envy green, 3=fuchsia, etc. What are you trying to do with a code anyway? Hide your work from Nazi spies? Never use codes.

That goes for variables like “Sex”, too. I cannot tell you how many times I have opened up a data set where I have seen Sex coded as “1” and “2”, or “0” and “1”. How can anybody remember which number was which sex? They cannot. And there is no reason to. With data like this, abbreviation is harmless. Nobody, except for the politically correct, will confuse the fact that “M” means male and “F” female. But if you are worried about it, then type out the whole thing.

Similarly for “Ridiculous”, where I have used the abbreviation “Y” for yes and “N” for no. Sometimes a “0” and “1” for “N” and “Y” are acceptable. For example, in the data set we’ll use in a moment, “Vomiting” is coded that way. And, after all, 0/1 is the binary no/yes of computer language, so this is OK. But if there is the least chance of ambiguity for a data value, type the whole answer out. Do not be lazy; you will be saving yourself time later.

It should be obvious, but store numbers as numbers. Height, weight, income, age, etc., etc. Do not use any symbols with the numbers. Store a weight as “213” and not “213 lbs”. If you are worried you will forget that weight is in pounds, name the variable Weight.LBS or something similar.

What if one of your interviewees refused to answer a question? This will often happen for questions like “Income”. How should you code that? Leave his answer blank! For God’s sake, whatever you do, do not think you are being clever and put in some mystery code that, to you, means “missing.” I have seen countless times where somebody thought that putting in a “99” or a “999” for a missing income was a good idea. The computer does not know that 999 means “missing”; it thinks it is just what it looks like: the number 999. So when you compute an average income, that 999 becomes part of the average. Also don’t use a period, the full stop. That’s a holdover from an ancient piece of software (that some people are still forced to use).
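Here is a tiny demonstration of the damage, with four made-up incomes. The third person refused to answer; one version stores that as 999, the other leaves it missing (NA is how R shows a blank numeric entry).

```r
income_coded <- c(10, 12, 999, 15)  # "999" used as a mystery code for missing
income_blank <- c(10, 12, NA, 15)   # missing left as missing

mean(income_coded)                # 259: the 999 is treated as a real income
mean(income_blank, na.rm = TRUE)  # about 12.33: average of the three real answers

# A blank cell in a numeric column of a CSV file is read in as NA
survey <- read.csv(text = "Sex,Income\nM,10\nF,\nF,12")
is.na(survey$Income)
```

The blank costs you nothing; the 999 quietly ruins every summary you compute.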

There are times when an answer is purposely missing, and a blank should not be used. For example, if “Income” is less than 20000, then the interviewee gets an extra question that people who make more than 20000 do not get. Usually, this kind of rule can be handled trivially in the analysis, but if you want to show that somebody should not have answered and not that they did not answer, then use a code such as “PM” for “purposely missing”. Even better would be to write “purposely missing”, so that somebody who is looking at your data three months down the road doesn’t have to expend a great deal of energy on interpreting what “PM” means.

Try to use a real database to store your data, and keep away from spreadsheets if you can. A real database can be coded so that all possible responses for a variable like “Race” are pre-coded, eliminating the chance of typos, which are certain to occur in spreadsheets.

Here’s something you don’t often get from those other textbooks, but which is a great truth. You will spend from 80 to 90% of your time in any statistical analysis just getting the data into a form readable by you and your software. This may sound like the kind of thing you often hear from teachers, while you think to yourself, “Ho, ho, ho. He has to tell us things like that just to give us something to worry about. But it’s a ridiculous exaggeration. I’ll either (a) spend 10-15% of my time, or (b) have somebody do it for me.” I am here to tell you that the answers to these are (a) there is no known way in the universe for this to be true, and (b) Ha ha ha!

2. Databases

The absolute best thing to do is to store your data in a database. I often use the free and open source MySQL (.com, of course). Knowing how to design, set up, and use such a database is beyond what most people want to do on their own. So most, at least for simple studies, opt for spreadsheets. These can be fine, though they are prone to error, usually typos. For instance, the codings “Y” and “Y ” might look the same to you, but they are different inside a computer: one has a space, one doesn’t. The computer thinks these are as different as “Q” and “W”. This kind of typo is extraordinarily common because you cannot see blank spaces easily on a computer screen. To see if you have suffered from it, after you get your data into R type levels(my.variable.name) and each of the levels, like “Y” and “Y ”, will be displayed. If you see something like this, you’ll have to go back to your spreadsheet and locate the offending entries and correct them.
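Here is what that check looks like with a few made-up answers, one of which has a stray trailing space. The last two lines are an extra of mine: R's trimws function can strip the spaces after the fact, though fixing the spreadsheet itself is the cleaner cure.

```r
# Made-up answers; the third has an invisible trailing space
ridiculous <- factor(c("Y", "N", "Y ", "Y", "N"))
levels(ridiculous)  # three levels show up where you expected two

# One possible repair inside R: strip the leading/trailing spaces
cleaned <- factor(trimws(as.character(ridiculous)))
levels(cleaned)
```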

A lot of overhead is built into spreadsheets. Most of it has to do with prettifying the rows and columns—bold headings, colored backgrounds, and so on. Absolutely none of this does anything for the statistical analysis, so we have to simplify the spreadsheet a bit.

The most common way to do this is to save the spreadsheet as a CSV file. CSV stands for Comma Separated Values. It means exactly what it says. The values from the spreadsheet are saved to an ordinary text file (ASCII file), and each column is separated by a comma. An example from one row from the dataset we’ll be using is

0,0,0,0,39,"black","male","Y",17.1,80,102.4,0

Note the clever insertion of commas between each value.

What this means is that you cannot actually use commas in your data. For example, you cannot store an income value as “10,000”; instead, you should use “10000”. Also note that there is no dollar sign.

Now, in some countries, where the tendrils of modern society have not yet reached, people unfortunately routinely use commas in place of decimal points. Thus, “3.42” written here is “3,42” written there. You obviously cannot save the latter in a CSV file because the computer will think that the comma in “3,42” is one of the commas that separates the values, which it is not. The way to overcome this without having to change the data is to change the delimiter to something other than a comma; perhaps a semicolon or a pound sign; any kind of symbol which you know won’t be in the regular data. For example, if you used an @ symbol, your CSV file would look like

0@0@0@0@39@"black"@"male"@"Y"@17.1@80@102.4@0

The only trick will be figuring out how to do this. In Open Office, it’s particularly easy: after opening up the spreadsheet and selecting “Save As”, select the box “Edit Filter settings” and choose your own symbol instead of the default comma. A common mistake is to type an entry into, say, an Opinion variable, where a person’s exact words are the answer. Guard against using a comma in these words else the computer will think you have extra variables: the computer thinks there is a variable between each comma.
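Reading such a file back into R only requires telling the reader which delimiter was used. The sketch below assumes a made-up one-row file supplied as text rather than a real file on disk; sep is the read.csv argument that names the delimiter, and read.csv2 is the base R variant that expects semicolons as delimiters and commas as decimal points.

```r
# One row written with @ as the delimiter (no header row in this toy example)
line = '0@0@0@0@39@"black"@"male"@"Y"@17.1@80@102.4@0'
row = read.csv(text = line, sep = "@", header = FALSE)
ncol(row)      # 12 columns, one per value
row$V5         # 39

# Semicolon-delimited values that use decimal commas
euro = read.csv2(text = "3,42;10000", header = FALSE)
euro$V1        # 3.42, read as a number
```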

3. Summaries

It’s finally time to play with real data. This is, in my experience, another panic point. But it need not be. Just take your time and follow each step. It is quite easy.

The first trick is to download the data onto your computer. Go to the book website and download the file appendicitis.csv and save it somewhere on your hard disk in a place where you can remember. The place where it is is called the path. That is, your hard drive has a sort of hierarchy, a map where the files are stored. If you are on a Windows machine, this is usually the C:/ drive (yes, the slash is backwards on purpose, because R thinks like a Linux computer, or Apple, which has the slashes the other way). Create your own directory, say, mydata (do not put a space in the name of the folder), and put the appendicitis file there. So the path to the file is C:/mydata/appendicitis.csv. Easy, right? If you are on a Linux or Mac, it’s the same idea. The path on a Mac is usually something like /Users/YOURNAME/mydata/appendicitis.csv. On a Linux box it might be /home/YOURNAME/mydata/appendicitis.csv. Simple!

Open R. Then type this exact command:

x = read.csv(url("http://wmbriggs.com/book/appendicitis.csv"))

There is a lot going on here, so let’s go through it step by step. Ignore the x = bit for a moment and concentrate on the part that reads read.csv(...). This built-in R function reads a CSV file. Well, what else would you have expected from its name? Inside that function is another one called url(), whose argument is the same thing you type into any web browser. The thing you type is called the URL, the Uniform Resource Locator, or web address. What we are doing is telling R to read a CSV file directly off the web. Pretty neat!

If you had saved the file directly to your hard drive, you would have loaded it like this

x = read.csv("C:/mydata/appendicitis.csv")

where you have to substitute the correct path, but otherwise it is just as easy.

The last thing to know is that when the CSV file is read in it is stored in R’s memory in the object I called x. R calls these objects data frames. Why didn’t they call them data sets? I have no idea. How did I know to use an x, why did I choose that name to store my data? No reason at all except habit. You can call the dataset anything you want. Call it mydata if you want. It just doesn’t matter.

Now type just x and hit enter. You’ll see all the data scroll by. Too much to look at, so let’s summarize it:

summary(x)

This is data taken on patients admitted to an emergency room with right lower quadrant pain (in the area the appendix is located) in order to find a model to better predict appendicitis (Birkhahn et al., 2006). Each of the variables was thought to have some bearing on this question. We’ll talk more about this data later. Right now, we’re just playing around. When we run the command we get the summary statistics for each variable in x. What it shows is the mean, which is just the arithmetic average of the data, the median, which is the point at which 50% of the data values are larger and 50% smaller, the 1st Qu., which is the first quartile and is the point at which 25% of the data values are smaller, and the 3rd Qu., which is the third quartile and is the point at which 75% of the data values are smaller (and 25% are larger, right?). Also given are the Min., which is the minimum value, and the Max., which is the maximum. Last is NA’s, which is the number, if any, of missing values. These kinds of statistics only show for data coded as numbers, i.e. numerical data. For data that is textual, also called categorical or factorial data, the first few levels of categories are shown with a count of the number of rows (observations) that are in that category.

You will notice that variables like Pregnancy are not categorical, but are numerical, which is why we see the statistics and not a category count. Pregnancy is a 0/1 variable and is technically categorical; however, like I said above, it is obvious that “0” means “not pregnant”, so there is no ambiguity. The advantage to storing data in this way is that the numerical mean is then the proportion of people having Pregnancy = 1 (think about this!).
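To see why the mean of a 0/1 variable is a proportion, here is a tiny sketch with made-up codes (these ten values are invented for illustration and are not taken from the appendicitis data):

```r
pregnancy = c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0)   # made-up 0/1 codes

mean(pregnancy)                            # 0.3
sum(pregnancy == 1) / length(pregnancy)    # 0.3, the proportion of 1s
```

Adding up the 1s counts them, and dividing by the number of rows turns that count into a proportion, which is exactly what the mean does.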

Let’s just look at the variable Age for now. It turns out we can apply the summary function on individual variables, and not just on data frames. Inside the computer, the variable age is different than Age (why?). So try summary(Age). What happens? You get the error message Error in summary(Age) : object "Age" not found. But it’s certainly there!

You can read lots of different datasets into R at the same time, which is very convenient. I work on a lot of medical datasets and every one of them has the variable Age. How does R know which Age belongs to which dataset? By only recognizing one dataset at a time, through the mechanism of attaching the dataset directly to memory, to R’s internal search path. To attach a dataset, type

attach(x)

Yes, this is painful to remember, but necessary to keep different datasets separate. Anyway, try summary(Age) again (by using the up arrow on your keyboard to recall previously typed commands) and you’ll see it works.
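If you would rather not attach anything, base R offers two other ways to tell it which dataset a variable lives in: the $ operator and the with() function. The sketch below uses a small made-up data frame called toy (a hypothetical stand-in for x, so that it runs without downloading anything).

```r
# A made-up data frame standing in for the real dataset
toy = data.frame(Age = c(39, 25, 61, 47),
                 Sex = c("male", "female", "female", "male"))

summary(toy$Age)          # dataset name, dollar sign, variable name
with(toy, summary(Age))   # same result: look up Age inside toy
```

These forms are handy when two datasets with the same variable names are loaded at once, since each command says explicitly which Age is meant.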

Incidentally, summary is one of those functions that you can always try on anything in R. You can’t break anything, so there is no harm in giving it a go.

4. Plots

The number one, unalterable rule that you must obey when beginning work with a new dataset is always look at the data first! Too many people forget this rule to their ultimate embarrassment.

The summary() function is easy and gives you information on the distribution of your data in text. But it’s usually easier to see what’s going on with pictures. The visual equivalents of summary are boxplot, hist, and table. Let’s do a boxplot first—it’s easy, boxplot(Age).

The y-axis shows the values of Age. The center line on the boxplot is the median, and the outer edges of the box are the first and third quartiles. By default in R, the lines (whiskers) extend to the most extreme data values that lie within 1.5 times the height of the box (the interquartile range) from the box’s edges. Boxplots will also stick dots beyond the far ends of the lines for any values that fall outside that range.
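The numbers behind the picture can be inspected with boxplot.stats(), the base R function that boxplot() relies on. In R's default settings the whiskers stop at the most extreme data values lying within 1.5 box-heights of the box, and anything farther out is drawn as a dot. The ages below are made up for illustration.

```r
ages = c(21, 25, 30, 34, 39, 41, 47, 52, 95)   # made-up ages, one very old

b = boxplot.stats(ages)
b$stats   # lower whisker, lower box edge, median, upper box edge, upper whisker
          # 21 30 39 47 52
b$out     # 95, which would be drawn as a separate dot
```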

Next up is hist(Age), which tries to do exactly the same thing as boxplot: give you a visual summary of the range and likelihood of various data values.

You can’t do a boxplot on data like Race, because that variable is categorical. Instead, do a table by table(Race) to get a count of each category. This is OK, but just gives the counts when frequently you want the proportions. To get those, you have to make a table of the table (yes, this is a pain): prop.table(table(Race)).
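A short sketch of the two steps, using a made-up vector of categories (the six values are invented; with the real data attached you would use Race directly):

```r
race = c("black", "white", "white", "asian", "white", "black")  # made-up values

counts = table(race)     # how many rows fall in each category
counts
prop.table(counts)       # the same table rescaled so the entries sum to 1
```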

plot is another one of those commands, like summary, that you can always try on anything. It never hurts and you can’t break anything.

I originally included these plots in the book so you could see them, but I decided against doing this to guard against your laziness in the homework. Do these commands yourself!

5. Extra: Advanced topics

Temperature is one of the variables. You can try the summary command on it and it works just fine. Sometimes you only want the mean and don’t need all the other business, so you can use the function mean(Temperature). Try it and you get [1] NA. What gives? Do a summary(Temperature) and you’ll see that there are 7 missing values. The function mean is too stupid to give you a mean in the presence of missing values. In a way, this is a good thing, because it forces you to recall that you have an incomplete dataset, and that should give you pause. Why are the values missing? It could be important. You can get around the missing values by typing mean(Temperature, na.rm=T), which says take the mean, and remove (rm) the missing (na) values. The =T means TRUE (you could also type the whole word out as TRUE; use capitals). The mean will then be computed. R is wonderful, but sometimes the way it handles missing values is a pain in the ass.
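The same behavior can be reproduced on a made-up vector (the temperatures below are invented, with one deliberately missing). is.na() is the base R function that flags missing entries, so summing it counts them.

```r
temp = c(98.6, NA, 101.2, 99.1)   # made-up temperatures, one missing

mean(temp)                 # NA, because one value is missing
sum(is.na(temp))           # 1 missing value
mean(temp, na.rm = TRUE)   # mean of the three values that are present
```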

A back-of-the-envelope drawing that you can make by hand is called a stem-and-leaf plot: it does not require you to first sort your data, but you do have to discover the minimum and maximum values. In R the function is stem(); for example, stem(Age).

Histograms and boxplots are very old, were wonderful in their day, and in some cases (discrete data) are just the thing, but we can do better with numbers that are better approximated as continuous (see Chapter 4), like Age. For those, use a density estimate, which is, in a sense, an automated superior histogram. To do this in R type plot(density(Age)).

You can assign the output of any function to a new variable, created by you. So, if you want to store the table for Race, type fit = table(Race), where I chose the name fit for no good reason. All the table results are now in fit. To see it, just type fit. This makes getting proportions easier because you can now prop.table(fit). You could also plot(table(Sex)) or plot(prop.table(table(Sex))) or any categorical variable; try plot(fit).

Also try plot(x) or pairs(x, panel=panel.smooth) and see what happens.

9 comments May 22nd, 2008

Stats 101: Chapter 6

It was one of those days yesterday. I got two chapters up, but did not give anybody a way to get them! Here it is

These are the last two “basics” Chapters. 6 first, and it is a little thin, so I’ll probably expand it later. It’s sort of a transition between probability where we know everything to statistics where we don’t. And by “everything” I mean the parameters of probability models. I want the reader to build up a little intuition before it starts to get rough.

The most important part of 6 is the homework, which I usually spend a lot of time with in class.

In a couple of days we start the good stuff. Book link.

CHAPTER 6

Normalities & Oddities

1. Standard Normal

Suppose x|m, s, EN ∼ N(m, s), then there turns out to be a trick that can make x easier to work with, especially if you have to do any calculations by hand (which, nowadays, will be rarely). Let z = (x-m)/s, then z|m, s, EN ∼ N(0, 1). It works for any m and s. Isn’t that nifty? Lots of fun facts about z can be found in any statistics textbook that weighs over 1 pound (these tidbits are usually in the form of impenetrable tables located in the back of the books).

What makes this useful is that Pr(z > 2|0, 1, EN ) ≈ Pr(z > 1.96|0, 1, EN ) = 0.025 and Pr(z < −2|0, 1, EN ) ≈ Pr(z < −1.96|0, 1, EN ) = 0.025: or, in words, the probability that z is bigger than 2 or less than negative 2 is about 0.05, which is a magic (I mean real voodoo) value in classical statistics. We already learned how to do this in R, last Chapter.

In Chapter 4, a homework question explained the rules of petanque, which is a game more people should play. Suppose the distance the boule lands from the cochonette is x centimeters. We do not know what x will be in advance, and so we (approximately) quantify our uncertainty in it using a normal distribution with parameters m = 0 cm and s = 10 cm. If x > 0 cm it means the boule lands beyond the cochonette, and if x < 0 cm it means the boule lands in front of the cochonette. You are out on the field playing, far from any computer, and the urge comes upon you to discover the probability that x > 30 cm. First thing to do is to calculate z which equals (30cm − 0cm)/10cm = 3 (the cm cancel). What is Pr(z > 3|0, 1, EN )? No idea; well, some idea. It must be less than 0.025, since we have all memorized that Pr(z > 2|0, 1, EN ) ≈ 0.025. The larger z is, the more improbable it becomes (right?). Let’s say as a guess 1%. When you get home, you can open R and plug in 1-pnorm(3) and see that the actual probability is about 0.1%, so we were off by an order of magnitude (a power of 10), which is a lot, and which proves once again that computers are better at math than we are.
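The petanque calculation can be checked both ways in R: by standardizing first and using the standard normal, or by handing m and s straight to pnorm. The variable names m, s and z simply mirror the symbols in the text.

```r
m = 0     # central parameter, in cm
s = 10    # spread parameter, in cm

z = (30 - m) / s        # 3
1 - pnorm(z)            # about 0.00135, using the standard normal
1 - pnorm(30, m, s)     # the same probability without standardizing
```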

2. Nonstandard Normal

The standard normal example is useful for developing your probabilistic intuition. Since normal distributions are used so often, we will spend some more time thinking about some consequences of using them. Doing this will give you a better feel for how to quantify uncertainty.

Below is a picture of two normal distributions. The one with the solid line has m1 = 0 and s1 = 1; the dashed line has m2 = 0.5 and also s2 = 1. In other words, the two distributions differ only in their central parameter; they have the same variance parameter. Obviously, large values are more likely according to distribution 2, and smaller values are more likely given distribution 1, as a simple consequence of m2 > m1. However, once we get to values of about x = 4 or so, it doesn’t look like the distributions are that different. (Cue the spooky music.) Or are they?

Under the main picture are two others. The one on the left is exactly like the main picture, except that it focuses only on the range of x = 3.5 to x = 5. If we blow it up like this, we can see that it is still more likely to see large values of x using distribution 2.

How much more likely? The picture on the right divides the probabilities of seeing x or larger with distribution 2 by distribution 1, and so shows how much more likely it is to see larger values with distribution 2 than 1. For example, pick x = 4. It is about 7.5 times more likely to see an x = 4 or larger with distribution 2. That’s a lot! By the time we get out to x = 5, we are 12 times more likely to see values this large with distribution 2. The point is that even very small changes in the central parameters lead to large differences in the probabilities of “extreme” values of x.
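These ratios can be recomputed directly from the normal distribution function. The helper name tail_ratio below is made up for this sketch; lower.tail = FALSE is the pnorm argument that returns the probability of exceeding a value rather than falling below it.

```r
# Ratio of Pr(x or larger) under distribution 2 (m = 0.5) to distribution 1 (m = 0)
tail_ratio = function(x) {
  pnorm(x, 0.5, 1, lower.tail = FALSE) / pnorm(x, 0, 1, lower.tail = FALSE)
}

tail_ratio(4)   # a bit over 7
tail_ratio(5)   # close to 12
```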

(see the book)

This next picture again shows two different distributions, this time with m1 = m2 = 0 with s1 = 1 and s2 = 1.1. In other words, both distributions have the same central parameters, but distribution 2 has a variance parameter that is slightly larger. The normal density plots do not look very different, do they? The dashed line, which is still distribution 2, has a peak slightly under distribution 1’s, but the difference looks pretty small.

(see the book)

The bottom panels are the same as before. The one on the left blows up the area where x > 3.5 and x < 5. A big difference still exists. And the ratio of probabilities is still very large. It's not shown, but the plot on the right would be duplicated (or mirrored, actually) if we looked at x > −5 and x < −3.5. It is more probable to see extreme events in either direction (positive or negative) using distribution 2.
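The same kind of ratio can be computed when only the spread differs. This is a sketch with a made-up helper name (spread_ratio), comparing s = 1.1 against s = 1 with both central parameters at 0; the mirror-image claim can be checked by comparing the upper and lower tails.

```r
# Ratio of tail probabilities: s = 1.1 versus s = 1, both with m = 0
spread_ratio = function(x) {
  pnorm(x, 0, 1.1, lower.tail = FALSE) / pnorm(x, 0, 1, lower.tail = FALSE)
}

spread_ratio(4)   # well above 1
spread_ratio(5)   # larger still

# Lower tail at -4 gives the same ratio as the upper tail at +4
pnorm(-4, 0, 1.1) / pnorm(-4, 0, 1)
```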

The surprising consequence is that very small changes in either the central parameter or the variance parameter can lead to very large differences at the extremes. Examples of these phenomena are easily found in real life, but my heightened political sensitivity precludes me from publicly pointing any of these out.

3. Intuition

We have learned probability and some formal distributions, but we have not yet moved to statistics. Before we do so, let us try to develop some intuition about the kinds of problems and solutions we will see before getting to technicalities. There are a number of concepts that will be important, but I don’t want to give them a name, because there is no need to memorize jargon, while it is incredibly important that you develop a solid understanding of uncertainty.

The well-known Uncle Ted Nugent’s chain of Kill ‘em and Grill ‘em Venison Burger restaurants sells both Coke and Pepsi, and their internal audit shows they sell about an equal amount of each. The busy Times Square branch of the chain has about 5000 customers a day, while the store in tiny Gaylord, Michigan sees only about 100 customers. Which location is more likely to sell, on any given day, at least 2 times as much Pepsi as Coke?

A useful technique for solving questions like this is exaggeration. For instance, the question is asking about a difference in location. What differs between those places? Only one thing, the number of customers. One site gets about 5000 people a day, the other only 100. Let’s exaggerate that difference and solve a simpler problem. For example, suppose Times Square still gets 5000 a day, but Gaylord only gets 1 a day. The information is that the probability of selling a Coke is roughly equal to the probability of selling a Pepsi. This means that, at Gaylord, to that 1 customer on that day, they will either sell 1 Coke or 1 Pepsi. If they sell a Pepsi, Gaylord has certainly sold more than 2 times as much Pepsi as Coke. The chance of that happening is 50%. What is two times as much Pepsi as Coke at Times Square? A lot more Pepsi, certainly. So it’s far more likely for Gaylord to sell a greater proportion of Pepsi because they see fewer customers. The lesson is that when the “sample size” is small, we are more likely to see extreme events.
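The intuition can be checked with a binomial calculation. This sketch rests on an assumption that is not in the text: every customer buys exactly one soda, Pepsi with probability 0.5, independently of everyone else. The function name prob_double_pepsi is made up for the example.

```r
# Assumption: each of n customers buys exactly one soda, Pepsi with probability 0.5
prob_double_pepsi = function(n) {
  # Pepsi >= 2 * Coke, with Coke = n - Pepsi, means Pepsi >= 2n/3
  k = ceiling(2 * n / 3)
  pbinom(k - 1, n, 0.5, lower.tail = FALSE)   # Pr(Pepsi sales >= k)
}

prob_double_pepsi(1)      # 0.5, the exaggerated one-customer Gaylord
prob_double_pepsi(100)    # small, but not negligible
prob_double_pepsi(5000)   # vanishingly small
```

Under these assumptions the probability shrinks rapidly as the number of customers grows, which is the sample-size lesson in numbers.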

What is the length of the first Chinese Emperor Qin Shi Huangdi’s nose? You don’t know? Well, you can make a guess. How likely is it that your guess is correct? Not very likely. Suppose that you decide to ask everybody you know to also guess, and then average all the answers together in an attempt to get a better guess. How likely is it that this averaged-guess is perfectly correct? No more likely. If you haven’t a clue about the nose, and nobody else does either, then averaging ignorance is no better than single ignorance. The lesson is that just because a large group of people agree on an opinion, it is not necessarily more probable that that opinion, or average of opinions, is correct. Uninformed opinion of a large group of people is not necessarily more likely to be correct than the opinion of the lone nut job on the corner. Think about this the next time you hear the results of a poll or survey.

You already possess other probabilistic intuition. For example, suppose, given some evidence E, the probability of A is 0.0000001 (A is something that might be given many opportunities to happen, e.g. winning the lottery). How often will A happen? Right. Not very often. But if you give A a lot of chances to occur, will A eventually happen? It’s very likely to.
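To put numbers on this, assume (for illustration only) that the chances are independent and each has the same probability p. Then the probability that A happens at least once in n chances is 1 − (1 − p)^n. The numbers of chances below are made up.

```r
p = 0.0000001                    # probability of A on any single chance
n = c(1, 1e6, 1e8)               # made-up numbers of independent chances

at_least_once = 1 - (1 - p)^n    # Pr(A happens at least once in n chances)
at_least_once
```

With one chance the probability is just p; with a hundred million chances it is very close to 1.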

Every player in petanque gets to throw three boules. What are the chances that I get all three within 5 cm? This is a compound problem, so let’s break it apart. How do we find out how likely it is to be within 5 cm of the cochonette? Well, that means the boule can be 5 cm in front of the cochonette, right near it, or up to 5cm beyond it. The chance of this happening is Pr(−5cm < x < 5cm|m = 0cm, s = 10cm, EN ). We learned how to calculate the probability of being in an interval last chapter:

pnorm(5,0,10)-pnorm(-5,0,10).

This equals about 0.38, which is the chance that one boule lands within +/- 5 cm of the cochonette. What is the chance that all of them land that close? Well, that means the first one does and the second one and the third. What probability rule do we use now? The second, which tells us to multiply the probabilities together, which is 0.38^3 ≈ 0.055. The important thing to recall, when confronted with problems of this sort: do not panic. Try to break apart the complex problem into bite-size pieces.
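The whole compound calculation takes two lines in R. The names one_boule and all_three are made up for this sketch, and multiplying the probabilities assumes the three throws do not influence one another.

```r
# Chance that a single boule lands within 5 cm of the cochonette
one_boule = pnorm(5, 0, 10) - pnorm(-5, 0, 10)
one_boule        # about 0.38

# Chance that all three do: multiply the three probabilities together
all_three = one_boule^3
all_three        # a little under 0.06
```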

3 comments May 22nd, 2008

Comments restored

Thanks to a hot tip from Lucia, over at the Diet Diary, I have become wiser about spam. I installed the wp-spamfree plug-in and we’ll see how that works.

OLD “I have been getting an enormous amount of spam over the past week (1000s of postings a day; all caught by the spam filter), so I am shutting off comments for 24 hours in the hope this will get me off some spam lists. Sorry for the inconvenience.”

Add comment May 22nd, 2008

Stats 101: Chapter 5

Update: 21 May 4:45 am. I forgot to actually upload the file until right this moment. Thanks to Mike and Harry for the reminder.

Chapter 5 is ready to go.

This is purely a mechanical chapter, introducing R. Thrilling reading, it is not. But it’s necessary to learn in order to be able to carry out the analysis in later chapters. The book website is not fully up; only the datasets are there. To learn to install R, just look on the R website.

I’ll be posting Chapters 6 and 7 in short order and then we finally get to the good stuff.

R

1. R

R is a fantastic, hugely supported, rapidly growing, infinitely extensible, operating-system agnostic, free and open source statistical software platform. Nearly everybody who is anybody uses R, and since I want you to be somebody, you will use it, too. Some things in R are incredibly easy to do; other tasks are bizarrely difficult. Most of what makes R hard for the beginner is the same stuff that makes any piece of software hard; that is, getting used to expressing your statistical desires in computerese. As such an environment can be strange and perplexing at first, some students experience a kind of peculiar stress that is best described by example. Here is a video from Germany showing a young statistics student who experienced trouble understanding R:

http://youtube.com/watch?v=PbcctWbC8Q0

Be sure that this doesn’t happen to you. Remember what Douglas Adams said: Don’t panic.

The best way to start is by going to r-project.org and clicking the CRAN link under the Download heading. You can’t miss it. After that, you have to choose a mirror, which means one of the hundreds of computers around the world that host the software. Obviously, pick a site near you. Once that’s done, choose your platform (your operating system, like Linux or one of the others), and then choose the base package. Step-by-step instructions are at this book’s website: wmbriggs.com/book. It is no more difficult to install than any other piece of software.

This is not the place to go over all the possibilities of R; just the briefest introduction will be given, because there are far better places available online (see the book website for links). But there are a few essential commands that you should not do without.

These are

Command              Description
help(command)        Does the obvious: always scroll down to the bottom of the help to see examples of the command.
?command             Same as help().
apropos('string')    If you cannot remember the name of a command (and I always forget) but remember it started with co-something, then just type apropos('co') and you’ll get a complete list of commands that have co anywhere in their names.
c()                  This is the concatenation function: typing c(1,2) concatenates a 2 to 1, or sticks the number 2 on the end of 1, so that we have a vector of numbers.

The Appendix gives a fuller list of R commands.
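A short session trying out these commands might look like the following sketch. The help call is left as a comment because it opens a separate help page; the variable names hits and v are made up.

```r
# help(mean) or ?mean opens the help page for mean (examples are at the bottom)

hits = apropos("co")     # every command with "co" somewhere in its name
"cos" %in% hits          # TRUE: the cosine function is among them

v = c(1, 2)              # a vector holding 1 and 2
v = c(v, 3)              # stick a 3 on the end
length(v)                # 3
```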

It is important to understand that R is a command-line language, which we may interpret as meaning that all commands in R are functions which must be typed into the console. These are objects that are a command name plus a left and right parenthesis, with variables (called arguments) stuck in between, thus: plot(x,y). Remember that you are dealing with computers, which are literal, intolerant creatures (much like the people who want to ban smoking), and so cannot abide even the slightest deviation from their expectations. That means, if instead of plot(x,y), you type lot(x,y), or plot x,y), or plot(,y), or plot(x,y things will go awry. R will try to give you an idea of what went wrong by giving you an error message. Except in cases like that last typo, which will cause you to develop stress lines, because all you’ll see is this

+

and every attempt of yours to type anything new, or hit enter 100 times, will not do a thing except give you more lines of + or other screwy errors. Because why? Because you typed plot(x,y; that is, you typed a left parenthesis (right before the x) and you never “closed” it with a right parenthesis, and R will simply wait forever for you to type one in.

The solution is to enter a right parenthesis, or hit

ctrl+c

which means the control key plus the c key simultaneously, which “breaks” the current computation.

Using R means that you have to memorize (!) and type in commands instead of using a graphical user interface (GUI), which is the standard point-and-click screen with which you are probably familiar. It is my experience that students who are not used to computers start freaking out at this point; however, there is no need to. I have made everything very, very easy and all you have to do is copy what you see in the book to the R screen. All will be well. I promise.

GUIs are very nice things, incidentally, and R has one that you can download and play with. It is called the R Commander. Like all GUIs, some very basic functionality is included that allows you to, well, point and click and get a result. Problem is, the very second you want to do something different than what is available from the GUI, you are stuck. With statistics, we often want to do something differently, so we will stick with the command line.

2. R binomially

By now, you are eagerly asking yourself: “Can R help us with those binomial calculations like in the Thanksgiving example?” Let’s type apropos('bino') and see, because, after all, ‘bino’ is something like binomial. The most likely function is called binomial, so let’s type ?binomial and see how it works. Uh oh. Weird words about “family objects” and the function glm(), and that doesn’t sound right. What about one of the functions like dbinom()? Jackpot. We’ll look at these in detail, since it turns out that this structure of four functions is the same for every distribution. The functions are in this table:

dbinom    The probability or density function: given the size, or n, and prob, or p, this calculates the probability that we see x successes; this is equation (11).
pbinom    The distribution function, which calculates the probability that the number of successes is less than or equal to some a.
qbinom    This is the “quantile” function, which calculates, given a probability from the distribution function, which value of q it is associated with. This will be made clear with some examples with the normal distribution later.
rbinom    This generates a “random” binomial number; and since random means unknown, this means it generates a number that is unknown in some sense; we’ll talk about this later.
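Here is a sketch of all four functions side by side, using n = 3 and p = 0.1 as in the Thanksgiving example. The call to set.seed() is an addition for this sketch; it only makes the "random" draws repeatable from run to run.

```r
dbinom(0, size = 3, prob = 0.1)     # Pr(exactly 0 successes)   = 0.729
pbinom(1, size = 3, prob = 0.1)     # Pr(1 or fewer successes)  = 0.972
qbinom(0.5, size = 3, prob = 0.1)   # smallest count whose distribution function reaches 0.5: 0

set.seed(1)                         # make the draws below repeatable
rbinom(5, size = 3, prob = 0.1)     # five simulated counts, each between 0 and 3
```

Swapping binom for norm gives dnorm, pnorm, qnorm and rnorm, the same four functions for the normal distribution.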

Let’s go back to the Thanksgiving example, which used a binomial. Moe can calculate, given n = size = 3, p = prob = 0.1, his probabilities using R:

dbinom(0,3,.1)

which gives the probability of taking nobody along for the ride. The answer is [1] 0.729. The “[1]” in front of the number just means that you are only looking at line number 1. If you asked for dozens of probabilities, for example, R would s