Stats 101: Chapter 3

Three is ready to go.

I should re-emphasize one of the goals of this book. It is meant to be for that large host of unfortunates who are forced—I mean required—to take a statistics course and, importantly, do not want to. This is why a lot of formulas and methods do not make their traditional appearance. Understanding—and not rote—is paramount.

The material is enough to cover in one typical semester. The student will not learn how to handle many different kinds of data, but he damn well will comprehend what somebody is saying when they make a probability statement about data.

Face it. The vast majority of students who sit through statistics classes never again compute their own regression models, factor analyses, etc., etc. But they often read these kinds of results prepared by others. I want them, as their eyes meet a p-value, to say to themselves, “Aha! Here is one of those p-value things Stats 101 warned me about! Sure enough, it is being misused yet again. I don’t know the right answer in this study, but I do know what is being claimed is too certain.”

If I can do that, then I will be a happy man.

(The contents of Chapter 3 now follow. If you use Firefox > version 2.0, then you will be able to see all the characters on your screen. Else some of the content may be a little screwy. I apologize for this. If you can’t read everything below, consider this a tease for the real thing. You can always download the chapter and print it out.)

How to Count

1. One, two, three…

Youtube.com has a video at this URL

http://www.youtube.com/watch?v=wcCw9RHI5mc

The important part is that “v=wcCw9RHI5mc” business at the end, which essentially means “this is video number wcCw9RHI5mc”. This video is, of course, different than number wcCw9RHI5md, and number wcCw9RHI5me and so on. We can notice that the video number contains 11 different slots (count them), each of which is filled with a digit or an upper or lower case Latin letter, and the number is case sensitive: A differs from a. The question is, how many different videos can Youtube host given this numbering scheme? Are they going to run out of numbers anytime soon?

That problem is hard, so we’ll start on a simpler one. Suppose the video numbering scheme only allowed one slot, and that this slot could only contain a single-digit number, chosen from 0-9. Then how many videos could they host? They’d have v=0, v=1 and so on. Ten, right? Now how about if they allowed two slots chosen from 0-9. Just 10 for the first, and 10 for each of the 10 of the first, a confusing way of saying 10 × 10. For three slots it’s 10 × 10 × 10. But you already knew how to do this kind of counting, didn’t you?

Suppose the single slot is allowed only to be the lower case letters a,…,z? This is v=a, v=b, etc. How many in two such slots? Just 26 × 26 = 676. Which is the same way we got 100 in two slots of the numbers 0-9.

So if we allow any digit, plus any lower or upper case letter, in any slot, we have 10 + 26 + 26 = 62 different possibilities per slot. That means that with 11 slots we have 62 × 62 × · · · × 62 = 62^11 ≈ 5 × 10^19, or 50 billion billion different videos that Youtube can host.
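Here is a quick check of that arithmetic, as a minimal sketch. The function name count_ids is made up for this illustration, and the only assumption is the one in the text: every slot is filled independently from the same set of symbols.

```python
# Count identifiers when each slot is filled independently.
# count_ids is a hypothetical helper name, not anything Youtube uses.
DIGITS = 10   # 0-9
LOWER = 26    # a-z
UPPER = 26    # A-Z


def count_ids(symbols_per_slot: int, slots: int) -> int:
    """One factor of symbols_per_slot for every slot."""
    return symbols_per_slot ** slots


two_digit_slots = count_ids(DIGITS, 2)                    # 10 x 10
two_letter_slots = count_ids(LOWER, 2)                    # 26 x 26
youtube_ids = count_ids(DIGITS + LOWER + UPPER, 11)       # 62^11

print(two_digit_slots)
print(two_letter_slots)
print(f"{youtube_ids:.2e}")
```

Python integers do not overflow, so the 11-slot count comes out exact rather than rounded.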

2. Arrangements

How many ways are there of arranging things? In 1977, George Thorogood remade that classic John Lee Hooker song, “One Bourbon, One Scotch, and One Beer.” This is because George is, of course, the spiritous counterpart of an oenophile; that is, he is a connoisseur of fine spirits and regularly participates in tastings. Further, George, who is way past 21, is not an idiot and never binge drinks, which is about the most moronic of activities that a person could engage in. He very much wants to arrange his coming week, where he will taste, each night, one bourbon (B) , one scotch (S), and one beer (R). But he wants to be sure that the order he tastes these drinks doesn’t influence his personal ratings. So each night he will sip them in a different order. How many different nights will this take him? Write out what will happen: Night 1, BSR; night 2, BRS; night 3, SBR; night 4, SRB; night 5, RBS; night 6, RSB. Six nights! Luckily, this still leaves Sunday free for contemplation.

Later, George decides to broaden his tasting horizons by adding Vernors (the tasty ginger ale aged in oak barrels that can’t be bought in New York City) to his lineup. How many nights does it take him to taste things in different order now? We could count by listing each ordering, but there’s an easier way. If you have n items and you want to know how many different ways they could be ordered, the general formula is:

n! = n × (n − 1) × (n − 2) × · · · × 2 × 1

The term on the left, n!, reads “n factorial.” With 4 beverages, this is 4 × 3 × 2 × 1 = 24 nights, which is over three weeks! Good thing that George is dedicated.
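If you would rather let a computer do the listing, here is a small sketch using Python's standard library. The single-letter labels follow the text (B, S, R), and "V" for Vernors is a label added just for this example.

```python
from itertools import permutations
from math import factorial

# One bourbon, one scotch, one beer, labelled as in the text.
drinks = ["B", "S", "R"]

# Every possible tasting order for a single night.
orders = ["".join(p) for p in permutations(drinks)]
print(orders)
print(len(orders), factorial(len(drinks)))

# Add Vernors ("V" is a label chosen for this sketch) and count again.
four_drink_orders = list(permutations(drinks + ["V"]))
print(len(four_drink_orders), factorial(4))
```

Listing works for three or four drinks, but the list grows as fast as n! does, which is the reason to prefer the formula.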

3. Being choosy

It’s the day before Thanksgiving and you are at school, packing your car for the drive home. You would have left a day earlier, but you didn’t want to miss your favorite class—statistics. It turns out that you have three friends who you know need a ride: Larry, Curly, and Moe. Lately, they have been acting like a bunch of stooges, so you decide to tell them that your car is just too full to bring them along. The question is, how many different ways can you arrange your friends to drive home with you when you plan to bring none of them? This is not a trick question; the answer is as easy as you think. Only one way—that is, with you driving alone.

But, they are your friends, and you love them, so you decide to take just one. Now how many ways can you arrange your friends so that you take just one? Since you can take Larry, Curly, or Moe, and only one, then it’s obviously three different ways, just by taking only Larry, or only Curly, or only Moe. What if you decide to take two, then how many ways? That’s trickier. You might be tempted to think that, given there are 3 of them, the answer is 3! = 6, but that’s not quite right. Write out a list of the groupings: you can take Larry & Curly, Larry & Moe, or Moe & Curly. That’s three possibilities. The grouping “Curly & Larry,” for example, is just the same as the grouping “Larry & Curly.” That is, the order of your friends doesn’t matter: this is why the answer is 3 instead of 6. Finally, all these calculations have made you so happy that you soften your heart and decide to take all three. How many different groupings taking all of them are possible? Right. Only one.

You won’t be surprised to learn that there is a formula to cover situations like this. If you have n friends and you want to count the number of possible groupings of k of them when the order does not matter, then the formula is

(see the book)

The term on the left is read “n choose k”. By definition (via some fascinating mathematics) 0! = 1. Here are all the answers for the Thanksgiving problem:

(see the book)

There are some facts about this combinatorial function that are useful to know. The first is that n choose 0 always equals 1. This means, out of n things, you take none; or it means there is only one way to arrange no things, namely no arrangement at all. n choose n is also always 1, regardless of what n equals. It means, out of n things, you take all. n choose 1 always equals n, and so does n choose n − 1: these are the number of ways of choosing just 1 or just n − 1 things. As long as n > 3, n choose 2 > n choose 1, which makes sense, because you can make more groups of 2 than of 1 (with n = 3 the two are equal, as in the Thanksgiving problem).
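The Thanksgiving counts and the facts above can be checked with a short sketch. The formula itself is in the book and not reproduced in this post, so the choose function below uses the standard textbook definition, n! / (k! (n − k)!), which is an assumption about what the book prints; the name choose is made up for this example.

```python
from itertools import combinations
from math import comb, factorial

friends = ["Larry", "Curly", "Moe"]


def choose(n: int, k: int) -> int:
    """Standard 'n choose k' (assumed to match the book's formula)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# Count the groupings directly, for taking 0, 1, 2, or 3 friends.
counts = [len(list(combinations(friends, k))) for k in range(len(friends) + 1)]
print(counts)

# The same numbers from the formula and from Python's built-in math.comb.
print([choose(3, k) for k in range(4)])
print([comb(3, k) for k in range(4)])

# Groups of 2 outnumber groups of 1 only once n is bigger than 3.
print(comb(3, 2), comb(3, 1))
print(comb(4, 2), comb(4, 1))
```

Note that combinations treats “Larry & Curly” and “Curly & Larry” as the same grouping, which is exactly the point of the section.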

4. Counting meets probability: The Binomial distribution

We started the Thanksgiving problem by considering it from your point of view. Now we take the perspective of Larry, Moe, and Curly, who are waiting in their dorm room for your call. They don’t yet know which, if any, of them will get a ride with you. Because they do not know, they want to quantify their uncertainty and they do so using probability. We are now entering a different realm, where counting meets probability. Take your time here, because the steps we follow will be the same in every probability problem we ever do.

Moe, reminiscent, recalls an incident wherein he was obliged to poke you in the eyes, and guesses that, since you were somewhat irked at the time, the probability that you take any one of the gang along is only 10%. That is, it is his judgment that the probability that you take him, Moe, is 10%, which is the same as the probability that you (independently) take Curly, and so on. So the boys want to figure out the probability that you take none of them, take one of them, take two of them, or take all three of them.

Start with taking all three. We want the probability that you take Larry and Moe and Curly, where the probability of taking each is 10%. Remember probability rule #2? Those “ands” become “times”: so the probability of taking all three is 0.1 × 0.1 × 0.1 = 0.001, or 1 in 1,000. Keep in mind: this is from their perspective, not yours. This is their guess of the chances; you may already have made up your mind, but they don’t know that.

What about taking none of them? This is the chance that you do not take Larry and you do not take Moe and you do not take Curly. The key word is still “and,” which makes the probability (1 − 0.1) × (1 − 0.1) × (1 − 0.1) = 0.9^3 ≈ 0.73, since the probability of not taking Larry etc. is one minus the probability of taking him etc. It is, too, because you can either take Larry or not; these are the only two things that can happen, so the probability of taking Larry or not must be 1. We can write this using our notation: let A = “Take Larry”, then A_F = “Don’t take him”. Then Pr(A ∪ A_F | E) = Pr(A | E) + Pr(A_F | E) = 1, using probability rule #1. So if Pr(A | E) = 0.1, then Pr(A_F | E) = 1 − Pr(A | E) = 0.9. In this case, E is the information dictated by Moe (who is the leader), which led him to say Pr(A | E) = 0.1.

How about taking just one? Well, you can take Larry, not take Moe, and not take Curly, and the chance of that is (using rules #1 and #2 together) 0.1 × (1 − 0.1) × (1 − 0.1) ≈ 0.08; but you could just as easily have taken Moe and not the other two, or Curly and not the other two, and the chance of each of these is just the same as that of taking Larry and not the other two. For shorthand, write M for “Take Moe” and so on, and M_F for “Don’t take Moe” and so on. Thus you could “L M_F C_F or L_F M C_F or L_F M_F C.” Using probability rule #1, we break up this statement into three pieces (such as “L M_F C_F”), and then use probability rule #2 on each piece (“ands” turn to times), then add the whole thing up.

You could do all that, but there is an easier way. You could notice there are three different ways to take just one, which we remember from our choosing formula, eq. (10). This makes the probability (3 choose 1) × 0.08 = 3 × 0.08 = 0.24. Since we already know the probability of taking one of those combinations, we just multiply it by the number of times we see it. We could have also written the answer like this:

(3 choose 1) × 0.1^1 × (1 − 0.1)^2 ≈ 0.24.

And we could also have written the first situation (taking all of them) in the same way:

(3 choose 3) × 0.1^3 × (1 − 0.1)^0 = 0.001.

where you must remember that a^0 = 1 (for any a you will come across).

You see the pattern by now. This means we have another formula to add to our collection. This one is called the binomial and it looks like this:

(see book)

There is a subtle shift in notation with this formula, made to conform with tradition. “k” is shorthand for the statement, in this instance, K = “You take k people.” For general situations, k is the number of “successes”; or, K = “The number of successes is k”. Everything to the right of the “|” is still information that we know. So n is shorthand for N = “There are n possibilities for success”, or in your case, N = “There are three brothers who could be taken.” The p means, P = “The probability of success is p”. We already know E_B, written here with a subscript to remind us we are in a binomial situation. This new notation can be damn convenient because, naturally, most of the time statisticians are working with numbers, and the small letters mean “substitute a number here,” and if statisticians are infamous for their lack of personality, at least we have plenty of numbers. This notation can cause grief, too. Just how that is so must wait until later.

Don’t forget this: in order for us to be able to use a binomial distribution to describe our uncertainty, we need three things. (1) The definition of a success: in the Thanksgiving example, a success was a person getting a ride. (2) The probability of a success is always the same. (3) The number of chances for successes is fixed.
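To see all four Thanksgiving probabilities at once, here is a minimal sketch. The binomial formula is in the book and not reproduced in this post, so the function below uses the standard form, (n choose k) × p^k × (1 − p)^(n − k), which agrees with the two worked lines above; the name binomial_pmf is made up for this example.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n chances, each with probability p.

    Standard binomial form, assumed to match the book's formula.
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Moe's judgment: three friends, each taken (independently) with probability 0.1.
n, p = 3, 0.1
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, pr in enumerate(probs):
    print(f"take {k}: {pr:.3f}")

# Something has to happen, so the four probabilities add up to 1.
print(round(sum(probs), 10))
```

The two conditions in the list above show up directly in the code: p is the same for every friend, and n is fixed at 3 before anything is computed.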

1 comment May 16th, 2008

Stats 101: Chapter 2

Chapter 2 is now ready for downloading—it can be found at this link.

This chapter is all about basic probability, with an emphasis on understanding and not on mechanics. Because of this, many details are eliminated which are usually found in standard books. If you already know combinatorial probability (taught in every introductory class), you will probably worry your favorite distribution is missing (“What, no Poisson? No negative binomial? No This One or That One?”). I leave these out for good reason.

In the whole book, I only teach two distributions, the binomial and the normal. I hammer home how these are used to quantify uncertainty in observable statements. Once people firmly understand these principles, they will be able to understand other distributions when they meet them.

Besides, the biggest problem I have found is that people, while they may be able to memorize half a dozen distributions or formulas, do not understand the true purpose of probability distributions. There is also no good reason to do calculations by hand now that computers are ubiquitous.

Comments are welcome. The homework section (like in every other chapter) is unfinished. I will be adding more homework as time goes on, especially after I discover what areas are still confusing to people.

Once again, the book chapter can be downloaded here.

5 comments May 12th, 2008

The Sean Bell shooting and probability

Yesterday, there were several protests in New York City. The participants were “outraged” over the recent acquittal of two black cops and one Lebanese cop who shot and killed Sean Bell, who was black.

Much was made about the fact that the three cops shot at Bell’s car 50 times. This number was touted repeatedly by some as evidence that the cops had used excessive force.

Let’s look at this from the probabilistic viewpoint. It turns out that when a cop fires his weapon at a person, he only hits his target about 30% of the time. Anybody who has ever fired a weapon before, especially in an altercation, will know that this is a pretty good rate, but of course not good enough to guarantee that just one shot will be enough to stop a target.

So about how many times must a cop fire so that he is at least 99.9% sure of hitting his target?

Well, if he fired just once, he has a 30% chance of hitting, or a 70% chance of missing. If he fired twice, what is the chance of hitting at least once? Hitting at least once can happen in three ways: hitting with the first bullet and missing with the second; missing with the first and hitting with the second; or hitting with both. The only other possibility is missing on both. The probabilities of all these scenarios add up to 1 (something has to happen). So the chance of hitting at least once is 1 minus the chance of missing both. Or 1 - (0.7)(0.7) = 1 - 0.49 = 0.51.

This means that firing only two shots gives the officer roughly a 50/50 chance of hitting his target. Not very good odds. He must fire more times to increase them.

It turns out that the same formula can be used for any number of shots. The probability of hitting at least once in three shots is 1 - (0.7)^3 = 1 - 0.34 = 0.66. The probability of hitting at least once in n shots is then 1 - (0.7)^n.

We want 1 - (0.7)^n to be at least 0.999. Or, written mathematically, 1 - (0.7)^n ≥ 0.999. Now we have to recall high school algebra and solve for n. Subtract 1 from both sides and multiply through by -1, which flips the inequality and gives (0.7)^n ≤ 0.001.

Now the hard part. If you don’t remember, just take my word for it, but now we use logarithms. So that we get n log(0.7) ≤ log(0.001). Because log(0.7) is negative, dividing by it flips the inequality again: n ≥ log(0.001)/log(0.7) ≈ 19.4, or 20 (rounding up to the next whole shot).

That’s right. In order for the cop to be pretty sure of hitting his target (and therefore ensuring his target does not hit him), a cop has to shoot at least 20 times.
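The calculation can be checked in a few lines. This is only a sketch of the arithmetic above: it assumes, as the post does, a 30% hit rate and independent shots, and the function names are made up for this example.

```python
from math import ceil, log


def p_hit_at_least_once(n: int, p_hit: float = 0.3) -> float:
    """Chance of at least one hit in n independent shots."""
    return 1 - (1 - p_hit) ** n


def shots_needed(target: float, p_hit: float = 0.3) -> int:
    """Smallest whole number of shots with at least `target` chance of a hit."""
    # Solve (1 - p_hit)^n <= 1 - target for n, then round up.
    return ceil(log(1 - target) / log(1 - p_hit))


print(p_hit_at_least_once(2))
print(p_hit_at_least_once(3))
print(shots_needed(0.999))
print(p_hit_at_least_once(19), p_hit_at_least_once(20))
```

Comparing 19 and 20 shots shows why the answer has to be rounded up: 19 shots fall just short of 99.9%.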

Thus, given that three cops were firing, 50 total shots does not seem that unusual.

Note: one cop shot 31 times, one 11, and the other 8. Of course, the above analysis ignores all external evidence, such as how the probability of hitting decreases when aiming at a moving target, awareness by one cop of shots fired by another, whether the cops were well motivated, etc.

16 comments May 8th, 2008

Stats 101: Chapter 1

UPDATE: If you downloaded the chapter before 6 am on 4 May, please download another copy. An older version contained fonts that were not available on all computers, causing it to look like random gibberish when opened. It now just looks like gibberish.

I’ve been laying aside a lot of other work, and instead finishing some books I’ve started. The most important one is (working title only) Stats 601, a professional explanation of logical probability and statistics (I mean the modifier to apply to both fields). But nearly as useful will be Stats 101, the same sort of book, but designed for a (guided or self-taught) introductory course in modern probability and statistics.

I’m about 60% of the way through 101, but no chapter except the first is ready for public viewing. I’m not saying Chapter 1 is done, but it is mostly done.

I’d post the whole thing, but it’s not easy to do so because of the equations. Those of you who use Linux will know of latex2html, which is a fine enough utility, but since it turns all equations into images, documents don’t always end up looking especially beautiful or easy to work with.

So below is a tiny excerpt, with all of Chapter 1 available at this link. All questions, suggestions for clarifications, or queries about the homework questions are welcome.

Logic

1. Certainty & Uncertainty

There are some things we know with certainty. These things
are true or false given some evidence or just because they are
obviously true or false. There are many more things about which
we are uncertain. These things are more or less probable given
some evidence. And there are still more things of which nobody
can ever quantify the uncertainty. These things are nonsensical or
paradoxical.

First I want to prove to you there are things that are true,
but which cannot be proved to be true, and which are true based
on no evidence. Suppose some statement A is true (A might be
shorthand for “I am a citizen of Planet Earth”; writing just ‘A’ is
easier than writing the entire statement; the statement is everything
between the quotation marks). Also suppose some statement
B is true (B might be “Some people are frightfully boring”). Then
this statement: “A and B are true”, is true, right? But also true is
the statement “B and A are true”. We were allowed to reverse the
letters A and B and the joint statement stayed true. Why? Why
doesn’t switching make the new statement false? Nobody knows.
It is just assumed that switching the letters is valid and does not
change the truth of the statement. The operation of switching
does not change the truth of statements like this, but nobody will
ever be able to prove or explain why switching has this property.
If you like, you can say we take it on faith.

That there are certain statements which are assumed true
based on no evidence will not be surprising to you if you have
ever studied mathematics. The basis of all mathematics rests on
beliefs which are assumed to be true but cannot be proved to
be true. These beliefs are called axioms. Axioms are the base;
theorems, lemmas, and proofs are the bricks which build upon
the base using rules (like the switching statements rule) that are
also assumed true. The axioms and basic rules cannot, and can
never, be proved to be true. Another way to say this is, “We hold
these truths to be self-evident.”

Here is one of the axioms of arithmetic: For all natural
numbers x and y, if x = y, then y = x. Obviously true, right? It is just
like our switching statements rule above. There is no way to prove
this axiom is valid. From this axiom and a couple of others, plus
acceptance of some manipulation rules, all of mathematics arises.
There are other axioms—two, actually—that define probability.
Here, due to Cox (1961), is one of those axioms: The probability
of a statement on given evidence determines the probability of its
contradictory on the same evidence. I’ll explain these terms as we
go.

It is the job of logic, probability, and statistics to quantify
the amount of certainty any given statement has. An example
of a statement which might interest us: “This new drug improves
memory in Alzheimer patients by at least ten percent.” How
probable is it that that statement is true given some specific evidence,
perhaps in the form of a clinical trial? Another statement: “This
stock will increase in price by at least two dollars within the next
thirty days.” Another: “Marketing campaign B will result in more
sales than campaign A.” In order to specify how probable these
statements are, we need evidence, which usually comes in the form
of data. Manipulating data to provide coherent evidence is why
we need statistics.

Manipulating data, while extremely important, is in some
sense only mechanical. We must always keep in mind that our
goal is to make sense of the world and to quantify the uncertainty
we have in given problems. So we will hold off on playing with data
for several chapters until we understand exactly what probability
really means.

2. Logic

We start with simple logic. Here is a classical logical argument,
slightly reworked:

All statistics books are boring.

Stats 101 is a statistics book.

_______________________________________________
Therefore, Stats 101 is boring.

The structure of this argument can be broken down as follows.
The two statements above the horizontal line are called premises;
they are our evidence for the statement below the line, which is
the conclusion. We can use the words “premises” and “evidence”
interchangeably. We want to know the probability that the conclusion
is true given these two premises. Given the evidence listed,
it is 1 (probability is a number between, and including, 0 and 1).
The conclusion is true given these premises. Another way to say
this is the conclusion is entailed by the premises (or evidence).

You are no doubt tempted to say that the probability of the
conclusion is not 1, that is, that the conclusion is not certain,
because, you say to yourself, statistics is nothing if not fun. But
that would be missing the point. You are not free to add to the
evidence (premises) given. You must assess the probability of the
conclusion given only the evidence provided.

This argument is important because it shows you that there
are things we can know to be true given certain evidence. Another
way to say this, which is commonly used in statistics, is that the
conclusion is true conditional on certain evidence.
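
The idea of a conclusion being entailed by its premises can be sketched in a few lines of Python. This is only an illustration, not part of the book: the function name, the two true/false variables, and the equal weighting of every "world" consistent with the premises are all assumptions made for the example. The weighting does not matter when the answer is 0 or 1.

```python
from itertools import product

def probability_given_premises(premises, conclusion, n_vars):
    """Share of truth assignments consistent with the premises in which
    the conclusion also holds (equal weighting is an assumption)."""
    worlds = [w for w in product([False, True], repeat=n_vars)
              if all(p(*w) for p in premises)]
    if not worlds:
        raise ValueError("the premises contradict each other")
    return sum(1 for w in worlds if conclusion(*w)) / len(worlds)

# s: "Stats 101 is a statistics book"; b: "Stats 101 is boring"
all_stats_books_boring = lambda s, b: (not s) or b   # premise 1, applied to Stats 101
is_stats_book = lambda s, b: s                       # premise 2
is_boring = lambda s, b: b                           # conclusion

both = probability_given_premises(
    [all_stats_books_boring, is_stats_book], is_boring, 2)
only_first = probability_given_premises(
    [all_stats_books_boring], is_boring, 2)
print(both, only_first)
```

With both premises the conclusion holds in every remaining world, so the value is 1. Drop the second premise and the conclusion is no longer entailed, which is the sense in which the answer is always conditional on the evidence supplied.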

(To read the rest, Chapter 1 is available at this link.)

23 comments May 3rd, 2008

Hitting or Pitching. Which wins more games?

By Tim Murray and William Briggs

You obviously need to score runs to win baseball games, and buying better hitters does this for a team. But you also need to keep your opponent from scoring too many runs, and buying better pitchers does this. Good, error-free, fielding, all other things being equal, will also help a team keep the runs scored against it low. Most teams cannot afford to buy both the best batters and the best hurlers, so they have to make decisions.

You’re the newly appointed manager for your favorite team. The roster is nearly made out, and you find you have money for one more player. You can buy a hitter to improve your team’s overall batting average (BA) or you can acquire a pitcher to lower your team’s earned run average (ERA). What do you do?

We decided to try to answer this question by looking at the complete data from the 2001 to the 2007 seasons for all teams in Major League Baseball. For each team, we collected the number of regular season Wins, batting average, earned run average, number of errors, league (American or National), and total payroll. We also counted the total runs scored for and allowed for each team, but since these statistics were so closely connected with batting average and earned run average, we don’t consider them further.

Payroll is obviously used to buy what teams consider to be the best players, though, as fans know to their grief, those players do not always work out that way. If winning more games were simply a matter of increasing the payroll, the New York Yankees would win every World Series. Thankfully, then, money isn’t everything.

But it is something. This picture shows the payroll by the number of wins, with each team receiving its own color (since this is for seven years, each team appears seven times on this, and all other, plots). The team to the far right in blue are the Yankees, far exceeding any other team in money spent. The club next to them in red are the Boston Red Sox. There is a huge difference in the amount of money spent between teams. The 2006 Florida Marlins spent the least at about $15 million but won a respectable 78 games. They were followed closely by Tampa Bay, which in 2000 spent about $20 million, only rising to $24 million by 2007. Their wins were steady at around 66.

wins by payroll

A horizontal line has been drawn in at 90 games to show that there is still an enormous range of team payrolls for clubs winning at least this impressive number of games. For example, the 2001 Oakland A’s spent only about $34 million to capture 102 games. They increased the payroll a mere $6 million the next year and won 103 games. Oakland, as documented in the book Moneyball by Michael Lewis, didn’t really drop much below 90 games until last year, winning only 76 games while spending the most they ever had, nearly $80 million.

While spending a lot does not guarantee winning the most games in any year, it does help. The Yankees, for example, never dropped below 94 games (in 2007). Boston was the second biggest spender, and it has helped them win at least 82 games a year. However, most teams cannot spend nearly as much as these two. Other teams must be grateful that money isn’t everything.

This second picture explains why money can’t necessarily buy happiness. Each of the three predictive statistics, BA, ERA, and Errors, is plotted against Payroll. A statistical (“nonparametric”) regression line is drawn on each to give a rough, semi-qualitative idea of the relationship of the variables. The signals go in the expected direction: larger payrolls mean, on average, higher BAs, lower ERAs, and lower numbers of Errors. But none of the signals are very strong.

wins by BA, ERA, and Errors

To explain what we mean by that, pick any level of payroll, say $100 million. Then look at the scatter around that number (the points below and above the solid line). With BA, the scatter is just about as wide as the range of team batting averages in the data, which are .240 to .292. The same is true for both ERAs and Errors. Still, there is a general weak trend: spending more money does, very crudely, buy you a better team.

But not much better. For example, if you wanted to spend enough to be 90% sure of upping your team’s batting average 5 points (from the median of .268 to .273), you’d have to shell out an extra $50 million (this is after controlling for League, Errors, and team ERA). That’s a huge increase in team salaries. Even worse, the players you buy would have to have extraordinarily high batting averages to bring the entire team’s average 5 points higher. It’s the same story for ERA and Errors. The point is that predicting what players will do, paying more money for those you consider better, and then getting the performance you paid for is not just a tricky business, but an almost impossible one.

This still doesn’t answer what is better, in the sense of predicting more wins: hitting or pitching. Take a look at this picture:

BA, ERA, and Errors frequency by League

This shows fancy, souped-up, “histograms” (called density estimates) for the frequency of BA, ERA, and Errors by League. Higher areas on the graph, like a regular histogram, mean that number is more likely. For example, the most likely value of ERA for teams in the National League is just over 4.0.
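
For the curious, a density estimate of this kind can be sketched in a few lines of Python. This is a minimal Gaussian kernel version, not the code behind the figure: the ERA values are made up for illustration, and the bandwidth of 0.2 is an arbitrary assumption.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function giving the estimated density at any point x:
    an average of Gaussian bumps, one centered on each observation."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2)
                   for d in data) / norm

    return density

# Made-up team ERAs, for illustration only (not the 2001-2007 data)
eras = [3.8, 3.9, 4.0, 4.05, 4.1, 4.2, 4.5, 4.9]
density = gaussian_kde(eras, bandwidth=0.2)

# The curve is highest where the observations bunch together
grid = [2.0 + 0.01 * i for i in range(501)]
most_likely = max(grid, key=density)
print(round(most_likely, 2))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is the main knob that separates these from an ordinary histogram's choice of bin width.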

It’s clear from these pictures that the American League teams have on average higher ERAs and BAs than do clubs in the National League. Obviously, the designated hitter rule for the American League accounts for most, if not all, of this difference. There doesn’t seem to be any real differences in Errors between the two Leagues, which makes sense. The League differences between ERA and BA have to be accounted for when answering our main question.

This next series of pictures shows there is even more complexity. The first is a plot, separated by League, of each team’s BA by ERA. There is some weak evidence that as ERA increases, BA drops, especially in the American (A) League, perhaps another remnant of the designated hitter effect. But this isn’t a very strong indicator.

BA by ERA by League

This next picture shows some stronger relationships. The top two panels, again separated by League, are plots of ERA (on the vertical axis) by Errors (on the horizontal axis): as ERA increases, so do numbers of Errors. Similarly for BA, as the number of Errors increases, the batting averages of teams tend to decrease. All this evidence means that when a team is bad, it tends to be bad in all three dimensions, and when it is good, it tends to be good in all three dimensions. This is no surprise, of course, but we do have to control for these factors when answering our question.

BA, ERA, by Errors by League

We finally come to our main question, which we answer with a complicated statistical model, one which accounts for all the evidence we have so far demonstrated. The type of model we use accounts for the fact that the number of Wins is a discrete number, by which we mean the total Wins can be 97 or 98, say, but they cannot be 97.4. In technical terms, it is called a quasi-Poisson generalized linear model, a fancy phrase that means that the model is very like a linear regression model, about which you may have heard, but with some twists and extra knobs that allow us to control for our interacting factors and discrete response.
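
To give a flavor of what such a model does, here is a heavily simplified sketch in Python. It is not the model used for this article: it has a single predictor (team ERA, centered) rather than BA, ERA, Errors, League and their interactions, the numbers are invented, and the function names are made up. It fits a Poisson log-linear model by Newton's method and then estimates the extra "dispersion" knob that turns Poisson into quasi-Poisson.

```python
import math

def fit_poisson_loglinear(x, y, iterations=25):
    """Fit log(mean wins) = b0 + b1 * x by Newton's method."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix: X^T diag(mu) X (a 2x2 matrix here)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return pearson / (len(y) - 2)

# Invented team ERAs and win totals, for illustration only
era = [3.6, 3.9, 4.1, 4.3, 4.5, 4.7, 5.0, 5.3]
wins = [98, 93, 88, 84, 80, 77, 70, 64]
mean_era = sum(era) / len(era)
x = [e - mean_era for e in era]          # centering keeps the fit stable

b0, b1 = fit_poisson_loglinear(x, wins)
phi = dispersion(x, wins, b0, b1)
print(b1 < 0, phi)
```

The coefficients are the same as in an ordinary Poisson fit; the quasi-Poisson part only rescales the uncertainty around them by the estimated dispersion, so the model does not insist that the variance of Wins equal its mean.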

The answer lies in these complicated-looking pictures. Let’s work through them slowly. First, only look at the top picture, which is the modeled, or predicted number of wins by various batting averages.

Predicted wins

There are two sets of three curves. The brownish is for the National League, and the blueish for the American. Now, in order to predict how many wins a team will have, we have to supply four things: their expected BA, ERA, number of errors, and League. That’s a lot of different numbers, so to simplify somewhat, we will fix the number of Errors at the median observed figure, which is 104. (Changing Errors barely changes the results.)

We still have to plug in a BA, ERA, and League in order to predict the number of wins. We first start by plugging in the BA over the range of observed values, but we still have to supply an ERA. In fact, we supply three different ERAs: the observed first quartile, median, and third quartile, which are: 4.04, 4.37, and 4.74. For the American League, these are the three blue curves: the top one corresponds to the lowest ERA of 4.04, the middle for the value of 4.37, and the bottom for the highest value of 4.74. To be clear: each point on these curves is the result of four variables: a BA, an ERA, a number of Errors, and a League. From these four variables, we predict the number of wins, which varies as the four variables do.

All of these curves sweep upwards, implying the obvious: higher BAs lead to more predicted Wins, regardless of ERA or League. At the lowest BAs, differences in ERA are the largest in the American League. Meaning that, if your team is hitting very poorly, small variations in pitching account for large changes in the number of games won. To make sure you see this, focus on the very left-most points of the graph, where the BAs are the smallest. Then look at the three blue curves (American League): the three left-most points on the blue curve are widely separated. Moving from a team ERA of 4.74 to 4.04 increases the number of games won from 61 to 78, or 17 more a season, which is of course a lot. But when a team is batting well, while differences in ERA are still important, they are not as influential. These are the right-most blue points on the figure: notice how at the largest BAs, the three curves (again representing different ERAs) are very close together. If a team in the American League is batting very well, improvements in pitching do not account for very many more games won.

That is so for the American League, but perhaps surprisingly not for the National, where the opposite occurs. Differences in ERA are more important for high batting averages, but not as important for low ones: better pitching becomes more crucial as the team bats better. The brown curves spread out more for high BAs, and are tighter at low BAs.

Now let’s look at the bottom picture. This is the same sort of thing, but over the range of ERAs at three fixed levels of BA: .259, .266, and .272. The top curves are the highest BA, and the bottom curves the lowest. Looking first at the American League, we can see that when the team ERA is low, differences in BA do not account for much. In fact, when the team ERAs are the lowest, improvements in batting in the American League make almost no difference at all! When team ERAs are high, changes in BA mean larger differences in numbers of games won: the spread between the blue lines increases as ERA increases.

Again, the situation is opposite for the National League: when the team ERA is low, changes in BA are more important than when team ERAs are high. In this league, when team ERAs are low, good batting can make a big difference in numbers of games won. But when ERAs are high, improvements in batting do not change the number of games won very much.

Once more, we point out that we can draw each of these three curves again for different numbers of Errors. We did so, and found that the differences between those curves and the ones we displayed were small, but not negligible: for example, adding a whopping 40 errors onto a team that ordinarily only commits 80 costs them, on average, only 2 games a season. Higher BAs or ERAs can mitigate this somewhat, from losing 2 games to only losing about 1 extra game a season. So while Errors are important, they are far from decisive factors in an overall season.

So what should you do?

Look again at the two plots. In the BA plot, the highest number of predicted wins, for a BA of .292 for the ERA of 4.04 (the lowest pictured) is about 104 games for National League teams, and about 100 for American League clubs. But the highest number of predicted wins, looking at the ERA plot, for teams with the lowest ERA of 3.13 with the BAs of .272 (the highest pictured) is about 111 games for the National League and 107 games for the American. Conversely, back in the BA plot, those teams with the lowest BAs of .240 and high ERAs of 4.74 won only about 61 games in the American League and 67 in the National. Meanwhile, in the ERA plot, teams with the worst ERAs of 5.71 and lowest BAs of .259 won only about 56 games in the American and 62 in the National.

Clearly, then, pitching is more important than batting overall: more games on average will be won by those clubs who have the lower ERAs than those teams with the higher BAs.

But that isn’t necessarily the answer to our question. Remember that you only have money for one more player. Should you recruit or trade for a better pitcher or batter? It depends on what kind of team you have now. Our team right now has a certain ERA, BA, and expected number of Errors, so what do we do? The final answer is in this last picture.

Effects of ERA and BA

This shows improvement, in either ERA (decreasing) or BA (increasing) on the bottom axis. The other axis shows for each “unit” of improvement (0.05 for ERA, 0.001 for BA), the additional games won. These are the same, in essence, as the plots above, but they show the data in a different fashion (the same colors still represent the two leagues). The way this figure works is that you pick a certain point, say a BA of .266 or an ERA of 4.34 (which is the same point on the graph), and then move upwards (to the right on the horizontal axis) by one “unit” (0.05 for ERA, 0.001 for BA) and then pick off the number of additional games won.

No matter where we are on the graph, ERA easily wins this race, in the sense that buying a better pitcher to improve the ERA wins more games than buying a better batter to improve the BA. This is true for either league. (These pictures are also concocted using the median values of ERA, BA, and Error, as mentioned above: do not worry if you don’t understand this; the results do not change for the other values.)

So spend your money on the pitcher.

Tim Murray is a student at Central Michigan University and can be reached at murra1td@cmich.edu. William Briggs is a statistician in New York City and can be reached at matt@wmbriggs.com.

20 comments April 28th, 2008

CONTEST: Preliminary Discussion of the “Best Internet Conspiracy Theory”

Best Internet Conspiracy Theory
This is the first posting preliminary to the announcement of an Official Contest to find the Best Internet Conspiracy Theory.

The Contest will be officially announced in about one week.

This contest is primarily a public service for those who contribute regularly to sites like Digg.com, Reddit.com, Wikipedia.org, etc. Many of those people are forced to spend an inordinate amount of time concocting theories that neatly explain messy world events. This has led to an enormous increase in carpal tunnel and internet addiction syndrome cases worldwide. Thus, we want to provide these overworked souls a handful of ready-made theories to which they can refer. The theories we have in mind are described in the contest rules below.

I will need help in publicizing this Contest, and may need help in judging entries, depending on how many I receive. Volunteers should email me: put “CONTEST” in the subject line.

A sketch of the rules is as follows:

(1) All entries must be shorter than 150 words. Shorter entries will receive more weight than longer ones.

(2) Entries—one per person—must be placed into the Comments Section of the Official Contest Post. No discussion will be allowed on that post; only Contest entries are allowed.

(3) All entries will be judged by the intrinsic awfulness, brevity, completeness of derangement, plausibility, specificity (names named), and potential appeal to the everyday, e.g., Digg reader.

(4) The Contest will last approximately two to three weeks.

(5) A prize, or prizes, to be decided later, will be announced.

(6) An example of an Internet Conspiracy Theory:

Certain scientists discovered a formula, derived from an alien artifact dug up in Area 51, for turning ordinary sea water into limitless, cheap fuel. Green Energies, a subsidiary of MoveOn.org, based in the World Trade Center was about to sell this discovery and eliminate Global Warming, when the Oil Companies learned of it. Big Oil contacted George Bush, who ordered the Twin Towers destroyed before the secret could get out. Ron Paul found out about this and was going to expose the entire matter had he won the Republican Nomination, which he would have done except the Mainstream Media ignored him.

Please do NOT post any conspiracy theories now! Save them for the Contest.

11 comments April 24th, 2008

CO2 and Temperature: which predicts which?

Parts of this analysis were suggested by Allan MacRae, who kindly offered comments on the exposition of this article which greatly improved its readability. The article is incomplete, but I wanted to present the style of analysis, which I feel is important, as the method I use eliminates many common errors found in CO2/Temperature studies. Any errors are, of course, entirely my own.

It is an understatement to say that there has been a lot of attention to the relationship of temperature and CO2. Two broad hypotheses are advanced: (Hypothesis 1) As more CO2 is added to the air, through radiative effects, the temperature later rises; and (Hypothesis 2) As temperature increases, through ocean-chemical and biological effects, CO2 is later added to the atmosphere. The two hypotheses have, of course, different consequences which are so well known that I do not repeat them here. Before we begin, however, it is important to emphasize that both or even neither of these hypotheses might be true. More on this below.

The monthly temperature data are from The University of Alabama in Huntsville, and start in January 1980. Temperature is available for different regions: global, Northern Hemisphere, etc. The monthly global CO2 is from NOAA ESRL.

We want to examine the CO2/temperature processes at the finest level allowed by the data, which here is monthly at the time scale, and Northern and Southern Hemisphere and the tropics at the spatial scale. The reason for doing this, and not looking at just yearly global average temperature and CO2, is that any processes that occur at time scales less than a year, or occur only or differently in specific geographic regions, would be lost to us. In particular, it is true that the CO2/temperature process within a year is different in the Northern and Southern hemispheres, because, of course, of the difference in timing of the seasons and changes in land mass. It is also not a priori clear that the CO2/temperature process is the same, even at the yearly scale, across all regions. It will turn out, however, that the differences between the regional and global processes are minimal.

The question we hope to answer is, given the limitations of these data sets, with this small number of years, and ignoring the measurement error of all involved (which might be substantial), does (Hypothesis 1) increasing CO2 now predict positive temperature change later, or does (Hypothesis 2) increasing temperatures now predict positive CO2 change later? Again, this ignores the very real possibility that both of these hypotheses are true (e.g., there is a positive feedback).

During the course of an ordinary year, both Hypotheses 1 and 2 are true at different times, and sometimes neither is true: in the Northern Hemisphere, the temperature and CO2 both increase until about May, after which CO2 falls, though temperature continues to rise. In the Southern Hemisphere, temperature falls in the early months, while CO2 rises, and so on. These well known differences are due to combinations of respiration and changes in orbital forcing.

There are, then, obvious correlations of CO2 and temperature at different monthly lags and in different geographic regions (I use the word “correlation” in its plain English meaning and not in any statistical sense). We are not specifically interested in these correlations, which are well known and expected, and whose role in long-term climate change is minimal. The existence of these correlations presents us with a dilemma, however. It might be that, for either Hypothesis 1 or 2, the time at which either CO2 or temperature changes in response to changes in forcing is less than one year, but disentangling this climate forcing from the expected changes due to seasonality is, while possible, difficult and would require dynamical modeling of some sort (in the language of time series, the seasonal and long-term signals are possibly confounded at time scales less than 1 year).

Therefore, instead of looking at intra-year correlations, we will look at inter-year correlations. This introduces a significant limitation: any real, non-seasonal, correlations less than 1 year (or at other non-integer yearly time points) will be lost and it will be possible that we are misled in our conclusions (in the language of time series, the “power” on these non-integer-year lags will be aliased onto the 1 year lag). What is gained by this approach, however, is that there is no chance of misinterpreting lags less than one year as being due to a process other than seasonality. However, the main purpose of this article is not to identify the exact dynamical and physical CO2/temperature relationship, nor to identify the lag that best describes it; we just want to know whether Hypothesis 1 or Hypothesis 2 is more likely on time scales greater than 1 year.

Most of us have seen pictures like this one, which shows the monthly CO2 for 1980-1984; also shown is the Northern Hemisphere (NH) temperature anomaly (suitably normalized to fit on the same picture).
Co2 through time
You can immediately see the intra-year CO2 “sawtooth”. This sawtooth makes it difficult to find a functional relationship of CO2 and temperature. I do not want to model this sawtooth, because I worry that whatever model I pick will be inadequate, and I do not immediately know how to carry the uncertainty I have in the model through to the final conclusion about our Hypotheses. I also do not want to smooth the sawtooth, or perform any other mathematical operation on the observed CO2 values within a year, because that tends to inflate measures of association.

Instead, let’s look at CO2 in a different way:
Co2 through time by month
This is yearly CO2 measured within each month: each of the 12 months has its own curve through time. It doesn’t really matter which is which, though the two lowest curves are from the winter months (for those in the NH). What’s going on is still obvious: CO2 is increasing year by year and the rate at which it is doing so is roughly constant regardless of which month we examine.

Looking at the data this way shows that the sawtooth has effectively been eliminated, as long as we examine year-to-year changes within each month through time.
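
A small sketch in Python shows why this works. The series here is synthetic, not the NOAA data: a steady rise of 1.5 ppm per year plus a fixed seasonal wiggle, both of which are assumptions chosen only to make the point.

```python
import math

def synthetic_co2(n_years, start=340.0, trend_per_year=1.5, amplitude=3.0):
    """Made-up monthly CO2: linear trend plus an identical seasonal cycle."""
    series = []
    for t in range(12 * n_years):
        month = t % 12
        seasonal = amplitude * math.sin(2 * math.pi * month / 12)
        series.append(start + trend_per_year * (t / 12) + seasonal)
    return series

def month_to_month_changes(series):
    """Change from each month to the next: this is where the sawtooth lives."""
    return [b - a for a, b in zip(series, series[1:])]

def same_month_yearly_changes(series):
    """Change from each month to the same month one year later."""
    return [series[t + 12] - series[t] for t in range(len(series) - 12)]

co2 = synthetic_co2(5)
monthly = month_to_month_changes(co2)
yearly = same_month_yearly_changes(co2)
print(min(monthly) < 0 < max(monthly))   # the sawtooth goes up and down
print(min(yearly), max(yearly))          # the within-month changes do not
```

In this idealized series the seasonal cycle cancels exactly when the same months are compared. With real data the cycle is not identical every year, so the cancellation is only approximate.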

Suppose we were only interested in Decembers and in no other months. Let us plot the actual December temperature from 1980 to 2006 on the x-axis and on the y-axis plot the increase in CO2 for the years 1981 to 2007. Shown in the thumbnail below is this plot: with black dots for the Southern Hemisphere (SH), red dots for the NH, and green dots for the tropics (redoing the analyses with global or sea surface temperatures instead of separating hemispheres produces nearly indistinguishable results). For example, in one year, the NH temperature anomaly was -0.6: this was followed in the next year by an increase of about 1.5 ppm of CO2 (this is the left-most point on the figure).
Co2 through time by month

The solid lines are loess lines, which estimate the relationship between temperature and the change in CO2 (the dCO2/dt on the graph). If the loess lines were perfectly straight (and pointed in any direction), we would say the two measures are linearly correlated. The lines aren’t that straight, so the data do not appear to be that well correlated, linearly or otherwise.
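
For readers who have not met loess before, here is a bare-bones sketch of the idea in Python: at each point, fit a straight line to the nearby data only, giving closer points more weight. This is a simplified version written for illustration (no robustness iterations, and the function name and the 0.75 span are assumptions), not the routine that drew the lines in the figure.

```python
import math

def loess_point(x0, xs, ys, frac=0.75):
    """Locally weighted straight-line fit, evaluated at x0.
    Assumes the xs are not all (nearly) identical."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))      # neighbors used in the fit
    h = sorted(abs(x - x0) for x in xs)[k - 1]   # local bandwidth
    # Tricube weights: near points count most, points at distance >= h not at all
    w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
         for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my + slope * (x0 - mx)

def loess_curve(xs, ys, frac=0.75):
    """Smoothed value at every observed x."""
    return [loess_point(x, xs, ys, frac) for x in xs]
```

If the data fall exactly on a straight line, every local fit recovers that line, so the loess curve is straight too. Bends in the curve are what signal a relationship that a single straight line would miss.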

Click on the figure (do this!) to see the same plot for each of the 12 months (right click on it and open it in a new window so you can follow the discussion). Notice anything? Generally, when temperature increases this year CO2 tends to increase in the following year. Hypothesis 2 is more likely to be true given this picture.

The loess lines are not always straight, which means that a straight-line model, i.e. ordinary correlation, is not always the best model. For example, in Januaries, until the temperature anomalies get to 0 or above, temperature and change in CO2 have almost no relationship; after this point, the relationship becomes positive, i.e., increasing temperatures lead to increases in the change of CO2. The strength of the relationship also depends on the month: the first six months of the year show a strong signal, but the later six show a weakening in the relationship, regardless of where in the world we are.

Coincidence? Now plot the actual December CO2 from 1980 to 2006 on the x-axis and on the y-axis plot the change (increase or decrease) in temperature for the years 1981 to 2007. For example, in one year, the NH CO2 was 340 ppm: this was followed in the next year by a temperature change of about -0.5 degrees (this is the bottom left-most point on the figure). No real signal here:
Co2 through time by month

Again, click on the figure (do this!) to see all twelve months. There does not appear to be any relationship in any month between CO2 and change in temperature, which weakens our belief in Hypothesis 1.

It may be that it takes two years for a change in CO2 or temperature to force a change in the other. Click here for the two-year lag between temperature and change in CO2; and here for the two-year lag between CO2 and change in temperature. No signals are apparent in either scenario.
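
The pairing behind all of these plots can be sketched as follows. The function name and the tiny data set are made up for illustration. One assumption deserves flagging: for a lag of two years, the "change" is taken here to be the one-year change ending two years after the predictor is observed, which is one reasonable reading of the description above but not the only one.

```python
def lagged_pairs(predictor, response, lag=1):
    """Pair the predictor in year t with the one-year change in the
    response ending in year t + lag. Inputs are {year: value} dicts
    for a single calendar month (say, Decembers only)."""
    pairs = []
    for year in sorted(predictor):
        end, begin = year + lag, year + lag - 1
        if end in response and begin in response:
            pairs.append((predictor[year], response[end] - response[begin]))
    return pairs

# Made-up December values, for illustration only
temp = {1980: -0.1, 1981: 0.2, 1982: 0.0, 1983: 0.3}
co2 = {1980: 338.0, 1981: 339.5, 1982: 340.5, 1983: 342.5}

# Hypothesis 2: temperature now against the change in CO2 next year
print(lagged_pairs(temp, co2, lag=1))
# Hypothesis 1: CO2 now against the change in temperature next year
print(lagged_pairs(co2, temp, lag=1))
```

Each list of pairs is what gets scattered on one panel, with the predictor on the x-axis and the later change on the y-axis; repeating this for each of the 12 months gives the full set of plots.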

As mentioned above, what we did not check are all the other possibilities: CO2 might lead or lag temperature by 9.27, or 18.4 months, for example; or, what is more likely, the two variables might describe a non-linear dynamic relationship with each other. All I am confident of saying is, conditional on this data and its limitations etc., that Hypothesis 2 is more probable than Hypothesis 1, but I won’t say how much more probable.

It is also true that, over this period of time and using this data, CO2 always increased. The cause of this increase sometimes was related to temperature increases (rising temperatures led to more CO2 being released) and sometimes not. We cannot say, using only this data, why else CO2 increased, although we know from other sources that CO2 obviously increased because of human-caused activities.

71 comments April 21st, 2008

It was bound to happen

Remember how you used to cavalierly ignore those “Keep off the Grass” signs in your un-enlightened youth?

Well, you brutal, uncaring, beast.

For it has finally been announced—from Europe, naturally, from the Swiss government-appointed Federal Ethics Committee on Non-Human Biotechnology—that plants have feelings too.

They have authoritatively described “interfering with plants without a valid reason” as “morally inadmissible.” This means that the next time you carve your and your sweetheart’s names into a tree, it could lead to a nice, long jail sentence. (If the famed Swiss police ever catch you, that is.)

The ethics committee did grudgingly admit (for now) that “all action involving plants for the preservation of the human race was morally justified.” Meaning, I suppose, that it’s still OK to eat them. I probably don’t need to explain to you the fix we’d be in if we could not. But there is only one direction for the Enlightened to go, so stay tuned for an announcement banning the use of “higher” plants, such as maybe corn and tomatoes, for use in the “preservation of the human race.”

The august Swiss body has also found that “genetic modification of a plant did not contradict the idea of its ‘dignity’.” Yes, I can see how a kumquat would not find it an affront to be genetically probed. Until, that is, the kumquat learns how easily this sort of thing can sully one’s reputation. It’s only a matter of time before a lawyer figures this out and brings a case to Brussels.

Just keep all this in mind, think about what you are doing—raise your awareness!—next time you are at the salad bar.

7 comments April 20th, 2008

The Devil’s Delusion: Atheism and its Scientific Pretensions by David Berlinski

There are, as everybody knows, a number of recent books seeking either to demonstrate, scientifically, that God does not exist, or to show that the love of religion is the root of all evil. Some familiar names: Daniel Dennett, Richard Dawkins, Steven Weinberg, Victor Stenger, Christopher Hitchens, and even John Allen Paulos. All proclaim that the weight of scientific evidence is either completely or heavily on the side of the nonexistence of God.

The question is, of course: Has the authority of eminent scientists enabled them to prove their case? Berlinski says, “Not even close.” Not only have they not come close, Berlinski goes further and shows how easily they are persuaded by weak or demonstrably false arguments, and the extraordinary lengths to which some scientists will go, in the sense of believing bizarre theories, to avoid ceding any ground to the “religionists.” Their distaste for religion has also led them to say some rather stupid things. For example, Berlinski quotes the eminent biologist Emile Zuckerkandl as saying that if God exists, He would represent “something like a pathology of the state of being.” An enjoyable, sputtering rant by that author published in the peer-reviewed journal Gene is summarized later in the book.

Incidentally, before we get too far, it is worth mentioning that like most (all?) books in this genre, Berlinski does not attempt a definition of who or what God is—and neither do those on the other side. I haven’t one to offer, either. This curiosity can very well mean that everybody is talking at cross purposes. But since nobody delineates or bounds God, I can’t say much more than this, except that it should be borne in mind when reading any of these books.

A non-Enlightened disease

Berlinski puts the claim that religion is bad for you in perspective. Some anti-religion authors won’t settle for anything less than damning religion in all its stripes, disallowing, even, the crumb of comfort given to people when their loved ones die. Even Carl Sagan, in his Demon-Haunted World allowed this kind of solace, without recognizing that since, I must point out, everybody dies, this is an enormous amount of comfort to go around that would be denied mankind if religion were absent. But you never hear of our authors breaking open Mill to assist in calculating the utility of comforts versus torments of religion.

Many scientists feel that religion, while still a cancerous growth, is benign and only mostly harmful, and not immediately deadly. Sort of like smoking, which the more Enlightened among us would like to ban. Presumably, those who would prohibit smoking are the same people who would support legalizing assisted suicide. Which happened in Holland in 1984 (and where a partial smoking ban does exist). Since then, about three percent of all deaths in that country are assisted, of which the government admits that about one-fourth are “involuntary.” We call that involuntary method of exiting “murder” here in the States, but Europeans are often considered more Enlightened, so they might be one step ahead of us in legal definitions.

Arguments for assisted suicide are usually intentionally religion-free. Thus, the point of the Holland example, of course, is that the world would not necessarily become a more moral, or safer place, if religion were to disappear. More proof is given by Berlinski in the form of a table, ordered by number of “excess”, or untimely, twentieth-century deaths due to non- or even anti-religious behavior. Leading the pack are of course the two World Wars, but not far behind in the body count are mankind’s experiments with various communist utopias. Since one of the top arguments used by those who would wish to bar religion is that the religious can be cruel and have killed, the evidence that the non-religious can be cruel and have killed in equal or larger number only proves that there will always be a class of people who adore pain, misery, and bloodshed, irrespective of creed.

The disease religion is also seen as congenital, in the sense that people have religion on the brain, literally. Somehow, we are assured, the brain has genetically encoded religion into itself, and that if we’d just grow up and recognize this, we would become Enlightened (or brightened, these days). This is one of the sillier arguments put forth by scientists. If religion is genetically encoded, then it cannot be overcome, unless some of us, the superior ones naturally, have somehow managed to escape expressing those particular genes that activate, say, the praying response. Look for one of those fMRI studies that “proves” this, soon.

Berlinski shows that because some scientists cannot countenance religious arguments of any kind, they refuse to accept any evidence that is in any way tainted by religion. This leads to the fallacy that one should not listen to arguments against, say, stem cell research or abortion because they are religious. You will surely recognize this ploy when you meet it.

Scientific ontology

Everybody already knows that physics, and its offshoots, has done brilliantly at explaining more and more of the universe. But it cannot keep doing so forever. At some point, meta-physics must enter into the discussion. This is because, no matter what physical laws we have identified, we will never have explained through observation why these particular laws and not some others are in force, nor can we answer what the laws mean. It is obvious that it is here that God can slip in and offer the needed explanations. Some scientists are therefore anxious to fill in these gaps with…something, anything but God. Or, if that cannot be accomplished, then to prove that God does not exist.

Dawkins, in his The God Delusion, offers a particularly weak argument. His first premise is that the universe is improbable. And we can stop right there, because that is a nonsensical statement, so his argument fails. No thing or statement can be improbable in itself. A thing can only be improbable with respect to something else. Further, a thing can be improbable with respect to one set of evidence and entirely probable with respect to other evidence. So, in Dawkins’s case, the universe is improbable with respect to what?
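To make the "with respect to what?" point concrete, here is a minimal sketch using a die rather than a universe. The function name `probability` and the die itself are my own illustration, not anything from Dawkins or Berlinski. The same statement ("the die shows a six") gets a different probability under each set of evidence, and has none at all until some evidence is named.

```python
from fractions import Fraction


def probability(statement, evidence, outcomes=range(1, 7)):
    """Pr(statement | evidence) for one roll of a six-sided die.

    Assumption (illustrative only): every face allowed by the
    evidence is weighted equally.
    """
    allowed = [face for face in outcomes if evidence(face)]
    if not allowed:
        raise ValueError("the evidence rules out every outcome")
    favourable = [face for face in allowed if statement(face)]
    return Fraction(len(favourable), len(allowed))


def shows_six(face):
    return face == 6


# Same statement, three different bodies of evidence.
p_no_extra_evidence = probability(shows_six, lambda face: True)
p_given_even = probability(shows_six, lambda face: face % 2 == 0)
p_given_odd = probability(shows_six, lambda face: face % 2 == 1)

print(p_no_extra_evidence, p_given_even, p_given_odd)
```

Note that the function cannot even be called without an `evidence` argument, which is the whole point: ask for the probability of a statement without saying what it is conditional on and there is nothing to compute.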

Weak Anthropic evidence is sometimes offered, in the guise of certain physical constants having particular values, in the sense that if these constants did not have these values, then human life would be impossible (which is not the same as saying the universe is impossible, but let that pass). Now the burden is on those who tout this evidence to show that this is the best evidence with which to measure the improbability of the universe. And there are many hints that it is not the best evidence. It is, after all, by its very name, suspiciously self-indulgent and human-centered evidence. Why would the universe care if humans, or other sentient beings, evolved enough to notice that they might not have evolved had the universe been arranged differently anyway? Besides, to say that things might have been different and humans might not have evolved is just a tautology, and therefore of no interest.

Still, accept it if you like, so that we can move to Dawkins’s second premise, which is that God Himself is improbable. Again, the statement is nonsensical: improbable with respect to what? Dawkins suggests that God must be more improbable than the universe, which again makes no sense. Anyway, improbable is not impossible, as Dawkins often argues with respect to evolution by natural selection, arguments he has apparently forgotten. Still, Dawkins moves to his conclusion that God is so improbable that He doesn’t exist, and advises people to accept some recent conjectures in cosmology that seem to do away with the need to explain why the universe, or universes, are the way they are.

These are the Landscape and multiverse hypotheses, put forward by various authors to help them cope with the insolubilities of quantum mechanics and cosmology. These are attempts to shift the questions of “Why?” one step back. That they do not answer them, I would have thought obvious. Even pushing the grand questions a little deeper down is enough to please some people. Berlinski, a mathematical physicist, covers these speculations well, without any math, and gives pointers to books where we might learn more. See especially his very clever “Catechism of Quantum Cosmology.” Briefly, however, the solutions offered posit an uncountable number of alternate universes that are coming into and out of creation always. There are no mechanisms to observe these other universes directly or indirectly. Even if we could, these theories might answer some questions of quantum mechanics and gravity, but they never answer why it is infinities of universes instead of just one. The theories are also mind-bogglingly complex, and by no means are they consistent with one another. Nobody even knows what the full scope of these ideas is.

Berlinski quotes Dawkins, who is nevertheless satisfied, as saying, “The key difference between the radically extravagant God hypothesis and the apparently extravagant multiverse hypothesis, is one of statistical improbability.” Presumably, he means that God is more improbable. He never says how much more. Infinities, of universes or anything else, are a dangerous thing. More foolishness has been generated by jumping to infinity than by any other reason (see chapter 15 of Jaynes’s remarkable Probability Theory for appropriate words of admonition).

Argument from design

It has long been convincing to many that the wonderful biological complexity that is everywhere in evidence must have had a designer. How else, Darwin himself wondered, can one explain the human eye? This argument is less convincing than it once was, because of the success of modern biology and genetics, and the seeming success of evolution by natural selection.

(It is just as well to point out here that I accept that evolution accounts for some or most of the observed biological variation on Earth, and that the mechanism driving it is natural selection, or something like it.)

Wait a minute. Did he just say seeming success? He did. Which brings us back to Dawkins, the best-known anti-religion author. Was there ever a man who published so much nonsense that was taken so seriously by the scientific community? Nobody else even comes close. Just mentioning the word memes proves my point. Is not believing in God a meme? Berlinski doesn’t discuss memes, but does offer some well-known criticisms of “selfish” genes—incidentally, the best are due to the philosophers Mary Midgley (Evolution as a Religion) and David Stove (Darwinian Fairytales; if you haven’t read either of these books, please do so, especially Stove’s, before you comment).

Not all biologists are satisfied with present-day theory. Berlinski writes

[Darwinian] theory is what it always was: It is unpersuasive. Among evolutionary biologists, these matters are well known. In the privacy of the Susan B. Anthony faculty lounge, they often tell one another with relief that it is a very good thing the public has no idea what the research literature really suggests.

“Darwin?” a Nobel laureate in biology once remarked to me over his bifocals. “That’s just the party line.”

There are still gaps in the evolutionary record. Nobody knows how life originally arose, and nobody knows how species originate. Some fill these gaps with God. Scientists argue that the gaps will be filled in eventually. Berlinski says that this assumption is “both intellectually primitive and morally abhorrent—primitive because it reflects a phlegmatic absence of curiosity, and abhorrent because it assigns to intellectual future a degree of authority alien to human experience” because filling gaps “has created [new] gaps all over again.”

The answer

The best summation on the side of (non-apoplectic) scientists is probably from Richard Feynman, who said, “Today we cannot see whether Schrödinger’s equation [which describes the time evolution of physical systems] contains frogs, musical composers, or morality. We cannot say whether something beyond it like God is needed, or not. And so we can all hold strong opinions either way.”

To say whether or not God exists is the hardest question in the world; yet it is the one people find easiest to answer, and everybody seems delighted to meet an argument, however weak, that agrees with their desires. This leads very smart people to say exceptionally stupid things.

My own surmise is that any proof—for or against—is impossible. And so any belief you have is based entirely on faith.

85 comments April 14th, 2008

Why multiple climate model agreement is not that exciting

There are several global climate models (GCMs) produced by many different groups. There are a half dozen from the USA, some from the UK Met Office, a well known one from Australia, and so on. GCMs are a truly global effort. These GCMs are of course referenced by the IPCC, and each version is known to the creators of the other versions.

Much is made of the fact that these various GCMs show rough agreement with each other. People have the sense that, since so many “different” GCMs agree, we should have more confidence that what they say is true. Today I will discuss why this view is false. This is not an easy subject, so we will take it slowly.

Suppose first that you and I want to predict tomorrow’s high temperature in Central Park in New York City (this example naturally works for any thing we want to predict, from stock prices to number of people who will vote for a certain USA presidential candidate). I have a weather model called MMatt. I run this model on my computer and it predicts 66 degrees F. I then give you this model so that you can run it on your computer, but you are vain and rename the model to MMe. You make the change, run the model, and announce that MMe predicts 66 degrees F.

Are we now more confident that tomorrow’s high temperature will be 66 because two different models predicted that number?

Obviously not.

The reason is that changing the name does not change the model. Simply running the model twice, or a dozen, or a hundred times, does not give us any more evidence than running it just once. We reach the same conclusion if instead of predicting tomorrow’s high temperature, we use GCMs to predict next year’s global mean temperature: no matter how many times we run the model, or how many different places in the world we run it, we are no more confident of the final prediction than if we had run the model only once.

So Point One of why multiple GCMs agreeing is not that exciting is that if all the different GCMs are really the same model but each just has a different name, then we have not gained new information by running the models many times. And we might suspect that if somebody keeps telling us that “all the models agree” to imply there is greater certainty, he either might not understand this simple point or he has ulterior motives.
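Point One is easy to see in a toy sketch. The function `mmatt` below is a made-up stand-in for my weather model (its formula and inputs are invented for illustration and are no real forecasting rule); `mme` is the same function under your vain new name.

```python
def mmatt(pressure_hpa, yesterday_high_f):
    """Hypothetical deterministic toy model; the coefficients are invented."""
    return round(0.9 * yesterday_high_f + 0.05 * (pressure_hpa - 1000) + 6.0)


# Renaming the model does not change the model.
mme = mmatt

inputs = (1012, 66)  # made-up inputs: pressure in hPa, yesterday's high in F

# One run of MMatt, then ninety-nine runs of the "different" model MMe.
runs = [mmatt(*inputs)] + [mme(*inputs) for _ in range(99)]
distinct_predictions = set(runs)

print(len(runs), "runs,", len(distinct_predictions), "distinct prediction:", runs[0])
```

A hundred runs, one number. Counting the runs as a hundred pieces of evidence would be counting the same piece of evidence a hundred times.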

Are all the many GCMs touted by the IPCC the same except for name? No. Since they are not, then we might hope to gain much new information from examining all of them. Unfortunately, they are not, and cannot be, that different either. We cannot here go into detail of each component of each model (books are written on these subjects), but we can make some broad conclusions.

The atmosphere, like the ocean, is a fluid and it flows like one. The fundamental equations of motion that govern this flow are known. They cannot differ from model to model; or to state this positively, they will be the same in each model. On paper, anyway, because those equations have to be approximated in a computer, and there is not universal agreement, nor is there a proof, of the best way to do this. So the manner each GCM implements this approximation might be different, and these differences might cause the outputs to differ (though this is not guaranteed).

The equations describing the physics of a photon of sunlight interacting with our atmosphere are also known, but these interactions happen on a scale too small to model, so the effects of sunlight must be parameterized, which is a semi-statistical semi-physical guess of how the small scale effects accumulate to the large scale used in GCMs. Parameterization schemes can differ from model to model and these differences almost certainly will cause the outputs to differ.

And so on for the other components of the models. Already, then, it begins to look like there might be a lot of different information available from the many GCMs, so we would be right to make something of the cases where these models agree. Not quite.

The groups that build the GCMs do not work independently of one another (nor should they). They read and write for the same journals, attend the same conferences, and are familiar with each other’s work. In fact, many of the components used in the different GCMs are the same, even exactly the same, in more than one model. The same person or persons may be responsible, through some line of research, for a particular parameterization used in all the models. Computer code is shared. Thus, while there are some reasons for differing output (and we haven’t covered all of them yet), there are many more reasons that the output should agree.

Results from different GCMs are thus not independent, so our enthusiasm generated because they all roughly agree should at least be tempered, until we understand how dependent the models are.
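One rough way to put a number on "tempered" is the textbook formula for the variance of an average of correlated quantities. The sketch below rests on assumptions that are mine and are not measured from any real GCMs: every model's error has the same variance, and every pair of models shares the same error correlation `rho`. Under those assumptions, averaging n models is worth only n / (1 + (n - 1) * rho) independent ones.

```python
def effective_model_count(n_models, rho):
    """Independent-model equivalent of n equally correlated models.

    Assumptions (illustrative only): equal error variance for every model
    and one common pairwise error correlation, rho, between 0 and 1.
    Follows from Var(mean) = sigma^2 * (1 + (n - 1) * rho) / n.
    """
    if n_models < 1:
        raise ValueError("need at least one model")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1 in this sketch")
    return n_models / (1.0 + (n_models - 1) * rho)


# Twenty hypothetical models at three made-up levels of dependence.
independent = effective_model_count(20, 0.0)
half_shared = effective_model_count(20, 0.5)
identical = effective_model_count(20, 1.0)

print(independent, round(half_shared, 2), identical)
```

With no dependence, twenty models count as twenty. With a correlation of one (the renamed-model case) they count as one. The middle case is the interesting one: a correlation of one half already shrinks twenty models to fewer than two. What the right value of `rho` is for actual GCMs, I do not claim to know.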

This next part is tricky, so stay with me. The models differ in more ways than just the physical representations previously noted. They also differ in strictly computational ways and through different hypotheses of how, for example, CO2 should be treated. Some models use a coarse grid point representation of the earth and others use a finer grid: the first method generally attempts to do better with the physics but sacrifices resolution, the second method attempts to provide a finer look at the world, while typically sacrificing accuracy in other parts of the model. While the positive feedback in temperature caused by increasing CO2 is the same in spirit for all models, the exact way it is implemented in each can differ.

Now, each climate model, as a result of the many approximations that must be made, has, if you like, hundreds (even thousands) of knobs that can be dialed to and fro. Each twist of the dial produces a difference in the output. Tweaking these dials, then, is a necessary part of the model building process. The models are tuned so that they, as closely as possible, first are able to produce climate that looks like the past, already observed, climate. Much time is spent tuning and tweaking the models so that they can, at least roughly, reproduce past climate. Thus, the fact that all the GCMs can roughly represent the past climate is again not as interesting as it first seemed. They better had, or nobody would seriously consider the model as a contender.

Reproducing past data is a necessary but not sufficient condition that the models can predict future data. Thus, it is also not at all clear how these tweakings affect the accuracy in predicting new data, which is data that was not used in any way to build the models, that is, future data. Predicting future data has several components.
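A deliberately extreme sketch shows why fitting the past is necessary but not sufficient. All numbers below are made up, and `tuned_lookup` is a hypothetical toy, not a description of how any GCM is built: it has one knob per past observation, so it reproduces the past perfectly, and falls back on the past average for anything it has not seen.

```python
# Made-up "observed" values indexed by time step, and made-up held-out values.
past = {0: 10.0, 1: 10.2, 2: 10.1, 3: 10.4, 4: 10.3}
future = {5: 10.6, 6: 10.5}


def tuned_lookup(t):
    """Over-tuned toy model: one dial per past observation."""
    if t in past:
        return past[t]
    # No dial exists for unseen times, so guess the past average.
    return sum(past.values()) / len(past)


def mean_abs_error(model, data):
    """Average absolute miss of the model over the given observations."""
    return sum(abs(model(t) - y) for t, y in data.items()) / len(data)


hindcast_error = mean_abs_error(tuned_lookup, past)
forecast_error = mean_abs_error(tuned_lookup, future)

print("hindcast error:", hindcast_error)
print("forecast error:", round(forecast_error, 3))
```

The hindcast error is exactly zero because the dials were set to make it so. That tells us nothing about the forecast error, which can only be learned by waiting for data the model has never seen.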

It might be that one of the models, say GCM1 is the best of the bunch in the sense that it matches most closely future data. If this is always the case, if GCM1 is always closest (using some proper measure of skill), then it means that the other models are not as good, they are wrong in some way, and thus they should be ignored when making predictions. The fact that they come close to GCM1 should not give us more reason to believe the predictions m