Statistics Galore! Ithaca Teaching Journal, Day 0

Two Weeks

Beginning tomorrow, and lasting a semisolid two weeks, will be my class at Cornell. A Masters “How To” in statistics, directed towards MBA-like personages. An impossible task. No subject of substance can be learnt in two weeks.

But it will be my task to (1) demonstrate that statistics, despite all public evidence to the contrary, is (or can be) a subject of substance, and (2) display its lineaments entertainingly.

The blog, with one notable exception (probably this Wednesday), will follow my thoughts on the nature of probability and statistics, and how those thoughts make their way to the students.

What appears here won’t necessarily appear in class. Topics here will necessarily be telegraphic, but also more fundamental, more philosophical. The first day will focus on logic, belief, intuition, induction, the nature of probability.

On day one, we’ll talk about questions like this one: Before peeking at the answer, see how you fare at analyzing this sentence, spoken on John Dvorak’s X3 in defense of downloading video using, inter alia, bit torrent: “Anything I buy, I buy legitimately.”

Isn’t that a bit like saying, “Anything I steal, I steal illegitimately”? It is, until we realize we can buy illegitimately, too. However, the original is a good sentence and it, and ones similarly structured, are recommended to politicians like Anthony Weiner.

Later in the course, we’ll talk about topics like cancer clusters.

Seek And Ye Shall Find…Cancer Clusters

A person with cancer never suffers alone: others will also have his disease. Just like our man, the other similarly diseased souls must live somewhere. It will thus happen that two or more people who have cancer will often be found to live “near” each other, especially in sprawling urban areas.

A “cancer cluster” is thus defined as a group of cancerous folk who all live near one another, where “near” can mean anything from one block to 100 miles. Importantly, “near” must never be defined until the cancerous have all been identified. This makes designations of “clusters” easier.

When the number of sufferers inches past the century mark, there will usually be a team of lawyers ready to claim these unfortunates belong to a “cluster.” By this they mean that some person or well-funded corporation must be responsible for causing the cancers; and since they caused the cancer, they must pay the lawyers for having discovered this fact.

A lovely property of classical, p-value-based statistics is that cancer clusters can always be found (we shall learn how over the next two weeks). Which is to say, “statistically significant” results of clusters can always be found. This not only gives constant employment to lawyers, it helps keep up the level of National Gross Nervousness.

To prove this, Steve Milloy quotes from the announcement of a new cancer-cluster law.

The “Strengthening Protections for Children and Communities From Disease Clusters Act” [pdf] (S. 76), offered by Chairwoman Barbara Boxer (D-Calif.) and Sen. Mike Crapo (R-Idaho), cleared the panel on a party-line 11-7 vote. Only Crapo joined Democrats in supporting the bill.

It’s for the children!

The writer (at Milloy’s site) calls cancer clusters “myths.” This is not so: they are not myths; they are real. People who have cancer can certainly live “near” one another and cluster. But this does not mean, or even imply, that something in the environment nearby where they lived caused the cancer.

The burden of proof should always be on those who cry “Cluster!” to show what biological mechanism causes the cancers in those that have it and does not in those that do not have it. Raw statistical “proof” that clusters exist is of little to no use.
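For the curious, here is a minimal sketch of the claim that “statistically significant” clusters can always be found. Everything in it is an assumption made for illustration, not material from the class: 400 cases are dropped purely at random onto a map cut into a 20-by-20 grid, so no cell is any more dangerous than another. The function names (`binomial_tail`, `hottest_cell_p_value`, `share_of_maps_with_cluster`) are hypothetical. The trick is the one described above: “near” is defined only after looking, by picking the fullest cell and then testing it as if it had been chosen in advance.

```python
import math
import random


def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
    return max(0.0, 1.0 - below)


def hottest_cell_p_value(n_cases, grid, rng):
    """Scatter cases at random, then test only the fullest cell.

    Picking a cell uniformly is the same as dropping a point uniformly
    on the map and noting which of the equal-sized cells it lands in.
    """
    n_cells = grid * grid
    counts = [0] * n_cells
    for _ in range(n_cases):
        counts[rng.randrange(n_cells)] += 1
    worst = max(counts)  # the "cluster", chosen after seeing the data
    # Naive p-value: pretends this cell was singled out beforehand.
    return binomial_tail(n_cases, 1 / n_cells, worst)


def share_of_maps_with_cluster(n_maps=200, n_cases=400, grid=20,
                               alpha=0.05, seed=1):
    """Fraction of purely random maps whose fullest cell is 'significant'."""
    rng = random.Random(seed)
    hits = sum(
        hottest_cell_p_value(n_cases, grid, rng) < alpha
        for _ in range(n_maps)
    )
    return hits / n_maps


if __name__ == "__main__":
    print(share_of_maps_with_cluster())
```

With these settings each cell expects one case, and a cell holding four or more earns a naive p-value below 0.05. Among 400 cells, at least one such cell turns up on nearly every map, though nothing but chance put it there. An honest test would have to account for the 400 places that were searched.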

1. Ray says:

If you have ever plotted the output of a uniform pseudorandom number generator by generating x and y coordinates, you will have seen that the points are not spread evenly across the paper, but form clusters.

2. Alan Bates says:

Hi Speed

This is not available in the UK. Do you have an alternate link? Thank you.

Alan

3. Dennis says:

At one point in my career as a forester, I oversaw brush and pest control treatments on a forest unit and in that capacity I was required to have a pesticide applicator certification. This required yearly training to maintain the certification. I remember one of the courses, taught by a researcher from Oregon State University (Dr. Frank Dost). He recounted the story of the cancer scare in the southwest corner of Oregon. This was in many of the papers of the day and was characterized by a population of women whose babies purportedly showed a higher than normal incidence of teratomas at birth. Local and national papers and news programs focused on the use of 2,4-D in local forests for shrub control as the obvious culprit. After the usual hue and cry and the decision of private and government foresters to curtail the use of 2,4-D in the area, Oregon State University commenced a series of studies on the effects of the chemical on human health.

What the studies found was that a person would have to ingest huge amounts of the chemical to show symptoms, that it was not persistent in the body or organs (in fact 82% is excreted in unchanged form), that the half life of residue in the bodies of living organisms is 10 to 20 hours, and that mutagenic and teratogenic responses in humans required very large and chronic doses[1].

Like most of these environmental scares, this one had some basis in fact. There were birth defects noted in this specific SW Oregon population. The time period, however, was the 60’s and encompassed the “back to the earth/hippy” movement of the day, and southern Oregon was one of the Meccas for that movement. When researchers controlled for the number of young women of childbearing age in that area, the incidence of birth defects was found to be less than would be expected in the normal population.

4. Curt says:

One of my favorite illustrations of this type of abuse of “statistical significance” can be found here:

http://www.xkcd.com/882/

5. Ah, Mike of Idaho. T’was ever a US senator more aptly surnamed?

6. Mercher says:

@ Alan Bates

Here’s a link to the 10 coin trick thing that works in the UK (starts about 9:45):

http://tinyurl.com/65ao7dt

7. Doug M says:

I would say that these guys are generally good with numbers, but not necessarily good at math. Teach them the pitfalls of linear regression, or they will abuse this tool to get really big R-squares. Try to teach them when a normal distribution is not a reasonable approximation.

8. The bill defines a DISEASE CLUSTER as:

“(4) DISEASE CLUSTER- The term `disease cluster’ means–

(A) the occurrence of a greater-than-expected number of cases of a particular disease within a group of individuals, a geographical area, or a period of time; or
(B) the occurrence of a particular disease in such number of cases, or meeting such other criteria, as the Administrator, in consultation with the Administrator of the Agency for Toxic Substances and Disease Registry and the Director, may determine.”

At the risk of conflating mean and median, clause (A) means that half the counties in the USA are clusters for whatever disease you want to name. If that isn’t enough, then clause (B) allows three people to declare anyplace at any time a cluster.

9. Alan Bates says:

Mercher