# Epidemiology, Causality, And P-Values: Part II

Be sure to read yesterday’s post first.

One of the screwy consequences of classical statistics is that my odd sample (mixing babies and the patients from the brain cancer ward) is perfectly acceptable. Nobody would ever use such a sample, of course, but that is a tacit admission of the incompleteness of the classical theory (about which, more another day).

For a moment, ignore the oddness of my sample and assume that it is instead a “good” one, where I’ll leave “good” unexplained. Think of it as comparing cell phone users and non-users in the “right” way (whatever that means).

Even assuming the sample is “good”, we still have a problem. We cannot say why only some people in the exposed group developed cancer, nor can we say why not everybody in the non-exposed group remained free of it. Obviously, something, or some things, caused some of those people to develop cancer, or caused them not to develop cancer. What?

Why didn’t everybody in the exposed group develop cancer, and why didn’t everybody in the non-exposed group remain free of the disease? Obviously, something or some things that we did not measure are part of the causal brain cancer chain.

Through our sample, we only know two things with certainty: (1) cell phone exposure does not always cause cancer, (2) absence of cell phone exposure does not always prevent cancer. Further, these are direct statements of causality.

Any other statement about causality we can make with our sample can only be true with a probability somewhere strictly greater than 0 and less than 1. What are these uncertain statements like?

It is our surmise, through some biological mechanism said to be plausible conditional on information external to our sample, that cell phone radiation twiddles with certain cellular (ha!) processes, turning normal cells into rogue ones. But we also must believe that these mechanisms only work sometimes, or only on people who meet other criteria, or both.

We might guess what these other criteria are—say, smoking—but these are just guesses: we cannot know with certainty that the other criteria are causally responsible, just as we cannot say that cell phone radiation certainly is.

Further, we might guess incorrectly—this is what it means to guess, right?—and it may be that other processes, completely unknown to us, are what cause the cancer. For example, it could be that ingesting an unknown suite of chemicals in just the right order causes the cancer, but only when hit by the radiation from the cell phone.

Suppose, then, that we know of no other criteria: that is, we are not considering any other measured or unmeasured (of which there are a number approaching infinity) characteristic. That is, we are not prepared to formally specify, or model, how these characteristics affect cancer. Understand: this includes the sample, or the way in which the sample was taken.

If we believe our mixed maternity ward/brain cancer ward sample is somehow “biased”, we must be prepared to model that bias. It is we who suppose the bias is a certain way. That, after all, is what it means to create a model. (Classical theory has a difficult time formalizing just how bad this sample is; people surely make statements that it is, but they do not do so based on formal probability.)

OK, no other criteria considered, including the way the sample arose, except exposure. Suppose we see a low p-value. Are we entitled to say that the cell phone radiation caused the cancer? No, as already explained. But can we make statements such as, “There is a 3% chance that if you use a cell phone, you will develop cancer”? Yes, but not using classical theory—you can say this using Bayesian statistics. And even then, it is not a statement of causality, merely one of correlation.
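A minimal sketch of how such a Bayesian statement arises, using invented counts (the “6 of 200” figures are hypothetical, not from any real study): with a uniform prior on the unknown cancer rate among the exposed, the posterior predictive probability that the next exposed person develops cancer is a direct statement about an observable—something classical theory does not supply. Note it is still only a statement of correlation, conditional on the sample.

```python
# Hypothetical counts: suppose 6 of 200 exposed people developed cancer.
cancers, exposed = 6, 200

# With a uniform Beta(1, 1) prior on the unknown rate, the posterior is
# Beta(1 + cancers, 1 + non-cancers); its mean is the posterior predictive
# probability that the NEXT exposed person develops cancer.
a = 1 + cancers
b = 1 + (exposed - cancers)
p_next = a / (a + b)

print(f"Pr(next exposed person develops cancer | data) = {p_next:.3f}")
```

This is the kind of calculation behind a “3% chance” claim: a probability of an actual observable event, not a statement about a model parameter—and, as the post stresses, still not a statement of causality.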

Classical theory only lets you say something about the p-value and that hypothesis mentioned yesterday. Don’t forget, though, that that hypothesis is actually only a statement about the parameters of a formal probability model of exposure and cancer. P-values say nothing directly about the chances of actual things happening.
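To make the distinction concrete, here is a sketch of a classical two-proportion test with invented counts (all numbers hypothetical). The null hypothesis is a statement about a model parameter—that the difference in cancer rates between the groups is zero—and the p-value answers only: if that parameter really were zero, how surprising would a test statistic this large be?

```python
import math

# Hypothetical counts, for illustration only.
x1, n1 = 6, 200   # cancers among exposed
x2, n2 = 2, 200   # cancers among non-exposed

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled rate assumed under the null

# Standard two-proportion z statistic and its two-sided normal p-value.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
# The p-value is a statement about the parameter under the model;
# it is NOT the chance that cell phones cause cancer, nor the chance
# that any given user will develop it.
```

Whatever the p-value turns out to be, it concerns the hypothesized parameter, not the chance of any actual person developing cancer.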

So does going with Bayesian statistics solve all our problems? In other words, is that “3% chance” correct? Answers: no, and probably not.

All probability statements are conditional on specified information. That “3% chance” is a correct probability assessment given our information. If our information is faulty or biased, then so is the “3% chance.” Since we cannot know our information is true, we cannot know whether the “3% chance” is true, either.

That is, it could still be the case that something utterly unconnected with what we measured or with our biological theory caused the cancer.

And with observational/epidemiological data, the chance, as experience has shown us, of something else causing the malady, is pretty high.

Obviously, this is just a sketch. My vacation ends today, but this was written the day before, hurriedly in a coffee shop on my way to the lake for one last swim.

Categories: Statistics

### 4 replies »

1. DAV says:

Slightly OT and speaking of lawyers: I am reminded of one of the pieces of damning evidence against Vioxx showing the control group had a rather sudden drop in heart incidents about midway through the study while the Vioxx group kept the same rate. The presented graph was of cumulative incidents, giving the appearance of Vioxx causing more incidents. Another conclusion could be that taking a placebo is good for your heart. An even better one is that the study was a waste of time. Ain’t science by lawsuit fun?

It’s not quite OT as similar effects also arise in other double blind tests. The block method does little to identify them.

Another problem is: every one of these studies apparently disregards the magnitudes of the supposed effects, which are apparently ever so slightly above the noise level. I doubt there has ever been a statistical study to determine if head gunshot wounds are a bad thing to experience. Even if the supposed correlation really exists and it is truly the cause, should I change my lifestyle simply because my chances of getting X have doubled from 1 in a million to 2 in a million?

2. Ray says:

I was not aware that you could prove causality with statistics. Causality is a deterministic process. Statistics applies to random processes i.e. processes with more than one outcome.

3. Jim Fedako says:

Take a look at mortality tables and compare smokers with nonsmokers. Even in the earliest year reported, there is a difference in mortality between the two groups. The creators of the mortality tables are not stating that smoking itself kills 16-year-olds. But smoking can be considered a proxy for risky behaviors (including smoking) – and relatively early death.

Whether smoking is truly the cause or is only a proxy is not the issue. The insurance company is not concerned with causation between smoking per se and death. Their concern is predicting mortality rates of insured.

Similarly, cell phone use may be a proxy for other behaviors that lead to brain cancer. An insurance company could price policies based on cell phone use without claiming direct causation.

Sadly, the courts and regulators (and sundry nanny do-gooders) see direct links (causation) – and affix blame – where none exist.

4. POUNCER says:

http://www.jstor.org/stable/167252

Data enrichment. Suppose you know for a fact that higher levels of, say, second hand smoke, are more likely to cause a fair coin to land — after a fair toss — showing heads. The higher the concentration of smoke, the more heads. And data from experiments with lower level of smoke can be merged into a meta study with other data from other experiments to show the overall impact …

Couldn’t we then change the p-value?