The Gold Standard Of Experimentation Exposed As Pyrite

Everybody says the “gold standard” of experimentation, and especially clinical trials, is the randomized controlled trial. A typical view is that “A Randomized Controlled Trial is an experiment or study conducted in such a way that as many sources of bias as possible are removed from the process.”

This view is false. Randomization cannot remove bias, but it can add it. Even stronger, everybody, including proponents of randomization, has always known this.

You have handfuls of two seed varieties and a large field in front of you. Which seed provides greater yield? Well, plant the seeds, grow the crops, and then count or weigh the harvest. For example, on the north end of the field, plant seed A; on the southern, plant B.

But wait! Upon inspection, the field has a definite slope from north to south, and since water flows downhill, and water almost surely will affect seed growth, we should control our plantings so that equal numbers of A and B seeds are uphill and downhill.

Maybe those trees on the east side of the plot will cast too much shade before noon, so we can’t just put all A (or B) on the east side, and all B (or A) on the west. Instead, we can chop the field into four blocks: a northwest, northeast, southwest, and southeast. We then apportion the seeds A and B into these blocks so that each variety overall receives equal shade and water.

Is that it? Well, it turns out that soil analysis reveals that the nitrogen content across the field has a definite pattern. The plants will make use of this nitrogen, so we cannot ignore the unequal distribution of it. So, even though it’s difficult, we can chop the field into blocks such that water drainage, sunlight, and nitrogen are overall equal for both A and B.
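As a rough illustration of the four-block idea, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the field is treated as a small grid of plots, row 0 is the north edge, column 0 is the west edge, and the function names (`block_of`, `assign_blocked`, `counts_per_block`) are hypothetical. It covers only the slope and shade quadrants; a nitrogen pattern would need its own blocking variable, which is not modelled here. The layout is a plain checkerboard with no coin flips, and the tally simply checks that A and B come out equal inside each quadrant.

```python
from collections import Counter


def block_of(row, col, n_rows, n_cols):
    """Name the quadrant a plot sits in (row 0 = north edge, col 0 = west edge)."""
    north_south = "north" if row < n_rows // 2 else "south"
    east_west = "west" if col < n_cols // 2 else "east"
    return north_south + east_west


def assign_blocked(n_rows, n_cols):
    """Deterministic checkerboard: neighbouring plots alternate between A and B."""
    return {
        (row, col): "A" if (row + col) % 2 == 0 else "B"
        for row in range(n_rows)
        for col in range(n_cols)
    }


def counts_per_block(plan, n_rows, n_cols):
    """Tally how many A and B plots land in each quadrant."""
    counts = {}
    for (row, col), seed in plan.items():
        name = block_of(row, col, n_rows, n_cols)
        counts.setdefault(name, Counter())[seed] += 1
    return counts


if __name__ == "__main__":
    plan = assign_blocked(4, 4)
    for name, tally in sorted(counts_per_block(plan, 4, 4).items()):
        print(name, dict(tally))
```

On the assumed 4-by-4 grid, each of the four quadrants ends up with two plots of A and two of B, so neither variety is favored by slope or shade.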

Obviously, we can keep on doing this for each factor that we believe has a causative relationship with seed growth. Further, everybody agrees that this is a sensible strategy.

And so it is. Controlling for known or likely causative factors is just the thing. Experimenters in all fields practice control in their setups routinely. Chemists have standard atmospheres, physicists are careful to define all the conditions in which an experiment takes place, and so on.

So why is it that some would go further and ask for “randomization”? What does that do for us? Well, nothing. For example, after chopping our fields into blocks such that we are sure A and B are equally represented across environmental conditions, some would ask that A and B be “randomized” to these blocks.

That is, a (computerized) coin flip would decide the precise order of planting. Supporters of randomization say that this eliminates unseen bias. The field has been blocked, but there still might remain causative agents or factors that we did not measure that will affect seed growth. They say randomization assures us that these unknown, or unmeasured or unmeasurable, factors are evenly split between the seeds.

Mathematically, this is false. Since the number of possible causative factors that might affect seed growth is practically limitless (to be certain of equality you’d have to specify the movement of every quark, or superstring, every photon, every graviton, etc. of the seeds, soil, sunlight and so on), the probability that you have achieved balance between them all by a coin flip is as close to zero as you like. In other words, imbalance is guaranteed. (See this explanation for the math.)
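To put a rough number on how fast that probability shrinks, here is a minimal sketch under simplifying assumptions that are mine, not taken from the linked explanation. Assume 2n plots, n given to each seed; each unmeasured factor is binary and shows up in any plot independently with probability 1/2; the factors are independent of one another; and "balance" means the factor appears in exactly as many A plots as B plots. Under those assumptions the chance that one factor is balanced is C(2n, n) / 4^n, and the chance that K factors are all balanced is that number raised to the power K. The function names are hypothetical.

```python
from math import comb


def p_one_factor_balanced(n):
    """P(X == Y) for independent X, Y ~ Binomial(n, 1/2).

    Summing P(X = k) * P(Y = k) over k gives C(2n, n) / 4**n
    (Vandermonde's identity).
    """
    return comb(2 * n, n) / 4 ** n


def p_all_balanced(n, k):
    """Chance that k independent hidden factors are all exactly balanced."""
    return p_one_factor_balanced(n) ** k


if __name__ == "__main__":
    n = 10  # assumed: 10 plots per seed variety
    for k in (1, 5, 10, 50):
        print(f"{k:>3} hidden factors: P(all balanced) = {p_all_balanced(n, k):.3g}")
```

Even with only ten plots per seed, a single factor is exactly balanced less than one time in five under these assumptions, and the probability that every factor balances falls off geometrically as factors are added. Loosening "balance" to "nearly equal" raises the per-factor chance, but the product still heads toward zero as the number of factors grows.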

It is control which is important, not randomization. Control is the true gold standard. Another example: suppose you commission a poll to ask, “How’s President Obama doing?” and then call only people in San Francisco. Would that sample be representative of voters across the States? Obviously not.

You want to control your sample for at least geography, if not income, sex, age, party affiliation, and so on. A collection of randomly dialed telephone numbers would be more biased than a carefully controlled sample.

Randomization can provide some benefit, but only in one narrow case: when the experimenter is likely to cheat. For example, in drug trials nobody trusts the pharmaceutical company, nor the doctors on its payroll, to choose who gets their drug and who the placebo.

The temptation for the drug firm to give its pill to sicker patients, and the placebo to healthier, is akin to putting a bowl of candy in front of a hungry kid and saying, “I’m going out. Don’t eat any.”

True control (taking into account each patient’s history, genetics, etc.) by an outside disinterested agency would be better, but, barring that, a coin flip will do.


  1. In your last example of how randomization may be beneficial when the experimenter is likely to cheat, you describe what I have always considered “controlling for the experimenter”. In other words, it is exactly the same as controlling for the soil or runoff characteristics in your seed planting example. Only in this case, you are controlling for the human desire to gain a favorable outcome to their viewpoint.

  2. I use the economic term “gold standard” in this sense too, even though I know it’s a misuse. I’m afraid we are stuck with it.

    I’m not sure I agree with you on your main point. It is theoretically possible for some crop-yield-relevant characteristic of the land to correlate highly with a randomly selected planting order. But is this really possible in reality? Isn’t it highly likely in reality for such characteristics to be rather smoothly-varying functions of position on the land?

    I see an analogy with lossless image compression. Suppose you take a picture of your cornfield with a digital camera, so that the raw data is just a matrix of integers from 0 to 255. It is theoretically possible that randomly scrambling the pixels of your image will give you a picture that is *more* compressible with respect to a given lossless compression algorithm. It’s even theoretically possible that you could randomly produce a picture of Abe Lincoln. But in reality, the original picture is much more likely to be compressible, regardless of the choice of algorithm.

  3. You block for known factors that impact variability because it removes that variation from the error term and thus improves your ability to detect experimental effects.

    If you randomize known factors that impact variability, you force that variability into the error term thus increasing the error term, and decreasing your ability to detect experimental effects.

    I always describe randomization as an act of desperation that at best may prevent an unknown factor from being confounded with a test factor. But it never decreases your error.

    BTW, I’ve noticed you indirectly taking shots at Fisher lately. Am I imagining things? Or can we expect a full-bore anti-Fisher treatise from you someday?

  4. Matt:

    Mike B seems to have nailed it. Though I am not sure I see the difference between “If you randomize known factors that impact variability” and “If you ignore other known factors that impact variability” given my understanding of randomize.

    OT. Some interesting Bayesian things are occurring at James Annan’s blog and Roger Pielke Jr.’s.
    The subject is measuring the “skill” of a model by comparing the projections of a model to that of a naive forecast. It may be worth a look. Check Annan’s August 5th post.

  5. What if the seeds in your grain sacks vary in viability from top to bottom?

    I grant that after properly controlling for *all* the factors that actually make a difference, any method of assignment would be equally good.

    But unless you’re certain that you’ve controlled for *all* those factors (and how could that ever be the case?), random assignment across equivalent cells is a better strategy than take your pick of non-random assignment methods.

    Or so I still think. Briggs is an actual statistician, and he may wear me down on this point.

  6. Morgan:
    The problem is that the impact of an uncontrolled variable is manifested after your randomization in an unknown way. This inherently limits what you can actually say. Economists get around this major epistemological issue by pronouncing the magical incantation ceteris paribus. The bottom line is the experimenter presents their results as if they were more certain of them than they actually have a right to be.

  7. Mike B,

    Fisher was a brilliant man—he could think rings around me—but he made some odd and conspicuous errors. Like eugenics. And p-values. So your suggestion might be a good one.


    We’re not actually controlling for the experimenter, but providing ourselves with external information that the experimenter couldn’t cheat (easily).


    Smoothly varying functions it is: but it’s still control that you’re after. Randomization doesn’t buy you anything.


    We can never control for all. It is practically impossible. If we could, then we could devise a seed-soil combo that gives us identical yields each and every time.


    Amen to the “all things equal” being just as valid.
