A purposely absurd, yet telling, example. You’re a statistician and new recruit to ISIS assigned to crucify three score perceived enemies. However, you’ve run out of wooden crosses. But there are some sturdy metal poles that you think might make good substitutes. So you go to the chief and say, “Boss, I want to prove that crucifixion by metal pole is as efficacious as by wooden cross. I have drawn up an experimental design to randomize victims to either wood or metal. If all goes well, I’ll be able to show that death by metal-pole crucifixion is statistically identical with wooden-cross crucifixion.”

Why is the example absurd? It is true that since metal-pole crucifixion hasn’t been tried, victims might not die. The event is contingent. And since statistical evidence in the form of a “gold standard” randomized controlled trial hasn’t been supplied, how could anybody believe metal works as well as wood? Don’t skip lightly over these questions. Their answer explains why randomization isn’t needed, and that some experiments are of no utility. Essence and cause are once again present.

What is the purpose of an experiment? To provide evidence probative towards some proposition of interest. Here the proposition is, “Victims will die by metal crucifixion”. Evidently, the “data” that can be gathered is probative. If nobody dies while welded (tied, trussed, nailed, or whatever) to a metal pole, then we have learned that metal-pole crucifixion does not work, or does not work well. Contrariwise, if everybody dies strapped to a metal pole, this is also probative, and we’ll have evidence giving weight to our proposition. Since the experiment obviously fulfills the desideratum of a scientific experiment, we haven’t discovered why the example is absurd.

Similarly, randomization is meant to guard against the possibility of experimental error of a certain kind. Everybody “knows” randomization is a good thing; indeed, it is believed essential for a quality study. But since your experiment will use randomization, again we have fulfilled the standard desideratum of experimental design. And we still haven’t discovered why the example is absurd.

Could it be because, “It’s obvious that metal poles are no different than wooden crosses”? This is necessarily false. If there were no difference, they would both be made of the same material; and we’d probably not have different words nor would we be able to form separate mental images for the two objects. They are not the same; they are different; of course they are different! Plus, this difference was acknowledged by the design of the experiment. If you had thought they were identical, there would have been no need to gather new evidence. It is because there are known differences that you proceeded. Still no absurdity.

The answer is this: it’s obvious that people crucified to metal poles will die as they do when tied to wood crosses because the cause of death is excruciating exposure. This is an induction, and a true one, coming to us in syllogistic form as we saw in Chapter 3. We induce that the essence of the kind of death, the experimental “outcome”, is excruciating exposure and that however one is strung up, be it metal, plastic, wood, or some other substance, the result will be the same. We already have the evidence we need that makes the proposition of interest true, evidence supplied by induction-argument. Notice also that our interest was always the cause of death and nothing to do with metal poles per se. We didn’t want to gather evidence that would make the proposition of interest likely, we wanted to know what caused it to be true. And this we got for free.

Randomization wasn’t needed. But perhaps that’s because the experiment itself wasn’t needed. Perhaps in other instances the blessings provided by randomization are needed. They aren’t, as we shall see. But before I can prove that, we need to understand more the purpose of experimentation.

The difficulty lay in the definition of the experiment. An experiment is the process of discovering information probative to a proposition of interest. Experiments can be active or passive. They are passive when nothing but mental labor is involved in discerning this evidence, as was the case in the crucifixion example, or they are mixed passive and active in cases where data is gathered in some (usually mechanical) fashion. So-called observational experiments are more passive than active, but they can be just as active as so-called controlled experiments; e.g. “chart reviews” in medicine. None of these dividing lines are sharp, and most experiments are really mixtures of these types, but controlled experiments usually see evidence generated or caused newly to be made under conditions controlled, to various extents, by the experimenter.

The experimenter in any kind of experiment is the person responsible for the three most important things: deciding the proposition of interest, stating what evidence is probative, and then gathering it. This isn’t an empty statement. Propositions of interest are not free for the asking. They are related almost always to decisions people want to make in the face of uncertainty. The possibility of mixup is great, because propositions that are answerable are often stand-ins or proxies for what is of real interest. Also, readers of experiments often mix up or misunderstand just what the proposition of interest was that guided the experimenter. By the time most “studies” reach the press they are as badly garbled as a Shakespearean sonnet conveyed by the Telephone Line game played by first graders.

Propositions of interest are anything from “The weight of this elementary particle is $y$” to, “The value of this biological measure is $w$”, to “The amount a person will spend is $z$”, etc. The times when we can deduce or induce evidence which tells the cause of propositions of interest are rare. Or, rather, they only seem so because nobody records the “mental experiments” like the crucifixion example. But because they’re not recorded, they come to seem as if they are not experiments, which is too bad. We only come to know the “hard cases”, i.e. those propositions where the evidence of cause is inconclusive. Or we know those cases in which the proposition of interest has been deduced given a set of circumstances (premises), and the experiments are such that they verify the conditions (premises)-proposition concordances. These are, of course, local deductions based on contingent premises and not necessary truths. If they were necessary truths, we’d again have no need to perform any experiment (except for pedagogical purposes).

What evidence is probative? This is the real question. Let’s work with an example. Y = “The value of this biological measure is $y$”. If I claim X = “This biological measure can only be $y=120$ mm/Hg”, then the probability Y takes any value but 120 mm/Hg is 0. Or I could have said, X = “This measure can only be 120 mm/Hg or 160 mm/Hg”, then the probability that it is 120 mm/Hg is 0.5 and so forth. But where do these X come from? I made them up. Given the X I supplied, the probability of Y is deduced by the rules of probability (which take as conditions only the information supplied and none other). But if my audience is genuinely interested in Y, they are unlikely to be convinced that my probative X (and it is probative) is proper. What experiments are looking for, then, are X which are themselves true or can be reasonably believed given another or “outside” or observed set of premises.
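The two made-up premises can be written compactly. As a sketch, label the first premise $X_1$ and the second $X_2$ (these labels are introduced here for convenience and are not part of the original notation):

$$\Pr(y = 120 \mid X_1) = 1, \qquad \Pr(y = 120 \mid X_2) = \Pr(y = 160 \mid X_2) = \tfrac{1}{2}.$$

Both lines are valid deductions from the premises as stated. Whether anyone should accept $X_1$ or $X_2$ is a separate question, and it is that question an experiment is meant to address.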

For instance, suppose I conducted an experiment to take actual measures of this Y, and further suppose that each time the measurement was 120 mm/Hg. I announce, given my experiment (my X, which are my measurements), the probability of Y = “The measure is 120 mm/Hg” is high. Now you can choose to believe this or not. If you do, you are supplying tacit premises of the form, W = “This Briggs is honest and his measurements were error free and in the milieu, form, and type that I expect.” Then, given W and X (the conjunction), the probability of Y is high. But if you reject W and suppose instead, “I haven’t a clue what Briggs is on about; why are all the numbers the same?; maybe they weren’t of the form I expected”, then the probability of W and X is not high. (The probability of Y given X is, no matter what, high.)

Real experiments tighten this. They list all the premises which led to the measurements or collection of data thought probative of Y. No matter how good a job I do at listing premises (explaining the experiment), you still must trust if you are to believe. The best experiments find those premises where the probability of Y is extreme or high and where trust (or faith!) is high, and the worst experiments find premises which are murkily related to Y and where trust is low. The premises in which the probability of Y is extreme or high are those most related to the cause or causes of Y.

Control, true control, is what produces the best evidence, not randomization. To understand what causes some thing to happen, the ideal experiment is of course that which focuses on that thing or things that are the cause. If we can hold fast every condition which we assume might be a cause of Y (our proposition of interest) and vary or manipulate only one, we are then certain that the changes in Y are caused by this manipulated condition. This is a local truth. It is not a necessary truth because it presumes that we have identified all possible causes. Since Y will usually be contingent, it is likely we might err in this presumption, especially if Y concerns complicated matters like human behavior. Of course, holding all things constant is a tremendous demand, one that perhaps can never be met in practice. This is the implication of, for example, the work of Nancy Cartwright, who has repeatedly emphasized the necessity of identifying true causes.

If we can hold everything constant and manipulate one X and witness the changes in Y, then we can make statements like this: “Assuming all other things constant, and set at these certain levels, when X = $x$, Y = $y$.” This produces a local truth that Y = $y$ (and possibly even a necessary one if the “all things constant” is sufficiently tight, i.e. deduced from axioms). Probability is not needed: there is no uncertainty. This easily extends to multidimensional X and Y. Since this is so, we don’t need randomization when we can control. Indeed, randomization is the opposite of what we want. Randomization could introduce variation into those things which are potentially causative and we thought we were controlling!
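A minimal sketch of this point in Python. The mechanism `outcome`, its coefficients, and the second cause `z` are all invented for illustration; nothing here comes from a real experiment.

```python
import random


def outcome(x, z):
    """Hypothetical deterministic mechanism: Y depends on X and on a second cause Z."""
    return 2.0 * x + 3.0 * z


def controlled_run(xs, z_fixed):
    """Hold Z fixed at one level and vary only X."""
    return [outcome(x, z_fixed) for x in xs]


def uncontrolled_run(xs, z_levels, rng):
    """Let Z be set by a mechanism we do not track (i.e. "randomized")."""
    return [outcome(x, rng.choice(z_levels)) for x in xs]


def steps(ys):
    """Change in Y between consecutive settings of X."""
    return [b - a for a, b in zip(ys, ys[1:])]


xs = [0, 1, 2, 3]

controlled = controlled_run(xs, z_fixed=1.0)
steps_controlled = steps(controlled)
print(steps_controlled)  # [2.0, 2.0, 2.0]: every step is due to X alone

rng = random.Random(0)  # seeded only so the run is repeatable
uncontrolled = uncontrolled_run(xs, z_levels=[0.0, 1.0, 2.0], rng=rng)
steps_uncontrolled = steps(uncontrolled)
print(steps_uncontrolled)  # steps now mix the effect of X with changes in Z
```

With `z` held fixed, every unit step in `x` moves the outcome by exactly the same amount. Once `z` is left to an untracked mechanism, each step combines the effect of `x` with whatever `z` happened to do.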

True controlled experiments demonstrate cause; they confirm cause. But because controlling everything that might be a cause of Y is difficult or practically impossible when Y has many causes, as does human behavior, we often have to settle for experiments where only some, or maybe even no, things that are causative can be controlled or observed. We posit race as one of many causes of income. We can measure race and income, but it is clear that race is not (or is almost never) the sole cause. We can now only say things like this: “Assuming race is causally related to income, and given some observed race-income pairs, as well as assuming some technical modeling details, when race is K, the probability of incomes (for people we haven’t yet measured) larger than $x$ is $p_{\mbox{K}}$, whereas if race is J, the probability is $p_{\mbox{J}}$.” We’ll discuss “causally related” in the next Chapter. We would also have to say “Assuming race is not causally related to income, and given some observed race-income pairs, as well as assuming some technical modeling details, when race is J or K, the probability of incomes (for people we haven’t yet measured) larger than $x$ is $p$; i.e. $p$ is the same for J and K.”
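To make the two kinds of statement concrete, here is a rough Python sketch. The “technical modeling details” are an assumption chosen for simplicity: a uniform prior on the chance of exceeding $x$, which gives the rule-of-succession predictive probability $(k+1)/(n+2)$. The data, the group labels as dictionary keys, and the function name are all hypothetical.

```python
from fractions import Fraction


def predictive_prob_exceeds(incomes, x):
    """Pr(next unmeasured income > x | data), assuming a uniform prior
    on the exceedance chance (rule of succession): (k + 1) / (n + 2)."""
    n = len(incomes)
    k = sum(1 for v in incomes if v > x)
    return Fraction(k + 1, n + 2)


# Hypothetical observations (in thousands), invented purely for illustration.
observed = {
    "J": [30, 45, 52, 61, 75],
    "K": [28, 39, 58, 66, 80, 91],
}
x = 50

# Premise: group is causally related to income, so each group gets its own probability.
p_J = predictive_prob_exceeds(observed["J"], x)
p_K = predictive_prob_exceeds(observed["K"], x)

# Premise: group is not causally related to income, so the data are pooled.
p_pooled = predictive_prob_exceeds(observed["J"] + observed["K"], x)

print(p_J, p_K, p_pooled)  # 4/7 5/8 8/13
```

The same observations yield different probabilities depending on which causal premise is assumed. The numbers follow from the premises and say nothing about which premise is true.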

Causal relations are what drive the probability. The uncertainty is only in those causes which we could not measure. And if we cannot measure potential causes because we don’t know what they are, then randomizing does nothing for us. It gives probability no special boost; randomizing does not, as it is tacitly thought, bless statistical results. If we don’t know what all potential causes are in some human experiment, for instance, we then do not know if person 1 has a potential cause and person 2 does not and so on, therefore randomizing does nothing. And there may be many causes each person possesses that are unknown to us, thus mixing people up helter-skelter is absolutely no guarantee of producing equal mixtures of people with these unknown causes in each of our experimental groups. We are flying blind. What probability does for us is to regain partial, but still hazy, vision for those causes we have assumed are operative.
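The “no guarantee” claim can be checked by counting. The Python sketch below assumes a simplified setup that is not in the original: a single unknown yes/no cause, and an even split of people into two groups with every split equally likely. It computes the exact chance that the carriers of the cause are divided equally between the groups.

```python
from math import comb


def prob_exact_balance(n_people, n_carriers):
    """Chance that an even split, with every split equally likely, puts
    exactly half of the carriers of an unknown cause in each group."""
    if n_people % 2:
        raise ValueError("n_people must be even for an even split")
    if n_carriers % 2:
        return 0.0  # an odd number of carriers cannot be divided equally
    half = n_people // 2
    k = n_carriers // 2
    # Splits with exactly k carriers in the first group, over all possible splits.
    favorable = comb(n_carriers, k) * comb(n_people - n_carriers, half - k)
    return favorable / comb(n_people, half)


# Hypothetical trial: 20 people, 10 of whom carry the unknown cause.
p = prob_exact_balance(20, 10)
print(round(p, 4))  # 0.3437
```

In this toy case an equal mixture arises only about a third of the time, and that is with one unknown cause; the setup says nothing about how many such causes a real experiment might hide.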

Besides, “randomizing” isn’t even a thing that one can do in the mystical sense people usually take that word. Since randomness only means unknown, to randomize can only mean to make unknown and that is the opposite goal of any experiment! Adding “randomness” to experiments does not make them valid, it makes them worse. Thus randomizing, if it means anything, means removing control. And that is fascinating.

There is still the vague unsettling notion that “randomizing” does something for us. And this is true. It can, in some situations, restore trust, or rather allay suspicions of deceit. But this is only when randomizing is used in its proper sense of removing control. When dealing with duplicitous lying unscrupulous scheming self-deceiving conniving canny chiseling human beings we need often to have some procedure to lessen the chance of falsehood. This is why we have referees flip a coin to decide who gets the ball first. The coin flip is random in the sense of unknown. It is also chaotic in the sense that it is sensitive to conditions. We all understand it is difficult (but not impossible!) to manipulate the flip so that a given outcome occurs. In this way, we let “nature” decide who gets the ball. Or, better, we remove the knowledge that some human being is cheating us in some way. Feelings aren’t hurt. We also remove control (randomize) patients in medical trials. Doctors are as prone (and maybe more?) to self-deception as the rest of us, and they can too easily veer patients into treatment and control groups to make the treatment seem better or worse than it really is. So we remove control from the doctor and give it to another, say, a statistician. This isn’t ideal because usually the statistician thinks he is “randomizing” in the mystical sense. It would be far better to examine each patient and control assiduously which group in the experiment each patient goes into, and for this control to be open so that fears of abuse are minimized. Stephen Senn has many insights on this subject. And notice physicists don’t rush to “randomize”: they control. That’s because cheating, including self-cheating, is far less of a problem (but not non-existent).


3 Comments

shawn marshall

October 28, 2024, 8:40 am

Experiment:
an engineer constructs two identical buildings – 30 ft tall – 200 ft apart – isolated from other structures, trees & etc, and vented at the eaves
inside he installs in each a grid of electric heating wire in earth saturated with water (volume of earth and water measured and controlled)
electric meters and controls set to maintain T at 90F
Above each he installs a plastic cloud translucent to IR perhaps 20 ft in height while allowing room about the perimeter for the free flow of rising air
One cloud is filled with ambient air and allowed to stabilize in T; the other cloud filled with CO2 and allowed to stabilize
once everything is stable – he records the amount of energy required to maintain the earth/water grid at 90F
Then he runs the same experiment reversing the clouds/building combo
if CO2 heats the earth by down transmission the change in energy should be measurable and calculable
What is he missing?

Uncle Mike

October 28, 2024, 9:22 pm

Pedagogical is one of my favorite words. If it’s probative, so much the better. I have no idea why.

Paul Murphy

October 29, 2024, 11:57 am

Marshall:
What your engineer is missing is pretty much everything about the CO2/warming issue. Specifically, what your guy is doing is broadly similar to the Al Gore two-jars experiment in which a jar filled with air is shown to absorb less heat from the sun (and take longer to cool) than one with CO2-enriched air inside. Unfortunately the effect is entirely due to the additional mass because CO2 is denser than normal air and so has nothing to do with the issue.

If you want to explore the issue for yourself check out NOAA radiosonde data – some coverage from about 1906; good coverage (1000s per day, from airports worldwide) since about WWII. These things measure temp and pressure from launch level well into the stratosphere. Warming gases expand, so if the earth is warming you expect the data to show at least two of the following three things: increased temp throughout; increased pressure throughout (tropospheric warming due to CO2 predicts stratospheric cooling); and/or change in the altitude at which the tropo ends and strato begins (sudden temp change).

Spoiler alert: there’s no evidence of longer term, unidirectional, change.


Class 26: Randomization Does Nothing
