Observational Versus Controlled Trials

Received this email from a reader:

I took on board all I read on your website, and it has created confusion in my mind. I have been reading Ioannidis and others about the disagreement between observational studies and RCTs. I know that it is impossible to adjust statistically for all confounders, as many are unknown in observational research. You pointed it out, and elsewhere too there is a lot of talk about the failings of epidemiology and its servants, the cohort and case/control study. I am about to enroll in a PhD where I will be conducting an observational study. Are observational studies and epidemiology a pseudoscience? Should I enroll? Would it not be better to focus my energy on learning about clinical trials? You are the only person I know who can help me; it is driving me nuts! I do not want to pretend I can see the emperor’s new clothes…

Hope you can help me.

About whether to enroll, I am the worst guide. If pressed, I’d probably recommend a series of high-risk Nigerian securities as the surest path to riches. I do know that running a blog is not a wise career choice.

But about the difference between observational versus controlled trials, I am your man.

There is nothing in the world wrong with observational studies. Indeed, nearly all the knowledge we develop is based on observation. Think of everything you’re certain and uncertain about. What is your wife/mother likely to say when she walks in the door at night? You don’t know for sure, but based on long observation you can make a good bet. You’re adept at guessing whether it’ll rain that day by looking at the morning sky.

Yes, you’re not using formal, i.e. cookbook, models, but what of it? Most probability, as I often say, is not quantifiable. Informal judgments are adequate for the bulk of life. I needn’t go on: you get the idea.

Now, assuming no calculation mistakes, all surveys, observational studies, polls and so forth are valid for the type or kind of data they represent. Thus (assuming no cheating, too) the polls flashed on the screens of Fox News and MSNBC are perfectly representative of the type and kind of people who would call or log in to those stations.

Which, given our daily observational life, tells us these people are not the same type and kind as other citizens. How applicable are the polls from the stations to the rest of the country? There’s no real formal way to know, but you can guess pretty well. Or you think you can.

Your giant observational trial, whatever it is, will also be valid for the kind and type of data it represents, assuming no cheating, purposeful misinterpretation (e.g. the epidemiologist fallacy), and the like.

In it, you will be interested in some thing, call it Y. The goal of this study is to say, of the rest of the data, assuming the model is good and given that X1 = a, X2 = b, …, Xp = p, the probability Y = y is this-and-such. Simple as that.
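That conditional-probability goal can be sketched with a toy, purely illustrative example: given observational records as they happened to occur, estimate P(Y = y | X = x) by simple counting. The data and names here are hypothetical, not from any real study.

```python
from collections import Counter

# Hypothetical observational records: (X, Y) pairs as they happened to occur.
data = [
    ("a", 1), ("a", 0), ("a", 1), ("a", 1),
    ("b", 0), ("b", 0), ("b", 1),
]

def prob_y_given_x(records, x, y):
    """Estimate P(Y = y | X = x) from observed frequencies.

    Valid only for the kind and type of data observed: it says nothing
    about records that never showed up, and nothing about causation.
    """
    matching = [r for r in records if r[0] == x]
    if not matching:
        raise ValueError(f"no observations with X = {x!r}")
    return sum(1 for _, yy in matching if yy == y) / len(matching)

print(prob_y_given_x(data, "a", 1))  # 3 of the 4 observed X=a records have Y=1
```

The estimate is honest about exactly what Briggs says: it describes the data you happened to collect, nothing more.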

The grand mistakes, and the reasons observational studies have poor reputations, are two. One is that people cannot stop themselves from making causal interpretations of the Xi. It may be that in your data, as some Xi varies over its range, the probability Y = y changes a lot. But that does not imply that Xi has any causal relation to Y.

Two is not knowing how the Xi came to be what they were: we took them as they came, just as in the TV polls. This mistake boils down to the same thing as over-interpreting the TV polls: claiming they say things about all Ys and not just the kind that would show up in your dataset.
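Mistake one is easy to demonstrate with a made-up simulation: let a hidden confounder Z drive both X and Y, so that X has no causal effect on Y whatsoever, yet the two correlate strongly in the observational data. All numbers below are invented for illustration.

```python
import random

random.seed(0)

# Hypothetical confounder Z drives both X and Y; X has NO causal effect on Y.
n = 10_000
z = [random.random() for _ in range(n)]
x = [zi + random.gauss(0, 0.1) for zi in z]  # X is Z plus noise
y = [zi + random.gauss(0, 0.1) for zi in z]  # Y is Z plus noise

def corr(a, b):
    """Pearson correlation, computed from scratch."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

# X and Y correlate strongly, yet neither causes the other.
print(round(corr(x, y), 2))
```

A model regressing Y on X here would show X “matters” a great deal, which is true for prediction within this kind of data, and entirely false as a causal claim.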

Another difficulty, incidentally, is that people forget probability models are epistemological not metaphysical.

Experimental trials are not too different, except that you manipulate some of the Xi in advance, accept the other Xj as they come (just as in an observational study), and watch what the Xs do to Y. Experimental trials are thus always partly observational.

The reason experimental trials have a better and deserved reputation is that, over a long period of time, for the Y of interest, people have whittled down the Xi to a set that experience has shown are more closely related to Y, and some of which may even be causally related to Y (we still accept some Xj observationally). And they’re able to pick the Xi from the type and kind of area they claim the study represents.

Gist: there is nothing fundamentally wrong with conducting observational studies.


  1. One general comment: a limitation of observational studies is that you can only measure the impacts of x’s in the ranges and combinations in which they’ve “naturally” occurred. You’re also sometimes limited in the y’s that have been observed.

  2. The only reason to avoid those lines of study is to avoid the internal dissonance that is likely to result when you discover that telling the emperor he is not wearing any clothes has a high correlation with you not having any clothes.

    As our beloved and esteemed host is fond of implying, the clothes do make the man. Not having clothes is not good for getting the next job. Science should profoundly respect those that point out states of dishabille, but people profoundly respect having food on the table. It turns out that most scientists are people.

    I am not sure I am willing to state that all scientists are people though.

  3. Dear Dr. Briggs,

    While you are at it, could you also critique MRPP? I mean Multiple Response Permutation Procedure, not the Movimento Reorganizativo do Partido do Proletariado (the Portuguese Communist Party). The procedure for (statistical) MRPP can be viewed here:


    MRPP is quite popular out here at Pucker Brush U. Any bag o’ data can be squeezed through MRPP and the results are always “significant”. I try to explain to practitioners that the treatment is a crock, but they don’t care. What can I say to them to cure them of their delusions?

  4. Uncle Mike!

    Holy Moly, I thought you’d climbed to the Great Treehouse of the Sky!

    Great to hear from you. I’ll look into MRPP. Never heard of it before. Notice that the R package name is ‘vegan’, which is not a good omen.

    Update: Did a quick check. Their example has a sample size of 20 (yes, 19 + 1) and 30 (yes, 30) variables! They ran a test to say whether 4 groups of sizes 3, 5, 6, and 6 were “significantly” different. Lo! They were.

    So a sample of size 3 and 30 variables differs, with p-value of 0.001, from a sample of 5 and 30 variables etc.?

    Don’t understand it all yet, but it smells mighty fishy.
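For readers who, like the host, hadn’t met MRPP: it is a distance-based permutation test. Compute the weighted mean within-group pairwise distance, then compare it to the same statistic under random relabelings of the observations. Here is a bare-bones sketch of that idea, not the vegan implementation; the data, group sizes (mimicking the 3, 5, 6, 6 split above), and permutation count are all made up.

```python
import itertools
import random

random.seed(1)

# Made-up data: 20 observations of 5 variables; labels of sizes 3, 5, 6, 6.
obs = [[random.gauss(0, 1) for _ in range(5)] for _ in range(20)]
labels = [0] * 3 + [1] * 5 + [2] * 6 + [3] * 6

def dist(u, v):
    """Euclidean distance between two observation vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def delta(points, labs):
    """Weighted mean within-group pairwise distance (the MRPP-style statistic)."""
    total, n = 0.0, len(labs)
    for g in set(labs):
        idx = [i for i, l in enumerate(labs) if l == g]
        pairs = list(itertools.combinations(idx, 2))
        if pairs:
            mean_d = sum(dist(points[i], points[j]) for i, j in pairs) / len(pairs)
            total += len(idx) / n * mean_d
    return total

observed = delta(obs, labels)
n_perm = 999
# Small observed delta means tight groups; count permutations at least as extreme.
count = sum(
    delta(obs, random.sample(labels, len(labels))) <= observed
    for _ in range(n_perm)
)
p_value = (count + 1) / (n_perm + 1)
print(p_value)  # on pure noise like this, a small p-value should be rare
```

Nothing in the sketch makes significance automatic; if practitioners see “significant” on every bag o’ data, the suspect is the data dredging around the test, not the permutation machinery itself.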

  5. Would it be fair to say then that you can, with great caution, learn something about the real world by looking at the results of a well-conducted observational study? It sounds like Briggs is saying that the results from an observational study could be useful, with caveats. Observation shows the sun will rise in the east, based on seeing it happen every morning. If we see such patterns in the results of, say, a case/control study, and the results make biological sense, and the effect is strong, and we have accounted for all the confounders we know of, can we tentatively trust the results? Of course we know that no causal relationship is proved (and nothing assures that you will see the association again if you re-run the study). But then seeing the sun rise in the east does not guarantee that it will rise in the east again. But with our understanding of Newton etc. and our observation of the sun rising, we can make a good case for it likely rising in the east tomorrow morning again. If we humbly accept the limitations of these epidemiological studies, are they useful?

  6. Francsois,

    The only difference between experimental data and observational data is the manipulation. Causal relationships are still established in the same way. Using observational data makes the determination less certain because, as Briggs pointed out, you don’t really know why X changed.

    Epidemiology lately suffers from dealing with causes and effects that are near the noise level (e.g., does picking nose hairs cause toenail cancer?). Its problems are no longer as glaring as they were with the Broad Street Pump; there are more practitioners; and there is the ever-present need to publish. It’s a field which may have outlived its usefulness.

  7. Well, I don’t know that DAV and Matt are exactly precisely correct about epi. For one thing, dumb epidemiologists never get above the noise level, pretty much by definition; maybe there’s some smart ones. That said, even the best epidemiologists don’t know nothin’ about predictive statistics; but the same can be said for high-energy physicists.

    I recommend that Matt’s correspondent at least read Ken Rothman’s Epidemiology: An Introduction, 2nd Edition to get an idea of what a working epidemiologist thinks are the problems and issues within epidemiology.

  8. I once had a conversation with a fireman of a diesel locomotive with no fire to tend. He also had occasion to justify his career choice as he was acutely aware that his only function was to keep the engineer awake and to be a second person not engaging the dead-man switch in an emergency (only on a railroad is this considered to be a two-person job).

  9. DAV,
    Not sure how this is any different than having a copilot in a jetliner. A runaway train is a heavy and dangerous projectile.

  10. Scotian,

    So is my lawn mower it seems. Fortunately, it doesn’t require a second person to not operate the dead-man as I often can’t afford to hire the help.

  11. Scotian,

    I think so. It’s my favorite one but then I only have one. The dead-man is a pain in the anatomy and not just in my hands. I’m often tempted to disable or at least circumvent it.
