September 20, 2013 | 16 Comments

On-Line Courses Not Faring Well—As Expected

A revolutionary means to an education.
Two months back I made the non-joke that MOOC’s—massive off-line open courses—were not new, not “revolutionary”, and were therefore not going to “change the face of education as we know it.”

Note off-line and not on-line.

Libraries (in most places in the West) have been free and open to anybody for centuries. Any person, regardless of circumstance, could walk in, pick up a book on Introductory Latin or The Philosophy of Physics, read it, assimilate its material, and then walk away educated.

Not only that, but they could have done it repeatedly for as long as they had the patience or capacity. There were books on any subject, therefore any subject could have been learned.

Since the libraries were free and ubiquitous, their patronage encouraged almost from the cradle, and their benefits nearly infinite, you would guess, if your thoughts tended towards Utopianism, that we should be surrounded with a highly educated, erudite citizenry.

But if you were a realist, you would have realized that most people don’t want to know much beyond what’s needed for their daily existence. Learning is difficult, is brutal hard work, and the payoff is in most cases not obvious. What makes it worse is that reading, the fastest and most secure way to assimilate knowledge, can be painful and time consuming. It is a solitary, quiet avocation, and most of us are too gregarious to keep at it for the extended periods necessary to master a topic.

Well, the realists were right. Before television, before even radio, libraries were full and people read; mostly for entertainment, but read they did. Now libraries are sometimes just as full, but with people checking out DVDs or sitting at Internet terminals (as you are doing now). Books are now secondary—at best. They and reading will never disappear, but they will become the habit of only a fraction of us, as was true historically and as is part of our nature.

So it should come as no surprise to hear that the on-line courses aren’t living up to the ecstatic hyperbole which accompanied their announcement. According to Politico, they have “high dropout rates and disappointing student performance among those who stick it out.” Completion rates average 10%.

They “found that disadvantaged kids performed particularly poorly and students found the courses confusing.” The same students, that is, who are not heading to the library.

Some expert or other—the kind likely to use words like “Learning-management systems”, “revolutionary impact”, and “Transformational Learning”—was heard to say, “The elephant in the room with online learning has been that these courses don’t equate with the quality in face-to-face courses.”

That’s the kind of thing expected as universities transform from colleges to corporations. What was already known is given more impressive sounding labels.

Skip all that. Scan this: “Some college faculty members don’t trust the courses or actively work against their formation”.

No wonder. What professor wants his material packed into a YouTube video—which makes his presence redundant. At least as far as teaching goes. He’s still needed to solicit money from Leviathan, which in turn is needed to feed the Dean.

Make that seemingly redundant. Professors have always been in the habit of writing books, which distill and concentrate the same material. And these books could have been, but largely were not, read free.

The only difference I can see between off- and on-line “open” courses is that the latter are easier to track and award “credits” for. I.e. they are useful steps toward “degrees”, the thing the vast majority of students and employers desire.

There’s probably a way to do courses on-line, but they’ll mimic the off-line ones. That is, lectures to groups of students which allow back-and-forth questions and answers, homework (reading!), writing, unique exams (and not generic multiple-choice tests which any college might use for the course of the same name). But there goes most of the cost savings and all of the hype. And out the door go the consultants and “education specialists.”

September 18, 2013 | 10 Comments

Abortion Safety: Doctors V. Nurses & Physician Assistants & Midwives—Part II

Update 4

Too late. 10 October 2013.


Moved this to top because the bill allowing non-doctors to perform abortions is on Gov. Brown’s desk. He’ll likely sign, but those who care about “women’s health” should be careful what they wish for. I tender this critique on the very rare chance it will cause Brown to change his mind.


Be sure to first read Part I where the language used in the study and in this analysis is explained. (It will be obvious in your comments whether you have done so.)

Today we analyze the paper, “Safety of Aspiration Abortion Performed by Nurse Practitioners, Certified Nurse Midwives, and Physician Assistants Under a California Legal Waiver” in the American Journal of Public Health (2013 March; 103(3): 454–461) by Tracy A. Weitz and others (link).

Executive Summary

Knowing that many won’t or can’t read everything below, my main findings are provided here for ease. I wish this could be shorter, but not everything is easy.

The study stinks and can’t be trusted. There is every indication the work was done sloppily. Peer review failed to catch some pretty glaring mistakes, a not-rare occurrence. The protocol was a mess. The actual complication rates reported by the study were deflated because of an unwarranted, extremely dicey assumption about missing data. It appears that non-doctors have complication rates about twice that of doctors, even though the authors claim they are “clinically” the same.

Update New readers interested in commenting may also enjoy this article on the genetic fallacy.

Sample Size

The paper reported that 13,807 women agreed to participate in the study. Of these, 2320 were excluded because they were used to train the non-doctors. The complication rates for the training cases were never given—peer review should have insisted they be reported. How many mistakes are made by non-doctor trainees as opposed to doctor trainees? We never learn.

That left 11,487. The authors next report “[a]s a result of a protocol violation at 1 site, 79 patients in the physician group were excluded.” This should leave 11,408, yet the authors say “The final analytic sample size was 11 487; of these procedures, 5812 were performed by physicians and 5675 were performed by NPs, CNMs, or PAs.” It appears that it should read 5733 for physicians.

Now 5812 + 5675 = 11,487. Keep these numbers in mind. They were used for all subsequent calculations.
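Don’t trust me: check the bookkeeping yourself. A quick sketch in Python, using only the counts the paper reports:

```python
# Counts as reported in the paper
agreed = 13807
training_excluded = 2320
protocol_excluded = 79

remaining = agreed - training_excluded           # 11487
after_violation = remaining - protocol_excluded  # 11408

reported_sum = 5812 + 5675   # the paper's stated group sizes sum to 11487
corrected_sum = 5733 + 5675  # with 5812 - 79 = 5733 physicians: 11408

print(remaining, after_violation, reported_sum, corrected_sum)
```

The reported group sizes sum back to 11,487, not 11,408; only the corrected physician count of 5733 makes the exclusions add up.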


The authors’ concern was whether the killing of lives inside the uteri of women by “doctors” or “physicians” (see Part I for definitions) or by “nurse practitioners” (NPs), “certified nurse midwives” (CNMs), and “physician assistants” (PAs) resulted in greater or lesser rates of “complications.”

What is a “complication”? The authors never fully say. There are two parts to any such definition: the time span over which complications occur and the specification of what counts as one. For the time span they say this:

Each patient received $5 and a follow-up survey about medical problems after the [killing] to capture any delayed postprocedure complications. If patients did not return the survey, clinic staff made at least 3 attempts to administer the survey by phone. If the patient experienced post[killing] problems, she was asked a defined set of questions to obtain medical details. Additionally, staff conducted patient chart abstractions 2 to 4 weeks after [killing] to ensure delayed complications were captured.

It appears—but only appears—from this that immediate, i.e. on-site, post-procedure complications were recorded. Others were self-reported by some of the patients from “2 to 4 weeks” after. This is a sloppy protocol. A rigorous one would have specified the exact time window for follow ups. As it is, a site that abstracted charts at two weeks would miss complications arising between two and four weeks. All these (potential) complications went unrecorded; thus the study underestimates the true complication rate at 4 weeks.

As is typical in medical trials, there was significant loss to follow up, i.e. not every woman could be contacted. The authors say that only 69.5% of the 11,408/11,487 were measured.

Their next step was highly problematic: they decided to code each missing value as “no complication”.

They explain this by assuming that any un-contacted woman who did suffer a complication would have gone to “the facility” where she had her killing and reported it. Indeed, 41 women did so. But to say that all 3479/3503 (depending on what grand total we use) did is completely unwarranted and even ridiculous: the women could have seen their own doctors or “rode out” the complications at home, not contacting anybody. This is a shocking error.

We also don’t know how many of the women were lost to follow up in each group. Were most lost in the doctor group, perhaps because these women felt fine and because those in the non-doctor group had higher complication rates? We never learn. But, just to have some feel, assume the loss was (roughly) equal in each group. That leaves (ignoring round off) 7928/7983 in total, or 3984/4039 in the doctor group and 3944 in the non-doctor group.
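Here is the working for those group sizes, under the stated (and unverifiable) assumption that the 69.5% follow-up rate applied equally to both groups; the two doctor figures correspond to the reported 5812 and the corrected 5733:

```python
followed = 0.695  # fraction of women successfully followed up

non_doctors = round(5675 * followed)       # 3944
doctors_reported = round(5812 * followed)  # 4039
doctors_corrected = round(5733 * followed) # 3984

print(doctors_corrected, doctors_reported, non_doctors)
print(doctors_corrected + non_doctors, doctors_reported + non_doctors)  # 7928 7983
```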

Another error: we never learn whether the complications were ad hoc or whether they were pre-specified. If they were defined, as it appears, “on-the-fly” the authors’ statistical findings are of no generality. Peer-review let us down here (as it so often does).

We can still learn some things, however. Minor complications, to the authors, are at least (from their “Outcomes” section):

  • incomplete [killing],
  • failed [killing],
  • bleeding not requiring transfusion,
  • hematometra (retention of blood in the uterus),
  • infection,
  • endocervical injury,
  • anesthesia-related reactions,
  • uncomplicated uterine perforation,
  • symptomatic intrauterine material,
  • urinary tract infection,
  • possible false passage,
  • probable gastroenteritis,
  • allergic reaction,
  • fever of unknown origin,
  • intrauterine device-related bleeding,
  • sedation drug errors,
  • inability to urinate,
  • vaginitis.

Major complications included:

  • uterine perforations,
  • infections (presumably worse than minor),
  • hemorrhage.

That the list is incomplete is shown by the absence of some common complications like sepsis, septic shock, and death (presumably these and others were 0% for each group; “common” in the sense that these are tracked in other studies).

Whatever a “complication” was—and we must remember that the list was incomplete—the authors expected “rates ranging from 1.3% to 4.4%”; specifically, in their sample-size calculations they used the “rate of 2.5%, which was based on mean complication rates cited in the published literature.” Keep this in mind.

Because of the way the study was designed (discussed below), the authors “anticipated a slightly higher number of complications among newly trained NPs, CNMs, and PAs than among the experienced physicians.” Was this the case? Here are the complications given in tabular form with rates (percentages) for doctors (using the reported n = 5812 killings) and non-doctors (n = 5675 killings):

Complication                                   Doctors   Non-doctors
incomplete [killing]                           0.155     0.423
failed [killing]                               0.120     0.194
bleeding not requiring transfusion             0         0.035
hematometra                                    0.052     0.282
infection                                      0.120     0.123
endocervical injury                            0.344     0.352
anesthesia-related reactions                   0.172     0.176
uncomplicated uterine perforation              0         0.053
symptomatic intrauterine material              0.275     0.282
urinary tract infection                        0.017     0
possible false passage                         0.017     0
probable gastroenteritis                       0.017     0
allergic reaction                              0.017     0
fever of unknown origin                        0         0.018
intrauterine device-related bleeding           0         0.018
sedation drug errors                           0         0.053
inability to urinate                           0         0.018
vaginitis                                      0         0.018
uterine perforations; infections; hemorrhage   0.052     0.053

The authors did not specify the breakdown for major complications for doctor and non-doctors, except to say there were 3 instances in each group. This is a mistake.

Now except for four minor complications the rates were higher for non-doctors. Where the doctors had the higher rate, there was only 1 instance of each complication, and two of these were uncertain (they might not have been complications after all). This ordering is the same if only the observed, not-guessed-at data is used.

Overall, using the reported numbers, doctors’ rates were 0.9% and non-doctors’ were twice that at 1.8%; these figures rely on the unwarranted assumption that all those lost to follow up did not suffer a complication. Using just the observed and not guessed-at data, the rates were 52/3984 = 1.3% (or 52/4039 = 1.29%) for doctors and 100/3944 = 2.5% for non-doctors. Note that these larger rates are more in line with what was expected from the literature.
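These rates follow directly from the reported complication counts—52 in the doctor group, 100 in the non-doctor group—divided by either choice of denominator:

```python
doc_comp, non_comp = 52, 100  # complication counts by group

# Rates using the reported denominators (missing data coded "no complication")
rate_doc_reported = 100 * doc_comp / 5812   # ~0.9%
rate_non_reported = 100 * non_comp / 5675   # ~1.8%

# Rates using only the women actually followed up
rate_doc_followed = 100 * doc_comp / 3984   # ~1.3%
rate_non_followed = 100 * non_comp / 3944   # ~2.5%

print(round(rate_doc_reported, 1), round(rate_non_reported, 1))
print(round(rate_doc_followed, 1), round(rate_non_followed, 1))
```

Either way, the non-doctor rate is about twice the doctor rate.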

The raw conclusion is thus: that for these practitioners and at these locations and for these females, doctors had complication rates about half those of non-doctors.

Yet the conclusion of the authors was (from the Abstract):

Abortion complications were clinically equivalent between newly trained NPs, CNMs, and PAs and physicians…

Why the discrepancy? The miracle of statistics. But first, the study design.

Study Design

The study was not blinded. Those recording complications knew who did the procedures and knew the goal of the study. Never a good idea.

Women presenting to the 22 facilities were asked whether they wished to have their killing done by an NP, CNM, or PA. If a woman agreed, one of the 28 NPs, 5 CNMs, and 7 PAs did so. But sometimes—they never say how often; more sloppiness—she was sent to a doctor if “clinical flow necessitated reorganizing patients”. Or she was sent to one of 96 doctors if she requested one.

This loose protocol is problematic. Could women who saw themselves as sicker or tougher to treat (or whatever) have requested doctors more often than non-doctors? It’s possible. In which case, the complication rate difference between the two groups would be artificially narrowed.

About half the women (in each group) were “repeat customers”, incidentally, with about one-fifth (in each group) having had two or more previous killings.

Statistical Interlude

One real question might be: “Which is less dangerous? Getting a killing from a doctor or a non-doctor?”

Now the evidence before us is that, in this study, (even assuming the reported numbers as accurate) non-doctors were associated with complications at about twice the rate of doctors. But what about future killings? Will they, too, have about twice as many complications for non-doctors?

To not answer that, but to give the appearance of answering that, the authors used two classical (frequentist) statistical methods: one called “noninferiority analysis” and another called “propensity score analysis.”

Propensity scores are controversial (Yours Truly does not like them one bit) and are attempts to “match” samples over a set of characteristics. Suppose, for example, the doctor group had more smoker patients than the non-doctor group and so forth for other measured characteristics. Propensity scores would statistically adjust the measured outcome numbers so that characteristics were more “balanced.” Or something. Anyway, even with this “trick”, the authors found that complications were “2.12…times as likely to result from abortions by NPs, CNMs, and PAs as by physicians.” Since this is roughly the same as the raw data, there is no story here.
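To give a feel for what propensity methods aim at, here is a deliberately toy sketch with invented numbers and a single binary covariate (smoking status). It shows inverse-propensity weighting, one common variant of the idea; it is not the authors’ actual adjustment, which was buried in their regression:

```python
# (group, smoker, n_patients, n_complications) -- all numbers invented
cells = [
    ("doctor",     True,  60, 6),
    ("doctor",     False, 40, 2),
    ("non-doctor", True,  40, 4),
    ("non-doctor", False, 60, 3),
]

def propensity(smoker):
    """Estimated P(doctor group | smoking status), from the table itself."""
    doc = sum(n for g, s, n, _ in cells if g == "doctor" and s == smoker)
    tot = sum(n for g, s, n, _ in cells if s == smoker)
    return doc / tot

def ipw_rate(group):
    """Complication rate after inverse-propensity weighting."""
    num = den = 0.0
    for g, s, n, c in cells:
        if g != group:
            continue
        e = propensity(s)
        w = 1.0 / e if group == "doctor" else 1.0 / (1.0 - e)
        num += c * w
        den += n * w
    return num / den

raw = {g: sum(c for g2, _, _, c in cells if g2 == g) /
          sum(n for g2, _, n, _ in cells if g2 == g)
       for g in ("doctor", "non-doctor")}
print(raw["doctor"], raw["non-doctor"])
print(round(ipw_rate("doctor"), 3), round(ipw_rate("non-doctor"), 3))
```

In this made-up table the weighting pulls the two groups’ raw rates (0.08 and 0.07) into exact agreement; with real data the “balance” achieved is only as good as the covariates somebody thought to measure.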

Or so it would seem. For the authors next engaged a complex statistical model (for the noninferiority piece), once using the propensity scoring and once not, and reported no difference between the groups.

We fit a mixed-effects logistic regression model with crossed random effects to obtain odds ratios that account for the lack of independence between [killings] performed by the same clinician and within the same facility and cross-classification of providers across facilities. We included variables associated with complications in bivariate analyses at P < .05 in the multivariate model in addition to other clinically relevant covariates to adjust for potential confounders.

It is a mystery which “clinically relevant covariates” made it into the models: all of them (from Table 1)? Some? Others not listed? Who knows.

What they should have done is listed, for each practitioner, the number of killings he performed and the number of and kind of complications which resulted. We never learn this information. Site was in the model, as it should have been (some sites presumably have higher complication rates, some lower; just as some practitioners have higher rates, some lower), yet we never learn site-statistics, either. We also never learn if complication type clustered by practitioner or site.

We never see the model (no coefficients for any of the covariates, etc.). All that is reported is that the “corresponding risk differences were 0.70% (95% CI = 0.29, 1.10) in overall complications between provider groups.” Well, this is all suspect, especially considering the model is using the dodgy numbers. While there are good reasons for posting the data by practitioner and by site, there is little reason to trust this (hidden) model. It is far too complicated, and there are too many “levers” to push in it to trust that it was done correctly.

In any case, it is the wrong model. What should be given is the prediction: not how many complications there were—we already know that—but how many we could expect in the future assuming conditions remain the same. Would future groups of patients, as did these patients, suffer more complications at the hands of non-doctors? Or fewer? We just don’t know.

Wrapping Up

There were 40 non-doctors and 96 doctors doing the 5675 and 5812 killings. That’s an average of 142 killings for each non-doctor and 61 killings for each doctor. In other words, the inexperienced non-doctors did, on average, more than twice as many killings as the doctors. An enormous imbalance!
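The division, for the record:

```python
# Average killings per practitioner, from the reported totals
per_non_doctor = round(5675 / 40)  # 142
per_doctor = round(5812 / 96)      # 61
print(per_non_doctor, per_doctor)
```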

The study ran from “August 2007 and August 2011.” This is a curiously long time. Were the same practitioners in the study for its duration? Or did old ones retire or move on and new ones replace them? We never learn. The authors report that non-doctors had a “mean of 1.5 years” of killing experience but that doctors had 14 years. Given the study lasted four years, and that training was part of the protocol, this appears to say that the non-doctors were not constant throughout the study. How could this affect the complication rates? We never learn.

All in all, this was a very poorly run study. The evidence from it cannot be used to say much either way: except that just because a study appears in a “peer-reviewed journal” it does not mean the results are trustworthy. But we already knew that.


How To Mislead With P-values: Logistic Regression Example

Today’s evidence is not new; is, in fact, well known. Well, make that just plain known. It’s learned and then forgotten, dismissed. Everybody knows about these kinds of mistakes, but everybody is sure they never happen to them. They’re too careful; they’re experts; they care.

It’s too easy to generate “significant” answers which are anything but significant. Here’s yet more—how much do you need!—proof. The pictures below show how easy it is to falsely generate “significance” by the simple trick of adding “independent” or “control variables” to logistic regression models, something which everybody does.

Let’s begin!

Recall our series on selling fear and the difference between absolute and relative risk, and how easy it is to scream, “But what about the children!” using classical techniques. (Read that link for a definition of a p-value.) We anchored on EPA’s thinking that an “excess” probability of catching some malady when exposed to something regulatable of around 1 in 10 thousand is frightening. For our fun below, be generous and double it.

Suppose the probability of having the malady is the same for exposed and not exposed people—in other words, knowing people were exposed does not change our judgment that they’ll develop the malady—and answer this question: what should any good statistical method do? State with reasonable certainty there aren’t different chances of infection between being exposed and not exposed groups, that’s what.

Frequentist methods won’t do this because they never state the probability of any hypothesis. They instead answer a question nobody asked, about the values of (functions of) parameters in experiments nobody ran. In other words, they give p-values. Find one less than the magic number and your hypothesis is believed true—in effect and by would-be regulators.

Logistic regression

Logistic regression is a common method to identify whether exposure is “statistically significant”. Readers interested in the formalities should look at the footnotes in the above-linked series. The idea is simple enough: data showing whether people have the malady or not, and whether they were exposed or not, is fed into the model. If the parameter associated with exposure has a wee p-value, then exposure is believed to be trouble.

So, given our assumption that the probability of having the malady is identical in both groups, a logistic regression fed data consonant with our assumption shouldn’t show wee p-values. And the model won’t, most of the time. But it can be fooled into doing so, and easily. Here’s how.

Not just exposed/not-exposed data is input to these models, but “controls” are, too; sometimes called “independent” or “control variables.” These are things which might affect the chance of developing the malady. Age, sex, weight or BMI, smoking status, prior medical history, education, and on and on. Indeed models which don’t use controls aren’t considered terribly scientific.

Let’s control for things in our model, using the same data consonant with probabilities (of having the malady) the same in both groups. The model should show the same non-statistically significant p-value for the exposure parameter, right? Well, it won’t. The p-value for exposure will on average become wee-er (yes, wee-er). Add in a second control and the exposure p-value becomes wee-er still. Keep going and eventually you have a “statistically significant” model which “proves” exposure’s evil effects. Nice, right?
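The machinery behind such a simulation is not mysterious. Below is a pure-Python sketch—mine, not the original R code from the footnotes—that fits one logistic regression by Newton-Raphson and extracts the Wald p-value for the exposure coefficient. To keep a single toy fit fast and stable it uses a base rate far above the EPA-style 2e-4 and a much smaller sample; reproducing the figures means looping this at full scale, as the footnote describes:

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def exposure_pvalue(X, y, j=1, iters=30):
    """Fit a logistic regression by Newton-Raphson; return the
    two-sided Wald p-value for coefficient j (the exposure indicator)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, eta))))
            w = mu * (1.0 - mu)
            for a in range(k):
                grad[a] += (yi - mu) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    # Var(beta_j) is the j-th diagonal entry of the inverse Hessian
    ej = [1.0 if a == j else 0.0 for a in range(k)]
    se = math.sqrt(solve(hess, ej)[j])
    z = beta[j] / se
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# One simulated data set: same malady probability in both groups,
# plus three made-up "controls" that have nothing to do with anything.
random.seed(1)
base_rate = 0.05  # far above the post's 2e-4, so this small fit is stable
X, y = [], []
for group in (0, 1):  # 0 = not exposed, 1 = exposed
    for _ in range(1000):
        y.append(1 if random.random() < base_rate else 0)
        X.append([1.0, float(group)] + [random.gauss(0.0, 1.0) for _ in range(3)])

pv = exposure_pvalue(X, y)
print(round(pv, 3))
```

Loop this over many simulated data sets, count how often pv falls below the magic number, and you have, in principle, the solid curve in the figures below.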


Take a gander at this:

Figure 1

Follow me closely. The solid curve is the proportion of times in a simulation the p-values associated with exposure were less than the magic number as the number of controls increase. Only here, the controls are just made up numbers. I fed 20,000 simulated malady yes-or-no data points consistent with the EPA’s threshold (times 2!) into a logistic regression model, 10,000 for “exposed” and 10,000 for “not-exposed.” For the point labeled “Number of Useless Xs” equal to 0, that’s all I did. Concentrate on that point (lower-left).

About 0.05 of the 1,000 simulations gave wee p-values (dotted line), which is what frequentist theory predicts. Okay so far. Now add 1 useless control (or “X”), i.e. 20,000 made-up numbers[1] which were picked out of thin air. Notice that now about 20% of the simulations gave “statistical significance.” Not so good: it should still be 5%.

Add some more useless numbers and look what happens: it becomes almost a certainty that the p-value associated with exposure will fall less than the magic number. In other words, adding in “controls” guarantees you’ll be making a mistake and saying exposure is dangerous when it isn’t.[2] How about that? Readers needing grant justifications should be taking notes.

The dashed line is for p-values less than the not-so-magic number of 0.1, which is sometimes used in desperation when a p-value of 0.05 isn’t found.

The number of “controls” here is small compared with many studies, like the Jerrett papers referenced in the links above; Jerrett had over forty. Anyway, these numbers certainly aren’t out of line for most research.

A sample of 20,000 is a lot, too (but Jerrett had over 70,000), so here’s the same plot with 1,000 per group:

Figure 2

Same idea, except here notice the curve starts well below 0.05; indeed, at 0. Pay attention! Remember: there are no “controls” at this point. This happens because it’s impossible to get a wee p-value for sample sizes this small when the probability of catching the malady is low. Get it? You cannot show “significance” unless you add in controls. Even just 10 are enough to give a 50-50 chance of falsely claiming success (if it’s a success to say exposure is bad for you).

Key lesson: even with nothing going on, it’s still possible to say something is, as long as you’re willing to put in the effort.[3]

Update You might suspect this “trick” has been played when in reading a paper you never discover the “raw” numbers, where all that is presented is a model. This does happen.


[1] To make the Xs in R: rnorm(1)*rnorm(20000); the first rnorm is for a varying “coefficient”. The logistic regression simulations were done 1,000 times for each fixed sample size at each number of fake Xs, using the base rate of 2e-4 for both groups and adding the Xs in linearly. Don’t trust me: do it yourself.

[2] The wrinkle is that some researchers won’t keep some controls in the model unless they are also “statistically significant.” But some which are not are also kept. The effect is difficult to generalize, but is in the direction of what we’ve done here. Why? Because, of course, in these 1,000 simulations many of the fake Xs were statistically significant. Then look at this (if you need more convincing): a picture as above but only keeping, in each iteration, those Xs which were “significant.” Same story, except it’s even easier to reach “significance”.

[3] The only thing wrong with the pictures above is that half the time the “significance” in these simulations indicates a negative effect of exposure. Therefore, if researchers are dead set on keeping only positive effects, the numbers (everywhere but at 0 Xs) should be divided by about 2. Even then, p-values perform dismally. See Jerrett’s paper, where he has exposure to increasing ozone as beneficial for lung diseases. Although this was the largest effect he discovered, he glossed over it by calling it “small.” P-values blind.

September 17, 2013 | 12 Comments

Bacteria Found In Holy Water

Safe at last!
Study making the rounds yesterday was “Holy springs and holy water: underestimated sources of illness?” in the Journal of Water & Health by Kirschner (a national chess master) and others.

They sampled holy water in Vienna churches and hospital chapels and discovered traces of Pseudomonas aeruginosa and Staphylococcus aureus, and where these come from you don’t want to know. However, it is clear from this evidence that at least some parishioners did not heed sister’s rule to wash after going.

The authors also traveled the city to its holy springs and found that about eighty percent of these had various impurities, some of them at (European) regulatable levels.

Doubtless the findings of Kirschner are true—and of absolutely no surprise to anybody who reads (or helps create) the medical literature. Three or four times a year new studies issue forth showing that doorknobs have bacteria on them, or that the pencil you’re chewing on has lingering traces of some bug, or that doctors’ ties (I did this) are not only ugly but a happy home to nasties of all sorts.

There are so many studies like this that it is safe to conclude that absolutely everywhere and everything is infected, and that the only sterile place on the planet is in one of those bubbles John Travolta gadded about in, in the beloved 1976 classic The Boy in the Plastic Bubble.

Since the stated purpose of the authors was to “raise public awareness” of the dangers lurking in holy water, I’ll do my bit to help. It’s good advice not to sip from the parish font or to get too cozy with the aspersory. Not only could it be injurious to your health, but it’s in bad taste.

The authors also recommend not drinking from holy springs because they fret over its little wigglies. But since there’s little evidence of a practical effect from this—lots of people drink from the springs without keeling over—it’s probably not worth changing your habits. Keep opening doors, too, and chewing on pencils and go to your doctor even though he wears a tie.

(There’s a nun joke in there somewhere, but I’m still jet lagged. Invent your own.)