Too late. 10 October 2013.
Moved this to top because the bill allowing non-doctors to perform abortions is on Gov. Brown’s desk. He’ll likely sign, but those who care about “women’s health” should be careful what they wish for. I tender this critique on the very rare chance it will cause Brown to change his mind.
Be sure to first read Part I where the language used in the study and in this analysis is explained. (It will be obvious in your comments whether you have done so.)
Today we analyze the paper, “Safety of Aspiration Abortion Performed by Nurse Practitioners, Certified Nurse Midwives, and Physician Assistants Under a California Legal Waiver” in the American Journal of Public Health (2013 March; 103(3): 454–461) by Tracy A. Weitz and others (link).
Knowing that many won’t or can’t read everything below, my main findings are provided here for ease. I wish this could be shorter, but not everything is easy.
The study stinks and can’t be trusted. There is every indication the work was done sloppily. Peer review failed to catch some pretty glaring mistakes, a not-rare occurrence. The protocol was a mess. The actual complication rates reported by the study were deflated because of an unwarranted, extremely dicey assumption about missing data. It appears that non-doctors have complication rates about twice those of doctors, even though the authors claim they are “clinically” the same.
Update: New readers interested in commenting may also enjoy this article on the genetic fallacy.
The paper reported that 13,807 women agreed to participate in the study. Of these, 2320 were excluded because they were used to train the non-doctors. The complication rates for the training were never given—peer review should have insisted they were. How many mistakes are made by non-doctor trainees as opposed to doctor trainees? We never learn.
That left 11,487. The authors next report “[a]s a result of a protocol violation at 1 site, 79 patients in the physician group were excluded.” This should leave 11,408, yet the authors say “The final analytic sample size was 11 487; of these procedures, 5812 were performed by physicians and 5675 were performed by NPs, CNMs, or PAs.” It appears that it should read 5733 for physicians.
Now 5812 + 5675 = 11,487. Keep these numbers in mind. They were used for all subsequent calculations.
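The bookkeeping above is easy to check. Here is a minimal sketch in Python that uses only the counts quoted from the paper; the variable names are mine, not the authors'.

```python
# Counts quoted from the paper; variable names are illustrative only.
consented = 13_807
training_cases = 2_320        # procedures used to train the non-doctors
protocol_exclusions = 79      # physician-group patients excluded at one site
reported_physician = 5_812
reported_non_physician = 5_675

after_training = consented - training_cases
after_exclusions = after_training - protocol_exclusions
physician_if_excluded = reported_physician - protocol_exclusions
reported_total = reported_physician + reported_non_physician

print(after_training)         # 11487
print(after_exclusions)       # 11408
print(physician_if_excluded)  # 5733
print(reported_total)         # 11487: the 79 were apparently never subtracted
```

The reported total only adds up if the 79 excluded patients are still counted in the physician group.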
The authors’ concern was whether the killing of lives inside the uteri of women by “doctors” or “physicians” (see Part I for definitions) or by “nurse practitioners” (NPs), “certified nurse midwives” (CNMs), and “physician assistants” (PAs) resulted in greater or lesser rates of “complications.”
What is a “complication”? The authors never fully say. There are two parts to any such definition: the time span over which complications occur and the specification of what counts as one. For the time span they say this:
Each patient received $5 and a follow-up survey about medical problems after the [killing] to capture any delayed postprocedure complications. If patients did not return the survey, clinic staff made at least 3 attempts to administer the survey by phone. If the patient experienced post[killing] problems, she was asked a defined set of questions to obtain medical details. Additionally, staff conducted patient chart abstractions 2 to 4 weeks after [killing] to ensure delayed complications were captured.
It appears—but only appears—from this that immediate, i.e. on-site, post-procedure complications were recorded. Others were self-reported by some of the patients from “2 to 4 weeks” after. This is a sloppy protocol. A rigorous one would have specified the exact time window for follow ups. As it is, there could have been complications after two weeks but before four which would be missed by the lax protocol. All these (potential) complications went unrecorded, thus the study underestimates the true complication rate at 4 weeks.
As is typical in medical trials, there was significant loss to follow up, i.e. not every woman could be contacted. The authors say that follow-up data were obtained for only 69.5% of the 11,408 (or 11,487, depending on which total is right) women.
Their next step was highly problematic: they decided to code each missing value as “no complication”.
They explain this by assuming that any un-contacted woman who did suffer a complication would have gone to “the facility” where she had her killing and reported it. Indeed, 41 women did so. But to assume that every one of the 3479 or 3503 un-contacted women (depending on which grand total we use) would have done so is completely unwarranted and even ridiculous: the women could have seen their own doctors or “ridden out” the complications at home, not contacting anybody. This is a shocking error.
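To see how much rides on that assumption, here is a rough sensitivity sketch in Python. It is my own illustration, not anything the authors did. It takes the reported total of 11,487, the 69.5% follow-up figure, and the 52 + 100 recorded complications quoted later in this analysis, and asks what the overall rate would be if the un-contacted women had complications at some assumed rate. The function name and the three scenarios are hypothetical choices; the 2.5% scenario borrows the literature mean the authors used for their sample-size calculation.

```python
# Figures quoted in this analysis; names and scenarios are illustrative only.
N_TOTAL = 11_487        # reported analytic sample
FOLLOW_UP = 0.695       # fraction of women with follow-up data
EVENTS = 52 + 100       # recorded complications, doctors plus non-doctors

def overall_rate(rate_among_lost: float) -> float:
    """Overall complication rate (%) if the women lost to follow-up had
    complications at `rate_among_lost` (a proportion between 0 and 1)."""
    n_lost = N_TOTAL * (1 - FOLLOW_UP)
    return 100 * (EVENTS + rate_among_lost * n_lost) / N_TOTAL

# Rate among the followed women, treating every recorded event as theirs
# (a simplification: 41 women reported complications on their own).
observed_rate = EVENTS / (N_TOTAL * FOLLOW_UP)

scenarios = [
    ("authors' assumption: no complications among the lost", 0.0),
    ("lost women had the same rate as followed women", observed_rate),
    ("lost women had the 2.5% literature rate", 0.025),
]
for label, rate in scenarios:
    print(f"{label}: {overall_rate(rate):.2f}%")
```

The authors' coding is the scenario that yields the lowest possible overall rate; any nonzero rate among the un-contacted women pushes the figure up.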
We also don’t know how many of the women were lost to follow up in each group. Were most lost in the doctor group, perhaps because these women felt fine and because those in the non-doctor group had higher complication rates? We never learn. But, just to have some feel, assume the loss was (roughly) equal in each group. That leaves (ignoring round off) 7928 or 7983 in total: 3984 or 4039 in the doctor group (depending on whether the 79 exclusions are subtracted) and 3944 in the non-doctor group.
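The follow-up counts can be reproduced with a few lines of Python. This sketch rests on the assumption stated above, that both groups had the same 69.5% follow-up rate, which the paper never confirms. The helper name is mine.

```python
FOLLOW_UP_PER_MILLE = 695   # 69.5%, kept as an integer to avoid float noise

def followed(n: int) -> int:
    """Women with follow-up data, truncated to a whole number
    (assumes the same follow-up rate applies to every group)."""
    return n * FOLLOW_UP_PER_MILLE // 1000

groups = {
    "total, 79 excluded": 11_408,
    "total, as reported": 11_487,
    "doctors, 79 excluded": 5_733,
    "doctors, as reported": 5_812,
    "non-doctors": 5_675,
}
for label, n in groups.items():
    print(f"{label}: {followed(n)} of {n}")
```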
Another error: we never learn whether the complication categories were pre-specified or ad hoc. If they were defined, as it appears, “on-the-fly”, the authors’ statistical findings are of no generality. Peer review let us down here (as it so often does).
We can still learn some things, however. Minor complications, to the authors, are at least (from their “Outcomes” section):
- incomplete [killing],
- failed [killing],
- bleeding not requiring transfusion,
- hematometra (retention of blood in the uterus),
- endocervical injury,
- anesthesia-related reactions,
- uncomplicated uterine perforation,
- symptomatic intrauterine material,
- urinary tract infection,
- possible false passage,
- probable gastroenteritis,
- allergic reaction,
- fever of unknown origin,
- intrauterine device-related bleeding,
- sedation drug errors,
- inability to urinate.
Major complications included:
- uterine perforations,
- infections (presumably worse than minor).
That this list is incomplete is shown by the absence of some common complications, such as sepsis, septic shock, and death (presumably these and others were 0% for each group; “common” in the sense that these are tracked in other studies).
Whatever a “complication” was—and we must remember that the list was incomplete—the authors expected “rates ranging from 1.3% to 4.4%”; specifically, in their sample-size calculations they used the “rate of 2.5%, which was based on mean complication rates cited in the published literature.” Keep this in mind.
Because of the way the study was designed (discussed below), the authors “anticipated a slightly higher number of complications among newly trained NPs, CNMs, and PAs than among the experienced physicians.” Was this the case? Here are the complications given in tabular form with rates (percentages) for doctors (using the reported n = 5812 killings) and non-doctors (n = 5675 killings):
| Complication | Doctors (%) | Non-doctors (%) |
|---|---|---|
| bleeding not requiring transfusion | 0 | 0.035 |
| uncomplicated uterine perforation | 0 | 0.053 |
| symptomatic intrauterine material | 0.275 | 0.282 |
| urinary tract infection | 0.017 | 0 |
| possible false passage | 0.017 | 0 |
| fever of unknown origin | 0 | 0.018 |
| intrauterine device-related bleeding | 0 | 0.018 |
| sedation drug errors | 0 | 0.053 |
| inability to urinate | 0 | 0.018 |
| uterine perforations; infections; hemorrhage | 0.052 | 0.053 |
The authors did not specify the breakdown of major complications for doctors and non-doctors, except to say there were 3 instances in each group. This is a mistake.
Now, except for four minor complications, the rates were higher for non-doctors. Where the doctors had higher rates, there was only 1 instance of each complication, and two of these were uncertain (they might not have been complications after all). This result (the ordering) is the same if only the observed, not guessed-at, data are used.
Overall, using the reported numbers (and therefore the unwarranted assumption that nobody lost to follow up suffered a complication), doctors’ rates were 0.9% and non-doctors’ were about twice that at 1.8%. Using just the observed and not guessed-at data, the rates were 52/3984 = 1.3% or 52/4039 = 1.29% (doctors, depending on which group total is used) and 100/3944 = 2.5% (non-doctors). Note that these larger rates are more in line with what was expected from the literature.
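For anyone who wants to check the percentages, a short Python sketch follows. The event counts (52 and 100) and the reported group sizes come from the figures above; the observed-only denominators rest on my assumption of equal 69.5% follow-up in both groups, which the paper does not report. The function name is illustrative.

```python
def pct(events: int, n: int) -> float:
    """Complication rate as a percentage."""
    return 100 * events / n

DOCTOR_EVENTS, NON_DOCTOR_EVENTS = 52, 100

# Reported denominators: every woman lost to follow-up counted as "no complication".
reported_doctor = pct(DOCTOR_EVENTS, 5812)
reported_non_doctor = pct(NON_DOCTOR_EVENTS, 5675)

# Observed-only denominators: assumes 69.5% follow-up in both groups (my assumption).
observed_doctor = pct(DOCTOR_EVENTS, 4039)
observed_non_doctor = pct(NON_DOCTOR_EVENTS, 3944)

ratio_reported = reported_non_doctor / reported_doctor
ratio_observed = observed_non_doctor / observed_doctor

print(f"reported:      {reported_doctor:.1f}% vs {reported_non_doctor:.1f}%")
print(f"observed only: {observed_doctor:.2f}% vs {observed_non_doctor:.1f}%")
print(f"ratios: {ratio_reported:.2f} (reported), {ratio_observed:.2f} (observed only)")
```

The ratio stays close to 2 under either denominator, because an equal follow-up rate scales both groups by the same factor. Only unequal loss between the groups would change it.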
The raw conclusion is thus: that for these practitioners and at these locations and for these females, doctors had complication rates about half those of non-doctors.
Yet the conclusion of the authors was (from the Abstract):
Abortion complications were clinically equivalent between newly trained NPs, CNMs, and PAs and physicians…
Why the discrepancy? The miracle of statistics. But first, the study design.
The study was not blinded. Those recording complications knew who did the procedures and knew the goal of the study. Never a good idea.
Each woman presenting to one of the 22 facilities was asked whether she wished to have her killing done by an NP, CNM, or PA. If she agreed, one of the 28 NPs, 5 CNMs, and 7 PAs did so. But sometimes (they never say how often; more sloppiness) she was sent to a doctor if “clinical flow necessitated reorganizing patients”. Or she was sent to one of 96 doctors if she requested one.
This loose protocol is problematic. Could women who saw themselves as sicker or tougher to treat (or whatever) have requested doctors more often than non-doctors? It’s possible. In which case, the complication rate difference between the two groups would be artificially narrowed.
About half the women (in each group) were “repeat customers”, incidentally, with about one-fifth (in each group) having had two or more previous killings.
One real question might be: “Which is less dangerous? Getting a killing from a doctor or a non-doctor?”
Now the evidence before us is that, in this study, (even assuming the reported numbers as accurate) non-doctors were associated with complications at about twice the rate of doctors. But what about future killings? Will they, too, have about twice as many complications for non-doctors?
To not answer that, but to give the appearance of answering that, the authors used two classical (frequentist) statistical methods: one called “noninferiority analysis” and another called “propensity score analysis.”
Propensity scores are controversial (Yours Truly does not like them one bit) and are attempts to “match” samples over a set of characteristics. Suppose, for example, the doctor group had more smoker patients than the non-doctor group and so forth for other measured characteristics. Propensity scores would statistically adjust the measured outcome numbers so that characteristics were more “balanced.” Or something. Anyway, even with this “trick”, the authors found that complications were “2.12…times as likely to result from abortions by NPs, CNMs, and PAs as by physicians.” Since this is roughly the same as the raw data, there is no story here.
Or so it would seem. For the authors next engaged a complex statistical model (for the noninferiority piece), once using the propensity scoring and once not, and reported no difference between the groups.
We fit a mixed-effects logistic regression model with crossed random effects to obtain odds ratios that account for the lack of independence between [killings] performed by the same clinician and within the same facility and cross-classification of providers across facilities. We included variables associated with complications in bivariate analyses at P < .05 in the multivariate model in addition to other clinically relevant covariates to adjust for potential confounders.
It is a mystery which “clinically relevant covariates” made it into the models: all of them (from Table 1)? Some? Others not listed? Who knows.
What they should have done is list, for each practitioner, the number of killings he performed and the number and kind of complications which resulted. We never learn this information. Site was in the model, as it should have been (some sites presumably have higher complication rates, some lower; just as some practitioners have higher rates, some lower), yet we never learn site-statistics, either. We also never learn if complication type clustered by practitioner or site.
We never see the model (no coefficients for any of the covariates, etc.). All that is reported is that the “corresponding risk differences were 0.70% (95% CI = 0.29, 1.10) in overall complications between provider groups.” Well, this is all suspect, especially considering the model is using the dodgy numbers. While there are good reasons for posting the data practitioner-by-site, there is little reason to trust this (hidden) model. It is far too complicated, and there are too many “levers” to push in it to trust that it was done correctly.
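For comparison with the model's reported 0.70%, the raw risk difference can be computed directly from the reported counts. The Python sketch below is emphatically not the authors' mixed-effects model: it is a plain unadjusted difference with a textbook Wald interval, it ignores clustering by practitioner and site, and it inherits the assumption that nobody lost to follow up had a complication. The function name is my own.

```python
from math import sqrt

def risk_difference(x1: int, n1: int, x0: int, n0: int, z: float = 1.96):
    """Unadjusted risk difference (group 1 minus group 0) with a Wald interval.
    Treats every procedure as independent, which the study design contradicts."""
    p1, p0 = x1 / n1, x0 / n0
    diff = p1 - p0
    se = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return diff, diff - z * se, diff + z * se

# Reported counts: 100 of 5675 (non-doctors) versus 52 of 5812 (doctors).
diff, low, high = risk_difference(100, 5675, 52, 5812)
print(f"raw risk difference: {100 * diff:.2f}% "
      f"(95% Wald interval {100 * low:.2f}, {100 * high:.2f})")
```

The raw difference comes out somewhat larger than the modeled 0.70%, and without the model we cannot say which adjustments account for the gap.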
In any case, it is the wrong model. What should be given is the prediction: not how many complications there were—we already know that—but how many we could expect in the future assuming conditions remain the same. Would future groups of patients, as did these patients, suffer more complications at the hands of non-doctors? Or fewer? We just don’t know.
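To make “prediction” concrete, here is one very simple way to produce such a number. It is a sketch of my own choosing, not the authors' method, and far cruder than a proper analysis would be. It assumes a uniform Beta(1, 1) prior on each group's complication probability, treats all procedures within a group as exchangeable (ignoring practitioner and site), and takes the reported counts at face value. Under those assumptions the expected number of complications in a future batch of procedures follows from the posterior mean. The function name and batch size are hypothetical.

```python
def expected_future_complications(events: int, n: int, future_n: int = 1000) -> float:
    """Posterior-predictive mean count in `future_n` new procedures,
    given `events` complications in `n` past ones and a uniform Beta(1, 1) prior.
    The posterior is Beta(events + 1, n - events + 1), with mean (events + 1) / (n + 2)."""
    return future_n * (events + 1) / (n + 2)

doctors = expected_future_complications(52, 5812)
non_doctors = expected_future_complications(100, 5675)

print(f"expected complications per 1000 future procedures: "
      f"doctors {doctors:.1f}, non-doctors {non_doctors:.1f}")
```

A real predictive analysis would need the per-practitioner and per-site data the authors did not publish, which is exactly the complaint.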
There were 40 non-doctors and 96 doctors doing the 5675 and 5812 killings. That’s an average of 142 killings for each non-doctor and 61 killings for each doctor. In other words, each inexperienced non-doctor did, on average, more than twice as many killings as each doctor. An enormous imbalance!
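The averages are simple division, shown here in Python with the counts quoted above (variable names are mine). They are means only: the paper gives no per-practitioner counts, so the actual spread is unknown.

```python
non_doctor_avg = 5675 / 40   # killings per non-doctor
doctor_avg = 5812 / 96       # killings per doctor
workload_ratio = non_doctor_avg / doctor_avg

print(f"{non_doctor_avg:.0f} per non-doctor, {doctor_avg:.0f} per doctor, "
      f"ratio {workload_ratio:.1f}")
```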
The study ran between “August 2007 and August 2011.” This is a curiously long time. Were the same practitioners in the study for its duration? Or did old ones retire or move on and new ones replace them? We never learn. The authors report that non-doctors had a “mean of 1.5 years” of killing experience but that doctors had 14 years. Given the study lasted four years, and that training was part of the protocol, this appears to say that the non-doctors were not constant throughout the study. How could this affect the complication rates? We never learn.
All in all, this was a very poorly run study. The evidence from it cannot be used to say much of anything, except that a study’s appearing in a “peer-reviewed journal” does not mean its results are trustworthy. But we already knew that.