Keeping with Fadeback Fridays, which is soon to be an international success, we revisit this classic post, which originally ran on 23 December 2016. It’s yet another in an endless supply of Research Shows what scientists desperately want to be true: that men and women are Equal, by which they mean women are more Equal. Our lesson: if you screw with the data long enough, your P will be wee.
The NBC News story “Female Doctors Outperform Male Doctors, According to Study” makes these bold claims.
Patients treated by women are less likely to die of what ails them and less likely to have to come back to the hospital for more treatment, researchers reported Monday.
If all doctors performed as well as the female physicians in the study, it would save 32,000 lives every year, the team at the Harvard School of Public Health estimated.
Yet women doctors are paid less than men, on average, and less likely to be promoted.
“The data out there says that women physicians tend to be a little bit better at sticking to the evidence and doing the things that we know work better,” [Harvard’s Dr. Ashish Jha, who oversaw the study] told NBC News.
The ordinary reader would assume female doctors are always much better than male doctors, and the reason is (partly) because male doctors practice medicine regardless of what the evidence dictates. Worse, they receive greater rewards for their foolish and dangerous behavior.
The NBC story drew from the paper “Comparison of Hospital Mortality and Readmission Rates for Medicare Patients Treated by Male vs Female Physicians” in the journal JAMA Internal Medicine by Tsugawa, Jena, and Figueroa. Its main claim is this:
Using a national sample of hospitalized Medicare beneficiaries, we found that patients who receive care from female general internists have lower 30-day mortality and readmission rates than do patients cared for by male internists. These findings suggest that the differences in practice patterns between male and female physicians, as suggested in previous studies, may have important clinical implications for patient outcomes.
Now those “suggests” in the second sentence should set alarm bells ringing. And, indeed, Tsugawa and his co-authors did not measure how doctors practiced, and so even if it were true that male and female physicians had different 30-day mortality and readmission rates, the researchers would have no way of knowing why the differences existed. And neither would NBC.
Let’s Examine the Numbers
What happened was this. The authors collected a sample of about a million-and-a-half “Medicare fee-for-service beneficiaries 65 years or older who were hospitalized in acute care hospitals.” Mean age of patients was about 80. The NBC summary misleads by saying just “patients,” which implies the research applies to everybody and not just elderly Medicare patients.
Here’s the conclusion (my emphasis):
Patients treated by female physicians had lower 30-day mortality (adjusted mortality, 11.07% vs 11.49% …) and lower 30-day readmissions (adjusted readmissions, 15.02% vs 15.57% …) than patients cared for by male physicians, after accounting for potential confounders.
First note the differences are small: 11.1% versus 11.5%. And then realize these are the “adjusted” and not actual numbers. Adjusted?
They mean adjusted using the statistical technique of regression modeling. It’s complicated, but everybody forgets that a regression is an equation describing how the guts inside a model vary as the “covariates” or “confounders” do (these covariates are other possible explanatory variables). Those guts, called parameters, are not observable and make no direct statements about observable quantities like 30-day mortality. The researchers should therefore have spoken only indirectly about 30-day mortality, while also acknowledging the uncertainty that accompanies their statistical models.
The researchers did not do this; hence their results are stated in terms which are too sure. In their favor, theirs is a common error (discussed in this book).
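To make the point concrete, here is a minimal sketch with made-up numbers (nothing from the paper, which ran regressions over its covariates rather than the simple standardization shown here): even the most elementary kind of adjustment reports a number nobody observed, and can flip which group looks better.

```python
# Made-up numbers, not the paper's data: deaths and patients broken
# out by doctor sex and a single patient-age stratum.
counts = {  # (doctor, stratum): (deaths, patients)
    ("female", "65-79"): (600, 10_000),
    ("female", "80+"):   (1_800, 10_000),
    ("male",   "65-79"): (700, 12_000),
    ("male",   "80+"):   (1_500, 8_000),
}
STRATA = ("65-79", "80+")

def raw_rate(doc):
    """Observed mortality: what actually happened to these patients."""
    deaths = sum(d for (s, _), (d, _) in counts.items() if s == doc)
    n = sum(n for (s, _), (_, n) in counts.items() if s == doc)
    return deaths / n

def adjusted_rate(doc):
    """Direct standardization: weight each stratum's rate by the
    COMBINED population's stratum mix. A model construct, not a
    mortality anyone actually experienced."""
    total = sum(n for _, (_, n) in counts.items())
    adj = 0.0
    for stratum in STRATA:
        d, n = counts[(doc, stratum)]
        weight = sum(n2 for (_, s2), (_, n2) in counts.items()
                     if s2 == stratum) / total
        adj += (d / n) * weight
    return adj

for doc in ("female", "male"):
    print(f"{doc}: raw {raw_rate(doc):.2%}, adjusted {adjusted_rate(doc):.2%}")
```

In this toy data the female doctors have the *higher* raw mortality (12% vs 11%) but the *lower* adjusted mortality, because adjustment replaces the observed rates with what the chosen model says they would be under a common patient mix.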
An Overdose of Covariates and a Much More Plausible Explanation
Here’s another problem. According to the supplemental information to the paper, they crammed more than 1,000 covariates into their models.

More than 1,000 covariates!
Any statistician will tell you that over-stuffed regression models like this are bound to lead to an uninterpretable morass. Nobody can have a clear idea what is going on with the actual 30-day mortality after all that adjusting. But because of the huge sample size and all those covariates, the model will look like it’s performing well (that is, it will evince wee p-values).
Are there other possible explanations to account for the small differences noted by the models? Yes. Female docs were about 5 years younger on average, and female docs also treated many fewer patients on average than men. This implies women docs had more time per patient.
Even more intriguing, we also know “female physicians treated slightly higher proportions of female patients than male physicians did.” And since women live longer than men, particularly at those advanced ages, maybe — just maybe! — any slight change in mortality or readmission rates between male and female docs could be explained by women doctors treating more longer-lived patients.
That explanation is surely as or more plausible than results from an unnecessarily complicated statistical model. It also eliminates the unwarranted theorizing about how women physicians are “better at sticking to the evidence” and are thus “underpaid.”
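The arithmetic of the patient-mix explanation is easy to check. In this toy calculation (assumed numbers, not the paper’s), mortality depends only on the patient’s sex, the doctors are identical in skill, and female doctors see a slightly higher share of female patients.

```python
# Assumed toy numbers: 30-day mortality depends ONLY on the patient's
# sex (elderly women outlive elderly men); doctors are identical.
mortality = {"F": 0.10, "M": 0.13}
female_patient_share = {"female doc": 0.62, "male doc": 0.58}

def marginal_rate(doc):
    """Mortality a doctor's panel shows, driven purely by patient mix."""
    share = female_patient_share[doc]
    return share * mortality["F"] + (1 - share) * mortality["M"]

for doc in ("female doc", "male doc"):
    print(f"{doc}: {marginal_rate(doc):.2%}")
```

With these assumed numbers the female doctors come out at 11.14% and the male doctors at 11.26%, a gap of the same order as the paper’s, manufactured entirely by patient mix with zero difference in doctoring.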
Aye – to all you say.
Almost all multivariate models – no matter how over-stuffed with covariates – may be assumed to fail to correct accurately and validly for confounding.
Probably residual confounding, i.e. incomplete removal of the effect of confounding, is the main problem. There is also the ineradicable problem that these models fit confounders with a straight-line relationship when there is no biological reason for assuming a straight line. For this reason I always preferred a stratified analysis to a multivariate one: it did not assume linearity, and you could see exactly what was going on.
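A toy calculation of residual confounding (assumed numbers, not real data): mortality depends only on exact age, the doctors are identical, and we “adjust” for age in two coarse bins. A gap survives the adjustment because within each bin one doctor’s patients are still younger.

```python
# Assumed toy model: true risk rises smoothly with exact age; doctors
# are identical. Coarse-bin adjustment only partly removes the
# confounder, leaving a spurious "doctor effect" behind.

def mortality(age):
    return 0.02 + 0.005 * (age - 65)   # true risk depends on exact age

ages = {"female doc": range(65, 85),   # younger patient panel
        "male doc":   range(75, 95)}   # older patient panel

BINS = {"65-79": range(65, 80), "80-94": range(80, 95)}

def crude(doc):
    panel = list(ages[doc])
    return sum(mortality(a) for a in panel) / len(panel)

def bin_adjusted(doc):
    """Standardize bin-specific rates to a 50/50 bin mix. Within each
    coarse bin the female doc's patients are STILL younger, so the
    confounding is only partly removed."""
    adj = 0.0
    for bin_ages in BINS.values():
        in_bin = [a for a in ages[doc] if a in bin_ages]
        adj += 0.5 * sum(mortality(a) for a in in_bin) / len(in_bin)
    return adj

for doc in ("female doc", "male doc"):
    print(f"{doc}: crude {crude(doc):.1%}, bin-adjusted {bin_adjusted(doc):.1%}")
```

With these assumed numbers the crude gap is 5.0 points and the bin-adjusted gap is still 2.5 points, though the true doctor effect is exactly zero: residual confounding in miniature.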
Furthermore, there is a much deeper problem: one could only know the real nature of confounding by doing many specific experimental studies, each one studying a single suspected confounder, in which “all” relevant variables were controlled except the suspected confounder and the outcome variable. In other words, a causal relationship would need to be established and measured for each confounder.
In other words, it is always better to control for confounding in the study design, in advance, rather than assuming (or pretending) that your badly controlled study can *retrospectively* be “cleaned up” by elaborate statistical calculations resting on what are often biologically implausible assumptions.
1,000 covariates? That’s absurd. How can anyone claim to understand what’s happening in that mess? And with an N of 1.5 million, even a trivially small difference would be statistically significant, never mind their 11.07% versus 11.49%. This is a classic case where too much data obscures rather than clarifies. And let’s not ignore another issue: replication. Observational studies like this are extremely hard to replicate, which raises serious questions about the reliability of any conclusions drawn.
I wonder if the specific women in this study were paid less than the men were. Or, was the pay difference measured in another study? The nudge on pay differences seems to be the only reason for the publication.
@B G Charlton
Amen to all that! And more: stratification might have allowed a nonlinear model to be fit, e.g. any of the 3 flavors of logistic regression, with little or no confounding; the use of a 20%(!) random sample seems like an invitation to overfitting, and a methodological waste of data. With thousands of observations, they had the opportunity to train, validate, and calibrate a whole portfolio of classification models, many with w-a-a-y fewer covariates. Of course, all of this requires extensive “playing with the data,” which is as distasteful as playing with your food in polite company.
Your point is well taken, Briggs, but in the greater scheme of Modernismo how are the Overlords supposed to move the masses if they cannot lie with statistics? You think they should just tell the TRVTH?! — HA-HA-HA! And ruin the modernist project? The Modern Agenda advances by lying about everything, you rube, dope, naïf, born yesterday, wet behind the ears, credulous, pea-brained, dimbulb, microcephalic, Air Force simpleton. [Rhetorical comedy, for the humor impaired.] I am outraged — OUTRAGED! — that Satan is telling lies! And subverting the sacred science of statistics to do it. Does Satan have no shame? And are there no bounds to man’s folly, stupidity, and sin?
If there were such bounds Briggs might be out of business. Imagine a world where men are not foolish, stupid, and sinful. What would there be to condemn? What would there be to criticize? What would there be to say? In Utopia how many times can one say, my, what a nice sunset! Or, You look lovely, dear! When what you really want to scream is over my dead body, asshole! or, stop tailgating me, jerkface! and such like things that really get the blood going. So much more satisfying.
But Jesus teaches us not to indulge these lowly earthly urgings but rather to moderate and edit one’s destructive content to advance a more moral and constructive agenda. Fine. I’m all on board. Now, if only all these other assholes were too! Bastards! Dickheads! Jews! Negroes! Scandinavian blockhead wigglers! Know-it-all flyboys! Irish drunken bums! Women! Liberals! Conservatives! Eggheads! I’ll get with the program when they do too. Until then I’ll match everyone jerk-for-jerk. If I can’t pose as an angry, righteous victim how can I justify being such a jerk? Why be part of the solution when you can have so much more fun being part of the problem? That’s the big modern question.