William M. Briggs

Statistician to the Stars!


“Probably Fine” Isn’t A Number

What are those white spots?

Today’s headline is a true proposition. “Probably fine” isn’t a number, yet it is a perfectly reasonable way to communicate risk. Indeed most risk is, and should be, given in the form of words, or even vague thoughts.

What is to be discouraged is the relentless, brutal search for scientific-sounding precision which, unless the situation is rigorously defined, is absurd and leads to bad decisions.

The proof of this is easy. Take the context in which the quote originated, a Wall Street Journal article by a once-pregnant lady (Emily Oster) who was investigating the myriad restrictions and cautions issued to women who wish to take care of the lives growing inside them. Oster asked her doctor whether she could have “one or two glasses” of wine a week, to which the doctor replied that it was “probably fine.”

This led Oster to observe that “probably fine” was not a number, which was less than satisfying. She desired “real answers.”

In her pursuit of real answers, Oster discovered in medical journals that some deli meats had the risk of carrying listeria bacteria, which is bad.

I concluded that avoiding queso fresco and deli turkey was a good idea, but in the end I didn’t feel that it made sense even to exclude other deli meats. My best guess was that avoiding sliced ham would lower my risk of listeria from 1 in 8,333 to 1 in 8,255. I just didn’t think it was worth it. It would have made more sense to avoid cantaloupe.

Here are two “real” answers: 1 in 8,333 and 1 in 8,255. Each a nice quantification and, as promised, both give the feeling that science is happening. Both are also absurdly, unjustifiably precise.

These numbers were culled from various medical studies. Now any study is conditional (as all probability is conditional) on the kind and type of observations that were taken. For instance, in studies of deli meats containing listeria, there are the kinds of meats: all the different kinds of hams, bologna, including that with and without olives or other stuffings, various types of turkey, salted beef, hard salamis, soft ones. There are different manufacturers: domestic, international, the actual plants, the carriers and methods of transport to the delis.

There are the animals used in the meats—pigs, cows, mystery meats, each of these grown on farms God knows where, each fed different foods, some genetically this way, others genetically that, some fed with antibiotics in various doses, some given hormones.

Then there are the delis: located in this and that neighborhood, kept under who knows what temperature control, selling meats fresh and past their sell-by dates, owned by fastidious shopkeepers or by corporations, managed by new hires and old. There are many more items easily added to the list which it is possible to imagine influence the chance a given piece of meat contains listeria.

In the end we have a piece of meat from a deli. Either it will have listeria or it will not (a tautology, therefore always true of any piece of meat). If it does, we look at all the characteristics listed above and put a check mark next to the ones which are true of this slice (from a pig, sold on a weekend, etc.). One of these characteristics might have been the cause of the meat having listeria, or maybe we missed measuring the cause and the characteristics we measured are only associated with it.

A second piece of meat won’t have listeria. Again we check off all the boxes, some of which will also be checked in the meat with listeria, some won’t. If boxes are checked for both pieces, it seems likely that those characteristics aren’t the ones that caused or didn’t cause the bacteria to grow. But some characteristics will be checked on the bacteria-laden meat which won’t be checked on the clean meat, and vice versa. Perhaps one of these, or lack of one of these, is why the meat got infected. It is only “seems likely” and “perhaps” because we have to recall that we might not have measured all the right things.

In the end, we have to pick some characteristics and eschew others; we have to settle for summaries. For example, in one experiment we might find a greater frequency of turkey slices than ham slices contained listeria. The risk of eating turkey or ham can thus be given a precise number. But because this summary necessarily ignores many characteristics, and we are never sure we have thought of every relevant one, the quantification is only meaningful if it is certain that characteristics we ignored or didn’t measure were not involved in listeria production.

Since we are not certain, the quantification is the wrong number: it is not the real answer. It is misleading, it is too sure. Better to round to the nearest order of magnitude. Or say something like “Eating a slice of deli ham is probably fine.”
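
For the numerically inclined, here is a minimal Python sketch of that rounding. The function name `order_of_magnitude` is my own invention for illustration, and the only inputs are the two figures quoted above; nothing here comes from the underlying medical studies.

```python
import math

def order_of_magnitude(p):
    """Round a probability to the nearest power of ten (hypothetical helper)."""
    return 10 ** round(math.log10(p))

# The two "real answers" quoted above.
risk_a = 1 / 8333
risk_b = 1 / 8255

# Absolute gap between the two quoted risks.
gap = abs(risk_a - risk_b)

print(order_of_magnitude(risk_a), order_of_magnitude(risk_b))
print(gap)
```

Both figures round to the same power of ten, and the gap between them is on the order of one in a million, which is far finer than anything the unmeasured characteristics would let us resolve.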

And we haven’t even added the layers of complexity which come from eating the meats! Some people can eat the bacteria-laden meat and never develop symptoms, others need only a fragment to become ill. Oh my! It goes on and on.

And so have I. So I’ll stop.


Thanks to reader Jim Fedako for suggesting this topic.


Plastic Surgery Might Make You Uglier

Daryl Hannah: Did she or didn't she?


In that stating-the-obvious tone common to science writing, Joshua Zimm and his co-authors open their “Objective Assessment of Perceived Age Reversal and Improvement in Attractiveness After Aging Face Surgery” in JAMA: Facial Plastic Surgery with the words:

Primary reasons why patients pursue aesthetic facial surgery are to look younger and more attractive; however, there is minimal literature about the effect of aesthetic facial surgery on perceived age and attractiveness.

Now Zimm says that, but it’s not clear whether plastic surgeons, especially the kind who advertise in in-flight magazines or on bus benches in Beverly Hills, are interested in discovering whether their expensive Ginsuing “cures” aging and ugliness to the extent their customers hope. But even if they admitted to their clients there was little to be hopeful for, their waiting rooms would probably remain filled.

It seems that the age where one is sliced and diced makes all the difference. In South Korea, for instance, it is nearly de rigueur for young ladies to have their eyes shifted, noses rebored, and milk production units given boosts. This phenomenon is so ubiquitous that even The Atlantic has heard about it: South Korean High Schoolers Get Plastic Surgery for Graduation (complete with eye raising pictures; no, literally: eye-raising pictures).

The bodies of teens and twenty-somethings heal wonderfully from these traumas, so much so that it’s difficult to tell whether or not somebody’s had something done. Of course, it could be South Korean women’s consummate cultural facility with makeup which disguises the mistakes of surgery, but more plausible is that young people are easier to patch up.

More proof comes from the story we’ve all heard. A Chinese man sued his wife for giving him an ugly baby. Turns out that, when young and before her marriage, the woman took a trip to South Korea to have an Earl Scheib facial. It worked (before and after pictures here), at least in the sense of attracting a man who loved her for her looks. But her unmodified looks were passed on to her kid, greatly surprising her husband.

Anyway, after a body reaches forty or so, the effects of face lifts linger, or thus it seems to the untrained eye, which is one trained to see ante-posterior pics of starlets and Congresspeoples. Who isn’t familiar with the permanently astonished, Nancy-Pelosi-like visage caused by one too many “procedures”? Or that some women, like young ones trying to squeeze into too-small jeans, ask the doc to stretch “just a bit more” and thus turn into cat-ladies? This happens to men, too, but fewer men; though lately more and more undergo the rigors of surgical vanity.

Back to Zimm and friends. They collected about fifty before-and-after pictures of “patients” (a word which no longer means sick people) who willingly paid for “aesthetic facial surgery”. They sent these pics to about fifty folks and asked them to estimate the age of the “patients” and how attractive the “patients” were on a scale of 1 to 10.

Makeup versus surgery, the eternal question.


Turns out the civilians recruited as judges said the age reduction was anywhere from -4.0 to 9.4 years, meaning a goodly fraction (just under half) of patients appeared older because of the “work” they had done. The change in the attractiveness score bounced around in between -0.5 and 0.5, meaning even those who got better looking only did so a tiny bit, plus about half got uglier. Which is the bad news doctors hoped they wouldn’t discover.

Bad for the docs because their art work was not judged to be sufficiently worthy, especially given the nomenclature of the procedure used, i.e. “rejuvenation.” Not as much juvenating power in the knives and suction machines as touted.

The folks undergoing the cutting and hacking were “42 to 73 years at the time of surgery, with a mean age of 57 years.” Not unexpectedly, about three-quarters were women. And since our on-the-street evidence is that people past 40 emerging from beauty clinics sometimes appear to have been in a terrible accident, this comes as no surprise.

Of course, a sample of 50 is nothing, and hardly any controls were done in this experiment (only three “patients” had “lower facial rejuvenation”), so the best news is for Zimm and colleagues. More research is needed. Let’s let them have the last word on one of their findings:

[O]ur study may demonstrate that once an age is ascribed to someone others associate that age with a certain level of attractiveness. Specifically, younger people are generally gauged as being more attractive.


Briggs Makes Der Spiegel: Violence “Linked”, Etc.

Almost like looking in a mirror.

That embarrassing Science paper which “proves” “violence” will increase because of climate change is exciting interest globally. It certainly provided us a certain amount of harmless fun.

But this time criticism of the latest we-are-dooomed scenario has extended beyond the usual cluster of sites. Even Der Spiegel picked up on it: Globale Erwärmung: Forscherkrieg um Klimastudie.

The relevant passage, which I used Google Translate to provide, is this:

More serious the allegation several critics, the data base of the study was chosen so that a certain result come out. Hsiang and his colleagues had “comparing apples with rollerblades”, wrote about William Briggs, statistics professor at the U.S. elite Cornell University, dripping with sarcasm in a blog post . “Data of past Tuesday will be just as meaningful as that of 8000 BC.” At the end are “many great graphics” and a result came out: “Hot, dry weather is bad for us.” Therefore, mocks Briggs, also draw everyone into the hot from the cold Michigan South Carolina – “to be where the action is.” “Complete nonsense,” was still the kindest description of the study.

No, no. My kindest description was to say that Hsiang and his co-authors put in a staggering amount of work. No faulting their diligence, he said sincerely.

But hard work doesn’t always imply good work (as regular readers of this blog will attest!).

Here is the succinct summary of Hsiang: they mixed physical measures of the atmosphere and land from widely disparate times and places without accounting for the uncertainty inherent in each or in their combination; they defined “violence” haphazardly, content to let it be rape at one location and “leader removal” at another. They cobbled the whole together into a (let me be charitable) creaky statistical model which was able to produce a wee p-value, defined as a p-value less than the magic number (0.05).

Additionally, every site and data source they used did not show increasing “violence” with increasing inclement weather (people do not experience a climate, but moment-to-moment weather). It was only after mixing the stew of the sources and kinds of “violence”, all said to be just plain “violence”, into their model were they able to “prove” what they hoped would be true.
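
How can a trend appear in the mixture that no single source shows? Here is a toy Python sketch of one way pooling can do it. The numbers are made up by me for the purpose (they are not from the Hsiang paper), and `slope` is a hypothetical helper computing an ordinary least-squares slope.

```python
from statistics import mean

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Made-up illustration only: two sites, each with a different kind of
# "violence" count and a different range of temperatures.
site_a_temp, site_a_events = [1, 2, 3], [3, 0, 3]
site_b_temp, site_b_events = [11, 12, 13], [9, 6, 9]

slope_a = slope(site_a_temp, site_a_events)
slope_b = slope(site_b_temp, site_b_events)

# Pool everything into one pot and call it all plain "violence".
pooled = slope(site_a_temp + site_b_temp, site_a_events + site_b_events)

print(slope_a, slope_b, pooled)
```

Within each made-up site the slope is zero; pool them and a positive slope appears, driven entirely by the difference between the sites. Whether anything like this happened in the paper is not something this sketch can show.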

Suppose they were interested in rape. What they should have done is to look at rape statistics from localities all across the globe, accompanied by those meteorological variables which they could show, or plausibly show, are directly responsible for rape. They’d have to answer questions like, Why isn’t rainier weather correlated with less rape? After all, rapists like to keep dry, too.

But this alone would not have been good enough to show a connection. After all, rape rates are measured with substantial error which depends on location. These uncertainties would have to be “carried forward” in their models. And then culture plays an enormous role in rape: these variables would have to be controlled for.

Of course, the weather/climate variables are also subject to large measurement uncertainties, which also depend on location and time. Time itself is highly problematic. Cultures change through time, immigration and emigration have to be understood, technology changes things (e.g. cameras may dissuade rape). And on and on.

And these are just the bare minimum to be able to say anything about just rape. What about “leader removal”? Same set of considerations, only now expanded for politics, that simple subject. And so forth for the other kinds of “violence.”

Then, finally, when all this is done, they’d have to look at the “opposites” of these variables. It could be that “leader removals” increase with temperature, but it might also be that people are happier, politically speaking. It was no joke about people moving from the North to the South, where it is much hotter: it’s happening with increasing frequency and people seem pleased about it. They do not seem to be turning to violence at their new addresses.

Just because it is easy to dump a bunch of numbers into a computer does not mean that it should be done. The software is forced to give you an answer, but that does not necessarily mean the answer is what you hope it is. That this paper showed up in Science says lots about how easily influenced even our best and brightest (but see Belloc’s exception) can be by fad and bias.

Update: More from the Der Spiegel article (at the end):

Storch also makes allegations of the “Science” Editorial: “The peer-review process did not work you should have critical evaluators, which was obviously not the case here..” A spokeswoman of the journal, however, can see nothing but good in the debate. “Science is a self-correcting process,” she wrote in an e-mail. Researchers published their results so that they confirmed could be refuted or corrected. “In this way science makes progress.”

You bet it is, baby. Consider this a massive self-correction.


Belloc On The Limited Intelligence Of Scientists

And always keep a-hold of Nurse.

From Hilaire Belloc’s Richelieu, J.B. Lippincott Company, Philadelphia & London, 1929, p. 23, in the context of disproving the social theory of historical events, i.e. the one which claims individuals are not influential, yet somehow groupings of them are.

The conquests of physical science were due to minute and extensive observation conducted by vast numbers of men, and therefore, for the most part, by the unintelligent. Science attracted some few men of high culture and some even (much fewer) of strong reasoning power: but in themselves mere observation and comparison, the framing of hypotheses and the testing of them by experiment need no intellectual qualities above the lowest and therefore an obvious occupation for those who despise or do not grasp the use of the reason. It has even been maintained that the ceaseless practice of exact measurement dulls the brain. At any rate, the business of modern physical science was not attached to, and became more and more divorced from, philosophy—and therefore from theory which is philosophy’s guide.

But this, for the most part unintelligent, mass of observation, has led to astounding results….As a consequence, its prestige has risen prodigiously; its methods, conclusions, and much more, the moral atmosphere in which it works has affected every other art, and every other study; notably did it affect the spirit of history in the later nineteenth century.

Was this offhand comment fair then? Fair still now? Seems pretty accurate to me, and I speak as one of the fold.

Of course, the egos of scientists have done anything but shrink since these words were penned. Except for activists and politicians, no man is more ready to congratulate himself over his profession than a scientist. Yet it takes some brains to do the routine tasks of these artisans. But maybe less than has been claimed. And, after all (and this is Belloc’s main point), facility with integration does not give one more insight into what defines the good life than do the abilities possessed by, say, carpenters.

Be sure you understand what is being criticized here: people, not knowledge. Science is often spoken of by its practitioners as a thing, a real entity, and a poor one, too; one whose honor is always in dire and desperate need of defending; a damsel in acute distress, beset upon continuously by the forces of unreason. These perpetually nervous guardians are certain sure that if the percentage of the population who can on demand name the weight of a neutrino slips below a fixed level, then the mullahs and priests will take over and enforce blind dogma.

As if the weight of the neutrino is not dogma. And never mind that it was the theological bent of priests and Abrahamic religious which gave birth to Science, which in many cases, to this very day, has been advanced by those sporting dog collars and cloaks.

Plus it’s true that in our culture kiddies grow up with the myths and legends of scientists. While everybody knows Einstein, how many can name, for instance, Aristotle? Or Bach?

Anyway: scientists. Sparkling geniuses all, or regular, somewhat tedious, folk?


© 2015 William M. Briggs
