William M. Briggs

Statistician to the Stars!

Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

Ask A Scientific Ethicist: Baby Making, Auto Mishap, ISIS Attacks

The Scientific Ethicist, PhD


This was supposed to run this morning. No idea why it didn’t.

This week, three letters from concerned readers.

Too many babies

Dear Scientific Ethicist,

Hopefully this subject matter isn’t too technical for your audience.

In a recent discussion with my boss, I claimed that it was impossible for nine women to make a baby in one month. My boss claimed that with proper planning, nine women could indeed have a baby a month for nine months.

Which one of us is correct? No pressure intended, but I think my job might be dependent on your answer.



Dear Milton,

Your boss is right. Nine (biological) women could, with appropriate planning, have one baby each, one per month spaced equally over nine months.

And you’re wrong. Nine women could indeed make one baby in one month. As long as they had access to an egg from any one of them, certain male genetic material, which Science shows can be had any old place, and some rather sophisticated medical equipment (made by Science!). The baby could be made (and in well under one month, at that) and implanted in any of the women. This isn’t the best, safest, surest, or recommended method, since it’s too easy to kill the baby during creation and implantation, but the thing can be done.

Of course, Science tells us that baby would take approximately nine months to emerge from its mother. But that’s birth, and not the making of it.

Unfortunately, Science disagrees with you. But if it’s any consolation, Science disagrees with a lot of people!

The Scientific Ethicist

Automotive mishap

Dear Scientific Ethicist,

I was driving down the road and saw a car crash into a parked car, then drive away without leaving a note. There was just a little damage on the parked car. I took down the licence plate number of the car that drove off, but it was being driven by a young ethnic woman, and I don’t want to be a racist. What should I do in this situation?

[Name Withheld], Atlanta, GA

Dear [Name Withheld],

The force between an average car going at typical speeds (in the neighborhood of 30 MPH) hitting a stationary average car is easily calculated. We call this momentum, the mass of the car multiplied by its velocity. In many cases, we can speak of the momentum as a single variable instead of trying to keep track of multiple measures.

If both cars were moving, then depending on the directions both cars were traveling, there could have been at the time of contact anything from very little momentum, to something quite high. But since one car was not moving, the momentum probably wasn’t large.

Low momentum impacts produce notably less damage than high momentum impacts. That you say “just a little damage” indicates that this was probably a low momentum impact.

Once again, Science gives the answer!

The Scientific Ethicist

Take that man

Dear Scientific Ethicist,

I live in Al Bukamal, Syria. The Islamic State is practically out the back door. They’re beheading non-Muslims, burying children alive for not being Muslims, and many other terrible things. And they’re boasting of it! Posting pictures of it on the web. The terror is endless. I’m starting to panic. What should I do to stay calm?

Billy, San Francisco

Dear Billy,

Only the consolations of Science can have any effect. I usually recommend reading Introduction to Topology by Bert Mendelson, or Inorganic Chemistry by Gary Wulfsberg. Though in your case, nothing is better suited than Brian Greene’s The Hidden Reality: Parallel Universes and the Deep Laws of the Cosmos.

In that book, Greene highlights the Science of the multiverse. The gist is that there are an infinite number of other universes where you also exist and where the Islamic State is benign. Why, there’s even a universe in which each member of ISIS is a Good Humor man handing out free ice cream to children over-heated by the desert! In none of these other happy universes would you feel terror.

Science can calm the most troubled soul!

The Scientific Ethicist

Send in your questions to the Scientific Ethicist today! Or read his previous columns.

It Makes No Sense To Say You’re More Likely To Die Of Bee Sting Than Shark Bite

"Tell the world. Tell this to everybody, wherever they are. Watch the skies everywhere. Keep looking. Keep watching the skies."


The National Journal says: “The odds of being killed by a shark are about 1 in 3.7 million. The odds of being killed by a sting from a bee, wasp, or hornet are 1 in 79,842, according to the National Center for Health Statistics, a part of the Centers for Disease Control and Prevention.”

Now you see these kinds of stories all the time, all of which have conclusions like You have a better chance of dying in a lightning strike than in winning the lottery, etc., etc.

These stories are all wet. Nobody has a chance of dying in a lightning strike, just as nobody has a chance of winning the lottery or being killed by stinging sharks or biting wasps. By which I mean, everybody has a chance of winning the lottery or being stung by lightning or whatever.

Double talk? The problem is chance, a nebulous word, apt to change shape mid-sentence so that you don’t always end up where you were aiming.

In one version, chance means logical possibility. Everybody has the logical possibility of dying by shark, bee, or lightning. But chance also connotes probability. And there just is no unconditional probability of dying by anything.

Logical possibility is a weak criterion. Anything that isn’t logically impossible, such as square circles, is logically possible. You might be a native Antarctican, solid ice your bed and fluffy snow your pillow since birth. But, one day, an evil Polar Vortex might surreptitiously search the seas for your doom, and fling a hammerhead shark from the tropics to the very spot on which you take your morning ice floe stroll, as was illustrated in the documentary Sharknado.

Hey, it could happen. It’s logically possible. And in that sense you have a chance of dying by shark bite.

But you don’t have an unconditional probability of dying by one. What can we say? If the probability of you being eaten by a shark were 0, then it would be impossible, logically impossible, that you could be eaten by a shark. But we’ve already agreed that it is logically possible. And if your probability is 1, then that means the universe guarantees, no matter what, you will have your head bit off. Ouch.

This means the probability of you dying by shark bite, without knowing anything else about you (and I mean this clause just as it’s written), is no number at all, but all numbers between 0 and 1. Which is fairly useless, as far as information goes. But not entirely useless, since non-extreme probability tells us an event is contingent, i.e. logically possible.

We can now see that it makes no sense to say the unconditional probability of dying by a bee sting is “larger” than of suffering the consequences of a sharknado. These probabilities, in the absence of any other information, are equal.

What “other information”? Example: your aunt Narantsetseg lives in central Mongolia, far from the sea and inland aquariums, whilst you live in Key Largo in a beach shack. Given only this information, which includes common knowledge about these locations and their nearness to sharks, it’s natural to say you have a larger probability of dying of shark bite than your aunt.

How much larger? We don’t know. There’s still not enough information to quantify the difference. Why? All probability is conditional on the information supplied. If that information is vague, as it is in the maximal sense when we know nothing other than that the event is logically possible, no quantification is possible. To get numbers, we need specific information.

Enter the Frequentist Fallacy. Happens like this. American citizens killed by shark bites are divided by the population, and this number is substituted for the probability of you yourself dying from shark bite. This “probability” is assigned to beach dwellers and Northern Michiganders alike. Which is silly, because, obviously, the information for these folks is radically different, and thus so are their (conditional) probabilities.
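A toy sketch of the fallacy, in Python. The counts below are invented purely for illustration (they are not real shark statistics), and the group names are hypothetical. It only shows how one pooled frequency hides very different relative frequencies once we condition on more information; even the per-group numbers are still conditional on whatever defines the group.

```python
# Hypothetical counts, invented for illustration only (not real data).
groups = {
    "daily ocean swimmers": {"people": 1_000_000, "shark_deaths": 9},
    "landlocked residents": {"people": 99_000_000, "shark_deaths": 1},
}


def pooled_rate(groups):
    """All deaths divided by all people: the one number handed to everybody."""
    deaths = sum(g["shark_deaths"] for g in groups.values())
    people = sum(g["people"] for g in groups.values())
    return deaths / people


def group_rates(groups):
    """Relative frequency within each group, i.e. conditional on group membership."""
    return {name: g["shark_deaths"] / g["people"] for name, g in groups.items()}


pooled = pooled_rate(groups)
rates = group_rates(groups)

for name, rate in rates.items():
    print(f"{name}: {rate:.2e} (pooled figure: {pooled:.2e})")
```

With these made-up counts the pooled figure sits between the two group figures and describes neither group well.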

So is dying of a bee or wasp sting more probable than by shark bite? We now see that it makes no sense to ask this. If you can’t swim and are allergic to bee stings and live next to an apiary, then it’s more likely you’ll die from a sting than a shark bite. But if you live in Key West and go snorkeling daily and aren’t allergic to bees, it’s more likely you’ll die inside the innards of a shark.

How much more likely (in either case) we can’t say.


Thanks to Brad Tittle for suggesting this topic.

Decline Of Participation In Religious Rituals With Improved Sanitation

All the world's religions, in bacterial form.


Answer me this. Earl at the end of the bar, on his sixth or seventh, tells listeners just what’s wrong with America’s science policy. His words receive knowing nods from all. Does this action constitute peer review?

Whatever it is, it can’t be any worse than the peer review which loosed “Midichlorians—the biomeme hypothesis: is there a microbial component to religious rituals?” on the world. An official paper from Alexander Panchin and two others in Biology Direct, which I suppose is a sort of bargain basement outlet for academics to publish.

The headline above is a prediction directly from that paper, a paper so preposterous that it’s difficult to pin down just what went wrong and when. I don’t mean that it is hard to see the mistakes in the paper itself, which are glaring enough, for all love. No: the important question is how this paper, how even this journal and the folks who contribute to it, can exist and find an audience.

Perhaps it can be put down to the now critical levels of the politicization of science combined with the expansion team syndrome. More on that in a moment. First the paper.

It’s Panchin’s idea that certain bugs which we have in our guts make us crazy enough to be religious, and that if only there were a little more Lysol in the world there would be fewer or no believers.

Panchin uses the standard academic trick of citing bunches of semi-related papers, which give the appearance that his argument has both heft and merit. He tosses in a few television mystery-show clues, like “‘Holy springs’ and ‘holy water’ have been found to contain numerous microorganisms, including strains that are pathogenic to humans”. Then this:

We hypothesize that certain aspects of religious behavior observed in human society could be influenced by microbial host control and that the transmission of some religious rituals could be regarded as a simultaneous transmission of both ideas (memes) and organisms. We call this a “biomeme” hypothesis.

Now “memes” are one of the dumbest ideas to emerge from twentieth-century academia. So part of the current problem is that dumb ideas aren’t dying. Capital-S science is supposed to be “self correcting”, but you’d never guess it from the number of undead theories walking about.

Anyway, our intrepid authors say some mind-altering, religion-inducing microbes make their hosts (us) go to mass, or others to temple, and still more to take up posts as Chief Diversity Officers at universities just so that the hosts will be able to pass on the bugs to other folks. Very clever of the microbes, no? But that’s evolution for you. You never know what it’ll do next.

Okay, so it’s far fetched. But so’s relativity—and don’t even get started on quantum mechanics. Screwiness therefore isn’t necessarily a theory killer. But lack of consonance with the real world is. So what evidence have the authors? What actual observations have they to lend even a scintilla of credence to their theory?


Not one drop. The paper is pure speculation from start to finish, and in the mode of bad Star Trek fan fiction at that.

So how did this curiosity (and others like it) become part of Science? That universities are now at least as devoted to politics as they are to scholarly pursuits is so well known it needs no further comment here. But the politics of describing religion as some sort of disease or deficiency is juicy and hot, so works like this are increasingly prevalent. Call them Moonacies, a cross between lunacies and Chris Mooney, a writer who makes a living selling books to progressives who want to believe their superiority is genetic.

Factor number two, which is not independent of number one, is expansion team syndrome. The number of universities and other organizations which feed and house “researchers” continues to grow, because why? Because Science! We’re repeatedly told, and everybody believes, that if only we all knew more Science, then the ideal society will finally have been created. Funding for personnel grows. Problem is, the talent pool of the able remains fixed, so the available slots are filled with the not-as-brilliant. Besides, we’re all scientists now!

New journals are continuously created for the overflow, and they’re quickly filled with articles like this one giving the impression things of importance are happening. Not un-coincidentally, these outlets contain greater proportions of papers which excite the press (no hard burden). And so here we are.

Comments On Dawid’s Prequential Probability

Murray takes the role of a prequential Nature.


Phil Dawid is a brilliant mathematical statistician who introduced (in 1984) the theory of prequential probability¹ to describe a new-ish way of doing statistics. We ought to understand this theory. I’ll give the philosophy and leave out most of the mathematics, which are not crucial.

We have a series of past data, x = (x_1, x_2, …, x_n) for some observable of interest. This x can be quite a general proposition, but for our purposes suppose its numerical representation can only take the values 0 or 1. Maybe x_i = “The maximum temperature on day i exceeds W °C”, etc. The x can also have “helper” propositions, such as y_i = “The amount of cloud cover on day i is Z%”, but we can ignore all these.

Dawid says, “One of the main purposes of statistical analysis is to make forecasts for the future” (emphasis original) using probability. (Its only other, incidentally, is explanation: see this for the difference.)

The x come at us sequentially, and the probability forecast for time n+1 Dawid writes as Pr(x_{n+1} | x_n). “Prequential” comes from “probability forecasting with sequential prediction.” He cites meteorological forecasts as a typical example.

This notation suffers a small flaw: it doesn’t show the model, i.e. the list of probative premises of x which must be assumed or deduced in order to make a probability forecast. So write p_{n+1} = Pr(x_{n+1} | x_n, M) instead, where M are these premises. The notation shows that each new piece of data is used to inform future forecasts.
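A minimal sketch of sequential forecasting, in Python. The model M used here is an assumption picked only because it is easy to write down: Laplace's rule of succession, where the forecast for the next 0/1 outcome is (number of 1s so far + 1) / (number of observations so far + 2). Nothing in Dawid's framework requires this particular M, and the function name is hypothetical.

```python
def sequential_forecasts(xs):
    """Return the forecasts p_1, ..., p_n for a 0/1 series x_1, ..., x_n.

    Each forecast is issued before its outcome is seen and is conditional
    only on the earlier outcomes and the assumed model M (rule of succession).
    """
    forecasts = []
    ones_so_far = 0
    for seen, x in enumerate(xs):
        forecasts.append((ones_so_far + 1) / (seen + 2))  # forecast first
        ones_so_far += x                                   # then observe x
    return forecasts


xs = [1, 0, 1, 1, 0, 1]
ps = sequential_forecasts(xs)

for p, x in zip(ps, xs):
    print(f"forecast {p:.3f}, outcome {x}")
```

Each new outcome changes the next forecast, which is all the notation is meant to convey.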

How good is M at predicting x? The “weak prequential principle” is that M should be judged only on the p_i and x_i, i.e. only on how good the forecasts are. This is not in the least controversial. What is “good” sometimes is. There has to be some measure of closeness between the predictions and outcomes. People have invented all manner of scores, but (it can be shown) the only ones that should be used are so-called “proper scores”. These are scores which require p_{n+1} to be given conditional on just the M and old data and nothing else. This isn’t especially onerous, but it does leave out measures like R^2 and many others.
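For concreteness, here is a sketch in Python of two standard proper scores for 0/1 outcomes: the Brier score (mean squared difference between forecast and outcome) and the log score (mean negative log of the probability given to what happened). These are my choice of examples; the text above does not single out any one score, and the forecasts and outcomes are made up.

```python
import math


def brier_score(ps, xs):
    """Mean squared distance between forecasts and 0/1 outcomes; lower is better."""
    return sum((p - x) ** 2 for p, x in zip(ps, xs)) / len(xs)


def log_score(ps, xs):
    """Mean negative log probability assigned to the observed outcome; lower is better."""
    total = 0.0
    for p, x in zip(ps, xs):
        prob_of_outcome = p if x == 1 else 1 - p
        if prob_of_outcome == 0:
            return math.inf  # a "certain" forecast that turned out wrong
        total -= math.log(prob_of_outcome)
    return total / len(xs)


# Made-up forecasts and outcomes, for illustration only.
ps = [0.9, 0.2, 0.7]
xs = [1, 0, 1]

print(brier_score(ps, xs))
print(log_score(ps, xs))
```

Both functions use only the forecasts and the outcomes, which is what the weak prequential principle asks for.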

Part of understanding scoring is calibration. Calibration has more than one dimension, but since we have picked a simple problem, consider only two. Mean calibration is when the average of the p_i equaled (past tense) the average of the x_i. Frequency calibration is when, whenever p_i = q, q*100% of the time x = q. Now since x can only equal 0 or 1, frequency calibration is impossible for any M which does produce non-extreme probabilities. That is, the first p_i that does not equal 0 or 1 dooms the frequency calibration of M.
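A small Python sketch of the bookkeeping. The first function checks mean calibration as defined above. The second tabulates, for each distinct forecast value q, the observed fraction of outcomes equal to 1; that is the way such frequency comparisons are usually tabulated, and it is an assumption on my part that it is a useful companion to the definition given here. Function names and data are hypothetical.

```python
from collections import defaultdict


def mean_calibration_gap(ps, xs):
    """Average forecast minus average outcome; zero means mean calibrated."""
    return sum(ps) / len(ps) - sum(xs) / len(xs)


def frequency_table(ps, xs):
    """Map each distinct forecast value q to the observed fraction of x = 1."""
    outcomes_by_forecast = defaultdict(list)
    for p, x in zip(ps, xs):
        outcomes_by_forecast[p].append(x)
    return {q: sum(v) / len(v) for q, v in outcomes_by_forecast.items()}


# Made-up forecasts and outcomes, for illustration only.
ps = [0.5, 0.5, 0.5, 0.5, 1.0, 0.0]
xs = [1, 0, 1, 0, 1, 0]

gap = mean_calibration_gap(ps, xs)
table = frequency_table(ps, xs)
print(gap, table)
```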

Ceteris paribus, fully calibrated models are better than non-calibrated ones (this can be proven; they’ll have better proper scores; see Schervish). Dawid (1984) only considers mean calibration, and in a limiting way; I mean mathematical limits, as the number of forecasts and data head out to infinity. This is where things get sketchy. For our simple problem, calibration is possible finitely. But since the x are given by “Nature” (as Dawid labels the causal force creating the x), we’ll never get to infinity. So it doesn’t help to talk of forecasts that have not yet been made.

And then Dawid appears to believe that, out at infinity, competing mean-calibrated models (he calls them probability forecasting systems) are indistinguishable. “[I]n just those cases where we cannot choose empirically between several forecasting systems, it turns out we have no need to do so!” This isn’t so, finitely or infinitely, because two different models which have the same degree of mean calibration can have different levels of frequency calibration. So there is still room to choose.

Dawid also complicates his analysis by speaking as if Nature is “generating” the x from some probability distribution, and that a good model is one which discovers this Nature’s “true” distribution. (Or, inversely, he says Nature “colludes” in the distribution picked by the forecaster.) This is the “strong prequential principle”, which I believe does not hold. Nature doesn’t “generate” anything. Something causes each x_i. And that is true even in the one situation where our best knowledge is only probabilistic, i.e. the very small. In that case, we can actually deduce the probability distributions of quantum x in accord with all our evidence. But, still, Nature is not “generating” x willy nilly by “drawing” values from these distributions. Something we-know-not-what is causing the x. It is our knowledge of the causes that is necessarily incomplete.

For the forecaster, that means, in every instance and for any x, the true “probability distribution” is the one that takes only extreme probabilities, i.e. the best model is one which predicts without error (each p_i would be 0 or 1 and the model would automatically be frequency and mean calibrated). In other words, the best model is to discover the cause of each x_i.

Dawid also has a technical definition of the “prequential probability” of an “event”, which is a game-theoretic like construction that need not detain us because of our recognition that the true probability of any event is 0 or 1.


That models should be judged ultimately by the predictions they make, and not exterior criteria (which unfortunately includes political considerations, and even p-values), is surely desirable but rarely implemented (how many sociological models are used to make predictions in the sense above?). But which proper score does one use? Well, that depends on exterior information; or, rather, on evidence which is related to the model and to its use. Calibration, in all its dimensions, is scandalously underused.

Notice that in Pr(x_{n+1} | x_n, M) the model remains fixed and only our knowledge of more data increases. In real modeling, models are tweaked, adjusted, improved, or abandoned and replaced wholesale, meaning the premises (and deductions from the same) which comprise M change in time. So this notation is inadequate. Every time M changes, M is different, a banality which is not always remembered. It means model goodness judgments must begin anew for every change.

A true model is the one that generates extreme probabilities (0 or 1), i.e. one that identifies the causes, or the “tightest” probabilities deduced from the given (restricted by nature) premises, as in quantum mechanics. Thus the ultimate comparison is always against perfect (possible) knowledge. Since we are humble, we know perfection is mostly unattainable, thus we reach for simpler comparisons, and gauge model success by its success over simple guesses. This is the idea of skill (see this).

Reminder: probability is a measure of information, an epistemology. It is not the language of causality, or ontology.


Thanks to Stephen Senn for asking me to comment on this.

¹ The two papers to read are: Dawid, 1984. Present position and potential developments: some personal views: statistical theory: the prequential approach. JRSS A, 147, part 2, 278–292. And Dawid and Vovk, 1999. Prequential probability: principles and properties. Bernoulli, 5(1), 125–162.

Explanation Vs Prediction

The IPCC, hard at work on another forecast.



There isn’t as much space between explanation and prediction as you’d think; both are had from the same elements of the problem at hand.

Here’s how it all works. I’ll illustrate a statistical (or probability) model, though there really is no such thing; which is to say, there is no difference in meaning or interpretation between a probability and a physical or other kind of mathematical model. There is a practical difference: probability models express uncertainty natively, while (oftentimes) physical models do not mention it, though it is there, lurking below the equations.

Let’s use regression, because it is ubiquitous and easy. But remember, everything said goes for all other models, probability or physical. Plus, I’m discussing how things should work, not how they’re actually done (which is very often badly; not your models, Dear Reader: of course, not yours).

We start by wanting to quantify the uncertainty in some observable y, and believe we have collected some “variables” x which are probative of y. Suppose y is (some operationally defined) global average temperature. The x may be anything we like: CO2 levels, population size, solar insolation, grant dollars awarded, whatever. The choice is entirely up to us.

Now regression, like any model, has a certain form. It says the central parameter of the normal distribution representing uncertainty in y is a linear function of the x (y and x may be plural, i.e. vectors). This model structure is almost never deduced (in the strict sense of the word) but is assumed as a premise. This is not necessarily a bad thing. All models have a list of premises which describe the structure of the model. Indeed, that is what being a model means.

Another set of premises are the data we observe. Premises? Yes, sir: premises. The x we pick and then observe take the form of propositions, e.g. “The CO2 observed at time 1 was c_1”, “The CO2 observed at time 2 was c_2”, etc.

Observed data are premises because it is we who pick them. Data are not Heaven sent. They are chosen and characterized by us. Yes, the amount of—let us call it—cherishing that takes place over data is astonishing. Skip it. Data are premises, no different in character than other assumptions.


Here is what explanation is (read: should be). Given the model building premises (which specify, here, regression) and the observed data (both y and x), we specify some proposition of interest about y and then specify propositions about the (already observed) x. Explanation is how much the probability of the proposition about y (call it Y) changes.

That’s too telegraphic, so here’s an example. Pick a level for each of the observed x: “The CO2 observed is c_1”, “The population is p”, “The grant dollars is g”, etc. Then compute the probability Y is true given this x and given the model and other observed data premises.

Step two: pick another level for each of the x. This may be exactly the same everywhere, except for just one component, say, “The CO2 observed is c_2”. Recompute the probability of Y, given the new x and other premises.

Step three: compare how much the probability of Y (given the stated premises) changed. If not at all, then, given the other values of x and the model and data premises, CO2 has little, and maybe even nothing, to do with y.
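The three steps can be sketched in Python for the simplest case: one x, one y, and Y = “y exceeds some threshold”. Several things here are assumptions and not part of the argument above: the data are invented, the function names are hypothetical, and the probability is a plug-in normal approximation (fitted line plus residual spread) that ignores uncertainty in the fitted parameters. A fuller treatment would carry that parameter uncertainty into the probability as well.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sd)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (n - 2))


def prob_exceeds(threshold, x_level, a, b, sd):
    """Plug-in approximation to Pr(y > threshold | x_level, data, M)."""
    z = (threshold - (a + b * x_level)) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical observations, invented for illustration only.
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_obs = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

a, b, sd = fit_line(x_obs, y_obs)
p_first = prob_exceeds(4.0, 2.0, a, b, sd)   # step one: Y = "y > 4", x set to 2
p_second = prob_exceeds(4.0, 5.0, a, b, sd)  # step two: same Y, x moved to 5
change = p_second - p_first                  # step three: compare

print(p_first, p_second, change)
```

Whether the resulting change counts as "enough" is not something the code can answer.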

Of course, there are other values of the other x that might be important, in conjunction with CO2 and y, so we can’t dismiss CO2 yet. We have a lot of hard work to do to step through how all the other x and how this x (CO2) change this proposition (Y) about y. And then there are other propositions of y that might be of more interest. CO2 might be important for them. Who knows?

Hey, how much change in the probability of any Y is “enough”? I have no idea. It depends. It depends on what you want to use the model for, what decisions you want to make with it, what costs await incorrect decisions, what rewards await correct ones, all of which might be unquantifiable. There is and should be NO preset level which says “Probability changes by at least p are ‘important’ explanations.” Lord forbid it.

A word about causality: none. There is no causality in a regression model. It is a model of how changing CO2 changes our UNCERTAINTY in various propositions of y, and NOT in changes in y itself.¹

Explanation is brutal hard labor.


Here is what prediction is (should be). Same as explanation. Except we wait to see whether Y is true or false. The (conditional) prediction gave us its probability, and we can compare this probability to the eventual truth or falsity of Y to see how good the model is (using proper scores).

Details. We have the previous observed y and x, and the model premises. We condition on these and then suppose new x (call them w) and ask what is the probability of new propositions of y (call them Z). Notationally, Pr(Z | w, y, x, M), where M are the model form premises. These probabilities are compared against the eventual observations of z.

“Close” predictions mean good models. “Distant” ones mean bad models. There are formal ways of defining these terms, of course. But what we’d hate is if any measure of distance became standard. The best scores to use are those tied intimately with the decisions made with the models.

And there is also the idea of skill. The simplest regression is a “null x”, i.e. no x. All that remain are the premises which say the uncertainty in y is represented by some normal distribution (where the central parameter is not a function of anything). Now if your expert model, loaded with x, cannot beat this naive or null model, your model has no skill. Skill is thus a relative measure.

For time series models, e.g. GCMs, one natural “null” model is the null regression, which is also called “climate” (akin to long-term averages, but taking into account the full uncertainty of these averages). Another is “persistence”, which is the causal-like model y_{t+1} = y_t + fuzz. Again, sophisticated models which cannot “beat” persistence have no skill and should not be used. Like GCMs.
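A rough Python sketch of comparing the two reference models on a series. Everything specific here is an assumption made for illustration: the series is invented, both forecasts are plug-in normal distributions (so the “full uncertainty” of the averages is not carried through), the fuzz in persistence is taken to be zero-mean with spread estimated from past one-step changes, the score is the negative log density (a proper score, lower is better), and skill is written as the plain difference in average score. Function names are hypothetical.

```python
import math
import statistics


def normal_log_score(y, mean, sd):
    """Negative log density of y under Normal(mean, sd^2); lower is better."""
    return 0.5 * math.log(2 * math.pi * sd ** 2) + (y - mean) ** 2 / (2 * sd ** 2)


def climate(history):
    """Null model: long-run mean and spread of everything seen so far."""
    return statistics.mean(history), statistics.stdev(history)


def persistence(history):
    """y_{t+1} = y_t + fuzz, with zero-mean fuzz sized by past one-step changes."""
    steps = [later - earlier for earlier, later in zip(history, history[1:])]
    fuzz_sd = math.sqrt(sum(s * s for s in steps) / len(steps))
    return history[-1], fuzz_sd


def average_score(series, forecaster, start=3):
    """Score each forecast made from the data available before time t."""
    scores = []
    for t in range(start, len(series)):
        mean, sd = forecaster(series[:t])
        scores.append(normal_log_score(series[t], mean, sd))
    return sum(scores) / len(scores)


def skill(model_score, reference_score):
    """Positive when the model scores better (lower) than the reference."""
    return reference_score - model_score


# Invented, steadily rising series, for illustration only.
series = [10.0, 10.4, 10.9, 11.5, 11.9, 12.6, 13.0, 13.4]

climate_score = average_score(series, climate)
persistence_score = average_score(series, persistence)
persistence_skill_over_climate = skill(persistence_score, climate_score)

print(climate_score, persistence_score, persistence_skill_over_climate)
```

An expert model would be dropped into `average_score` the same way and would need positive skill against both references to be worth using.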


This is only a sketch. Books have been written on these subjects. I’ve compressed them all into 1,100 words.


¹ Simple causal model: y = x. It says y will be the value of x, that x makes y what it is. But even these models, though written mathematically like causality, are not treated that way. Fuzz is added to them mentally. So that if x = 7 and y = 9, the model won’t be abandoned.


© 2014 William M. Briggs
