William M. Briggs

Statistician to the Stars!

Most Probabilities Aren’t Quantifiable

Look at those colorful numbers!

We’ve done this before in different form. But it hasn’t stuck; plus we need this for reference.

Not all probability is quantifiable. The proof for this is simple: all that must be demonstrated is one probability that cannot be made into a unique number. I’ll do this in a moment, but first it is interesting to recall that in its infancy it wasn’t clear probability could or should be represented numerically. (See Jim Franklin’s terrific The Science of Conjecture: Evidence and Probability Before Pascal.) It is only obvious probability is numerical when you’ve grown up subsisting solely on a diet of numbers, a condition true of any working scientist.

The problem is that because some probabilities are numerical, a probability only feels real, scientific, and weighty when it is stated numerically. Nobody wants to make a decision based on mere words, not when figures can be used. Result? Over-certainty.


Kolmogorov, in 1933’s Foundations of the Theory of Probability, gave us a set of axioms which put probability on a firm footing. Problem is, the first axiom said, or seemed to say, “probability is a number”, and so did the second (the third gave a rule for manipulating these numbers). The axioms also require a good dose of mathematical training to comprehend, which contributed to the idea that probabilities are numbers.

Different, not-so-rigorous, but nevertheless appealing axioms were given by Cox in 1961. Their appeal was their statement in plain English and concordance with common sense. (Cox’s lack of mathematical rigor was subsequently fixed by several authors.1) Now these axioms yield two interesting results. First is that probability is always conditional. We can never write (in standard symbols) Pr(A), which reads “The probability of proposition A”, but must write Pr(A|B), “The probability of A given the premise or evidence B.” This came as no shock to logicians, who knew that the conclusion of any argument must be “conditioned on” premises or evidence of some kind, even if this evidence is just our intuition. It didn’t shock anybody else either, because it’s rarely remembered: another victim of treating probability exclusively mathematically.

The second result sounds like numbers. Certainty has probability 1, falsity probability 0, just as expected. And, given some evidence B, the probability of some A plus the probability that A is false must equal 1: that is, it is a certainty (given B) that either A or not-A is true. Numbers, but only sort of, because there is no proof that for any A or B, Pr(A|B) will be a number. And indeed, there can be no proof, as you’ll discover. In short: Cox’s proofs are not constructive.

Cox’s axioms (and their many variants) are known, or better to say, followed by only a minority of physicists and Bayesian statisticians. They are certainly not as popular as Kolmogorov’s, even though following Cox’s trail can and usually does lead to Kolmogorov. Which is to say, to mathematics, i.e. numbers.

Numberless probability

Here’s our example of a numberless probability: B = “A few Martians wear hats” and A = “The Martian George wears a hat.” There is no unique Pr(A|B) because there is no unique map from “a few” to any number. The only way to generate a unique number is to modify B. Say B’ = “A few, where ‘a few’ means 10%, Martians wear hats.” Then Pr(A|B’) = 0.1. Or B” = “A few, where ‘a few’ means never more than one-half…” Then 0 < Pr(A|B”) < 0.5. It should be obvious that B is not B’ nor B” (if it isn’t, you’re in deep kimchi). More examples are had by changing “a few” to “some”, “most”, “a bunch”, “not so many” and on and on, none of which lead to a unique probability. This is all true even though, in each case, Pr(A|B) + Pr(not-A|B) = 1. (Why? Because that formula is a tautology.)
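One way to see how a premise like B” constrains without quantifying is to treat the probability as an interval rather than a single number. Here is a minimal sketch (my own notation, nothing standard) showing that the tautology Pr(A|B) + Pr(not-A|B) = 1 still holds even when no unique number exists:

```python
# A minimal sketch (my own notation, nothing standard): represent a
# numberless probability as an interval of possible values.
from fractions import Fraction

class IntervalProb:
    """A probability known only to lie between lo and hi."""
    def __init__(self, lo, hi):
        lo, hi = Fraction(lo), Fraction(hi)
        assert 0 <= lo <= hi <= 1
        self.lo, self.hi = lo, hi

    def complement(self):
        # The tautology Pr(A|B) + Pr(not-A|B) = 1 holds pointwise,
        # so the interval simply flips.
        return IntervalProb(1 - self.hi, 1 - self.lo)

# B'' = "A few, where 'a few' means never more than one-half..."
pr_A = IntervalProb(0, Fraction(1, 2))
pr_notA = pr_A.complement()
print(pr_A.lo + pr_notA.hi, pr_A.hi + pr_notA.lo)  # 1 1
```

The endpoints sum to 1 in pairs, but no unique Pr(A|B”) ever appears; only modifying the premise would collapse the interval to a point.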

It turns out most probability isn’t quantifiable, because most judgments of uncertainty cannot be and are not stated numerically. “Scientific” propositions, many of which can be quantified, are very rare in human discourse. Consider this, from which you will see it is easy to generate endless examples. B (spoken by Bill) = “I might go over to Bob’s” as the sole premise for A = “Bill will go to Bob’s”. Note very carefully that this is your premise, not Bill’s. It is your uncertainty in A given B that is of interest. The only way to come to a definite number is by adding to B; perhaps by your knowledge of Bill’s habits. But if you were a bystander and overheard the conversation, you wouldn’t know how to add to B, unless you did so by subtle hints of Bill’s dress, his mannerisms, and things like that. Anyway, all these change B, and make it into something which is not B. That’s cheating. If asked for Pr(A|B) one must provide Pr(A|B) and not Pr(A|B’) or anything else.

This seemingly trivial rule is astonishingly difficult to remember or to heed if one is convinced probability is numerical. It would never be violated when working through a syllogism, say, or calculating a mathematical proof, where blatant additions to specified evidence are rejected out of hand. A professor would never let a student change the problem so that the student can answer it. Not so with probabilities. People will change the problem to make it more amenable. “Subjective” Bayesians make a career out of it.

Why is the rule so hard? No sooner do you ask somebody what Pr(A|B) is than they’ll say, “Well, there’s lots of factors to consider…” There are not. There is only one, and that is B’s logical relation to A. Anything else, however interesting, is not relevant. Unless one wants to change the problem and discover the plausible evidence B’ which gives A its most extreme probability (nearest to 0 or 1). The modifier “plausible” is needed, because it is always possible to create evidence which makes A true or false (e.g. B = “A is impossible”). The plausibility is to fit the evidence into a larger scheme of propositions. This is a large topic, skipped here, because it is incidental.

Lots of detail left out here, which you have to fill in. See the classic posts page for how.

Update 2 Fixed the d*&^%^*&& typo that one of my enemies placed in the equation below. Rats!

Update An algebraic analogy. “If y = 1 and x + y < 7, solve for x.” There isn’t enough information provided to derive a unique value for x. It thus would be absurd, and obviously so, to say, “Well, I feel most x are positive; I mean, if I were to bet. And I’ve seen a lot of them around 3, though I’ve come across a few 4s too. I’m going with 3.”

Precision is often denied us. As silly as this example is, we see its equivalent occur in probability all the time.
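The analogy can even be checked mechanically. A trivial sketch, assuming nothing beyond the stated constraints:

```python
# The premises y = 1 and x + y < 7 pin down only x < 6; they do not
# pick out a unique x. "Going with 3" adds information not given.
def satisfies(x, y=1):
    return y == 1 and x + y < 7

candidates = [3, 4, -100, 5.9]                 # all consistent with the premises
print(all(satisfies(x) for x in candidates))   # True: no unique answer
print(satisfies(6))                            # False: 6 + 1 is not < 7
```

Infinitely many values satisfy the premises, so announcing one of them as “the” answer is exactly the move the update mocks.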


1See inter alia Dupré and Tipler, 2009. “New Axioms for Rigorous Bayesian Probability,” Bayesian Analysis, 3, 599-606.


The Consensus In Philosophy

David Stove, philosopher sui generis.

In 1887 almost every philosopher in the English-speaking countries was an idealist. A hundred years later in the same countries, almost all philosophers have forgotten this fact; and when, as occasionally happens, they are reminded of it, they find it almost impossible to believe. But it ought never to be forgotten. For it shows what the opinions, even the virtually unanimous opinions, of philosophers are worth, when they conflict with common sense.

Not only were nearly all English-speaking philosophers idealists a hundred years ago: absolutely all of the best ones were…In general, the British idealists were…good philosophers. Green, Bosanquet, Bradley, and Andrew Seth, in particular, were very good philosophers indeed. These facts need all the emphasis I can give them, because most philosophers nowadays either never knew or have forgotten them, and indeed…they cannot really believe them. They are facts, nevertheless, and facts which ought never to be forgotten. For they show what the opinions even, or rather, especially of good philosophers are worth, when they conflict with common sense. (They therefore also throw some light on the peculiar logic of the concept ‘good philosopher': an important but neglected subject.)

David Stove, “Idealism: a Victorian Horror-story (Part One)” in The Plato Cult and other Philosophical Follies, 1991, Basil Blackwell, Oxford, p. 97; emphasis original.

The current near, or would-be, consensus is that we are all slaves to our neurons, or perhaps genes, or both; or maybe our environment, or class situation, or anything; anything which denies our free will and exonerates us from culpability.

Of course, it would be a fallacy to say, as some of you are tempted to say, that any consensus should not be trusted. Because there are plenty of truths we all, philosophers or not, agree on. The only lesson for us is that the presence of a consensus does not imply truth. And maybe that some fields are more prone to grand mistakes than others.

Update Stove on the Science Mafia in the Velikovsky affair.


Scrap Statistics, Begin Anew

I only am escaped alone to tell thee.

You or I might perhaps be excused if we sometimes toyed with solipsism, especially when we reflect on the utter failure of our writings to produce the smallest effect in the alleged external world. —David Stove, “Epistemology and the Ishmael Effect.”

Statistics is broken. When it works, it usually does so in spite of itself. When it doesn’t, which is increasingly often, it inflates egos, promulgates scientism, idolizes quantification, supports ideologies, and encourages magical thinking.

I’m not going to prove any of that today (you’re welcome to read old posts for corroboration), but assume it. This is just a Friday rant.

I weep over the difficulty of explaining things. I can’t make what is obvious to me plain to others. Flaubert was right: “Human speech is like a cracked kettle on which we tap crude rhythms for bears to dance to, while we long to make music that will melt the stars.”

So most of the fault is mine. But not all of it.

Last week I had as a header this blurb: In Nate Silver’s book The Signal and the Noise: Why So Many Predictions Fail he says (p. 68) “Recently, however, some well-respected statisticians have begun to argue that frequentist statistics should no longer be taught to undergraduates.” That footnote recommended this paper.

Easy to say. Impossible to do. You cannot, in any university I know, teach unapproved material. There are exceptions for “PhD-level” courses and the like, where the air is thin and the seats never filled, but for undergraduates you must adhere to the party line. The excuse for this is circular: students must be taught what’s approved because what’s approved is what students must be taught.

The scheme does work, however, for material which resembles cookbook recipes. Rigid syllabuses are best for welding, accountancy, physics, and sharpshooting courses. That’s why the Army uses them. But they fail miserably in what used to be called the humanities, which I say includes probability; at least its philosophical side. Humanitarians see themselves as scientists these days. Only way to get funding, I guess. Skip it.

I don’t mean to swap Bayes with frequentism, at least not in the way most people think of Bayes. Problem is everybody learns Bayes after learning frequentism, which is like a malarial infection that can’t be shaken. Frequentists love to create hypotheses? So do Bayesians. Frequentists have an unnatural and creepy fascination with parameters? So too Bayesians. Frequentists point to the occult powers of “randomization”? Bayesians nervously follow suit. The effect is that there’s very little practical difference between the two methods. (Though you wouldn’t know it listening to them bicker.)

There is no cure for malaria. Best maneuver is to avoid areas where infections are prevalent. That unfortunately means learning probability and statistics outside those departments. There’s some hope they can be learnt from certain physicists, but a weak one. The lure of quantification is strong there, and the probability is incidental.

One can always wander to the website of some eccentric—a refugee from academia—but that isn’t systematic enough for lasting consequence.

I don’t have a solution. And what am I doing wasting my time wallowing? I have to finish my book.


Selling Fear Is A Risky Business: Part Last

Read Part I, Part II. Don’t be lazy. This is difficult but extremely important stuff.

Let’s add in a layer of uncertainty and see what happens. But first hike up your shorts and plant yourself somewhere quiet because we’re in the thick of it.

The size of the relative risks (1.06) touted by authors like Jerrett gets the juices flowing of bureaucrats and activists who see any number north of 1 as reason for intervention. Yet in their zeal for purity they ignore evidence which admits things aren’t as bad as they appear. Here’s proof.

Relative risks are produced by statistical models, usually frequentist. That means p-values less than the magic number signal “significance”, an unfortunate word which doesn’t mean what civilians think. It doesn’t imply “useful” or “important” or even “significant” in its plain English sense. Instead, it says the probability of seeing a test statistic larger (in absolute value) than the one produced by the model and observed data if the “experiment” which gave the observations were indefinitely repeated and if certain parameters of the quite arbitrary model are set to zero.1 What a tongue twister!
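To make that definition concrete, here is a toy simulation (my sketch, not any particular study’s model, with made-up data): fix the parameter at zero, “repeat” the experiment many times, and count how often the test statistic beats the observed one in absolute value.

```python
# Toy re-enactment of the p-value definition: the "experiment" is
# redrawn under a null whose parameter is set to 0, and we count how
# often the test statistic exceeds, in absolute value, the observed one.
import random
import statistics

random.seed(1)

def t_stat(sample):
    # mean over its standard error: one of many possible statistics
    n = len(sample)
    return statistics.mean(sample) / (statistics.stdev(sample) / n ** 0.5)

observed = [0.8, 1.1, -0.2, 0.5, 0.9, 1.4, 0.1, 0.7]  # made-up data
t_obs = t_stat(observed)

reps = 20_000
exceed = 0
for _ in range(reps):
    # the parameter "set to 0": redraw the experiment under the null
    null_sample = [random.gauss(0, 1) for _ in range(len(observed))]
    if abs(t_stat(null_sample)) >= abs(t_obs):
        exceed += 1

p_value = exceed / reps
print(p_value)  # small, so "significance" would be declared
```

Note that a different statistic, a different null model, or different data would each give a different p-value, which is the point of footnote 1.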

Every time you see a p-value, you must recall that definition. Or fall prey to the “significance” fallacy.

Now (usually arbitrarily chosen and not deduced) statistical models of relative risk have a parameter or parameters associated with that measure.2 Classical procedure “estimates” the values of these parameters; in essence, makes a guess of them. The guesses are heavily—as in heavily—model and data dependent. Change the model, make new observations, and the guesses change.

There are two main sources of uncertainty (there are many subsidiary ones). This is key. The first is the guess itself. Classical procedure forms confidence or credible “95%” intervals around the guess.3 If these do not touch a set number, “significance” is declared. But afterwards the guess alone is used to make decisions. This is the significance fallacy: neglecting uncertainty of the second and more important kind.
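A sketch of the classical routine, with illustrative, assumed numbers (a guessed relative risk of 1.06 and an invented standard error; nothing here comes from any actual study):

```python
# Sketch of the classical routine: form a 95% interval around the
# parameter guess, declare "significance" if it clears the set number,
# then (the fallacy) carry only the bare guess forward.
import math

log_rr_hat = math.log(1.06)  # the model's guess at the log relative risk
se = 0.02                    # assumed standard error; model and data dependent
lo = math.exp(log_rr_hat - 1.96 * se)
hi = math.exp(log_rr_hat + 1.96 * se)

significant = lo > 1.0       # the interval does not touch 1
print(significant, round(lo, 3), round(hi, 3))
```

The interval says the risk could plausibly be anywhere from lo to hi, yet downstream decisions typically use 1.06 alone, as if the first kind of uncertainty had vanished.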

Last time we assumed there was no uncertainty of the first kind. We knew the values of the parameters, of the probabilities and risk. Thus the picture drawn was the effect of uncertainty of the second kind, though at the time we didn’t know it.

We saw that even though there was zero uncertainty of the first kind, there was still tremendous uncertainty in the future. Even with “actionable” or “unacceptable” risk, the future was at best fuzzy. Absolute knowledge of risk did not give absolute knowledge of cancer.

This next picture shows how introducing uncertainty of the first kind—present in every real statistical model—increases uncertainty of the second.

Again, these are true probabilities and not "densities." See Part II.

The narrow reddish lines are repeated from before: the probabilities of new cancer cases between exposed and not-exposed LA residents assuming perfect knowledge of the risk. The wider lines are the same, except adding in parameter uncertainty (parameters which were statistically “significant”).

Several things to notice. The most likely number of cancer cases stopped by completely eliminating coriandrum sativum is still about 20, but the spread in cases stopped doubles. We now believe there could be more cancer cases, but there also could be many fewer.

There is also more overlap between the two curves. Before, we were 78% sure there would be more cancer cases in the exposed group. Now there is only a 64% chance: a substantial reduction. Pause and reflect.

Parameter uncertainty increases the chance to 36% (from 22%) that any program to eliminate coriandrum sativum does nothing. Either way, the number of affected citizens remains low. Affected by cancer, that is. Everybody would be affected by whatever regulations are enacted. And don’t forget: any real program cannot completely eliminate exposure; the practical effect on disease must always be less than ideal. But the calculations focus on the ideal.
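The two layers of uncertainty can be sketched with a Monte Carlo simulation. All numbers below are assumed for illustration (baseline expected cases chosen so that a 1.06 relative risk stops about 20 of them, a jitter size picked to show the effect); this is emphatically not Jerrett’s model, which was never published.

```python
# Monte Carlo sketch of the two layers of uncertainty: compare the
# chance of more cancers among the exposed with the risk known exactly
# versus with parameter uncertainty added. All numbers are assumed.
import math
import random

random.seed(2)

LAM0 = 333.0  # assumed expected cases, not-exposed group (333 * 0.06 ~ 20)
RR = 1.06     # the touted relative risk

def draw_cases(lam):
    # uncertainty of the second kind: the future number of cases,
    # via a normal approximation to the Poisson
    return max(0, round(random.gauss(lam, math.sqrt(lam))))

def prob_more_exposed(sims=20_000, param_sd=0.0):
    more = 0
    for _ in range(sims):
        # uncertainty of the first kind: jitter the guessed relative risk
        rr_i = max(0.0, random.gauss(RR, param_sd))
        if draw_cases(LAM0 * rr_i) > draw_cases(LAM0):
            more += 1
    return more / sims

print(prob_more_exposed(param_sd=0.0))   # about 0.78: risk known exactly
print(prob_more_exposed(param_sd=0.15))  # about 0.64: parameter uncertainty added
```

Even with the parameters known perfectly the future is fuzzy; add a plausible spread to the guessed relative risk and the chance the exposed group fares worse drops markedly, which is the shape of the argument in the text.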

We’re not done. We still have to add the uncertainty in measuring exposure, which typically is not minor. For example, Jerrett (2013) assumes air pollution measurements from 2002 affect the health of people in the years 1982-2000. Is time travel possible? Even then, his “exposure” is a guess from a land-use model. Meaning he used the epidemiologist fallacy to supply exposure measurements.

Adding exposure uncertainty pushes the lines above outward and increases their overlap. We started with a 78% chance any regulations might be useful (even though the usefulness affected only about 20 people); we went to 64% with parameter uncertainty; and adding in measurement error will move that number closer to 50%, the bottom of the barrel of uncertainties. At 50%, the probability lines for exposed and not-exposed would exactly overlap.

I stress I did not use Jerrett’s model—because I don’t have it. He didn’t publish it. The example here is only an educated guess of what the results would be under typical kinds of parameter uncertainty and given risks. The direction of uncertainty is certainly correct, however, no matter what his model was.

Plus—you knew this was coming: my favorite phrase—it’s worse than we thought! There are still sources of uncertainty we didn’t incorporate. How good is the model? Classical procedure assumes perfection (or blanket usefulness). But other models are possible. What about “controls”? Age, sex, etc. Could be important. But controls can fool just as easily as help: see footnote 2.

All along we have assumed we could eliminate exposure completely. We cannot. Thus the effect of regulation is always less than touted. How much less depends on the situation and our ability to predict future behavior and costs. Not so easy!

I could go on and on, adding in other, albeit smaller, layers of uncertainty. All of which push that effectiveness probability closer and closer to 50%. But enough is enough. You get the idea.


1Other settings are possible, but 0 is the most common. Different models on the same data give different p-values. Which one is right? All. Different test statistics used on the same model and data give different p-values. Which one is right? All. How many p-values does that make all together? Don’t bother counting. You haven’t enough fingers.

2Highly technical alley: A common model is logistic regression. Read all about it in chapters 12 and 13 of this free book (PDF). It says the “log odds of getting it” are linearly related to predictors, each associated with a “parameter.” The simplest such model is (r.h.s.) b0 + b1 * I(exposed), where I(exposed) equals 1 when exposed, else 0. With a relative risk of 1.06 and exposed probability of 2e-4, you cannot, with any sample size short of billions, find a wee p-value for b1. But you can if you add other “controls”. Thus the act of controlling (for even unrelated data) can cause what isn’t “significant” to become that way. This is another, and quite major, flaw of p-value thinking.

3“Confidence” intervals mean, quite literally, nothing. This always surprises. But everybody interprets them as Bayesian credible intervals anyway. These are the plus or minus intervals around a parameter, giving its most likely values.
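Footnote 2’s arithmetic can be sketched directly. With a baseline probability of 2e-4 and a relative risk of 1.06, the implied logistic coefficient b1 is the difference in log odds, a very small number, which is why astronomical samples are needed to find a wee p-value for it:

```python
# Sketch of footnote 2's arithmetic: the logistic coefficient b1 is
# the change in log odds between the exposed and not-exposed groups.
import math

p0 = 2e-4        # probability of "getting it" when not exposed
p1 = 1.06 * p0   # probability when exposed (relative risk 1.06)

def log_odds(p):
    return math.log(p / (1 - p))

b0 = log_odds(p0)                 # the intercept
b1 = log_odds(p1) - log_odds(p0)  # nearly log(1.06), since p0 is tiny
print(round(b1, 4))  # about 0.0583
```

Detecting a coefficient that small against sampling noise in an event with probability 2e-4 is what demands sample sizes in the billions.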


© 2015 William M. Briggs
