In 1887 almost every philosopher in the English-speaking countries was an idealist. A hundred years later in the same countries, almost all philosophers have forgotten this fact; and when, as occasionally happens, they are reminded of it, they find it almost impossible to believe. But it ought never to be forgotten. For it shows what the opinions, even the virtually unanimous opinions, of philosophers are worth, when they conflict with common sense.
Not only were nearly all English-speaking philosophers idealists a hundred years ago: absolutely all of the best ones were…In general, the British idealists were…good philosophers. Green, Bosanquet, Bradley, and Andrew Seth, in particular, were very good philosophers indeed. These facts need all the emphasis I can give them, because most philosophers nowadays either never knew or have forgotten them, and indeed…they cannot really believe them. They are facts, nevertheless, and facts which ought never to be forgotten. For they show what the opinions even, or rather especially, of good philosophers are worth, when they conflict with common sense. (They therefore also throw some light on the peculiar logic of the concept ‘good philosopher’: an important but neglected subject.)
The current near, or would-be, consensus is that we are all slaves to our neurons, or perhaps genes, or both; or maybe our environment, or class situation, or anything; anything which denies our free will and exonerates us from culpability.
Of course, it would be a fallacy to say, as some of you are tempted to, that no consensus should be trusted, because there are plenty of truths we all, philosophers or not, agree on. The only lesson for us is that the presence of a consensus does not imply truth. And maybe that some fields are more prone to grand mistakes than others.
You or I might perhaps be excused if we sometimes toyed with solipsism, especially when we reflect on the utter failure of our writings to produce the smallest effect in the alleged external world. —David Stove, “Epistemology and the Ishmael Effect.”
Statistics is broken. When it works, it usually does so in spite of itself. When it doesn’t, which is increasingly often, it inflates egos, promulgates scientism, idolizes quantification, supports ideologies, and encourages magical thinking.
I’m not going to prove any of that today (you’re welcome to read old posts for corroboration), but assume it. This is just a Friday rant.
I weep over the difficulty of explaining things. I can’t make what is obvious to me plain to others. Flaubert was right: “Human speech is like a cracked kettle on which we tap crude rhythms for bears to dance to, while we long to make music that will melt the stars.”
So most of the fault is mine. But not all of it.
Last week I had as a header this blurb: In Nate Silver’s book The Signal and the Noise: Why So Many Predictions Fail he says (p. 68) “Recently, however, some well-respected statisticians have begun to argue that frequentist statistics should no longer be taught to undergraduates.” That footnote recommended this paper.
Easy to say. Impossible to do. You cannot, in any university I know, teach unapproved material. There are exceptions for “PhD-level” courses and the like, where the air is thin and the seats never filled, but for undergraduates you must adhere to the party line. The excuse for this is circular: students must be taught what’s approved because what’s approved is what students must be taught.
The scheme does work, however, for material which resembles cookbook recipes. Rigid syllabuses are best for welding, accountancy, physics, and sharpshooting courses. That’s why the Army uses them. But they fail miserably in what used to be called the humanities, which I say includes probability; at least its philosophical side. Humanitarians see themselves as scientists these days. Only way to get funding, I guess. Skip it.
I don’t mean to swap Bayes for frequentism, at least not in the way most people think of Bayes. The problem is everybody learns Bayes after learning frequentism, which is like a malarial infection that can’t be shaken. Frequentists love to create hypotheses? So do Bayesians. Frequentists harbor an unnatural and creepy fascination with parameters? So too Bayesians. Frequentists point to the occult powers of “randomization”? Bayesians nervously follow suit. The effect is that there’s very little practical difference between the two methods. (Though you wouldn’t know it listening to their bickering.)
There is no cure for malaria. Best maneuver is to avoid areas where infections are prevalent. That unfortunately means learning probability and statistics outside those departments. There’s some hope they can be learnt from certain physicists, but a weak one. The lure of quantification is strong there, and the probability is incidental.
One can always wander to the website of some eccentric—a refugee from academia—but that isn’t systematic enough for lasting consequence.
I don’t have a solution. And what am I doing wasting my time wallowing? I have to finish my book.
Read Part I, Part II. Don’t be lazy. This is difficult but extremely important stuff.
Let’s add in a layer of uncertainty and see what happens. But first hike up your shorts and plant yourself somewhere quiet because we’re in the thick of it.
The size of relative risks (1.06) touted by authors like Jerrett gets the juices flowing of bureaucrats and activists who see any number north of 1 as reason for intervention. Yet in their zeal for purity they ignore evidence which admits things aren’t as bad as they appear. Here’s proof.
Relative risks are produced by statistical models, usually frequentist. That means p-values less than the magic number signal “significance”, an unfortunate word which doesn’t mean what civilians think. It doesn’t imply “useful” or “important” or even “significant” in its plain English sense. Instead, it is the probability of seeing a test statistic larger (in absolute value) than the one produced by the model and observed data, if the “experiment” which gave the observations were indefinitely repeated and if certain parameters of the quite arbitrary model are set to 0.[1] What a tongue twister!
Every time you see a p-value, you must recall that definition. Or fall prey to the “significance” fallacy.
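To make the tongue twister concrete, here is a toy simulation, entirely my own construction and not from any paper discussed here. The parameter (the mean difference) is set to 0, the “experiment” is repeated many times, and the p-value is the fraction of repetitions producing a test statistic at least as large, in absolute value, as the one observed:

```python
import numpy as np

rng = np.random.default_rng(42)

# The observed "experiment": two groups of 50, same underlying mean,
# so the null hypothesis is true by construction
x = rng.normal(0, 1, 50)
y = rng.normal(0, 1, 50)

def t_stat(a, b):
    # Simple two-sample statistic: difference in means over pooled standard error
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1)/len(a) + b.var(ddof=1)/len(b))

observed = t_stat(x, y)

# Repeat the "experiment" indefinitely (well, 10,000 times)
# with the parameter set to 0
reps = [t_stat(rng.normal(0, 1, 50), rng.normal(0, 1, 50)) for _ in range(10_000)]
p_value = np.mean(np.abs(reps) >= abs(observed))

print(f"observed statistic: {observed:.3f}, simulated p-value: {p_value:.3f}")
```

Note the p-value says nothing about whether the difference matters; it is only a statement about hypothetical repetitions of the experiment under an assumed parameter value.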
Now (usually arbitrarily chosen and not deduced) statistical models of relative risk have a parameter or parameters associated with that measure.[2] Classical procedure “estimates” the values of these parameters; in essence, it makes a guess at them. The guesses are heavily—as in heavily—model and data dependent. Change the model, or make new observations, and the guesses change.
There are two main sources of uncertainty (there are many subsidiary ones). This is key. The first is the guess itself. Classical procedure forms confidence or credible “95%” intervals around the guess.[3] If these do not include a set number (usually 0), “significance” is declared. But afterwards the guess alone is used to make decisions. This is the significance fallacy: to neglect uncertainty of the second and more important kind.
Last time we assumed there was no uncertainty of the first kind. We knew the values of the parameters, of the probabilities and risk. Thus the picture drawn was the effect of uncertainty of the second kind, though at the time we didn’t know it.
We saw that even though there was zero uncertainty of the first kind, there was still tremendous uncertainty in the future. Even with “actionable” or “unacceptable” risk, the future was at best fuzzy. Absolute knowledge of risk did not give absolute knowledge of cancer.
This next picture shows how introducing uncertainty of the first kind—present in every real statistical model—increases uncertainty of the second.
The narrow reddish lines are repeated from before: the probabilities of new cancer cases between exposed and not-exposed LA residents assuming perfect knowledge of the risk. The wider lines are the same, except adding in parameter uncertainty (parameters which were statistically “significant”).
Several things to notice. The most likely number of cancer cases stopped by completely eliminating coriandrum sativum is still about 20, but the spread in cases stopped doubles. We now believe there could be more cancer cases, but there also could be many fewer.
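The widening is exactly what you get when a fixed probability is replaced by an uncertain one. A toy simulation makes the point; the group size, probability, and the 10% guess-error are my illustrative numbers, not Jerrett’s:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000   # people in the group (illustrative)
p = 2e-4        # cancer probability, pretended known exactly

# Uncertainty of the second kind only: p known, future counts still random
fixed = rng.binomial(n, p, size=50_000)

# Add uncertainty of the first kind: p itself is only a guess,
# here fuzzed by +/- 10% (an assumed amount of parameter uncertainty)
p_draws = rng.normal(p, 0.1 * p, size=50_000).clip(min=0)
both = rng.binomial(n, p_draws)

print(f"spread with p known:   sd = {fixed.std():.1f}")
print(f"spread with p guessed: sd = {both.std():.1f}")
```

With these numbers the standard deviation roughly doubles, which is the same effect drawn in the picture.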
There is also more overlap between the two curves. Before, we were 78% sure there would be more cancer cases in the exposed group. Now there is only a 64% chance: a substantial reduction. Pause and reflect.
Parameter uncertainty increases the chance to 36% (from 22%) that any program to eliminate coriandrum sativum does nothing. Either way, the number of affected citizens remains low. Affected by cancer, that is. Everybody would be affected by whatever regulations are enacted. And don’t forget: any real program cannot completely eliminate exposure; the practical effect on disease must always be less than ideal. But the calculations focus on the ideal.
We’re not done. We still have to add the uncertainty in measuring exposure, which typically is not minor. For example, Jerrett (2013) assumes air pollution measurements from 2002 affect the health of people in the years 1982-2000. Is time travel possible? Even then, his “exposure” is a guess from a land-use model. Meaning he used the epidemiologist fallacy to supply exposure measurements.
Adding exposure uncertainty pushes the lines above outward, and increases their overlap. We started with a 78% chance any regulations might be useful (even though the usefulness affected only about 20 people); we went to 64% with parameter uncertainty; and adding in measurement error will move that number closer to 50%—the bottom of the barrel of uncertainties. At 50%, the probability lines for exposed and not-exposed would exactly overlap.
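One way to see the direction of that push: if some fraction of the “exposed” group was never really exposed, their cancer probability shrinks toward the not-exposed rate, and the two curves slide together. The 30% misclassification rate below is an assumption of mine, chosen only to illustrate; the group sizes and probabilities are the ones used throughout:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000_000                      # assumed size of each group
p_not, p_exp = 0.000189, 0.0002    # cancer probabilities from the example
sims = 50_000

def prob_exposed_worse(p_for_exposed_group):
    exposed = rng.binomial(n, p_for_exposed_group, sims)
    not_exposed = rng.binomial(n, p_not, sims)
    return np.mean(exposed > not_exposed)

# Perfect exposure measurement
clean = prob_exposed_worse(p_exp)

# 30% of the "exposed" group misclassified (illustrative rate):
# their effective probability is a mixture of the two rates
m = 0.30
noisy = prob_exposed_worse((1 - m) * p_exp + m * p_not)

print(f"chance exposed group has more cancer: {clean:.2f} clean, {noisy:.2f} noisy")
```

The noisy figure sits between the clean one and 50%, exactly the slide toward the bottom of the barrel described above.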
I stress I did not use Jerrett’s model—because I don’t have it. He didn’t publish it. The example here is only an educated guess of what the results would be under typical kinds of parameter uncertainty and given risks. The direction of uncertainty is certainly correct, however, no matter what his model was.
Plus—you knew this was coming: my favorite phrase—it’s worse than we thought! There are still sources of uncertainty we didn’t incorporate. How good is the model? Classical procedure assumes perfection (or blanket usefulness). But other models are possible. What about “controls”? Age, sex, etc. Could be important. But controls can fool just as easily as help: see footnote 2.
All along we have assumed we could eliminate exposure completely. We cannot. Thus the effect of regulation is always less than touted. How much less depends on the situation and our ability to predict future behavior and costs. Not so easy!
I could go on and on, adding in other, albeit smaller, layers of uncertainty. All of which push that effectiveness probability closer and closer to 50%. But enough is enough. You get the idea.
[1] Other settings are possible, but 0 is the most common. Different models on the same data give different p-values. Which one is right? All. Different test statistics used on the same model and data give different p-values. Which one is right? All. How many p-values does that make altogether? Don’t bother counting. You haven’t enough fingers.
[2] Highly technical alley: A common model is logistic regression. Read all about it in chapters 12 and 13 of this free book (PDF). It says the “log odds of getting it” are linearly related to predictors, each associated with a “parameter.” The simplest such model is (r.h.s.) b0 + b1 * I(exposed), where I(exposed) equals 1 when exposed, else 0. With a relative risk of 1.06 and an exposed probability of 2e-4, you cannot, with any sample size short of billions, find a wee p-value for b1. But you can if you add other “controls”. Thus the act of controlling (for even unrelated data) can cause what isn’t “significant” to become so. This is another, and quite major, flaw of p-value thinking.
[3] “Confidence” intervals mean, quite literally, nothing. This always surprises. But everybody interprets them as Bayesian credible intervals anyway. These are the plus or minus intervals around a parameter, giving its most likely values.
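Footnote 2’s point about how tiny the signal is can be put into numbers with a quick back-of-envelope computation (the function name is mine; the relative risk and exposed probability are the post’s own figures):

```python
import math

def logit(p):
    # log odds: ln(p / (1 - p))
    return math.log(p / (1 - p))

rr = 1.06          # relative risk from the example
p_exposed = 2e-4   # exposed cancer probability from the example
p_not = p_exposed / rr

b0 = logit(p_not)             # baseline log odds in b0 + b1 * I(exposed)
b1 = logit(p_exposed) - b0    # coefficient on I(exposed)

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}")
```

With probabilities this small, b1 works out to essentially log(1.06), about 0.058: a coefficient far too small to separate from 0 without an enormous sample.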
Read Part I. Seriously. Read it. Not everything is easy. Today’s stuff is used to make decisions about your life, so pay attention.
Cue the organ… When we last left Tom, he was checking his albondigas for spots. He had read a breathless press report that the risk of cancer doubled by exposure to coriandrum sativum. I weep that nobody commented on this yesterday.
Now there’s doubling and there’s doubling. Moving from 1 in 10 million to 2 in 10 million is a doubling, but of an entirely different kind than in jumping from 1 in 2 to 2 in 2, i.e. 50% to 100%. Using relative as opposed to absolute risk disguises this difference. There is no reason in the world to worry or fret over a relative risk of 2 when the probabilities are in the range of 1 in 10 million. But there’d be every reason in the universe to be vexed when jumping from coin flip to certainty.
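The arithmetic is worth making explicit. Both cases below carry a relative risk of exactly 2, yet the absolute increase, the thing a decision should actually turn on, differs by a factor of five million:

```python
# Two "doublings" of risk with identical relative risk
base_rare, exposed_rare = 1e-7, 2e-7     # 1 in 10 million -> 2 in 10 million
base_common, exposed_common = 0.5, 1.0   # coin flip -> certainty

for base, exposed in [(base_rare, exposed_rare), (base_common, exposed_common)]:
    rr = exposed / base          # relative risk: identical in both cases
    abs_diff = exposed - base    # absolute increase: wildly different
    print(f"relative risk = {rr:.0f}, absolute increase = {abs_diff:g}")
```

A headline quoting only the relative risk of 2 cannot distinguish these two situations.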
Lesson one (again): never trust anybody trying to sell you anything using relative risk. Always demand the absolute numbers.
Does that 1 in 10 million seem low to you? It doesn’t to the EPA. In this guide (e.g. p. 5) they fret over tiny risks, and often reference 1 in a million and 1 in 10 thousand as regulation worthy. Let’s play: boost the exposed cancer chance to 2 in 10 thousand.
We now need a workable relative risk. Use 1.06, the high-water relative risk in a series of widely touted papers by Michael Jerrett and others (more here, here, here, and here; the EPA adores these papers). Jerrett spoke of other diseases, but what matters is the size of relative risk deemed regulation worthy. My examples work with any disease. With a relative risk of 1.06, the chance of cancer in the not-exposed group is 0.000189.
Here is a picture of the probabilities for new cancer cases in the two groups in LA.
There’s a 99.99% chance that from about 300 to 440 not-exposed people will develop cancer, with the most likely number (the peak of the dotted line) about 380. And there’s the same 99.99% chance that from about 340 to 460 exposed people will develop cancer, with the most likely number about 400. A difference of about 20 folks. Surprisingly, there’s only a 78% chance that more people in the exposed group than in the not-exposed group will develop cancer. That leaves a 22% chance the not-exposed group will have as many or more cancerous bodies. Make sure you get this before continuing.
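You can check these numbers yourself with a quick simulation. The group sizes are my assumption (2 million people in each group, consistent with the “out of 4 million” figure); the probabilities are the ones given above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000                      # assumed size of each group
p_not, p_exp = 0.000189, 0.0002    # cancer probabilities from above
sims = 100_000

# Simulate future cancer counts in both groups many times over
not_exposed = rng.binomial(n, p_not, sims)
exposed = rng.binomial(n, p_exp, sims)

prob_more = np.mean(exposed > not_exposed)
print(f"most likely counts: {p_not*n:.0f} not-exposed, {p_exp*n:.0f} exposed")
print(f"chance exposed group has more cancer: {prob_more:.2f}")
```

Even with the risk itself known perfectly, the randomness in who actually gets cancer leaves a roughly 1 in 5 chance the exposed group does no worse.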
This not-trick question helps: how many billions would you pay to reduce exposure to coriandrum sativum to zero? If it disappeared, there’d be a 78% chance of saving at least one life. Not a 100% chance. Pause and reflect. Even if you shrink exposure to nothing—to absolutely nothing—there is still a 22% chance (about 1 in 5) of spinning your wheels.
And there’s a cap. If we use the 99.99% threshold[1], then the best we could save is about 160 lives. That comes from assuming 460 exposed people develop cancer and 300 not-exposed people get it (the extremes of both pictures[2]). The most likely scenario is a saving of about 20 lives. Out of 4 million. Meaning at best you’d affect about 0.004% of the population, and probably more like 0.0005%. How many billions did you say?[3]
There are strong assumptions here. The biggest is that there is no uncertainty in the probabilities of cancer in the two groups. No as in zero. Add any uncertainty, even a wee bit, and that savings in lives goes down. In real life there is plenty of uncertainty in the probabilities. We’ll see how this affects things in Part III.
Assumption number two. That everybody who gets cancer dies. That won’t be so; at least, not for most diseases. So we have to temper that “savings” some more. Probably by a lot.
Assumption number three. Exposure is perfectly measured and there is no other contributing factor in the cancer-causing chain different between the two groups. We might “control” for some differences, but recall we’ll never know—as in never know—whether we measured and controlled for the right things. It could always—as in always—be that we missed something. But even assuming we didn’t, exposure is usually measured with error. After all, how easy is it to track exposure? I’ll tell you: not easy, not easy at all.
So difficult is exposure to track that there is substantial uncertainty in any estimated environmental “dose”.[4] In our example, we said this measurement error was zero. In real life, it is not. Add any error, even a little bit, and the certainty of saving lives necessarily goes down, down, down.
I ask you: is it any wonder that those with something to sell not only speak in terms of relative risk, but also ignore the various uncertainties?
[1] We could add some 9s to this and not change the fundamental conclusion.
[2] There’s only a 0.005% chance that 460 or more exposed people get cancer, and a 99.985% chance that at least 300 in the not-exposed group get it.
[3] Same numbers for the state of California, which has about 38 million residents: a 99.99% chance that 6950 to 7400 not-exposed, and 7350 to 7850 exposed, develop cancer. Most likely lives saved: about 550. The 99.99% cap: about 900. I.e., roughly 0.0014% to 0.0024% of the population, under perfect conditions with no uncertainty.
[4] It is only in rare laboratory experiments that the dose is known exactly.