
Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

April 15, 2019 | 9 Comments

Randomization Isn’t Needed — And Can Be Harmful

Had a named person in statistics (Andrew Althouse) ask me about randomization, which he likes, and which I do not. “I want to compare outcomes for a specific patient group getting surgery A vs surgery B (assume clinical equipoise). If I’m not going to randomize, how should I allocate the patients in my study so I am confident that afterward I can make inference on the difference in outcomes?”

Excellent question. My response, though it was unsatisfying to the gentleman, was “I’d have independent experts allocate patients, ensuring balance of (what are thought to be) secondary causes, where the panel’s decisions are hidden from trial surgeons. Try to inject as much control as possible, while minimizing cheating etc.”

Too terse to be believed, perhaps. I expand the answer here.

Control in any experiment is what counts, not randomization. For one, there is no such thing as “randomization” in any mystical sense as required by frequentist theory. Probability does not exist. Randomness does not exist. This is proved elsewhere.

What we can do is to create some sort of device or artifice that removes control of allocating patients from a man and gives it to a machine. The machine then controls, by some mechanism, which patients get surgery A and which B.

A man could do it, too. But men are often interested in the outcome; therefore, the temptation to cheat, to shade, to manipulate, to cut corners, is often too strong to be resisted. I’ve said it a million times, and I say it again now: every scientist believes in confirmation bias; he just believes it happens to the other guy.

There is also the placebo effect to consider in medical trials. If a patient knows for sure he is getting a sham or older treatment, it affects him differently than if he were ignorant. The surgeons must know, of course, which surgeries they are performing; thus it is impossible to remove the potential for fooling oneself here. The surgeons doing the sham or older surgery (which we can imagine is A) might slack off; when switching to B they might cut with vigor and renewed enthusiasm.

Now suppose some sort of “randomization” (i.e. allocation control) device spit out A and B, 100 of each (Althouse later gave this number). It could be that all 100 As were female and all 100 Bs male. It doesn’t matter that this is unlikely: it could happen. Imagine if it did. Would you be satisfied in analyzing the result?

No, because we all believe—it is a tacit premise of our coming model—that sex is important in analyzing results. Why? Because sex, or the various systems biologically related to sex, tend to cause different outcomes, which include, we suppose, the surgical outcomes of interest here. We would be foolish not to control for sex.

Which is exactly why many trials “randomize” within sex by removing the control from the device and giving it back to some man, to ensure a good balance of males and females in the groups. This makes eminent sense: control is everything.
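What “randomizing within sex” looks like mechanically can be sketched in a few lines. This is a toy sketch of my own, with invented names: a device shuffles within each stratum, but a man-imposed rule deals the members alternately into the two arms, guaranteeing (to within one patient) equal counts of each sex in A and B.

```python
import random

def allocate_stratified(patients, stratum_of, seed=0):
    """Allocate within each stratum (here, sex) so the two arms
    receive an equal (within one) count from every stratum."""
    rng = random.Random(seed)
    arms = ([], [])
    strata = {}
    for p in patients:
        strata.setdefault(stratum_of(p), []).append(p)
    for members in strata.values():
        rng.shuffle(members)            # the device picks the order...
        for i, p in enumerate(members):
            arms[i % 2].append(p)       # ...but the alternation enforces balance
    return arms

# 100 females and 100 males, as in the imagined trial
patients = [{"id": i, "sex": "F" if i < 100 else "M"} for i in range(200)]
a, b = allocate_stratified(patients, stratum_of=lambda p: p["sex"])
print(sum(p["sex"] == "F" for p in a), sum(p["sex"] == "F" for p in b))  # 50 50
```

The shuffle cannot produce the all-female-A nightmare above, because the control over sex was taken away from it.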

I don’t know what the surgery is, but it has to be something. Suppose it’s some kind of vascular surgery applied near or to the heart. We know there are lots of other causes, such as CHF, that might also play a causal role in the outcomes we’re tracking. If we’re sure of this, we would also “block” on CHF. That is, we would again remove control of the allocation device and give it to a man.

And so on for the other causes. We might not have the funds or time to explicitly control for all of these, in this physical allocation sense. But we might later include these in any model of uncertainty of the outcome. This is also called “controlling”, although there is no control about it. We’re just looking at things as they stood: we had no control over these other measures. (I wish we’d drop the misleading terminology. See this award-eligible book for a longer discussion of this.)

Enter Don Rumsfeld’s unknown unknowns. There may be many other causes, secondary or more removed (mitigators and so on), of the outcome of which we are ignorant. This must be so, or science would be at its end. How many such things are there in our surgery? We don’t know. They are unknown unknowns. There could be one, there could be ten thousand. The human body is a complicated organism: there are feedbacks upon feedbacks.

How will the machine allocator split these possible causes in the groups? We have no idea. It could be that the machine, like we imagined for sex, puts all or most of a dastardly cause in A and all or most of a beneficent cause in B. And this could go back and forth, and forth and back across all the other causes.

There is nothing we can do about this. They are, after all, unknown unknowns. But the mechanical allocator can’t somehow magically fix the situation such that an equal number of all causes are distributed in the groups. You don’t know what you’ll get. Worse, this ignorance is true, too, for the mechanical allocator for causes we know but don’t explicitly control for. “Randomization” is the experimental procedure of tossing darts and hoping for the best.

Notice closely, though, that the desire for uniform distribution of causes is sound. It is often thought “randomization” gives this. It cannot, as we have seen. But if it is so important—and it is—why not then control explicitly for the causes we know? Why leave it to “chance”? (That’s a joke, son.)

Consider that this is precisely how physics experiments are done. Especially in sensitive experiments, like tracking heat, extreme care is taken to remove or control all possible known causes of heat. Except, of course, for the cause the physicist is manipulating. He wants to be able to say “When I pulled this lever by so much, the heat changed this much, because of the lever”. If he is wrong about removing other causes, it might not be the lever doing the work. This is what got Fleischmann and Pons into such deep kimchi.

Return to my panel of independent experts. They know the surgeries and the goals of these surgeries. They are aware, as can be, of the secondary and other causes. They do their best to allocate patients to the two groups so that the desired balance of the known causes is achieved.

Perfection cannot be had. Panel members can be bought; or, more likely, they won’t be as independent as we’d like. Who on the panel wouldn’t, deep in his heart, like the new treatment to work? I’ll tell you who: the rival of the man who proposed the treatment. The panel might control sub-optimally. Besides all that, there is always the possibility of unknown unknowns. Yet this panel still has a good chance to supply the control we so rightly desire.
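One mechanical way to mimic what the panel does is “minimization” (covariate-adaptive allocation in the Pocock–Simon style). A minimal sketch, mine and nobody’s actual protocol, with hypothetical covariates: each arriving patient goes to whichever arm leaves the known causes better balanced.

```python
def allocate_minimizing(patients, covariates):
    """Greedy minimization: put each patient in whichever arm leaves
    the smaller total imbalance across all known covariates."""
    arms = {"A": [], "B": []}
    values = {cov: {cov(p) for p in patients} for cov in covariates}

    def imbalance():
        total = 0
        for cov in covariates:
            for v in values[cov]:
                n_a = sum(cov(p) == v for p in arms["A"])
                n_b = sum(cov(p) == v for p in arms["B"])
                total += abs(n_a - n_b)
        return total

    for p in patients:
        scores = {}
        for arm in ("A", "B"):
            arms[arm].append(p)         # try the patient in this arm...
            scores[arm] = imbalance()
            arms[arm].pop()             # ...then undo the trial placement
        arms[min(scores, key=lambda k: (scores[k], k))].append(p)
    return arms

# Sex and CHF status as the known (suspected) secondary causes
patients = [{"sex": s, "chf": c} for s in ("F", "M") for c in (0, 1)] * 2
arms = allocate_minimizing(patients, [lambda p: p["sex"], lambda p: p["chf"]])
print(len(arms["A"]), len(arms["B"]))  # 4 4
```

The point of the sketch: balance on the known causes is something you engineer, not something you toss darts and hope for.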

Randomization isn’t needed, does nothing, and can cause harm; blinding is often crucial, and control is paramount.

Bonus: Althouse also asked this (ellipsis original): “Your ‘expert panel’ has assigned 100 patients to receive A and 100 patients to receive B. 14 of the patients that received A died, 9 of the patients that received B died. Your statistical analysis is…what, exactly?”

He wasn’t satisfied (again) with my “Predictive analysis complete with verification.” Too terse once more. As regular readers know, if we cannot deduce a model from accepted-by-all premises (as we sometimes but rarely can), we have to apply looser premises which often lead to ad hoc models. These are the most frequent kind of models in use.

I don’t know what ad hoc model I’d use in this instance; it would depend on knowing all the details of the trial. There are many choices of model, as all know.

“That’s a cop out. Which model is best here?”

Glad you asked, friend. We find that out by doing a predictive analysis (I pointed to this long paper for details on how this works) followed by a verification analysis—a form of analysis which is almost non-existent in the medical literature.

I can sum up the process briefly, though: make a model, make predictions, test the predictions against reality.
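To make that concrete for Althouse’s numbers, here is one minimal sketch, not necessarily the model I would use (that depends on the trial’s details), assuming exchangeable patients and flat Beta(1,1) premises on each arm’s death propensity:

```python
import random

rng = random.Random(42)

# Observed: 14 of 100 died under A, 9 of 100 under B.
# With a flat Beta(1,1) prior, the posterior for each arm's death
# chance is Beta(1 + deaths, 1 + survivors).
def posterior_draw(deaths, n):
    return rng.betavariate(1 + deaths, 1 + n - deaths)

# Posterior predictive probability that a NEW patient dies:
pred_a = (1 + 14) / (2 + 100)   # 15/102, about 0.147
pred_b = (1 + 9) / (2 + 100)    # 10/102, about 0.098

# Probability A's death chance exceeds B's, by simulation:
draws = 100_000
p_a_worse = sum(posterior_draw(14, 100) > posterior_draw(9, 100)
                for _ in range(draws)) / draws
print(round(pred_a, 3), round(pred_b, 3), round(p_a_worse, 2))
```

The numbers that matter are the predictive probabilities for new patients, not the parameters; verification would then test those predictions against patients never used to fit the model.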

Makes sense, yes?

April 3, 2019 | 3 Comments

Reality-Based Probability & Statistics: Ending The Tyranny Of Parameters!

Here it is! The one, the only, the peer-reviewed (and therefore true) “Reality-Based Probability & Statistics: Solving the Evidential Crisis” (the link is to the pdf, which is 11 MB; there are many pictures).

This is a large review paper, summing up the problems I see in statistics, with a guide to how to escape from the void.

There is no question computer scientists are kicking statisticians’ asses. Hard. As we saw yesterday. The answer is anyway simple: “AI”, which is nothing but lists of if-then statements, at least divorces itself from, or does not concern itself much with, parameters. Statistics believes these strange entities have life! All practice, frequentist or Bayes, revolves around them. We are in orbit around a fiction.

Enough already! Let’s turn our eyes toward Reality. Here’s how.


These are refinements to “Everything Wrong With P-values Under One Roof“, here with a mind toward cause.

P-values are now officially dead. The sooner we stop talking about them and start talking about Reality, the better.


Do we need hypothesis tests? No. And we only need model selection sometimes. If we’re forced to pick between models (and since most models are free in the sense that they are only bits of code, we don’t always have to pick), then we should pick with a Reality-based metric and nothing else. Sometimes models cost money, because observations cost money, and therefore we will need to select. We do this based on Reality, not parameters.

Regular readers will be familiar with the mechanics of predictive inference, probability leakage, and all that. So you can skim this section, but pay some attention to the example.

Section 4: Y CAUSE

This is it! This is the missing element. The lack of focus on cause.

Parameter estimates are often called “effect size”, though the causes thought to generate these effects are not well specified. Models are often written in causal-like form (to be described below), or cause is conceived by drawing figurative lines or “paths” between certain parameters.

Parameters are not causes, and causes don’t happen to parameters. Probability is not real. Thus cause cannot operate on it. Parameters aren’t real: same deal.

Cause in probability and statistics is mixed up, to say the least; right ideas mix with wrong and swap places. There is no consistency.

Cause is conditional. Three small words packed with meaning. All probability is conditional, too, and in the same way. Once this is understood, we have made a great leap, and we can see what is possible to know about cause and what is not.


“Scarcely any who use statistical models ever ask does the model work? Not works in the sense that data can be fit to it, but works in the sense that it can make useful predictions of reality of observations never before seen or used in any way. Does the model verify?”

Then come some ways this can and must be done.
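A sketch, with made-up numbers, of what verification amounts to: score the model’s predictive probabilities against never-before-seen outcomes, and against a know-nothing prediction, using a proper score such as the Brier score.

```python
def brier_score(predictions, outcomes):
    """Mean squared gap between predicted probabilities and what
    actually happened (coded 0/1). Lower is better."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)

# The model's probabilities for observations never before seen or used:
preds   = [0.9, 0.8, 0.2, 0.1, 0.7]
reality = [1,   1,   0,   0,   0]

model_score = brier_score(preds, reality)
naive_score = brier_score([0.5] * len(reality), reality)  # know-nothing model
print(model_score, naive_score, model_score < naive_score)  # 0.118 0.25 True
```

A model that cannot beat the know-nothing prediction on new observations does not verify, whatever its p-values said in-sample.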

Section 6: THE FUTURE

“No more hypothesis testing. Models must be reported in their predictive form, where anybody (in theory) can check the results, even if they don’t have access to the original data. All models which have any claim to sincerity must be tested against reality, first in-sample, then out-of-sample. Reality must take precedence over theory.”


This is in the inaugural edition of the Asian Journal of Economics and Banking, which does not yet have a web site (it’s that new). Paper copies are available at all better libraries.

April 2, 2019 | 8 Comments

AI Is Kicking Statistics’s Ass

Here’s the headline: “AI can predict when someone will die with unsettling accuracy: This isn’t the first time experts have harnessed AI’s predictive power for healthcare.”

Unsettling accuracy? Is accuracy unsettling? Has AI progressed so far that all you have to do is step on an AI scale and the AI computer spits out an unsettlingly accurate AI prediction of the AI end of your AI life? AI AI AI AI AI? AI!

I’ve said it many times, but the marketing firm hired by computer scientists has more than earned its money. Science fiction in its heyday had nothing on these guys. Neural nets! Why, those are universal approximators! Genetic algorithms! Genes in the machine. Machine learning! Deep learning! Like, that’s deep, man. Artificial intelligence! Better than the real thing!

What has statistics got? Statistically significant? No, that’s dead. Thank God. Uniformly most powerful test? Unbiased estimator? Auto-regressive? Dull isn’t in it. You won’t buy any headlines talking about mu-hat.

What’s the difference between statistics and AI? Besides the overblown hype, that is? One thing: a focus on the results. That’s the reason AI is landing every punch, and why statistics is reeling. Statistical models focus on fictional non-existent hidden unobservable parameters, whereas AI tries to get the model to make good predictions of reality.

Now AI is nothing but statistical modeling appended with a bunch of if-then statements. Only this, and nothing more. Computers do not know as we know; they do not grasp universals or understand cause. They don’t even know what inputs to ask for to predict the outputs of interest. We have to tell them. Just as we do in statistics.

The reason AI models beat statistical ones is that AI models are tuned to making good predictions, whereas statistical models are usually tuned to things like wee p-values or parameter estimates. Ladies and gentlemen, parameters are of no interest to man or beast. The focus on them has forced, in a way, a linearity culture: if we can’t write down the model in pleasing parameterized form, we’re not interested. Besides, we need that form to do the limit math of statistics, of estimators of these parameters, so that we can get p-values, which do not mean what anybody thinks they do.

AI scoffs at parameters and says: how can I create a mathematical function, however complex, of these input measures, so that predictions of the output measure are skillful but not over-fit?

That, and its understanding of, or its attempts at understanding, cause. We’ve discussed many times, and it’s still true, that you can’t get cause from a probability model. Cause is in the mind, not the data. We need to be part of the modeling process. And so on. AI, though it’s at the beginning of all this, tries to get this right. I’ll have a paper tomorrow on this. Stay tuned!

I say AI will never make it. Computers, being machines, aren’t intellects; they are not rational creatures like we are. Intellect is needed to extract universals from individual cases, and computers can never do that—unless we have first programmed them with the answer, of course.

That is to say, strong AI is not possible. Others disagree. To them I say, don’t wait up.

We can’t discount the over-blownness of the comparison. Reporters love AI, and nearly all cherish the brain-as-computer metaphor, so we’re apt to see intellect where it is not. Plus hype sells. Who knew?

It’s not all hype, of course. AI is better, in general, at making predictions. But headlines like the one above are ridiculous.

When all the number crunching was done, the deep-learning algorithm delivered the most accurate predictions, correctly identifying 76 percent of subjects who died during the study period. By comparison, the random forest model correctly predicted about 64 percent of premature deaths, while the Cox model identified only about 44 percent.

These are not unsettling rates. The “deep learning” is AI, the “random forest” is “machine learning” (if you like, a technique invented by a statistician), and “Cox model” is regression, more or less. I didn’t look at the details of how the regression picked its variables, but if it’s anything like “stepwise”, the method was doomed.

We always have to be suspicious about the nature of the predictions, too. These should be on observations never before seen or used in any way. They should not be part of a “validation set”, because everybody cheats and tweaks their models to do well on the validation set, which, as should be clear, turns the validation set into an extension of the training set.
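The discipline is easy to state in code. A sketch with my own names: three disjoint sets, tune on the middle one as much as you like, and touch the last exactly once, after all tweaking is finished, or it quietly becomes more training data.

```python
import random

def three_way_split(data, seed=0, train_frac=0.6, val_frac=0.2):
    """Split into train / validation / test. The test set is
    consulted ONCE, at the very end; repeated peeks turn it into
    an extension of the training set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    i = int(len(shuffled) * train_frac)
    j = int(len(shuffled) * (train_frac + val_frac))
    return shuffled[:i], shuffled[i:j], shuffled[j:]

train, val, test = three_way_split(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```

Better still, of course, is scoring against observations that did not exist when the model was fit.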

March 27, 2019 | 8 Comments

An Argument Against The Multiverse

The multiverse might be real. God might, in His wisdom and love of completeness and true diversity and the joy of filling all possible potentials with actuality, have created such a thing. Indeed, the wondrous complexity and size of the known universe is traditionally taken as an argument for the existence of God. The multiverse simply carries that idea to its limit, in the mathematical and philosophical sense.

Still, there is something appalling about the idea. The multiverse takes parsimony by the short hairs and kicks it in the ass. Talk about multiplying entities beyond necessity! An uncountable infinity of universes that cannot be seen might sound good on paper, but only because we have trouble grasping how truly large an uncountable infinity is.

What follows is not a proof against the multiverse, but an argument which casts doubt on the idea. It was inspired by Jeremy Butterfield’s review of Sabine Hossenfelder’s Lost in Math: How Beauty Leads Physics Astray (hat tip: Ed Feser; I haven’t read Hossenfelder’s book), a review we need to examine in depth first.

Her book “emphasizes supersymmetry, naturalness and the multiverse. She sees all three as wrong turns that physics has made; and as having a common motivation—the pursuit of mathematical beauty.” Regular readers have heard similar complaints here, especially about all those wonderful limit results for distributions of test statistics which give rise to p-values, which aren’t what people thought they were. Also time series. But never mind all that. On to physics!

“…Hossenfelder’s main criticism of supersymmetry is, in short, that it is advocated because of its beauty, but is unobserved. But even if supersymmetry is not realized in nature, one might well defend studying it as an invaluable tool for getting a better understanding of quantum field theories…A similar defence might well be given for studying string theory.”

How about the multiverse?

Here, Hossenfelder’s main criticism is, I think, not simply that the multiverse is unobservable: that is, the other pocket universes (domains) apart from our own are unobservable. That is, obviously, ‘built in’ to the proposal; and so can hardly count as a knock-down objection. The criticism is, rather, that we have very little idea how to confirm a theory postulating such a multiverse.

We discussed non-empirical confirmation of theories earlier in the week. We need to understand what is meant by fine-tuning, a crucial concept.

As to supersymmetry, which is a family of symmetries transposing fermions and bosons: the main point is not merely that it is unobserved. Rather, it is unobserved at the energies recently attained at the LHC at which—one should not say: ‘it was predicted to be observed’; but so to speak—‘we would have been pleased to see it’. This cautious choice of words reflects the connection to Hossenfelder’s second target: naturalness, or in another jargon, fine-tuning. More precisely, these labels are each other’s opposites: naturalness is, allegedly, a virtue: and fine-tuning is the vice of not being natural.

Butterfield says “naturalness” is “against coincidence”, “against difference”, “for typicality.”

By against coincidence he means “There should be some explanation of the value of a fundamental physical parameter.” This is the key thought for us. There has to be a reason—a cause—of the value of the electron charge or fine structure constant; indeed any and every constant. Butterfield says “the value [of any constant] should not be a ‘brute fact’, or a ‘mere matter of happenstance’, or a ‘numerical coincidence’.”

The against difference concept is related to how parameters, i.e. constants, are estimated. And typicality means the value of the parameter must be defined in a rigorously defined “theoretical framework.”

Namely: there should be a probability distribution over the possible values of the parameter, and the actual value should not have too low a probability. This connects of course with orthodox statistical inference. There, it is standard practice to say that if a probability distribution for some variable is hypothesized, then observing the value of a variable to lie ‘in the tail of the distribution’—to have ‘a low likelihood’ (i.e. low probability, conditional on the hypothesis that the distribution is correct)—disconfirms the hypothesis that the distribution is the correct one: i.e. the hypothesis that the distribution truly governs the variable. This scheme for understanding typicality seems to me, and surely most interested parties—be they physicists or philosophers—sensible, perhaps even mandatory, as part of scientific method. Agreed: questions remain about:

(a) how far under the tail of the distribution—how much of an outlier—an observation can be without disconfirming the hypothesis, i.e. without being ‘atypical’;

(b) how in general we should understand ‘confirm’ and ‘disconfirm’, e.g. whether in Bayesian or in traditional (Neyman-Pearson) terms; and relatedly

(c) whether the probability distribution is subjective or objective; or more generally, what probability really means.
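For the concrete flavor of (a), the tail-area calculation Butterfield describes can be sketched with toy numbers of my own, assuming the “theoretical framework” for the parameter is a normal distribution:

```python
import math

def two_sided_tail(x, mu, sigma):
    """P(|X - mu| >= |x - mu|) for X ~ Normal(mu, sigma): how far out
    in the tails the observed value sits under the hypothesized
    distribution."""
    z = abs(x - mu) / sigma
    return math.erfc(z / math.sqrt(2))

# A constant hypothesized to be 'typical' of Normal(0, 1), observed at 2.5:
print(round(two_sided_tail(2.5, 0.0, 1.0), 4))  # 0.0124
```

On the standard scheme, that small a tail probability would “disconfirm” the hypothesized distribution, with all the ambiguities of (a) through (c) left hanging.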

That “standard” statistical practice is now being jettisoned (follow the link above for details of this joyful news). Far better to assess the probability a proposition is true given explicitly stated evidence. For that is exactly what probability is: a measure of truth.

Again, never mind that. Let’s discuss cause. Butterfield says he follows Hume and his “constant conjunctions”, which is of course the modern way. But that way fails when thinking about what causes parameters. There are no conjunctions, constant or otherwise.

Ideally, what a physicist would love is a mathematical-like theorem with rigorous premises from which are deduced the value of each and every physical constant/parameter. That would provide the explanation for each constant, and an explanation is a lovely thing to have. But an explanation is not a cause, and knowing only an effect’s efficient cause might not tell you about its final cause, or reason for being.

Now in the multiverse (if it exists) sits our own universe, with its own set of constants with specific values which we can only estimate and which are, as should be clear, theory dependent. A different universe in the unimaginably infinite set could and would have different values for all or some of the constants.

An anthropic-type argument next enters which says we can see what we can see because we got lucky. Our universe had just the right values needed to produce beings like us—notice the implicit and enormous and unjustified assumption that only material things exist—beings that could posit such things as multiverses. But we had to get real lucky, since it appears that even minute deviations from the constants would produce universes where beings like us would not exist. We discussed before arguments against fine-tuning and parameter cause: here and here. Do read these.

Probability insinuates itself long about here. What is the probability of all this fine-tuning? It doesn’t exist. No thing has a probability. All probability is conditional on the premises assumed. And once we start on the premises of the multiverse we very quickly run into some deep kimchi. For one of these premises is, or appears to be (I ask for correction from physicists), uncountability. There is not just a countable infinity of universes, but an uncountable collection of them. This follows from the continuity assumption about the values of constants. They live on the real line; or, because there may be relations between them, the real hyper-cube.

Well, probability breaks down at infinities. We instead speak of limits, but that’s a strictly mathematical concept. What does it mean physically to have a probability approach a limit? I don’t know, but I suspect it has no meaning. Butterfield is aware of the problem.

“For all our understanding of probability derives from cases where there are many systems (coins, dice…or laboratory systems) that are, or are believed to be, suitably similar. And this puts us at a loss to say what ‘the probability of a cosmos’, or similar phrases like ‘the probability of a state of the universe’, or ‘the probability of a value of a fundamental parameter’ really mean” [ellipsis original].

I disagree, for all the reasons we’ve discussed many times. Probability is not a measure of propensity, though probability can be used to assess uncertainty of propensity, and to make predictions. Butterfield then rightly rejects naive frequentism. But he didn’t quite say he rejected it because counting multiverses is impossible. Such a thing can never be done. Still, probability as a measure of truth survives.

Back to fine-tuning and some words of Weinberg about fine-tuning quoted by Butterfield (all markings original):

We assumed the probability distribution was completely flat, that all values of the constant are equally likely. Then we said, ‘What we see is biased because it has to have a value that allows for the evolution of life. So what is the biased probability distribution?’ And we calculated the curve for the probability and asked ‘Where is the maximum? What is the most likely value?’ … [Hossenfelder adds: ‘the most likely value turned out to be quite close to the value of the cosmological constant which was measured a year later’.]…So you could say that if you had a fundamental theory that predicted a vast number of individual big bangs with varying values of the dark energy [i.e. cosmological constant] and an intrinsic probability distribution for the cosmological constant that is flat…then what living beings would expect to see is exactly what we see.

What premise allowed the idea of a “flat” prior on a constant’s value? Only improper probabilities, which is to say not probabilities at all, result from this premise. Unless we want to speak of limits of distributions again; but where is the justification for that?
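The objection to the “flat” premise can be put in one line. A sketch of the standard argument (my notation, not Weinberg’s): a constant density on the whole real line cannot integrate to one,

```latex
p(\Lambda) = c > 0 \ \text{for all } \Lambda \in \mathbb{R}
\quad \Longrightarrow \quad
\int_{-\infty}^{\infty} p(\Lambda)\, d\Lambda
  = c \int_{-\infty}^{\infty} d\Lambda = \infty \neq 1,
```

so no choice of $c$ yields a probability distribution. The “flat prior” is improper, which is what is meant above by “not probabilities at all.”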

All right. Here’s where we are. No physicist has any idea why the constants which are measured (or rather estimated) take the values they do. The values must have a reason: Butterfield and Hossenfelder agree. That is, they must have a cause.

Now if the multiverse exists (and here we recall our previous arguments against fine-tuning), our universe, even though it is one of an uncountable infinity, must have a reason why it has these values for its constants. You cannot say “Well, all values are represented somewhere in the multiverse. We have these.” That’s only a restatement of the multiverse premises. We have to say why this universe was caused to have these values, and, it follows, why others were caused to have other values.

Well, so much is not new. Here is what is (finally!).

You’ll grant that math is used to do the figurings of all this work. Math itself relies on its own constants, assumptions, and so on. Like the values of π and e. Something caused these constants to take the values they do, too (a longer argument about this is here). They cannot exist for no reason, and the reason cannot be “chance”, which is without power.

There is no hint, as far as I can discover, that multiverse theorists believe the values of these mathematical and logical constants differ, as do physical constants. That physical constants differ is only an assumption anyway. So why not assume math, logic, and truth differ? But if they do, then there is no saying what could happen in any multiverse. You can’t use the same math as in our universe to diagnose an imagined selection from the multiverse. You don’t know what math to use.

Everything is put into question. And we’re further from the idea of cause. That we run from it ought to tell us something important. The problem of cause is not solved by the multiverse. There has to be a reason each universe has its own parameter values, and there has to be a reason it has the values of mathematical constants. This might be the same reason; or again, it might not be. The cause has to be there, however. It cannot be absent.

It is of interest that we initially thought the physics might be variable but the math not. Math is deeper down, in an epistemological sense; so deep that we have forgotten to ask about cause. At any rate, because it seems preposterous to assume math changes from universe to universe, because math seems best taken as fixed and universal (to make a pun), there is reason to question whether physics changes, too.