Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

May 25, 2018 | 7 Comments

Other Practical Books On Par With Uncertainty? Reader Question

Got this email from VD. I’ve edited to remove any personal information and to add blog-standard style and links. I answered, and I remind all readers of the on-going class, but I thought I’d let readers have a go at answering, too.

I greatly appreciate the wealth of material contained on your website, and I am an avid reader of both your articles and papers and a consumer of your videos/lectures/podcasts on YouTube. You bring clarity to the oft-misunderstood and—to an uncultured pleb such as myself—seemingly esoteric field of magical, complex formulae known as statistics.

I have a twofold question: First, do you have any plans to produce a textbook for students utilizing the principles within Uncertainty: The Soul of Modeling, Probability and Statistics—something along the lines of an updated Breaking the Law of Averages? I confess I have not yet read Uncertainty but assure you that it is at the top of my books-to-purchase list (although I’m under the impression much of the content therein is elucidated on your blog). If Uncertainty is the book I’m looking for then please let me know. I am also working through Breaking the Law and find it extremely helpful, lacking only in solutions to check my work.

If I simply need to go through Breaking the Law a few more times, please let me know if that’s the best route. In any event, I would appreciate a sequel that is an even better synthesis of the ideas since developed and distilled in Uncertainty while also functioning as an introductory-to-intermediate text on logical probability/objective Bayesian statistics. I appreciate your approach of using logic, common sense, and observation to quantify the uncertainty of a given set of premises, rather than becoming so consumed with parameter fiddling that I forget the nature of the problem I am trying to solve.

Second, if no new book is in the works, do you know of any good textbooks or resources for undiscerning novices such as myself for learning logical probability/objective Bayesian statistics that aren’t inundated with the baggage of frequentist ideals or the worst parts of classical statistics, baggage still dragged around by many of the currently available textbooks and outlets for learning statistics? It seems every other book or resource I pick up has at least a subset of the many errors and problems you’ve exposed and/or alluded to in your articles. If no such “pure” text exists, can you recommend one with a list of caveats? I also have found a copy of Jaynes’ Probability Theory, so I’ve added that to the pile of tomes to peruse. Since reading your blog I now make a conscious effort to mentally translate all instances of “random”, “chance”, “stochastic”, etc. to “unknown,” as well as actively oppose statements that “x entity is y-distributed (usually normally, of course!)” and recognize the fruits of the Deadly Sin of Reification (models and formulae, however elegant, are not reality).

I currently work to some degree as an analyst in Business Intelligence/Operations for a [large] company—a field where uncertainty, risk, and accurate predictive modeling are of paramount importance—and confess my grasp of mathematics and statistics is often lacking (I am in the process of reviewing my high school pre-calculus algebra and trigonometry so I can finally have a good-spirited go at calculus and hopefully other higher math). I think my strongest grasp at this point is philosophy (which I studied in undergrad with theology and language), and then logic and Boolean algebra, having spent a bit of time in web development and now coding Business Intelligence solutions. It’s the math and stats part that’s weak. If only I could go back 10 years and give myself a good talking to; hindsight’s 20-20 I suppose.

While not aiming to be an actuary by any measure, I want to be able to understand statements chock full of Bayesian terminology like the following excerpt from an actuarial paper on estimating loss. I want to discern whether such methods and statistics are correct:

“We will also be assuming that the prior distribution (that is, the credibility complement, in Bayesian terms) is normal as well, which is the common assumption. This is a conjugate prior and the resulting posterior distribution (that is, the credibility weighted result) will also be normal. Only when we assume normality for both the observations and the prior, Bayesian credibility produces the same results as Bühlmann-Straub credibility. The mean of this posterior normal distribution is equal to the weighted average of the actual and prior means, with weights equal to the inverse of the variances of each. As for the variance, the inverse of the variance is equal to the sum of the inverses of the within and between variances (Bolstad 2007).” (Uri Korn, “Credibility for Pricing Loss Ratios and Loss Costs,” Casualty Actuarial Society E-Forum, Fall 2015).

I understand maybe 25% of the previous citation.
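For what it’s worth, the quoted passage describes the textbook normal-normal conjugate update: the posterior precision (inverse variance) is the sum of the prior and data precisions, and the posterior mean is the precision-weighted average of the prior and observed means. A minimal numerical sketch, with numbers invented for illustration (loosely in the spirit of a loss ratio):

```python
# Normal-normal conjugate update. The posterior precision is the sum of
# the prior and data precisions; the posterior mean is the precision-
# weighted average of the two means. All numbers are made up.

prior_mean, prior_var = 0.65, 0.01   # the "credibility complement" (prior)
obs_mean, obs_var = 0.80, 0.04       # the observed ("actual") result

post_prec = 1 / prior_var + 1 / obs_var    # sum of inverse variances
post_var = 1 / post_prec
post_mean = (prior_mean / prior_var + obs_mean / obs_var) * post_var

print(post_mean, post_var)  # the mean is pulled toward the more certain prior
```

Because the prior variance is smaller (the prior is more certain), the posterior mean lands closer to the prior mean than to the observed one, which is exactly the “credibility weighted result” of the excerpt.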

My end goal is to professionally utilize the epistemological framework given on your blog and in Uncertainty. I want to be able to do modeling and statistics the right way, based on reality and observables, without the nuisances of parameters and infinity if they are not needed. I deal with mostly discrete events and quantifications bounded by intervals far smaller than (-infinity, +infinity) or (0, infinity).

I appreciate any advice you could share. Thank you sir!


May 24, 2018 | 3 Comments

Manipulating the Alpha Level Cannot Cure Significance Testing

Nothing can cure significance testing. Except a bullet to the p-value.

(That sound you heard was from readers pretending to swoon.)

The paper is out and official—and free!: “Manipulating the Alpha Level Cannot Cure Significance Testing”. I am one (and a minor one) of the—Count ’em!—fifty-eight authors.

We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious for the finding of new discoveries and the progress of science. Given that blanket and variable alpha levels both are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does; but none of the statistical tools should be taken as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold of 0.05, 0.01, 0.005, or anything else, is not acceptable.
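The arbitrariness of any blanket alpha is easy to demonstrate by simulation (my sketch, not from the paper): when the null is true, p-values are uniformly distributed, so whatever threshold is chosen, “discoveries” arrive at exactly that rate, no matter whether it is 0.05, 0.005, or anything else.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n = 10_000, 30

# Two groups drawn from the SAME distribution, so every "significant"
# result is a false alarm by construction.
pvals = np.empty(n_sims)
for i in range(n_sims):
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    pvals[i] = stats.ttest_ind(a, b).pvalue

for alpha in (0.05, 0.005):
    # The rejection rate tracks alpha itself: the threshold buys nothing.
    print(alpha, (pvals < alpha).mean())
```

Lowering the threshold only changes how often the mechanical decision fires; it does not make the decision any less mechanical.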

My friends, this is peer-reviewed, therefore according to everything we hear from our betters, you have no choice but to believe each and every word. Criticizing the work makes you a science denier. You will also be reported to the relevant authorities for your attitude if you dare cast any doubt.

I mean it. Peer review is everything, a guarantor of truth. Is it not?

Or do we allow the possibility of error? And, if we do, if we are allowed to question this article, are we not allowed to question every article? That sounds mighty close to Science heresy, so we’ll leave off and concentrate on the paper.

Now I am with my co-authors a lot of the way. Except, as regular readers know, I would impose my belief that null hypothesis significance testing be banished forevermore. Just as the “There is some good in p-values if properly used” folks would impose their belief that there is some good in p-values. Which there is not.

Another matter is “effect size”, which almost always means a statement about a point estimate of a parameter inside an ad hoc model. These are not plain-English effect sizes, which imply causality: how much effect x has on y. Statistical models can’t tell you that. They can, when used in a predictive sense, say how much the uncertainty of y changes when x does. So “effect size” is, or should be, thought of in an entirely probabilistic way.

The conclusion we can all agree with:

It seems appropriate to conclude with the basic issue that has been with us from the beginning. Should p-values and p-value thresholds, or any other statistical tool, be used as the main criterion for making publication decisions, or decisions on accepting or rejecting hypotheses? The mere fact that researchers are concerned with replication, however it is conceptualized, indicates an appreciation that single studies are rarely definitive and rarely justify a final decision. When evaluating the strength of the evidence, sophisticated researchers consider, in an admittedly subjective way, theoretical considerations such as scope, explanatory breadth, and predictive power; the worth of the auxiliary assumptions connecting nonobservational terms in theories to observational terms in empirical hypotheses; the strength of the experimental design; and implications for applications. To boil all this down to a binary decision based on a p-value threshold of 0.05, 0.01, 0.005, or anything else, is not acceptable.

Bonus Disguising p-values as “magnitude-based inference” won’t help, either, as this amusing story details. Gist: some guys tout massaged p-values as innovation, are exposed as silly frauds, and cry victim, a cry which convinces some.

Moral: The best probability is probability, and not some ad hoc conflation of probability with decision, which is what all “hypothesis tests” are.

May 21, 2018 | 11 Comments

Choose Predictive Over Parametric Every Time

Gaze and wonder at the picture which heads this article, which I lifted from John Haman’s nifty R package ciTools.

The numbers in the plot are made up out of whole cloth to demonstrate the difference between parameter-centered and predictive-centered analysis. The code for doing everything is listed under “Poisson Example”.

The black dots are the made-up data; the central dark line is the point estimate of a Poisson regression of the fictional x and y. The darker “ribbon” (from ggplot2) is the frequentist confidence interval around that point estimate. Before warning against confidence intervals—which every frequentist alive interprets in a Bayesian sense every time, because frequentism fails as a philosophy of probability (see this)—look at the wider, lighter ribbon, which is the 95% frequentist prediction interval, which again every frequentist interprets in the Bayesian sense every time.

The Bayesian interpretation is that, for the confidence (called “credible” in Bayesian theory) interval, there is a 95% chance the point estimate will fall inside the ribbon—given the data, the model, and, in this case, the tacit “flat” priors around the parameters. It’s a reasonable interpretation, and written in plain English.

The frequentist interpretation is that, for any confidence interval anywhere and anytime all that you can say is that the Platonic “true” value is in the interval or it is not. You may not assign any probability or real-life confidence that the true value is in the interval. It’s all or nothing—always. Same interpretation for the prediction interval.

It is the utter uselessness of the frequentist interpretation that makes everybody switch to Bayesian mode when confronted by any confidence (credible) or prediction interval. And so we shall too.

The next and most important thing to note is that, as you might expect, the prediction bounds are very much greater than the parametric bounds. The parametric bounds represent uncertainty of a parameter inside the model. The prediction bounds represent uncertainty in the observables; i.e. what will happen in real life.

Now almost every report of results which uses statistics uses parametric bounds to convey uncertainty in those results. But people who read statistical results think in terms of observables (which they should). They therefore wrongly assume that the narrow uncertainty in the report applies to real life. It does not.

You can see from Haman’s toy example that, even when everything is exactly specified and known, the predictive uncertainty is three to four times the parametric uncertainty. The more realistic Quasi-Poisson example of Haman’s (which immediately follows) even better represents actual uncertainty. (The best example is a model which uses predictive probabilities and which is verified against actual observables never ever seen before.)
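Haman’s plot is made with R and ciTools; the same point, that prediction bounds dwarf parametric bounds, can be sketched in Python with an even simpler model, a bare Poisson rate (my example, not Haman’s; all numbers invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, lam = 100, 10.0
y = rng.poisson(lam, size=n)   # made-up data with a known true rate
lam_hat = y.mean()

# Parametric (confidence) bounds for the mean rate: these shrink like
# 1/sqrt(n) and say nothing about what a new observation will do.
se = np.sqrt(lam_hat / n)
ci = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)

# Predictive bounds for a NEW observable: quantiles of Poisson(lam_hat).
pi = (stats.poisson.ppf(0.025, lam_hat), stats.poisson.ppf(0.975, lam_hat))

print("confidence:", ci)
print("prediction:", pi)  # several times wider than the confidence bounds
```

Even in this best case, with the model exactly right, the prediction interval is many times wider than the parametric one, and the gap does not close as n grows: more data pins down the parameter, but a new observation stays just as variable.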

The predictive approach, as I often say, answers the questions people have. If my x is this, what is the probability y is that? That is what people want to know. They do not care about how a parameter inside an ad hoc model behaves. Any decisions made using the parametric uncertainty will therefore be too certain. (Unless in the rare case one is investigating parameters.)

So why doesn’t everybody use predictive uncertainty instead of parametric? If it’s so much better in every way, why stick with a method that necessarily gives too-certain results?

Habit, I think.

Do a search for (something like) “R generalized linear models prediction interval” (this assumes a frequentist stance). You won’t find much, except the admission that such things are not readily available. One blogger even wonders “what a prediction ‘interval’ for a GLM might mean.”

What they mean (in the Bayesian sense) is that, given the model and observations (and the likely tacit assumption of flat priors), if x is this, the probability y is that is p. Simplicity itself.

Even in the Bayesian world, with JAGS and so forth, thinking in terms of predictions is not automatic. The vast, vast majority of software is written under the assumption that one is keen on parameters and not on real observables.

The ciTools package can be used for a limited range of generalized linear models. What’s neat about it is that the coding requirements are almost none. Create the model, create the scenarios (the new x), then ask for the prediction bounds. Haman even supplies lots of examples of slick plots.

Homework: The obvious. Try it out. And then try it on data where you only did ordinary parametric analysis and contrast it with the predictive analysis. I promise you will be amazed.

May 11, 2018 | 36 Comments

Inference To The Best Explanation: Shapiro’s The Miracle Myth Reviewed — Part II

Read Part I.

A researcher puts you into a room. On the table is a blue ball. Somebody put it there. It could have been Alice, Bob, or Charlie. Given only that information—and no more—who put it there? You have to pick one and only one.

If the choice seems arbitrary, it’s because it is. Whoever you pick has equal justification given only the information provided.

Instead of choosing, we can switch to probability. Given only the information provided, what is the probability Alice placed the ball? Same as for the other two: one-third (the proof of that is found here).

We have learned three things. One, probability is conditional on only the information given or assumed. Two, decision (or choice) is not probability: decision uses probability, but it is a step beyond it. Three, there must have been a cause for the ball.

The probability is straightforward (but see this page if you want to learn more). The choice, decision, or act is less so. Given the probability, and given what you think will happen if you were to guess right or wrong, you make a choice, a decision, or you act. Two people can have the exact same precise duplicate identical information, and thus must necessarily come to the same probability, but they can easily come to different (even wildly different) decisions because they believe their choices will have different consequences—and their choices may very well have different consequences. And no matter what the (conditional) probability is, and no matter what we decide, there will still be a true cause.
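The split between probability and decision can be put in a few lines of code. This is my toy sketch, not from any book, and the loss numbers are invented; the point is that two people holding identical probabilities but different loss tables make different choices.

```python
# Probabilities for who placed the ball, given only the setup: equal.
probs = {"Alice": 1 / 3, "Bob": 1 / 3, "Charlie": 1 / 3}

def best_guess(loss_if_wrong):
    # Pick the name minimizing expected loss: P(wrong) * cost of being wrong.
    return min(loss_if_wrong, key=lambda who: (1 - probs[who]) * loss_if_wrong[who])

# Two deciders, same probabilities, different consequences (made-up costs):
cautious = {"Alice": 1, "Bob": 10, "Charlie": 10}   # wrong about Alice is cheap
bold = {"Alice": 10, "Bob": 10, "Charlie": 1}       # wrong about Charlie is cheap

print(best_guess(cautious))  # Alice
print(best_guess(bold))      # Charlie
```

Same evidence, same probabilities, opposite guesses: the decision step uses the probability but is not determined by it. And whichever guess is made, there remains a true cause of the ball being there.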

Probability (epistemology), act (or will), and cause (metaphysics). All different steps which must be kept distinct when analyzing any problem.

The philosophical concept of inference to the best explanation can confuse and conflate these three steps or categories. Not systematically, so that we can apply a correction, but willy-nilly, depending on who is wielding the tool.

Inference to the best explanation (IBE) asks us to make a choice on a cause without examining—in any thorough sense—probability or the consequences of decision. This is not to say the technique does not and cannot come to correct probabilities, decisions, and understandings of cause. It can and very often does, especially in those areas in which we have expertise or extensive knowledge.

The reason IBE works, when it works, is that people are good at the individual steps without knowing or explicitly acknowledging they are using those steps. That will be obvious in a moment.

What happens when you see a ball and you really want to know the cause of it being there? You run through possibilities. I specified only three, and then said nothing more except that there were these three. There is no information about Alice’s motives, or her placement (where was she?), her personality, nothing. The information allowed was restricted in the extreme. Given only it, we could make a choice, but we recognized that choice’s arbitrariness. That arbitrariness informs the decision we would make, depending on how we view the consequences of making right or wrong decisions (which may well be different for each reader).

We also implicitly recognized one aspect of the cause: the efficient cause. We know a person placed it there, but we don’t know why. We do not know anything of the final cause, the reason the ball was put there. That we don’t know the motivation does not, obviously, mean the ball is not there. It is there. We also do not know the formal and material causes: we do not know the means the person used. Again, our ignorance of these does not mean the ball is not there.

That the IBE does not work here—there is no single best explanation and no identification of all aspects of the cause—is not the fault of the artificial nature of the problem: it is the fault of IBE. Any epistemological technique that claims to be an algorithm to discover the best guess of truth on given information (IBE does not claim to always find truth) has to work everywhere, or we have to look elsewhere for better algorithms. I claim we can’t find one: we’re stuck with probability, decision, and cause. Life and thinking isn’t so easy.

Now in real life you are not as restricted as in this artificial situation. You are free to guess or assume or measure other probative evidence that will modify the probability, change the decision, or lead to fuller understanding of the causes.

Didn’t I see Alice here earlier? I thought Bob said he was driving somewhere. That looks a lot like a ball Charlie plays with. Fastidious Alice might have been here, but I can’t see why she’d leave a ball lying about. Et cetera. You must play detective.

Means, motive, opportunity. That’s what detectives look for, because why? Because these items identify all aspects of the cause of the event. Detectives know they might not always guess right, that the wrong man is sometimes pegged, that some motives are opaque, and on and on. Detectives also know that the defense attorneys are free to form their own list of probative evidence, and so will come to different probabilities, decisions, and understandings of cause.

The possibility of differences in assumptions is the key to understanding IBE’s general weakness—and its occasional usefulness.

It is the freedom to choose the evidence, and the absence of any algorithm leading us to the right set of perfect evidence, that results in uncertainty. Uncertainty is often our lot.

Of course, there will always be a right set of perfect evidence that puts the probability at 0 or 1, as the case may be, evidence that results in a flawless decision, and that nails all parts of the cause. Our goal is to get as close as we can to this perfect set. But there is no guarantee we will even come close to it much of the time. (And there is even proof that in some cases, such as in quantum mechanics, it is impossible to come to it.)

A strange blip on the bubble chamber screen. Something caused it. What? The physicist must piece together the evidence. Means, motive, opportunity. In the end, and especially if the blip never repeats, he may just shrug his shoulders and say “Chance”—which is only and ever a euphemism for “I don’t know.”

The nature of evidence is the same at home, in science, in math, and in religion. Why something is is different from that or how it is. (I won’t prove here it works in math, but I do prove it here.)

None of this is controversial, except to die-hard followers of IBE who somehow believe that if only they exerted themselves sufficiently, they can always come to the best explanation of all aspects of a case—which is not synonymous with true. When IBE works, it’s really common sense, carefully explicated.

Next week we’ll see how Shapiro’s use of IBE to dismiss miracles relies on premises he didn’t know he was assuming, on how he did not account for the freedom to assume what evidence is probative, and how he didn’t grasp all aspects of cause.

Update Since it has arisen, there are other interpretations of quantum mechanics which differ from the classical ones. For instance: Quantum Potency & Probability, which restores Heisenberg’s original surmise. About cause: everything potential that becomes actual can only do so by something actual—a fancy way of saying QM events are not “uncaused”, as some would have it. On the nature of cause in QM see inter alia Wolfgang Smith’s The Quantum Enigma: Finding the Hidden Key (3rd Edition) and Scholastic Metaphysics: A Contemporary Introduction. There is another recent monograph by (I think) a Dominican scientist on the same subject which is escaping my memory. When I recall, I’ll post.