## Choose Predictive Over Parametric Every Time

Gaze and wonder at picture which heads this article, which I lifted from John Haman’s nifty R package `ciTools`

.

The numbers in the plot are made up whole cloth to demonstrate the difference between *parameter*-centered versus *predictive*-centered analysis. The code for doing everything is listed under “Poisson Example”.

The black dots are the made up data, the central dark line the result of the point estimate of a Poisson regression of the fictional x and y. The darker “ribbon” (from `ggplot2`

) is the frequentist confidence interval around that point estimate. Before warning against confidence intervals, which every frequentist alive interprets in a Bayesian sense every time, because frequentism fails as a philosophy of probability (see this), look at the wider lighter ribbon, which is the 95% frequentist prediction interval, which again every frequentist interprets in the Bayesian sense every time.

The Bayesian interpretation is that, for the confidence (called “credible” in Bayesian theory) interval, there is a 95% the point estimate will fall inside the ribbon—given the data, the model, and, in this case, the tacit “flat” priors around the parameters. It’s a reasonable interpretation, and written in plain English.

The frequentist interpretation is that, for *any* confidence interval *anywhere* and *anytime* *all* that you can say is that the Platonic “true” value is in the interval or it is not. You may *not* assign any probability or real-life confidence that the true value is in the interval. It’s all or nothing—always. Same interpretation for the prediction interval.

It’s that utter uselessness of the frequentist interpretation that everybody switches to Bayesian mode when confronted by any confidence (credible) or prediction interval. And so we shall too.

The next and most important thing to note is that, as you might expect, the prediction bounds are very much greater than the parametric bounds. The parametric bounds represent uncertainty of a parameter inside the model. The prediction bounds represent uncertainty in the observables; i.e. what will happen in real life.

Now almost every report of results which use statistics use parametric bounds to convey uncertainty in those results. But people who read statistical results think in terms of observables (which they should). They therefore wrongly assume that the narrow uncertainty in the report applies to real life. It does not.

You can see from Haman’s toy example that, even when everything is exactly specified and known, the predictive uncertainty is three to four *times* the parametric uncertainty. The more realistic Quasi-Poisson example of Haman’s (which immediately follows) even better represents actual uncertainty. (The best example is a model which uses predictive probabilities and which is verified against actual observables never ever seen before.)

The predictive approach, as I often say, answers the questions people have. *If my x is this, what is the probability y is that?* That is what people want to know. They do not care about how a parameter inside an *ad hoc* model behaves. Any decisions made using the parametric uncertainty will therefore be too certain. (Unless in the rare case one is investigating parameters.)

So why doesn’t everybody use predictive uncertainty instead of parametric? If it’s so much better in every way, why stick with a method that necessarily gives too-certain results.

Habit, I think.

Do a search for (something like) “R generalized linear models prediction interval” (this assumes a frequentist stance). You won’t find much, except the admission that such things are not readily available. One blogger even wonders “what a prediction ‘interval’ for a GLM might mean.”

What they mean (in the Bayesian sense) is that, given the model and observations (and the likely tacit assumption of flat priors), *if x is this, the probability y is that is p*. Simplicity itself.

Even in the Bayesian world, with `JAGS`

and so forth, there is not an automatic response to thinking about predictions. The vast, vast majority of software is written under the assumption one is keen on parameters and not on real observables.

The `ciTools`

can be used for a limited range of generalized linear models. What’s neat about it is the coding requirements are almost none. Create the model, create the scenarios (the new x), then ask for the prediction bounds. Haman even supplies lots of examples of slick plots.

**Homework**: The obvious. Try it out. And then try it on data where you only did ordinary parametric analysis and contrast it with the predictive analysis. I promise you will be amazed.