## How To Do Predictive Statistics: Part I New (Free) Software Introduction

**Introduction**

Here’s what we always want, but never get, using the old ways of doing statistics: The probability that some proposition Y is true given certain assumptions. Like this:

(1) Pr(Y | assumptions).

What happens if we change the assumptions? We change the probability. This is not a bug, it is a feature. There is NO SUCH THING, not ever, as Pr(Y), i.e. Pr(Y|no assumptions). It is an impossibility. And that means frequentism as a theory of probability fails. As does any theory which takes for granted that things “have” a probability and it is our job to discover it. For proof of these claims, read this *New York Times*-eligible bestseller *Uncertainty*.

Just because you don’t write down assumptions, doesn’t mean they are not there. At *the least* there will be the assumptions about what Y means, about grammar and language, and maybe rules of math. There are *always* assumptions.

That means if we change the model, and since the model is an assumption, we change the probability. If we add or subtract “data”, and our model uses data, and since the model is an assumption, we change the probability. If we tweak the model, as in adjust it or modify its parts, like priors, and since the model is an assumption, we change the probability. Again, this is not a bug, but a *necessary* feature of probability. (Some people actually complain about how changing the model or prior changes the probability, but that’s because they believe in Pr(Y) which is impossible.)

Most models are *ad hoc*. The vast majority of models make rather gross approximations of Y, such that it lies on the continuum. (No real thing does, though, oddly, the continuum is called the “real” line.) Regression is in this class of *ad hociness*. It is, by far, the most used model class, judging by published “research” (sorry for scare quotes). Regression is, or rather should be, this:

(2) Pr(Y | new X, old X & Y, Model & assumptions),

which is the probability of our proposition Y *given* a new assumed or measured proposition X, the old measured values of X & Y, and the *ad hoc* model with its assumptions. There is *no* notion of cause in this set up, despite how folks describe their results. All (2) says is “This is the probability of Y given these things.” That’s it, and nothing more.

For shorthand we can write

(3) Pr(Y | XDM),

where D is the old “data”, and the others are obvious. Change *anything*—as in *anything*—on the right hand side, change the probability of Y. For the last time, this is not a bug, it is a feature.

I’ve skipped over lots of detail, because regular readers have seen it all before, and because it’s in *Uncertainty* and in the other posts of this on-line class (of which this post is a member).

**Practicum**

One of the big reasons predictive methods have not taken off is the lack of free and easy-to-use out-of-the-box software. There is tons upon metric tons of software for classical parameter- and hypothesis-testing methods. But those methods are dead to us. We do *not* care about parameters and we most certainly do not care for hypothesis tests. We care *only* about things like eq. (2) (and 3).

The best solution would be to build this software from scratch, which requires time and money, and Yours Truly has neither in any abundance (I am wholly independent; no students, no grants, no affiliations of any kind). The next best is to build upon what’s already out there. This has the advantages of speed and familiarity, but the disadvantage of being stuck in a pre-determined format with all its inherited limitations.

We’re tried other things in the past, but I’ve settled—as a start—on extending MCMCpack. With full acknowledgement of the severe limitations of simulation methods). Simulations are not magic, nor do they “have” probability. Nothing “has” probability. There is no such thing as “drawing” from a “random” distribution. Read the links: simulations are only reasonable *approximations* to difficult numerical integrations. That’s is, and nothing more.

MCMCpack model routines return approximations of posterior distributions of parameters of models. We have zero interest in parameters *per se*, since they are only part of model innards. Concentrating on parameters as if they were probabilities of observables (like Y) has led to vast, vast, incalculable over-certainties. All that is also in *Uncertainty*.

We can use the posteriors, though, in approximating (2), and having knowledge of the model forms. The idea is to take the approximated parameter posteriors, and feed them into a second simulation-approximation to get (2). Simple as that.

MCMCpack allows all kinds of priors to be input, and lots of other abilities to form models. Here is just one of many examples—recalling, of course, that I do not endorse their philosophical interpretation. Here for example is MCMCregress. Look at everything you can set!

Usage:

```
```

` MCMCregress(formula, data = NULL, burnin = 1000, mcmc = 10000, thin = 1,`

verbose = 0, seed = NA, beta.start = NA, b0 = 0, B0 = 0,

c0 = 0.001, d0 = 0.001, sigma.mu = NA, sigma.var = NA,

marginal.likelihood = c("none", "Laplace", "Chib95"), ...)

The formula is usually typed right into the method, but unfortunately, for many of its models, MCMCpack does not store the formula as part of its output. So sometimes we’ll have to create the formula outside the method so we can store it. That’s easy in R, e.g.

form = formula('y ~ x1 + x2 + x3 + x4 * x5')

That simple but unfortunate notation, incidentally is why many believe probability models say something about cause. After all, does not the formula say y *is* x1 plus x2 etc.? When, of course, in regression, the real formula is “central parameter of approximate normal distribution of y = x1 + x2 etc.” Normal distributions are always approximations, as are any continuous-based parameterized models. As long as we keep in mind the sometimes severe limitations this imposes, this is not a problem. Many (most?) don’t keep it in mind, however, and slip into committing the Deadly Sin of Reification, which is to assume the model is reality.

The strategy of extension is to take the output of MCMCmodel and feed it into a new method MCMCmodel.pred, for prediction. Because MCMCpack is not consistent in what it passes to its ouput, we sometimes have to track things like the model formula, the limits of y, and so on. This isn’t hard to do, but it makes the software less automatic. So ideally a wrapper would be put around each MCMCmodel that handles this. I did NOT yet write such wrappers. In fact, all the predictions methods thus far have zero error or bounds checking, nor sanity checks of any kind at all. That level of coding takes the most time (to my mind) and is the least fun. So you will have to be careful what you pass into the prediction methods. This, again, is a usage limitation, which can be corrected in time.

What I’ll be introducing in further posts are prediction methods for, so far:

- MCMCregress, ordinary regression
- MCMClogit, ordinary logistic regression
- MCMCmnl, multinomial (bucket) regression
- MCMCpoisson, Poisson regression
- MCMCtobit, regular regression with censoring on y.

This represents a big chunk of routine procedures, and is therefore enough of a “package” for people to get started.

Regular readers will know that multinomial regression is shockingly underused. Why? Because, I say, each analysis should be related to the decisions that will be made with the model. If no decisions are going to be made, there is no reason to do the model! Two people can have three different (or more!) decisions to be made of the same model structure, so that it should come as no surprise that they, using the same data, can come to *different probability conclusions* and *different decisions*.

*All decisions, like all probability, are conditional*. Since all decisions are finite and discrete, therefore so too should be the models. No y is continuous, each falls naturally into buckets delimited by a measurement process and the decisions to be made. And the multinomial does that for us (to reasonable approximation). Well, we’ll come to that when the multinomial method is introduced.

Consider this software to be version 0.1. It is strictly in the testing stage. At this point, it has very little versatility. It works, though; or, it works as near as I can tell. I am even more famous for typos in my code than in my writing, so that if there are bugs or misconceptions, do not be surprised.

Next time , we do MCMCregress.pred!

LEGAL DISCLAIMER: using this software might break your computer irrecoverably, cause black holes to form in your toilet when flushed, and encourage your children to become lawyers. Do not say you weren’t warned.

**Update** Download the code: `mcmc.pred.R`

, `mcmc.pred.examples.R`

. **Update 2** If you downloaded before, download again. This is version 0.2!