Skip to content

Category: Class – Applied Statistics

August 13, 2018 | 6 Comments

How To Do Predictive Statistics: Part V New (Free) Software Multinomial Regression

Previous post in the series (or click above on Class). REVIEW!

Download the code: mcmc.pred.R, mcmc.pred.examples.R. If you downloaded before, download again. This is version 0.22! Only the example code changed since last time.

For an explanation of the theory behind all this, which is markedly different than classical views, get this book: Uncertainty.

Mandatory rant

We’ll use another built-in dataset, the Netherlands voting data. Accessed like this:

x = Nethvote

As before, assigning it to x is for simplicity sake. Find out all about the data with ?Nethvote. Essentially, voters could pick one of several parties. And, supposing those parties last until the NEXT election, and given some demographic information, we want the probability that

     Pr(Y | new X, old X&Y, Model & Assumptions)

Which—surprise!—is identical to the same probability we want in every predictive model! The emphasis on the NEXT election cannot be stressed too highly. Why? Glad you asked. Let me shout the answer:

There is NEVER a need to model what happened, only what might happen.

We do not need a probability model to tell us what we saw. We need only use our eyes. If we want to know if more religious people voted for vote (i.e. party) = CDA WE JUST LOOK. There is no need to do a “hypothesis test”, which is insane. Either more religious would have voted for CDA, or they wouldn’t have. AND THAT IT IS.

The classical idea, frequentist or Bayes, p-value of Bayes factor, of ascertaining whether more religious “really” voted more frequently for CDA is nuts. About the future? Well, that’s what model is for. To quantify the probability more religious will vote CDA accepting as an assumption religion is probative. It is our decision whether we choose religion as probative or not; two people looking at the same data, and even same model, can come to different conclusions.

I shout, because these ideas are central to the (old, ancient) predictive approach. They are foreign to the hypothesis testing classical methods, which aim to uncover occult forces in data. We will have none of that. Our concern is only observables and measures.

On to the data!

The real beginning

Because of a limitation (out of my control) of MCMCmnl, we have to keep track of the model formula. So we call the method a little differently than when we did ordinary or logistic regression.

form = formula('vote ~relig + class + income + educ + age * urban')

lv = levels(x[, as.character(form[[2]]) ])

fit = MCMCmnl(form, mcmc.method="IndMH", B0=0, mcmc=5000, thin=10, tune=0.5, baseline='D66', data=x)

Notice form is a standard R formula. This one was chosen to match the one in the native help function. Fool around with others. The object lv holds the levels of the “y” observable. It’s written in a generic way, so that it works with any data set. We could have, of course, just wrote lv = levels(x$vote), but that works only on data frames with vote as an outcome. Notice, too, that we can change the baseline. We don’t have to: it will default to the normal R base level. We keep track of the levels because you’re allowed to change them, and MCMCmnl doesn’t save the model formula. Ah, well.

Predictions are somewhat different than before, too. We have pass in the model formula and levels of the y. We also need, as ever and as core of the predictive method, a scenario. How about this one? Mixing code and output, and ignoring the ‘dist’ measures, which we don’t use.


vote distD66 distPvdA distVVD distCDA relig class income educ age urban
PvdA 2.669695 2.335121 4.109881 6.45008 0 0 1 2 5 1


p = MCMCmnl.pred(fit,form,x[1,],lv)

I get

> p
     D66      CDA     PvdA      VVD 
0.076580 0.067476 0.822900 0.033044 

So, given non-region, class of 0, and so on, the probability a NEW voter will go D66 is about 8%. Your results will vary a bit, since as ever this is a numerical approximation. But they’ll be close. The most likely vote will be cast at 82% is for PvdA for NEW voters of this sort, and the least likely is VVD at 3%. I don’t know Dutch politics, so I offer no opinions on what this means.

The idea, if it isn’t clear, is that you get a probability for each possible category, because why? Because that’s what we wanted!

The form and lv ensure everything is labeled correctly at the end. Pain in the keister. But as yet there are no wrappers for any of these methods to make things easier.

How about all the scenarios in the data? You bet:

p = MCMCmnl.pred(fit,form,x[1,],lv)
for(i in 1:nrow(x)){
  # this preserves the proper names for p's columns
  if(i>1) p=rbind(p,MCMCmnl.pred(fit,form,x[i,],lv))
p =, row.names=FALSE)

for (i in 1:4){
    plot(x$class,p[,i],main=names(p)[i], ylab='Pr(Vote|D,M)',col=x$relig+1)

Notice we stacked the answers one on top of the other, and turned p into a data.frame. The plot is for each category or level of vote, as a function of class (which really does have all those odd values; probably the output of some other model). For fun, I colored the points by religion yes/no.

This is only one possible plot of many. Other obvious ones will suggest themselves to you. Do them as homework.

Everything is more complex because the model itself is more complex. There isn’t any real or general way to make this easy, either. Nor should there be!

“But, Briggs, can’t I do an average probability for each class level, using all the old scenarios? That way I can tell the impact of ”

Sure you can. But why would say impact when you meant influence? Second, it would be fooling yourself. Because your model included all those other things, you have to state probability only with regard to and conditional on all those other things. Otherwise you’re talking weird.

If you want to discuss only class, then build a model with only class.

form = formula('vote ~ class')
lv = levels(x[, as.character(form[[2]]) ])
fit = MCMCmnl(form, mcmc.method="IndMH", B0=0, mcmc=5000, thin=10, tune=0.5, baseline='D66',data=x)

Then you can say what you want about class considered only by itself. Or whatever.

The key lesson is that you specified a model with all those measures, so you can only speak of the model with all those measures. If you don’t want to speak of them, remain model-silent of them.

Mini rant

We are done with multinomial. But not really. It should be used in place of ordinary regression almost always. Why? Because all measures are discrete and finite, thus all Y are, thus all Y are better approximated by multinomials. Now, all Y are approximated by continuity, which is an ENORMOUS assumption, and untrue. No measure can be continuous, and none infinite in actuality.

All data should be transformed into the units of decision. We talked about this before with regard to CGPA data. If you are a dean only interested in counting numbers of students scoring 3 or hihger in CGPA (or whatever), then you have naturally created an analysis were the Y is dichotomous. Or maybe you want 3 or above, which naturally implies under 3s are of interest, and then 4s (to be given special recognition, say). Then we have a trichotom. Multinomial can handle this, ordinary regression cannot.

Two people can have the same data and come to different conclusions about it, as happens all the time in real life. People have different decisions to make, and different consequences to face about those decisions. Therefore, every analysis, i.e. model, should be tailored to the decision at hand. Since every decision, like every measure, is discrete and finite in act, then so should by every model.

“But Briggs, if I quash the data into buckets like you say, then I lose information. I won’t know the difference, in this case, between a CGPA of 2.876543 and 2.876544. I’m losing power or whatever. Besides, I’ve heard discretizing data is bad.”

You heard wrong. I remind you that there is no difference between 2.876543 and 2.876544—not one bit! nor between 0 and 2, or 0 and 2.9—when any decision you make recognizes no difference between these CGPAs! If you are going to make different decisions, then you will have different buckets, and thus a different model, and different results.

This is not a bug, it is a feature. Just like the conditionality of all probability.

Next is Poisson regression.

August 8, 2018 | 2 Comments

How To Do Predictive Statistics: Part IV New (Free) Software Logistic Regression

Previous post in the series (or click above on Class).

Download the code: mcmc.pred.R, mcmc.pred.examples.R. If you downloaded before, download again. This is version 0.21! Only the example code changed since last time.

For an explanation of the theory behind all this, which is markedly different than classical views, get this book: Uncertainty.

We’ve done ordinary regression to death. So it’s time we move to unordinary but still humble regression, i.e. logistic regression, which, like all models, aims for this:

     Pr(Y | new X, old X&Y, Model & assumptions)

where in this case Y is a yes/no, up/down, 0/1 proposition. Like whether Y = “A baby is born with (what doctors say is) low birth weight.”

R has a built-in dataset to play with, invoked with

x$race = as.factor(x$race)

The first line brings it up, the second assigns it to x only for the sake of consistency of all the other code we have been writing, and the last turns the numbers 1, 2, 3 into a factor. You can read all about the dataset by typing ?birthwt.

As an ever-important reminder, the model we build will not be causal. It will be probabilistic. We are not saying, for instance, smoking causes low birth weight, but we do aim to check the conditional probability of low birth weight given a mother smokes or not. Conditional on, as the equation above says, the old data, new guesses of the data, the model itself, and all the other assumptions that went into the mix. Assuming all that is true, we get the probability.

Don’t forget to re-source the code, if you don’t have it in memory (with relevant path, if necessary).


Fit the classic parameterized model:

fit = MCMClogit(low~age+race+smoke, data=x)

Why just these measures and not the others also included? No reason at all, except that the help for MCMClogit had just these. By all means, try out the others!

We next form a scenario, the “new X”, so that we can calculate the probability. A minor limitation is that the scenario has to be an R data.frame matching the input measures. A limitation is NOT that it has to have all input measures. You (in this case I) said all those measures must be specified to get the probability, so every time you want the probability, you have to specify values of all measures.

Perplexed which scenario to try, since few of us are low birth weight experts? The old data can be used as if it were new. For instance (mixing code and output):

w = x[1,]
low age lwt race smoke ptl ht ui ftv bwt
85 0 19 182 2 0 0 0 1 0 2523

Ignore the “85”, it is a line number produced by whoever sorted the original data so that all low birth weights came first. It is assigned to w because of the limitation I mentioned of passing in the kind of data.frame, which includes knowledge of all factor levels of the relevant measures. We could also have passed in x[1,]. If we use this scenario we want to calculate

     Pr(low | age = 19, race=2, smoke=0, D, M & A)

where D = the old data. The answer is (drumroll please):

p = MCMClogit.pred(fit,w)

I get 0.34. Since this is an approximation to a complicated integral, your answer might differ slightly.

What if we wanted the model-old-data-conditional probability of low birth weight given a 19-year-old mother of race 2 but who smoked? This:

w$smoke = 1
p = MCMClogit.pred(fit,w)

I get 0.61, about double. Yours may differ slightly. So, given everything we punched into the model (the priors etc. included), the chance a 19-y-o race = 2 woman smokes doubles, in our estimation, of having a low birth weight baby. Any individual woman like that either will or won’t have a low birth weight baby, a state caused by various things. We are not measuring cause, only probability.

What about 20-y-o’s? 21-y-o’s? Higher? What about race = 1 and race = 3? The only way to find out is to do the calculations.

Again we can create lots of scenarios using the old data, which has the virtue of containing scenarios actually found in nature. But it has the disadvantage of only having scenarios actually measured. But we can use it to get some idea of what to expect. For instance:

# all the old x
p = NA
for(i in 1:nrow(x)){
  p[i] = MCMClogit.pred(fit,x[i,])

plot(x$age,p,col=x$race,ylab='Pr(low wt|old data,M)', pch=x$smoke+1)
legend('topright',c('r1','r2','r3','s0','s1'), col=c(1,2,3,1,1), pch = c(3,3,3,1,2), bty='n')

The legend is a hack because of author laziness, but you get the idea. If we wanted pretty pictures we’d be using ggplot2, but then I’d have to spend too much time explaining that code, and we’d miss the important bits.

The idea is that each old measure was taken as a new scenario. The model-etc.-conditional probabilities were estimated, and the result plotted by the measures.

This is the model-estimated-probability! That’s why everything looks so smooth. It says, more or less, that race = 1 non-smokers have the lowest probability, race = 1 smokers and race = 2, 3 non-smokers are the same in probability, and race = 2,3 smokers have the highest probability. And that regardless of race or smoking older women have smaller chances.

For people who are like the gathered data.

About those people I know nothing except what is printed in the help file. Whether these women are like women all over the world in all countries in all time periods of all economic and spiritual backgrounds and so on and so forth, I have no clue. But I doubt it.

Given all those necessary and hard-to-over-emphasize caveats, it is well to recall these are the probabilities of low birth weight babies given whatever conditions we specify. For individual mothers. What if you were a hospital administrator making a guess of the number of low birth weight babies in the coming year, these (perhaps) costing more for whatever reasons? Then you’d have to have another layer of modeling on top of our simple one, which includes patient-type scenarios. How many race = 1, 2, 3 and how many smokers, what ages. And how many patients!

We could do that easily for single-measure patients. Like when w = x[1,]. Then the model is binomial-like. After specifying the number of patients. If you don’t know that, you have to model it, too, as just mentioned.

I’ll leave that for homework. Meanwhile, we are done with logistic regression. It’s as easy as it looks. Next time we start multinomial regression!

July 26, 2018 | 2 Comments

How To Do Predictive Statistics: Part III New (Free) Software Regression 2

Mandatory! Read Part I, Part II. I will ignore all comments already answered in Parts I & II.

Download the code: mcmc.pred.R, mcmc.pred.examples.R. If you downloaded before, download again. This is version 0.2!

Warning All the results were produced by an earlier version of the code, where one of the parameters to the normal was off. The code below is correct, but you will get (slightly) different results. I am on the road and haven’t time to redo pictures below. They are not far off, though.


Relevancy is what we’ll use to help deciding what information to include in our model. Relevancy is conditional, like all probability is conditional. A piece of evidence is relevant if it changes the probability of Y, given all the other pieces of evidence. Otherwise it is irrelevant.

     If Pr(Y|XW) ≠ Pr(Y|X), then W is relevant given X.

     If Pr(Y|XW) = Pr(Y|X), then W is irrelevant given X.

It could be that W is irrelevant given X, but relevant given Z, i.e. a different piece of evidence (that is not X). The kind of evidence W is does not matter.

If Pr(Y|new X, old X&Y_i, Model) ≠ Pr(Y|new X, old X&Y_i+1, Model), then the additional old data point is relevant given X and the Model; if the probabilities are equal, then the new data point is irrelevant. Change in probability with the addition of new data points is thus an interesting measure of interestingness of new data.

Relevancy is not the same as structural model change. For example Pr(Y|new X, old X&Y, Model_prior_1) ≠ Pr(Y|new X, old X&Y_i+1, Model_prior_2), i.e. everything the same except for two different priors, then we don’t say the prior is relevant. Obviously the model, of which is part of the model with all of its assumptions, change the probability of Y.

Now we often write Y = “y in s” when y is numeric. Then it could be that

     If Pr(y in s|XW) = Pr(Y in s|X), for some s and

     If Pr(y in s|XW) ≠ Pr(Y in s|X), for other s.

An s_1 interesting to one decision maker can come to a different conclusion of relevancy than a second decision maker who is keen on s_2.

This, like probability, is not a bug, it is a feature!

Relevancy, also like probability, is not decision. It could be

     If Pr(Y|XW) = Pr(Y|X) + epsilon,

where adding W changes the probability of Y (or y in s) only a tiny bit. W is then relevant as long as epsilon > 0. It is not—as in NOT—the statistician’s job to say which threshold of epsilon causes W to become useful. Usefulness is up to the decision maker. Let there be no standard threshold! We see the havoc wrought by insisting on the magic number of 0.05 for all classical analyses. Let us not make the same mistake.

Decisions do, however, have to be made. Is epsilon so small that it makes no difference to a decision that will be made using the model? If so, we leave W out; if not, we keep W. We often have causal knowledge, or something approaching it, about W. If we assume—and this therefore makes the assumption part of the right hand side of (2); ALL assumptions are part of the r.h.s. of (2)—W cannot possibly lie in the causal path of Y, then we judge W irrelevant regardless of any changes it makes to the probability of Y. A handful of thrown rice on the floor may resemble the face of President Taft, but since we know there is no causal mechanism in the rice that could do that, we judge rice to be irrelevant in judging probabilities of faces.


Recall eq. (2): Pr(Y|new X, old X&Y, Model and assumptions). We need to supply for our model supposes or surmises of what new values of HGPA and SAT will be. We have a some idea of what these will be in the old data. We can thus use the old data to make a stab, though a weak one, at gauging how good the model is. Predictive statistics does not solve the problem of data reuse. Fooling ourselves is, as ever, a real possibility.

One strategy, then, is to step through the old data and pretend it will look like new values, calculate the probability in (2), and then make various plots, calculations and such forth. Like this:

# all the old x
q = matrix(0,nrow(x),3)
for(i in 1:nrow(x)){

A limitation of R is that there is no mechanism for stepping through a data.frame while maintaining the class of each variable. The function apply coerces the data.frame into a matrix, and turns everything to characters (if any column is a factor). The others like lappy, mapply, Vectorize and so on turn the result into unworkable lists. I have searched, but have had no success, in discovering if anybody solved this problem. So we’re stuck with a loop; at least, for now.

This bit of code calculates the approximate quantiles for every old value of HGPA and SAT. There is nothing especially important in the values 0.1 and 0.9; but, really, calculating 95% intervals is rather severe. Do we need that much certainty in the results? Choose or add whichever you like, but be sure not to forget the definition of q.

A KEY NECESSARY MUST-HAVE notion is that since our model specified the probability of Y given HGPA and SAT, we must ALWAYS specify a value of HGPA and SAT when calculating (2). There is NO notion (besides relevancy) of studying HGPA in isolation of SAT, nor vice-versa. This applies to your models, too. If you have x1, x2, …, xp, each and every calculation, from now until forever, must specify values of x1, x2, …, xp.

There is NO notion of any variable in (complete) isolation of the others. This is an impossibility! If you don’t want to consider a measurement, then don’t put it in the model. This represents another major departure from the old ways of doing things.

What can we do with the q?

for(i in 1:nrow(x)){

Do the same for x$sat. Looks better in ggplot2, but since that code can be distracting, we’ll stick to simple plots whenever we can.

As I just stressed, I hope sufficiently, each hgpa has a corresponding sat, but you don’t see sat plotted; only hgpa. So we only have a slice of an idea what is going on. You want to see both at the same time, in all their glory? We can, for 2D data, use 3D plots. But in general it’s tough cookies, mister. There is no general solution.

One of the key problems with classical analysis was that it made model analysis seem so easy. Get a wee p? Success! Bah, humbug. Real analysis, predictive analysis, is hard work. In predictive analysis the bigger the model gets, the harder the work becomes because complexity increases fast. This is, or rather should be, expected. But in classical analysis, it’s wee ps no matter how many terms in the model, each investigated in isolation (or mostly). It’s no wonder so many mistakes were made. Over-certainty was guaranteed.

What about, say, probabilities of CGPA > 3? How about this? You’ll need to install the plot3D package if you don’t have it.


q = NA
g = 3
for(i in 1:nrow(x)){
  p = MCMCregress.pred(fit,x[i,])
  q[i]= sum(p>=g)/length(p)

# try these, too, for fun
#plot(x$sat,q,ylab='Pr(CGPA>g|old data,M)')
#plot(x$hgpa,q,ylab='Pr(CGPA>g|old data,M)')

scatter3D(x$sat, x$hgpa, q, xlab='SAT',ylab='HGPA', zlab=paste0('Pr(CGPA>',g,'|D,M)'), phi = 0, bty ="g", ticktype = "detailed")

Contour (flat) plots can also be used, and perhaps are superior here. Though this 3D plot shows the dramatic rise in probability for high values of both measurements.

It is obvious both HGPA and SAT are relevant, at least for most values of “y in s”, and that most decision makers would judge the change in probabilities as useful. I did not do a search of all s, but you can. Instead, let’s add recomm to the model and see what it does. We already calculated probability greater than 3 for the model, stored in object q, without recomm. Now let’s do it again with recomm.

fit.r = MCMCregress(cgpa~hgpa*sat+recomm,data=x)
q.r = NA
for(i in 1:nrow(x)){
  p = MCMCregress.pred(fit,x[i,])
  q.r[i]= sum(p>=g)/length(p)

plot(x$recomm, q-q.r, ylim=c(-0.05,0.05))


Since I, the Lord & Master of this site, and the decision maker, it was up to me to choose a level or threshold above which I judged recomm useful. I picked 0.05. That’s why I set the limits in the plot that way. Any dot sticking up above +/- 0.05 proves recomm is useful. To me. Maybe not to you.

As it is, the maximum (in absolute value) change in probability was 0.0026950. The median change was 0.0000925. Recommendation is relevant because the probability changes are non zero, and because I can’t imagine it would be non-causal (in the causal path). But a change in probability of 0.0001 isn’t even peanuts. This analysis only applies to a g = 3, i.e. “y in s” equals “y > 3”. Different s may lead to different conclusions. Your homework is to check.

Recall we included the multiplicative interaction between HGPA and SAT. Is that relevant? Is it useful? We now know how to check. So do so.

One word of caution. We have been using old data, on the reasonable theory that new values of HGPA etc. will look like the old. But the old data does not fill in all the gaps of possible values of HGPA, SAT, and Recommendation. This is no trouble. We can create our own data set—and we even must do so! That was the point of the analysis. We wanted this model to give uncertainty in Y (CGPA) given X (values of HGPA and SAT). So the analysis is not concluded until we make those predictions.

We already know how, too. It’s as simple as passing in a new data.frame with the values of HGPA and SAT in which we are interested.

data.frame(cgpa=0, hgpa = 4, sat = 1200)

Remmeber, at least for now, we have to pass in the Y with some value, but that it is ignored.

Enough already! I started off sickened by the CGPA data, and now I loathe it and never want to see it again. Try the software on new data, and compare the results of the predictive method to classical hypothesis testing. You will discover (a) sometimes the decisions are similar, sometimes they are wildly dissimilar, and (b) the predictive results are less certain, and (c) the predictive analysis is much harder.

Next time we start logistic regression with the low birth weight data.

July 24, 2018 | 4 Comments

How To Do Predictive Statistics: Part II New (Free) Software Regression 1

Mandatory! Read Part I. I will ignore all comments already answered in Part I.

Download the code: mcmc.pred.R, mcmc.pred.examples.R. Update If you downloaded before, download again. This is version 0.2!

You must have installed MCMpack. Get it like this:

install.packages('MCMCpack', dependencies=TRUE)

Once installed, every time you start R (if you didn’t save your session), you must load it:


We’re going to need some data, and though we’re sick of it (I am, anyway), we’ll use the college grade point data, which is the CGPA at the end of the first year for 100 or so kids, along with their SAT (old style) scores, high school GPA, and letter of recommendation strength. We want to peg the probability of certain CGPAs given this information and on the ad hoc assumption we can model uncertainty of CGPA with normal distributions. That will turn out to be, in many cases, a lousy approximation, because CGPA can only live on [0,4], but normals give probability to everything, resulting, as we’ll see, is massive probability leakage. We’ll later fix this problem using multinomial and tobit regression.

I expect everybody reading this will redo the examples on their own favorite datasets. Do so, and report back here.

Read the data in and start the model, using all defaults. You are responsible for figuring out how to use MCMCpack on your own. All we need know is that we’re after this:

     (1) Pr ( CGPA in s | new HGPA & SAT, old data, Model & assumptions),

where s is a set of values of CGPA of interest. Such as “greater than 3” or “equal to 4”, or whatever, and new or assumed values of HGPA and SAT. I don’t care immediately about letters of recommendation.


x <- read.csv("") fit = MCMCregress(cgpa~hgpa*sat,data=x)

Notice we went for the exciting interaction term! Next download and source this code MCMC.pred.R:

# PC
path = 'C:/MyDrive/MyTopFolder/MySubFolder/'
# Linux/MAC
path = '/home/MyName/MyTopFolder/MySubFolder/'


You can use ls() to see all the new code. The bit we will use is:

# Version 0.1
MCMCregress.pred <- function(fit,x){
   # fit from MCMCregress
   # x data.frame in same form as data entered into MCMCregress
   n = length(fit[1,])
   r = as.numeric(y %*% t(fit[,-n]))
   b = rnorm(length(r)*20,r,sqrt(fit[,n]))

This takes the model object fit from MCMCregress, and single new data in the form of a data.frame, and gives us eq. (1). We'll do multiple values later. The structure of the model, at least for MCMCregress, includes the formula, so we don't have to pass it. The row of fit consists in estimates of the model parameters, the last one of which is the spread (or "standard deviation") parameter of the normal distribution.

We grab the model form in the next two lines, which is why we (so far) pass in a data.frame, and then calculate the estimates of the central (or "mean") parameter of the normal distribution of CGPA. We then feed these "mu"s and "sigma"s into the last part of the simulation, which gives us estimates of (1). Estimates of the observable, that is, and not any parameter. We pass all these out.

Notice the "20" in the penultimate line, which is the number of repetitions: notice it is hard coded: notice that this is bad practice: notice how little you are paying. This will be fixed in future versions when the wrappers are written. What's that? What did I mean by wrappers? Ah, so you didn't read Part I. Ah, well.

Update: due to my boneheaded coding error, I used the variance and not standard deviation in the normal simulation. I fixed above and in on-line code, but the results below I did NOT fix yet. I am on the road and will get to it. The code below is right, but the results off.

Let's try:

# Replace x[6,] with any new data that has SAME variables as in model
# the outcome (here cgpa) can be ANY value and is ignored, but must be there. E.g.
# newdata = data.frame(hgpa = 1, sat = 600, recomm = 1, cgpa = 4)
p = MCMCregress.pred(fit,x[6,])

I picked the sixth observation of x to pass in because this is the first number my finger hit. The old data happens to have the proper structure, which the notes say is mandatory. A limitation is that you have to pass in a value of the observable, too, even though it's ignored. That's because the model matrix needs it. That can be fixed in time.

Turns out, for x[6,], hgpa = 0.33, sat = 756, so that p contains the approximation of

     (2) Pr ( CGPA in s | HGPA = 0.33 & SAT = 756, D,M),

where we haven't yet defined s.


# single newdata analysis

I got

> hist(p,200)
> quantile(p,c(0.05,.5,.95))
5% 50% 95%
0.1673601 0.8802050 1.5948456
> mean(p)
[1] 0.8807344

Your results will vary, because this is a simulation approximation. But they shouldn't vary much. If they do, we could boost that "20", or boost mcmc in MCMCregress, or both.

Notice the picture shows modest probability leakage, i.e. positive probability of impossible values, in this case less than 0. Vary hgpa and sat and see if you can find more.

The quantiles show that, for instance, the probability of seeing NEW CGPAs greater than 1.59 are 5%, given these assumed values of HGPA, SAT, and the old data, model and assumptions. So, in this case, s is "greater than 1.59".

Did I mention this was the (conditional) probability of seeing NEW values of CGPA? We do not need models to tell us what happened! If we want to know how many OLD values of CGPA were greater than 1.59, or whatever, we just look. Models are only for telling us about the unknown, not the known.

I cannot stress too highly that this 5% is not "the" probability. There is never, not ever, a case where there is "the" probability. This is the conditional probability as laid out in (2). This true understanding is the biggest departure between the old and predictive ways of modeling. For why it is true, see Uncertainty.

Change anything on the right hand side, and the probability changes. One of the things on the right hand side is the simulation assumptions (and prior and all other model assumptions). That's why, when you run a new simulation, you change the right hand side, and thus you change the probability!

The changes might not be very big---but they are always there (until the trump of doom sounds at Infinity); at least, as long as what is changing is probative to Y. If information on the right hand side is irrelevant to Y, then adding or removing it does nothing. This, after all, is the definition of irrelevancy.

We will use that notion of relevancy next time when we explore what SAT and HGPA, and even letters of recommendation strength do to the model.