
Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

August 13, 2018 | 6 Comments

How To Do Predictive Statistics: Part V New (Free) Software Multinomial Regression

Previous post in the series (or click above on Class). REVIEW!

Download the code: mcmc.pred.R, mcmc.pred.examples.R. If you downloaded before, download again. This is version 0.22! Only the example code changed since last time.

For an explanation of the theory behind all this, which is markedly different than classical views, get this book: Uncertainty.

Mandatory rant

We’ll use another built-in dataset, the Netherlands voting data. Accessed like this:

x = Nethvote

As before, assigning it to x is for simplicity's sake. Find out all about the data with ?Nethvote. Essentially, voters could pick one of several parties. And, supposing those parties last until the NEXT election, and given some demographic information, we want the probability that

     Pr(Y | new X, old X&Y, Model & Assumptions)

Which—surprise!—is identical to the same probability we want in every predictive model! The emphasis on the NEXT election cannot be stressed too highly. Why? Glad you asked. Let me shout the answer:

There is NEVER a need to model what happened, only what might happen.

We do not need a probability model to tell us what we saw. We need only use our eyes. If we want to know whether more religious people voted for vote (i.e. party) = CDA, WE JUST LOOK. There is no need to do a “hypothesis test”, which is insane. Either more religious people voted for CDA, or they didn’t. AND THAT IS THAT.

The classical idea, frequentist or Bayes, p-value or Bayes factor, of ascertaining whether more religious people “really” voted more frequently for CDA is nuts. About the future? Well, that’s what the model is for: to quantify the probability that more religious people will vote CDA, accepting as an assumption that religion is probative. It is our decision whether we choose religion as probative or not; two people looking at the same data, and even the same model, can come to different conclusions.

I shout, because these ideas are central to the (old, ancient) predictive approach. They are foreign to the hypothesis testing classical methods, which aim to uncover occult forces in data. We will have none of that. Our concern is only observables and measures.

On to the data!

The real beginning

Because of a limitation (out of my control) of MCMCmnl, we have to keep track of the model formula. So we call the method a little differently than when we did ordinary or logistic regression.

form = formula('vote ~ relig + class + income + educ + age * urban')

lv = levels(x[, as.character(form[[2]]) ])

fit = MCMCmnl(form, mcmc.method="IndMH", B0=0, mcmc=5000, thin=10, tune=0.5, baseline='D66', data=x)

Notice form is a standard R formula. This one was chosen to match the one in the native help function. Fool around with others. The object lv holds the levels of the “y” observable. It’s written in a generic way, so that it works with any data set. We could have, of course, just written lv = levels(x$vote), but that works only on data frames with vote as an outcome. Notice, too, that we can change the baseline. We don’t have to: it will default to the normal R base level. We keep track of the levels because you’re allowed to change them, and MCMCmnl doesn’t save the model formula. Ah, well.
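Since lv is built generically from the formula, it may help to see what form[[2]] actually is. A minimal base-R sketch, using a toy data frame (hypothetical values, named d so as not to clobber your x):

```r
# Toy stand-in for the real data (hypothetical values)
d = data.frame(vote  = factor(c('PvdA', 'CDA', 'VVD', 'D66')),
               class = c(0, 1, 0, 1))
f = formula('vote ~ class')

# f[[2]] is the formula's left-hand side: the symbol `vote`
as.character(f[[2]])                 # "vote"

# So the same line works whatever the outcome happens to be called
lv = levels(d[, as.character(f[[2]])])
lv                                   # "CDA" "D66" "PvdA" "VVD"
```

Note the levels come back in R's default alphabetical order, which is why keeping track of them (and of any baseline you change) matters.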

Predictions are somewhat different than before, too. We have to pass in the model formula and the levels of the y. We also need, as ever and as the core of the predictive method, a scenario. How about this one? Mixing code and output, and ignoring the ‘dist’ measures, which we don’t use.


vote distD66 distPvdA distVVD distCDA relig class income educ age urban
PvdA 2.669695 2.335121 4.109881 6.45008 0 0 1 2 5 1


p = MCMCmnl.pred(fit,form,x[1,],lv)

I get

> p
     D66      CDA     PvdA      VVD 
0.076580 0.067476 0.822900 0.033044 

So, given non-religious, class of 0, and so on, the probability a NEW voter will go D66 is about 8%. Your results will vary a bit, since as ever this is a numerical approximation. But they’ll be close. The most likely vote for NEW voters of this sort is PvdA at 82%, and the least likely is VVD at 3%. I don’t know Dutch politics, so I offer no opinions on what this means.

The idea, if it isn’t clear, is that you get a probability for each possible category, because why? Because that’s what we wanted!
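If it helps to see the mechanics, here is a base-R sketch (with made-up linear predictors, not the fitted posterior) of how a multinomial logit turns one number per category into probabilities that must sum to 1:

```r
# Hypothetical linear predictors for one scenario; the baseline (D66)
# gets 0 by convention
eta = c(D66 = 0, CDA = -0.1, PvdA = 2.2, VVD = -0.8)

# Inverse multinomial logit (softmax): exponentiate and normalize
p = exp(eta) / sum(exp(eta))
round(p, 3)

sum(p)   # 1, up to floating point: some category must happen
```

One probability per category, and they exhaust the possibilities, which is exactly what we asked the model for.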

The form and lv ensure everything is labeled correctly at the end. Pain in the keister. But as yet there are no wrappers for any of these methods to make things easier.

How about all the scenarios in the data? You bet:

p = MCMCmnl.pred(fit,form,x[1,],lv)
for(i in 1:nrow(x)){
  # this preserves the proper names for p's columns
  if(i>1) p = rbind(p, MCMCmnl.pred(fit,form,x[i,],lv))
}
p = data.frame(p, row.names=NULL)

for (i in 1:4){
    plot(x$class,p[,i],main=names(p)[i], ylab='Pr(Vote|D,M)',col=x$relig+1)
}

Notice we stacked the answers one on top of the other, and turned p into a data.frame. The plot is for each category or level of vote, as a function of class (which really does have all those odd values; probably the output of some other model). For fun, I colored the points by religion yes/no.
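The business about preserving names deserves a word. In base R, rbind keeps the column names from its first argument, which is why the first prediction is made outside the loop. A toy sketch (hypothetical numbers):

```r
# The first row carries the category names; rbind-ed rows inherit them
q = c(D66 = 0.1, CDA = 0.2, PvdA = 0.6, VVD = 0.1)
q = rbind(q, c(0.3, 0.3, 0.3, 0.1))
q = data.frame(q, row.names = NULL)
names(q)   # "D66" "CDA" "PvdA" "VVD"
```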

This is only one possible plot of many. Other obvious ones will suggest themselves to you. Do them as homework.

Everything is more complex because the model itself is more complex. There isn’t any real or general way to make this easy, either. Nor should there be!

“But, Briggs, can’t I do an average probability for each class level, using all the old scenarios? That way I can tell the impact of class.”

Sure you can. But first, why would you say impact when you meant influence? Second, you would be fooling yourself. Because your model included all those other things, you have to state probabilities only with regard to, and conditional on, all those other things. Otherwise you’re talking weird.

If you want to discuss only class, then build a model with only class.

form = formula('vote ~ class')
lv = levels(x[, as.character(form[[2]]) ])
fit = MCMCmnl(form, mcmc.method="IndMH", B0=0, mcmc=5000, thin=10, tune=0.5, baseline='D66',data=x)

Then you can say what you want about class considered only by itself. Or whatever.

The key lesson is that you specified a model with all those measures, so you can only speak of the model with all those measures. If you don’t want to speak of them, remain model-silent about them.

Mini rant

We are done with multinomial. But not really. It should be used in place of ordinary regression almost always. Why? Because all measures are discrete and finite, thus all Y are, thus all Y are better approximated by multinomials. In ordinary regression, Y is instead approximated by continuity, which is an ENORMOUS assumption, and untrue. No measure can be continuous, and none infinite in actuality.

All data should be transformed into the units of decision. We talked about this before with regard to CGPA data. If you are a dean interested only in counting the number of students scoring 3 or higher in CGPA (or whatever), then you have naturally created an analysis where the Y is dichotomous. Or maybe you care about the under 3s, the 3s and above, and then the 4s (to be given special recognition, say). Then we have a trichotomy. Multinomial can handle this; ordinary regression cannot.
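Bucketing into decision units is one line of base R. A sketch, with hypothetical CGPAs and the dean's three buckets:

```r
cgpa = c(2.1, 2.9, 3.0, 3.5, 3.97, 4.0)   # hypothetical student CGPAs

# Buckets: under 3, 3 to under 4, and the 4s (special recognition)
y = cut(cgpa, breaks = c(-Inf, 3, 4, Inf),
        labels = c('under3', '3to4', '4'), right = FALSE)
table(y)
```

Different decisions mean different breaks, a different Y, and so a different model: precisely the point.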

Two people can have the same data and come to different conclusions about it, as happens all the time in real life. People have different decisions to make, and different consequences to face about those decisions. Therefore, every analysis, i.e. model, should be tailored to the decision at hand. Since every decision, like every measure, is discrete and finite in act, then so should be every model.

“But Briggs, if I quash the data into buckets like you say, then I lose information. I won’t know the difference, in this case, between a CGPA of 2.876543 and 2.876544. I’m losing power or whatever. Besides, I’ve heard discretizing data is bad.”

You heard wrong. I remind you that there is no difference between 2.876543 and 2.876544—not one bit! nor between 0 and 2, or 0 and 2.9—when any decision you make recognizes no difference between these CGPAs! If you are going to make different decisions, then you will have different buckets, and thus a different model, and different results.

This is not a bug, it is a feature. Just like the conditionality of all probability.

Next is Poisson regression.

August 8, 2018 | 2 Comments

How To Do Predictive Statistics: Part IV New (Free) Software Logistic Regression

Previous post in the series (or click above on Class).

Download the code: mcmc.pred.R, mcmc.pred.examples.R. If you downloaded before, download again. This is version 0.21! Only the example code changed since last time.

For an explanation of the theory behind all this, which is markedly different than classical views, get this book: Uncertainty.

We’ve done ordinary regression to death. So it’s time we move to unordinary but still humble regression, i.e. logistic regression, which, like all models, aims for this:

     Pr(Y | new X, old X&Y, Model & assumptions)

where in this case Y is a yes/no, up/down, 0/1 proposition. Like whether Y = “A baby is born with (what doctors say is) low birth weight.”

R has a built-in dataset to play with, invoked with

data(birthwt, package='MASS')
x = birthwt
x$race = as.factor(x$race)

The first line brings it up, the second assigns it to x only for the sake of consistency of all the other code we have been writing, and the last turns the numbers 1, 2, 3 into a factor. You can read all about the dataset by typing ?birthwt.

As an ever-important reminder, the model we build will not be causal. It will be probabilistic. We are not saying, for instance, smoking causes low birth weight, but we do aim to check the conditional probability of low birth weight given a mother smokes or not. Conditional on, as the equation above says, the old data, new guesses of the data, the model itself, and all the other assumptions that went into the mix. Assuming all that is true, we get the probability.

Don’t forget to re-source the code, if you don’t have it in memory (with relevant path, if necessary).


Fit the classic parameterized model:

fit = MCMClogit(low~age+race+smoke, data=x)

Why just these measures and not the others also included? No reason at all, except that the help for MCMClogit had just these. By all means, try out the others!

We next form a scenario, the “new X”, so that we can calculate the probability. A minor limitation is that the scenario has to be an R data.frame matching the input measures. A limitation is NOT that it has to have all input measures. You (in this case I) said all those measures must be specified to get the probability, so every time you want the probability, you have to specify values of all measures.
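If you wanted to build a scenario by hand rather than reuse an old row, a sketch (hypothetical values) would look like this; the key point is that any factor must carry ALL the levels the model knows about, not just the one this scenario uses:

```r
# Hand-built scenario: race is a factor with every level the model saw
w = data.frame(age   = 19,
               race  = factor(2, levels = c(1, 2, 3)),
               smoke = 0)
levels(w$race)   # "1" "2" "3"
```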

Perplexed which scenario to try, since few of us are low birth weight experts? The old data can be used as if it were new. For instance (mixing code and output):

w = x[1,]
low age lwt race smoke ptl ht ui ftv bwt
85 0 19 182 2 0 0 0 1 0 2523

Ignore the “85”, it is a line number produced by whoever sorted the original data so that all low birth weights came first. It is assigned to w because of the limitation I mentioned of passing in the kind of data.frame, which includes knowledge of all factor levels of the relevant measures. We could also have passed in x[1,]. If we use this scenario we want to calculate

     Pr(low | age = 19, race=2, smoke=0, D, M & A)

where D = the old data. The answer is (drumroll please):

p = MCMClogit.pred(fit,w)

I get 0.34. Since this is an approximation to a complicated integral, your answer might differ slightly.

What if we wanted the model-old-data-conditional probability of low birth weight given a 19-year-old mother of race 2 but who smoked? This:

w$smoke = 1
p = MCMClogit.pred(fit,w)

I get 0.61, about double. Yours may differ slightly. So, given everything we punched into the model (the priors etc. included), smoking doubles, in our estimation, the chance that a 19-y-o race = 2 woman has a low birth weight baby. Any individual woman like that either will or won’t have a low birth weight baby, a state caused by various things. We are not measuring cause, only probability.
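To see the mechanics of how flipping one input moves the probability, here is a sketch with MADE-UP coefficients (not the fitted posterior, just illustrative numbers):

```r
# Hypothetical coefficients, for illustration only
b0 = -0.5; b_age = -0.05; b_smoke = 1.1

eta_nonsmoker = b0 + b_age * 19 + b_smoke * 0
eta_smoker    = b0 + b_age * 19 + b_smoke * 1

# plogis is the inverse logit: it maps each linear predictor to a probability
plogis(c(eta_nonsmoker, eta_smoker))
```

The smoking coefficient shifts the linear predictor, and the inverse logit translates that shift into a change in probability, conditional, as ever, on everything else in the model.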

What about 20-y-o’s? 21-y-o’s? Higher? What about race = 1 and race = 3? The only way to find out is to do the calculations.

Again we can create lots of scenarios using the old data, which has the virtue of containing scenarios actually found in nature. But it has the disadvantage of only having scenarios actually measured. But we can use it to get some idea of what to expect. For instance:

# all the old x
p = NA
for(i in 1:nrow(x)){
  p[i] = MCMClogit.pred(fit,x[i,])
}

plot(x$age,p,col=x$race,ylab='Pr(low wt|old data,M)', pch=x$smoke+1)
legend('topright',c('r1','r2','r3','s0','s1'), col=c(1,2,3,1,1), pch = c(3,3,3,1,2), bty='n')

The legend is a hack because of author laziness, but you get the idea. If we wanted pretty pictures we’d be using ggplot2, but then I’d have to spend too much time explaining that code, and we’d miss the important bits.

The idea is that each old measure was taken as a new scenario. The model-etc.-conditional probabilities were estimated, and the result plotted by the measures.

This is the model-estimated-probability! That’s why everything looks so smooth. It says, more or less, that race = 1 non-smokers have the lowest probability, race = 1 smokers and race = 2, 3 non-smokers are the same in probability, and race = 2,3 smokers have the highest probability. And that regardless of race or smoking older women have smaller chances.

For people who are like the gathered data.

About those people I know nothing except what is printed in the help file. Whether these women are like women all over the world in all countries in all time periods of all economic and spiritual backgrounds and so on and so forth, I have no clue. But I doubt it.

Given all those necessary and hard-to-over-emphasize caveats, it is well to recall these are the probabilities of low birth weight babies given whatever conditions we specify. For individual mothers. What if you were a hospital administrator making a guess of the number of low birth weight babies in the coming year, these (perhaps) costing more for whatever reasons? Then you’d have to have another layer of modeling on top of our simple one, which includes patient-type scenarios. How many race = 1, 2, 3 and how many smokers, what ages. And how many patients!

We could do that easily for single-measure patients. Like when w = x[1,]. Then the model is binomial-like. After specifying the number of patients. If you don’t know that, you have to model it, too, as just mentioned.
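A sketch of that binomial-like layer, assuming (hypothetically) 100 patients all matching the first scenario, and using the 0.34 estimated above:

```r
N = 100; pr = 0.34   # hypothetical patient count; pr from the scenario above

# Predictive distribution of the number of low-weight babies is binomial
qbinom(c(0.05, 0.95), size = N, prob = pr)   # central 90% prediction interval
pbinom(40, size = N, prob = pr)              # Pr(40 or fewer low-weight babies)
```

With a mix of patient types, you would instead sum over scenarios, each with its own probability, which is the extra modeling layer mentioned above.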

I’ll leave that for homework. Meanwhile, we are done with logistic regression. It’s as easy as it looks. Next time we start multinomial regression!

August 4, 2018 | 5 Comments

Insanity & Doom Update XLVI

A special all-reader-contribution edition of Doom!

Item WeWork will stop serving meat at company events in effort to reduce environmental impact (Thanks to Sheri for the link.)

The co-working startup WeWork says it will stop serving meat at company events in an effort to reduce its environmental impact.

The firm, which has shared workspace locations in 22 countries, informed employees in an email last week that in addition to not serving meat, it also will not reimburse workers for meals with meat.

We should encourage the company’s leaders to reduce their “impact” to zero—the true minimum. Barring that, we should encourage them to eat nothing but soy. Which eventually will have the same effect.

Item Why We Lie: The Science Behind Our Deceptive Ways (Thanks to Jim Fedako for the link.)

While parents often find their children’s lies troubling—for they signal the beginning of a loss of innocence—Kang Lee, a psychologist at the University of Toronto, sees the emergence of the behavior in toddlers as a reassuring sign that their cognitive growth is on track. What drives the increase in lying sophistication is the development of a child’s ability to put himself or herself in someone else’s shoes.

Known as theory of mind, this is the facility we acquire for understanding the beliefs, intentions, and knowledge of others. Also fundamental to lying is the brain’s executive function: the abilities required for planning, attention, and self-control. The two-year-olds who lied in Lee’s experiments performed better on tests of theory of mind and executive function than those who didn’t. Even at 16, kids who were proficient liars outperformed poor liars.

If this is so, then the best among us must be politicians, journalists, used car salesmen, and lawyers.

Item Children at ‘Commie Camp’ design dozens of Antifa flags, learn about ‘social justice’ (Thanks to Dave Legates for the link.)

Children attending a social justice-themed camp in Massachusetts — which has been dubbed the “Commie Camp” — designed their own Antifa flags this week, pictures posted to social media show.

Kids at Camp Kinderland in Tolland, Massachusetts, which bills itself as the “summer camp with a conscience,” designed nearly 50 Antifa flags, the Daily Caller News Foundation reported.

Flags are a good start. But parents should be advised the Advanced Gulag Training, Denouncing of Colleagues, and How To Form an Efficient Execution Squad hideaway at the end of the summer might lead to injury or to permanently missing children.

Item U. of Oklahoma Official Hired in Wake of Racist Fraternity Chant Says He Was Forced Out (Thanks to Kent Clizbe for the link.)

A vice president at the University of Oklahoma who says he was forced to resign after being accused of improperly using a state vehicle for personal reasons denied the charge on Thursday. The real reason Jabar Shumate contends he was forced out involved his opposition to a fraternity whose racist chant three years ago plunged the university into turmoil and led to the creation of his position…

On Wednesday, Shumate held a news conference with his lawyer, Lindsey Mulinix-Ewert, to say that the university was going to make false accusations against him to justify what he called his “high-tech lynching.” He had been given an ultimatum, he said: resign or be fired…

A report on the audit, released by the university in response to media inquiries on Thursday, found that Shumate had violated a state law that prohibits employees from making personal use of state vehicles, including parking overnight at their homes. The Tahoe was parked outside his home, in Norman, Okla., 124 times between July 2017 and March 2018, the audit found. He also used it for personal trips to Tulsa, where he formerly lived, and made false travel-expense claims, the report said.

But, hey, he might as well try screaming “Racism!” It nearly always works at universities. Since our goal here is to see the universities purged so that they can be rebuilt, I hope Jabar Shumate gets away with it. And not only gets away with it, but forces the university to give him a free truck, lifetime gas included, so that he can patrol the State for “racists”.

July 31, 2018 | 11 Comments

Everything Is Already In The Simulation

Quick post since I am still away Up North. On Drudge was linked the article “Pandemic ‘could wipe out 900 million people,’ experts warn” in some tabloid or sensationalistic rag (New York Times?).

A chilling simulation has revealed just how easily a new pathogen could wipe out a huge slice of the world’s population — up to 900 million people.

Researchers at Johns Hopkins University simulated the spread of a new illness — a new type of parainfluenza, known as Clade X.

The simulation was designed so the pathogen wasn’t markedly more dangerous than real illnesses such as SARS — and illustrates the tightrope governments tread in responding to such illnesses.

Here’s the world’s simplest chilling simulation:

     Input X —> Output X.

Now imagine you’re a scientist anxious to understand how millions will die. Input “Millions will die from splenetic fever” (i.e., a mind-fever produced by consuming too much news media). What’s the output? “Millions will die from splenetic fever.”

What’s the headline?

Artificial Intelligence Computer Model Predicts Millions To Die By Splenetic Fever!

Doubtless “climate change” would feature in the body of the breathless article.

You will say the example is silly, which it is. But it is no different in a fundamental sense from the linked article. There, there was an Input and Output, and an algorithm to turn the one into the other. (The algorithm was “—>”.)

The algorithm is designed by the scientist or “researcher.” It does what it is told to do. Always. The algorithm—any algorithm—was programmed on purpose to say “When you see X, say Y”, however complicated the steps in between from X to Y. This is so even if the algorithm uses “randomness” (see the full dope on the severe limitations of simulation methods).

Of course, some algorithms are so complicated that some people cannot see which combinations of X lead to which combinations of Y. So what? Some people can’t multiply two numbers without a calculator, but multiplication is no mystery. That X leads to Y is in any algorithm by design. It was put there!

If you want to cheat, or cheat yourself, the path is clear. Call X whatever you like, label the algorithm a “simulation” or “deep learning” or “artificial intelligence” or similar, and then express marvel at Y. Again, sometimes the path is not clear from X to Y, and the way the algorithm produces Y might teach you something about X. But since X is put there by you, and the algorithm does what you told it, it cannot be marvelous when it works as it should.

This, incidentally, is why there is not one whit of difference between a “simulation”, “forecast”, “prediction”, “prognostication”, “scenario”, or any of the other words that describe getting from X to Y. People who take refuge in a failed “scenario” by claiming the scenario wasn’t a forecast are fooling themselves. And possibly you, too.

There is no saying the Y has to be certain: it need only be probable with reference to X and the algorithm.

Anybody notice the similarities between any probability model, or mathematical model, or indeed any model at all? You should by now.

A simulation, prediction, etc., fails in two ways. X could be mismeasured or misspecified, and the algorithm is good. Mistakes happen. Or X could be fine and the algorithm stinks. Or both. Pros, like those behind the linked article above, rarely screw up X. But they love their algorithms too well. Algorithms can be right in saying Y from X, but wrong in why Y truly came about. Monkeys throwing darts can pick good stocks.

Of course, I am not saying there will not be a pandemic where a seventh of the population is wiped out. Nor am I claiming “a doomsday cult” won’t release a “a genetically engineered virus.” But if you’re writing a simulation that takes as input X = “Doomsday cult releases genetically engineered virus”, part of that algorithm that leads to Y = “Nearly a billion die” has to specify, by design, the kind of virus that would kill a billion in a manner that must be imagined by the algorithm’s designers.

That is, we have not moved far from our simple chilling algorithm.