Skip to content

Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

July 26, 2018 | 2 Comments

How To Do Predictive Statistics: Part III New (Free) Software Regression 2

Mandatory! Read Part I, Part II. I will ignore all comments already answered in Parts I & II.

Download the code: mcmc.pred.R, mcmc.pred.examples.R. If you downloaded before, download again. This is version 0.2!

Warning All the results were produced by an earlier version of the code, where one of the parameters to the normal was off. The code below is correct, but you will get (slightly) different results. I am on the road and haven’t time to redo pictures below. They are not far off, though.


Relevancy is what we’ll use to help deciding what information to include in our model. Relevancy is conditional, like all probability is conditional. A piece of evidence is relevant if it changes the probability of Y, given all the other pieces of evidence. Otherwise it is irrelevant.

     If Pr(Y|XW) ≠ Pr(Y|X), then W is relevant given X.

     If Pr(Y|XW) = Pr(Y|X), then W is irrelevant given X.

It could be that W is irrelevant given X, but relevant given Z, i.e. a different piece of evidence (that is not X). The kind of evidence W is does not matter.

If Pr(Y|new X, old X&Y_i, Model) ≠ Pr(Y|new X, old X&Y_i+1, Model), then the additional old data point is relevant given X and the Model; if the probabilities are equal, then the new data point is irrelevant. Change in probability with the addition of new data points is thus an interesting measure of interestingness of new data.

Relevancy is not the same as structural model change. For example Pr(Y|new X, old X&Y, Model_prior_1) ≠ Pr(Y|new X, old X&Y_i+1, Model_prior_2), i.e. everything the same except for two different priors, then we don’t say the prior is relevant. Obviously the model, of which is part of the model with all of its assumptions, change the probability of Y.

Now we often write Y = “y in s” when y is numeric. Then it could be that

     If Pr(y in s|XW) = Pr(Y in s|X), for some s and

     If Pr(y in s|XW) ≠ Pr(Y in s|X), for other s.

An s_1 interesting to one decision maker can come to a different conclusion of relevancy than a second decision maker who is keen on s_2.

This, like probability, is not a bug, it is a feature!

Relevancy, also like probability, is not decision. It could be

     If Pr(Y|XW) = Pr(Y|X) + epsilon,

where adding W changes the probability of Y (or y in s) only a tiny bit. W is then relevant as long as epsilon > 0. It is not—as in NOT—the statistician’s job to say which threshold of epsilon causes W to become useful. Usefulness is up to the decision maker. Let there be no standard threshold! We see the havoc wrought by insisting on the magic number of 0.05 for all classical analyses. Let us not make the same mistake.

Decisions do, however, have to be made. Is epsilon so small that it makes no difference to a decision that will be made using the model? If so, we leave W out; if not, we keep W. We often have causal knowledge, or something approaching it, about W. If we assume—and this therefore makes the assumption part of the right hand side of (2); ALL assumptions are part of the r.h.s. of (2)—W cannot possibly lie in the causal path of Y, then we judge W irrelevant regardless of any changes it makes to the probability of Y. A handful of thrown rice on the floor may resemble the face of President Taft, but since we know there is no causal mechanism in the rice that could do that, we judge rice to be irrelevant in judging probabilities of faces.


Recall eq. (2): Pr(Y|new X, old X&Y, Model and assumptions). We need to supply for our model supposes or surmises of what new values of HGPA and SAT will be. We have a some idea of what these will be in the old data. We can thus use the old data to make a stab, though a weak one, at gauging how good the model is. Predictive statistics does not solve the problem of data reuse. Fooling ourselves is, as ever, a real possibility.

One strategy, then, is to step through the old data and pretend it will look like new values, calculate the probability in (2), and then make various plots, calculations and such forth. Like this:

# all the old x
q = matrix(0,nrow(x),3)
for(i in 1:nrow(x)){

A limitation of R is that there is no mechanism for stepping through a data.frame while maintaining the class of each variable. The function apply coerces the data.frame into a matrix, and turns everything to characters (if any column is a factor). The others like lappy, mapply, Vectorize and so on turn the result into unworkable lists. I have searched, but have had no success, in discovering if anybody solved this problem. So we’re stuck with a loop; at least, for now.

This bit of code calculates the approximate quantiles for every old value of HGPA and SAT. There is nothing especially important in the values 0.1 and 0.9; but, really, calculating 95% intervals is rather severe. Do we need that much certainty in the results? Choose or add whichever you like, but be sure not to forget the definition of q.

A KEY NECESSARY MUST-HAVE notion is that since our model specified the probability of Y given HGPA and SAT, we must ALWAYS specify a value of HGPA and SAT when calculating (2). There is NO notion (besides relevancy) of studying HGPA in isolation of SAT, nor vice-versa. This applies to your models, too. If you have x1, x2, …, xp, each and every calculation, from now until forever, must specify values of x1, x2, …, xp.

There is NO notion of any variable in (complete) isolation of the others. This is an impossibility! If you don’t want to consider a measurement, then don’t put it in the model. This represents another major departure from the old ways of doing things.

What can we do with the q?

for(i in 1:nrow(x)){

Do the same for x$sat. Looks better in ggplot2, but since that code can be distracting, we’ll stick to simple plots whenever we can.

As I just stressed, I hope sufficiently, each hgpa has a corresponding sat, but you don’t see sat plotted; only hgpa. So we only have a slice of an idea what is going on. You want to see both at the same time, in all their glory? We can, for 2D data, use 3D plots. But in general it’s tough cookies, mister. There is no general solution.

One of the key problems with classical analysis was that it made model analysis seem so easy. Get a wee p? Success! Bah, humbug. Real analysis, predictive analysis, is hard work. In predictive analysis the bigger the model gets, the harder the work becomes because complexity increases fast. This is, or rather should be, expected. But in classical analysis, it’s wee ps no matter how many terms in the model, each investigated in isolation (or mostly). It’s no wonder so many mistakes were made. Over-certainty was guaranteed.

What about, say, probabilities of CGPA > 3? How about this? You’ll need to install the plot3D package if you don’t have it.


q = NA
g = 3
for(i in 1:nrow(x)){
  p = MCMCregress.pred(fit,x[i,])
  q[i]= sum(p>=g)/length(p)

# try these, too, for fun
#plot(x$sat,q,ylab='Pr(CGPA>g|old data,M)')
#plot(x$hgpa,q,ylab='Pr(CGPA>g|old data,M)')

scatter3D(x$sat, x$hgpa, q, xlab='SAT',ylab='HGPA', zlab=paste0('Pr(CGPA>',g,'|D,M)'), phi = 0, bty ="g", ticktype = "detailed")

Contour (flat) plots can also be used, and perhaps are superior here. Though this 3D plot shows the dramatic rise in probability for high values of both measurements.

It is obvious both HGPA and SAT are relevant, at least for most values of “y in s”, and that most decision makers would judge the change in probabilities as useful. I did not do a search of all s, but you can. Instead, let’s add recomm to the model and see what it does. We already calculated probability greater than 3 for the model, stored in object q, without recomm. Now let’s do it again with recomm.

fit.r = MCMCregress(cgpa~hgpa*sat+recomm,data=x)
q.r = NA
for(i in 1:nrow(x)){
  p = MCMCregress.pred(fit,x[i,])
  q.r[i]= sum(p>=g)/length(p)

plot(x$recomm, q-q.r, ylim=c(-0.05,0.05))


Since I, the Lord & Master of this site, and the decision maker, it was up to me to choose a level or threshold above which I judged recomm useful. I picked 0.05. That’s why I set the limits in the plot that way. Any dot sticking up above +/- 0.05 proves recomm is useful. To me. Maybe not to you.

As it is, the maximum (in absolute value) change in probability was 0.0026950. The median change was 0.0000925. Recommendation is relevant because the probability changes are non zero, and because I can’t imagine it would be non-causal (in the causal path). But a change in probability of 0.0001 isn’t even peanuts. This analysis only applies to a g = 3, i.e. “y in s” equals “y > 3”. Different s may lead to different conclusions. Your homework is to check.

Recall we included the multiplicative interaction between HGPA and SAT. Is that relevant? Is it useful? We now know how to check. So do so.

One word of caution. We have been using old data, on the reasonable theory that new values of HGPA etc. will look like the old. But the old data does not fill in all the gaps of possible values of HGPA, SAT, and Recommendation. This is no trouble. We can create our own data set—and we even must do so! That was the point of the analysis. We wanted this model to give uncertainty in Y (CGPA) given X (values of HGPA and SAT). So the analysis is not concluded until we make those predictions.

We already know how, too. It’s as simple as passing in a new data.frame with the values of HGPA and SAT in which we are interested.

data.frame(cgpa=0, hgpa = 4, sat = 1200)

Remmeber, at least for now, we have to pass in the Y with some value, but that it is ignored.

Enough already! I started off sickened by the CGPA data, and now I loathe it and never want to see it again. Try the software on new data, and compare the results of the predictive method to classical hypothesis testing. You will discover (a) sometimes the decisions are similar, sometimes they are wildly dissimilar, and (b) the predictive results are less certain, and (c) the predictive analysis is much harder.

Next time we start logistic regression with the low birth weight data.

July 24, 2018 | 4 Comments

How To Do Predictive Statistics: Part II New (Free) Software Regression 1

Mandatory! Read Part I. I will ignore all comments already answered in Part I.

Download the code: mcmc.pred.R, mcmc.pred.examples.R. Update If you downloaded before, download again. This is version 0.2!

You must have installed MCMpack. Get it like this:

install.packages('MCMCpack', dependencies=TRUE)

Once installed, every time you start R (if you didn’t save your session), you must load it:


We’re going to need some data, and though we’re sick of it (I am, anyway), we’ll use the college grade point data, which is the CGPA at the end of the first year for 100 or so kids, along with their SAT (old style) scores, high school GPA, and letter of recommendation strength. We want to peg the probability of certain CGPAs given this information and on the ad hoc assumption we can model uncertainty of CGPA with normal distributions. That will turn out to be, in many cases, a lousy approximation, because CGPA can only live on [0,4], but normals give probability to everything, resulting, as we’ll see, is massive probability leakage. We’ll later fix this problem using multinomial and tobit regression.

I expect everybody reading this will redo the examples on their own favorite datasets. Do so, and report back here.

Read the data in and start the model, using all defaults. You are responsible for figuring out how to use MCMCpack on your own. All we need know is that we’re after this:

     (1) Pr ( CGPA in s | new HGPA & SAT, old data, Model & assumptions),

where s is a set of values of CGPA of interest. Such as “greater than 3” or “equal to 4”, or whatever, and new or assumed values of HGPA and SAT. I don’t care immediately about letters of recommendation.


x <- read.csv("") fit = MCMCregress(cgpa~hgpa*sat,data=x)

Notice we went for the exciting interaction term! Next download and source this code MCMC.pred.R:

# PC
path = 'C:/MyDrive/MyTopFolder/MySubFolder/'
# Linux/MAC
path = '/home/MyName/MyTopFolder/MySubFolder/'


You can use ls() to see all the new code. The bit we will use is:

# Version 0.1
MCMCregress.pred <- function(fit,x){
   # fit from MCMCregress
   # x data.frame in same form as data entered into MCMCregress
   n = length(fit[1,])
   r = as.numeric(y %*% t(fit[,-n]))
   b = rnorm(length(r)*20,r,sqrt(fit[,n]))

This takes the model object fit from MCMCregress, and single new data in the form of a data.frame, and gives us eq. (1). We'll do multiple values later. The structure of the model, at least for MCMCregress, includes the formula, so we don't have to pass it. The row of fit consists in estimates of the model parameters, the last one of which is the spread (or "standard deviation") parameter of the normal distribution.

We grab the model form in the next two lines, which is why we (so far) pass in a data.frame, and then calculate the estimates of the central (or "mean") parameter of the normal distribution of CGPA. We then feed these "mu"s and "sigma"s into the last part of the simulation, which gives us estimates of (1). Estimates of the observable, that is, and not any parameter. We pass all these out.

Notice the "20" in the penultimate line, which is the number of repetitions: notice it is hard coded: notice that this is bad practice: notice how little you are paying. This will be fixed in future versions when the wrappers are written. What's that? What did I mean by wrappers? Ah, so you didn't read Part I. Ah, well.

Update: due to my boneheaded coding error, I used the variance and not standard deviation in the normal simulation. I fixed above and in on-line code, but the results below I did NOT fix yet. I am on the road and will get to it. The code below is right, but the results off.

Let's try:

# Replace x[6,] with any new data that has SAME variables as in model
# the outcome (here cgpa) can be ANY value and is ignored, but must be there. E.g.
# newdata = data.frame(hgpa = 1, sat = 600, recomm = 1, cgpa = 4)
p = MCMCregress.pred(fit,x[6,])

I picked the sixth observation of x to pass in because this is the first number my finger hit. The old data happens to have the proper structure, which the notes say is mandatory. A limitation is that you have to pass in a value of the observable, too, even though it's ignored. That's because the model matrix needs it. That can be fixed in time.

Turns out, for x[6,], hgpa = 0.33, sat = 756, so that p contains the approximation of

     (2) Pr ( CGPA in s | HGPA = 0.33 & SAT = 756, D,M),

where we haven't yet defined s.


# single newdata analysis

I got

> hist(p,200)
> quantile(p,c(0.05,.5,.95))
5% 50% 95%
0.1673601 0.8802050 1.5948456
> mean(p)
[1] 0.8807344

Your results will vary, because this is a simulation approximation. But they shouldn't vary much. If they do, we could boost that "20", or boost mcmc in MCMCregress, or both.

Notice the picture shows modest probability leakage, i.e. positive probability of impossible values, in this case less than 0. Vary hgpa and sat and see if you can find more.

The quantiles show that, for instance, the probability of seeing NEW CGPAs greater than 1.59 are 5%, given these assumed values of HGPA, SAT, and the old data, model and assumptions. So, in this case, s is "greater than 1.59".

Did I mention this was the (conditional) probability of seeing NEW values of CGPA? We do not need models to tell us what happened! If we want to know how many OLD values of CGPA were greater than 1.59, or whatever, we just look. Models are only for telling us about the unknown, not the known.

I cannot stress too highly that this 5% is not "the" probability. There is never, not ever, a case where there is "the" probability. This is the conditional probability as laid out in (2). This true understanding is the biggest departure between the old and predictive ways of modeling. For why it is true, see Uncertainty.

Change anything on the right hand side, and the probability changes. One of the things on the right hand side is the simulation assumptions (and prior and all other model assumptions). That's why, when you run a new simulation, you change the right hand side, and thus you change the probability!

The changes might not be very big---but they are always there (until the trump of doom sounds at Infinity); at least, as long as what is changing is probative to Y. If information on the right hand side is irrelevant to Y, then adding or removing it does nothing. This, after all, is the definition of irrelevancy.

We will use that notion of relevancy next time when we explore what SAT and HGPA, and even letters of recommendation strength do to the model.

July 23, 2018 | 6 Comments

How To Do Predictive Statistics: Part I New (Free) Software Introduction


Here’s what we always want, but never get, using the old ways of doing statistics: The probability that some proposition Y is true given certain assumptions. Like this:

     (1) Pr(Y | assumptions).

What happens if we change the assumptions? We change the probability. This is not a bug, it is a feature. There is NO SUCH THING, not ever, as Pr(Y), i.e. Pr(Y|no assumptions). It is an impossibility. And that means frequentism as a theory of probability fails. As does any theory which takes for granted that things “have” a probability and it is our job to discover it. For proof of these claims, read this New York Times-eligible bestseller Uncertainty.

Just because you don’t write down assumptions, doesn’t mean they are not there. At the least there will be the assumptions about what Y means, about grammar and language, and maybe rules of math. There are always assumptions.

That means if we change the model, and since the model is an assumption, we change the probability. If we add or subtract “data”, and our model uses data, and since the model is an assumption, we change the probability. If we tweak the model, as in adjust it or modify its parts, like priors, and since the model is an assumption, we change the probability. Again, this is not a bug, but a necessary feature of probability. (Some people actually complain about how changing the model or prior changes the probability, but that’s because they believe in Pr(Y) which is impossible.)

Most models are ad hoc. The vast majority of models make rather gross approximations of Y, such that it lies on the continuum. (No real thing does, though, oddly, the continuum is called the “real” line.) Regression is in this class of ad hociness. It is, by far, the most used model class, judging by published “research” (sorry for scare quotes). Regression is, or rather should be, this:

     (2) Pr(Y | new X, old X & Y, Model & assumptions),

which is the probability of our proposition Y given a new assumed or measured proposition X, the old measured values of X & Y, and the ad hoc model with its assumptions. There is no notion of cause in this set up, despite how folks describe their results. All (2) says is “This is the probability of Y given these things.” That’s it, and nothing more.

For shorthand we can write

     (3) Pr(Y | XDM),

where D is the old “data”, and the others are obvious. Change anything—as in anything—on the right hand side, change the probability of Y. For the last time, this is not a bug, it is a feature.

I’ve skipped over lots of detail, because regular readers have seen it all before, and because it’s in Uncertainty and in the other posts of this on-line class (of which this post is a member).


One of the big reasons predictive methods have not taken off is the lack of free and easy-to-use out-of-the-box software. There is tons upon metric tons of software for classical parameter- and hypothesis-testing methods. But those methods are dead to us. We do not care about parameters and we most certainly do not care for hypothesis tests. We care only about things like eq. (2) (and 3).

The best solution would be to build this software from scratch, which requires time and money, and Yours Truly has neither in any abundance (I am wholly independent; no students, no grants, no affiliations of any kind). The next best is to build upon what’s already out there. This has the advantages of speed and familiarity, but the disadvantage of being stuck in a pre-determined format with all its inherited limitations.

We’re tried other things in the past, but I’ve settled—as a start—on extending MCMCpack. With full acknowledgement of the severe limitations of simulation methods). Simulations are not magic, nor do they “have” probability. Nothing “has” probability. There is no such thing as “drawing” from a “random” distribution. Read the links: simulations are only reasonable approximations to difficult numerical integrations. That’s is, and nothing more.

MCMCpack model routines return approximations of posterior distributions of parameters of models. We have zero interest in parameters per se, since they are only part of model innards. Concentrating on parameters as if they were probabilities of observables (like Y) has led to vast, vast, incalculable over-certainties. All that is also in Uncertainty.

We can use the posteriors, though, in approximating (2), and having knowledge of the model forms. The idea is to take the approximated parameter posteriors, and feed them into a second simulation-approximation to get (2). Simple as that.

MCMCpack allows all kinds of priors to be input, and lots of other abilities to form models. Here is just one of many examples—recalling, of course, that I do not endorse their philosophical interpretation. Here for example is MCMCregress. Look at everything you can set!


MCMCregress(formula, data = NULL, burnin = 1000, mcmc = 10000, thin = 1,
verbose = 0, seed = NA, beta.start = NA, b0 = 0, B0 = 0,
c0 = 0.001, d0 = 0.001, = NA, sigma.var = NA,
marginal.likelihood = c("none", "Laplace", "Chib95"), ...)

The formula is usually typed right into the method, but unfortunately, for many of its models, MCMCpack does not store the formula as part of its output. So sometimes we’ll have to create the formula outside the method so we can store it. That’s easy in R, e.g.

form = formula('y ~ x1 + x2 + x3 + x4 * x5')

That simple but unfortunate notation, incidentally is why many believe probability models say something about cause. After all, does not the formula say y is x1 plus x2 etc.? When, of course, in regression, the real formula is “central parameter of approximate normal distribution of y = x1 + x2 etc.” Normal distributions are always approximations, as are any continuous-based parameterized models. As long as we keep in mind the sometimes severe limitations this imposes, this is not a problem. Many (most?) don’t keep it in mind, however, and slip into committing the Deadly Sin of Reification, which is to assume the model is reality.

The strategy of extension is to take the output of MCMCmodel and feed it into a new method MCMCmodel.pred, for prediction. Because MCMCpack is not consistent in what it passes to its ouput, we sometimes have to track things like the model formula, the limits of y, and so on. This isn’t hard to do, but it makes the software less automatic. So ideally a wrapper would be put around each MCMCmodel that handles this. I did NOT yet write such wrappers. In fact, all the predictions methods thus far have zero error or bounds checking, nor sanity checks of any kind at all. That level of coding takes the most time (to my mind) and is the least fun. So you will have to be careful what you pass into the prediction methods. This, again, is a usage limitation, which can be corrected in time.

What I’ll be introducing in further posts are prediction methods for, so far:

  1. MCMCregress, ordinary regression
  2. MCMClogit, ordinary logistic regression
  3. MCMCmnl, multinomial (bucket) regression
  4. MCMCpoisson, Poisson regression
  5. MCMCtobit, regular regression with censoring on y.

This represents a big chunk of routine procedures, and is therefore enough of a “package” for people to get started.

Regular readers will know that multinomial regression is shockingly underused. Why? Because, I say, each analysis should be related to the decisions that will be made with the model. If no decisions are going to be made, there is no reason to do the model! Two people can have three different (or more!) decisions to be made of the same model structure, so that it should come as no surprise that they, using the same data, can come to different probability conclusions and different decisions.

All decisions, like all probability, are conditional. Since all decisions are finite and discrete, therefore so too should be the models. No y is continuous, each falls naturally into buckets delimited by a measurement process and the decisions to be made. And the multinomial does that for us (to reasonable approximation). Well, we’ll come to that when the multinomial method is introduced.

Consider this software to be version 0.1. It is strictly in the testing stage. At this point, it has very little versatility. It works, though; or, it works as near as I can tell. I am even more famous for typos in my code than in my writing, so that if there are bugs or misconceptions, do not be surprised.

Next time , we do MCMCregress.pred!

LEGAL DISCLAIMER: using this software might break your computer irrecoverably, cause black holes to form in your toilet when flushed, and encourage your children to become lawyers. Do not say you weren’t warned.

Update Download the code: mcmc.pred.R, mcmc.pred.examples.R. Update 2 If you downloaded before, download again. This is version 0.2!

July 20, 2018 | 5 Comments

Mail Bag

I daily receive thoughtful emails from readers. Some ask questions, many point to articles or “studies” that need comment. Others just want to say hi.

I am grateful for these missives and thank you all for them. Please keep sending them. But I must apologize that I do not have time to answer them all. I try. I am now just over two years behind.

Here is a minor attempt at answering some.

Mail #1 Five or six times a week I get emails like this.


My name is Emily Olsen and I work with Perennial Relations, a PR firm based in NYC.

I have a client who is interested in getting some basic exposure on your website via a guest blog post, or even just a quick mention of them within one of your articles. This is a reputable, well-known company that I’m confident you’ll be comfortable mentioning on your website.

I’m authorized to offer up to $40 for the post and can pay by Paypal. Please let me know if you have any questions. I look forward to hearing from you!

Yours truly,

Emily Olsen
Perennial Relations

The highest offer was $100. Now I’ve never taken any of these, and never will, but that there are so many for this site (which obviously has modest traffic) proves it’s a productive business. Meaning a good deal of what you see (elsewhere) on line is fake or touted or suspect.

The only donations accepted, for this wholly independent blog, are from readers.

Mail #2 True Enlightenment (ellipses original).

What an excellent read your book “Uncertainty” is. I am a very quick and experienced reader. At this moment,…on page 24.

It is a true joy to read intelligent “common sense” from time to time, especially as a German citizen…

Thank you!
(PhD student, devouring your ideas)

The must-read book this fine young man is talking about can be had here. Buy two copies in case you ever, heaven forefend, lose one.

Speaking of students. I haven’t any. This puts a damper on the amount of practical work I can get done. I’m trying, among other things, to put together a package of predictive methods. A major reason for the lack of adoption for these True Methods is that no pre-packaged software exists. One I’m doing is based on MCMCpack (with full acknowledgement of the severe limitations of simulation methods). This would ordinarily be work I could delegate.

So if there are any students out there who want to volunteer to do some leg work, for no pay and no official acknowledgement, let me know. (Has to be unofficial, because I have no ties to any institution.)

Mail #3 Difference Made (name withheld).

Hello William,

I somehow stumbled on your site. Anyway its an interesting read especially since you make a interesting arguments and from the Liberal test (the trigger into your site) I found myself nearly with a 100% agreement (with the progressive).

Yet – as I reach midlife I am starting to question some of these beliefs. Starting with spirituality (from a lowest base possible) where do you start to get a basis where your coming from?

Best regards,

J, I used to be, and not that long ago, and even in some parts still, too in love with the world. Many years ago when I started this blog, I was still mired in atheism and the sins of rank individuality. Mine was no Road to Damascus. I stupidly chose to walk back to Reality barefoot over the most overgrown path.

What helped me return to Sanity? Books. Like these:

The Last Superstition by Ed Feser. All of Feser’s books are worth reading, but this is the best to start with.

Against the Idols of the Age by David Stove. Stove was an Australian philosopher who never wrote a boring sentence. He was a self-proclaimed atheist, but I don’t believe, and can’t believe, judging by his work, that he was sincere. Like Feser, anything Stove wrote is worth reading. Start with Idols, or even What’s Wrong with Benevolence. Or On Enlightenment.

Bible. Read it. Start with the New Testament. Read slow.

Next, turn off the TV. Or if you can’t, as an experiment, don’t watch anything produced after, say, 1960. This includes sports. Cut yourself off from your usual sources of information, such as (if you use it) NPR, any newspaper.

Read other old books, this blog, Social Matter and the links within, The Orthosphere. And so forth.

Really do this. You will be amazed at the end of the fortnight how different things appear.