No, I don’t think so, but the Census Bureau thought (thinks?) as much.

What follows is one of the more curious emails I’ve received, describing the experiences of Juan (not his real name), who used to work at the Census. Perhaps his story proves what readers have suspected: the more time spent with statistics, the looser your grasp of reality.

I’d really like your help with this one. I’m not sure how to answer.

…I was working at the U.S. Census Bureau doing quality control on the Current Population Survey. The primary way that we checked data quality was by performing re-interviews. We would ask the same set of questions, for the same time period, from a sub-sample of the households in our monthly survey.

One day I got the bright idea that the re-interview data I had looked a lot like a Markov chain. There was a possibility that a different answer was given in the re-interview than in the interview. Questions that had a high frequency of answers changing were considered unreliable. I had a matrix for each question showing the frequency/probability of moving from one state (answer) to another. This looked just like the transition probability matrix that I had been taught about in my first stochastics class. I remember a problem where we had to predict tomorrow’s weather based on today’s. This was exactly the same and I went about taking a limit and calculating the stationary distributions for several of the re-interview questions.

My branch chief had me run the idea by the chief statistician at the Census Bureau and his reaction was not what I was expecting. He said that calculating the stationary distribution was simulating an immoral experiment! His thought process, as best I can remember it, was that taking the limit of that matrix was simulating re-interviewing our sample households an infinite number of times which was immoral.

A couple of years later I asked a friend, who holds a PhD in Biostatistics from Harvard, about this and she agreed with the chief statistician. This seems to me like they are taking the abstract and trying to make it real, which is a huge stretch for me. Is the Bayesian interpretation of this approach different? Would Bayesians have moral qualms about calculating the stationary distribution in such a situation?

I followed up with Juan and he gave me more details confirming the story. An example of how answers to a question on race (which was mutable) changed between interview and re-interview heads this post. Terrific evidence that most survey data should not be taken at face value.

Markov chains are the fancy names given to certain probabilities. Juan used weather as an example. Suppose the chance of a wet day following a dry day, *given some evidence*, is p_{01}, and the chance of a wet day following a wet day, *given some evidence*, is p_{11}, and say, p_{11} > p_{01}. Since these are probabilities, they don’t for instance tell us *why* wet days are more likely to follow wet than dry days; they only characterize the uncertainty.

This is a “two-state” (wet or dry) Markov chain. Of course, you can have as many states as you like (there are 6 above), and the matrix of the probabilities of going from one state to another, given some evidence, describes the chain. The “stationary distribution” of a chain is the set of calculated probabilities of the system being in each state “in the long run”. These probabilities are no longer conditional on the previous state, but they are still (obviously) conditional on whatever evidence was used.

There is no such thing as “the long run”, as in Keynes’s quip and in the director’s odd idea of infinite simulations, but these stationary distributions are useful as approximations. Say we wanted to know the chance of a wet day conditional only on the evidence and not on whether yesterday was wet or dry. We get that from the stationary distribution. If, for example, p_{11} = 0.5 and p_{01} = 0.1, then, with state 0 dry and state 1 wet, the stationary distribution is π(0) = 0.83 and π(1) = 0.17 (if the back of my envelope isn’t misleading me).
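For anybody who wants to check the envelope, here is a minimal sketch of the two-state calculation. It assumes state 0 is dry and state 1 is wet, so that p_{01} is the chance of wet after dry, and it uses the standard two-state result π(1) = p_{01} / (p_{01} + p_{10}), where p_{10} = 1 - p_{11}. The function name is only illustrative.

```python
def two_state_stationary(p01, p11):
    """Stationary distribution of a two-state chain (0 = dry, 1 = wet).

    p01: chance of a wet day following a dry day
    p11: chance of a wet day following a wet day
    """
    p10 = 1.0 - p11              # chance of a dry day following a wet day
    pi_wet = p01 / (p01 + p10)   # balance condition: pi_dry * p01 == pi_wet * p10
    pi_dry = 1.0 - pi_wet
    return pi_dry, pi_wet


pi_dry, pi_wet = two_state_stationary(p01=0.1, p11=0.5)
print(round(pi_dry, 2), round(pi_wet, 2))  # 0.83 0.17

# Sanity check: one more step of the chain leaves the distribution unchanged.
next_wet = pi_dry * 0.1 + pi_wet * 0.5
print(abs(next_wet - pi_wet) < 1e-12)  # True
```

The last two lines are the whole point of “stationary”: push the distribution through the transition probabilities once more and nothing moves.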

What Juan did was to use the evidence of questions changing answers in the sample to guess the probability that each of the answers would be given by the population as a whole, e.g. the probability of being white, etc. Understand that this final guess was based on *the guess* of the transition probabilities. No matter what, guessing, i.e. modeling, was involved.
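A rough sketch of that two-step guess, as I understand it from Juan’s description: turn a table of interview versus re-interview counts into transition probabilities, then find the distribution those probabilities leave unchanged. The counts below are invented purely for illustration (they are not Census figures), the three answer categories are hypothetical, and the function names are my own. Repeated multiplication stands in for “taking the limit”; it is arithmetic, not re-interviewing anybody.

```python
def row_normalize(counts):
    """Turn a table of counts into transition probabilities (each row sums to 1)."""
    return [[c / sum(row) for c in row] for row in counts]


def stationary_by_iteration(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution by repeatedly applying P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from a uniform guess
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Made-up counts: rows = answer at interview, columns = answer at re-interview.
counts = [
    [90, 8, 2],
    [10, 80, 10],
    [5, 15, 80],
]

P = row_normalize(counts)
pi = stationary_by_iteration(P)
print([round(x, 3) for x in pi])
```

Change the counts and the answer changes with them, which is the sense in which the final guess rests on the guess of the transition probabilities.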

Is modeling immoral? Those stationary distributions are *deduced*, i.e. they follow no matter what, from the transition probabilities. They’re there whether they’re used or not.

One possibility is that the Census is supposed to be an enumeration—no models. And thank the Lord for that. Perhaps the director thought any introduction of models breached their mandate? Doubtful, since the document which gave the table above is filled with statistical models.

There’s even a model for the table above, which attempts to do exactly what Juan did, but using another classical test (“Index of consistency”). So I’m lost. Is this yet another frequentist panic, another instance of the Deadly Sin of Reification?

What do you think?