All Of Statistics: Part III

(B) New data

It might surprise you, but in classical (both frequentist and Bayesian) practice, if we expect to see new X, the procedure is almost always no different from the procedure when we expect no new data. That is, an M = Mθ is proposed, calculations are done, certain θ are set to 0, and Mθ’ is then said to describe X, finis. In the vast majority of statistical analyses, Mθ’ is just assumed true; discussion centers around the parameters, and uncertainty all but disappears.

Contrast this with the procedure physics, chemistry, or even mathematics usually follows. Some evidence E is used to propose a limited set of M, usually a historical M0 and one or more new theories, M1, M2, etc. These are all, as in classical statistical practice, assessed in light of the historical X. These M also sometimes have unobservable parameters (think of Planck’s constant, etc.) which are guessed using statistical methods. Discussion occurs over these parameters, but only when M has been verified (to some extent) by its “closeness” to historical X.

In many of the physical sciences, the analysis does not stop at discussing the models’ closeness to historical data, nor is the focus just on the parameters (usually). These sciences instead use the models to predict new data: these predictions will say that new X, given each M, will take certain values with such-and-such probability. It is usually the case that the probabilities of these new observations differ for each model (if they did not, the models could not be distinguished).

Time passes, new data is collected, and the models are assessed in light of the predictions which were made. The models are then ordered by how well they predicted this new data. “How well” is a subjective measure: it can and does differ, meaning that models might be useful to some but not to others. Verification can be done formally, as in statistics, by calculating the probability each model in the set is true, in light of the new X, old X, and the given E. But usually, this ordering is done informally (this informality does not invalidate the findings; when I opened I claimed not all probabilities can—nor should—be quantified).
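The formal version of this verification can be sketched in a few lines. This is only an illustration under stated assumptions: the two models M0 and M1, their predicted probabilities, the equal starting weights, and the function name `model_probabilities` are all hypothetical, and the new observations are treated as independent given each model. The set of models is taken as fixed by E, so the probabilities are relative to that set only.

```python
from math import prod


def model_probabilities(priors, predictions, new_data):
    """Pr(M | new X, old X, E) for each model in a fixed, assumed set.

    priors      -- dict: model name -> Pr(M | old X, E)
    predictions -- dict: model name -> dict of outcome -> predicted probability
    new_data    -- list of outcomes actually observed
    """
    # Weight each model by how probable it said the new observations were.
    weights = {
        m: priors[m] * prod(predictions[m][x] for x in new_data)
        for m in priors
    }
    total = sum(weights.values())  # zero only if every model ruled out the data
    return {m: w / total for m, w in weights.items()}


# Hypothetical numbers, invented purely for illustration.
priors = {"M0": 0.5, "M1": 0.5}
predictions = {
    "M0": {"low": 0.6, "mid": 0.3, "high": 0.1},
    "M1": {"low": 0.2, "mid": 0.3, "high": 0.5},
}
new_data = ["high", "mid", "high", "low"]

posterior = model_probabilities(priors, predictions, new_data)
ranking = sorted(posterior, key=posterior.get, reverse=True)
print(posterior, ranking)
```

With these made-up numbers M1 comes out ahead, because it gave more probability to what was later seen. Note that neither model is "proved"; the calculation only orders the models that were proposed.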

These new models are not always accepted; often they are rejected (even mathematical proofs are sometimes found to have flaws). Perhaps newer still models arise from the ashes of these rejects, but these phoenixes are subject to the same pitiless confirmation process. This procedure has worked out rather well for these fields (excepting climatology, for its lack of verified forecasts). We are not certain sure each physical model is true, but most of them are very probably sure.

Now consider the so-called softer sciences like sociology, where the situation is markedly different; classical statistical procedure (both frequentist and Bayesian) is used as if no new data were expected, as explained. Because the models are never made to issue predictions which are then tested, the models proposed by individuals are taken as true. The data is used, at best, to say something about the unobservable parameters of M. Over-certainty abounds.

The conjectures in these fields are rarely put to the test of verification. When new data is anticipated or is collected, the statistical procedure begins anew, as if the old data did not exist. The form of the model is the same, and discussion again centers on parameters. Worst of all, the certainty that is felt to lie in the parameters is said to lie in any new data that is expected. If new data is sought it is often collected only to confirm the M. This search is usually rewarded, not necessarily because the M assumed true are true, but more because of the wisdom in the saying, “Seek and ye shall find.” Confirmation bias creeps in and sticks to everything.

Contrast again the situation in the physical sciences. New data is sought that will confirm the M, but also sought is data that would disconfirm or invalidate the M. I need only say the words “cold fusion” to show how rigorous and routine this process is. This search does not happen, or happens rarely, in the soft sciences: people there are comfortable sticking with their preconceptions. Because they expose their models to new data, the physical sciences are usually (a word which implies “but not always”) trustworthy: ships float, cameras take pictures, lasers cut, and so on. The soft sciences do not have such a fund of success to point to.

The one area of statistics in which future data is considered is time series, where it is acknowledged from the start that X is part of a stream of data. Unfortunately, the procedure differs little from ordinary statistics except that it is acknowledged that the models belong to a more limited class than in ordinary statistics. Discussion still centers on (and ends with) the parameters (see this post for what can happen). The models can be, and are to a greater extent than usual, put to the test, but still not often. The models are just assumed true, the parameters are said to be “it.”

All statistical procedure should be seen as “time series,” at least when new data is expected, with old and new data treated the way the physical sciences treat them. Models should be put to rigorous, unforgiving tests of validation. Except when absolutely necessary (which will be rare times indeed), discussion should move away from parameters and focus on uncertainty of actual observables (or testable conjectures). This is the only way to eliminate over-certainty.
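To see why certainty about a parameter is not certainty about an observable, here is a rough sketch. It assumes a simple normal model, which the discussion above does not specify, uses the large-sample multiplier 1.96 in place of an exact quantile, and runs on invented data; the function name `interval_widths` is hypothetical.

```python
from math import sqrt
from statistics import stdev


def interval_widths(x, z=1.96):
    """Approximate 95% interval widths under an assumed normal model.

    Returns (width for the unobservable mean, width for one new observation).
    """
    n = len(x)
    s = stdev(x)  # sample standard deviation of the old data
    param_width = 2 * z * s / sqrt(n)         # uncertainty in the parameter
    pred_width = 2 * z * s * sqrt(1 + 1 / n)  # uncertainty in a new observable
    return param_width, pred_width


# Invented "old X", for illustration only.
old_x = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5]
param_width, pred_width = interval_widths(old_x)
print(param_width, pred_width, pred_width / param_width)
```

Under these assumptions the interval for a new observation is sqrt(n + 1) times as wide as the interval for the mean, so with eight data points it is three times as wide. More data shrinks the parameter interval toward zero, but the interval for the observable never drops below about 2 × 1.96 × s. Reporting only the parameter interval is one concrete route to over-certainty.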


  1. Can I just say I enjoyed this series? The fact that it has generated only ~4 comments in total, while the next post has over 100, suggests there is a need to get this kind of stuff into a more popular format.

    I should add that there is some good stuff going around (the Newton Institute @ Cambridge stuff last year is a case in point). It rather undermines the argument that serious scientists don’t worry about uncertainty and the limits of models; they are paying it all some attention.
