# What Statistics Really Is: Part I

Warning number two: This is not an incremental change in demonstration, but a fundamental rethinking. So slow down.

All of statistics is this: the art of showing the probabilities of propositions p with respect to evidence q (also propositions). If you’re a fan of equations, and there’s no reason you should be, it’s this:

Pr(p | q)

Once you’ve memorized that, you’ll have mastered most of what you need. The remainder is in this tidbit: the reason equations are not always to be loved is that statistics is *showing* the probabilities, not *quantifying* the probabilities. This is essential. Not all probabilities are quantifiable; perhaps most are not. But we’re sure good at fooling ourselves into thinking they are. This is where the plague of over-certainty begins. Equations give the idea that *numbers are happening here* and this usually isn’t true.

*Usually* isn’t true. Consider q = “Most tweets about politics are querulous” and p = “Tweet 430838597776715778 (an actual tweet about politics, which you haven’t yet seen) is querulous”. The probability of p given this q *and only this* q is not quantifiable, except in the weak sense that “most” means “more than half but not all”. Thus the probability is not a unique number, but the interval greater than 0.5 and less than 1. This “weak sense” interpretation of q is, if it was not obvious, *part* of q, the baggage that *all* propositions possess, baggage which includes our understanding of the words and grammar, the context, and so forth (as is true in *any* logical argument).
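To see why the answer is an interval and not a number, here is a minimal Python sketch. It assumes something q does not say: that there is a finite collection of `n` political tweets (the function name `most_interval` and the choice `n = 10` are made up for illustration). Reading “most” as “more than half but not all”, every admissible count gives a different probability, and q alone cannot pick among them.

```python
from fractions import Fraction

def most_interval(n):
    """Probabilities k/n compatible with 'most, but not all' of n items being querulous.

    Assumption (not in q): the population is finite with n members.
    """
    # 'Most' is read as: strictly more than half, strictly fewer than all.
    counts = [k for k in range(n + 1) if 2 * k > n and k < n]
    return [Fraction(k, n) for k in counts]

probs = most_interval(10)
print(min(probs), max(probs))  # prints: 3/5 9/10
```

Whatever `n` is, every value lands strictly between 1/2 and 1, which is all that q licenses.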

Now in assessing the probability of p using *this* q you can (this is such a simple rule, but one which can’t be remembered) *only use q*. Of course, if you’re a maverick and want to *tell* people you used q but actually use some q’, well, you’re at least in good (or large) company. Example: q’ is where you plug the tweet into Twitter, learn about p, then judge p with respect to this different knowledge. That’s cheating *if* you claim to have relied on q alone. But if your intent was to issue the probability of p given q’, well and fine.

“But this *isn’t* statistics,” I hear you saying, “Where’s the data? Where’s the model?”

Ah. Data. My dears, q *is* data, or, more precisely, a datum. You’re not used to seeing data like this, but data it is. Data with which you’re familiar are propositions, too. “A man, then a woman, then two men, and they were 34, 42, 36 and 58 years old, and …” Any statement of observations is a proposition: usually complex, compound, difficult, strung-out propositions, but propositions nonetheless. Prove this to yourself before continuing.

The model? It was there, standing and waving for attention. You didn’t notice it because it had dropped its bangles and adjustable jewelry, i.e., its parameters. What is a model? Well, the mechanism to assign probabilities to propositions p given evidence q. Here that mechanism was a direct deduction from premises, our q, to the probability that p is true. Given (only) q there were no other possibilities for the probability of p. Deductions are the cleanest and best kind of models. They are clean in the sense that no *ad hoc* evidence was added to q, as it nearly always is, and as it certainly is if parameters are present.

In typical practice, the evidence, or premises, q are extended by pasting “I believes”, such as “I believe p is represented by a normal distribution with these two parameters.” Beliefs are nice and sometimes they can even be true but ardency is rarely a guide; tradition might be helpful, but not in fields with a history of screwy ideas (sociology, psychology, education, etc., etc.). The “I believes” are the models.

Now no matter *how* or *where* q originated, the probability of p with respect to the (augmented) q is (barring calculation mistakes) the probability of p with respect to q. Meaning, the probability is correct: it’s the right answer conditional on *assuming* q is true.
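A small Python sketch may make the point concrete. Everything specific in it is hypothetical: the proposition p = “the next age exceeds 50”, the helper name `prob_exceeds`, and both sets of parameter values. It only illustrates that once an “I believe it’s normal with these two parameters” is pasted onto q, the probability of p follows mechanically, and a different “I believe” gives a different (but equally correct, conditional on its own q) answer.

```python
from statistics import NormalDist

def prob_exceeds(threshold, mu, sigma):
    """Pr(p | augmented q), where p = 'the value exceeds threshold' and the
    augmented q includes 'I believe: normal with mean mu and std dev sigma'."""
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)

# Two different "I believes" (made-up parameters), same proposition p.
believed = prob_exceeds(50, mu=42, sigma=10)
other = prob_exceeds(50, mu=50, sigma=10)
print(round(believed, 3), round(other, 3))
```

Neither number says anything about whether the pasted-on belief is itself true; that is the separate question.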

It is a *separate* question whether q itself is true or likely true. And the only way to tell that is to condition q on other evidence which is not p or q; that is, q becomes ‘p’ in a new calculation. If we had conclusive evidence of q’s truth, then we’d be able to deduce the model, as in the example above. If we knew q was false, we could still calculate the probability of p, but why bother?

Indeed, why bother. For that, read Part II.

It’s too bad we’re never told what the priors mean outside of simple games of chance (or other frequentist set-ups–open to anyone to use), nor how we get them, nor why we’d want to use them to ascertain how well probed statistical and scientific hypotheses are. I am also hearing these days that Bayesians have largely restricted themselves to reporting ratios, not posteriors (do you agree?). So no measure of warrant there either. And if you accept Bayesian logical accounts, are you also prepared to accept the “confirmation” of irrelevant disjuncts as in the following discussion? http://errorstatistics.com/2013/10/19/bayesian-confirmation-philosophy-and-the-tacking-paradox-in-i/

Mayo,

You’ll get no love of “priors” from me, though I admit I used to have a crush on them back in my “I’m a Bayesian and proud of it!” days. Parameters, it will turn out, are limits of functions of observables, and therefore priors are usually just as arbitrary as many suspect (though this does not mean they are always wrong).

Bayesians love Bayes factors, as you say. I do not. They are a surrender. Q: “Mr Statistician, what’s the probability H is true?” A: “I won’t tell you, but I can give you a Bayes factor.”

Just as a hint to my first contention, ever notice those paradoxes always creep in at the limit, when fooling around with continuity and infinities? Many (most? all? I haven’t read yours yet) do not exist in the discrete, finite “real” world. Nothing wrong with the continuum (why, some of my favorite numbers are irrational) but the way you approach it matters.

The paradox is for those who claim x is evidence for H so long as the posterior probability of H exceeds that of x.

If x Bayesian confirms H, then x Bayesian confirms (H & J), where P(x|H & J) = P(x|H) for any irrelevant conjunct J that is consistent with H.

J is an irrelevant conjunct to H, with respect to x, just in case P(x|H) = P(x|J & H).

For instance, x might be radioastronomic data in support of:

H: the deflection of light effect (due to gravity) is as stipulated in the General Theory of Relativity (GTR), 1.75″ at the limb of the sun.

and the irrelevant conjunct:

J: the radioactivity of the Fukushima water being dumped in the Pacific ocean is within acceptable levels.

The reasoning is as follows:

P(x|H) / P(x) > 1 (x Bayesian confirms H)

P(x|H & J) = P(x|H) (given)

So P(x|H & J) / P(x) > 1

Therefore x Bayesian confirms (H & J)
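
The steps above can be checked on a toy joint distribution. This is only a sketch: all the numbers are invented, and J is assumed fully independent of both H and x, which is stronger than the stated condition P(x|H & J) = P(x|H) but guarantees it.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
p_H = 0.3                                # Pr(H)
p_J = 0.5                                # Pr(J); J independent of H and x
p_x_given_H = {True: 0.8, False: 0.2}    # Pr(x | H), Pr(x | not H)

# Joint distribution over (H, J, x) as a dict of world -> probability.
joint = {}
for h, j, x in product([True, False], repeat=3):
    ph = p_H if h else 1 - p_H
    pj = p_J if j else 1 - p_J
    px = p_x_given_H[h] if x else 1 - p_x_given_H[h]
    joint[(h, j, x)] = ph * pj * px

def prob(event):
    """Total probability of the worlds where event(h, j, x) holds."""
    return sum(p for world, p in joint.items() if event(*world))

def cond(event, given):
    """Conditional probability Pr(event | given)."""
    return prob(lambda h, j, x: event(h, j, x) and given(h, j, x)) / prob(given)

p_x = prob(lambda h, j, x: x)
ratio_H = cond(lambda h, j, x: x, lambda h, j, x: h) / p_x
ratio_HJ = cond(lambda h, j, x: x, lambda h, j, x: h and j) / p_x
print(ratio_H > 1, abs(ratio_H - ratio_HJ) < 1e-12)
```

With these numbers both ratios exceed 1 and agree, so on the ratio criterion x “confirms” the conjunction exactly as much as it confirms H alone.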

Mayo,

Yeah, I don’t buy the “paradox” part. I’ll answer in a post tomorrow.