## William M. Briggs

### Statistician to the Stars!


The Mandarin word for bear, if not given the proper tone, sounds just like breast. Be careful at the zoo saying “What a pretty bear!”

Regular readers will recall there are two main kinds of bad statistics. First is when the technique has been done wrong or is misapplied. Errors of this kind comprise only half of all mistakes. The second, and more subtly nefarious, and just as pervasive, is where researchers announce they have used “science” to “discover” that which everybody already knew was true.

Nefarious because it strengthens or inculcates the bizarre and horrible fallacy that true knowledge can only come from science. That is, bad statistics of the second kind boosts scientism and makes scidolators of us all.

Our latest entry is Sarah Gervais, Arianne Holland, and the (given he has two female co-authors, presumably slavering) Michael D. Dodd in their peer-reviewed paper “My Eyes Are Up Here: The Nature of the Objectifying Gaze Toward Women” in the aptly named journal Sex Roles.

Here is the blockbuster opening sentence of the Abstract. Pay attention:

Although objectification theory suggests that women frequently experience the objectifying gaze with many adverse consequences, there is scant research examining the nature and causes of the objectifying gaze for perceivers.

Everything that can go wrong already has, which must set a mark or goal for other researchers to follow. Objectification theory? As the modern aphorism in the right-hand sidebar to this webpage indicates, “The love of theory is the root of all evil.” Only an academic could be puzzled enough by men looking at women lovingly and in lust to create a theory of such behavior.

And then comes the “adverse consequences.” Like marriage? The joy, the bliss, the beautiful heartbreak from raising families? I can confess to you, my dear readers, that I first gave a serious eye to the female to whom I eventually plighted my troth. Of course there are also brutes and cads and construction-worker fashion critics, the men who, when they transgress, should be instructed by gentlemen. But don’t forget those who gaze in rapturous silence. The mating process is imperfect. Human beings outside the academy understand this.

Our trio, relying on theory which comes before observations, pretend to believe two things which are blatantly false. First, that nobody knows men actually look at women in practice and that “data” is needed to confirm the theory. And second, that a theory is needed to explain this.

There is little point to surveying the “study” they did, but in brief, they used Photoshop to doctor the pictures of women to represent “cultural ideals of feminine attractiveness to varying degrees”. Now one wonders from where did they derive these cultural ideals except through the observations which they say have not yet happened? Never mind. Here are the body types:

high ideal (i.e., hourglass-shaped women with large breasts and small waist-to-hip ratios), average ideal (with average breasts and average waist-to-hip ratios), and low ideal (i.e., with small breasts and large waist-to-hip ratios).

Lo! Men preferred the hourglasses. A wee p-value confirmed this “finding”, or “discovery”, if you prefer. That was the “main hypothesis.” Hypothesis forsooth!

And there were secondary “findings.” They “found that participants focused on women’s chests and waists more and faces less when they were appearance-focused (vs. personality-focused).” In other words, men gave the bodies of the pictures on the computer screen the once over before taking a gander at the faces. Who could have guessed? Well, everybody.

The researchers also were shocked—shocked!—to learn that women acted the same as men and that women were (to coin a word) judgmental. Golly.

But enough. Because we are now at the last sentence of the abstract, where all the errors above are compounded and multiplied. “Implications for objectification and person perception theories are discussed.” Person perception theories? This at least explains what academics do with their plentiful free time. They make up stuff to study.

10 points for whoever can spot the mistake in this formula. (Yes, there is one).

I claimed, and it is true, that all statistical problems could be written $\Pr(p|q)$, where p is a proposition of interest and q is our evidence, or premises, or data, or data-plus-model, whatever you like to call it. Recall q is a compound proposition, including the data and whatever other knowledge we assume or possess.

I also claimed that q often contains “I believes”, in the form of “I believe the uncertainty in p is represented by this parameterized probability ‘distribution’.” Regardless of whether these beliefs are true, as long as there are no calculation errors, $\Pr(p|q)$ is the true probability—because it assumes q is true, but does not seek to prove it. This is no small distinction; it must be kept continuously in mind or mistakes will be (and are) made. (More on this in the last Part.)

So let’s separate the “I believes” from q and call them m (for “models”). Thus we have $\Pr(p|qm)$ where q is as before sans the arbitrary model. Now, we don’t always need models. The example I showed last time didn’t need one. Here is another where a model is not needed. Example: p = “At least 7 4’s will show in the next n throws” and q = “We have a k-sided object (where k is at least 4) which when tossed must show only one side, with sides labeled 1, 2, …, k.” We deduce the probability of p directly (it is binomial).1
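For concreteness, here is a sketch of that deduction in Python. The numbers n = 20 throws and k = 6 sides are illustrative choices of mine, not from the text: the probability of at least 7 fours is just the binomial tail with per-throw probability 1/k.

```python
from math import comb

def pr_at_least(j, n, k):
    """Pr(at least j 4's in n throws of a k-sided object),
    deduced directly: each throw shows the 4 with probability 1/k."""
    p = 1 / k
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(j, n + 1))

# Illustrative numbers: at least 7 fours in 20 throws of a six-sided object.
print(pr_at_least(7, 20, 6))
```

No model was assumed beyond q itself; the probability falls out of the premises.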

It turns out, at least in theory, that we can always deduce probabilities when p and q speak of finite, discrete things, which are really all the things of interest to civilians.2 Mathematicians, statisticians, and the odd physicist, however, insist on stretching things to limits to invoke continuity. Noble tasks, worthy goals; the only real mistake these folks make in pursuing them is anxiousness. Because the “I believes” are usually stated in the continuous, infinite forms as if given to us from on high and are not themselves deduced or inferred from the evidence on hand. And—as one of my favorite jokes has it—that’s when the fight started.3

The m’s, the “I believes”, are the cause of (rightful) contention between the two main sects of statisticians, the frequentists and Bayesians. Give you an example: p = “Tomorrow’s high temperature will be 72F”; q is any sort of data we have on the subject, and m = “The uncertainty in p is characterized by a normal distribution with parameters a and b.” The parameters of this model, as they are in most, are themselves continuous and unobservable; well, they are just fictions necessary to compute the probability of p.

Which in this case is 0 regardless of the value of a and b. That’s because a normal distribution, like all continuous distributions, gives 0 probability to all single observables. (Don’t forget this probability is true assuming q and m.) This is why we can’t ask normal questions of normals (a pun!). You can see this is the point where adherence to a lovely theory can screw with reality. Anyway, if we want to use continuous distributions we must change our propositions so that they become answerable: let p = “Tomorrow’s high temperature will be greater than 72F”. This will have some non-zero value no matter what a and b are.
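A minimal sketch of the repaired question, assuming illustrative values a = 70 and b = 5 (nothing in q fixes them, which is exactly the next problem): under a normal model the probability of hitting 72F exactly is zero, but the probability of exceeding 72F is the normal survival function, computable from the error function.

```python
from math import erf, sqrt

def pr_greater(x, a, b):
    """Pr(temperature > x | normal model with mean a, sd b):
    the survival function of the normal, via the error function."""
    return 0.5 * (1 - erf((x - a) / (b * sqrt(2))))

# Pr of exactly 72F is 0 under any normal; Pr(> 72F) is not.
print(pr_greater(72, 70, 5))
```

Any a and b give a nonzero answer; which a and b is the fight picked below.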

And just what are a and b? Nobody knows. There is no evidence in q or m to tell us. But since knowing what they are is absolutely necessary to solve the problem, we have to make some evidence up. Bayesians start talking about “flat” or “non-informative” or “improper” priors; some like to say “maximum entropy!” (the exclamation mark is always there). This move baffles the frequentists who say, and say truly, “You’re just making it up! How do you know it’s the right answer for this problem?” The Bayesian demurs and starts discussing “objectivity” and so forth, all different names for the same maneuver he just pulled.

So the frequentists go their own way and say, “I don’t know a or b either, so I’ll just guess them using one of several functions, or test their values against this null hypothesis.” Now it’s the Bayesians’ turn to demand accountability. “But you have no idea if your guesses are right in this problem! And, anyway, nobody in the world believes your so-called null hypothesis.” The frequentists retort, “Well maybe we don’t know if the guesses are right in this instance, but they will be if we do problems exactly like this an infinite number of times. And nobody ever believes null hypotheses, sort of.”

The steaming opponents—who, you will have noticed, ignore that both made up m out of whole cloth—leave the field of battle and head back to their encampments to produce their guesses which—surprise!—are usually not that different from each other’s. This is partly because all or almost all statisticians start as frequentists and only see the light later, so everybody uses the same kind of math, and partly because there’s usually a lot of good, meaty knowledge in q to keep people from going too far astray.

But the criticisms of both are right: from the arbitrariness of the m to the arbitrary guesses of the parameters, there’s a lot of mystery. Both sides are guessing and don’t like to say so.

The alternative? Restate the problem in discrete, finite terms and then use q to deduce the probabilities in p—if they even exist as single numbers, which most times they don’t. For most applications this would be enough. For instance, do we really care about 72F in particular? Maybe the temperature at the levels (‘below 60’, ‘between 60 and 70’, ‘between 70 and 75’, ‘above 75’) is all we really care about. After all, we can’t make an infinite number of decisions based on what the temperature might be, only a finite number. This move gives us only four categories, some good observations in q, and we won’t be adding anything arbitrary. Everything is deduced starting with premises that make sense to us, and not to some textbook.
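A sketch of the move, with made-up past highs standing in for the observations in q (the data, the bin edges, and the reading of each boundary are all assumptions of mine for illustration): count how often the high landed in each decision-relevant category.

```python
# Hypothetical past daily highs (F); real q would hold real observations.
highs = [58, 63, 66, 69, 71, 72, 73, 74, 74, 76, 77, 81]

# The four decision-relevant categories; the boundary conventions are mine.
bins = [("below 60",          lambda t: t < 60),
        ("between 60 and 70", lambda t: 60 <= t < 70),
        ("between 70 and 75", lambda t: 70 <= t < 75),
        ("above 75",          lambda t: t >= 75)]

n = len(highs)
for label, in_bin in bins:
    count = sum(in_bin(t) for t in highs)
    print(f"{label}: {count}/{n}")
```

Four categories, plain counts, nothing continuous or infinite smuggled in.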

Well, this works. And if we really are enthusiastic, we work out all the math and then, and only then, take things to the limit and ask what would happen.

See this poorly written paper for an example of the typical “unknown probability of success”.

Next, and last, time: how do we learn about q?

———————————————————

1I’m not going to prove it here, but we don’t need information about “uniformity”, “symmetry”, “priors” or any of that stuff. See the statistics and probability philosophy papers for more details. Just believe it for now.

2I’m not proving this here either, but if you disagree I challenge you to state a measure of interest not of the categories listed above that isn’t discrete and finite.

3My wife and I were out to eat and there was a drunk at the next table. My wife said, “That’s the guy I used to date before we were married. He started drinking the day we broke up and hasn’t stopped since.” “My God,” I said, “Who would’ve thought a guy could go on celebrating for that long!” And that’s when the fight started.

One dox, two dox, a pair of…

We’re taking a small digression to answer a question put by Deborah Mayo in Part I, pointing to this article on her site. Mayo should be on everybody’s list because she has good critiques of orthodox Bayesian statistics (which I don’t follow; we’re logical probabilists here), and because many named persons in statistics comment on her articles. The material below is worth struggling through to see the kinds of arguments which exist over foundations.

Loosely quoting Mayo, a hypothesis (proposition) h is confirmed by x (another proposition) if $\Pr(h|xd) > \Pr(h|d)$ where d is any other proposition (this will make sense in the example to come). The proposition is disconfirmed if $\Pr(h|xd) < \Pr(h|d)$. If $\Pr(h|xd) = \Pr(h|d)$ then x is irrelevant to h. Lastly, h’ means “h is false,” “not h,” or the complement of h.

Mayo (I change her notation ever-so-slightly) says “a hypothesis h can be confirmed by x, while h’ disconfirmed by x, and yet $\Pr(h|xd) < \Pr(h'|xd)$. In other words, we can have $\Pr(h|xd) > \Pr(h|d)$ and $\Pr(h'|xd) < \Pr(h'|d)$ and yet $\Pr(h|xd) < \Pr(h'|xd).$” In support of this contention, she gives an example due to Popper (again changing the notation) about dice throws. First let d = “a six-sided object which will be tossed and only one side can show and with sides labeled 1, 2, …”, i.e. the standard evidence we have about dice.

Consider the next toss with a homogeneous die.

h: 6 will turn up

h’: 6 will not turn up

x: an even number will turn up.

$\Pr(h|d) = 1/6, \Pr(h'|d) = 5/6, \Pr(x|d) = 1/2.$

The probability of h is raised by information x, while h’ is undermined by x. (Its probability goes from 5/6 to 4/6.) If we identify probability with degree of confirmation, x confirms h and disconfirms h’ (i.e., $\Pr(h|xd) > \Pr(h|d)$ and $\Pr(h'|xd) < \Pr(h'|d)$). Yet because $\Pr(h|xd) < \Pr(h'|xd)$, h is less well confirmed given x than is h’. (This happens because $\Pr(h|d)$ is sufficiently low.) So $\Pr(h|xd)$ cannot just be identified with the degree of confirmation that x affords h.

I don’t agree with Popper (as usual). Because $\Pr(h|d) = 1/6 < \Pr(h|xd) = 2/6$ and $\Pr(h'|d) = 5/6 > \Pr(h'|xd) = 4/6$. In other words, we started believing in h to the tune of 1/6, but after assuming (or being told) x, then h becomes twice as likely. And we start by believing h’ to the tune of 5/6, but after assuming x, this decreases to 4/6, or 20% lower. Yes, it is still true that h’ given x and d is more likely than h, but so what? We just said (in x) that we saw a 2 or 4 or 6: h’ is two of these and h is only one.
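The arithmetic is easy to check. A sketch using exact fractions, treating d as six equipossible sides and conditioning by counting cases (the set encoding is my own device, not anything in Popper or Mayo):

```python
from fractions import Fraction

# d: six equipossible sides; h: "6 shows"; x: "an even number shows".
sides = {1, 2, 3, 4, 5, 6}
h = {6}
not_h = sides - h          # h'
x = {2, 4, 6}

def pr(event, given):
    """Pr(event | given), by counting equipossible cases."""
    return Fraction(len(event & given), len(given))

assert pr(h, sides) == Fraction(1, 6)        # Pr(h|d)
assert pr(h, x) == Fraction(2, 6)            # Pr(h|xd): x confirms h
assert pr(not_h, sides) == Fraction(5, 6)    # Pr(h'|d)
assert pr(not_h, x) == Fraction(4, 6)        # Pr(h'|xd): x disconfirms h'
assert pr(h, x) < pr(not_h, x)               # yet h' stays the more likely
print("all four claims check")
```

Both things are true at once, and no contradiction follows; only the word “confirm” was doing double duty.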

“Does x (in the presence of d) confirm h?” is a separate question from “Which (in the presence of x and d) is the more likely, h or h’?” The addition of x to d “confirms” h in the sense that h, given the new information, is now more likely.

No problems so far, n’est-ce pas? And Mayo recognizes this in quoting Carnap who noted “to confirm” is ambiguous. It can mean (these are my words) “increases the probability of” or it might mean “making it more likely than any other.” Well, whichever. Neither is a difficulty for probability, which flows perfectly along its course. The problems here are the ambiguities of language and labels, not with logic.

No real disagreements yet. Enter the so-called “paradox of irrelevant conjunctions.” Idea is if x “confirms” h, then x should also “confirm” hp, where p is some other proposition (hp reads “h & p”). There are limits: if p = h’, then hp is always false, no matter which x you pick. Ignore these. As before we can say p is irrelevant to x if $\Pr(x|hd) = \Pr(x|hpd)$. Continuing the example, let p = “My hat is a fedora”; then $\Pr(x|hd) = 1$ and so is $\Pr(x|hpd) = 1$.

The next step in the “paradox” is to note that if x “confirms” h in the first sense above, then $\Pr(x|hd)/\Pr(x|d) > 1$. In our example, this is 1/(1/2) = 2, which is indeed greater than 1. So we’re okay. Now we assume p is irrelevant, so $\Pr(x|hpd) = \Pr(x|hd)$. Divide this by $\Pr(x|d)$, then because $\Pr(x|hd)/\Pr(x|d) > 1$ so too does $\Pr(x|hpd)/\Pr(x|d) > 1$. Ho hum so far; just some manipulation of symbols.

Then it is claimed that x, since it “confirmed” h, must also “confirm” hp. Well, this is so. Then Mayo says (still with my notation):

(2) Entailment condition: If x confirms T, and T entails p, then x confirms p.

In particular, if x confirms (hp), then x confirms p.

(3) From (1) and (2), if x confirms h, then x confirms p for any irrelevant p consistent with h.

(Assume neither h nor p have probabilities 0 or 1).

It follows that if x confirms any h, then x confirms any p.

That’s the “paradox.” I don’t buy it. Like most (all?) paradoxes, there was a trip up in evidence along the way.

In our example, in (2), h does not entail p, but hp does entail p. What does entail mean? Well, $\Pr(p|hp) = 1$. The paradox says x confirms p just because hp entails p. Not a chance.

What’s happened here is the conditioning information, which is absolutely required to compute any probability, got lost in the words. We went from “x and hp” to “x and p”, which is a mistake. Here’s the proof.

If x confirms h, then $\Pr(h|xd) > \Pr(h|d)$ (using the weaker sense of “confirmed”). Because p is irrelevant to h and x, then $\Pr(x|pd) = \Pr(x|d)$ and $\Pr(h|pd) = \Pr(h|d)$ and $\Pr(x|hpd) = \Pr(x|hd)$. But if p is confirmed by x, then it must be that $\Pr(p|xd) > \Pr(p|d)$. But $\Pr(p|d)$ doesn’t exist: it has no probability. Neither does $\Pr(p|xd)$ exist.1 What does wearing a hat or not have to do with dice? Nothing. You can’t get there from here. This is a consequence of p’s irrelevancy.

So p can’t be confirmed by x in the usual way. What if we add h to the mix, insisting $\Pr(p|xhd) > \Pr(p|hd)$? Not much, because again neither of those probabilities exist. You can’t have inequalities with non-existent quantities. And when we “tack on” irrelevant p, we’re always asking questions about $\Pr(hp|xd)$ or $\Pr(hp|d)$ and not $\Pr(p|xd)$ or $\Pr(p|d)$.

Result? No paradox, only some confusion over the words. Probability as logic remains unscathed. If anybody thinks the paradox remains, she should try her hand at stating the paradox purely using the probability symbols and not the mixture of words and symbols. The exercise will be instructive.

See the necessary comment by Jonathan D and my reply. Looks like JD found the mistake actually starts earlier in the problem.

————————————————————–

1Thinking every probability has a unique number is a mistake subjectivists make. They’ll say “Well I believe $\Pr(p|d) = 0.14779$” or whatever, but what they really have done is inserted information and withheld it from the formula, i.e. when they make statements like that they’re really saying $\Pr(p|qd) = 0.14779$ for some mysterious q that forms their belief. Given q that probability might even be right, but $\Pr(p|qd)$ just is not $\Pr(p|d)$. Still no paradox.

All she wrote.

It’s so simple that you’ll think I’m kidding. Worse, the way I’ll show it is such a radical departure from the manner in which you’re used to thinking about probability and statistics that at first it won’t make any sense. You’ll say, “Sure, what he says is probably true for the kinds of examples he shows, but I’m still going with what I’m used to.”

Warning number two: This is not an incremental change in demonstration, but a fundamental rethinking. So slow down.

All of statistics is this: the art of showing the probabilities of propositions p with respect to evidence q (also propositions). If you’re a fan of equations, and there’s no reason you should be, it’s this:

$\Pr( p | q)$

Once you’ve memorized that, you’ll have mastered most of what you need. The remainder is in this tidbit: the reason equations are not always to be loved is because statistics is showing the probabilities, not quantifying the probabilities. This is essential. Not all, perhaps not even most, probabilities are quantifiable. But we’re sure good at fooling ourselves into thinking they are. This is where the plague of over-certainty begins. Equations give the idea that numbers are happening here and this usually isn’t true.

Usually isn’t true. Consider q = “Most tweets about politics are querulous” and p = “Tweet 430838597776715778 (an actual tweet about politics, which you haven’t yet seen) is querulous”. The probability of p given this q and only this q is not quantifiable, except in the weak sense that “most” means “more than half but not all”, thus the probability is not a unique number, but the interval greater than 0.5 and less than 1. This “weak sense” interpretation of q is, if it was not obvious, part of q, the baggage that all propositions possess, baggage which includes our understanding of the words and grammar, the context, and so forth (as is true in any logical argument).

Now in assessing the probability of p using this q you can—this is such a simple rule but one which can’t be remembered—only use q. Of course, if you’re a maverick and want to tell people you used q but actually use some q’, well, you’re at least in good (or large) company. Example: q’ is where you plug the tweet into Twitter, learn about p, then judge p with respect to this different knowledge. That’s cheating if you claim to have relied on q alone. But if your intent was to issue the probability of p given q’, well and fine.

“But this isn’t statistics,” I hear you saying, “Where’s the data? Where’s the model?”

Ah. Data. My dears, q is data, or, more precisely, a datum. You’re not used to seeing data like this, but data it is. Data with which you’re familiar are propositions, too. “A man, then a woman, then two men, and they were 34, 42, 36 and 58 years old, and …” Any statement of observations is a proposition: usually complex compound difficult strung-out propositions, but propositions nonetheless. Prove this to yourself before continuing.

The model? It was there, standing and waving for attention. You didn’t notice it because it dropped its bangles and adjustable jewelry, i.e., its parameters. What is a model? Well, the mechanism to assign probabilities of propositions p given evidence q. Here that mechanism was a direct deduction from premises, our q, to the probability that p was true. Given (only) q there were no other possibilities for the probability of p. Deductions are the cleanest and best kind of models. They are clean in the sense that no ad hoc evidence was added to q as it nearly always is—as it certainly is if parameters are present.

In typical practice, the evidence, or premises, q are extended by pasting “I believes”, such as “I believe p is represented by a normal distribution with these two parameters.” Beliefs are nice and sometimes they can even be true but ardency is rarely a guide; tradition might be helpful, but not in fields with a history of screwy ideas (sociology, psychology, education, etc., etc.). The “I believes” are the models.

Now no matter how or where q originated, the probability of p with respect to the (augmented) q is (barring calculation mistakes) the probability of p with respect to q. Meaning, the probability of p is correct; it’s the right answer conditional on assuming q is true.

It is a separate question whether q itself is true or likely true. And the only way to tell that is to condition q on other evidence which is not p or q; that is, q becomes ‘p’ in a new calculation. If we had conclusive evidence of q’s truth, then we’d be able to deduce the model, as in the example above. If we knew q was false, we could still calculate the probability of p, but why bother?

Indeed, why bother. For that, read Part II.