William M. Briggs

Statistician to the Stars!


The Great Roberts Ruse—Or Ruin?

The best take on that fateful Thursday came from Ed Morrissey, who quipped:

Now here is the truth: either Roberts is canny or he is a coward. Either he is a sly, patriotic Machiavellian or a frightened, loyalist turncoat. There is no third possibility.

Forget everything you read from any leftist chattering about “compromise,” “integrity,” “bipartisanship,” and “foregoing ideology.” These are code words which signal unconditional surrender to, and not compromise with, the progressive view.

Consider: just hours before Roberts revealed his enigma, we had Democratic Congresspeoples and lefty pundits thrusting towards microphones with dire warnings, “We better not see a 5-4 ruling driven by politics!” Uncoded, this meant, “They had better not vote against us, or else” and nothing more. For after witnessing the very 5-4 ruling they forecast would tear the Union asunder, they immediately suffered complete amnesia and were full of praise for Roberts’s Ruse.

If a ruse it was. The real question is into whose back did the Chief Justice slip his dagger? Charles Krauthammer, no slouch at fingering phonies, is certain sure Progressivism is now one of the walking dead. He called the Ruse “one of the great constitutional finesses of all time.” Krauthammer figured Roberts figured that—who exactly?—“we” did not want yet another “5 to 4 decision split along ideological lines that might be perceived as partisan and political.” The decision to re-write Obamacare as a tax to save lefties dentist bills that would have come from gnashing their teeth over an overturn, while simultaneously squashing the attempt to expand the Commerce Clause, “draws the line against the inexorable decades-old expansion of congressional power.”

George Will agrees. “Roberts got the court to embrace emphatic language rejecting the Commerce Clause rationale for penalizing the inactivity of not buying insurance”. Will imagines Roberts behind the Court’s arras watching New York Times’ reporters jigging on the Court steps while rubbing his hands together and saying quietly, “Heh, heh, heh.” If only Roberts had a moustache!

“This victory”—yes, victory—“will help revive a venerable tradition of America’s political culture, that of viewing congressional actions with a skeptical constitutional squint, searching for congruence with the Constitution’s architecture of enumerated powers.”

In favor of this favorable interpretation we have Roberts himself. He did win the right to write the majority opinion, because Roberts knew (we all did) that four of the Justices would have voted to uphold any law short of a Constitutional amendment declaring Obama President for life. He knew that three of the justices saw Obamacare for what it was and were poised to strike it down. He, like progressives chanting “compromise”, apparently forgot that not one Republican voted for the law.

But never mind that. The words which convinced Krauthammer and others of the Ruse were words like these:

This case concerns two powers…which must be read carefully to avoid creating a general federal authority akin to the police power…Every day individuals do not do an infinite number of things. Allowing Congress to justify federal regulation by pointing to the effect of inaction on commerce would bring countless decisions an individual could potentially make within the scope of federal regulation, and…empower Congress to make those decisions for him.

Under the Government’s logic, that authorizes Congress to use commerce power to compel citizens to act as Government would have them act…the Government’s logic would justify a mandatory purchase to solve almost any problem…The Framers gave Congress the power to regulate commerce, not to compel it…The Commerce Clause is not a general license to regulate an individual from cradle to grave…

These are strong, manly words, and sworn to not just by Roberts but by Sotomayor, Breyer, and Kagan, lefties all. (Ginsburg signed too, but then petulantly disavowed herself.) It surely appears as if the PBS crowd had just received a dramatic dressing down. Krauthammer and others thus believe that progressives, happy with the immediate benefit of taking over one-sixth (or whatever) of the economy, are also now chastened and will not attempt such boldness in the future.

John Yoo says bollocks to that. In today’s Wall Street Journal, he says that progressives had their fingers in their ears saying “Yeah, yeah, yeah” while receiving their lecture. No lessons learned here. Yoo likens Roberts to Republican Chief Justice Charles Evans Hughes who whimpered whenever then president FDR threw a tantrum. Against his conscience, but afeared of being disparaged in the press and concerned about the legacy of the Court, Hughes hewed to the New Deal line and legitimized that first great increase in government power. And now Roberts, sharing Hughes’s timid temperament, has done the same and “sacrificed fidelity to the Constitution’s original meaning in order to repel an attack on the court.”

And it’s even worse:

Given the advancing age of several of the justices, an Obama second term may see the appointment of up to three new Supreme Court members. A new, solidified liberal majority will easily discard Sebelius’s limits on the Commerce Clause and expand the taxing power even further.

Let us pray that Yoo, smart as he is, is wrong and that Krauthammer, just as sharp as Yoo but less gloomy, is right.

Why Do Statisticians Answer Silly Questions That No One Ever Asks?

Julian Champkin, editor of Significance magazine, somehow came across the percipient insights of yours truly and asked me to write l’article controversé. Which I did. And with gusto. Champkin, a perspicacious individual with the insight and experience of one long accustomed to the peculiarities and peccadillos of publishing, added the word “silly” to the title. I find myself not objecting.

The article is here. I beg forgiveness that reading the piece requires a subscription (yours or an institution’s).

Statisticians are, in actual fact, as an Englishman would put it, Significance being an organ of the Royal Statistical Society, in the bad habit of answering questions in which nobody has the slightest interest. More rottenness is put forth in the name of Science because of the twisted cogitations of statisticians than because of any other cause.

The problem is that the questions statisticians answer are not the questions civilians put to us. But the poor trusting saps who come to us, on seeing the diplomas on our walls and upon viewing the perplexing mathematics in which we couch our responses, go away intimidated and convinced that what we have told them are the answers to their queries. They can’t, then, be blamed for writing results as if they had received the One Final Word.

There are many reasons why we lead our flocks astray, but the main culprit is that we instill a sort of scientific cockiness. A civilian appears and asks, “How much more likely is drug B than drug A to cure this disease?” We do not answer this. We instead tell him which drug, in the opinion of our theory, is “better”, imputing a certainty to our pronouncement which is unwarranted.

We’re tired of these examples, but they are paradigmatic. It is through the wiles of statistics that sociologists can “conclude” that either watching a 4th of July parade or seeing, oh so briefly, a miniature picture of the American flag can turn one into a Republican.

The old ways of statistics allowed over-certainty in the face of small sample sizes. The new ways of doing statistics (now not always called statistics, but perhaps artificial intelligence, data mining, and machine learning) allow over-the-top surety in the face of large sample sizes, a.k.a. Big Data. The difference being that the latter methods are automated, while the former are hands-on. False beliefs can now be generated at a much faster rate, so some progress is being made.

If you followed last week’s “Let’s try this again” on temperatures, you’ll have an idea what I mean about over-certainty (incidentally, due to time constraints, I will not be able to answer questions posted there until tomorrow). Also click the Start Here tab at the top of this page and look under the various articles under Statistics.

Update: Posting date changed to allow more comments.

Teaching Journal: Day 11—Rewrite, Red Wine, Hat Clips

We started by learning that probability is hard and not always quantifiable. For instance, I imagine many of you would have judged it more likely than not that the Supreme Court would have invalidated at least the “mandate” portion of Obamacare. Clearly, many of us had the wrong premises.

Just as many of us have new incorrect premises about what the Roberts ruling means. As to that, follow my Twitter stream from yesterday (see the panel to the right), where I sequentially pull out what I think are relevant quotes from the ruling. However, this is a topic for another time.

I’ve taught this class at Cornell for several years. The in-class version is, naturally, quite different than the on-screen presentation, so don’t be misled by what you’ve seen here the past fortnight. After many (but not infinite) repetitions teaching, I have learned three things.

Lesson One: try not to do too much. The book/class notes were originally designed for a semester-length course for undergraduates. They work for that, but not as well for a class meant to teach fundamentals of analysis. For instance, I could leave out all the stuff on counting and how to build a binomial distribution. Basic probability rules might stay, because they’re easy and useful. But to even learn to compute the chance of winning the (say) Mega Millions requires learning some basic combinatorics. Can’t have it all, though.
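For the curious, the lottery counting can be sketched in a few lines of Python. The game format here is an assumption, not something taken from the class notes: five white balls drawn from 56 plus one Mega Ball drawn from 46, and the function name is made up for illustration. Change the defaults if the rules you care about differ.

```python
import math


def jackpot_tickets(white_pool=56, white_picks=5, mega_pool=46):
    """Count the equally likely tickets under the assumed game format."""
    # Ways to choose the white balls (order ignored) times the Mega Ball choices.
    return math.comb(white_pool, white_picks) * mega_pool


tickets = jackpot_tickets()   # 175,711,536 under the assumed format
chance = 1 / tickets          # chance that a single ticket hits the jackpot
```

The single call to `math.comb` is the “basic combinatorics” in question; the rest is multiplication.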

So I think it better to stop after Bayes’s rule and present everything else as a “given”, like I do with the ubiquitous, and ubiquitously inappropriate, normal distribution. This would allow more time to discuss ideas instead of mechanics. More time to cover how things go wrong, and why there is so much over-certainty produced using classical statistical methods.

This means a re-write of the book/notes is in order. Which I have been doing slowly, but now must finish to get it ready in time for next year.

Lesson Two: red wine does not go with white linen.

Lesson Three: there are churches left that still have hat clips in the backs of pews. I had tremendous fun snapping these during mass at Divine Child when I was a kid, usually during homilies. Just for the sake of the good old days, I did so last weekend. But only once. It might have been twice.

Teaching Journal: Day 9—Hypothesis Testing: Part II

A review. We have sales data from two campaigns, A and B, data in which we choose (as a premise) to quantify our uncertainty with normal distributions. We assume the “null” hypothesis that the parameters of these two distributions are equal: mA = mB and sA = sB. This says that our uncertainty in sales at A or B is identical. It does not say that A and B are “the same” or “there is no difference” in A and B.

All that is step one of hypothesis testing. Now step two: choose a “test statistic.” This is any function of the data you like. The most popular, in this situation, is some form of the “t-statistic” (there is more than one form). Call our statistic “t”. But you are free to choose one of many, or even make up your own. There is nothing in hypothesis testing theory which requires picking this and not that statistic.

Incidentally, there are practical (and legal) implications over this free choice of test statistic. See this old post for how different test statistics for the same problem were compared in the Wall Street Journal.

Finally, calculate this object:

     (4) Pr( |T| > |t|   | “null” true, normals, data, statistic)

This is the p-value. In words, it is the probability of seeing a test statistic (T) larger (in absolute value) than the test statistic we actually saw (t) in infinite repetitions of the “experiment” that gave rise to our data, given the “null” hypothesis is exactly true, that normal distributions are the right choice, the actual data we saw, and the statistic we used.
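A rough way to see what (4) asks for is to fake the repetitions on a computer. The sketch below is only that: it uses a finite number of repetitions in place of infinity, picks one form of the t-statistic (Welch's, my choice), plugs the pooled mean and standard deviation in for the unknown common parameters, and runs on invented sales figures. None of the names or numbers come from the advertising data.

```python
import random
import statistics


def t_statistic(a, b):
    """One of several possible forms of the t-statistic (Welch's)."""
    standard_error = (statistics.variance(a) / len(a)
                      + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def simulated_p_value(a, b, repetitions=10_000, seed=1):
    """Approximate (4): the share of repetitions with |T| > |t|, 'null' true."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    # With the "null" true both groups share one normal distribution; its
    # parameters are estimated from the pooled data (an assumption here).
    pooled = list(a) + list(b)
    m, s = statistics.mean(pooled), statistics.stdev(pooled)
    exceed = 0
    for _ in range(repetitions):
        fake_a = [rng.gauss(m, s) for _ in a]
        fake_b = [rng.gauss(m, s) for _ in b]
        if abs(t_statistic(fake_a, fake_b)) > observed:
            exceed += 1
    return exceed / repetitions


# Hypothetical weekly sales, invented for illustration only.
sales_a = [420, 515, 480, 390, 610, 455, 530, 470, 495, 505]
sales_b = [450, 560, 500, 430, 640, 520, 575, 490, 540, 610]
p = simulated_p_value(sales_a, sales_b)
```

Every premise listed in (4) shows up as a line of code: the “null”, the normals, the data, and the statistic. Change any one of them and the number changes.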

There is no way to make this definition pithy—without sacrificing accuracy. Which most do: sacrifice accuracy, that is. Although it does a reasonable job, Wikipedia, for instance, says, “In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.” This leaves out the crucial infinite repetitions, and the premises of the distribution and test statistic we used. In frequentist definitions of probability, it is always infinity or bust. Probabilities just do not exist for unique or finite events (of course, people always assume that these probabilities exist; but that is because they are natural Bayesians).

Now there has developed a tradition that whenever (4) is less than the magic number, by an act of sheer will, you announce “I reject the null hypothesis,” which is logically equivalent to saying, “I claim that mA does not equal mB” (let’s, as nearly everybody does, just ignore sA and sB).

The magic, never-to-be-questioned number is 0.05, chosen, apparently, by God Himself. If (4) is less than 0.05 you are allowed to claim “statistical significance.” This term means only that (4) is less than 0.05—and nothing else.

There is no theory which claims that 0.05 is best, or that links the size of (4) with the rejection of the “null.” Before we get to that, understand that if (4) is larger than the magic number you must announce, “I fail to reject the ‘null’” but you must never say, “I accept the ‘null.’” This contortion is derived from R.A. Fisher’s love of Karl Popper’s “falsifiability” ideas, ideas which regular readers will recall no longer have any champions among philosophers.

This “failing to reject” is just as much an act of will as “rejecting the ‘null’” was when (4) was less than 0.05. Consider: if I say, as I certainly may say, “mA does not equal mB” I am adding a premise to my list, but this is just as much an act of my will as adding the normal etc. was. (4) is not evidence that “mA does not equal mB”. That is, given (4) the probability “mA does not equal mB” cannot be computed. In fact, it is forbidden (in frequentist theory) to even attempt to calculate this probability. Let’s be clear. We are not allowed to even write

     (5) Pr ( “mA does not equal mB” | (4) ) = verboten!

This logically implies, and it is true, that the size of (4) has no relation whatsoever to the proposition “mA does not equal mB.” (See this paper for formal proofs of this.) This is what makes it an act of will that we either declare “mA does not equal mB” or “mA equals mB.”

But, really, why would we want to compute (5) anyway? The customer really wants to know

     (6) Pr ( B continuing better than A | data ).

There is nothing in there about unobservable parameters or test statistics, and why should there be? We learn to answer (6) later.

But before we go, let me remind you that we have only begun criticisms of p-values and hypothesis testing. There are lists upon lists of objections. Before you defend p-values, please read through this list of quotations.

Teaching Journal: Day 8—Hypothesis Testing: Part I

Hypothesis testing nicely encapsulates all that is wrong with frequentist statistics. It is a procedure which hides the most controversial assumption/premise. It operates under a “null” belief which nobody believes. It is highly ad hoc and blatantly subjective. It incorporates magic p-values. And it ends all with a pure act of will.

Here is how it works. Imagine (no need, actually: go to the book page and download the advertising.csv file and follow along; to learn to use R, read the book, also free) you have run two advertising campaigns A and B and are interested in weekly sales under these two campaigns. I rely on you to extend this example to other areas. I mean, this one is simple and completely general. Do not fixate on the idea of “advertising.” This explanation works equally well on any comparison.

I want to make the decision which campaign, A or B, to use country-wide and I want to base this decision on 20 weeks of data where I ran both campaigns and collected sales (why 20? it could have been any number, even 1; although frequentist hypothesis testing won’t work with just one observation each; another rank failure of the theory).

Now I could make the rule that whichever campaign had higher median sales is the better. This was B. I could have also made the rule that whichever campaign had higher third-quartile sales is better. This was A. Which is “better” is not a statistical question. It is up to you and relates to the decisions you will make. So I could also rule that whichever had the higher mean sales was better. This was B. I could have made reference, too, to a fixed number of sales, say 500. Whichever had a greater percentage of sales greater than 500 was “better.” Or whatever else made sense to the bottom line.
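To see how the rules can disagree, here is a small Python sketch. The numbers are made up so that the rules quarrel; they are not the advertising.csv data, and the function names are only for illustration. The third quartile is computed with the default method of `statistics.quantiles`, and other quartile conventions give slightly different values.

```python
import statistics


def summarize(sales, threshold=500):
    """Return the summaries named in the text for one campaign."""
    quartiles = statistics.quantiles(sales, n=4)  # [Q1, median, Q3]
    return {
        "median": statistics.median(sales),
        "third_quartile": quartiles[2],
        "mean": statistics.mean(sales),
        "share_above": sum(x > threshold for x in sales) / len(sales),
    }


def winners(sales_a, sales_b, threshold=500):
    """For each rule, report which campaign is 'better' under that rule."""
    a = summarize(sales_a, threshold)
    b = summarize(sales_b, threshold)
    result = {}
    for rule in a:
        if a[rule] > b[rule]:
            result[rule] = "A"
        elif b[rule] > a[rule]:
            result[rule] = "B"
        else:
            result[rule] = "tie"
    return result


# Hypothetical weekly sales, invented so that the rules disagree.
sales_a = [300, 350, 400, 450, 700, 750, 800, 850]
sales_b = [480, 490, 510, 520, 530, 540, 550, 560]
verdict = winners(sales_a, sales_b)
```

With these invented figures A wins on mean, median, and third quartile, while B wins on the share of weeks above 500. Nothing in the arithmetic says which rule you ought to care about.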

Anyway, point is, if all I did was to look at the old data and make direct decisions, I do not need probability or statistics (frequentist or Bayesian). I could just look at the old data and make whatever decision I like.

But that action comes with the implicit premise that “Whatever happened in the past will certainly happen, with like characteristics, in the future.” If I do not want to make this premise, and usually I don’t, I then need to invoke probability and ask something like, “Given the data I observed, what is the probability that B will continue to be better than A?” if by “better” I mean higher median or mean sales. Or “A will continue to be better” if by “better” I mean higher third-quartile sales. Or whatever other question makes sense to me about the observable data.

Hypothesis testing (nearly) always begins by assuming that we can quantify our uncertainty in the outcome (here, sales) with normal distributions. When I say “(nearly) always” I mean statistics as she is actually practiced. This “normality” is a mighty big assumption. It is usually false on the premise that, like here, sales cannot be less than 0. Often sociologists and the like ask questions which force answers from “1 to 5” (which they magnificently call a “Likert scale”). Two (or more) groups will answer a question, and the uncertainty in the mean of each group is assumed to follow a normal distribution. This is usually wildly false, given that, as we have just said, the numbers cannot be smaller than 1 nor larger than 5.
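One quick check of how far off a normal can be is to ask how much probability it hands to values that cannot occur. The Python sketch below does this for invented parameters. The values of m and s are assumptions chosen for illustration, not estimates from any data, and the sketch applies the normal to the answers themselves rather than to a group mean.

```python
import math


def normal_cdf(x, m, s):
    """Pr(X <= x) for a normal with central parameter m and spread s."""
    return 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))


def prob_outside(low, high, m, s):
    """Probability a normal(m, s) places outside [low, high]; None = no bound."""
    below = normal_cdf(low, m, s) if low is not None else 0.0
    above = 1 - normal_cdf(high, m, s) if high is not None else 0.0
    return below + above


# Hypothetical "1 to 5" answers clustered near the top of the scale.
likert_impossible = prob_outside(1, 5, m=4.2, s=1.1)

# Hypothetical small weekly sales, which cannot fall below 0.
sales_impossible = prob_outside(0, None, m=120, s=100)
```

With these made-up parameters the normal puts better than a fifth of its probability on answers off the “1 to 5” scale, and better than a tenth on negative sales.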

Normal distributions, then, are often wrong, and often wrong by quite a lot. (And if you don’t yet believe this, I’ll prove it with real data later.) This says that hypothesis testing starts badly. But ignore this badness, or the chance of it, like (nearly) everybody else does and let’s push on.

If it is accepted that our uncertainty in A is quantified by a normal distribution with parameters mA and sA, and similarly B with mB and sB, then the “null” hypothesis is that mA = mB and (usually, but quietly) sA = sB.

Stare at this and be sure to understand what it implies. It DOES NOT say that “A and B are the same.” It says our uncertainty in A and B is the same. This is quite, quite different. Obviously—as in obviously—A and B are not the same. If they were the same we could not tell them apart. This is not, as you might think, a minor objection. Far from it.

Suppose it were true that as the “null” says mA = mB (exactly, precisely equal). Now if sA were not equal to sB, then our uncertainty in A and B can be very different. It could be, depending on the exact values of sA and sB, that the probability of higher sales under A was larger than B, or the opposite could also be true. Stop and understand this.
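A small Python sketch makes the point concrete. The parameter values are assumptions picked for illustration: equal central parameters and unequal spreads. “Higher sales” is read here as sales above a fixed level, which is one way among several to make the question precise.

```python
import math


def prob_exceeds(x, m, s):
    """Pr(sales > x) when uncertainty is quantified by a normal(m, s)."""
    z = (x - m) / (s * math.sqrt(2))
    return 0.5 * (1 - math.erf(z))


# Hypothetical parameters: the "null" holds for m but not for s.
m_a = m_b = 500
s_a, s_b = 50, 150

# Chance of sales above 600 under A and under B.
high = (prob_exceeds(600, m_a, s_a), prob_exceeds(600, m_b, s_b))

# Chance of sales above 400 under A and under B.
low = (prob_exceeds(400, m_a, s_a), prob_exceeds(400, m_b, s_b))
```

Under these numbers B is far more likely than A to exceed 600, yet A is more likely than B to exceed 400, all while mA = mB exactly.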

Just saying something about the central parameters m does not tell us enough, not nearly enough. We need to know what is going on with all four parameters. This is why if we assume that mA = mB we must also assume that sA = sB.

The kicker is that we can never know whether mA = mB or sA = sB; no, not even for a Bayesian. These are unobservable, metaphysical parameters. This means they are unobservable. As in “cannot be seen.” So what do we do? Stick around and discover.


© 2014 William M. Briggs
