# Teaching Journal: Day 8—Hypothesis Testing: Part I

Hypothesis testing nicely encapsulates all that is wrong with frequentist statistics. It is a procedure which hides the most controversial assumption/premise. It operates under a “null” belief which nobody believes. It is highly *ad hoc* and blatantly subjective. It incorporates magic p-values. And it ends all with a pure act of will.

Here is how it works. Imagine (no need, actually: go to the book page and download the advertising.csv file and follow along; to learn to use R, read the book, also free) you have run two advertising campaigns A and B and are interested in weekly sales under these two campaigns. I rely on you to extend this example to other areas. I mean, this one is simple and completely general. Do not fixate on the idea of “advertising.” This explanation works equally well on any comparison.

I want to make the decision which campaign, A or B, to use country-wide and I want to base this decision on 20 weeks of data where I ran both campaigns and collected sales (why 20? it could have been any number, even 1; although frequentist hypothesis testing won’t work with just one observation each; another rank failure or the theory).

Now I could make the rule that whichever campaign had higher median sales is the better. This was B. I could have also made the rule that whichever campaign had higher third-quartile sales is better. This was A. Which is “better” is *not* a statistical question. It is up to you and the relates to the decisions you will make. So I could also rule that whichever had the higher mean sales was better. This was B. I could have made reference, too, to a fixed number of sales, say 500. Whichever had a greater percentage of sales greater than 500 was “better.” Or whatever else made sense to the bottom line.

Anyway, point is, if all I did was to look at the old data and make direct decisions, I do not need probability or statistics (frequentist *or* Bayesian). I could just *look at the old data* and make whatever decision I like.

But that action comes with the implicit premise that “Whatever happened in the past will certainly happen in likewise characteristic in the future.” If do not want to make this premise, and usually I don’t, I then need to invoke probability and ask something like, “Given the data I observed, what is the probability that B will continue to be better than A” if by “better” I mean higher median or mean sales. Or “A will continue to be better” if by “better” I meant higher third-quartile sales.” Or whatever other question makes sense to me *about the observable data*.

Hypothesis testing (nearly) always by assuming that we can quantify our uncertainty in the outcome (here, sales) with normal distributions. When I say “(nearly) always” I mean statistics as she is actually practiced. This “normality” is a mighty big assumption. It is usually false on the premise that, like here, sales cannot be less than 0. Often sociologists and the like ask questions which force answers from “1 to 5” (which they magnificently call a “Likert scale”). Two (or more) groups will answer a question, and the uncertainty in the mean of each group is assumed to follow a normal distribution. This is usually wildly false, given that, as we have just said, the numbers cannot be smaller than 1 nor larger than 5.

Normal distributions, then, are often wrong, and often wrong by quite a lot. (And if you don’t yet believe this, I’ll prove it with real data later.) This says that hypothesis testing starts badly. But ignore this badness, or the chance of it, like (nearly) everybody else does and let’s push on.

If it is accepted that our uncertainty in A is quantified by a normal distribution with parameters m_{A} and s_{A}, and similarly B with m_{B} and m_{B}, then the “null” hypothesis is that m_{A} = m_{B} and (usually, but quietly) s_{A} = s_{B}.

Stare at this and be sure to understand what it implies. It *DOES NOT* say that “A and B are the same.” It says our *uncertainty* in A and B is the same. This is quite, quite different. Obviously—as in obviously—A and B are not the same. If they were the same we could not tell them apart. This is not, as you might think, a minor objection. Far from it.

Suppose it were true that as the “null” says m_{A} = m_{B} (exactly, precisely equal). Now if s_{A} were not equal to s_{B}, then our uncertainty in A and B can be very different. It could be, depending on the exact values of s_{A} and s_{B}, that the probability of higher sales under A was larger than B, or the opposite could also be true. Stop and understand this.

Just saying something about the central parameters m does not tell us enough, not nearly enough. We need to know what is going on with *all four* parameters. This is why if we assume that m_{A} = m_{B} we must also assume that s_{A} = s_{B}.

The kicker is that we can never know whether m_{A} = m_{B} or s_{A} = s_{B}; no, not even for a Bayesian. These are unobservable, metaphysical parameters. This means they are unobservable. As in “cannot be seen.” So what do we do? Stick around and discover.

The blog bounced around a lot this morning. Having a bad code day?

The road is sometimes not smooth.