October 28, 2008 | 13 Comments

Stand by!

My book is coming!

It’s almost there, so let me tell you how math publishing works these days.

The author of course writes the work, and we all do it in a typesetting language called LaTeX (some just use TeX). Google it. It’s no different in spirit from web pages, which are content surrounded by “markup code” that tells the words where to go.

We can extend the analogy. Web pages are written in a markup code that is further subject to cascading style sheet rules. The style sheet rules say how big headlines are, what background images to use, and so on. In LaTeX, these are called class files (or “.cls” files).

The point of all this is that we write the words and the math, and the publisher provides us with a class file that does all the typesetting. It builds the table of contents, numbers all the pages and formulas, lays out the footnotes properly, and so on, all automatically. LaTeX is sweet and orders of magnitude better than word processors such as MS Word.
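To give a flavor, here is a minimal sketch of what a LaTeX source file looks like. The class name “press” is hypothetical, standing in for whatever .cls file the publisher supplies:

   % "press" stands in for the publisher-supplied .cls file
   \documentclass{press}
   \begin{document}
   \tableofcontents        % built for us, automatically
   \chapter{Probability}
   A formula, numbered with no effort on our part:
   \begin{equation}
      \Pr(A \mid B) = \frac{\Pr(A)\,\Pr(B \mid A)}{\Pr(B)}
   \end{equation}
   \end{document}

Swap in a different class file and the same source comes out in an entirely different dress, which is the whole point.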

But, unless you are a really famous author (not me), you are even given the privilege of writing your own index! So, in math/physics/etc. books written with LaTeX, there is nothing for the publisher to do. They don’t even—again, unless you are famous—provide any direct copy editing. They let the authors do that, too.

Since I’m doing everything, I decided, à la Tufte¹, to bring out the book myself. Most of the copies I sell will be to the students who are forced—er, elect—to take my class. This way I can keep the price way down.

When I was a visiting professor at CMU, the textbook cost, if you bought the “Solutions Pack” and “Calculator Guide” (or whatever it was called), was well north of $100. 100 bucks! That’s nuts. Mine will be $24.95.

The rest is done automatically, including uploading the text and sending it to the printers; everything is actually pretty quick. The real time is in getting the book out to the distribution channels. So while my book will be available first from the publisher’s site, it will take one to two months to show up elsewhere.

What do you do if you can’t wait? You can check out this book, my attempt at inserting skepticism into a strange field.

¹ Tufte does statistical graphics. If you haven’t seen his work, you should. His books, which are famous, are also non-traditional: he brought them out himself, since there are, unfortunately, few statistical graphics courses at colleges. Still, he’s done OK with the books.

October 27, 2008 | 13 Comments

An early start to the “holiday” season

From the Wall Street Journal comes the headline: “Retailers Expect Gloomy Holiday.”

Problem is, I have read the entire article—these kinds of stories seem to appear earlier and earlier every year—but I could find no mention of what “holiday” they meant.

There are some clues. The writer, Jennifer Saranow, more than twice mentioned “consumers” and wondered how much money these creatures will spend on “the holiday.” I am not sure what a “consumer” is, but it doesn’t sound good, in fact it sounds scary, which makes me think this “holiday” can’t be a joyful one.

I’d therefore guess the holiday was Halloween, an event filled with frightening creatures, but the article specifically mentioned “consumer” spending in the months of November and December, so that’s out.

Well, like I said, these articles appear with regularity once the weather turns cooler up here in the Northern Hemisphere, so I think we’ll see more of them, some of which might give us more hints about this mysterious “holiday.”

October 26, 2008 | 8 Comments

Anybody see this one?

The book is The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives by Stephen T. Ziliak and Deirdre N. McCloskey.

From the description at Amazon:

The Cult of Statistical Significance shows, field by field, how “statistical significance,” a technique that dominates many sciences, has been a huge mistake. The authors find that researchers in a broad spectrum of fields, from agronomy to zoology, employ “testing” that doesn’t test and “estimating” that doesn’t estimate. The facts will startle the outside reader: how could a group of brilliant scientists wander so far from scientific magnitudes? This study will encourage scientists who want to know how to get the statistical sciences back on track and fulfill their quantitative promise. The book shows for the first time how wide the disaster is, and how bad for science, and it traces the problem to its historical, sociological, and philosophical roots.

This is part of the theme I’ve long been pushing. Ziliak and McCloskey are shocked, perplexed, and bewildered that classical statistics and p-values are still being used.

I’m not so shocked. They want people to abandon p-values and start using effect sizes. A fine first step, but one that doesn’t solve the whole problem.

I say we should drop p-values like Obama dropped Rev. Wright, eschew effect sizes like Joe Biden did reality, and return to observables. Let me, as they say, illustrate with a (condensed) example from my book.

Suppose there are two advertising campaigns A and B for widget sales. Since we don’t know how many sales will happen under A or B, we quantify our uncertainty in this number using a probability distribution. We’ll use a normal, since everybody else does, but the example works for any probability distribution.

Now, a normal distribution requires two unobservable numbers, called parameters, to be specified so that you can use it. The names of these two parameters are μ and σ. Both ad campaigns need their own, so we have μA and σA, and μB and σB. Current practice more or less ignores the σA and σB, so we will too.

Here is what “statistical significance” is all about.

Actual sales data under the two campaigns A and B is taken. A statistic is calculated: call it T. It is a function of differences in the observed sales under both campaigns. Never mind how it’s calculated. T is not unique, and for any problem dozens are available. With T in hand, the classical statistician makes this mathematical statement:

   μA = μB

and then the infamous p-value is calculated, which is

   Probability(Another T > Our T given that μA = μB)

where the “Another T” is the statistic we would get if we were to repeat the entire experiment again. Do we repeat it again? No, so we are already in deep waters. But never mind.

If the p-value is less than the magic number of 0.05, then the results are said to be statistically significant.
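To make the recipe concrete, here is a toy version in Python. Every number is invented, and scipy’s two-sample t statistic stands in for whichever of the dozens of possible T’s the analyst picks:

   import numpy as np
   from scipy import stats

   rng = np.random.default_rng(1)
   sales_a = rng.normal(loc=100.0, scale=20.0, size=50)  # invented sales under A
   sales_b = rng.normal(loc=100.0, scale=20.0, size=50)  # invented sales under B

   # One choice of T among many: the two-sample t statistic.
   t_stat, p_value = stats.ttest_ind(sales_a, sales_b)
   print(t_stat, p_value)
   # Declared "statistically significant" whenever p_value < 0.05.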

Quick readers will have spotted the major difficulty. What does equating two unobservable parameters in order to calculate some weird probability have to do with whether the campaigns are different from one another?

The answer is: not much, which is why Ziliak and McCloskey call the dependence on p-values a cult.

They recommend, in its place, estimating the effect size, which is this:

   μA – μB.

Eh. It’s partway there, but it’s still a statement about unobservable parameters (and it still ignores the other unobservable parameters σA and σB).

What people really want to know is this:

   Probability(Sales A > Sales B given old data).

Or they’d like to estimate the actual sales under A or B. There are newer methods that can calculate these actual probabilities of interest. However, you won’t learn them in any but the most esoteric statistics class.
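For instance, here is a minimal sketch of one such method: simulation under an assumed normal model with flat priors, with all data invented for illustration. With a decent amount of data, the predictive distribution of a fresh observation is roughly normal at the sample mean and sample spread, so we can just draw new sales under each campaign and count:

   import numpy as np

   rng = np.random.default_rng(2)
   old_a = rng.normal(102.0, 20.0, size=50)   # observed sales under A (invented)
   old_b = rng.normal(100.0, 20.0, size=50)   # observed sales under B (invented)

   # Approximate posterior predictive draws of a new day's sales.
   new_a = rng.normal(old_a.mean(), old_a.std(ddof=1), size=100_000)
   new_b = rng.normal(old_b.mean(), old_b.std(ddof=1), size=100_000)

   # ~ Probability(Sales A > Sales B given old data)
   print((new_a > new_b).mean())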

And that is what should change.

Because, I am here to tell you, you can have a p-value as small as you like, you can have an effect size as big as you like, but it can still be the case that

   Probability(Sales A > Sales B given old data) ~ 50%!

which is the same as just guessing. Yes, the actual, observable numbers, the real-life stuff, the physical, measurable, tangible decisionable reality can be no different at all. At least, we might not be able to tell they are any different.
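You can watch this happen in the toy model from before: crank up the sample size and shrink the real difference, and the p-value collapses toward zero while the observable probability barely budges. Every number, again, is invented:

   import numpy as np
   from scipy import stats

   rng = np.random.default_rng(3)
   old_a = rng.normal(101.0, 20.0, size=20_000)
   old_b = rng.normal(100.0, 20.0, size=20_000)

   t_stat, p_value = stats.ttest_ind(old_a, old_b)
   print(p_value)                  # tiny: "significant!"

   new_a = rng.normal(old_a.mean(), old_a.std(ddof=1), size=100_000)
   new_b = rng.normal(old_b.mean(), old_b.std(ddof=1), size=100_000)
   print((new_a > new_b).mean())   # still close to 0.5: a coin flip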

And that’s the point. The old ways of doing things were set up to make things too certain.

I wouldn’t go so far as to say reliance on the old ways was cultish. Most people just don’t know of the alternatives.

October 24, 2008 | 20 Comments

Health care crisis!?

Take a look at this picture:

[Chart: life expectancy through time]

This is from an article by “The Numbers Guy” Carl Bialik at the Wall Street Journal. The story is about how life expectancy calculators are not terribly accurate. This really isn’t much of a surprise, but the picture should be.

This is because both presidential candidates, and of course many other people, nervously claim that there is a “Health Care Crisis! We have to do something!”

Yes, it’s so bad that the people are living longer and longer and longer… This picture says that whatever the crisis is, it clearly doesn’t have to do with that part of health that keeps people alive. I would argue that that part is the most important; apparently, others disagree.

This is another example of the phenomenon that the better things get, the more people complain. Or maybe people don’t complain more; they complain at the same rate, but because things are better, the complaints are about matters that are increasingly trivial.

Hasn’t somebody given a name to this dynamic?