Another attempt to show abstinence is bad: “Losing virginity early or late tied to health risks”

That headline is from an MSNBC story of the same name. Taken at face value, it argues that, for maximum healthiness, we should all aim to lose our virginity at the same age.

No, just kidding. What the article actually says is that “People who start having sex at a younger or older than average age appear to be at greater risk of developing sexual health problems later in life”.

Which actually does sound like they think we should all start having sex simultaneously. But, still no. Actually, their intent is to “cast some doubts on the benefits of abstinence-only sexual education that has been introduced in U.S. public schools.”

Huh?

How do you get from saying that people who start having sex at a younger age, or at an older age, are “at higher risk” to “abstinence is bad”?

Continue reading →

Beer: alcohol, calories, and carbs

Boxplots of calories by beer style

Which beer style has the most calories? In general: porters. The fewest: lagers, the style of beer with which you are probably most familiar. Budweiser, Miller, Coors, and most other mass-market beers are brewed in the lager style.

These box plots use data from the web site RealBeer.com. The editors of that site keep a running list of the alcohol, calorie, and carbohydrate content of, at this writing, 229 different beers from 72 different breweries. There are, naturally, many more beers and breweries than this around the world; these data reflect the beers of most interest to readers and users of RealBeer.com. The classification into styles of beer is my attempt, and any mistakes in classification are my own. You should visit RealBeer.com to learn more about beer styles. The RealBeer.com data set is most complete for alcohol values; there is far less information about calories and carbs, owing to the greater difficulty of obtaining or measuring those values.

Here’s a quick lesson on how to read box plots: the dark, center line is the median, the point at which 50% of the values are above, 50% below. The next two horizontal lines are the quartiles: the top one is the 3rd quartile, which means 25% of the values are above it; the next is the 1st quartile, which means 25% of the values are below it. The top and bottom lines are the 95th and 5th percentiles: 5% of the values lie above the top line, and 5% lie below the bottom line. Points beyond these are more extreme values. Box plots are intended to give you an idea of the spread, variability, and distribution of data.
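For readers who want to see where those five lines come from, here is a minimal sketch in Python. It assumes the convention described above (whiskers at the 5th and 95th percentiles), which is not the only one in use; the function name `box_plot_summary` and the calorie numbers are made up for illustration and are not the RealBeer.com data.

```python
import statistics

def box_plot_summary(values):
    """Return the five lines of a box plot drawn with 5%/95% whiskers."""
    # quantiles(n=100) returns the 99 cut points between percentiles,
    # so index k - 1 holds the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "p5": cuts[4],       # bottom line: 5% of values fall below
        "q1": cuts[24],      # 1st quartile: 25% of values fall below
        "median": cuts[49],  # center line: half above, half below
        "q3": cuts[74],      # 3rd quartile: 25% of values fall above
        "p95": cuts[94],     # top line: 5% of values fall above
    }

# Hypothetical calorie counts for one beer style (illustrative only).
calories = [95, 102, 110, 128, 135, 143, 145, 148, 150, 153,
            155, 160, 163, 166, 170, 175, 180, 188, 196, 330]

summary = box_plot_summary(calories)

# Values beyond the top and bottom lines are the "more extreme" points.
extremes = [c for c in calories if c < summary["p5"] or c > summary["p95"]]

print(summary)
print(extremes)
```

Many plotting packages put the top and bottom lines somewhere else by default (often 1.5 times the box height beyond the quartiles), so check the convention before reading someone else's box plot.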

But the main lesson is: if you are counting calories (and don’t insist on taste), lager beers are your choice. Lagers and ales also have the widest ranges of calories, but this may reflect the fact that most of the data are from these two main groups. 44% of the beers listed are ales, 38% lagers, 4% porters, 8% stouts, and 6% wheats. There was also one barley wine, a style noted for its high alcohol content, which I classified as an ale since it is difficult to do statistics with just one data point.

How about alcohol content?

Continue reading →

Never use bar charts! Case study #1

Social security disability applications

This graphic comes from the New York Times article “Social Security Disability Cases Last Longer as Backlog Rises.” It obviously intends to show how applications have increased since 1998.

This is a terrible plot.

The reason is not simply that you should, with only rare exceptions, never use a bar chart. Bar charts are simple to construct, but there are nearly always better alternatives.

But the evil of bar charts is well known. The reason this plot is bad has to do with the number 0. Notice that the chart starts at 0, even though it isn’t until 2 million or so that we meet our first number. The only reason that the chart starts at 0 is that you can’t have fewer than 0 applications. This is not a good reason. They should have started with a higher number.
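To put a rough number on how much the starting point matters, here is a small sketch in Python. The application counts are hypothetical (chosen only to sit near the "2 million or so" mentioned above, not read off the Times graphic), and `drawn_height_ratio` is a name invented for this illustration.

```python
def drawn_height_ratio(first, last, baseline):
    """How many times taller the last bar is drawn than the first,
    when the axis starts at `baseline`."""
    if baseline >= min(first, last):
        raise ValueError("baseline must be below both values")
    return (last - baseline) / (first - baseline)

# Hypothetical application counts, in thousands (illustrative only).
first_year = 2000
last_year = 2600

ratio_from_zero = drawn_height_ratio(first_year, last_year, baseline=0)
ratio_from_1800 = drawn_height_ratio(first_year, last_year, baseline=1800)

print(ratio_from_zero, ratio_from_1800)
```

With these made-up numbers, the same increase is drawn as a bar 1.3 times as tall when the axis starts at 0, and 4 times as tall when the axis starts at 1.8 million. The data have not changed; only the amount of empty space underneath them has.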

Don’t think it makes a difference? Then take a look at this re-drawing:

Continue reading →

How to Exaggerate Your Results: Case study #2

That’s a fairly typical ad, which is now running on TV, and which is also on Glad’s web site. Looks like a clear majority would rather buy Glad’s fine trash bag than some other, lesser, bag. Right?

Not exactly.

So what is the probability that a “consumer” would prefer a Glad bag? You’ll be forgiven if you said 70%. That is exactly what the advertiser wants you to think. But it is wrong, wrong, wrong. Why? Let’s parse the ad used and see how you can learn to cheat from it.

The first notable comment is “over the other leading brand.” This heavily implies, but of course does not absolutely prove, that Glad commissioned a market research firm to survey “consumers” about what trash bag they preferred. The best way to do this is to ask people, “What trash bag do you prefer?”

But evidently, this is not what happened here. Here, the “consumer” was given a dichotomy, “Would you rather have Glad? Or this other particular brand?” Here, we have no idea what that other brand was, nor what was meant by “leading brand.” Do you suppose it’s possible that the advertiser gave in to temptation and chose, for his comparison bag, a truly crappy one? One that, in his opinion, is obviously inferior to Glad (but maybe cheaper)? It certainly is possible.

So we already suspect that the 70% guess is off. But we’re not finished yet.

Continue reading →