
Author: Briggs

September 18, 2008 | 8 Comments

How to cheat with statistics: CNN ad

In today’s New York Post (p. 31), CNN runs a full-page ad. The ad itself looks like a PowerPoint presentation, that is, a dull layout driven by bullet points. Here are the first three:

  1. #1 Most Watched Cable News Network Across Both DNC and RNC Conventions for P25-54, P18-49, & P18-34
  2. #1 Most Watched Cable News Network at 10PM Across Both Conventions
  3. #1 News and Information Site During Both Conventions-CNN.com

In the first bullet, what is that odd “P25-54”? What in the world could that be? The ad, even in the fine print, nowhere says. But we can guess it means “people aged 25 to 54”. OK, so people aged 25 to 54 (a good slice of people), according to “sources”, liked CNN. Sounds impressive, but there are two misleading elements.

The first is that it was the most watched “cable news network”, which means that the “non-cable news networks” might have been watched by more people. We can guess that this is true, or else the ad would have touted that CNN was the most watched network, period.

The second problem is the bizarre way they sliced the age groups. The 18-49 group entirely contains the 18-34 group, does it not? So why mention the smaller group? Does it mean that the 35-49 year-olds did not prefer CNN? But the 35-39 year-olds are certainly among the 25-54 year-olds, which was the first group mentioned.

There is no making sense of any of this except by supposing that CNN scrounged through the data to find any hint of subgroups that supported their “#1” contention. Experience shows that you can do this for any statistical analysis, which is why so many rightly suspect whatever statisticians have to say.
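To see how easy such scrounging is, here is a minimal sketch of a hypothetical survey in which nobody prefers any network: each simulated respondent picks CNN, FNC, or MSNBC uniformly at random. The survey size, the 18-64 age range, the seed, and the helper names are all assumptions for illustration, not CNN's data or method. Slicing the ages every possible way and keeping only the slices where a given network leads typically hands each network some “#1” age group to advertise.

```python
import random
from collections import Counter

NETWORKS = ("CNN", "FNC", "MSNBC")
AGES = range(18, 65)  # hypothetical respondents aged 18-64


def simulate_survey(n=3000, seed=2008):
    """Hypothetical survey in which every respondent picks a network uniformly
    at random, so no network is truly preferred by any age group."""
    rng = random.Random(seed)
    return [(rng.choice(AGES), rng.choice(NETWORKS)) for _ in range(n)]


def slices_led(data, min_width=5):
    """Check every contiguous age slice [lo, hi] at least min_width years wide
    and record which network, if any, strictly leads it (ties are skipped)."""
    by_age = {age: Counter() for age in AGES}
    for age, choice in data:
        by_age[age][choice] += 1
    led = {net: [] for net in NETWORKS}
    for lo in AGES:
        for hi in range(lo + min_width - 1, AGES[-1] + 1):
            total = Counter()
            for age in range(lo, hi + 1):
                total.update(by_age[age])
            top = max(total[net] for net in NETWORKS)
            leaders = [net for net in NETWORKS if total[net] == top]
            if len(leaders) == 1:
                led[leaders[0]].append((lo, hi))
    return led


led = slices_led(simulate_survey())
for net, slices in led.items():
    sample = ", ".join(f"P{lo}-{hi}" for lo, hi in slices[:3])
    print(f"{net}: #1 in {len(slices)} age slices ({sample})")
```

Change the seed and the “winning” slices move around, which is the point: the #1 claims come from the slicing, not from any real preference.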

The second bullet point is just as screwy. How many different networks did they compare anyway? The barely readable small print says “CNN, FNC, and MSNBC.” So they had only two competitors, the second of which, MSNBC, has always struggled for viewership. Coming in number 1 in a few categories with only one real competitor is not a laudable achievement.

But it also means that CNN must have lost, probably to Fox, in the 7p-8p slot, the 8p-9p slot, the 9p-10p slot, and the 11p-12a slot, which are 4 of the 5 slots the small print says the “sources” checked. Losing 80% of the time hardly makes you number 1.

The third bullet is more tepid. The “source” says “Information Site” means “Current events and Global News Category.” We have no idea how many other sites were compared against CNN, nor how many other categories—say Analysis, Opinion, Politics, and so on—were checked.

Still, for cheating, the third bullet is best, because it is a rare person who will pause much over the claim, and few will bother to read the small print.

In any case, CNN should just have presented their fourth bullet, which was

  • #1 Most Trusted & Credible Name in News

This bullet is so vague it can mean anything. It’s crafted so that readers can take any meaning they like from it. Most people will be left with a dull sense of the importance of CNN.

This ad, while fairly misleading, only earns a 4 (out of 10) on the Briggs Statistical Deception Scale.

For the teachers out there, these ads often make good homework problems for students. Chopping up an ad into its component parts and reading it critically is always great fun for the students, especially when you find deceitful ads.

September 16, 2008 | 20 Comments

The limits of statistics: black swans and randomness

The author of Fooled by Randomness and The Black Swan, Nassim Nicholas Taleb, has penned the essay THE FOURTH QUADRANT: A MAP OF THE LIMITS OF STATISTICS over at Edge.org (which I discovered via the indispensable Arts & Letters Daily).

Taleb’s central thesis and mine are nearly the same: “Statistics can fool you.” Or “People underestimate the probability of extreme events”, which is another way of saying that people are too sure of themselves. He blames the current crisis on Wall Street on people misusing and misunderstanding probability and statistics:

This masquerade does not seem to come from statisticians—but from the commoditized, “me-too” users of the products. Professional statisticians can be remarkably introspective and self-critical. Recently, the American Statistical Association had a special panel session on the “black swan” concept at the annual Joint Statistical Meeting in Denver last August. They insistently made a distinction between the “statisticians” (those who deal with the subject itself and design the tools and methods) and those in other fields who pick up statistical tools from textbooks without really understanding them. For them it is a problem with statistical education and half-baked expertise. Alas, this category of blind users includes regulators and risk managers, whom I accuse of creating more risk than they reduce.

I wouldn’t go so far as Taleb: the masquerade often comes from classical statistics and statisticians, too. Many of the statistical methods taught to non-statisticians originated in the early and middle part of the 20th century, before there was access to computers. In those days, it was rational to make gross approximations, to assume uncertainty could always be quantified by normal distributions, and to guess that everything was linear. These simplifications allowed people to solve problems by hand. And, really, there was no other way to get an answer without them.

But everything is now different. The math is new, our understanding of what probability is has evolved, and everybody knows what computers can do. So, naturally, what we teach has changed to keep pace, right?

Not even close to right. Except for the modest introduction of computers to read in canned data sets, classes haven’t changed one bit. The old gross approximations still hold absolute sway. The programs on those computers are nothing more than implementations of the old routines that people did by hand; many professors still require their students to compute statistics by hand, just to make sure the results match what the computer spits out!

It’s rare to find an ex-student of a statistics course who didn’t hate it (“You’re a statican [sic]? I always hated statistics!” they say brightly). But it’s just as rare to find a person who took, in the distant past, one or two courses who doesn’t fancy himself an expert (I can’t even count the number of medical journal editors who have told me my new methods were wrong). People get the idea that if they can figure out how to run the software, then they know all they need to.

Taleb makes the point that these users of packages necessarily take a too limited view of uncertainty. They seek out data that confirms their beliefs (this obviously is not confined to probability problems), fit standard distributions to them, and make pronouncements that dramatically underestimate the probability of rare events.
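Here is a minimal sketch of that mistake, assuming (purely for illustration) that the data really come from a heavy-tailed Student-t distribution with 3 degrees of freedom. It “fits” a normal distribution the textbook way, by plugging in the sample mean and standard deviation, and then compares the normal model's probability of a move beyond 4 standard deviations with how often such moves actually occur in the data. The sample size, seed, threshold, and helper names are hypothetical; only Python's standard `math`, `random`, and `statistics` modules are used.

```python
import math
import random
import statistics


def student_t(df, rng):
    """One Student-t draw, built as Z / sqrt(chi-square_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def tail_comparison(n=100_000, df=3, k=4.0, seed=1):
    """Return (tail probability claimed by a fitted normal,
    observed frequency of moves beyond k fitted standard deviations)."""
    rng = random.Random(seed)
    data = [student_t(df, rng) for _ in range(n)]
    # The "standard" fit: a normal with the sample mean and standard deviation.
    fitted = statistics.NormalDist(statistics.fmean(data), statistics.stdev(data))
    lo = fitted.mean - k * fitted.stdev
    hi = fitted.mean + k * fitted.stdev
    normal_tail = fitted.cdf(lo) + (1.0 - fitted.cdf(hi))
    observed_tail = sum(1 for x in data if x < lo or x > hi) / n
    return normal_tail, observed_tail


normal_tail, observed_tail = tail_comparison()
print(f"fitted normal says {normal_tail:.1e}, data show {observed_tail:.1e}")
```

For a true t distribution with 3 degrees of freedom, the chance of landing beyond 4 of its standard deviations is roughly 100 times what the normal curve allows, so the fitted model should badly understate how often the extreme moves happen.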

Many times rare events cause little trouble (the probability that you walk on a particular blade of grass is very low, but when that happens, nothing happens), but sometimes they wreak havoc of the kind happening now with Lehman Brothers, AIG, WAMU, and on and on. Here, Taleb starts to mix up estimating probabilities (the “inverse problem”) with risk in his “Four Quadrants” metaphor. The two areas are separate: estimating the probability of an event is independent of what will happen if that event obtains. There are ways to marry the two areas in what is called Decision Analysis.
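The separation can be made concrete with a small expected-loss calculation, the basic move in Decision Analysis. In this sketch the probability of the rare event is estimated once, and two hypothetical loss tables (every number invented for illustration) are applied to it; the helper names are likewise hypothetical. The probability never changes, yet the best action does, because the consequences differ.

```python
def expected_loss(probs, losses):
    """Expected loss of one action: the sum of P(outcome) * loss(outcome)."""
    return sum(p * loss for p, loss in zip(probs, losses))


def best_action(probs, loss_table):
    """Pick the action with the smallest expected loss. loss_table maps each
    action to its losses under each outcome, in the same order as probs."""
    return min(loss_table, key=lambda action: expected_loss(probs, loss_table[action]))


# Step 1, the probability problem: estimate the chance of the rare event.
p_event = 0.01
probs = [1 - p_event, p_event]  # outcomes: [no event, event]

# Step 2, the consequences: hypothetical losses, set independently of p_event.
mild = {"hedge": [5, 5], "do nothing": [0, 100]}
severe = {"hedge": [5, 5], "do nothing": [0, 10_000]}

print(best_action(probs, mild))    # expected losses 5 vs 1, prints "do nothing"
print(best_action(probs, severe))  # expected losses 5 vs 100, prints "hedge"
```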

That is a minor criticism, though. I appreciate Taleb’s empirical attempt at creating a list of easy to, hard to, and difficult to estimate events along with their monetary consequences should the events happen (I have been trying to build such a list myself). Easy to estimate/small consequence events (to Taleb) are simple bets, medical decisions, and so on. Hard to estimate/medium consequence events are climatological upsets, insurance, and economics. Difficult to estimate/extreme consequence events are societal upsets due to pandemics, leveraged portfolios, and other complex financial instruments. Taleb’s bias towards market events is obvious (he used to be a trader).

A difficulty with Taleb is that he writes poorly. His ideas are jumbled together, and it often appears that he was in such a hurry to get the words on the page that he left half of them in his head. This is true of his books, too. His ideas are worth reading, however, though you have to put in some effort to understand him.

I don’t agree with some of his notions. He is overly swayed by “fractal power laws”. My experience is that people often see power laws where there are none. Power laws, and other fractal math, give appealing, pretty pictures that are too psychologically persuasive. That is a minor quibble. My major problem is philosophical.

Taleb often states that “black swans”, i.e. extremely rare events of great consequence, are impossible to predict. Then he faults people, like Ben Bernanke, for failing to predict them. Well, you can’t predict what is impossible to predict, no? Taleb must understand this, because he often comes back to the theme that people underestimate uncertainty of complex events. Knowing this, people should “expect the unexpected”, a phrase which is not meant glibly, but is a warning to “increase the area in the tails” of the probability distributions that are used to quantify uncertainty in events.

He claims to have invented ways of doing this using his fractal magic. Well, maybe he has. At the least, he’ll surely get rich by charging good money to learn how his system works.

September 15, 2008 | 43 Comments

How many geniuses are there?

Here is a question I often give on exams:

What is the probability that the next child to be born will be a genius? Give me a number and fully explain your answer.

There is not, of course, a single correct answer. What I just said is an important point, so let’s not skip lightly over it: there is no correct answer; at least, there is no way anybody can know the correct answer.

That nobody can know the answers to questions of this type with certainty is underappreciated. I want people to learn this because we are, as I often say, too sure of ourselves.

What I want to see in the answer is acknowledgment of the ambiguities. First, what is a genius? Surely that word is overused to a remarkable extent. For example, this list says, with a straight face, that authoress JK Rowling and moviemaker Steven Spielberg are geniuses. I often have the idea that to not call some eminence a genius is nowadays taken as a slight. However, a moment’s thought suffices to show that people exaggerate, if you are willing to take that moment.

The next step is to think of some geniuses for the sake of comparison. It’s best to think of dead ones so that you are not overly influenced by current events. After all, only history can truly judge genius. If you agree with even part of this, you will have made the next most important step: admitting that you can be biased.

How about some dead geniuses? Einstein pops into nearly everybody’s head first. Then, for me, Mozart, Beethoven, Shakespeare, Newton, and the guy who invented beer. No, I’m not joking about that last name. The point is my historical knowledge is modest, and most of the names I pick are men from the last 500 years, and most are from Western culture. Humanity is older than 500 years, of course, and there are other cultures besides our own, so I know that my knowledge of who is a genius is limited. That’s what got me to thinking about the brilliant soul who invented beer. He did so, probably in Sumer, before people wrote down incredible deeds of this sort.

This line of thought eventually leads to other cultures (Confucius, maybe Lao Tzu) and other times where writing was non-existent (was there just one person responsible for the wheel and agriculture?). There must be a lot of geniuses I don’t know, and some that nobody can ever know.

The next step is to count, and to acknowledge that exact counting is an impossibility. Still, we can count to the nearest order of magnitude. This means “power of 10”, and it represents an enormously popular method of approximation. If you can get your answer to within “an order of magnitude” (a power of 10), you are doing well. The first power of 10, or $10^1$, is just 10. The second power is $10^2 = 100$, and so on.

So how many geniuses? Certainly more than 10, definitely less than 10,000, or $10^4$. Could there have been 100 geniuses? Given my above list, I say yes. 1000? I’m less likely to believe this number, but since I have said that there were lots of geniuses who went unsung, I can’t exclude it. Still, an order of magnitude more than this seems too large.

We have done a lot so far, but we still haven’t answered the question “What is the probability that the next child to be born will be a genius?” The answer will look something like # of geniuses who have ever lived / # of people who ever lived. Coming to this equation is crucial. This is because the question implies (I emphasize, it does not explicitly state) that we are asking a question about all humanity. And all humanity certainly means all humans who have ever lived.

Thus far, we have nailed down the numerator in this equation to the nearest order of magnitude or so ($10^2$ to $10^3$). How about the denominator?

What evidence do we have? Well, there are about, to an order of magnitude, $10^{10}$ or 10 billion people alive today. 100 years from now, nearly all of these people will be dead and a new set, probably of the same order of magnitude, will take its place. Anyway, 100 billion people alive 100 years from now feels way too large to me, and 1 billion way too small, especially given recent population trends.

100 years ago, there were about an order of magnitude fewer people alive (nearly all of them different from the set we have today), or about $10^9$ or 1 billion. How many 100s of years can we go back? About 2000, since the best guess is that humanity arose about 200,000 years ago. That’s close enough; it’s within an order of magnitude. Without doing any math (just going by gut), we can guess that adding today’s $10^{10}$ to last century’s $10^9$ (11 billion so far), and to the remaining centuries’ diminishing contributions (each previous century had fewer people), we arrive at about $10^{11}$, or 100 billion.

Was that larger than you had first guessed? This number usually surprises most people. But having a guess gives us our denominator, so that we can finally solve our equation, which is

$$\frac{10^{2}}{10^{11}} = 10^{-9}$$

or, if there were $10^3$ geniuses, $10^{-8}$. In words, it’s anywhere from 1 in a billion to 1 in 100 million.
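The arithmetic is easy to check by machine. This sketch only plugs in the text's own order-of-magnitude guesses (a numerator of $10^2$ or $10^3$ geniuses and a denominator of $10^{11}$ people); the `order_of_magnitude` helper is a hypothetical name for rounding in log space.

```python
import math


def order_of_magnitude(x: float) -> int:
    """Exponent of the power of 10 nearest to x, rounding in log space."""
    return round(math.log10(x))


people_ever = 10**11  # denominator guessed in the text

for geniuses in (10**2, 10**3):  # numerator range guessed in the text
    p = geniuses / people_ever
    print(f"{geniuses} geniuses: p ~ 10^{order_of_magnitude(p)}, "
          f"or 1 in {people_ever // geniuses:,}")
# 100 geniuses: p ~ 10^-9, or 1 in 1,000,000,000
# 1000 geniuses: p ~ 10^-8, or 1 in 100,000,000

# Sanity check on a "1 in 100" answer: how many geniuses would it imply?
print(f"1 in 100 implies {people_ever // 100:,} geniuses")  # 1,000,000,000
```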

Not very good odds, right?

This was a lot of thinking for such a simple question, wasn’t it? If you had written down, as students often do, an answer of “1 in 100” or “1 in 1000”, you would have got the answer wrong. Both answers imply that we should be flooded with geniuses (at 1 in 100, the $10^{11}$ people who have ever lived would include a billion of them), an answer which no observation supports.

An oft-heard complaint among professors is that students don’t think about the answers they give. I agree with this, but I think it’s more than just students. It holds for professors and ordinary civilians, too.

“1 in 100” is absurd, and far too certain. Just a few moments’ thought shows this. How many answers that we give in life are just as absurd?

Some kids will write, “I don’t know.” I usually give them 1 point for this because, after all, it is the strictly correct answer. But that answer is too certain itself. We do know something about the answer and we can answer it partially. We should always quantify uncertainty in any question and not seek the easy way out by giving answers that are too certain.

Here, for fun, is another question I give:

How many umbrellas are there in New York City?

September 14, 2008 | 9 Comments

Much too certain: miscellaneous Sunday topics

Today, a topic that I mean to expand, greatly, in the coming weeks. That theme, as you might have guessed, is that too many people are too certain about too many things. Nothing more than pointers to a couple of articles and some commentary for the moment.

Sad Day

Famed shark hunter Frank Mundus, who supplied the words to live by “A PhD don’t mean shit”, died this week. We earlier looked at Mundus’s philosophy in the essay “The BS octopus.” Mundus often proved that having letters after his name didn’t make him a better fisherman.

Don’t Go To College

On the same theme, Charles Murray’s new book on why most people don’t need to, and shouldn’t, go to college is out and making the blogs. Murray points out the obvious: not everybody is equipped to go to college, but most people are encouraged to do so. Many businesses want applicants to “have a degree” (the same phrase we used in The BS Octopus). This means the business doesn’t care what the applicant knows, since he can learn what he needs on the job; it only wants the letters after the name. Ridiculous.

Murray also does the simple math to show that most people would be better off skipping college altogether and heading for the trades: electrician (what I would have done), plumber, craftspeople of all types, farmers, and so on. Not everybody (and this is a shocker) graduates with distinction and makes the salaries at the top end of the range. Besides, Murray says, college should be saved for the people with the brains to do it. Sound harsh and non-egalitarian? Well, not everybody has the body to be an athlete, nor the looks to be a model, nor the talent to become an actress or musician. We never have a hard time telling people the hard truth in those cases, but we’re squeamish about telling others that they might not be smart enough for school. It also goes smack in the face of one of the guiding principles of the Enlightenment: education can cure all ills.

High school guidance counsellors push too many people towards college. And colleges take them in. It generates big business, too. I speak from experience when I tell you that college is not for the majority. Since that is true, yet people want college for the majority, this partly explains why college is not what it used to be, and why there are so many colleges. Many modern-day colleges operate like expansion leagues in sports: too little talent to spread around, leading to watered-down performance.

I taught too many kids who should not have been in my class. Sweet kids, mostly, big hearts. I never gave anybody a grade less than they deserved, but I admit to helping contribute to grade inflation. I recall two young men in one class. Both were part-time bouncers, both struggled, worked hard and came on time. They never missed a class or a quiz. Both were dumb as posts, but I loved them. They should have failed, but I weakened (I am a softy at heart) and I passed them. These guys were not isolated cases, not for me, and not for many, many other professors I know.

Voting Complete

My study for guessing who will win the presidential election is now closed. It went well until one gentleman, who called himself a “godless liberal”, posted the survey on his blog. He called me a “right wing” blogger, and this somehow gave permission for his readers to go nuts and stuff the ballot box with all kinds of nonsense, despite my pleas that they behave like good citizens. The good news is that I know who these “voters” are, so I’ll be able to remove them from the answers.

I can’t, of course, post results from the actual survey until after the election. I might say some things about the ballot box stuffers before then.