Lesson Two Redux: More Mysticism
Statistics as Beauty; Global Warming Miscellany; SATs Biased?; More
Can having a mammogram kill you? How to make decisions under uncertainty.
The answer to the headline question is, unfortunately, yes. The Sunday, 10 February 2008 New York Post reported this sad case of a woman at Mercy Medical Center in New York City. The young woman went to the hospital and had a mammogram, which came back positive, indicating the presence of breast cancer (she also had follow-up tests). Since other members of her family had experienced this awful disease, the young woman opted to have a double mastectomy and to have implants inserted afterwards. All of which happened. She died a day after the surgery.
That’s not the worst part. It turns out she didn’t have cancer after all. Her test results had been mixed up with some other poor woman’s. Had she never had the mammogram in the first place, she would never have made a radical decision based on incorrect test results, and she would not have died. So, yes, having a mammogram can lead to your death. It is no good arguing that this is a rare event—adverse outcomes are not so rare, anyway—because all I asked was whether a mammogram can kill you. One case is enough to prove that it can.
But aren’t medical tests, and mammograms in particular, supposed to be error free? What about prostate exams? Or screenings for other cancers? How do you make a decision whether to have these tests? How do you account for the possible error and potential harm resulting from this error?
I hope to answer all these questions in the following article, and to show you how deciding whether to take a medical exam is really no different than deciding which stock broker to pick. Some of what follows is difficult, and there is even some math. My friends, do not be dissuaded from reading. I have tried to make it as easy to follow as possible. These are important, serious decisions you will someday have to make: you should not treat them lightly.
Decision Calculator
You can download a (non-updated) pdf version of this paper here.
This article will provide you with an introduction and a step-by-step guide of how to make good decisions in particular situations. These techniques are invaluable whether you are an individual or a business.
The results that you’ll read about hold for all manner of examples—from lie detector usefulness, to finding a good stock broker or movie reviewer, to intense statistical modeling, to financial forecasts. But a particularly large area is medical testing, and it is these kinds of tests that I’ll use as examples.
Many people opt for precautionary medical tests—frequently because a television commercial or magazine article scares them into it. What people don’t realize is that these tests have hidden costs. These costs are there because tests are never 100% accurate. So how can you tell when you should take a test?
When is it worth it?
Under what circumstances is it best for you to receive a medical test? When you “just want to be safe”? When you feel, “Why not? What’s the harm?”
In fact, none of these are good reasons to undergo a medical test. You should only take a test if you know that it’s going to give accurate results. You want to know that it performs well, that is, that it makes few mistakes, mistakes which could end up costing you emotionally, financially, and even physically.
Let’s illustrate this by taking the example of a healthy woman deciding whether or not to have a mammogram to screen for breast cancer. She read in a magazine that all women over 40 should have this test “Just to be sure.” She has heard lots of stories about breast cancer lately. Testing almost seems like a duty. She doesn’t have any symptoms of breast cancer and is in good health. What should she do?
What can happen when she takes this (or any) medical test? One of four things:
1. She has cancer, and the test correctly comes back positive.
2. She has cancer, but the test mistakenly comes back negative.
3. She does not have cancer, but the test mistakenly comes back positive.
4. She does not have cancer, and the test correctly comes back negative.
You cannot measure a mean
Stats 101: Chapter 4
This is where it starts to get weird. The first part of the chapter introduces the standard notation of “random” variables, and then works through a binomial example, which is simple enough.
Then come the so-called normals. However, they are anything but. For most people, it will probably be the first time they hear about the strange creatures called continuous numbers. It will be more surprising to learn that not all mathematicians like these things or agree with their necessity, particularly in problems like quantifying probability for real, observable things.
I use the word “real” in its everyday, English sense of something that is tangible or that exists. This is because mathematicians have co-opted the word “real” to mean “continuous”, which in an infinite number of cases means “not real” or “not tangible” or even “not observable or computable.” Why use these kinds of numbers? Strange as it might seem, using continuous numbers makes the math work out easier!
Again, what is below is a teaser for the book. The equations and pictures don’t come across well, and neither do the footnotes. For the complete treatment, download the actual Chapter.
Distributions
1. Variables
Recall that random means unknown. Suppose x represents the number of times the Central Michigan University football team wins next year. Nobody knows what this number will be, though we can, of course, guess. Further suppose that the chance that CMU wins any individual game is 2 out of 3, and that (somewhat unrealistically), a win or loss in any one game is irrelevant to the chance that they win or lose any other game. We also know that there will be 12 games. Lastly, suppose that this is all we know. Label this evidence E. That is, we will ignore all information about who the future teams are, what the coach has leaked to the press, how often the band has practiced their pep songs, which students will fail their statistics course and will thus be booted from the team, and so on. What, then, can we say about x?
We know that x can equal 0, or 1, or any number up to 12. It’s unlikely that CMU will lose or win every game, but they’ll probably win somewhere around 2/3 of them, or 6-10 games. Again, the exact value of x is random, that is, unknown.
Now, if last chapter you weren’t distracted by texting messages about how great this book is, this situation might feel a little familiar. If we instead let x (instead of k; remember, these letters are placeholders, so whichever one we use does not matter) represent the number of classmates you drive home, where the chance that you take any of them is 10%, we know we can figure out the answer using the binomial formula. Our evidence then was EB. And so it is here, too, when x represents the number of games won. We’ve already seen the binomial formula written in two ways, but yet another (and final) way to write it is this:
x | n, p, EB ~ Binomial(n, p).
This (mathematical) sentence reads “Our uncertainty in x, the number of games the football team will win next year, is best represented by the Binomial formula, where we know n, p, and our information is EB.” The “~” symbol has a technical definition: “is distributed as.” So another way to read this sentence is “Our uncertainty in x is distributed as Binomial where we know n, etc.” The “is distributed as” is longhand for “quantified by.” Some people leave out the “Our uncertainty in”, which is OK if you remember it is there, but is bad news otherwise. This is because people have a habit of imbuing x itself with some mystical properties, as if “x” itself had a “random” life. Never forget, however, that it is just a placeholder for the statement X = “The team will win x games”, and that this statement may be true or false, and it’s up to us to quantify the probability of it being true.
In classic terms, x is called a “random variable”. To us, who do not need the vague mysticism associated with the word random, x is just an unknown number, though there is little harm in calling it a “variable,” because it can vary over a range of numbers. However, all classical, and even much Bayesian, statistical theory uses the term “random variable”, so we must learn to work with it.
Above, we guessed that the team would win about 6-10 games. Where do these numbers come from? Obviously, from the knowledge that the chance of winning any game was 2/3 and there’d be twelve games. But let’s ask more specific questions. What is the probability of winning no games, or X = “The team will win x = 0 games”; that is, what is Pr(x = 0|n, p, EB)? That’s easy: from our binomial formula, this is (see the book) ≈ 2 in a million. We don’t need to calculate n choose 0 because we know it’s 1; likewise, we don’t need to worry about 0.67^0 because we know that’s 1, too. What is the chance the team wins all its games? Just Pr(x = 12|n, p, EB). From the binomial, this is (see the book) ≈ 0.008 (check this). Not very good!
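If you’d like to check these two numbers yourself, here is a minimal sketch in Python (the function name `binomial_pmf` is mine, not the book’s) of the binomial formula applied to the football example:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr(win exactly x of n games | chance p of winning each), via the binomial formula."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 12, 2/3  # twelve games, 2 out of 3 chance of winning each

print(binomial_pmf(0, n, p))   # chance of winning no games: about 2 in a million
print(binomial_pmf(12, n, p))  # chance of winning all twelve: about 0.008
```

Note that `comb(n, 0)` and `p**0` really are 1, just as claimed, so the machine is doing no more than the hand calculation.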
Recall we know that x can take any value from zero to twelve. The most natural question is: what number of games is CMU most likely to win? Well, that’s the value of x that makes (see the book) the largest, i.e. the most probable. This is easy for a computer to do (you’ll learn how next Chapter). It turns out to be 8 games, which has about a one in four chance of happening. We could go on and calculate the rest of the probabilities, for each possible x, just as easily.
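That computer search is nothing exotic: compute the probability of every possible x and pick the largest. A sketch, again in Python with my own (hypothetical) names:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr(win exactly x of n games | chance p of winning each)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 12, 2/3
probs = {x: binomial_pmf(x, n, p) for x in range(n + 1)}  # every possible x, 0 through 12
most_likely = max(probs, key=probs.get)                   # the x with the largest probability

print(most_likely)                   # 8 games
print(round(probs[most_likely], 3))  # 0.238, about a one in four chance
```

The dictionary `probs` holds the entire distribution, so “the rest of the probabilities” mentioned above are already there for the asking.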
The most natural question for us is: what is the most likely number of games the team will win? But in pre-computer classical statistics there turns out to be a different natural question, and it has something to do with creatures called expected values. That term turns out to be a terrible misnomer, because we often do not, and cannot, expect any of the values that the “expected value” calculations give us. The reason expected values are of interest has to do with some mathematics that are not of especial interest here; however, we will have to take a look at them because it is expected of one to do so.
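A tiny example shows why “expected value” is a misnomer. For a binomial the expected value is n times p (a standard result, taken on faith here); for our team that happens to land on a possible number of wins, but it usually does not, as a fair die makes plain:

```python
n, p = 12, 2/3
print(n * p)  # 8.0 -- by coincidence, a number of games the team could actually win

# A fair six-sided die exposes the misnomer:
faces = [1, 2, 3, 4, 5, 6]
expected = sum(faces) / len(faces)
print(expected)  # 3.5 -- a value no die will ever show; you can never "expect" it
```

So the “expected value” is a kind of long-run average, not a value anybody should literally expect to see.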