
Category: Philosophy

The philosophy of science, empiricism, a priori reasoning, epistemology, and so on.

May 3, 2008 | 23 Comments

Stats 101: Chapter 1

UPDATE: If you downloaded the chapter before 6 am on 4 May, please download another copy. An older version contained fonts that were not available on all computers, causing it to look like random gibberish when opened. It now just looks like gibberish.

I’ve been laying aside a lot of other work, and instead finishing some books I’ve started. The most important one is (working title only) Stats 601, a professional explanation of logical probability and statistics (I mean the modifier to apply to both fields). But nearly as useful will be Stats 101, the same sort of book, but designed for a (guided or self-taught) introductory course in modern probability and statistics.

I’m about 60% of the way through 101, but no chapter except the first is ready for public viewing. I’m not saying Chapter 1 is done, but it is mostly done.

I’d post the whole thing, but it’s not easy to do so because of the equations. Those of you who use Linux will know of latex2html, which is a fine enough utility, but since it turns all equations into images, documents don’t always end up looking especially beautiful or easy to work with.

So below is a tiny excerpt, with all of Chapter 1 available at this link. All questions, suggestions for clarifications, or queries about the homework questions are welcome.

Logic

1. Certainty & Uncertainty

There are some things we know with certainty. These things are true or false given some evidence or just because they are obviously true or false. There are many more things about which we are uncertain. These things are more or less probable given some evidence. And there are still more things of which nobody can ever quantify the uncertainty. These things are nonsensical or paradoxical.

First I want to prove to you there are things that are true, but which cannot be proved to be true, and which are true based on no evidence. Suppose some statement A is true (A might be shorthand for “I am a citizen of Planet Earth”; writing just ‘A’ is easier than writing the entire statement; the statement is everything between the quotation marks). Also suppose some statement B is true (B might be “Some people are frightfully boring”). Then this statement: “A and B are true”, is true, right? But also true is the statement “B and A are true”. We were allowed to reverse the letters A and B and the joint statement stayed true. Why? Why doesn’t switching make the new statement false? Nobody knows. It is just assumed that switching the letters is valid and does not change the truth of the statement. The operation of switching does not change the truth of statements like this, but nobody will ever be able to prove or explain why switching has this property. If you like, you can say we take it on faith.
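
If you like, you can also check the switching rule mechanically; here is a minimal sketch in Python that simply runs over every possible truth assignment to A and B:

    # A minimal sketch: for every truth assignment to A and B,
    # "A and B" has the same truth value as "B and A".
    from itertools import product

    for A, B in product([True, False], repeat=2):
        assert (A and B) == (B and A)
    print("Switching the letters never changed the truth of the joint statement.")

Of course, this only shows that the rule holds in every case we can list; it does not explain why it holds, which is exactly the point made above.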

That there are certain statements which are assumed true based on no evidence will not be surprising to you if you have ever studied mathematics. The basis of all mathematics rests on beliefs which are assumed to be true but cannot be proved to be true. These beliefs are called axioms. Axioms are the base; theorems, lemmas, and proofs are the bricks which build upon the base using rules (like the switching statements rule) that are also assumed true. The axioms and basic rules cannot, and can never, be proved to be true. Another way to say this is, “We hold these truths to be self-evident.”

Here is one of the axioms of arithmetic: For all natural numbers x and y, if x = y, then y = x. Obviously true, right? It is just like our switching statements rule above. There is no way to prove this axiom is valid. From this axiom and a couple of others, plus acceptance of some manipulation rules, all of mathematics arises. There are other axioms (two, actually) that define probability. Here, due to Cox (1961), is one of those axioms: The probability of a statement on given evidence determines the probability of its contradictory on the same evidence. I’ll explain these terms as we go.
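
In symbols, the rule this axiom eventually justifies (the familiar complement rule, given here only as a preview, not as part of the excerpt) is

Pr(not-A | E) = 1 - Pr(A | E)

that is, once the probability of a statement A on evidence E is fixed, the probability of its contradictory on the same evidence is fixed as well.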

It is the job of logic, probability, and statistics to quantify the amount of certainty any given statement has. An example of a statement which might interest us: “This new drug improves memory in Alzheimer patients by at least ten percent.” How probable is it that that statement is true given some specific evidence, perhaps in the form of a clinical trial? Another statement: “This stock will increase in price by at least two dollars within the next thirty days.” Another: “Marketing campaign B will result in more sales than campaign A.” In order to specify how probable these statements are, we need evidence, which usually comes in the form of data. Manipulating data to provide coherent evidence is why we need statistics.

Manipulating data, while extremely important, is in some sense only mechanical. We must always keep in mind that our goal is to make sense of the world and to quantify the uncertainty we have in given problems. So we will hold off on playing with data for several chapters until we understand exactly what probability really means.

2. Logic

We start with simple logic. Here is a classical logical argument, slightly reworked:

All statistics books are boring.

Stats 101 is a statistics book.
_______________________________________________
Therefore, Stats 101 is boring.

The structure of this argument can be broken down as follows. The two statements above the horizontal line are called premises; they are our evidence for the statement below the line, which is the conclusion. We can use the words “premises” and “evidence” interchangeably. We want to know the probability that the conclusion is true given these two premises. Given the evidence listed, it is 1 (probability is a number between, and including, 0 and 1). The conclusion is true given these premises. Another way to say this is the conclusion is entailed by the premises (or evidence).

You are no doubt tempted to say that the probability of the conclusion is not 1, that is, that the conclusion is not certain, because, you say to yourself, statistics is nothing if not fun. But that would be missing the point. You are not free to add to the evidence (premises) given. You must assess the probability of the conclusion given only the evidence provided.

This argument is important because it shows you that there are things we can know to be true given certain evidence. Another way to say this, which is commonly used in statistics, is that the conclusion is true conditional on certain evidence.

(To read the rest, Chapter 1 is available at this link.)

February 29, 2008 | 10 Comments

The tyranny and hubris of experts

Today, another brief (in the sense of intellectual content) essay, as I’m still working on the Madrid talk, the Heartland conference is this weekend, and I have to, believe it or not, do some work my masters want.

William F. Buckley, Jr. has died, God rest his soul. He famously said, “I’d rather be governed by the first 2000 names in the Boston phone book than by the dons of Harvard.” I can’t usefully add to the praise of this great man that has begun appearing since his death two days ago, but I can say something interesting about this statement.

There are several grades of pine “2 by 4’s”, the studs that make up the walls and ceilings of your house. Superior grades are made for exterior walls; lesser grades are useful for external projects, such as temporary bracing. A carpenter would never think of using a lesser grade to build your roof’s trusses, for example. Now, if you were to run into a Home Depot and grab the first pine studs you came to (along with the book How to Build a Wall), thinking you could construct a sturdy structure on your own, you might be right. But you’re more likely to be wrong. So you would not hesitate to call in an expert, like my old dad, either to advise you of the proper materials or to build the thing himself.

Building an entire house, or even just one wall, is not easy. It is a complicated task requiring familiarity with a great number of tools, knowledge of various building techniques and materials, and near memorization of the local building codes. But however intricate a carpenter’s task is, we can see that it is manageable. Taken step by step, we can predict to great accuracy exactly what will happen when we, say, cut a board a certain way and nail it to another. In this sense, carpentry is a simple system.

There is no shortage of activities like this: for example baking, auto mechanics, surgery, accounting, electronic engineering, and even statistics. Each of these diverse occupations is similar in the sense that when we are plying that trade, we can pull a lever and we usually or even certainly know which cog will engage and therefore what output to expect. That is, once one has become an expert in that field. If we are not an expert and we need the services of one of these trades, we reach for the phone book and find somebody who knows what he’s doing.

But there are other areas which are not so predictable. One of these is governance, which is concerned with controlling and forecasting the activity and behavior of humans. As everybody knows, it is impossible to reliably project what even one person will do on a consistent basis, let alone say what a city or country full of people will be like in five years. Human interactions are horribly, unimaginably complex and chaotic, and impossible to consistently predict.

Of course, not everyone thinks so. There is an empirically-observed relationship that says the more institutionalized formal education a person has, the more likely it is that that person believes he can predict human behavior. We call these persons academics. These are the people who make statements (usually in peer-reviewed journals) like, “If we eliminate private property, then there will be exact income equality” and “We can’t let WalMart build a store in our town because WalMart is a corporation.” (I cleaned up the language a bit, since this is a PG-rated blog.)

It is true, and it is good, that everybody has opinions on political matters, but most people, those without the massive institutionalized formal education, are smart enough to realize the true value of their opinions. Not so the academics, who are usually in thrall to a theory whose tenets dictate that if you pull this one lever, this exact result will always obtain. Two examples, “If we impose a carbon tax, global warming will cease” and “If the U.S.A. dismantles its nuclear weapons, so too will the rest of the world, which will then be a safer place.”

Political and economic theories are strong stuff and even the worst of them is indestructible. No amount of evidence or argument can kill them because they can always find refuge among the tenured. The academics believe in these theories ardently and often argue that they should be given the chance (because they are so educated and we are not) to implement them. They think, quite modestly of course, that because they are so smart and expert they can decide what is best for those not as smart and expert. Their hero is Plato, who desired a country run by philosophers, the best of the best thinkers. In other words, people like them.

The ordinary, uneducated man is more likely to just want to be left alone in most matters and would design his laws accordingly. He would in general opt for freedom over guardianship. He is street-smart enough to know that his decisions often have unanticipated outcomes, and is therefore less lofty in his goals. And this is why Buckley would choose people from the phone book rather than from the campus.

February 20, 2008 | 45 Comments

An excuse I hadn’t thought of

A few weeks ago I speculated about what would happen if human-caused significant global warming (AGW) turned out to be false. There might be a number of people who would refuse to give up on the idea, even though it is false, because their desire that AGW be true would be overwhelming.

I guessed that these people would slip into pseudoscience, and so would need to generate excuses why we have not yet seen the effects of AGW. One possibility was human-created dust (aerosols) blocking incoming solar radiation. Another was “bad data”: AGW is true, the earth really is warmer, but the data somehow are corrupted. And so on.

I failed to anticipate the most preposterous excuse of all. I came across it while browsing the excellent site Climate Debate Daily, which today linked to Coby Beck’s article “How to Talk to a Global Warming Sceptic”. Beck gives a list of arguments typically offered by “skeptics” and then attempts to refute them. Some of these refutations are good, and worth reading.

His attempt at rebutting the skeptical criticism “The Modelers Won’t Tell Us How Confident the Models Are” furnishes us with our pseudoscientific excuse. The skeptical objection is

There is no indication of how much confidence we should have in the models. How are we supposed to know if it is a serious prediction or just a wild guess?

and Beck’s retort is

There is indeed a lot of uncertainty in what the future will be, but this is not all because of an imperfect understanding of how the climate works. A large part of it is simply not knowing how the human race will react to this danger and/or how the world economy will develope. Since these factors control what emissions of CO2 will accumulate in the atmosphere, which in turn influences the temperature, there is really no way for a climate model to predict what the future will be.

This is as lovely a non sequitur as you’re ever likely to find. I can’t help but wonder if he blushed when he wrote it; I know I did when I read it. This excuse is absolutely bulletproof. I am in awe of it. There is no possible observation that can negate it. Whatever happens is a win for its believer. If the temperature goes up, the believer can say, “Our theories predicted this.” If the temperature goes down, the believer can say, “There was no way to know the future.”

What the believer in this statement is asking us to do, if it is not already apparent, is this: he wants you to believe that his prognostications are true because AGW is true, but he also wants you to believe that he should not be held accountable for his predictions should they fail because AGW is true. Thus, AGW is just true.

Beck knows he is on thin ice, because he quickly tries to get his readers to forget about climate forecasts and focus on “climate sensitivity”, which is some measure showing how the atmosphere reacts to CO2. Of course, whatever this number is estimated to be means absolutely nothing about, has no bearing on, is meaningless to, is completely different than, is irrelevant to the context of, the performance of actual forecasts.

It is also absurd to claim that we cannot know “how the human race will react” to climate change while (tacitly or openly) simultaneously calling for legislation whose purpose is to knowingly direct human reactions.

So, if AGW does turn out to be false, those who still wish to believe in it will have to work very hard to come up with an excuse better than Beck’s (whose work “has been endorsed by top climate scientists”). I am willing to bet that it cannot be done.

February 18, 2008 | 37 Comments

Statistics’ dirtiest secret

The old saying that “You can prove anything using statistics” isn’t true. It is a lie, and a damned lie, at that. It is an ugly, vicious, scurrilous distortion, undoubtedly promulgated by the legion of college graduates who had to suffer, sitting mystified, through poorly taught Statistics 101 classes, and never understood or trusted what they were told.

But, you might be happy to hear, the statement is almost true and is false only because of a technicality having to do with the logical word prove. I will explain this later.[1]

Now, most statistics texts, even advanced ones, if they talk about this subject at all, tend to cover it in vague or embarrassed passages, preferring to quickly return to more familiar ground. So if you haven’t heard about most of what I’m going to tell you, it isn’t your fault.

Before we can get too far, we need some notation to help us out. We call the data we want to predict y, and if we have some ancillary data that can help us predict y, we call it x. These are just letters that we use as place-holders so we don’t have to write out the full names of the variables each time. Do not let yourself be confused by the use of letters as place-holders!

An example. Suppose we wanted to predict a person’s income. Then “a person’s income” becomes y. Every time you see y you should think “a person’s income”: clearly, y is easier to write. To help us predict income, we might have the sex of the person, their highest level of education, their field of study, and so on. All these predictor variables we call x: when you see x, think “sex”, “education”, etc.

The business of statistics is to find a relationship between the y and the x: this relationship is called a model, which is just a function (a mathematical grouping) of the data y and x. We write this as y = f(x), and it means, “The thing we want to know (y) is best represented as a combination, a function, of the data (x).” So, with more shorthand, we write a mathematical combination, a function of x, as f(x). Every time you see a statistic quoted, there is an explicit or implicit “f(x)”, a model, lurking somewhere in the background. Whenever you hear the term “Our results are statistically significant”, there is again some model that has been computed. Even just taking the mean implies a model of the data.
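
Here is a minimal sketch of that idea in Python (the numbers, the single predictor “years of education”, and the straight-line form of f(x) are all made up purely for illustration):

    import numpy as np

    # Hypothetical data, for illustration only.
    # x: years of education; y: income, in thousands of dollars.
    x = np.array([10, 12, 14, 16, 18, 20])
    y = np.array([28, 33, 41, 47, 55, 62])

    # One possible choice of f(x): a straight line b0 + b1*x,
    # with the constants b0 and b1 estimated by least squares.
    b1, b0 = np.polyfit(x, y, deg=1)

    def f(x_new):
        return b0 + b1 * x_new

    print(f(15))  # the model's guess at income for 15 years of education

Notice that choosing the straight line was itself a decision; a different choice of f(x) would have produced a different guess.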

The problem is that usually the function f(x) is not known and must be estimated, guessed at in some manner, or logically deduced. But that is a very difficult thing to do, so nearly all of the time the mathematical skeleton, the framework, of f(x) is written down as if it were known. The f(x) is often chosen by custom or habit or because alternatives are unknown. Different people, with the same x and y, may choose different f(x). Only one of them, or neither of them, can be right; they cannot both be.

It is important to understand that all results (like saying “statistically significant”, computing p-values, confidence or credible intervals) are conditional on the chosen model being true. Since it is rarely certain that the model used was true, the eventual results are stated with a certainty that is too strong. As an example, suppose your statistical model allowed you to say that a certain proposition was true “at the 90% level.” But if you are only, say, 50% sure that the model you used is the correct one, then your proposition is only true “at the 45% level,” not at the 90% level, which is, of course, an entirely different conclusion. And if you have no idea how certain your model is, then it follows that you have no idea how certain your proposition is. To emphasize: the uncertainty in choosing the model is almost never taken into consideration.
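
Here is that arithmetic as a quick sketch (the 90% and 50% figures are the ones from the example above; the sketch ignores any chance that the proposition happens to hold under some other model):

    # Probability the proposition is true given that the chosen model is right.
    p_prop_given_model = 0.90

    # Probability that the chosen model is in fact the right one.
    p_model = 0.50

    # Fold the model uncertainty in.
    p_prop = p_prop_given_model * p_model
    print(p_prop)  # 0.45, not 0.90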

However, even if the framework, the f(x), is known (or assumed known), certain numerical constants, called parameters, are still needed to flesh out the model skeleton (if you’re fitting a normal distribution, these are the μ and σ^2 you might have heard of). These must be guessed, too. Generally, however, everybody knows that the model’s parameters must be estimated. What you might not know is that the uncertainty in guessing the parameter values also has to carry through to statements of certainty about data propositions. Unfortunately, this is also rarely done: most statistical procedures focus on making statements about the parameters and virtually ignore actual, observable data. This again means that people come away from these procedures with an inflated sense of certainty.

If you don’t understand all this, especially the last part about parameters, don’t worry: just try to keep in mind that two things happen: a function f(x) is guessed at, and the parameters, the numerical constants, that make this equation complete must also be guessed at. The uncertainty of performing both of these operations must be carried through to any conclusions you make, though, again, this is almost never done.
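
For readers who want to see the difference concretely, here is a minimal sketch under the usual normal-model assumptions and with made-up numbers: an interval for the parameter (the mean) next to an interval for an actual new observation, which must also carry the spread of the data itself.

    import numpy as np
    from scipy import stats

    # Hypothetical measurements, for illustration only.
    data = np.array([9.8, 10.4, 9.5, 10.9, 10.1, 9.7, 10.6, 10.2])
    n = len(data)
    xbar, s = data.mean(), data.std(ddof=1)
    t = stats.t.ppf(0.975, df=n - 1)

    # 95% interval for the parameter (the unknown mean) only.
    param = (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))

    # 95% interval for a new observable data point: wider, because it
    # carries both the parameter uncertainty and the spread of the data.
    predictive = (xbar - t * s * np.sqrt(1 + 1 / n),
                  xbar + t * s * np.sqrt(1 + 1 / n))

    print(param)       # a statement about the parameter: narrow
    print(predictive)  # a statement about observable data: noticeably wider

Most procedures report only the first kind of interval, which is one reason results come out sounding more certain than they should.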

These facts have enormous and rarely considered consequences. For one, it means that nearly all statistics results that you see published are overly boastful. This is especially true in certain academic fields where the models are almost always picked as the result of habit, even enforced habit, as editors of peer-reviewed journals are suspicious of anything new. This is why—using medical journals as an example—one day you will see a headline that touts “Eating Broccoli Reduces Risk of Breast Cancer,” only to later read, “The Broccolis; They Do Nothing!” It’s just too easy to find results that are “statistically significant” if you ignore the model and parameter uncertainties.

These facts, shocking as they might be, are not quite the revelation we’re after. You might suppose that there is some data-driven procedure out there, known only to statisticians, that would let you find both the right model and the right way to characterize its parameters. It can’t be that hard to search for the overall best model!

It’s not only hard, but impossible, a fact which leads us to the dirty secret: For any set of y and x, there is no unconditionally unique model, nor is there any unconditionally unique way to represent uncertainty in the model’s parameters.

Let’s illustrate this with respect to a time series. Our data is still y, but there is no specific x, or explanatory data, except for the index, or time points (x = time 1, time 2, etc.), which of course are important in time series. All we have is the data and the time points (understand that these don’t have to be clock-on-the-wall “time” points, just numbers in a sequence).

Suppose we observe this sequence of numbers (a time series)

y = 2, 4, 6, 8; with index x = 1, 2, 3, 4

Our task is to estimate a model y = f(x). One possibility is Model A

f(x) = 2x

which fits the data perfectly, because x = 1, 2, 3, 4 and 2x = 2, 4, 6, 8 which is exactly what y equals. The “2” is the parameter of the model, which here we’ll assume we know with certainty.

But Model B is

f(x) = 2x |sin[(2x+1)π/2]|

which also fits the data perfectly (don’t worry if you can’t see this—trust me, it’s an exact fit; the “2”s, the “1” and the “π” are all known-for-certain parameters).

Which of these two models should we use? Obviously, the better one; we just have to define what we mean by better. Which model is better? Well, using any—and I mean any—of the statistical model goodness-of-fit measures that have ever, or will ever, be invented, both are identically good. Both models explain all the data we have seen without error, after all.

There is a Model C, Model D, Model E, and so on and on forever, all of which will fit the observed data perfectly and so, in this sense, will be indistinguishable from one another.
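
You can check the perfect fits directly; a minimal sketch:

    import numpy as np

    x = np.array([1, 2, 3, 4])
    y = np.array([2, 4, 6, 8])

    def model_a(x):
        # Model A: f(x) = 2x
        return 2 * x

    def model_b(x):
        # Model B: f(x) = 2x |sin[(2x+1)*pi/2]|
        return 2 * x * np.abs(np.sin((2 * x + 1) * np.pi / 2))

    print(model_a(x))  # [2 4 6 8] -- matches y exactly
    print(model_b(x))  # [2. 4. 6. 8.] -- matches y too (up to floating point)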

What to do? You could, and even should, wait for more data to come in, data you did not use in any way to fit your models, and see how well your models predict these new data. Most times, this will soon tell you which model is superior, or if you are only considering one model, it will tell you if it is reasonable. This eminently common-sense procedure, sadly, is almost never done outside the “hard” sciences (and not all the time inside these areas; witness climate models). Since there are an infinite number of models that will predict your data perfectly, it is no great trick to find one of them (or to find one that fits well according to some conventional standard). We again find that published results will be too sure of themselves.

Suppose in our example the new data is y = 10, 12, 14: both Models A and B still fit perfectly. By now, you might be getting a little suspicious, and say to yourself, “Since both of these models flawlessly guess the observed data, it doesn’t matter which one we pick! They are equally good.” If your goal was solely prediction of new data, then I would agree with you. However, the purpose of models is rarely just raw prediction. Usually, we want to explain the data we have, too.
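
Continuing the sketch above: both models keep fitting the new data, and the only places they come apart are points we never observe, such as between the time indices.

    import numpy as np

    def model_a(x):
        return 2 * x

    def model_b(x):
        return 2 * x * np.abs(np.sin((2 * x + 1) * np.pi / 2))

    new_x = np.array([5, 6, 7])
    print(model_a(new_x))  # [10 12 14]
    print(model_b(new_x))  # [10. 12. 14.] -- still indistinguishable

    # The models only disagree off the observed index points, e.g. at x = 1.5:
    print(model_a(1.5), model_b(1.5))  # 3.0 versus (essentially) 0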

Models A and B have dramatically different explanations of the data: A has a simple story (“time times 2!”) and B a complex one. Models C, D, E, and so on, all have different stories too. You cannot just pick A via some “Occam’s razor”[2] argument, meaning A is best because it is “simpler”, because there is no guarantee that the simpler model is always the better model.

The mystery of the secret lies in the word “unconditional”, which was a necessary word in describing the secret. We can now see that there is no unconditionally unique model. But there might very well be a conditionally correct one. That is, the model that is unique, and therefore best, might be logically deducible given some set of premises that must be fulfilled. Suppose those premises were “The model must be linear and contain only one positive parameter”; then Model B is out and can no longer be considered. Model A is then our only choice: we do not, given these premises, even need to examine Models C, D, and so on, because Model A is the only function that fills the bill; we have logically deduced the form of Model A given these premises.

It is these necessary external premises that help us with the explanatory portion of the model. They are usually such that they demand the current model be consonant with other known models, or that the current model meet certain physical, biological, or mathematical expectations. Regardless, the premises are entirely external to the data at hand, and may themselves be the result of other logical arguments. Knowing the premises, and assuming they are sound and true, gives us our model.

The most common, unspoken of course, premise is loosely “The data must be described by a straight line and a normal distribution”, which, when invoked, describes the vast majority of classical statistical procedures (regression, correlation, ANOVA, and on and on). Which brings us full circle: the model and the statements you make based on it are correct given that the “straight line” premise is true; it is just that the “straight line” premise might be, and usually is, false.[3]
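
As a small sketch of what invoking that premise amounts to (the numbers are made up; the point is only that everything computed downstream is conditional on the straight-line-plus-normal-noise premise spelled out in the comments):

    import numpy as np

    # Hypothetical data, for illustration only.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([1.2, 2.1, 2.6, 3.9, 4.8, 5.3])

    # The unspoken premise: y = b0 + b1*x + normal noise.
    # P-values, confidence intervals, and claims of "statistical significance"
    # computed from this fit are all conditional on that premise being true.
    b1, b0 = np.polyfit(x, y, deg=1)
    residuals = y - (b0 + b1 * x)

    print(b0, b1)
    print(residuals)  # the premise says these should look like normal noise;
                      # how often is that actually checked?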

Because there are no unconditional criteria which can judge which statistical model is best, you often hear people making the most outrageous statistical claims, usually based upon some model that happened to “fit the data well.” Only, these claims are not proved, because to be “proved” means to be deduced with certainty given premises that are true, and conclusions based on statistical models can only ever be probable (less than certain and more than false). Therefore, when you read somebody’s results, pay less attention to the model they used and more to the list of premises (or reasons) given for why that model is the best one, so that you can estimate how likely it is that the model that was used is true.

Since that is a difficult task, at least demand that the model be able to predict new data well: data that was not used, in any way, in developing the model. Unfortunately, if you added that criterion to the list of things required before a paper could be published, you would cause a drastic reduction in scholarly output in many fields (and we can’t have that, can we?).

[1] I really would like people to give me some feedback. This stuff is unbelievably complicated and it is a brutal struggle finding simple ways of explaining it. In future essays, I’ll give examples from real-life journal articles.
[2] Occam’s razor arguments are purely statistical and go, “In the past, most simple models turned out better than complex models; I can now choose either a simple or complex model; therefore, the simple model I now have is more likely to be better.”
[3] Why these “false” models sometimes “work” will be the discussion of another article; but, basically, it has to do with people changing the definition of what the model is mid-stream.