Archive for June, 2008

Jun 30 2008

IMS: Citation Indexes Stink

Published by Briggs under Bad statistics

The Institute of Mathematical Statistics (I am a member) has issued a report on the widespread misuse of Citation Statistics.

The full report may be found here.

The non-surprising main findings are:

  • Statistics are not more accurate when they are improperly used; statistics can mislead when they are misused or misunderstood.
  • The objectivity of citations is illusory because the meaning of citations is not well-understood. A citation’s meaning can be very far from “impact”.
  • While having a single number to judge quality is indeed simple, it can lead to a shallow understanding of something as complicated as research. Numbers are not inherently superior to sound judgments.

The last point is not just relevant to citation statistics, but applies equally well to many areas, such as (thanks to Bernie for reminding me of this) trying to quantify “climate sensitivity” with just one number.

More findings from the report:

  • For journals, the impact factor is most often used for ranking. This is a simple average derived from the distribution of citations for a collection of articles in the journal. The average captures only a small amount of information about that distribution, and it is a rather crude statistic. In addition, there are many confounding factors when judging journals by citations, and any comparison of journals requires caution when using impact factors. Using the impact factor alone to judge a journal is like using weight alone to judge a person’s health.
  • For papers, instead of relying on the actual count of citations to compare individual papers, people frequently substitute the impact factor of the journals in which the papers appear. They believe that higher impact factors must mean higher citation counts. But this is often not the case! This is a pervasive misuse of statistics that needs to be challenged whenever and wherever it occurs.
  • For individual scientists, complete citation records can be difficult to compare. As a consequence, there have been attempts to find simple statistics that capture the full complexity of a scientist’s citation record with a single number. The most notable of these is the h-index, which seems to be gaining in popularity. But even a casual inspection of the h-index and its variants shows that these are naive attempts to understand complicated citation records. While they capture a small amount of information about the distribution of a scientist’s citations, they lose crucial information that is essential for the assessment of research.
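To see why a simple average says so little about a distribution of citations, here is a minimal sketch. The citation counts are made up for illustration and are not taken from the report, and a real impact factor is computed over a specific window of years rather than over ten hand-picked articles.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal (made-up numbers).
citations = [0, 0, 1, 1, 2, 2, 3, 4, 7, 180]

avg = mean(citations)      # the impact-factor-style average
mid = median(citations)    # what a "typical" article in the journal received
below_avg = sum(c < avg for c in citations)

print(f"average = {avg}, median = {mid}, articles below the average = {below_avg} of {len(citations)}")
```

One heavily cited article carries the whole average, which is why the journal's number is a poor stand-in for the citation count of any particular paper in it.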

I can report that many in medicine fixate and are enthralled by a journal’s “impact factor”, which is, as the report says, a horrible statistic—with an awful sounding name. The “h index” is “the largest n for which he/she has published n articles, each with at least n citations.”
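The definition quoted above can be sketched in a few lines. The function name `h_index` and the two citation records are my own illustration, not anything from the report.

```python
def h_index(citations):
    """Largest n such that n papers each have at least n citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h

# Two hypothetical records with very different shapes.
steady = [5, 5, 5, 5, 5]
one_hit = [1000, 5, 5, 5, 5]

print(h_index(steady), h_index(one_hit))
```

Both records get the same h-index, even though one of them contains a paper cited a thousand times. That is the kind of lost information the report complains about.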

Naturally, now that we statisticians have weighed in on the matter, we can expect a complete stoppage in the usage of citation statistics.

6 responses so far

Jun 24 2008

Variant on a theme

Published by Briggs under Philosophy

We, dear readers, have earlier dealt with the nonsensical argument, of which an Enlightened few are excessively fond, “There is no truth.” This is an argument often employed by those who embrace the idea that “all cultures are of equal value.” It is also commonly found, if sometimes not expressly stated, in academia in the “humanities”.

But the argument is ridiculously absurd and paradoxical, and in the same class as the 2,600-year-old Epimenides paradox (Epimenides was a Cretan who said, “All Cretans are liars.”). A paradox, incidentally, is a man-made creation that stands in the way of a man-made theory gaining full acceptance. When a paradox arises, it implies, logically, that the theory that gave rise to it is flawed and should be modified or abandoned. But, usually, the theory is so beautiful or desirable that every possible effort is made to do away with the paradox (typically by calling it a “Problem” or ignoring it). The philosopher David Stove has brilliantly written about this in his book The Rationality of Induction.

Now, if we rationally believe the argument “There is no truth”, it must mean the argument is true. And if the argument is true, then the statement “there is no truth” is false because we at least believe the argument is true. Which of course means there is truth, so the argument is fallacious. Or nonsensical, actually. In other words, anybody who makes the argument and is convinced by it is making a grievous error or acting foolishly. This is bad news for those who theorize that human thought creates truth, or “truth” as they normally write it. From Stove again: writing “true” does not mean true, but only “believed to be true by so and so”, a definition as far from true as you can get.

Very well. Few actually utter the exact words “There is no truth”, probably because some internal B.S. detector senses something has gone awry. But there are, in common parlance, phrases which are entirely equivalent to “There is no truth.” Let’s look at one of them.

“Don’t be all judgmental”, is a phrase often heard immediately after you have pointed out that some behavior on the part of another was wrong or mistaken. Or it can be found in a simple example like this: you walk by a booth selling tie-dyed shirts and you say, “Those shirts are hideously ugly” and the booth owner says “Some people are so judgmental” which carries the implication that “being judgmental is wrong.”

The presupposition is that passing judgment on somebody’s “lifestyle” (for those who do not speak psychobabble, this means the English word behaviors) is an activity which is forbidden. It follows immediately that when the person says to you “Don’t be all judgmental” they are in fact passing judgment on your behavior. In other words, they are “being all judgmental.” It is, therefore, impossible not to pass judgment. I do not mean “impossible” in the colloquial sense of “unlikely”, but in the logical sense of “certainly cannot be no matter what.”

[UPDATE–thanks Nick!:] This is true whether tie-dyed shirts really are hideous or whether my comment was solicited (it was) or not, or whether the thought remains a thought and is forever unspoken. It might be, of course, that offering an unsolicited comment aloud is in poor taste, but it might also be that it is useful in the sense of discouraging aberrant behavior, such as that displayed by street vendors hawking ridiculous looking clothing.

So the next time somebody says to you “Don’t be all judgmental” you ask them “Aren’t you passing judgment on me?” Then get ready for a blank stare.

6 responses so far

Jun 23 2008

Ithaca update: hours and dogs as presidential candidates

Published by Briggs under Fun, Politics

The Ithaca Hours, mentioned in the previous post, quantify a barter system, trading “hours worked” at one task for equivalent “hours worked” at another. For example, you might trade one “hour” of “Cranio-sacral therapy, energy healing” for 10 hours of “Speaking & consulting on non-violent symbolic action.” Most services on offer are on the order of “Gentle Reiki energy sessions for health and growth” and “movement coaching.” Some ordinary retail shops accept hours, but only for a small percentage of your overall bill. The Hours themselves have the logo “In Ithaca We Trust”, an expression the egotism of which I trust is obvious enough. The hours are, naturally, printed on hemp paper.

As I understand it, and here I might be off, trade, even though conducted in “Hours”, must still be ultimately accounted for in greenbacks for tax purposes. “Hours” received are treated as ordinary income. Which, if true, makes the system truly worthless. But enlightened, and certainly enjoyable because, as their website says, it’s “fun to get and use something other than dollars (remember how much you enjoyed or still enjoy using monopoly money).” Thus, spending “Hours” is a form of play, though I find it odd that they would tout the game Monopoly, which is a game that teaches and celebrates capitalism.

The Ithaca Festival was this weekend on the Commons. This is a typical summer outdoor festival with arts & crafts and music. I counted no fewer than four booths that featured tie-dyed clothing, perhaps the ugliest form of body covering ever invented.

I went into a t-shirt shop (to find for my number two son a shirt emblazoned with “Ithaca Gun”, a now-defunct company that was justly famous for their shotguns) and some middle-aged ladies were discussing the upcoming election. “I’d vote for a dog before I’d vote for a republican!” said one. “I’d vote for a parakeet before I’d vote for McCain,” said another. “I can’t see why anybody would ever vote for a republican,” quipped the last.

The only thing strange about these commonplace comments is that they imply that the democrat party, lacking candidates of substance, will soon nominate animals to their tickets.

7 responses so far

Jun 21 2008

But you must hate us!

Published by Briggs under Politics

I am in Ithaca, New York, teaching a short course at Cornell University. Have you ever visited Ithaca? It was once voted the “most enlightened city in America” by the far-left magazine Utne Reader. Plenty of Volvos with “Impeach Bush” bumper stickers on them, a score of Tibetan bead shops in a desolate downtown area called the Commons, its own home-grown currency called “Hours” which is supposed to be more politically correct than greenbacks, and so on.

I was in a popular bar called the Chapter House (fantastic beer selection) and met a gentleman from England who was at Cornell taking a course from a well-known labor educator. This gentleman’s flight back home was canceled because of a thunderstorm. He is a union organizer for the Transit Workers in London. We had a nice chat over a few beers.

The bartender found out that my new friend was from England and asked him, “You must hate us over there.” By “us” he meant “Americans.” My friend said “No, we generally like Americans.” The bartender refused to accept this. “But you must hate us. Look at everything we have done!” My friend’s reply: “I was happy to come here. America is a great place.”

(By “we”, I assume the bartender did not include himself.)

This went back and forth a few times, my friend even describing a trip to Walmart to buy inexpensive jeans. The bartender lost heart and gave up. I felt sorry for him. There was nobody around to confirm his feelings of inferiority or to show him that he was not hated as he hoped he would be.

So the next time you are in Ithaca, please stop and tell somebody how much you dislike them. It will be sure to cheer them up.

11 responses so far

Jun 17 2008

Please don’t let them do it

Published by Briggs under Fun

You will have by now heard that some are advocating the use of “instant replay” in baseball. The, for lack of a better word, entities pushing for this realize its nefarious implications, and so suggest the video tape be referenced only for disputed home run calls.

Please, God, do not let them do it.

I used to enjoy watching American football when I was a boy. Two things destroyed my pleasure in this sport. The first, and most obvious, is the increasing non-stop blather from the sportscasters, now crammed three or four to a booth. These guys never know when to shut up. Worse, broadcast colleagues in baseball thought that they should get in on the act and not just call the game, but analyze every triviality. Now, instead of great announcers like Ernie Harwell and Phil Rizzuto—gentlemen who knew when to shut up and let us hear the relaxing sounds of the ballpark—we have corporate types with “communications degrees” endlessly uttering profundities like “This game isn’t over, Jim.”

This would have been tolerable in football if it weren’t for the second degrading change: The instant replay. Games now drag by as referees, doubtless worried their calls might be challenged, gather at the end of nearly every play to have a little chat about what just happened. And then there is the ridiculous spectacle of a coach prancing up to the sidelines to delicately toss a little red flag on the field when he feels piqued. It is a pathetic thing to see.

I predict that not too many years from now, the game of football will have evolved so that each team’s rosters are supplemented by attorneys (both offensive and defensive ones, naturally). At the conclusion of each play, the lawyers will charge the field to dispute the play—challenging the outcome on the grounds of insanity, income disparity, etc.—to be settled by a jury of tennis fans (who presumably will not prejudice the outcome). Some plays will be so contentious that they will end up in court. It will eventually take years to finish a “season” as the courts become backlogged with cases.

Please do not let this happen to baseball. Umpires, like MBA business executives who think of things like instant replay, make mistakes, but so what. You will get over a bad call. The instant replay some say makes good “business sense” because “so much is at stake.” Nonsense. It is only a game and it is meant to be entertaining.

It will suck the life out of baseball, interrupt the natural flow, and make watching the games more of a chore than a pleasure.

5 responses so far

Jun 12 2008

Peer-reviewed research: Men find looking at nearly naked women distracting

Published by Briggs under Bad statistics, Fun

Nothing is true unless it has been demonstrated and published in a peer-reviewed journal. For example, until last week, many people suspected that when men look at nearly or completely naked women, they tend to be distracted. Anybody who believed that was foolish to do so because it had never been “scientifically” proven.

If they did believe it, they probably did so based on their academically-discredited intuitions. Amateurs.

But scientific researchers Bram Van den Bergh, Siegfried Dewitte, and Luk Warlop have finally lent scientific credibility to the popular belief, which we are now free to label as “scientific.” These researchers published their stunning findings in the June 2008 issue of the Journal of Consumer Research. The journal article was summarized in a newspaper report here.

The title of their article is “Bikinis Instigate Generalized Impatience in Intertemporal Choice.” Their abstract follows:

Neuroscientific studies demonstrate that erotic stimuli activate the reward circuitry processing monetary and drug rewards. Theoretically, a general reward system may give rise to nonspecific effects: exposure to “hot stimuli” from one domain may thus affect decisions in a different domain. We show that exposure to sexy cues leads to more impatience in intertemporal choice between monetary rewards. Highlighting the role of a general reward circuitry, we demonstrate that individuals with a sensitive reward system are more susceptible to the effect of sex cues, that the effect generalizes to nonmonetary rewards, and that satiation attenuates the effect.

If you cannot read this, do not worry, for it is not written in English, but in academese, a language which frequently borrows English words but changes their meanings, and which otherwise has no similarity to plain English. Luckily for you, dear reader, I have been trained in academese and can translate the abstract for you:

When men look at naked women, their brains get excited and they have thoughts of getting lucky. When men see naked women, they get distracted and cannot concentrate on the tasks at hand. When we showed a group of men pictures of nearly naked women, they lost patience with a betting game we tried playing with them. The hornier the men were the less they were interested in our game, and in anything else we had to say. After a while, the men got bored of looking at the same women and wanted to move on.

As I said, this is ground-breaking research as it brings to light relationships of men to naked women never before suspected.

Rumor has it the three researchers, who are from Belgium, plan on studying the effects of increasing dosages of the C2H5OH molecule on men’s perception of female attractiveness. I, for one, cannot wait to find out.

4 responses so far

Jun 10 2008

Bill Clinton’s “Pump Head”

Published by Briggs under Bad statistics

I have never, and will never, read Vanity Fair. Given our culture is already saturated, more mindless celebrity tittle tattle written by besotted suck-ups I do not need. So I missed the piece on Bill Clinton that suggested he might have suffered from a malady called “pump head”, brought on by his heart surgery.

Melinda Beck, at the Wall Street Journal, wrote an article on this subject today (I have no idea how long that link will be good) which alerted me to the topic.

When surgeons cut a guy open to chop away at his heart, they usually stop it from beating (presumably, this makes it less slippery). They then hook up a machine, a pump, to oxygenate and circulate the patient’s blood. Some people are concerned that the machine, which is certainly necessary, causes harm, usually mental degradation, to those patients who live through the surgery. Lots of mechanisms have been proposed which might cause this harm, but there is no agreement or even direct evidence that any of them actually do cause harm.

“Pump head”, not to put too fine a point on it, is bunk.

The first “diagnosing” of this strange malady came from a series of experiments that gave people before- and after-surgery mental exams. The researchers found that a certain proportion of people scored worse on the after-surgery tests, which confirmed the idea that people get dumber after having been on the pump.

To show this, they created a conglomeration of the tests that were given using a dicey statistical technique called “factor analysis,” a method with which it is far too easy to generate spurious results. But even given that this method was applied properly and conservatively, there is still a large, glaring error in these analyses.

It is true that some people scored worse on the conglomeration-test after surgery. This is the sole evidence for “pump head.” But it is also true that some people scored better! In fact, the same exact proportion of people who scored worse, scored better. This means you could just as easily write a paper suggesting open-heart surgery as a method to boost IQ!

The problem was that the original researchers never bothered to look for people who scored better, only those who scored worse; they only examined those patients who looked like what they hoped they would look like, that is, those who seemed to get dumber.

What’s really going on is nothing more than the banal phenomenon of “regression to the mean.” If you take a test, some days you will do better, other days worse. Everybody has a natural background variability. Now, if you do score high one day, chances are that the next time you take the test, you will achieve only your average performance. Same thing if you first tested low: next time, you’re likely to improve.

If you look at a bunch of people who take the test, and create two groups, one with those who scored high and another with those who scored low, and then later re-test both groups, the high group will show lower scores on the re-test, and the low group will show higher scores. It is impossible for the situation to be other than this.

This phenomenon is a boon to researchers who want to prove spurious effects, because, as I said, it is impossible for it not to manifest itself. You can prove the efficacy of or show the potential harm of absolutely any therapy this way.
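A small simulation makes the point. It is a sketch under assumptions of my own: each person has a stable ability, each test score is that ability plus independent noise, and nothing at all happens between the two tests. The normal distributions, their parameters, and the seed are arbitrary choices for illustration, not anything from the original studies.

```python
import random

random.seed(1)
n = 10_000

# Assumed model: score = stable ability + independent day-to-day noise.
ability = [random.gauss(100, 10) for _ in range(n)]
before = [a + random.gauss(0, 10) for a in ability]   # first test
after = [a + random.gauss(0, 10) for a in ability]    # re-test, no treatment in between

worse = sum(a2 < a1 for a1, a2 in zip(before, after)) / n
better = sum(a2 > a1 for a1, a2 in zip(before, after)) / n

# Split people by their FIRST score, then look at how each group changed.
order = sorted(range(n), key=lambda i: before[i])
low, high = order[: n // 10], order[-(n // 10):]

def avg_change(group):
    """Average (re-test minus first test) score over the people in `group`."""
    return sum(after[i] - before[i] for i in group) / len(group)

print(f"scored worse: {worse:.2f}, scored better: {better:.2f}")
print(f"average change, lowest tenth on first test:  {avg_change(low):+.1f}")
print(f"average change, highest tenth on first test: {avg_change(high):+.1f}")
```

Under this model about half the people score worse and half score better, the low group appears to improve, and the high group appears to decline. Report only the people who got worse and you have a malady; report only those who got better and you have a cure.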

So pump head, so far as it has been demonstrated in tests like these, is nonsense.

This means that Bill Clinton is probably no dumber now than he was before.

4 responses so far

Jun 06 2008

Publisher needed: Stats 101

Published by Briggs under Uncategorized

I’ve been looking around on various publishers’ websites over the past few weeks to see which of them might take Stats 101 off my hands. I have also been considering bringing it out myself, like my other bestseller, but would rather avoid that.

Here is an overview of (tentative title) Stats 101: Real Life Probability and Statistics in Plain English in case anybody knows a publisher.

I have successfully used this (draft) text in several introductory, typically one-semester, courses, and will do so again this summer at Cornell in the ILR school. It is meant for the average student who will only take one or two courses in statistics and who must, above all, understand the results from statistical models yet will not do much calculating on their own. Examples come from various fields such as business, medicine, and the environment. No jargon is used anywhere except when absolutely necessary. The book has also been used for self-study.

Many books claim to be a “different” way of teaching introductory statistics, yet when you survey the texts the only thing that changes are the names of the examples, or whether boxplots are plotted vertically or horizontally.

Not this book. This is the only volume that emphasizes objective Bayesian probability from the start to the finish. It is the only one that stresses what is called “predictive” statistics. I do not mean forecasting. Predictive statistics focuses on the quantification of actual observable, physical data. This book teaches intuitive statistics.

Nearly all of classical statistics and much of Bayesian statistics concentrate their energies on making statements about the parameters of probability models. The student will learn these methods in “Stats 101”, too. But what the other books will not do is to put the knowledge of parameters in perspective. Concentrating solely on parameters makes you too confident and gives the student a misleading picture of the true uncertainty in any problem.

Hardly any equations appear in the book. Only those that are strictly necessary are given. The soul of this book is on understanding, which is crucial for students who will not become statisticians (it’s crucial for the latter group, too, but they will seek out more math). Pictures, instead of confusing formulae, are used whenever possible.

All computations are done in R, and are presented in easy-to-follow recipes. An appendix of R commands leads the students through several common examples. No calculations are done by hand and the student is never asked to look up information in some obscure table. I have also set up a book website where the data used can be downloaded.

There are 15 chapters plus the aforementioned appendix. The book starts, unlike any other statistics book except Jaynes’s advanced Probability Theory, with logic. This easy introduction intuitively leads to (discrete) probability. After that, three chapters lead up to the binomial and normal distributions emphasizing their duty in quantifying uncertainty in observable data. Building intuition is stressed. These chapters are followed by two others on R and on real-life data manipulation (all at a very basic level, presented in a very realistic, plain spoken manner).

Chapter 8 introduces classical and Bayesian (unobservable) parameter estimation. Chapter 9 brings us back from parameter estimation to observables. The true purpose of building a probability model is to quantify the uncertainty of data not yet seen, yet no book (except very advanced monographs like Geisser’s Predictive Statistics) ever mentions this.
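To give a flavor of the difference between parameters and observables, here is a minimal sketch, not the book's own code (the book uses R). It assumes a binomial model with a uniform Beta(1, 1) prior on the unobservable success probability; the data (7 successes in 10 trials) and the function names are made up for illustration. It compares the predictive distribution for 10 new observations with the "plug-in" distribution that pretends the parameter is known to be exactly 0.7.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Log of the Beta function, via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def predictive_pmf(j, m, k, n, a=1.0, b=1.0):
    """P(j successes in m future trials | k successes in n past trials),
    assuming a Beta(a, b) prior on the success probability (beta-binomial)."""
    a_post, b_post = a + k, b + n - k
    return comb(m, j) * exp(log_beta(j + a_post, m - j + b_post) - log_beta(a_post, b_post))

def variance(pmf):
    """Variance of a distribution over 0, 1, ..., len(pmf) - 1."""
    mu = sum(j * p for j, p in enumerate(pmf))
    return sum((j - mu) ** 2 * p for j, p in enumerate(pmf))

k, n, m = 7, 10, 10   # hypothetical: 7 successes seen in 10 trials, predict the next 10

pred = [predictive_pmf(j, m, k, n) for j in range(m + 1)]
plug_in = [comb(m, j) * (k / n) ** j * (1 - k / n) ** (m - j) for j in range(m + 1)]

print(f"variance of plug-in binomial: {variance(plug_in):.2f}")
print(f"variance of predictive:       {variance(pred):.2f}")
```

The predictive distribution is noticeably wider than the plug-in one, because it carries the uncertainty about the parameter through to the data not yet seen. Treating the parameter estimate as the whole story is one way of being too confident.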

Chapters 10 and 11 go over classical and Bayesian testing, and again bring everything back to practicality and observables.

Chapters 12 and 13 introduce linear and logistic models; again classical, Bayesian, and observables methods are given.

The most popular chapter by far is 14, which is “How to Cheat” (at statistics). It is in the nature of Huff’s well known How to Lie with Statistics, but brought up to date and has many examples of how easy it is to manufacture “significant” results, particularly if you use classical methods.

Finally, the last chapter gives a philosophical overview of modern, observable statistics, and ties everything together.

Each chapter has homework questions, and I am working on an answer guide now, which I imagine can be published separately. Most homework problems, especially in the chapters on statistics, have the students gather, prepare, and analyze their own data, which works wonders for their understanding.

There is a division, and sometime animosity, that splits our field along classical and Bayesian lines. This book adds a third division by taking the minority position in the Bayesian field. The objective, logical probability camp is small and growing, and is, as I obviously feel, the correct position. Most of us are not in statistics departments, but are in physics, astronomy, meteorology, etc.— fields in which it is not just desired, but necessary to properly quantify uncertainty in real life data. Naturally, we argue that everybody should be interested in observable data, because it is the only data that can be, well, observed.

Because of these ideas, the book is not likely to be adopted as a primary text in many statistics classes; at least, not right away. However, I have had interest from professors, especially Bayesians, who would like to use it as a supplementary text. Other professors in computer science, physics, astronomy etc. would use it directly. It’s about 200 pages in a standard trade paperback format, ideal for an optional or secondary text.

Lastly, statistics professors themselves will form a likely audience. They will not necessarily teach from the book (not all professors, obviously, teach introductory classes), but will use it as a source for a clear introduction to logical probability and non-parameter statistics. This is a new and growing area and there is a clear benefit to being first.

8 responses so far

Jun 05 2008

How to Cheat: Stats 101 Chapter 14

Published by Briggs under Good Statistics, Philosophy

Here is the link.

I’ve decided to jump ahead a few chapters. Chapters 10 - 13 are very important and cover material that comprises 90% of all the actual statistics that is practiced by civilians. Topics like “testing” and regression—how they are done in classical and Bayesian statistics, why these methods are too sure of themselves, and why observable statistics is the only proper way.

But I can tell that I am testing the patience of my audience, so I will leave these more technical chapters for the book itself.

Thus, here I return to something eminently practical: HOW TO CHEAT WITH STATISTICS.

It is important these days for people to know how to get away with as much as they possibly can. This chapter shows you how to do it.

There are no cheap methods like data fudging or just plain lying—those techniques are for pikers. No: what I give you is genuine, sophisticated gold. Tricks you can actually use and get away with. Tricks that work.

I must be out of my mind to give these secrets away for free, but it is a measure of how much I love you, my audience, my faithful readers.

Only an excerpt is in this posting. To get the whole Chapter, you’ll have to download it. Here is the link.

2. Conditioning

A typical academic study is one, say, that gathers two groups of college kids, maybe about 50 in each set, and has them do some task or asks them to rate something. Another study gathers data from a small area, say a neighborhood in a city, where the sample size may be as high as a few hundred, and asks sociological and economic questions of the people that live there. A medical study might try two treatments in two groups of a hundred or so people. When the data from these studies are in, the results are compiled and papers are published. Claims are made in these papers. The college kids paper will say that people act one way and not another; the city paper will say that poor people have less money; and the medical paper will claim treatment A is better than treatment B.

We already know that if all these researchers wanted to do was to say something about their datasets, then they do not need statistics or probability models. They can look at their data and say, yes, more people got better under treatment A than under treatment B. They would be finished. Evidently, the creators of these studies do not want to make statements only about the past data; they want to imply their findings are more widely applicable.

By far the majority of these kinds of studies, published in academic journals, concern humans. As of this writing, there are over 6.6 billion humans alive, about 100 billion are dead, and God only knows how many more are yet to live. Incidentally, whatever you do, do not mention these facts in your results (unless, of course, you happen to be writing about demography); it will weaken your argument.

Are the results from the college kids study applicable to all humans? All those that lived in the past, those that will live in the future, even those that live now but not in the town in which the college lies? Those who are in their 50s? Their 80s? Those who are less than 10? Poorer people and those with enough money to “get a degree”? (Kids go to college to “get a degree” nowadays, and not usually for anything else. Well, maybe socialization. These are rational choices given the way things are.) Kids at other universities? Let’s be clear: the researchers will gather data on their 100 kids, create a probability model, and since they have read this book, they will not just make a statement about the parameters, but calculate the probability distribution of future observables. The only problem is, to whom do we apply this probability distribution?

Before we answer that, think about the medical trial, which was conducted at a hospital in a city on the East Coast of the United States of America. The physicians also use their data to create a probability distribution of future patients. But who exactly are these patients? People who live in other cities on the East Coast? Anywhere in the USA? Canada, too? Or only cities of a certain size? Or do the future patients merely have to “look like” the patients in the old data; that is, be of the same ages, sex ratio, weights, economic condition, have eaten the same things in their lifetimes, traveled to the same places, engaged in the same activities, and so on? Would it have applied to the people who used to be alive, and to people not yet born, indefinitely into the future?

Nobody knows the answers to these questions, which is highly in your favor, especially if you have just completed a study using data “at hand”, that is, that was easy for you to collect. You certainly want to imply that your results are as broadly applicable as possible because this makes you more of an expert than somebody who merely claims to know the habits of a small group of college kids in the year 2008 only, in city C and who are unmarried, between 19 and 22 years old, and whose parents are upper middle class, etc. Openly stressing these limitations might be noble and correct, but it will not get you far. State your results in terms of all people. For example, say “People choose option A over B which gives weight to our theory of psychology.” Do not say, “College kids in our freshman psychology class, who might not be anything like the rest of the population, carried out an experiment for us (and surely they took this task seriously) and…”

Same thing in the medical trial. Emphasize your small p-value, spend more time talking about how the two groups of patients (those that received treatment A and those that got B) were not different than one another. Tell how there were roughly equal numbers of men and women in both treatments, and the same with age, weight, etc. This is an excellent strategy because it is useful information: if the two groups did differ, then your results may be biased. Well, this is a wonderful distraction because it allows you to ignore or downplay the discussion of how your results might only be useful for a small subset of patients.

In short, be sloppy describing the nature of your sample; or, rather, say as much about your sample as you like, but say little or nothing about to whom you expect your results to apply. Certainly imply that all humanity falls under your results. With any luck, a reporter will find your paper and help you along this road by summarizing your results, leaving out all hint of limitation with his headline, “Kumquats reduce risk of toenail cancer.”

5. Publishable p-values

Most journals, say in medicine or those serving fields ending with “ology”, are slaves to p-values. Papers have a difficult, if not impossible, time getting published unless authors can demonstrate for their study a p-value that is publishable, that is, that is less than 0.05. Sometimes, the data are not cooperative and the p-value that you get from using a common statistic is too large to see the light of print. This is bad news, because if you are an academic, you must publish papers else you can’t get grants, and if you don’t get grants, then you do not bring money into your university, and if you don’t bring money into your university, you do not get tenure, and if you do not get tenure, then you are out the door and you feel shame.

So small p-values are important. I of course advise against using classical statistical methods, but if you are forced to (and some journal editors insist on it), then all is not lost if an initial large p-value is found. In fact, I would go so far as to say that if you cannot find a publishable p-value in any situation, then you are not trying hard enough. There are several ways to lower your p-value.

The most well known is to increase your sample size. This one is a lock. Let’s take a look at the t-test statistic from Chapter 10 to see why.

(see the book)

There is a mathematical phrase that begins “without loss of generality” which I now invoke by letting, for ease of notation, $n_A = n_B = n$ and $s_A^2 = s_B^2 = s^2$, so that $t(x)$ becomes

(see the book)

Remember that we want a large statistic, a large t, the larger the better, because larger ts mean smaller p-values. Do you see the trick? A larger n means a larger t! All you have to do is to increase your sample size and just wait for the small p-values to start rolling in. This trick always works in any classical situation, even when the difference $\bar{x}_A - \bar{x}_B$ is too small to be of interest to anybody. This is why having a small p-value is called attaining statistical significance and not practical or useful significance.
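To see the sample-size trick in numbers, here is a minimal Python sketch. It assumes the usual two-sample t statistic under the simplification above (equal group sizes n and a common standard deviation s), which works out to the mean difference divided by s times the square root of 2/n; the book's own formula is not reproduced here, so treat this form as an assumption. The mean difference and standard deviation are made-up values, and the p-value uses a large-sample normal approximation rather than the exact t distribution.

```python
import math

def t_statistic(mean_diff, s, n):
    """Two-sample t statistic assuming equal group sizes n and a common standard deviation s."""
    return mean_diff / (s * math.sqrt(2.0 / n))

def approx_two_sided_p(t):
    """Two-sided p-value from the normal approximation (reasonable only for large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

mean_diff = 0.1  # hypothetical, practically trivial difference in sample means
s = 1.0          # hypothetical common standard deviation

# Hold the difference fixed and let only the sample size grow.
for n in (50, 500, 5000, 50000):
    t = t_statistic(mean_diff, s, n)
    print(f"n = {n:>6}  t = {t:6.2f}  approx p = {approx_two_sided_p(t):.4g}")
```

Nothing about the difference changes from row to row; only n does, and t grows like the square root of n.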

Incidentally, this trick also works in Bayesian statistics in the sense that the posterior distribution of $\mu_A - \mu_B$ will have most probability above or below zero. But it fails miserably in modern observable statistics because a trivial difference in $\mu_A - \mu_B$ won’t make a tinker’s dam worth of difference in the probability distribution of future observables.

The next trick, if you cannot increase your sample size, is to change your statistic. This comes from the useful loophole in classical theory that there is no rule which specifies which statistic you must use in any situation. Thus, through some creativity and willingness to spend time with your statistical software, you can create small p-values where other people see only despair. This isn’t so easy to do in R because you have to know the names of the alternate statistics, but it’s cake in SAS, which usually prints out dozens of statistics in standard cases, which is one reason SAS is worth its exorbitant price. Look around at the advertising brochures of statistical software and you will see that they openly boast of the large number of tests on offer.

For example, for use in “testing differences between proportions”, just off the top of my head I can think of the z statistic, the proportions test with and without correction for continuity (two or three to choose from here), chi-squared test, Fisher’s exact test, McNemar’s test, logistic regression. There are dozens more and teams of academic statisticians constantly add to the pile. Don’t believe it? Here’s a small table of these tests for the TSD/Sex data from Chapter 11.

Test             p-value
Prop test        0.78
Fisher’s         0.70
Logistic Reg.    0.52
chi-squared      0.50
z test           0.49
McNemar’s        0.24

That I was only able to get to 0.24 just means I didn’t try hard enough. Which is the correct p-value? They all are; that’s the beauty of this trick. Not one of these p-values is more “right” than any other one. Each is valid. If all you know is classical statistics, let this knowledge sink in. It should prove to you that p-values are not what you probably thought they were.

For “testing differences between means”, there is the t-test (a couple of versions of this, actually), Wilcoxon test (also called Mann-Whitney), sign tests, Spearman correlation tests, Kendall’s tau, Kruskal-Wallis test, Kolmogorov-Smirnov test, permutation test, Friedman two-way analysis of variance (I’m running out of breath) and many more. Here’s some of those tests for the advertising data:

Test         p-value
Spearman     0.87
Perm.        0.20
t-test       0.19
Wilcox       0.14
Kol.-Smi.    0.08

Nearly there!

Please remember that in this example, like the previous one, the data are the same; the only thing that changes is the classical statistical test.
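The same exercise can be sketched for means. The Python below applies two of the tests listed above to one pair of samples: an exact permutation test on the difference in means, and the Mann-Whitney (Wilcoxon rank-sum) test using its normal approximation. The two samples are invented for illustration and are not the advertising data; the Mann-Whitney function assumes no tied values and applies no continuity correction, so it is a simplification of what statistical packages report.

```python
import itertools
import math

def permutation_p(x, y):
    """Exact two-sided permutation test on the absolute difference in means."""
    pooled = x + y
    n_x, n_y = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    count = extreme = 0
    # Enumerate every way of relabeling n_x of the pooled values as "group x".
    for idx in itertools.combinations(range(len(pooled)), n_x):
        sx = sum(pooled[i] for i in idx)
        diff = abs(sx / n_x - (total - sx) / n_y)
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney test via the normal approximation (assumes no ties)."""
    n_x, n_y = len(x), len(y)
    u = sum(1 for a in x for b in y if a > b)
    mean_u = n_x * n_y / 2
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    return math.erfc(abs(u - mean_u) / sd_u / math.sqrt(2))

# Hypothetical measurements for two groups (no tied values).
group_a = [10.1, 11.4, 9.8, 12.3, 10.9, 11.7, 13.0, 10.5]
group_b = [9.2, 10.0, 9.5, 11.1, 8.7, 10.3, 9.9, 12.6]

print(f"Permutation   p = {permutation_p(group_a, group_b):.3f}")
print(f"Mann-Whitney  p = {mann_whitney_p(group_a, group_b):.3f}")
```

With eight values per group the permutation test enumerates all 12,870 relabelings, so it runs in well under a second; larger samples would need random sampling of relabelings instead.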

The key to this deceit is to never admit what you did. When it comes time to write up your result, boldly and authoritatively state, “We used Johnston’s (Johnston, 1983) frammilax test for differences in means.” Tossing in a citation always cows potential critics; tossing in two or more guarantees editorial acquiescence. Do not tell the reader that you went through a dozen tests to find the lowest p-value. Act as if “Johnston’s test” was what you had in mind all along.

This technique is unavailable in Bayesian or observable statistics. True, you can change your default prior distribution on the parameters or even change the model (see below), but editors in most fields are still suspicious of modern methods and tend to be conservative and will likely insist on a well-known default. There will be more room for creativity in, say, ten years when modern methods become familiar.

Our last option, if you cannot lower your p-value any other way, is to change what is accepted as publishable. So, instead of a p-value of 0.05, use 0.10 and just state that this is the level you consider as statistically significant. I haven’t seen any other number besides 0.10, however, so if your p-value is larger than this the best you can do is to claim that your results are “suggestive” or “in the expected direction.” Don’t scoff, because this sometimes works.

You can really only get away with this in secondary and tertiary journals (which luckily are increasing in number) or in certain fields where the standard of evidence is low, or when your finding is one which people want to be true. This worked for second-hand smoking studies, for example, and currently works for anything negatively associated with global warming.
