Reader Kip Hansen asks, “Can you please run a brief explanation of what the Mexican Hat fallacy is statistically?”
I can. The Mexican Hat Fallacy, or Falacia Sombrero, is when a man moves from sunny to cloudy climes, such as when a hombre shifts from Veracruz to Seattle, and thus believes he no longer need wear a hat. This is false. A gentleman always wears a hat—not a baseball cap—not just because it regulates heat and keeps you dry, but because it completes any outfit.
Well, that’s the best joke I could come up with.
The term was coined by Herren von Storch and Zwiers in the book Statistical Analysis in Climate Research and it came about like this: Some fellows were wandering in the Arizona desert sin sombreros and came upon the curious rock formation pictured above (image source).
One fellow said to the other, “Something caused those rocks to resemble a sombrero.” The other fellow, more sun-stroked than the first, disagreed, “No, no thing was its cause. That’s my null.” Quoting from a paper by Herr Gerd Bürger (because I had never heard of this fallacy before):
By collecting enough natural stones from the area and comparing them to the Mexican Hat [formation], one would surely find that the null hypothesis ‘the stone is natural’ is quite unlikely, and it must be rejected in favor of human influence. In view of this obvious absurdity [von Storch and Zwiers] conclude: ‘The problem with these null hypotheses is that they were derived from the same data used to conduct the test. We already know from previous exploration that the Mexican Hat is unique, and its rarity leads us to conjecture that it is unnatural.’ A statistical test of this kind ‘can not [sic] be viewed as an objective and unbiased judge of the null hypothesis.’
Which leads me to (hilarious) joke number two. There are two kinds of people: those who find null hypotheses a useful philosophical concept, and those who don’t. This description is confusing—but then so ultimately are most stories about “null” hypotheses.
If the “fallacy” merely means that the closeness of model fit to observed data is not necessarily demonstrative of model truth, then I am with them (this is why p-values stink). You can always (as in always) find a model which fits data arbitrarily well—as psychoanalytic theory does human behavior, for example—but that does not mean the model/theory is true. Good fit is a necessary but not sufficient condition for model/theory truth. A (nearly) sufficient condition is if the model/theory predicts data not yet known, or not yet used (never used in any way) to fit or construct or posit the model/theory—as psychoanalytic theory does not well predict new behavior.
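A minimal sketch of the fit-versus-prediction point, in Python with only the standard library. Everything in it is an assumption made for illustration: the straight-line "truth", the noise level, and the names `lagrange_predict`, `line_fit`, and `average_errors`. A degree-9 polynomial is forced through ten noisy points, so it fits the data used to build it perfectly, and is then scored on points it never saw.

```python
import random


def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique polynomial passing through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def line_fit(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def average_errors(trials=200, noise=1.0, seed=1):
    """Average error on the fitted points and on unseen points, over many trials."""
    rng = random.Random(seed)
    truth = lambda x: 2.0 * x + 1.0           # assumed "true" process (hypothetical)
    fit_x = [float(i) for i in range(10)]     # points used to build the models
    new_x = [i + 0.5 for i in range(9)]       # points never used for fitting
    sums = {"poly_fit": 0.0, "poly_new": 0.0, "line_fit": 0.0, "line_new": 0.0}
    for _ in range(trials):
        fit_y = [truth(x) + rng.gauss(0.0, noise) for x in fit_x]
        new_y = [truth(x) + rng.gauss(0.0, noise) for x in new_x]
        poly = lambda x: lagrange_predict(fit_x, fit_y, x)
        line = line_fit(fit_x, fit_y)
        sums["poly_fit"] += mse(poly, fit_x, fit_y)
        sums["poly_new"] += mse(poly, new_x, new_y)
        sums["line_fit"] += mse(line, fit_x, fit_y)
        sums["line_new"] += mse(line, new_x, new_y)
    return {name: total / trials for name, total in sums.items()}


if __name__ == "__main__":
    for name, value in average_errors().items():
        print(name, round(value, 3))
```

The interpolating polynomial has zero error on the points it was built from, yet on average it should do much worse than the humble straight line on the new points. Perfect fit, poor prediction.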
The parenthetical “nearly” is there to acknowledge that, in most cases, we are never (as in never) 100% certain an empirical model/theory is true. But we can be pretty sure. Thus we do not say “It is 100% certain evolutionary theory is true,” but we can say, “It is nearly certain evolutionary theory is true.”
So much is Stats 101. Yet I’m still perplexed by Bürger-von Storch-Zwiers’s example. If we “already know from previous exploration” that the Mexican Hat formation was caused by (say) weathering, then collecting rocks from nearby isn’t useful—unless one wants to play King of the Hill. And what does “comparing” these rocks to the formation mean? Should the individual stones resemble the formation in some way for the formation to be “natural”? The rocks nearest will be made of the same material as the formation, so this is no help.
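One possible reading of "comparing", offered only as a sketch and not as what von Storch and Zwiers or Bürger actually computed: give every stone a single hat-likeness score, let all the scores come from the same "natural" distribution, and call the highest scorer the Mexican Hat. The score, the standard normal distribution, and the names `upper_tail`, `hat_experiment`, and `rejection_rates` are all assumptions for illustration. The naive test treats the Hat as if it had been picked before looking at the data; the second p-value accounts for the fact that it was picked because it was the most extreme of the lot.

```python
import math
import random


def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def hat_experiment(n_stones, rng):
    """Generate purely 'natural' stones, then test the most hat-like one."""
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_stones)]
    hat = max(scores)  # the hypothesis is suggested by the data itself
    # Naive p-value: pretends the Hat was singled out in advance.
    naive_p = upper_tail(hat)
    # Selection-aware p-value: chance that the largest of n_stones natural
    # scores is at least this extreme.
    aware_p = 1.0 - (1.0 - naive_p) ** n_stones
    return naive_p, aware_p


def rejection_rates(n_stones=1000, n_trials=500, alpha=0.05, seed=0):
    """Fraction of all-natural worlds in which each test rejects 'natural'."""
    rng = random.Random(seed)
    naive_rejections = 0
    aware_rejections = 0
    for _ in range(n_trials):
        naive_p, aware_p = hat_experiment(n_stones, rng)
        naive_rejections += naive_p < alpha
        aware_rejections += aware_p < alpha
    return naive_rejections / n_trials, aware_rejections / n_trials


if __name__ == "__main__":
    naive_rate, aware_rate = rejection_rates()
    print("naive test rejects 'natural' in", naive_rate, "of trials")
    print("selection-aware test rejects in", aware_rate, "of trials")
```

Every stone here is natural by construction, yet the naive test should reject "natural" in essentially every trial, while the selection-aware version should reject at a rate close to `alpha`.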
Regarding the possible causes or hypotheses of formation, they are infinite. It is we who pick which to consider. It could be, for example, that we’ll soon see a History Channel “documentary” which claims ancient Egyptians were flown to Arizona by aliens under the guidance of angels to build the Sombrero so that the Hopi could use it in a religious ceremony that was eventually secretly used by Hitler in his bid to conquer the USSR.
Let’s call this the “null” hypothesis. Why not? The “null” is ours to choose, so it might as well be something juicy. I bet if we link this around that, given the ingenuity of internet denizens, within a week we would have enough corroborative evidence for it to satisfy any NPR listener.
Speaking of hats, if you’re looking for a genuine Panama to cool your pate in the summer months, may I recommend Panama Hats Direct? I get nothing for this endorsement, except the satisfaction of helping this fine company stay in business. (If this is your first, go for the $95 sub fino. It is a fantastic deal.)
The Null Hypothesis is exactly that: the hypothesis that some parameter is zero (null). For example, that (μ-n)=0, (μ1-μ2)=0, or that ρ=0. Or that long-term process variation is zero (“in control”). This is parallel to the legal standard that the accused is assumed innocent unless guilt is proven beyond reasonable doubt. The great thing about stats is that you can know how reasonable the doubt is. But that is also why Scottish juries are best: they bring a verdict of Proven/Not Proven. Either you reject the null hypothesis (proven) or you don’t (not proven). But (not proven) doesn’t mean (innocent). It just means you have not proven guilt. Not rejecting the null hypothesis does not mean the null hypothesis is true. It means you have no grounds for rejecting it.
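A rough simulation of the Not Proven point, under assumptions chosen purely for illustration: a two-sided z-test of the null μ=0 with known standard deviation 1, a true mean of 0.2 (so the null is false by construction), and the hypothetical helper names `z_test_p_value` and `rejection_rate`.

```python
import math
import random


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # 2 * P(Z >= |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_mu, n, trials=2000, alpha=0.05, seed=0):
    """How often the test rejects H0: mean == 0 when the true mean is true_mu."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, 1.0) for _ in range(n)]
        if z_test_p_value(sample, 0.0, 1.0) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("n = 10:  ", rejection_rate(0.2, 10))
    print("n = 1000:", rejection_rate(0.2, 1000, trials=200))
```

With ten observations the false null should survive most of the time (Not Proven, but hardly innocent); with a thousand it should almost always be rejected.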
Our two wandering archeologists do not understand null. Since archeologists deal in artifacts, perhaps they are predisposed to see them where they are not. I would hate to see them as prosecutors in a legal case.
Enough tequila will make any hypothesis null, thus your Falacia Sombrero……..
YOS,
That’s historically so: the value of some parameter being zero. And your argument from tradition is a good one. But since you can write models with almost any parameterization, you could always write one so that the parameter you want equal to zero is said to correspond to that which you do or do not want to be so. Hence the History Channel model could be a “null”, as could any other scenario.
See what madness happens when you abandon tradition?
Besides if the hypothesis (μ-model prediction)=0 is rejected, it likely means the model is wrong. (Box would be astonished…(Fe)) But if it is not rejected, it doesn’t mean the model is correct. The Ptolemaic model made valid predictions for well over two millennia before it was finally falsified.
Briggs,
Your statement “in most cases, we are never (as in never) 100% certain an empirical model/theory is true” seems to have too many qualifiers, and one is too many, to be useful. “In most cases” is barely distinguishable from “in some cases”. All you have eliminated is “no cases” and “all cases”.
William Sears,
Yeah, badly written. I’ll clear it up in the series on Subjective vs. Objective Bayes (coming soon to a browser near you).
BTW, I’ll see your sombrero and raise you Lord Krishna’s Butter Ball.
https://www.google.com/search?q=krishna%27s+butter+ball&client=firefox-a&hs=Nrf&rls=org.mozilla:en-US:official&tbm=isch&tbo=u&source=univ&sa=X&ei=oSmFUfm-G6fO0wGqpIHgDA&ved=0CEAQsAQ&biw=1252&bih=548
YOS,
I think that I see Sisyphus in one of those pictures.
Matt, Thank you for answering my question on the Mexican Hat fallacy.
In this discussion, the idea of “null hypothesis” seems to have captured more attention than the original point–>the MHF.
Matt, please correct me if I am wrong here, but I thought the main idea was expressed in this bit –> “The problem with these null hypotheses is that they were derived from the same data used to conduct the test. We already know from previous exploration that the Mexican Hat is unique, and its rarity leads us to conjecture that it is unnatural.” In so much as the MODEL is built from the data and then the data is used to “prove” the model is correct/true. The model is an arbitrary, subjective “explanation” [one could use hypothesis, but I have avoided it as it has stirred up the commenters] built from a similarly arbitrary, subjective set (number of sets) of data.
An example in Climate Science is the selection of GST (global surface temperature, a time series of fictitious data) and the Keeling Curve (atmospheric CO2 concentration in ppm) to form the AGW hypothesis and, literally, climate models based on this hypothesis.
I was fascinated by your example of psychoanalysis as a “model” for human behavior.
I have extended this idea to the concept that all cultic movements are built on a charismatic leader wielding a powerful (if wildly eccentric) mental/spiritual model formed from commonly observed human behavior–which model is the ‘true secret’ to the perplexities of human life–and which is demonstrably proven to be true by referring back to the self-same observed human behavior from which it was built. Since the foundation is actually everyday behaviors with an overlay of explanation (“the secret”), many people will believe that the belief system (model) is true, no matter how bizarre, because anyone can see the “data” on which it is based is true.
This “how to form a cultic belief” uses the same method as CliSci — everyday experience (some people are bothered by mental images of the past and talking about them can help) plus The Magic Secret (whatever the cult’s basis is — images from the past have vast power over the psyche of man and must be exorcised by [the charisma of] His Holiness-The Master).
Model formed from data (plus arbitrarily selected explanation — which can be another “correlated” data set) and data used to prove the model.
Thanks Matt.
I am also having a problem with the idea of a null hypothesis, but for a different reason. The thing one should want to prove is that a prediction made by a certain hypothesis does not match the experimental data. The reason is that this is the only thing one can prove. There is no null hypothesis here, only competing hypotheses. Saying that a hypothesis is falsified is not the same as saying that its negation is proven true, because we are not interested in the values we will not measure, we want to know what we will measure. If the prediction is that we see 110, and the prediction is not measured, we want to know what we are going to measure. -1000? 1000000000000010? 12334567890? Etc for all the numbers not equal to 110.
If the null hypothesis is a figure of speech, it is a bad one as it promotes fuzzy thinking. The null hypothesis of man-made global warming is not natural climate variations. There is instead a long list of competing hypotheses with very specific values for the exact influence of CO2 on climate, with each family of CO2-hypotheses with a very specific influence having an even longer list of other influencers (water vapor, methane, soot, whatever). There is no null hypothesis here.
A Null Hypothesis is not something you pull out of your ass anytime you feel like to… That is actually the definition for a Bayesian Prior distribution ^.^
Mexican Hat is in Utah
Not sure what “derived” means here. Think of the textbook example of a jury trial given by YOS: the null hypothesis (Ho) of innocence is not derived or set up by the data/evidence. The defendant’s guilt beyond a reasonable doubt is supposed to be derived from the data/evidence.
We really don’t know how each jury decides whether to reject Ho. In the probability notation used by Burger,
(1) a jury may reason that P(Ho|d), given data/evidence, the probability that the accused is innocent, is extremely unlikely.
(2) a jury may reason that P(d|Ho), if the accused is innocent, the probability of observing the presented evidence against the accused, is extremely unlikely.
Based on my reading, Burger criticizes (2), and claims that (1) should be the way to go. However, he doesn’t explain how it can be done. Criticizing without offering a solution is not scholarly.
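For what it is worth, the textbook route from (2) to (1) is Bayes’ theorem, which needs two extra ingredients: a prior for Ho and the probability of the evidence under the alternative. The sketch below is not Burger’s method; the function name `posterior_h0` and all the numbers are hypothetical, and it assumes only two hypotheses, Ho (innocent) and H1 (guilty).

```python
def posterior_h0(p_d_given_h0, p_d_given_h1, prior_h0):
    """P(Ho | d) by Bayes' theorem, for two exhaustive hypotheses Ho and H1."""
    prior_h1 = 1.0 - prior_h0
    evidence = p_d_given_h0 * prior_h0 + p_d_given_h1 * prior_h1
    return p_d_given_h0 * prior_h0 / evidence


if __name__ == "__main__":
    # The evidence is unlikely under Ho (0.01) but nearly as unlikely under H1.
    print(posterior_h0(p_d_given_h0=0.01, p_d_given_h1=0.02, prior_h0=0.5))
    # Same P(d|Ho), but the evidence is very likely under H1.
    print(posterior_h0(p_d_given_h0=0.01, p_d_given_h1=0.9, prior_h0=0.5))
```

In the first case P(d|Ho) is only 0.01, yet P(Ho|d) is one in three; in the second, the same P(d|Ho) gives a P(Ho|d) of about 0.011. A small value in (2) does not by itself settle (1).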
We all know about the steps: form a hypothesis, collect data, analyze data and make inferences. A model is built to analyze the data, but there is no guarantee that a null hypothesis will be rejected. Some psychologists/sociologists complain that many studies are not published due to lack of statistical significance.
JH, unfortunately for many defendants, the P(guilt|trial) in the eyes of the typical juror is close to 1.
As much as I would like to wear a gentleman’s hat, I have not seen an adequate solution to the “hat-hair problem,” from which I suffer. Perhaps in the good old days men avoided this problem by slathering their hair in oil?
Kip,
Nothing philosophically wrong with generating hypotheses from observations. Indeed, it may be the only way. Often, because of the nature of people, generated hypotheses will fit the observations well. But that does not mean these hypotheses/models/theories will make good predictions for data as-yet-unknown.
A friend asked that I explain this bit from my earlier comment:
“An example in Climate Science is the selection of GST (global surface temperature, a time series of fictitious data) and the Keeling Curve (atmospheric CO2 concentration in ppm) to form the AGW hypothesis and, literally, climate models based on this hypothesis.”
GST: global surface temperature, a time series of fictitious data. The term “fictitious data” is not meant to imply that the data is made up out of thin air or necessarily faked. It only means that it is not in any way directly measured, in and of itself, yet is derived from thousands of other data sets through complex and oft-times (as in always) controversial statistical methods.
Epidemiology is filled with these types of “fictitious data sets” as well, where some biological marker is used as a direct substitute for a desired or not-desired outcome.
An objective method is to apply William Dembski’s Explanatory Filter. See FAQ: How do we Detect Design?
e.g. apply the Explanatory Filter to the Mexican Hat formation and to Mount Rushmore.
Differing natural causes also need to be recognized and distinguished. e.g.,
Report: Wind, not water, formed mysterious mound on Mars