**Update** Hello newcomers. Posts usually close for comments after two weeks. But since this one is getting so many views, I moved it up to re-open comments.

“Here is a column of a couple of dozen numbers. From them, calculate the mean and standard deviation. When you are finished—it should take you a good fifteen to twenty minutes—report back to me.”

So goes the instruction in many, if not most, undergraduate statistics courses across the land. Part of the reason is nostalgia. Professors learned statistics in a certain way; they naturally teach it in that same way. Running through endless examples of plugging numbers into calculators and pressing certain buttons is what they did while growing up, and, by God, what was good enough for them is good enough for their students. So what if the students forget why they’ve done it?
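For the record, the entire fifteen-to-twenty-minute exercise is a few lines of code. A minimal sketch, with an invented column of numbers standing in for the student's data:

```python
# A couple of dozen invented numbers, standing in for the exercise's column.
data = [23.1, 19.4, 21.7, 20.3, 22.8, 18.9, 24.0, 21.2,
        20.7, 19.8, 22.1, 23.5, 21.9, 20.1, 19.2, 22.6,
        21.4, 20.9, 23.3, 18.7, 21.6, 22.4, 20.5, 19.9]

n = len(data)
mean = sum(data) / n
# Sample standard deviation: divide by n - 1, as the textbooks insist.
sd = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5

print(f"mean = {mean:.3f}, sd = {sd:.3f}")
```

The arithmetic is trivial; the question the course should be answering is why anyone divides by n - 1, and what the two numbers are supposed to tell us.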

This inertia is a quirk of human nature and is common in any field of instruction: its limitations are overcome easily by all serious students. Far more restraining, however, are the pernicious effects of the belief that statistics is a branch of mathematics.

Statistics is not math; neither is probability. It is true that math has proven unreasonably effective in understanding statistics, but it is not, as Wigner suggested for the relationship between physics and mathematics, the best, or at least not the sole, language to describe its workings.

That language is philosophical. Just think: statistics’ self-named purpose is to compile evidence to use in quantifying uncertainty in (self-selected) hypotheses. How this evidence bears on the hypotheses may be best described mathematically, but *why* it does so cannot be. Nor does it follow that because statistics uses so much math it *is* math. That would be equivalent to saying that accounting is a branch of mathematics because it, too, rests on multitudinous calculations.

Statistics rightly belongs to epistemology, the philosophy of how we know what we know. Probability and statistics can even be called quantitative epistemology. Our axioms concern themselves with what probability means, that is, with the interpretation of uncertainty. But we abandon those axioms too quickly, choosing instead to follow the path of equations, nearly always skimping on what those equations actually mean.

To master probability and statistics requires mastering a great chunk of math. But we begin to go wrong when we mindlessly apply equations in inappropriate situations because of the allure of quantification. Worse, we routinely reify the mathematics; for example, p-values positively wriggle with life: to most, they are mysterious magic numbers. Equations become a scapegoat: when what was supposed to have been true or likely because of statistical calculation turns out to be false and even ridiculous, the culprits who touted the falsity point the finger of blame at the math.

Philosophy sharpens the mind. It teaches us to recognize and eliminate sloppy thinking and writing, two elements rife in our field. If people spent more time thinking about what they are saying and doing, much error would be reduced or eliminated.

I’ll give just one example. Ask any statistician for the definition of a confidence interval. Chances are overwhelming that he’ll tell you something false. But he’ll believe the falsity, and because of that, he’ll go on using confidence intervals, interpreting them wrongly, and he’ll justify their use because, well, because they are being used. The reason this behavior persists is sloppy writing on the part of textbook writers: flaws which could have been largely eliminated had the authors had some philosophical training.

What is a confidence interval? It is an equation, of course, that will provide you with an interval for your data. It is meant to provide a measure of the uncertainty of a parameter estimate. Now, strictly according to frequentist theory—which we can even assume is true—the *only* thing you can say about the CI you have in hand is that the true value of the parameter lies within it or that it does not. This is a tautology, and therefore it is always true. Thus the CI provides no measure of uncertainty at all: in fact, it is a useless exercise to compute one.
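To see what the frequentist theory does and does not license, it helps to simulate the procedure. A sketch with an invented true parameter and sample size: draw many samples, compute the usual interval each time, and count how often the intervals cover the truth. The "95%" attaches only to the long run of the procedure; the single interval in hand either covers or it does not.

```python
import random
import statistics

random.seed(42)

MU, SIGMA = 10.0, 2.0     # true parameter values, known only to the simulator
N, Z = 30, 1.96           # sample size; normal critical value for "95%"
TRIALS = 10_000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    covered += (m - Z * se) <= MU <= (m + Z * se)

# The long-run fraction hovers near 0.95; any single interval
# either contains MU or it does not.
print(f"coverage over {TRIALS} repetitions: {covered / TRIALS:.3f}")
```

Note that the simulation only works because the simulator, unlike the statistician, can see the true value of MU.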

But ask your neighborhood statistician and you will hear words about “95% confidence”, about “long runs”, about “other experiments”, etc., etc. These poorly chosen phrases are a bar to clear thinking. They make the utterer forget that all he can say is some tautological, and therefore trivial, truth. He has concentrated on the math, making sure to divide by n minus one in the appropriate place, etc., and has not given any time to consider *why* the calculation exists.

Much nonsense in the last century has been promulgated because of sloppy thinking in statistics. It is time to think less about the mathematics and more about the meaning.

*Obviously, there is much more to say: today’s thoughts are just a sketch to help clear my mind and begin a discussion. Meaning, it’s likely I will have fallen prey to my own complaint!*

Categories: Philosophy, Statistics

Amen, Brother! Here in San Antonio, we have an ongoing argument about whether logic (as taught by the philosophers) should be allowed to satisfy the (state-mandated) core mathematics requirement, as statistics does currently. There are, predictably, mathematicians AND statisticians who are vehemently opposed. Curious, I checked out a standard freshman logic text, and what did I find in the final chapters? An epistemologically oriented discussion of basic probability, including Bayes’ Theorem! My take: many more folks need to learn to think LOGICALLY than need to think STATISTICALLY (and professors are as protective of their turf as any politician).

Confidence intervals crack me up. When I teach these to my students I remind them that the word “confidence” has two opposite meanings, and that confidence intervals have much in common with confidence games. If you go the extra mile and explain that a confidence interval is formed by finding the lowest and highest parameter values which will give a half-alpha sized tail probability delimited by your estimate (an exact interval), and that alpha is a small number pulled out of your butt, the mystique tends to vanish. No wonder some scientists simply report precision as the standard error.
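The construction this comment describes (lowest and highest parameter values that give a half-alpha tail probability at the observed count) can be made concrete for a binomial proportion, where it yields the so-called Clopper-Pearson "exact" interval. A stdlib-only sketch, for illustration, with the example counts invented:

```python
from math import comb

def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); monotone increasing in p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def solve(f, target, lo=0.0, hi=1.0, iters=60):
    """Bisection for f(p) = target, with f monotone increasing on [0, 1]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < target else (lo, mid)
    return (lo + hi) / 2

def exact_interval(k, n, alpha=0.05):
    """Lowest and highest p that leave a half-alpha tail at the observed k."""
    # Lower limit: P(X >= k | p_lo) = alpha/2.
    p_lo = 0.0 if k == 0 else solve(lambda p: tail_ge(k, n, p), alpha / 2)
    # Upper limit: P(X <= k | p_hi) = alpha/2, i.e. P(X >= k+1) = 1 - alpha/2.
    p_hi = 1.0 if k == n else solve(lambda p: tail_ge(k + 1, n, p), 1 - alpha / 2)
    return p_lo, p_hi

lo_p, hi_p = exact_interval(7, 20)   # e.g., 7 successes in 20 trials
print(f"exact 95% interval for 7/20: ({lo_p:.4f}, {hi_p:.4f})")
```

Once you see that alpha enters only as a tail area you chose yourself, the mystique does indeed tend to vanish.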

But here’s the real tragedy: WE DON’T TEACH OUR STATISTICS MAJORS, EVEN THE GRAD STUDENTS, ANY OF THIS.

I would distinguish probability from statistics. Probability is math; statistics is not. Probability is deductive; statistics is inductive.

Probability was inspired by applications outside of math, but so was geometry. One could argue that geometry is a branch of physics and not math — I think V. I. Arnold actually said that once — but most mathematicians would disagree. One could also say that probability is a branch of analysis that uses non-standard terminology, calling measurable functions “random variables” etc. This is formally correct, though not particularly helpful in practice. Of course probabilists are thinking about “random” events even though strictly speaking there’s nothing “random” in the formalism of probability.

It may be helpful to bring up the difference between mathematical statistics and statistics. The former is a branch of mathematics inspired by the latter. The situation is analogous to mathematical physics. Mathematical physics is a branch of mathematics inspired by physics and useful to physics, but it’s not physics. The theorems of mathematical physics cannot be empirically proven or disproven, though they can be shown to be adequate or inadequate in describing reality. Likewise the theorems of mathematical statistics are indubitably true. Experience may show that some theorems are useless, but their validity does not rest on their usefulness.

I’ve always wanted to see “reify” used in a sentence, and now I have had that opportunity. For this I thank you. I’ve also always wanted to use it in a sentence myself, and now I’ve finally found a sentence to use it in. I thank you for this as well, and also for your brief mind-clearing statistical interval.

Lots here to agree with. On the one hand, it’s hard to disagree with the statement “Statistics is not Math” when you can analogize it to the statement “Accounting is not Math”.

I think the confusion arises because most people have an intuitive understanding that “Accounting” refers to professional practice that heavily involves Math, but also other things as well (Law, Management, etc.).

Statistics, on the other hand, is thought of by most as a class, or subject, the consideration of which usually causes dread (especially if taught by a certain visiting professor with two first names:-). However, those of us who are practicing statisticians know that, like accounting, there is much more to being effective than just a little applied math.

Anyone familiar with the work of Shewhart, Deming, and even George Box knows that these men saw a much bigger place for practicing statisticians than is typically taught at most universities.

Great topic, I could go on and on.

I think the core of the issue here was stated as:

“poorly chosen phrases are a bar to clear thinking”

Math & language are some of the tools one pulls from one’s bag of tricks to develop a solution. True understanding comes from understanding the underlying physics, social factors, etc. as they apply to a given analysis/problem.

But ALL MUST be integrated — what in the aerospace industry is referred to as a “systems approach.” That’s “logical thinking” with all relevant “tools” integrated to the appropriate degree at the proper place in the analytical process.

But, since communication is intrinsic to analysis at all points along the way, I’d argue that even more important than “logical thinking,” etc. is the ability to both grasp & employ language.

People who confuse arithmetic with math are the same people who confuse standard deviations and averages with statistics, and who think surfing the web is the same thing as computing.

Ken,

The social sciences don’t really have any proven underlying factors, and need things like p-values to sound like they’re onto something when, in fact, none of their premises are testable. Not to mention that p-values and confidence intervals appear in every paper because nowadays no self-respecting paper would omit them (IOW: it’s fashionable). I don’t think anyone in the social sciences really pays attention to those things, and furthermore they assume that no one else does either.

I guess all this somewhat depends on accepted definitions of Statistics and Mathematics. I recognize the ambiguity and illogic in the notion of confidence intervals, but is not some degree of equivalent conceptual fuzziness true in most disciplines that value rigor? Didn’t Gödel kind of prove this?

All,

Very busy today and tomorrow; exams. Will answer all in full later.

But…John: I say there is no difference between probability and statistics. Statistics is just the working of probability problems. And, of course, I would also say that it is possible to show the theorems of mathematical physics false in some instances. Let’s hope we can lure Tom Vonk in on this.

I have 95% confidence that the phrase “Statistician to the Stars!” could usefully be abbreviated “Startistician!”.

Uh uh, it is more than mathematics, imho. Mathematics is an element of Statistics.

The mechanical calculations of statistics (numbers), e.g., standard deviation, should not be emphasized in class. They are busy work, not math. Mathematics is essential to Statistics. Simple examples: why is the standard deviation a useful measure of spread or variability? Why is the Pearson correlation a measure of linear association? Their clever mathematical definitions make them so!

Over-emphasis on math in teaching Statistics might not be helpful in many ways, however, theoretical/mathematical understanding is often required to develop new statistical methodologies.

I am all in favour of clear thinking, but what do we gain from classifying statistics as being one thing rather than another?

“Statistics rightly belongs to epistemology, the philosophy of how we know what we know.”

To me this statement does not clarify very much. Rather it shifts the question into an even more contentious sphere.

What is knowledge? (justified true belief)

What is truth? (correspondence with facts)

What are facts? etc

leading ultimately to a circular set of definitions – as do all the “what is X” questions.

For me, what is important about statistics, math, science etc is their predictive ability, not the taxonomy of knowledge.

The question about the confidence interval should be “what is its predictive power?” not “what is it?”

StephenPickering:

Well said. You have more precisely and clearly said what I was trying to say.

I have long fought this kind of battle.

One of the things I do is industrial modeling, big models with dozens of simultaneous equations and hundreds of variables. I have a small horde of students (I initially spelled that hoard…) that help and to whom I try to inculcate some basic forecasting techniques.

Whilst many come with econometric skills learned by studying Greene et al., I spend all my time with them talking model and equation specification.

Statistics is ultimately always descriptive and can help you gain greater insight into what the statistics are trying to describe. Hence: statistics rightly do belong to epistemology, as one of the tools of the researcher.

The greatest failure of the economics profession is the teaching of statistics without the teaching of understanding real-world economics. I will personally sit down and give any student who comes to me with a wonderfully tested, superbly estimated equation that has nothing to do with reality a serious discussion of what line of work they are suited for. Economists cannot be bean-counters and number-crunchers: they must instead impart insight and understanding into why the economy behaves like it does.

The question about the confidence interval should be “Are you really that confident about your forecast? Willing to put down real money?” and not how to calculate it.

Stephen/Bernie,

Ah, but this is the point, isn’t it? Those three more contentious questions are what statistics is all about. You cannot even begin to answer “What is the CI’s predictive power?” without first having answered those three questions.

Continue the example: the CI says—for your data and your problem—that the true value of the parameter either lies within the interval or it doesn’t. How will you verify this claim? That is, how will you assess the predictive power of the CI? You cannot. Parameters are forever unobservable. You will never know whether the parameter was actually in this interval or not.

But if you have a go at the contentious questions, you can discover a way to talk directly about a model’s predictive power. Indeed, I am always saying this is what all people should do with their models.

Search for “predictive” here for more on this. Or I’ll have more soon.

So stats isn’t math? How come it’s all numbers then!

Yes, it is true that we don’t know whether Casey will hit a home run next at bat. But we can map his past performance with math and present the statistical parameters we generated. What you do with those numbers, including your emotional reactions, is your business. And possibly Casey’s business when he is negotiating his next contract.

What of logic and epistemology? They don’t need stats; stats needs them. Stats can be done illogically, and often is, which is problematical (double entendre!).

Probability is equally dependent on logic, but may also be done without, and often is, etc.

Epistemology is epistemology. It stands alone. The study of the nature of knowledge underpins every other kind of study: scientific, mathematical, artistic, etc.

Then there is epistemic logic, a horse of a different color. Is it math or not? You decide: see the Stanford Encyclopedia of Philosophy here:

http://plato.stanford.edu/entries/logic-epistemic/

As far as predicting the future goes, if it weren’t for the valid foresights of frequentist mathematical statistics, casinos would go broke.
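The casino point is just the long run asserting itself. A toy simulation, assuming an even-money bet on a European wheel (18 winning pockets out of 37; the bet and stake are invented for illustration):

```python
import random

random.seed(1)

P_WIN = 18 / 37            # even-money bet, European roulette
SPINS = 200_000

bank = 0                   # the player's running total, one unit per spin
for _ in range(SPINS):
    bank += 1 if random.random() < P_WIN else -1

edge = -bank / SPINS       # house take per unit staked
print(f"empirical house edge: {edge:.4f}  (theory: {1/37:.4f})")
```

Any single spin is anyone's guess; the casino's business model lives entirely in the long-run average.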

Ronnie,

Those foresights can also be had in logical (objective) probability. In fact, I argue that those insights are logical. Probability may not be separated from logic; it is logic.

Yeah well… if it uses numbers, then it’s math.

Philosophically, all knowledge is probabilistic. It may not, however, always be logical.

PS – How do you like my new handle? I use it when answering the phone: “Hello, this is Ron Number.” Most callers hang up, which relieves me from having to talk to them.

Of course confidence intervals are correct, as applied to a model. If your little t distribution does not apply to reality, too bad.

Mr Briggs said: “Those three more contentious questions are what statistics is all about. You cannot even begin to answer ‘What is the CI’s predictive power?’ without first having answered those three questions.”

Well, as those three questions remain contentious, I suppose you would have to conclude that nothing can yet be said with certainty about the predictive power of CIs. Concerning the predictive power of models, should they not be validated by comparing with data generated after the prediction, rather than by answering the three contentious questions?

The contrast with the physical sciences is stark. They have high predictive power without requiring knowledge of what exists. For example, Newton’s law states: Force = Mass x Acceleration. It enables people to fly rockets to the moon and back. Yet what are force, mass and acceleration?

Force: is what causes mass to accelerate.

Mass: is what is accelerated by force.

Acceleration: is what happens when force acts on mass.

It is not possible to gain greater understanding of Newton’s law by refining those definitions, nor is its predictive power dependent on epistemology. The fact that we cannot give non-circular definitions of the basic concepts of physics doesn’t hinder us in any way from making accurate predictions.

I also tend to the view that statistics is purely descriptive and has no intrinsic predictive power. Instead, the predictive power resides in any hypotheses that are subsequently formulated concerning the mechanisms that were in operation and that were responsible for the data. Statistical analysis is a springboard for the creative process of formulating theories.

Although I fully agree that people should think carefully what they are doing with statistical analysis, I am not convinced that we need to go so far as solving epistemological issues.

I can’t tell if the word “intrinsic” makes a difference, but it is obvious that Statistics is not a fortune teller. Many people think the existing statistical measures of “predictive power” are not up to par. I have a challenge for those people: given whatever information/data we have, please come up with one! “Statistics” is not denying the uncertainty in the prediction! It provides tools for you to make the best of the information.

Think about our life. We often have to make decisions using whatever information/data is on hand. And we know sometimes our decision is probably good, depending on the situation. For example, I know my kids’ academic records (data), so I can predict well what they’ll earn on their next math test. Do I use statistics? Yes. Does it have predictive power? Yes. Should I use ready-made software that’s written based on the math behind the tools? I should. Do I use mathematics? I should, but perhaps not in this case, because their past performances (data) are consistent (ohh, this can be quantified as a measure of the predictive power). Just a simple example.

@ JH

I said ‘intrinsic’ predictive power because we often unconsciously supplement the data with assumptions such as ‘everything else being equal’.

Extrapolating from the past to the future works if the important parameters remain the same, but one really ought to know whether that is the case or not. The financial-industry caveat “past performance is no guide to the future” is right to warn against extrapolation without understanding the underlying mechanisms.

The real test of predictions is their performance when the parameters change by significant amounts. For example, if a student has the misfortune to miss half his lessons through illness one semester, then a statistical analysis of his past test results is unlikely to accurately predict his next test result. On the other hand, if the parameters do not change, and his results are consistent, then is the prediction not rather tautological? I.e. if nothing changes, everything stays the same.

StephenPickering,

Define what you mean by “parameters”. For the example you described, no, I won’t be able to know how the variable “days of absence” would affect the student’s performance if I don’t have any information about that variable. However, I can incorporate the variable in my predicting strategy (statistical modeling), which will require proper collection of data… Statistics this is! The more information we have, the better the conclusions/decisions/predictions that can be reached… we don’t need to know statistics to see this. And the science of Statistics uses mathematics to formalize this idea.

JH

Sorry! I should indeed have written variable not parameter.

Actually, after your last message, I don’t think our views on the substance are necessarily so very different.

I would consider the case that you describe as that of the construction of a model for predicting the effects of variables, such as number of lessons missed, on the test performance. I would approach this by trying to choose an equation that fits the data. (At this point it occurs to me that by statistical modelling perhaps you meant probabilistic modelling?) However, if the model involves an equation, then I would attribute the predictive power of the model to the equation, rather than to the statistical data. After all, there may be more than one equation that could have fitted the data.
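As a toy version of "choosing an equation that fits the data": ordinary least squares picks the straight line closest to the points, but nothing in the data forces a straight line rather than some other form. The numbers below are fictitious, invented only to echo the lessons-missed example:

```python
def ols_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Fictitious data: test score vs. lessons missed.
missed = [0, 1, 2, 4, 5, 8]
score = [92, 90, 87, 80, 78, 65]

a, b = ols_line(missed, score)
print(f"score = {a:.1f} + ({b:.2f}) * missed")
```

The least-squares line describes these six points; whether it predicts a seventh is exactly the question of whether the equation, not the data, was chosen rightly.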

If one gets the choice of equation right, as with the case of Newton’s law, then the statistical data, with all its limitations, are irrelevant for the accuracy of the prediction, which is far greater than the data alone would justify.

Probabilistic modelling is something else. I would concede that it has predictive power, though with limitations concerning the range of applicability (no extrapolation) and accuracy.

Well, Briggs. You have a lot of fodder in the comments to feed your refinement of your sketch above. Is statistics math? Is it the roadmap to logical inference? Is it merely descriptive? Is it epistemology? What, exactly, is “probability?”

I await your next philosophic thrust.

Mr. Briggs,

Perhaps the reason that some current statisticians (including students thereof) in our time do not grasp the philosophical place wherein statistics belongs is the deficiency of philosophic education in the early part of the education process. I think this is the case.

So, I applaud you for bringing out the philosophic context and content to show more clearly the relationship between all branches of science and that first science which is philosophy.

I look forward to your further posts. I’m hooked. : )

John

Yes, I agree. Statistics is not mathematics.

Statistics deals with random variables.

For a mathematician, a random variable is just a measurable function.

For a statistician, on top of that, a random variable (according, I think, to the late Prof. Geoffrey S. Watson) is the soul of an observation. Or: the observation is the birth of a random variable.

B. de B. Pereira

> I say there is no difference between probability and statistics.

> Statistics is just the working of probability problems.

This seems a bit absurd. Probability is a branch of mathematics. If statistics is not mathematics, then statistics cannot be probability. QED

…or as my multivariate analysis professor put it, “being a statistician means never having to say you’re certain”.

This discussion reminds me of a very brief conversation I had while I was a math grad student circa 2004 with Persi Diaconis, a man who is considered (at least by mathematicians) to have earned his chops in both math and statistics. It was lunchtime at a math conference, folks eating their bag lunches out in the courtyard. In my education, I had received assurances that statistics and mathematics were very different fields, as evidenced in part by the fact that most schools these days have separate departments for the two subjects. I took the rare opportunity to sit down next to Persi, and I asked him whether he could give me a capsule description of the basic difference(s) between mathematics and statistics. He finished chewing his bite, then said, “I’m preparing an article about that.” Then he went back to eating his sandwich. End of conversation.

I figured that if an expert like Persi couldn’t give me a five-minute-or-less summary of the distinctions, even granting some oversimplifications, then the differences must be murky indeed. I never looked up his article to see whether it was ever published. These days I work as an applied mathematician, and I am still on the statistics learning curve. I’m very aware of some of the differences, but I don’t yet know enough to speak knowledgeably about them. I’m grateful to see some discussion of the topic here.

As an aside: I appreciate the contrast between math and accounting, but sadly it is also a distinction that few appreciate. The typical man-in-the-street has no clue what mathematicians do, and when pressed to guess, they come up with visions of manipulating lists of numbers all day, like accountants, or perhaps more accurately, like bookkeepers. They don’t know that most mathematicians rarely deal with actual numbers.

I’m an undergrad Biology major, interested in being a research biologist. I’ve heard from biologists that many math and statistics courses are useful for research beyond calculus, specifically linear algebra, possibly differential equations, and especially statistics. The argument is that biologists who don’t study statistics seriously are more likely to abuse it without understanding what the numbers mean. For this reason I signed up to take an introductory biostatistics course this fall. Is there anything else you think I should learn, or any courses you think I should take, specifically in the way that statistics applies to science? The goal being to UNDERSTAND, not just quantify, uncertainty. ’Cause that’s why I signed up for biostatistics in the first place.