June 6, 2008
I’ve been looking around on various publishers’ websites over the past few weeks to see which of them might take Stats 101 off my hands. I have also been considering bringing it out myself, like my other bestseller, but would rather avoid that.
Here is an overview of Stats 101: Real Life Probability and Statistics in Plain English (a tentative title), in case anybody knows a publisher.
I have successfully used this (draft) text in several introductory, typically one-semester, courses, and will do so again this summer at Cornell in the ILR school. It is meant for the average student who will take only one or two courses in statistics and who must, above all, understand the results from statistical models, yet will not do much calculating on their own. Examples come from various fields such as business, medicine, and the environment. No jargon is used anywhere except when absolutely necessary. The book has also been used for self-study.
Many books claim to be a “different” way of teaching introductory statistics, yet when you survey the texts, the only things that change are the names of the examples, or whether boxplots are plotted vertically or horizontally.
Not this book. This is the only volume that emphasizes objective Bayesian probability from start to finish. It is the only one that stresses what is called “predictive” statistics. I do not mean forecasting. Predictive statistics focuses on quantifying the uncertainty in actual, observable, physical data. This book teaches intuitive statistics.
Nearly all of classical statistics and much of Bayesian statistics concentrate their energies on making statements about the parameters of probability models. The student will learn these methods in “Stats 101”, too. But what the other books will not do is put the knowledge of parameters in perspective. Concentrating solely on parameters breeds overconfidence and gives the student a misleading picture of the true uncertainty in any problem.
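To make that point concrete, here is a minimal sketch, written in Python rather than the R the book uses, under simplifying assumptions chosen only for illustration: a normal model with a known standard deviation and a flat prior on the mean. The numbers (sample mean 50, sigma 10, n = 100) are made up. It compares a 95% interval for the unobservable parameter mu with a 95% interval for the next observable value y.

```python
from math import sqrt
from statistics import NormalDist


def intervals(xbar, sigma, n, level=0.95):
    """Return (parameter_interval, predictive_interval) for a normal model.

    Assumes sigma is known and the prior on the mean mu is flat, so the
    posterior for mu is Normal(xbar, sigma^2 / n) and the predictive
    distribution for a new observation y is Normal(xbar, sigma^2 * (1 + 1/n)).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Uncertainty about the unobservable parameter shrinks like 1/sqrt(n).
    half_param = z * sigma / sqrt(n)
    # Uncertainty about a new observable keeps the data's own spread.
    half_pred = z * sigma * sqrt(1 + 1 / n)
    return ((xbar - half_param, xbar + half_param),
            (xbar - half_pred, xbar + half_pred))


# Made-up example values, for illustration only.
param, pred = intervals(xbar=50.0, sigma=10.0, n=100)
print("95%% interval for mu:    (%.2f, %.2f)" % param)
print("95%% interval for new y: (%.2f, %.2f)" % pred)
```

Under these assumptions the predictive interval is wider than the parameter interval by a factor of exactly sqrt(n + 1), about ten here. Reporting only the interval for mu hides that gap.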
Hardly any equations appear in the book; only those that are strictly necessary are given. The soul of this book is understanding, which is crucial for students who will not become statisticians (it’s crucial for the latter group, too, but they will seek out more math). Pictures, instead of confusing formulae, are used whenever possible.
All computations are done in R, and are presented in easy-to-follow recipes. An appendix of R commands leads the students through several common examples. No calculations are done by hand and the student is never asked to look up information in some obscure table. I have also set up a book website where the data used can be downloaded.
There are 15 chapters plus the aforementioned appendix. Unlike any other statistics book except Jaynes’s advanced Probability Theory, the book starts with logic. This easy introduction intuitively leads to (discrete) probability. After that, three chapters lead up to the binomial and normal distributions, emphasizing their role in quantifying uncertainty in observable data. Building intuition is stressed. These chapters are followed by two others on R and on real-life data manipulation (all at a very basic level, presented in a realistic, plainspoken manner).
Chapter 8 introduces classical and Bayesian (unobservable) parameter estimation. Chapter 9 brings us back from parameter estimation to observables. The true purpose of building a probability model is to quantify the uncertainty of data not yet seen, yet no book (except very advanced monographs like Geisser’s Predictive Statistics) ever mentions this.
Chapters 10 and 11 go over classical and Bayesian testing, and again bring everything back to practicality and observables.
Chapters 12 and 13 introduce linear and logistic models; again, classical, Bayesian, and observable methods are given.
The most popular chapter by far is 14, “How to Cheat” (at statistics). It is in the spirit of Huff’s well-known How to Lie with Statistics, but brought up to date, with many examples of how easy it is to manufacture “significant” results, particularly if you use classical methods.
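One quick way to see how cheaply significance can be manufactured is to simulate a “study” that measures 20 unrelated outcomes, each of which is pure noise, and declares success if any one of them gives p < 0.05. This is a minimal Python sketch, not an example taken from the book. The z-test with known sigma, the 20 outcomes, the 30 observations per outcome, and the seed are all assumptions chosen for simplicity.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, assuming sigma is known."""
    z = fmean(sample) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fishing_expedition(rng, n_outcomes=20, n_obs=30):
    """Test n_outcomes pure-noise variables; True if any is 'significant'."""
    for _ in range(n_outcomes):
        # Every outcome is drawn with a true mean of 0: there is no effect.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if z_test_p(sample) < 0.05:
            return True  # stop and write the paper
    return False


rng = random.Random(2008)  # fixed seed so the run is repeatable
trials = 1000
hits = sum(fishing_expedition(rng) for _ in range(trials))
print(f"share of noise-only studies with a 'significant' result: {hits / trials:.2f}")
print(f"value predicted by 1 - 0.95**20: {1 - 0.95 ** 20:.2f}")
```

The simulated share should land close to 0.64, the value 1 - 0.95^20 predicts: with 20 tries at the 5% level, a “finding” is more likely than not even when nothing is there.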
Finally, the last chapter gives a philosophical overview of modern, observable statistics, and ties everything together.
Each chapter has homework questions, and I am working on an answer guide now, which I imagine can be published separately. Most homework, especially in the chapters on statistics, has the students gather, prepare, and analyze their own data, which works wonders for their understanding.
There is a division, and sometimes animosity, that splits our field along classical and Bayesian lines. This book adds a third division by taking the minority position in the Bayesian field. The objective, logical probability camp is small but growing, and is, as I obviously feel, the correct position. Most of us are not in statistics departments but in physics, astronomy, meteorology, and similar fields, where it is not just desirable but necessary to properly quantify uncertainty in real-life data. Naturally, we argue that everybody should be interested in observable data, because it is the only data that can be, well, observed.
Because of these ideas, the book is not likely to be adopted as a primary text in many statistics classes; at least, not right away. However, I have had interest from professors, especially Bayesians, who would like to use it as a supplementary text. Other professors in computer science, physics, astronomy, etc., would use it directly. It’s about 200 pages in a standard trade paperback format, ideal for an optional or secondary text.
Lastly, statistics professors themselves will form a likely audience. They will not necessarily teach from the book (not all professors, obviously, teach introductory classes), but will use it as a source for a clear introduction to logical probability and non-parameter statistics. This is a new and growing area and there is a clear benefit to being first.