Regular readers know Uncertainty proposes we go back to the old way of examining and making conclusions about data, and eschew many innovations of the 20th Century. No p-values, no tests, no posteriors. Just plain probability statements about observables and a rigorous separation of probability from decision.
These criticisms you know (or ought to by now). So why don’t we do a case study or three, and take our time doing so? Case Study 1 uses the same data presented in Uncertainty. We’re interested in quantifying our uncertainty in a person’s end-of-first-year College GPA (CGPA) given we know their SAT score, high school GPA (HGPA), and perhaps another measure we might have.
Now right off, we know we haven’t a chance to discover the cause (actually, causes) of a person’s CGPA. These are myriad. A GPA is composed of scores/grades per class, and the causes of the score in each class are multitudinous. How much one drank the evening before a quiz, how many hours put in on a term paper, whether a particular book was available at a certain time, and on and on.
It is equally obvious a person’s HGPA or SAT does not and cannot cause a person’s CGPA. Some of the same causes responsible for the HGPA, SAT might appear in the list of causes for CGPA, but it’s a stretch to say they’re identical. We could say “diligence” or “sloth” are contributory causes, but since these cannot be quantified (even though some might attempt such a maneuver), they cannot take their place in a numerical analysis.
Which brings up the excellent question: why do a numerical analysis at all?
Do not skip lightly over this. For in that query is the foundation of all we’ll do. We’re doing a numerical study, as opposed to the far more common qualitative kind (which forms most of our judgments), because we have in mind a decision we will make. Everything we do must revolve around that decision. Since, of course, different people will make different decisions, the method of analysis would change in each case.
It should be clear the decision we cannot make is about what causes CGPA. Nor can we decide how much “influence” SAT or HGPA has on CGPA, because “influence” is a causal word. We cannot “control” for SAT or HGPA on CGPA because, again, “control” is a causal word, and anyway HGPA and SAT were in no way caused, i.e. controlled, by any experimenter.
All we can do, then, if a numerical analysis is our goal, is to say how much our uncertainty in CGPA changes given what we know about SAT or HGPA. Anything beyond that is beyond the data we have in hand. And since we can make up causal stories until the trump of doom, we can always come up with a causal explanation for what we see. But our explanation could be challenged by somebody else who has their own story. Presuming no logical contradiction (say a theory insists SAT scores that we observed are impossible), our “data” would support all causal explanations.
This point is emphasized to the point we’re sick of hearing it because the classic way of doing statistics is saturated in incorrect causal language. We’re trying to escape that baggage.
So just what decision do I want to make about CGPA?
I could be interested in my own or in another individual’s. Let’s start with that by thinking what CGPA is. Well, it’s a score. Every class, in the fictional college we’re imagining, awards a numerical grade, F (= 0) up to A+ (A = 4, A+ = 4.33, and so on). CGPA = sum of the scores over all classes divided by number of classes. That’s several numbers we need to know.
How many classes will there be? In this data, I don’t know. That is to say, I do not know the precise number for any individual, but I do know it must be finite. Experience (which is not part of the data) says it’s probably around 10-12 for a year. But who knows? We also can infer that each person has at least one class—but it could be that some have only one class. Again, who knows?
So number of classes is equal to or greater than one and finite. So, given the scoring system for grades, that means CGPA must be of finite precision. Suppose a person has only one class, then the list of possible CGPAs is 0, 0.33, …, 4, 4.33 and none other. If a person has two classes, then the possibilities are 0, 0.165, 0.33, and so forth. However many classes there are, the final list will be a discrete, finite set of possible CGPAs, which will be known to us given the premises about the grading system.
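The enumeration just described can be sketched in a few lines of Python. This is an illustration only, under an assumed grading scale: the post gives the endpoints (0 up to 4.33) but not every step, so the 14 grade values below (whole grades plus .33 and .67 steps) are an assumption, as are the names GRADE_POINTS and possible_cgpas. Grades are stored as integer hundredths and CGPAs as exact fractions so no rounding creeps in.

```python
from fractions import Fraction

# ASSUMPTION: a hypothetical grading scale in hundredths of a grade point,
# running from F (0.00) to A+ (4.33) in .33/.67 steps. The post only
# states the endpoints, so swap in the real scale if it is known.
GRADE_POINTS = [0, 33, 67, 100, 133, 167, 200,
                233, 267, 300, 333, 367, 400, 433]


def possible_cgpas(n_classes, grades=GRADE_POINTS):
    """Return every attainable CGPA for n_classes, sorted, as exact Fractions."""
    if n_classes < 1:
        raise ValueError("a student has at least one class")
    # Build the set of attainable grade-point totals one class at a time.
    totals = {0}
    for _ in range(n_classes):
        totals = {t + g for t in totals for g in grades}
    # CGPA = total grade points / number of classes (totals are hundredths).
    return sorted(Fraction(t, 100 * n_classes) for t in totals)
```

Calling possible_cgpas(2) under this assumed scale gives a list that begins 0, 0.165, 0.33, and possible_cgpas(12) begins 0, 0.0275. The list is always finite and discrete, whatever the number of classes.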
Suppose a student had 12 classes, then his score (CGPA) might be (say) 2.334167. That’s 7 digits of precision! This number is one of lots of different possible grades (these begin with 0, 0.0275, 0.055, 0.0825, …). And there is more than one way to get some of these grades. A person with a CGPA of 2 might have had 12 classes with all C’s (= 2), or 12 with half A’s and half F’s; and there are other combinations that lead to CGPA = 2. And so now we have to ask ourselves just what about the CGPA we want to know.
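To see how many routes lead to the same CGPA, here is a rough sketch that counts the unordered collections of grades averaging to a given value. It rests on the same assumed grading scale (whole grades plus .33 and .67 steps, which the post does not spell out), and the function name count_grade_combinations is hypothetical. It counts collections of grades, not the order in which classes were taken.

```python
# ASSUMPTION: hypothetical grading scale in hundredths of a grade point,
# F (0.00) up to A+ (4.33); the post gives only the endpoints.
GRADE_POINTS = [0, 33, 67, 100, 133, 167, 200,
                233, 267, 300, 333, 367, 400, 433]


def count_grade_combinations(n_classes, cgpa_hundredths, grades=GRADE_POINTS):
    """Count unordered sets of n_classes grades whose mean is the target.

    cgpa_hundredths is the target CGPA times 100 (e.g. 200 for CGPA = 2).
    """
    target_total = cgpa_hundredths * n_classes
    # ways[k][s] = number of multisets of k grades whose points sum to s.
    ways = [[0] * (target_total + 1) for _ in range(n_classes + 1)]
    ways[0][0] = 1
    # Processing one grade value at a time (outer loop) counts each
    # multiset once, regardless of the order of the classes.
    for g in grades:
        for k in range(1, n_classes + 1):
            for s in range(g, target_total + 1):
                ways[k][s] += ways[k - 1][s - g]
    return ways[n_classes][target_total]
```

For instance, count_grade_combinations(12, 200) counts the grade collections giving CGPA = 2 over 12 classes; all C’s and half A’s, half F’s are two of them.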
We’ve reached our first branching point! And the end of today’s lesson. See if you can guess where this is going.
I’ll answer all pertinent questions, but please look elsewhere on the site (or in Uncertainty) for criticisms of classical methods. Non-pertinent objections will be ignored.
Research has shown that HGPA has the highest correlation with CGPA, even with the wide variance in high school quality affecting the GPA. Not to mention that there are almost as many ways to calculate and/or scale HGPA as there are schools, and then it usually is converted to the college scaling for comparison. Regarding the CGPA, such factors as eight o’clock classes, homesickness, instructor experience (I’ve seen evidence that new instructors tend to grade differently over time as they adjust to the capabilities of their students), and many more intangible ones contribute to the final number. Curious that the difference between the summa cum laude and magna cum laude distinction should come down to 0.01 of a point. FWIW, the vagaries and fallacies of grading are one reason for the movement toward experiential learning and learning assessments that eschew grades.
Briggs: “This point is emphasized to the point we’re sick of hearing it because the classic way of doing statistics is saturated in incorrect causal language. We’re trying to escape that baggage.”
This looks like another ‘correlation doesn’t mean cause’ presentation. Is this really necessary??? Yes, but only to a limited extent; this is by no means the problem it’s made out to be here. There’s a much, much larger issue being ignored.
A substantial, arguably overwhelmingly dominant, chunk of the observed problem is ethical — willful abuse of the tool — not misunderstanding. A substantial number of “researchers” actively abuse statistics to achieve a desired result (finding, or creating the illusion of having found, something publishable).
In response to abuse of a tool, endeavoring to change the tool is misguided, like the T-shirt that reads: “Saying guns kill people is like saying forks make people fat.” When the issue is one of abuse, address the abuser and the motives for abuse (e.g. the “publish or perish” culture). Trying to refine the tool, or even succeeding in refining it, to make it less prone to abuse will work about as well for statistics as Sen Feinstein’s initiative to ban scary assault weapons as a means of reducing gun crime.
No amount of education about proper application of the tool (statistics) or teaching the limits of reasonable interpretation of calculated results will alter willfully unethical behavior. Nor will tossing out some parts of the tool (e.g. p-values). Any tool is prone to abuse; change the tool and the type of abuse will change accordingly, not go away.
When a criminal uses a gun to commit a crime, go after the criminal, not guns.
When a researcher uses statistics to present false results, go after the researcher, not the tool (statistics).
Too consistently in this blog the blogger makes the very same correlation-shows-causation mistake he is trying to remedy:
Observation: Researcher presents false conclusions via stats
Factor: False conclusions derive from misunderstanding
Factor: Researchers seek truths
Conclusion: Therefore, the researcher didn’t understand what they were doing with stats — need to educate, and/or, alter the stats tool to mitigate abuse
Despite emphasizing the nearly infinite breadth of factors that affect an outcome, the blogger consistently ignores two factors:
– Willful abuse to achieve a desired outcome
– Desired outcomes include personal prestige (another publication, lecture tour….)
— A new truth, if found, is great; but if personal prestige can be achieved with fudging (lying), so be it
Those ignored factors are routinely the core of the observed issue — those are selfish human factors (psychology) consistently ignored here.
Those human factors commonly derive from researchers who live by a “Shame Culture” vs. a “Guilt Culture” value system. Most Western societies largely hew to a “Guilt Culture” value system (Arab & Asian societies much less). There are notable Western subcultures: very many who gravitate to academia and other research arenas, where publication pressures are substantial, live by a “Shame Culture” value system. These personalities have no difficulty fudging, or fabricating, research results, including the willful abuse of statistics, as long as they believe nobody will out them.
Guilt vs Shame values are described at: http://www.doceo.co.uk/background/shame_guilt.htm
BOTTOM LINE: The human factor/psychology of many people is such that willful abuse of research tools, statistics included (but not limited to), is the issue needing to be addressed. Trying to refine the tool to mitigate the abuse sidesteps the real issue and treats the symptom as the issue. It’s high time that those behavioral factors get due consideration around here.
Notwithstanding however depressed I become over how many people do not appear to even remotely understand what you are about here, I am excited, thrilled, overjoyed, and just dad-blamed happy at this, the beginning of this series.
I could scarcely applaud you enough if I began now and applauded until the heat death of the universe, or two minutes in a row, I forget which.
However many students learn something from this series, it won’t be enough — but it will be some.
And I hope that someone will be able to consult the Wayback Machine decades after I’m long gone and still profit from this series. Because sadly, decades after I’m long gone, these classes will still probably be needed. Just saying.
Kudos to you, Matt, for persevering, and for your generosity to those willing and able to learn.
Ken, when people do call the researcher to task for misusing statistics, as was done with one AGW proponent, they get sued (as did Mark Steyn and National Review). The prospect of legal entanglement does inhibit calling fakers out, I would think.