Free Data Science Class: Predictive Case Study 1, Part VI

Review!

This class is neither frequentist nor Bayesian nor machine learning in theory. It is pure probability, and unlike any other class available. And for the right price!

Last time we completed this model:

(5) Pr(CGPA = 4 | grading rules, old observation, fixed math notions).

What we meant by “fixed math notions” gave us the multinomial posterior predictive, from which we made probabilistic predictions of new observables. Other ideas of “fixed math notions” would, of course, give us different models, and possibly different predictions. If we instead started from knowledge only of measurement, and grading rules, we could have deduced a model for new observables, too. This is done in Uncertainty. But the results won’t, in this very simple case for our good-sized n, be much different.
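
For readers who want to see numbers, here is a minimal sketch of one such "fixed math notions" choice. It assumes a multinomial over the five CGPA buckets with a flat (symmetric) Dirichlet prior, which gives a simple closed-form posterior predictive. The counts in `old_counts` and the helper name `posterior_predictive` are hypothetical, made up for illustration, and are not the class data.

```python
from fractions import Fraction

def posterior_predictive(counts, prior=1):
    """Pr(new observation falls in bucket j | old obs), assuming a
    multinomial with a symmetric Dirichlet(prior) prior on the buckets."""
    k = len(counts)   # number of buckets (here CGPA = 0, 1, 2, 3, 4)
    n = sum(counts)   # number of old observations
    return [Fraction(c + prior, n + k * prior) for c in counts]

# Hypothetical old observations: students counted in CGPA buckets 0..4.
old_counts = [4, 17, 59, 16, 4]
pred = posterior_predictive(old_counts)

for cgpa, p in enumerate(pred):
    print(f"Pr(CGPA = {cgpa} | grading rules, old obs, math) = {float(p):.3f}")
```

With a good-sized n the prior pseudo-counts barely move the answer away from the raw frequencies, which is one way to see why different reasonable starting points give similar predictions here.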

We next want to add other measurements to the mix. Besides CGPA, we also measured High School GPA, SAT scores (I believe these are in some old format; the data, you will recall, is very old and of unknown source), and hours spent studying for the week. We want to construct models like this:

(7) Pr(CGPA = 4 | grading rules, old observables, old correlates, math notions),

where “old observables” are measures of CGPA and “old correlates” are measures of things we think are “correlated” with the observable of interest.

This brings us to our next and most crucial questions. What is a “correlate” and why are we putting them in our models? Don’t we need to test the hypotheses, via wee p-values or Bayes factors, that these correlates are “significantly” “linked” to the observable? What about “chance”?

Here is the weakest point of classical statistics. Now we have no chance here of having a complete discussion of the meaning of, and answers to, these questions. We’ll have a go, but the depth will be unsatisfactory. All I can do is point to Uncertainty, and to other articles on the subject, and hope the introduction here is sufficient to progress.

What many are after can’t be had. The information about why a correlate is important is not in the data, i.e. the measurements of the correlate itself. Because of this, no mathematical function of the data can tell us about importance, either. Importance is outside the measured data, as we shall see. Usefulness is another matter.

Under strict probability, which is the method we are using, a “correlate” is any measure or bit of evidence you put on the right hand side. Here is where ML/AI techniques also excel. For instance, a correlate might be, “sock color of student worn on their third day of class.” With that, we can calculate (7).

Suppose we calculate these:

(7a) Pr(CGPA = 4 | grading rules, old obs, sock color, math) = 0.05,
(7b) Pr(CGPA = 4 | grading rules, old obs, math) = 0.05,

and the same for every value of CGPA (here we only have 5 possible values, 0-4, but what is said counts for however we classify the observable). I mean, if the prediction is the same (exactly identical) probability whether or not we include sock color, then in this model, in this context, and given these old obs, sock color is irrelevant to the uncertainty in CGPA.

If we change anything on the right hand sides of (7a) or (7b) such that we get

(7a) Pr(CGPA = 4 | grading rules, even more old obs, sock color, math) = 0.05,
(7b) Pr(CGPA = 4 | grading rules, even more old obs, math) = 0.051,

then sock color is relevant to our uncertainty in CGPA. Relevance, then, is a conditional measure, just as probability is. If there is any difference (to within machine floating-point round off!) in the probabilities for any CGPA (with these givens), then sock color is relevant.
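
The relevance check just described can be sketched in a few lines. This is only an illustration under loud assumptions: the records are invented, the predictive is the flat-prior multinomial one, and "conditioning on sock color" is done in the crudest way, by keeping only the old observations that share the color. None of these choices come from the class data, and `predictive` and `is_relevant` are hypothetical helper names.

```python
from collections import Counter

BUCKETS = [0, 1, 2, 3, 4]  # the five possible CGPA values

def predictive(cgpas, prior=1):
    """Flat-prior multinomial predictive over the CGPA buckets (an assumption)."""
    counts = Counter(cgpas)
    n = len(cgpas)
    k = len(BUCKETS)
    return {b: (counts[b] + prior) / (n + k * prior) for b in BUCKETS}

def is_relevant(records, color, tol=1e-12):
    """True if conditioning on sock color changes ANY predictive probability.

    records is a list of (sock_color, cgpa) pairs; tol stands in for
    machine floating-point round off.
    """
    without = predictive([g for _, g in records])
    with_color = predictive([g for c, g in records if c == color])
    return any(abs(with_color[b] - without[b]) > tol for b in BUCKETS)

# Hypothetical old observations: (sock color on day three, CGPA bucket).
records = [("red", 4), ("red", 3), ("blue", 2),
           ("blue", 2), ("blue", 1), ("red", 2)]

print(is_relevant(records, "red"))
```

In this toy setup the only way to get irrelevance is for the color subset to reproduce the full data exactly, which hints at how rarely irrelevance turns up with real measurements.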

Irrelevance is, as you can imagine, hard to come by. Even a cloud, made up of water and cloud condensation nuclei, can resemble a duck, even though the CCN have no artistic intentions. As for importance, that’s entirely different.

Would you, as Dean (recall we are a college dean), make any different decision given (7a) = 0.05 and (7b) = 0.051? (You have to also consider all the other values of CGPA you said were important, and at least one other value must differ too, since the probabilities have to sum to one.) If so, then sock color is useful. If not, then sock color is useless. Or of no use. Even though it is, strictly speaking, relevant.

Think about this decision. Think very hard. The decision you make might be different than the decision somebody else makes. The model (7a) may be useless to you and useful to somebody else.

And then you think to yourself, “You know, that 0.001 can make a big difference when I consider tens of thousands of students” (maybe this is a big state school). So (7a) becomes interesting.

Well, how much would it cost to measure the sock color of every student on the third day of their class? It can be done. But would it be worth it? And you have to know it if you use (7a) instead of (7b). It’s a requirement. Besides, if students knew about the measurement, and they caught wind that, say, red colors have higher probabilities of large CGPA than any other color, wouldn’t they, being students and by definition ignorant, wear red on that important day? That would throw off the model. (Answering why we do next time.)

Now if you dismiss this example as fanciful and thus not interesting, you have failed to understand the point. For it is the cost and consequences of the decisions you make that decide whether a relevant “variable” is useful. (Irrelevant “variables” are useless by definition.) We must always keep this in mind. The examples coming will make this concept sharper.
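
To make the cost-and-consequence point concrete, here is a deliberately crude sketch. Everything in it is an assumption made up for illustration: the enrollment, the dollar value attached to each student the two models disagree about, the per-student cost of recording sock color, and the decision rule itself (gain must exceed measurement cost). A real dean would supply his own numbers and his own rule, and could reach a different answer.

```python
def expected_top_students(p_cgpa4, n_students):
    """Expected number of students with CGPA = 4 under a given prediction."""
    return p_cgpa4 * n_students

def worth_measuring(p_with, p_without, n_students,
                    value_per_student, cost_per_measurement):
    """Hypothetical rule: the correlate is useful if the value of the
    shift in expected head count exceeds the cost of measuring everyone."""
    shift = abs(expected_top_students(p_with, n_students)
                - expected_top_students(p_without, n_students))
    gain = shift * value_per_student
    cost = n_students * cost_per_measurement
    return gain > cost

# (7a) = 0.05 with sock color, (7b) = 0.051 without; all amounts are invented.
n = 30_000
print(worth_measuring(0.05, 0.051, n, value_per_student=1000, cost_per_measurement=2.0))
print(worth_measuring(0.05, 0.051, n, value_per_student=1000, cost_per_measurement=0.5))
```

The same relevant correlate flips from useless to useful when only the measurement cost changes, which is the sense in which usefulness lives in the decision and not in the data.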

“But, Briggs, what could sock color have to do with CGPA?”

Sounds like you’re asking a question about cause. Let’s save that for next time.

It’s Christmas Break! Class resumes on 9 January 2018.

5 Comments

Gary

December 19, 2017, 9:52 am

Sock color may not be relevant to CGPA, but there is evidence that the rigor of high school course work may be.

https://research.collegeboard.org/publications/validity-academic-rigor-index-ari-predicting-fygpa

Whether gathering and analyzing the data is worth the effort is the question. For students likely to earn CGPA>=2, maybe not. For students at risk of CGPA<2, maybe there is, given an institutional mission of providing access to a college education for students who may be academically somewhat under-prepared. Decisions depend not only on data, but as stated above Importance is outside the measured data.
McChuck

December 19, 2017, 11:23 am

Gary – interesting point of contention. I would argue that the institutional mission of a college or university is not “providing access to a college education”, but to make money and attract research grants. At most universities, undergraduates are considered to be a slightly annoying cash crop, with two or three harvests per year.

And, if we speak of causes (which probability does not do), it should be obvious that the rigor of the previous education would have an effect on education in a separate locale. A school which passes all students by age, with each receiving a ‘B’ or better score for every class, will produce students unqualified for further education, even though they may have excellent qualifications on paper.

As far as gathering data, the relevant data is available by comparing standardized test scores to the aggregate GPAs of graduates of schools. ACT scores are generally more useful than SAT scores, as they haven’t dumbed down their standards nearly as much over the last few decades. State standardized tests could be used to validate in-state institutions, of course. While imperfect, this technique is a relatively reliable indicator of academic rigor and grade inflation, and has the benefit of using publicly available data (in most states).
JH

December 19, 2017, 11:33 am

Since the calculations in (7.a) and (7.b) involve observations, the estimated probability in (7.b) is always, yes, always, different from the one in (7.a) unless sock color doesn’t play a role in the calculation of (7.a). That is, sock color is definitely relevant based on what’s described here. In fact, hair color, nose size, shoe size, … and so on, would all be relevant. No, don’t need any modeling or calculations for this conclusion. Why?

The theoretical definition of independence or irrelevance or no-association is usually not realizable empirically. Just as the sample linear correlation coefficient between bivariate data would not be zero for real data. Or that theoretically, there are perfect right triangles, but there are no perfect physical representations of them.

Here is a problem of real concern that has been studied. I don’t know if it is interesting or not, but student retention is of great concern to college administrators. To investigate the important predictors for student retention, past data on student withdrawal status and other factors are collected. The purpose is to find a threshold value of any sort, built by minimizing false classification rates or other methods, so that the counselors or advisors can reach out to students who are identified as having a high risk of dropping out before it is too late.
Gary

December 19, 2017, 10:16 pm

@McChuck,
At my public university, there has been a special admissions program for 50 years. About 10% of the freshmen classes come from that under-prepared pool after they’ve demonstrated some ability to do college level work. So at least one institution is somewhat serious about providing opportunities for socio-economic mobility via an education. This in spite of ever-present budget challenges. I’m not saying some cynicism about higher education in general isn’t justified; however, not every place is as bad as the anecdotes depict.

BTW, the College Board’s academic rigor index only looks at the high school courses taken, not the grades received. So “rigor” here isn’t a measure of accomplishment, but of exposure to the subject (i.e., four years of math instead of all A’s in math courses). Of course there is a wide range of course material coverage across school districts. Even so, the index has some predictive power. Interestingly, though, it doesn’t add much if anything to HSGPA or entrance exam scores as predictors, meaning it’s mostly a superfluous covariate. By itself in a different model it might have some usefulness, but the research still is being done.
Pingback: Free Data Science Class: Predictive Case Study 1, Part VII – William M. Briggs
