Anybody out there a lawyer, or do you know one, who is involved on the side of angels in these so-called discrimination lawsuits? Have them contact me.
The arguments that complainers (usually our government) are using to “prove” discrimination are statistical nonsense. This is easy to prove. Problem is, those who are defending against these charges also appear to believe statistics can prove discrimination, and instead of arguing that they can’t, they attack the data or say different statistics can prove lack of discrimination. All this is wrong.
Two recent examples: “Obama bullied bank to pay racial settlement without proof: report”.
So CFPB applied the screws to Ally, saying it had “statistical evidence” showing its participating dealers were “marking up” loan prices for blacks and Hispanics vs. whites (by an average of $3 a month). Ally fought back, insisting non-discriminatory factors, such as credit history, down payments, trade-ins, promotions and rate-shopping, explained differences in loan pricing. After conducting a preliminary regression analysis, the bank found these factors alone accounted for at least 70 percent of the “racial disparities” the government was claiming.
On Friday, the Obama administration announced executive action that would require companies with 100 employees or more to report to the federal government how much they pay their employees broken down by race, gender, and ethnicity. The proposed regulation is being jointly published by the U.S. Equal Employment Opportunity Commission and the Department of Labor. It is hoped that this transparency will help to root out discrimination and reduce the gender pay gap.
You have a sample of incomes, or loans, or whatever, given or measured on people of two races, J and K. You run a regression with incomes (or whatever) on race, which gives a wee p-value on race. What has been proved?
Nothing. Or, rather, it proves that given this data and the ad hoc, crude assumption of a normal to quantify income, and assuming the spread parameter of this normal for both races is equal, if the central parameter for race (or race difference) was certainly equal to 0, then the probability of seeing a test statistic larger (in absolute value) than the one actually seen, if the experiment which gave rise to the data could be embedded in an infinite sequence of such experiments, is wee.
And that is all that is proved. (That mouth-numbing sentence is the definition of a p-value.)
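For the curious, here is a rough sketch of that mouth-numbing sentence taken literally. Everything in it is an assumption made for illustration: the incomes are invented, the function names `pooled_t` and `simulated_p_value` are hypothetical, and a couple of thousand simulated repeats stand in for the infinite sequence of experiments. The pooled t statistic is the same one you get on the race coefficient when regressing income on a 0/1 race indicator with equal spread assumed.

```python
import random
from math import sqrt

def pooled_t(a, b):
    """Two-sample t statistic assuming both groups share one spread parameter."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean_a - mean_b) / se

def simulated_p_value(a, b, n_repeats=2000, seed=0):
    """Fraction of repeated 'null' experiments whose |t| is at least the observed |t|."""
    rng = random.Random(seed)
    observed = abs(pooled_t(a, b))
    na, nb = len(a), len(b)
    hits = 0
    for _ in range(n_repeats):
        # Null world: both groups drawn from the SAME normal, so the
        # central parameter for the group difference is exactly 0.
        fake_a = [rng.gauss(0, 1) for _ in range(na)]
        fake_b = [rng.gauss(0, 1) for _ in range(nb)]
        if abs(pooled_t(fake_a, fake_b)) >= observed:
            hits += 1
    return hits / n_repeats

# Invented incomes (in thousands) for two groups labeled J and K.
data_rng = random.Random(42)
group_j = [data_rng.gauss(52, 10) for _ in range(100)]
group_k = [data_rng.gauss(48, 10) for _ in range(100)]

p = simulated_p_value(group_j, group_k)
print(f"t = {pooled_t(group_j, group_k):.2f}, simulated p = {p:.4f}")
```

Notice what went into the calculation: two lists of numbers and a pile of assumptions about normals. Nothing in it knows, or could know, why any one income is what it is.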
Does the wee p-value prove that racism exists? No. Does it prove that race J makes more than race K? No. What does the regression say about what caused the observed “discrepancy”? Nothing. Not. One. Thing. As in nothing. Zero. Nada. Nothing.
How do I know this? Because of the Banana Test. Every person in the sample, because he is human, has a mind-boggling number of things we could have measured on him besides his race. Something like this will therefore almost surely be true: the folks in race J will have eaten over their entire lives at least one banana more than did the folks in K. Therefore in the regression we need not have labeled the groups by the two races, J and K, but instead by the equally true High Bananas and Low Bananas.
It may not be bananas, but number of Lego blocks owned, or amount of air breathed downwind of Cleveland, or number of bubbles blown, or on and on and on and on some more.
It could even be ability to do a task! Like a job for which one receives an income.
Anyway, the wee p-value applies just as logically to High/Low Bananas as to races J/K. It must, because the Bananas are just as true of the people in the sample as their race.
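The Banana Test is easy to check on a toy sample. The sketch below is only an illustration: the incomes are made up, the helper names `pooled_t` and `t_by_label` are hypothetical, and it assumes, as in the hypothetical above, that the High/Low Bananas split falls on exactly the same people as the J/K split. It computes the pooled two-sample t statistic (the one a regression on a two-group label reports, equal spread assumed) once under each labeling.

```python
from math import sqrt

def pooled_t(a, b):
    """Two-sample t statistic assuming both groups share one spread parameter."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / na + 1 / nb))

def t_by_label(incomes, labels, first, second):
    """Split incomes by whatever two-valued label is supplied, then compare."""
    a = [y for y, g in zip(incomes, labels) if g == first]
    b = [y for y, g in zip(incomes, labels) if g == second]
    return pooled_t(a, b)

# Invented incomes (in thousands) for ten people.
incomes = [41, 55, 47, 62, 58, 39, 44, 36, 50, 33]

# Label 1: race. Label 2: bananas. By assumption, the same people
# land in the same groups; only the name on the column changes.
race = ["J", "J", "J", "J", "J", "K", "K", "K", "K", "K"]
bananas = ["High" if r == "J" else "Low" for r in race]

t_race = t_by_label(incomes, race, "J", "K")
t_bananas = t_by_label(incomes, bananas, "High", "Low")

print(t_race, t_bananas, t_race == t_bananas)
```

The two statistics are identical, and so is any p-value computed from them. The arithmetic sees group membership, not the name you wrote at the top of the column.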
“C’mon Briggs, that’s absurd. Eating one more banana can’t cause discrimination in income.”
That’s probably true. But so what? Statistical models are silent on cause.
“No way. Everybody knows race is a cause of discrimination. That p-value proved it.”
So why run the statistical model to “prove” what you already know? You’re arguing circularly. And if statistical models show what-causes-what, then the wee p-value also proved that eating bananas causes income disparities.
Or, if you already knew race was a cause, then it is a cause regardless what the statistical model showed. If the p-value wasn’t wee, and race is a cause of income “disparity”, then race is still a cause of income “disparity.”
Using statistical models to “prove” discrimination is always cheap, lazy, and wrong. To really prove discrimination you have to do the hard work of investigating each person in the sample and discover what precisely caused his income. If you can’t do that, you can’t prove cause.
There’s much more to say on this dismal and rapidly expanding topic. If you want to defend yourself against spurious charges, don’t use statistics.