William M. Briggs

Statistician to the Stars!

Black And White Homicide Rates: Who’s Killing Whom?

This classic post is being picked up in several places, and the reason is not far to seek. Later we’ll have a poll where I’ll ask “Which is the dumbest educated group?” and where choice (A) will be “Reporters.” This post first ran 27 January 2013.

Blacks commit homicide at a rate about 7.5 times that of whites. The trend in black homicide correlates well with the change in overall homicide. This disparity also exists for other crimes: blacks commit them at about 7 to 10 times the rates of whites.

Going on a suggestion made by reader JH when we first examined the homicide statistics (see note 1 below), I put both races together on one plot to make comparisons fair.

First is the homicide rate for Blacks and Whites, over-plotted with the homicide victim rate for both groups, finally over-plotted with the overall homicide rate (taken from the post on guns and homicides).

Figure 1

Note that the blue line is scaled to the right vertical axis, and is the overall homicide rate (as estimated by the FBI and Census). The black lines are killers, and the red victims. The dashed are Blacks, the solid Whites. (See this post for separate pictures of each race, where the general decline of White homicide rates, and its slight upward bump in 1990 is more visible.)

Obviously, the overall homicide rate correlates nicely with the Black rates. In an odd coincidence, according to the FBI the number of Black victims is almost identical to the number of White victims.

Whatever is happening in Black populations is largely responsible for the decline in homicide rates.

One thing is more or less constant: the rates for Black killers are much higher than for White killers, as this picture which shows the ratio of rates demonstrates:

Figure 2

Rates bob around, but average out to about 7.5 times higher.

Perhaps the difference lies in who is killing whom. Lumping all the Black and White homicides together, here are the percentages of Blacks killing Whites and Whites killing Blacks:

Figure 3

Racial animosity is on the rise, homicidally speaking. This will come as no surprise to anybody who isn’t an NPR listener. More Blacks kill Whites than the reverse, which strangely is exactly what you don’t hear on TV. Update: Perhaps this trend will escalate, given the media’s glee in stoking racial discontent.

Lastly, here are the percentages for the races killing each other:

Figure 4

Fairly self-explanatory.

Update: Fig. 2 with new axis limits.

Fig. 2b

Update: Breakdown of male (Figure A1) and female (Figure A2) Black and White homicide rates. Note the change in scale between the sexes. Males are about 10 times as homicidal as females.

Figure A1

Figure A2

The ratio of Black to White, by sex and age group, is also interesting.

Figure A3

Figure A4

Black males maintain a consistent edge over White males, homicidally speaking, regardless of age group, though the gap is closing, or at least leveling off, for females.


—————————————————————————-

1. The data are from the FBI (details here), the Census Bureau, and the Department of Justice (details here). For W/B and B/W homicides, see htus8008f19.csv in the DOJ link.


Happy 50th Bill & Marilyn!

The Happy Couple

We made rather merry at the party yesterday. So much so that there might not be any fresh posts for days.


Two New Dice Games: Animosity & Disdain

Laissez les bons temps rouler!

Summer and time for doing plenty of nothing. Like playing a rousing game of 10,000, sometimes called Farkle, Dix Mille, or “Didn’t we just play that yesterday?”

There are as many variants of 10,000 as there are economists’ opinions on the GDP, but this link is closest to the game I know.

The problem with 10,000, while it’s fun in a pleasant sort of way, is that it doesn’t sow as much discord and domestic disharmony as I like to see. Every player is against only himself and the cruelty of the dice. Whereas in a proper game players are at each others’ throats.

Best news is that no politics of any kind is ever discussed during play of the games below. No rule forbids this, but the flow of play precludes it. Thus, while the game appears to increase bad feelings, it actually decreases them globally. Progressives and conservatives, atheists and believers, and the froward and shy may play together and George Zimmerman’s name never is mentioned.

333 or Animosity

Normal: Play starts and continues right or left. See Scoring Table below. Player rolls and can keep his score or assign it to the player on his left (if playing left, or vice versa). He must then roll again and must keep score on second roll. Player to his left, if assigned the thrower’s first roll, loses his next chance to roll. A straight or triple immediately reverses play.

Example: Player rolls (1,2,3) and play had been going left. It switches to right. Player may keep the 12 points for himself or assign it to the player to his right. After scoring, play moves to right and proceeds as usual.

Endgame: Once any player meets or exceeds 333 points, play continues once more around in current direction, but scores can no longer be assigned. Each player begins an accumulation, adding scores on successive rolls. The accumulation may continue as long as successive rolls are larger than or equal to previous rolls.

Example: The current potential victor has 342 and play is to left. Next player is at 280 and rolls a (1,1,2). This isn’t enough to put him over 342, so he rolls again. Next two rolls are (4,4,5), which is higher than (1,1,2), then (2,3,5), which isn’t, so this player loses and the next player moves to the endgame.
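The endgame example can be traced with a minimal sketch. It assumes that "larger than or equal to" compares the scores of successive rolls, and that a player keeps rolling until he passes the leader or a roll scores lower than his previous one. The names `score` and `endgame_turn` are illustrative, not part of the rules.

```python
def score(roll):
    """Score three dice: plain sum, doubled for a straight, tripled for triples."""
    a, b, c = sorted(roll)
    total = a + b + c
    if a == b == c:                      # triples, e.g. (4,4,4)
        return 3 * total
    if b == a + 1 and c == b + 1:        # straight, e.g. (2,3,4)
        return 2 * total
    return total


def endgame_turn(start, target, rolls):
    """Apply the accumulation rule to a fixed sequence of rolls.

    Assumption: the player is out as soon as a roll scores less than
    his previous roll, and leads as soon as his total passes `target`.
    """
    total = start
    previous = None
    for roll in rolls:
        s = score(roll)
        if previous is not None and s < previous:
            return total, "out"
        total += s
        previous = s
        if total > target:
            return total, "leads"
    return total, "still rolling"


# The example from the text: leader at 342, next player at 280.
print(endgame_turn(280, 342, [(1, 1, 2), (4, 4, 5), (2, 3, 5)]))  # (297, 'out')
```

Here (1,1,2) scores 4 and (4,4,5) scores 13, bringing the player to 297; (2,3,5) scores only 10, so the accumulation ends.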

Disdain

Normal: Play is much like in Animosity, except that the endgame accumulation rule is in effect the entire game and the play never reverses direction. Player rolls and at any time in his accumulation may stop and keep his score or assign it to the player on his left. He must then roll again and must keep score on second accumulation. As above, the successive rolls must equal or exceed the previous rolls in the accumulation. Player to his left, if assigned the thrower’s first score, loses his next chance to roll.

Endgame: Exactly as above.

Notes: Both games have been played and tested. Both have produced much fun. Gamblers like Disdain; analytical folks prefer Animosity.

Scoring Table

The three dice are summed. If the throw is a “straight”, the sum is multiplied by two. If the throw is triples, the sum is multiplied by three.

Roll Score Prob
1,1,2 4 0.0139
1,1,3 | 1,2,2 5 0.0278
1,1,4 6 0.0139
1,1,5 | 1,2,4 | 1,3,3 | 2,2,3 7 0.0694
1,1,6 | 1,2,5 | 1,3,4 | 2,2,4 | 2,3,3 8 0.0972
(1,1,1) | 1,2,6 | 1,3,5 | 1,4,4 | 2,2,5 9 0.0880
1,3,6 | 1,4,5 | 2,2,6 | 2,3,5 | 2,4,4 | 3,3,4 10 0.1250
1,4,6 | 1,5,5 | 2,3,6 | 2,4,5 | 3,3,5 | 3,4,4 11 0.1250
(1,2,3) | 1,5,6 | 2,4,6 | 2,5,5 | 3,3,6 12 0.1111
1,6,6 | 2,5,6 | 3,4,6 | 3,5,5 | 4,4,5 13 0.0972
2,6,6 | 3,5,6 | 4,4,6 | 4,5,5 14 0.0694
3,6,6 15 0.0139
4,6,6 | 5,5,6 16 0.0278
5,6,6 17 0.0139
(2,2,2) | (2,3,4) 18 0.0324
(3,4,5) 24 0.0278
(3,3,3) 27 0.0046
(4,5,6) 30 0.0278
(4,4,4) 36 0.0046
(5,5,5) 45 0.0046
(6,6,6) 54 0.0046

Notes: All possibilities are shown, sorted; the order of the dice does not matter. A (1,2,3) is the same as a (2,1,3) or (2,3,1), etc. Parentheses around the roll indicate a straight (sum times two) or triple (sum times three) and thus also a switch. Chance of a reversal (straight or triple) is 5/36, or about 1/7.
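The score probabilities can be checked by brute force. The sketch below assumes three fair six-sided dice, enumerates all 216 ordered throws, and tallies each score as an exact fraction. The helper names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def is_triple(roll):
    return len(set(roll)) == 1


def is_straight(roll):
    a, b, c = sorted(roll)
    return b == a + 1 and c == b + 1


def score(roll):
    """Sum of the dice, times two for a straight, times three for triples."""
    total = sum(roll)
    if is_triple(roll):
        return 3 * total
    if is_straight(roll):
        return 2 * total
    return total


# Every ordered outcome of three dice: 6 * 6 * 6 = 216 equally likely throws.
rolls = list(product(range(1, 7), repeat=3))

counts = Counter(score(r) for r in rolls)
dist = {s: Fraction(n, len(rolls)) for s, n in sorted(counts.items())}

# A reversal happens on any straight or triple.
p_reversal = Fraction(sum(is_straight(r) or is_triple(r) for r in rolls), len(rolls))

for s, p in dist.items():
    print(f"{s:>2}  {float(p):.4f}")
print("reversal:", p_reversal)  # 5/36
```

Exact fractions avoid rounding questions: the probabilities must sum to exactly 1, and the reversal chance comes out as 30/216 = 5/36.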


Bonus third game! 222 or Unnamed as yet

Normal: Play starts and continues right or left. Player rolls and can keep his score or subtract it from player to his left (if playing left, or vice versa). A straight or triple immediately reverses play. No player can have less than 0 points.

Example: Player A rolls (1,2,3) and play had been going left. It switches to right. Player A may keep his 12 points or subtract them from player to his right. After scoring, play moves to right and proceeds as usual until next switch.

Endgame: Once a player meets or exceeds 222 points, play continues once more around in current direction, except that scores can only be added to players’ tallies, and only accumulated if successive rolls are larger than or equal to previous rolls. Players may stop at any time and tally score.

Example: Player A hits 230 so play moves to B, who is at 180. He rolls a score of 12, which isn’t enough to beat A, so he rolls again but must beat or tie 12 points on next roll. Suppose his second desperation roll is 14, for a total of 26, which still isn’t enough to beat A, so he will roll again. If he doesn’t beat or tie 14 on the roll, he is out. And so on across the board.

Notes: I haven’t played 222 yet, but would be delighted to hear reports of any attempts.


What Regression Really Is

Bookmark this one, will you, folks? If there’s one thing we get more questions about, and that is more abused, than regression, I don’t know what it is. So here is the world’s briefest (and most accurate) primer. There are hundreds of variants, twists and turns, and tweaks galore, but here is the version most use unthinkingly.

  1. Take some thing in which you want to quantify the uncertainty. Call it y: y can be somebody’s income, their rating on some HR form, a GPA, their blood pressure, anything. It’s a number you don’t know but want to.
  2. Next write y ~ N(m, s), which means this and nothing else: “Our uncertainty in the value y takes is quantified by a normal distribution with central parameter m and spread parameter s.” It means you don’t know what value y will take in any instance, but if you had to bet, it would take one of the values quantified by the probabilities specified by the mathematical equation N(m,s).
  3. We never, absolutely never, say “y is normally distributed.” Nothing in the universe is “normally distributed.” We use the normal to quantify our uncertainty. The normal has no power over y. It is not real.
  4. The probability y takes any value, even the values you actually did see, given any normal distribution, is 0. Normal distributions are bizarre and really shouldn’t be used, but always are. Why if they are so weird are they ubiquitous? Some say insanity, others laziness, and still more ignorance. I say it’s because it’s automatic in the software.
  5. Collect probative data—call it x—which you hope adds information about y. X can be anything: sex, age, GDP, race, anything. Just to fix an example, let x1 be sex, either male or female, and let y be GPA. We want to say how sex informs our uncertainty of a person’s GPA.
  6. Regression is this: y ~ N(b0 + b1*I(sex=Male), s).
  7. This says that our uncertainty in y is quantified by a normal distribution with central parameter b0 + b1*I(sex=Male) and spread parameter s. The funny “I(sex=Male)” is an indicator function and takes the value 1 when its argument is true, else it equals 0. Thus, for males, the central parameter is b0 + b1 and for females it is just b0. Pause here until you get this.
  8. This could be expanded indefinitely. We could write y ~ N(b0 + b1*I(sex=Male) + b2 * Age + b3 * Number of video games owned, s), and on and on. It means we draw a different normal distribution for GPA uncertainty for every combination of sex, age, and numbers of video games. Notice the equation for the central parameter is linear. Our choice!
  9. Regression is not an “equation for y”. Regression does not “model y”. Regression only quantifies our uncertainty in y conditioned on knowing the value of some x’s.
  10. The b’s are also called parameters, or coefficients, or betas, etc. If we knew what the values of the b’s were, we could draw separate normal distributions, here one for men and one for women. Both would have the same spread, but different central points.
  11. We do not ordinarily know the values of the parameters. Classically we guess using some math which isn’t of the slightest interest to us in understanding what regression is. We call the guesses “b-hats” or “beta-hats”, to indicate that we don’t know what b is and that the value is just a guess. The guesses are given the fancy title of “estimates” which makes it sound like science.
  12. Ninety-nine-point-nine-nine percent of people stop here. If b1 is not equal to 0 (judged by a magical p-value), they say incorrectly “Men and women are different.” Whether or not this is true, that is not what regression proves. Instead, if it were true that b1 was not equal to 0, then all we could say is that “Our uncertainty in the GPAs of females is quantified by a normal distribution with central parameter b0 and spread s, and our uncertainty in the GPAs of males is quantified by a normal distribution with central parameter b0 + b1 and spread s.”
  13. Some people wrongly say “Males have higher GPAs” if b1 is positive or “Males have lower GPAs” if b1 is negative. This is false, false, false, false, and false some more. It is wrong, misleading, incorrect, and wrong some more, too. It gives the errant impression that (if b1 is positive) males have higher GPAs, when all we can say is that the probability that any given male has a higher GPA than any given female is greater than 50%. If we knew the values of the b’s and s, we could quantify this exactly.
  14. We do not know the values of the b’s and s. And there’s no reason in the world we should be interested, though the subject does seem to fascinate. The b’s are not real, they are fictional parameters we made up in the interest of the problem. This is why when you hear somebody talk about “The true value of b” you should be as suspicious as when a politician says he’s there to help you.
  15. What should then happen, but almost never does, is to account for the uncertainty we have in the b’s. We could, even not knowing the b’s, make statements like, “Given the data we observed and accepting we’re using a normal distribution to quantify our uncertainty in GPA, the probability that any given male has a higher GPA than any given female is W%.” If W% was equal to 50%, we could say that knowing a person’s sex tells us nothing about that person’s GPA. If W% was not exactly 50% but close to it (where “close” is up to each individual to decide: what’s close for one wouldn’t be for another), we could ignore sex in our regression and concentrate on each student’s age and video game number.
  16. This last and necessary but ignored step was the point of regression; thus that it’s skipped is an argument for depression. It is not done for three reasons. (1) Nobody thinks of it. (2) The p-values which say whether each bi should be judged 0 or not mesmerize. (3) Even if we judge the probability, given the data, that bi is greater than 0 is very high (or very low), this does not translate into a discernible or useful difference in our understanding of y and people prefer false certainty over true uncertainty.
  17. In our example it could be that the p-value for b1 is wee, and its posterior shows the probability it is greater than 0 is close to 1, but it still could be that, given the data and assuming the normal, the probability any given male has a GPA larger than any given female is (say) 50.01%. Knowing a person’s sex tells us almost nothing about this person’s GPA.
  18. But it could also be that the p-value for b1 is greater than the magic number, and the posterior also sad, but that (given the etc.) the probability a male has a higher GPA than a female is (say) 70%, which says something interesting.
  19. In short, the b’s do not tell us directly what we want to know. We should instead solve the equation we set up!
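A minimal sketch of the calculation in steps 13 and 17, under stated assumptions: the b’s and s are treated as known, s is read as a standard deviation, and the two students’ GPAs are taken as independent given the parameters. None of that accounts for the uncertainty in the b’s that step 15 calls for, and every numeric value below is hypothetical. With two normals of equal spread s, the difference (male minus female) has central parameter b1 and spread s*sqrt(2), so the probability that a given male has the higher GPA is the chance that this difference exceeds 0.

```python
from math import sqrt
from statistics import NormalDist


def central(b0, b1, is_male):
    """Central parameter b0 + b1*I(sex=Male)."""
    indicator = 1 if is_male else 0
    return b0 + b1 * indicator


def prob_male_higher(b0, b1, s):
    """P(male GPA > female GPA) for known parameters.

    Assumes independent GPAs and that s is a standard deviation,
    so the difference has spread s * sqrt(2).
    """
    mean_diff = central(b0, b1, True) - central(b0, b1, False)
    diff = NormalDist(mu=mean_diff, sigma=s * sqrt(2))
    return 1 - diff.cdf(0)


# Hypothetical values: a tiny b1 relative to s barely moves the probability off 50%.
print(prob_male_higher(b0=3.0, b1=0.01, s=0.5))

# A larger b1 with the same spread says something more interesting.
print(prob_male_higher(b0=3.0, b1=0.4, s=0.5))
```

The point of the exercise is that the probability depends on b1 relative to s, not on whether b1 differs from 0: a b1 that is "significant" but small next to s leaves the answer near 50%.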

Obviously, I have ignored much. Entire textbooks are written on this subject. Come to think of it, I’m writing one, too.




© 2015 William M. Briggs
