
# Z-Scores Are Misleading & Create Panic: Or, How Not To Compare Pandemics

A number of people, panicked to the point their minds have turned to quivering jelly and angry at my calmness, have been emailing and tweeting me the equivalent of “Oh yeah, smart guy? If it wasn’t so bad, then explain to me why the z-scores are so high!”

This proves propaganda works. (Did you know our State Department boasts of its propagandistic abilities?) Not one of these people touting them could define what a z-score is, even if you threatened to take away their Netflix.

What has these folks frightened are scary graphs with big bumps issued by our official propaganda organs. People are told to shiver when gazing at them, and so they do shiver. We have become a very obedient people.

Yet z-scores can cause harmful misinterpretations in judging the severity of pandemics.

Let’s go through a simplified example to show this.

First, z-scores are the results of statistical models, built with an aim of hypothesis testing, which all regular readers know leads to grief. The first cousin of the z-score is the p-value, and p-values should never be used. New readers won’t understand any of that, and you don’t need to for this demonstration. But if you’d like to learn, I have a free on-line class, dozens of articles, and a book proving all this.

Second, here are simulated z-scores for deaths over a period of about 11 years: 10 previous years, and our incomplete year of doom and hysteria.

Look at that sucker soar! The other wiggles are nothing next to the whopper of a z-score for our year of doom. (I could have done this so that the weeks lined up with our current frenzy, i.e. the peak coming at 2020 week 14 or so, but I am lazy, and it makes no difference. R is a pain in the keister trying to label x-axes so that week 52 is in the middle of simple plots.)

Anyway, there it is. The frightening z-score. If I saw that and had no clue what a z-score is I might say “Golly!”, too. Especially if I saw z-score plots across several countries, which appear confirmatory.

Now we’re using a simplified statistical model here, unlike some epidemiological models which can grow complex. The complexity is irrelevant, however. The interpretation remains the same. Our z-score is as simple as you can get: at each week, the observation minus the mean all divided by the standard deviation.
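As a concrete sketch of that simple model (my illustration in Python, not the original R script; all numbers here are made up), the weekly z-score works like this:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten "normal" years of simulated weekly death counts (made-up numbers).
baseline = rng.normal(loc=1000, scale=30, size=(10, 52))

# Per-week mean and standard deviation across the ten baseline years.
week_mean = baseline.mean(axis=0)
week_sd = baseline.std(axis=0, ddof=1)

# The z-score for a new year's observations, exactly as described:
# at each week, (observation - mean) / standard deviation.
new_year = rng.normal(loc=1000, scale=30, size=52)
z = (new_year - week_mean) / week_sd
```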

It really does appear, and furthermore it is true (since I picked the numbers) that this period is unusually high. We don’t need to test, because we know with certainty about the unusualness.

This is where most of these plots stop, leaving the viewer suitably awed. But a few go a little way further and show the excess deaths.

This is a time series plot, where week 0 was about 11 years ago. Even looking at the excess deaths, it is clear this year is different from all the rest.

Our excess deaths are calculated with reference to the same model as above. Again, other models are more complicated, but the interpretation is the same. Excess deaths are, for each week, the observation minus the mean. Simple.
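In the same made-up sketch, excess deaths simply drop the division by the standard deviation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical setup: ten baseline years of weekly death counts.
baseline = rng.normal(loc=1000, scale=30, size=(10, 52))
week_mean = baseline.mean(axis=0)

# Excess deaths: for each week, the observation minus the mean.
new_year = rng.normal(loc=1000, scale=30, size=52)
excess = new_year - week_mean
```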

So, once again, it is true that excess deaths are greater than usual, at least over these past 11 years.

Now the viewer is really shaking—or is it “literally” shaking? These excess deaths look the same in many different countries, too! This justifies the panic.

Regular readers will know the answer to this question: what do we call it when we look at the model and not the data, and then act like the model is the data? Right! This is the Deadly Sin of Reification, a terrible sin. We have reified the model, and forgotten reality. This is a common blunder made by scientists everywhere.

How about we now look at the actual deaths, from which the above plots were generated? Good idea:

There it is, the whole 11 years. There is a regular 52-week period, with deaths peaking once every year and then falling to a low later in each year, only to repeat the whole thing the next year.

This, then, simulates a regular year, where flu deaths start ramping up around October and fall off by the end of April. These are meant to be total deaths, not just flu deaths. People do die of things besides viruses, contrary to media reports.

See the arrow at the end? And the little blip under it? That little blip represents the “excess deaths” which I added (20, 60, and 40 for weeks 38-40 in the final year).
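That simulation can be reconstructed along these lines (a sketch in Python rather than the original R; the smooth seasonal shape is my assumption, while the 20/60/40 additions in weeks 38-40 come from the post):

```python
import numpy as np

weeks = np.arange(52)

# One flu peak per 52-week year; the cosine shape is an assumption,
# only the blip numbers below are taken from the post.
def seasonal_year():
    return 1000 + 300 * np.cos(2 * np.pi * weeks / 52)

# Eleven years: ten ordinary ones plus the final year.
deaths = np.tile(seasonal_year(), (11, 1))

# The only unusual thing in the final year: 20, 60, and 40 extra
# deaths in weeks 38-40 -- the "excess deaths" of the plots above.
deaths[-1, 38:41] += np.array([20.0, 60.0, 40.0])
```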

These extra deaths happened, God rest their simulated souls. We are not trying to say they did not happen. They did. But put into context they are nothing to grow hysterical about.

“But Briggs, real data is noisier than that! This is cheating.”

It’s a simulation intended to show that one should not examine z-scores, because z-scores only show departures from the model mean and do not put the number of deaths into a reality-based context.

The reason z-scores look scary for COVID-19 is the same reason they looked scary here, the “excess” deaths are happening at an unusual time. Put into context, however, they are not as scary, as this post shows. Here’s the per-capita all-cause death data for the US:

(The fall off is late reporting by the CDC: read the original post for details.)

If coronavirus happened to coincide in timing with the flu, instead of lagging it by a few weeks, the z-scores everybody shows would be much smaller even though the number of deaths would be the same. Z-scores are a matter of timing, because time is a component of the model.

Here’s the proof. I removed the excess deaths from the end and put them at the peak of the simulated flu, then recomputed the z-scores.

The panic-inducing powers of the z-scores have been rendered impotent. Same number of deaths. Nothing has changed except the timing.
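The timing effect can be seen numerically in a sketch like this (again my reconstruction, not the original code: the Poisson count noise, the seasonal shape, and the exact week placements are assumptions; Poisson noise makes the weekly standard deviation grow with the weekly mean, so the same extra deaths produce a much bigger z-score off-peak than at the peak):

```python
import numpy as np

rng = np.random.default_rng(3)
weeks = np.arange(52)

# Seasonal mean with the flu peak near week 0 and the trough near week 26.
season = 700 + 600 * np.cos(2 * np.pi * weeks / 52)

# Ten baseline years of Poisson-distributed weekly counts: the weekly
# standard deviation scales roughly like sqrt(weekly mean).
baseline = rng.poisson(season, size=(10, 52))
week_mean = baseline.mean(axis=0)
week_sd = baseline.std(axis=0, ddof=1)

blip = np.array([20.0, 60.0, 40.0])

# Case 1: the extra deaths land off-peak, near the seasonal trough.
off_peak = season.copy()
off_peak[25:28] += blip
z_off = (off_peak - week_mean) / week_sd

# Case 2: the very same extra deaths land at the flu peak instead.
at_peak = season.copy()
at_peak[0:3] += blip
z_peak = (at_peak - week_mean) / week_sd

# Same number of deaths, different timing: z_off.max() far exceeds
# z_peak.max(), because the peak's standard deviation is much larger.
```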

The excess death plots are also cheating, when used as propaganda. You should not compare actual data with means. This was proved in this post. Look at the actual data first.

Here is the new excess-deaths plot, with the simulated deaths coming earlier.

And the actual deaths:

You can’t even see the additions. If coronavirus had happened at the peak of the flu year, both in China and everywhere else, maybe no one would have noticed except the people who keep track of the yearly flu and cold viruses.

Understand, coronavirus would still have been here and still killed the people it killed, just as flu was still with us and killed the people it killed. Only maybe the press would have turned their attention to propping up Joe Biden full time instead. Maybe our dear leaders wouldn’t have reacted to the virus.

Maybe, but, if. Like my dad always says, if ifs and buts were candy and nuts, every day would be Christmas.

To support this site and its wholly independent host using credit card or PayPal (in any amount) click here

Categories: Statistics

### 18 replies »

1. Sheri says:

“Oh yeah, smart guy? If it wasn’t so bad, then explain to me why the z-scores are so high!” Really? Like I care……

Excess deaths are fictional. Deaths are deaths. As many people die as die. It’s a propaganda, fear-inducing term, not reality. In any given day, the number of people who die varies widely. Only if you take 10 years and homogenize and mash and squish numbers into little boxes is there any similarity of “normal” death totals. Which is not science, but it is a great example of how to lie with statistics.

Our “dear leaders” would have reacted because China intended for that to happen and would have published as many terrifying, though fake, videos of their epidemics until world domination was accomplished. It’s just that humans are vastly stupider than even China estimated.

2. Fredo says:

Yes, coincidence upon coincidence, Sheri. What is also seeping out around the edges is that this and ‘other’ candidates were tested for some number of years on disruptive populations. It’s like Goldilocks’s soup, not too hot and not too cold; it seems to hit that ‘just right’ sweet spot of non-producers. Of course this too can be chalked up to fake news and whispered rumors from the g_lag. But do pause for a moment and consider: what if it’s not?

3. “if ifs and buts were candy and nuts, every day would be Christmas”

I’m stealing this.

4. maggette says:

So every filtering, normalization or de-trending of data is “forgetting about reality” and “looking at the model”. Until you magically shift data from one place in time to another…just to “prove” (your words, not mine) that timing makes a difference in the appearance of a time series plot?

5. Dave,

And if that were a fair characterization of what he did you’d be right.

Dr. Briggs,

With the juiced death totals for tomorrow’s post I’d like to see your estimate and methodology for figuring out the actual death totals, or at least a ballpark idea of how overinflated things are.

6. grumpy kat says:

Is counting deaths really the best metric?

We know most death is among the old and sick, so a great many of these would have occurred soon in any case. It may seem harsh, but it is one thing to be 89 and bedridden and catch COVID in the nursing home; it is another thing entirely to be 15, bright and talented, and have your life cut short.
Shouldn’t we be measuring something like “lost quality life years”? Surely there are “expected remaining years” metrics available by age, sex, and maybe even health. E.g., a healthy 89-year-old woman can expect to live X more years, so if she dies of CV we count only X, presumably a much smaller number than the expectation for a healthy 15-year-old.

If we follow all-cause deaths for some months after COVID has passed (I assume it will pass), then I suspect we will observe a rate less than normal, because many will have been “harvested” by the COVID reaper (yes, I think harvesting is an actual epidemiological term).

my $0.02

7. Bruce says:

My dad’s version was “if ifs and buts were candy and nuts, what a splendid Christmas we’d have.”

8. Grumpy Cat,

You ask: “Shouldn’t we be measuring something like “lost quality life years” – surely there is ‘expected remaining years” metrics available by age, sex, and maybe even health.”

“We” should, and we are! Of course that analysis is not fright-inducing, so the PC-Prog media and politicians don’t report on it. “One life saved” is worth destroying our economy, remember?

There’s an excellent website (for many subjects), Just the Facts, that is chronicling many issues related to the scare-mongering on this virus.

See below for the analysis you requested. See the website for even more analysis:

https://www.justfacts.com/news_covid-19_crucial_facts

“Years of Lost Life
“Beyond raw numbers of deaths, another crucial factor in measuring the deadliness of a public health threat is the ages of its victims. In the words of the CDC, “the allocation of health resources must consider not only the number of deaths by cause but also by age.” Hence, the “years of potential life lost” has “become a mainstay in the evaluation of the impact of injuries on public health.”

“In this respect, Covid-19 is much less lethal than common causes of untimely death, such as accidents. The precise average age of death for Covid-19 fatalities is still unknown, but the vast majority of victims are elderly or have one or more chronic illnesses, as is the case with deaths from the flu and pneumonia.

“Based on the CDC’s latest data for the age distribution of deaths, the average age of death for accidents is about 53.3 years, while for the flu and pneumonia, it is about 77.4 years. Using flu and pneumonia as a rough proxy for Covid-19, this disease robs an average of 12.0 years of life from each of its victims, as compared to 30.6 years of lost life for each accident. And again, accidents kill around 170,000 Americans per year, while Covid-19 is unlikely to have an ongoing high death toll because of its limited prospects for mutation.”

9. “It’s a simulation intended to show that one should not examine z-scores, because z-scores only show departures from the model mean and do not put the number of deaths into a reality-based context.”

Sorry, but overall that’s rubbish.

In z = (x − mean)/std, all components (x, mean, std) are either data or come from models, so it depends on how good the model is.

I’ve used z-scores in consulting to successfully monitor network traffic and discover fraud in real time across geographies. And no, I did not need to believe the model was “reality,” as you often erroneously state in your quest against p-values, hypothesis testing, and frequentism. See http://www.statisticool.com/networktraffic.htm

Cheers,
Justin

10. Briggs says:

Justin, I just figured out who you remind me of. One of those dowsers. You know, the guys with the twigs who swear they can find water anywhere underground, and who cannot be convinced what they’re doing doesn’t work.

P-values are the dowsing rods of science!

11. Justin –
If the average is 100, and we increase that by one, we have a 1% increase.
If the average is 10, and we increase that by one, we have a 10% increase.
If the average is 1, and we increase that by one, we have a 100% increase.

Note that, in each instance, we add exactly one.

12. Andrew says:

McChuck, why are you talking about percentages? Let’s look at hard numbers, shall we? Some stats from the Office for National Statistics (ONS) in the UK:

Week 14, 16,387 deaths (last 5 year average of 10,305)
Week 15, 18,516 deaths (last 5 year average of 10,520)
Week 16, 22,351 deaths (last 5 year average of 10,497)
Week 17, 21,997 deaths (last 5 year average of 10,458)

The last 3 weeks have seen the highest-ever recorded numbers of deaths in the UK.

In your opinion, what is causing these higher numbers of deaths?

Aren’t you even curious as to what is driving these higher numbers?

“Justin, I just figured out who you remind me of. One of those dowsers. You know, the guys with the twigs who swear they can find water anywhere underground, and who cannot be convinced what they’re doing doesn’t work.

P-values are the dowsing rods of science!”

C’mon Briggs, that was more of a Twitter response. I gave you an example of where it did work. Bizarre. But I have a website too, I understand about wanting to control the narrative, so no big whoop. 😉 Actually, p-values and hypothesis testing have been used to show dowsing doesn’t work (Randi and many others). Also, Nobel Prize winners, for example, use p-values in their work. But let’s back up a bit to fundamentals.

If I assume a fair coin model and flip a coin 100 times, and do that 3 times, and get, say, 95, 89, and 93 heads, is this evidence the coin is fair? A count like 92 being a far distance from what you’d expect under the model, 50, is just what the p-value is getting at. What is “dowsing rod” about that?

Justin
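Justin’s coin example is easy to check numerically. A minimal sketch of the one-sided binomial p-value he describes (the function name is mine; counts like 89 to 95 heads give p-values small beyond any doubt, which is his point):

```python
from math import comb

def p_at_least(k, n=100, p=0.5):
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# For runs of 95, 89, and 93 heads out of 100 fair flips, the
# p-values are astronomically small: strong evidence against the model.
for heads in (95, 89, 93):
    print(heads, p_at_least(heads))
```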