This independent article shows how confusing counting coronavirus deaths is. Regular readers, no Doom tomorrow.
This is going to be a preliminary investigation into the COVID-19 numbers being presented hourly to the world. This topic has taken me down a data rabbit hole and will be an ongoing investigation, as I have many questions that remain unanswered. Perhaps readers knowledgeable in such areas can provide their insight.
Firstly, I want to emphasize that COVID-19 is a real virus, and that it is, especially to the elderly and those with underlying health conditions, dangerous and all too often, deadly. However, the mis-labeling of data that is going on in the media is astounding, even for one used to such things. Even websites whose focus is disseminating COVID-19 related statistics have been mis-labeling terms and not providing proper references for their data.
In the US, there is no requirement for laboratory results confirming that there was an active COVID-19 infection in an expired patient in order to report the cause of death as COVID-19. Per the CDC’s National Vital Statistics System COVID-19 Alert No. 2, March 24, 2020:
Should “COVID-19” be reported on the death certificate only with a confirmed test? COVID-19 should be reported on the death certificate for all decedents where the disease caused or is assumed to have caused or contributed to death.
The term “confirmed deaths” as used in media often means (at most) that a death, in the opinion of the certifier (the one who filled out the death certificate, sometimes a doctor), was attributable to COVID-19. Even if there were a requirement for a lab confirmation that there was an infection at the time of the patient’s demise, assigning the proximate cause of death is often difficult, even in the best of times. Unlike determining whether a person has died, there is extensive judgment involved in ascertaining a cause of death. (Along with hospital billing statements, you can file popular television dramas like Grey’s Anatomy, House, ER, and CSI under the category of fiction.) Only a small percentage of deaths are from obvious causes such as injuries from traffic collisions.
The majority of deaths occur in elderly individuals, most of those individuals have multiple underlying health issues, and the cause(s) of death is usually the certifier’s best guess. Most deaths are never autopsied, but even with an autopsy, it’s often just an educated opinion as to what the cause of death was. Old people, when they become frail, just die. It’s just a fact of life, and it can be a fool’s errand trying to ascribe a cause of death in those cases. In nursing care facilities, the median life expectancy of residents can be shockingly short. A 2010 study found that the median life expectancy after admittance was a mere 5 months. From that, one can surmise that the one-year mortality rate for residents after being admitted to a nursing home is approximately 80%. Again, that is in normal times.
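One way to arrive at a figure of roughly 80% from a 5-month median is to assume that the risk of death stays constant over time, which gives an exponential survival curve. That assumption is only for illustration; the study is cited here for its median alone. A minimal sketch:

```python
# Assumption (not from the study): constant risk of death over time,
# so survival halves every 5 months (the reported median).
MEDIAN_SURVIVAL_MONTHS = 5.0

def mortality_within(months: float, median_months: float = MEDIAN_SURVIVAL_MONTHS) -> float:
    """Fraction of residents expected to have died within `months` of admission."""
    surviving = 0.5 ** (months / median_months)
    return 1.0 - surviving

one_year_mortality = mortality_within(12)
print(f"Estimated one-year mortality: {one_year_mortality:.0%}")
```

Under that assumption the estimate lands close to the 80% quoted above. A different survival curve (for example, one where risk is front-loaded in the first weeks) would give a different number.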
Respiratory viruses have been with humankind for tens of thousands of years. Every year around the beginning of the year in countries in the northern hemisphere, there is a spike in respiratory-related illness and death. More accurately, a plot of all-cause mortality follows a sinusoidal pattern with the peak near the beginning of the year and the trough in the summer. The death rate at the peak is usually about 1/3 higher than the death rate at the trough in the summer.
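As a rough sketch of that seasonal shape, daily deaths can be modeled as a cosine wave around an annual mean. The mean of 7,700 per day is the US average this article uses further on; the pure cosine form and placing the peak at day 0 are simplifying assumptions of mine.

```python
import math

MEAN_DAILY_DEATHS = 7700      # approximate US average used in this article
PEAK_TO_TROUGH_RATIO = 4 / 3  # peak about 1/3 higher than the trough

# For deaths(t) = mean * (1 + a*cos(2*pi*t/365)), peak/trough = (1 + a)/(1 - a).
# Solving for a gives a = (r - 1)/(r + 1).
amplitude = (PEAK_TO_TROUGH_RATIO - 1) / (PEAK_TO_TROUGH_RATIO + 1)

def expected_daily_deaths(day_of_year: float) -> float:
    """Modeled all-cause deaths per day; day 0 is the assumed winter peak."""
    return MEAN_DAILY_DEATHS * (1 + amplitude * math.cos(2 * math.pi * day_of_year / 365))

peak = expected_daily_deaths(0)
trough = expected_daily_deaths(365 / 2)
print(f"peak ~ {peak:.0f}/day, trough ~ {trough:.0f}/day, ratio = {peak / trough:.2f}")
```

With these inputs the amplitude works out to one seventh of the mean, so the modeled swing between winter and summer is a little over two thousand deaths per day.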
The terms “excess deaths” or “excess mortality” are more accurately termed mortality displacement. Though uncomfortable for most to ponder, it is an unavoidable fact of life that we all will die. Thus in reality, there is no such thing as an “excess” death per se, just an early death. Per Wikipedia:
Mortality displacement denotes a temporary increase in the mortality rate (number of deaths) in a given population, also known as excess mortality or an excess mortality rate. It is usually attributable to environmental phenomena such as heat waves, cold spells, epidemics and pandemics, especially influenza pandemics, famine or war.
During heat waves, for instance, there are often additional deaths observed in the population, affecting especially older adults and those who are sick. After some periods with excess mortality, however, there has also been observed a decrease in overall mortality during the subsequent weeks. Such short-term forward shift in mortality rate is also referred to as harvesting effect. The subsequent, compensatory reduction in mortality suggests that the heat wave had affected especially those whose health was already so compromised that they “would have died in the short term anyway”
In the US, there are approximately 7,700 deaths per day on average. You can think of that number as the “background” rate of deaths. If only a small percentage of those, say 10%, were mistakenly attributed to COVID-19 because no testing is currently required, that would lead to a (mistaken) figure of 770 COVID-19 deaths per day. This data reporting issue is essentially a version of the false positive paradox, a type of base rate fallacy in statistics. And even if laboratory testing was performed on an expired patient, and they were COVID-19 positive, it doesn’t mean that they died from COVID-19.
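The arithmetic is simple, but it is worth seeing how sensitive the result is to the misattribution rate. In the sketch below, only the 10% rate comes from the example above; the 1% and 5% rates are hypothetical values added for comparison.

```python
BACKGROUND_DEATHS_PER_DAY = 7700  # approximate US average from the text

def misattributed_per_day(misattribution_rate: float) -> float:
    """Deaths per day wrongly attributed to COVID-19 at a given error rate."""
    return BACKGROUND_DEATHS_PER_DAY * misattribution_rate

# 0.10 is the rate used in the text; 0.01 and 0.05 are hypothetical.
for rate in (0.01, 0.05, 0.10):
    print(f"{rate:.0%} misattributed -> {misattributed_per_day(rate):.0f} deaths/day")
```

Because the background count is so large, even a small error rate produces a daily figure in the hundreds.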
The virus is so widespread, we should expect that a significant number of people who actually die from other causes will have COVID-19 infections, even though COVID-19 was not really responsible for their death. Consider a person who was suffering from a common cold when they had a heart attack. Did the individual die from the cold or the heart attack? However, we will leave that matter for another day and now just focus on the numbers and their sources. Let us take a closer look at some of the most popular websites for COVID-19 statistics.
1. Worldometer COVID-19 Coronavirus Pandemic Web
Worldometer is a popular aggregator and distributor of COVID-19 data.
Referring to their about page:
Total Deaths = cumulative number of deaths among detected cases.
Detected how? As far as I know, no such mortality data is available in the US, at least not publicly. The U.S. Standard Certificate of Death has no fields for reporting laboratory findings. The CDC does have a newly-introduced form “Human Infection with 2019 Novel Coronavirus Person Under Investigation (PUI) and Case Report Form” but I don’t know when the form was introduced, if reporting is mandatory or if/when the data will be made available to the public.
Regarding the source of their data, Worldometer says:
Worldometer manually analyzes, validates, and aggregates data from thousands of sources in real time and provides global COVID-19 live statistics for a wide audience of caring people around the world.
Our data is also trusted and used by the UK Government, Johns Hopkins CSSE, the Government of Thailand, the Government of Vietnam, the Government of Pakistan, Financial Times, The New York Times, Business Insider, BBC, and many others.
How we work
We collect and process data around the clock, 24 hours a day, 7 days a week. Multiple updates per minute are performed on average by our team of analysts and researchers who validate the data from an ever-growing list of over 5,000 sources under the constant solicitation of users who alert us as soon as an official announcement is made anywhere around the world.
Sources and Methods
Our sources include Official Websites of Ministries of Health or other Government Institutions and Government authorities’ social media accounts. Because national aggregates often lag behind the regional and local health departments’ data, part of our work consists in monitoring thousands of daily reports released by local authorities. Our multilingual team also monitors press briefings’ live streams throughout the day. Occasionally, we can use a selection of leading and trusted news wires with a proven history of accuracy in communicating the data reported by Governments in live press conferences before it is published on the Official Websites.
Ah, a secret blend of 11 herbs and spices. Because for a country of 330 million inhabitants, newspaper reports and Twitter are your best sources of accurate statistical data. Totally reassuring. Not. To Worldometer’s credit, they do list sources for each state. For example, for Washington state, they list 14 sources. But how do they coordinate the information between the 14 sources to avoid double-counting? And realistically, how many employees do they actually have monitoring the “thousands of daily reports released by local authorities,” what credentials do the employees have, and what processes do they have in place to ensure quality control?
2. The Johns Hopkins COVID-19 Dashboard
Confirmed cases include presumptive positive cases and probable cases, in accordance with CDC guidelines as of April 14. Death totals in the US include confirmed and probable, in accordance with CDC guidelines as of April 14.
Bonkers! If a case is presumptive or probable, it is by definition not confirmed. Despite Johns Hopkins referencing CDC guidelines, their terminology is in direct contradiction to the CDC guidelines. Per the CDC:
A confirmed case or death is defined by meeting confirmatory laboratory evidence for COVID-19.
A probable case or death is defined by one of the following:
- Meeting clinical criteria AND epidemiologic evidence with no confirmatory laboratory testing performed for COVID-19
- Meeting presumptive laboratory evidence AND either clinical criteria OR epidemiologic evidence
- Meeting vital records criteria with no confirmatory laboratory testing performed for COVID19
Thus, per CDC guidelines, confirmed cases are distinctly separate from probable cases (as a regular English speaker would expect). Confirmed cases require laboratory evidence. If a case is presumptive or probable, it is by definition not confirmed.
The fine print on the Johns Hopkins’ US map https://coronavirus.jhu.edu/us-map mirrors that of their world map:
Confirmed cases include presumptive positive cases.
If a case is presumptive, it is by definition not confirmed. Oddly, the creator of this Johns Hopkins website releasing medical statistics to the world is not an epidemiologist but an associate professor in the Department of Civil and Systems Engineering. Johns Hopkins is one of the premier medical institutions in the US and should not allow such flagrant misrepresentation of data.
Now let us delve into the source of the data for Johns Hopkins dashboards. Per their FAQ:
What are the sources of data informing the dashboard?
The data sources include the World Health Organization, the U.S. Centers for Disease Control and Prevention, the European Center for Disease Prevention and Control, the National Health Commission of the People’s Republic of China, 1point3acres, Worldometers.info, BNO, state and national government health departments, local media reports, and the DXY, one of the world’s largest online communities for physicians, health care professionals, pharmacies and facilities.
My second concern is that the Johns Hopkins website does not reference their data sources in a proper manner. For example, where do they obtain their death statistics for the United States? It’s impossible to tell. Their Github repository enumerates 24 data sources, but it’s impossible to tell which data source(s) was used specifically for the US death statistics. For example, the Johns Hopkins map was reporting 67,680 deaths in the US as of the 5/3/2020, 3:32:26 PM update. So which particular source of data was used for that number? I’ve tried to match the Johns Hopkins number against CDC and WHO numbers, but was unable to find a match. In scientific papers, the reference section of the paper is very important. Each assertion in the body of the paper can be linked to a particular entry in the reference section at the end of the paper. This is just not possible with the Johns Hopkins numbers as they don’t reference a particular source for each number, just a long list of sources. The source for each number should be listed explicitly.
I have reported both of these issues to Johns Hopkins, but to date, they have not responded.
3. CDC Coronavirus Disease 2019 (COVID-19) Web Page
The CDC has at least two web pages. Their Coronavirus Disease 2019 (COVID-19) site reported 65,735 deaths as of the May 3, 2020 update. Quoting from the website:
This page is updated daily based on data confirmed at 4:00pm ET the day before.
Numbers reported on Saturdays and Sundays are preliminary and not yet confirmed by state and territorial health departments. These numbers may be modified when numbers are updated on Mondays.
Number of Jurisdictions
There are currently 55 U.S.-affiliated jurisdictions reporting cases of COVID-19. This includes 50 states, District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands.
CDC case counts and death counts include both confirmed and probable cases and deaths
Case notifications were received by CDC from U.S. public health jurisdictions and the National Notifiable Diseases Surveillance System (NNDSS).
Accuracy of Data
CDC does not know the exact number of COVID-19 illnesses, hospitalizations, and deaths for a variety of reasons. COVID-19 can cause mild illness, symptoms might not appear immediately, there are delays in reporting and testing, not everyone who is infected gets tested or seeks medical care, and there may be differences in how states and territories confirm numbers in their jurisdictions.
State and local public health departments are now testing and publicly reporting their cases. In the event of a discrepancy between CDC cases and cases reported by state and local public health officials, data reported by states should be considered the most up to date.
One big question that I have regarding this particular CDC data is that since it is a synthesis of data from public health jurisdictions and the National Notifiable Diseases Surveillance System (NNDSS), how does the CDC avoid double counting cases/deaths? Perhaps each jurisdiction is solely reporting through one means or the other? At this point, I can only guess.
4. CDC National Vital Statistics System Provisional Death Counts for Coronavirus Disease (COVID-19) Web Page
The CDC’s second website, Provisional Death Counts for Coronavirus Disease (COVID-19), reports 37,308 deaths involving (i.e., not necessarily laboratory-confirmed) COVID-19 as of May 1, 2020. The number is based upon death certificates. Quoting from their website:
Note: Provisional death counts are based on death certificate data received and coded by the National Center for Health Statistics as of May 1, 2020. Death counts are delayed and may differ from other published sources (see Technical Notes). Counts will be updated periodically. Additional information will be added to this site as available.
The provisional counts for coronavirus disease (COVID-19) deaths are based on a current flow of mortality data in the National Vital Statistics System. National provisional counts include deaths occurring within the 50 states and the District of Columbia that have been received and coded as of the date specified. It is important to note that it can take several weeks for death records to be submitted to National Center for Health Statistics (NCHS), processed, coded, and tabulated. Therefore, the data shown on this page may be incomplete, and will likely not include all deaths that occurred during a given time period, especially for the more recent time periods. Death counts for earlier weeks are continually revised and may increase or decrease as new and updated death certificate data are received from the states by NCHS. COVID-19 death counts shown here may differ from other published sources, as data currently are lagged by an average of 1–2 weeks.
The provisional data presented on this page include the weekly provisional count of deaths in the United States due to COVID-19, deaths from all causes and percent of expected deaths (i.e., number of deaths received over number of deaths expected based on data from previous years), pneumonia deaths (excluding pneumonia deaths involving influenza), pneumonia deaths involving COVID-19, influenza deaths, and deaths involving pneumonia, influenza, or COVID-19; (a) by week ending date and (b) by specific jurisdictions.
Note that 37,308 deaths is considerably less than the 65,735 deaths listed on the CDC’s other page described in the previous section. One major cause of this difference is presumably the reporting period. This data has a lag of a few weeks and is less current. However, there may be a myriad of other differences in what the numbers are actually counting.
5. World Health Organization COVID-19 Explorer
The WHO on their COVID-19 Explore the Data page was reporting 62,406 deaths for the US, as of May 3. Note that this figure does not match any of the numbers from any of the other sources discussed.
Definition of COVID-19 death
COVID-19 death is defined for surveillance purposes as a death resulting from a clinically compatible illness in a probable or confirmed COVID-19 case, unless there is a clear alternative cause of death that cannot be related to COVID disease (e.g. trauma). There should be no period of complete recovery between the illness and death.
In other words, the WHO’s death numbers, wherever they got them from, are not laboratory-confirmed. Perhaps more importantly, where is the WHO obtaining their death numbers from? Do they have a system that parallels the CDC’s National Vital Statistics System? That seems unlikely. Perhaps the CDC is the sole source for the WHO’s COVID-19 data for the US, but as mentioned, I was unable to match their reported numbers.
National authorities may use either case-based reporting or aggregate reporting. In some circumstances, such as countries with areas experiencing different transmission patterns, a combination of both case-based and aggregate reporting could be considered. The decision to use case-based or aggregate reporting should be based on the capacity of health authorities and the number of cases. National authorities may move from case-based to aggregate reporting as the number of cases increases, and then back to case-based as the number of cases decreases.
WHO requests that national authorities report probable and confirmed cases of COVID-19 infection within 48 hours of identification, by providing the minimum data set outlined in the “Revised case reporting form for 2019 Novel Coronavirus of confirmed and probable cases”. When it is no longer feasible to report case-based data, countries are requested to provide aggregated data for surveillance.
For all countries to understand the epidemiology and trends of COVID-19, all Member States are requested to provide the following minimum set of aggregate counts, once weekly.
At national level:
• Weekly number of new confirmed cases
• Weekly number of new confirmed case deaths from COVID-19
The ultimate question is, what is the reliability of these numbers being provided to us by various sources? Honestly, it will take years to sort things out. The lock-downs are so unprecedented that it will require a huge amount of research to ferret out the health effects of the disease from the health effects associated with the lock-downs. Traffic fatalities are most likely down. How will the rates of suicide and opioid overdoses be affected? Hospital-acquired infections kill about 75,000 people in the US every year. Deaths from hospital-acquired infections other than from COVID-19 may be reduced because hospitals, for the most part, are operating at low capacity. But deaths from lack of access to medical care and avoidance of medical care may be significantly up.
In terms of reporting COVID-19-involved cases:
- There may be incentives to under-report.
- There may be incentives to over-report.
- Reporting decisions may be overruled by administrators.
- Reporting decisions may be overruled by governmental
Assuming that we are past the peak of deaths in the US, things should be somewhat more clear within a few months. As a sanity check, to estimate the number of “excess” (displaced) deaths, we can examine weekly all-cause mortality data, and determine the area between the peak of deaths in recent weeks and an average of the same week from previous years, say the years 2014 through 2019. For the 2020 weeks, I will use the data from the CDC’s Provisional Death Counts for Coronavirus Disease (COVID-19) page as of May 1. The most recent week listed, the week ending 4/25/2020, currently only has partial reporting, so I will exclude it from the analysis and only examine data up to the week ending 4/18/2020. For the average of the same week of previous years, I will utilize data for years 2014 through 2019 from the CDC’s Pneumonia and Influenza Mortality Surveillance from the National Center for Health Statistics Mortality Surveillance System (FluView) site.
This should provide us a reasonable limit on the actual number of displaced deaths attributable to COVID-19 up to the week ending 4/18/2020, unless deaths from all causes have dropped off dramatically (extremely unlikely).
As shown in the table above, subtracting the 2014-2019 average from the 2020 count for each week and summing the differences yields a total of 28,479 deaths in excess of the 2014-2019 average. These 28,479 deaths are 16% less than the sum of the CDC’s numbers over the equivalent period.
Unfortunately direct comparison with the other higher counts, such as the CDC’s 65,735, the WHO’s 62,406, and Johns Hopkins’ 67,680 deaths is not possible because those figures include deaths more recent than 4/18/2020. As I mentioned, it will take time to sort things out. Also please keep in mind that figures from the CDC, even simple numbers like number of all-cause deaths, are subject to revision for at least two years.
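The sanity check described above amounts to a week-by-week subtraction followed by a sum. The sketch below shows the mechanics only: the weekly figures in it are made-up placeholders, not the CDC or FluView numbers, and the function and variable names are my own.

```python
from statistics import mean

# PLACEHOLDER data for illustration only; these are not CDC/FluView figures.
# Keys are week labels; values are all-cause deaths recorded for that week.
deaths_2020 = {"wk1": 60000, "wk2": 65000, "wk3": 70000}
deaths_prior_years = {
    "wk1": [55000, 56000, 54000],  # same week in each earlier year
    "wk2": [54000, 55000, 53000],
    "wk3": [53000, 54000, 52000],
}

def excess_deaths(current, history):
    """Sum over weeks of (current-year deaths minus the prior-year average)."""
    return sum(current[week] - mean(history[week]) for week in current)

total_excess = excess_deaths(deaths_2020, deaths_prior_years)
print(f"Deaths in excess of the prior-year average: {total_excess:,.0f}")
```

Weeks with incomplete reporting should be left out of `deaths_2020` before summing, for the same reason the week ending 4/25/2020 was excluded from the analysis.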
Keeping it in Perspective
You won’t hear this on the evening news, but as of the CDC’s Week 16 data, all-cause deaths year-to-date in 2020 were actually tracking under deaths up to the corresponding week in 2018 (and essentially tied with deaths in 2017 and 2019). One can assume that the high number of deaths in 2018 was mainly from respiratory infections. Thus respiratory infections caused more deaths in 2018 than in 2020, at least as of the week mentioned.
From a public health perspective, one must always consider the proper allocation of resources. Each year, about 650,000 Americans die from heart disease and 150,000 from stroke. Also, as mentioned previously, hospital-acquired infections kill about 75,000 yearly in the US. (From a public health perspective, talk about low-hanging fruit.)
Historically speaking, before the 1950s, deaths from influenza and pneumonia were an order of magnitude worse than is usual now. In fact, the current spike in deaths might not even have been noticed among the high background of respiratory virus-related infections in previous eras.
Around the time of World War II, antibiotics came into widespread use, causing a dramatic decline in deaths from pneumonia. The most dramatic epidemic in the history of the US was of course the 1918 influenza pandemic, which is reported to have caused 675,000 deaths in the US. Scaling that number to today’s population would result in approximately 2,200,000 deaths in the US. And unlike today, where the vast majority of deaths are in those over age fifty, young adults were often the victims in 1918.
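For readers who want to check the population scaling, here is a minimal sketch. The 1918 population of about 103 million is an assumption supplied for illustration (the article does not state the figure it used); the 330 million is the present-day population mentioned earlier.

```python
DEATHS_1918 = 675_000             # reported US deaths, as cited in the text
US_POPULATION_1918 = 103_000_000  # ASSUMPTION: approximate US population in 1918
US_POPULATION_2020 = 330_000_000  # present-day figure used in this article

# Scale the 1918 death toll by the ratio of the two populations.
scaled_deaths = DEATHS_1918 * US_POPULATION_2020 / US_POPULATION_1918
print(f"1918 deaths scaled to today's population: {scaled_deaths:,.0f}")
```

With these inputs the result comes out slightly under 2.2 million, in line with the approximate figure quoted above.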