A pdf of this criticism may be downloaded here. The Jerrett report can be found here. Background on this subject can be found here.
Overall
Lead author Michael Jerrett, nine co-investigators, and four student or post-doctoral investigators prepared the report “Spatiotemporal Analysis of Air Pollution and Mortality in California Based on the American Cancer Society Cohort” under Contract #06-332 with the State of California Air Resources Board, Research Division.
The purpose of the report (hereafter Jerrett) was to investigate the relationship between particulate air pollution, expressed as PM2.5, and mortality in the State of California.
On p. 6 it is stated, “All-cause mortality is significantly associated with PM2.5 exposure, but the results are sensitive to statistical model specification and to the exposure model used to generate the estimates.” They derive a hazard ratio estimate of 1.08, with a classical confidence interval between 1.00 and 1.15. They also claim that the risk of death due to cardiovascular disease (CVD) associated with PM2.5 is significant. They claim the risks of PM2.5 with other causes of death are insignificant.
There are three main criticisms that cast grave doubt on the conclusions of Jerrett. I find further that the summary in the abstract, the only part of the report most readers are liable to see, is the result of either poor work or deliberate bias toward a predefined conclusion.
- The authors prepared, intensely investigated, and justified the use of a series of complex statistical models. There were nine models in total, each having particular strengths and weaknesses. Each had several subjective “knobs” and dials to twist. Only one model of the nine (LUR IND+Met; Fig. 22, p. 105) showed a “statistically significant” relationship between mortality and PM2.5, and that only barely, and in that model, only one submodel showed “significance.” The other eight models showed no relationship. Some models even hinted that PM2.5 reduced the probability of early mortality. With such a large number of tests and “tweaks,” the authors were practically guaranteed to find at least one “significant” result, even in the absence of any effect (a simulation sketch follows this list). Nowhere did the authors control for the multiplicity of testing, even though such controls are routine in statistical analyses of this sort.
- The authors chose to report, in the Abstract (p. 7), only on the one model that was “significant,” ignoring all others. They also departed from the main text and inflated the size of the hazard estimate: on p. 85 the estimates in Table 31 are based on a change in the interquartile range of PM2.5, but in the summary this is inflated to present a larger effect, presumably for emphasis. This behavior makes no sense statistically and is either sloppy writing or the result of purposefully choosing a result because of personal bias.
- The models were a mixture of Bayesian and frequentist methods, but incomplete mixtures. Substantial uncertainties remain in the model constructions such that the results are too certain, i.e. the confidence and credible intervals are too narrow. It is likely that were these uncertainties properly handled, even the one model which did show “significance” would not retain that significance.
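To see why the multiplicity point matters, here is a minimal simulation sketch in Python. The numbers are purely illustrative assumptions (the report does not tally its exact count of model variants): with a few dozen model-and-knob combinations each tested at the 5% level, at least one spurious “significant” result is nearly guaranteed even when no effect exists, while a routine Bonferroni correction restores the familywise error rate.

```python
# Sketch: with many model variants tested at the 5% level, at least one
# spurious "significant" result becomes likely even when no effect exists.
# Assumes 36 model/knob combinations -- an illustrative number only.
import numpy as np

rng = np.random.default_rng(42)
n_tests = 36          # hypothetical number of model/knob combinations
alpha = 0.05
n_sims = 10_000

# Probability of >= 1 false positive if the tests were independent:
print(1 - (1 - alpha) ** n_tests)            # ~0.84

# Monte Carlo confirmation: draw p-values under the null and count runs
# in which at least one falls below alpha.
p = rng.uniform(size=(n_sims, n_tests))
print((p.min(axis=1) < alpha).mean())        # ~0.84 again

# A Bonferroni correction keeps the familywise error near alpha:
print((p.min(axis=1) < alpha / n_tests).mean())  # ~0.05
```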
Even assuming the models are trouble free, and the model that indicated significance was the only model worth showing, we have to consider that the authors claimed to have shown a relationship between PM2.5 inhalation and mortality. Yet the authors never, not even in one case, measured the PM2.5 inhalation of any person. How, then, could the authors claim that PM2.5 inhalation is associated with early mortality? They cannot. At best, they can claim residence is associated with early mortality.
Instead of PM2.5 inhalation, the authors measured (with unaccounted-for error; see Section 2) the residence of a sample of Californians. Residence was taken as a perfect, error-free, and unique proxy of PM2.5 inhalation. This is absurd, even on the authors’ own reasoning. About this, more in Section 2.
At the least, these criticisms call for additional study before any decisions are made regarding PM2.5 inhalation and mortality.
Detailed Criticisms
#1 Urban versus rural populations
Mortality rates differ widely between urban and rural areas in California. Further, habits of life differ widely between the two. The authors write on p. 41:
Specifically across the United States, in the 1980s there were on average 6.2 excess deaths per 100,000 in non-metropolitan areas compared to metropolitan areas, and this number increased to 71.7 excess deaths for the period 2000-2004 [73].
This enormous and growing difference has profound consequences for any wide-region model of all-cause death. The authors’ answer was to include a single indicator (which would change the intercept of the model only) for whether a person lived in the Los Angeles Metropolitan area (p. 41). Despite the discrepancies in raw mortality statistics, this indicator was not even significant (p. 85, Table 31).
On p. 70 some of their estimates “became insignificantly elevated or were of borderline significance when the Los Angeles indicator and interaction terms were included.” Table 27 later lists this as insignificant for many causes. This is odd and should be explained.
At the least, this indicator should have been included (if only exploratorily) as a multiplier (or interaction) on the other variables in the models besides just PM2.5. This would have allowed the effects of these variables to differ inside and outside of LA.
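A hedged sketch of the difference, using statsmodels’ formula interface in Python. All variable names (deaths, pm25, la, income) are hypothetical stand-ins on simulated data, and a Poisson regression stands in for the report’s Cox survival models:

```python
# Sketch: an intercept-only indicator versus a full interaction model.
# Data and variable names are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "pm25": rng.gamma(4.0, 3.0, n),     # fake exposure estimates
    "la": rng.integers(0, 2, n),        # 1 = LA metro resident
    "income": rng.normal(50, 10, n),    # fake covariate
})
mu = np.exp(-3 + 0.01 * df.pm25 + 0.2 * df.la - 0.005 * df.income)
df["deaths"] = rng.poisson(mu)

# Roughly what the report did: the indicator shifts the intercept only.
m1 = smf.poisson("deaths ~ pm25 + la + income", data=df).fit(disp=0)

# What the critique asks for: the indicator modifies every slope, so the
# effect of each covariate may differ inside and outside LA.
m2 = smf.poisson("deaths ~ (pm25 + income) * la", data=df).fit(disp=0)
print(m1.params, m2.params, sep="\n\n")
```

In the second specification the formula expands to include pm25:la and income:la terms, which is exactly the “multiplier” the critique says is missing.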
Another difficulty is the rapid change in the differences of urban-rural death rates through time. No attempt was made by the authors to incorporate this in the models. This lack of control could certainly tilt the results toward “significance” of PM2.5 and all-cause mortality in the land use model.
Higher CVD deaths, incidentally, are found in rural populations (where ambulances and hospitals are more distant). Since it was CVD deaths that were found significant by the authors, and since CVD made up a large proportion of overall deaths, it is likelier still that misspecification of urbal versus rural populations contributed to the bare significance of one of the authors’ models.
In short, what we might be seeing in these models is nothing more than a location effect unrelated to PM2.5.
#2 Per-person PM2.5 exposure
It must be clearly understood that no person’s PM2.5 exposure was ever measured. The statement that PM2.5 is associated with all-cause death is therefore a misnomer.
Instead of actually measuring PM2.5, the authors created a guess of exposure based on where each person in the database (at one time) lived (see the next section). The assumption is that merely living in an area is an error-free proxy for actual PM2.5 exposure. This, of course, is false.
And because it is false, it is true that the results from each model are too certain. At the least, the confidence interval limits are too narrow. Since this is so, and since only one model barely reached classical statistical significance, it is more than likely that actual PM2.5 exposure is not significantly related to all-cause death.
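A minimal simulation sketch of the point (all numbers are assumed for illustration): when a noisy residence-based proxy is treated as the true exposure, the naive confidence intervals shrink around an attenuated estimate and cover the true effect far less often than the nominal 95%.

```python
# Sketch: classical measurement error in the exposure. The naive analysis
# attenuates the slope, and its confidence intervals cover the true effect
# far below the nominal 95% rate. Illustrative values only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
beta_true, n, n_sims, covered = 0.5, 500, 2000, 0

for _ in range(n_sims):
    x_true = rng.normal(0, 1, n)             # actual personal exposure
    x_proxy = x_true + rng.normal(0, 1, n)   # residence-based guess
    y = beta_true * x_true + rng.normal(0, 1, n)
    fit = sm.OLS(y, sm.add_constant(x_proxy)).fit()
    lo, hi = fit.conf_int()[1]               # 95% CI for the slope
    covered += (lo <= beta_true <= hi)

print(covered / n_sims)  # far below 0.95 when the proxy is noisy
```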
Now, in creating their guess, the authors could have, but did not, create a per-person estimate of PM2.5 exposure. They instead averaged exposure data across months or even years (“constructing 12-month moving averages from January 1988 to December 2000,” p. 41). Why “moving averages”? Why not use just the numbers themselves as estimates of PM2.5 exposure? No convincing justification is given.
The authors could have, but did not, create simple plots of all-cause death by exposure level, just as a sanity check. It is strange that these are missing given the plethora of other graphics.
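For illustration, here is the kind of sanity check meant, sketched in Python on simulated data (all values and names are hypothetical): the raw monthly series alongside its 12-month moving average, and a plain plot of death rate by exposure decile.

```python
# Sketch of the missing sanity checks, on invented data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
months = pd.date_range("1988-01-01", "2000-12-01", freq="MS")
pm25 = pd.Series(15 + rng.normal(0, 4, len(months)), index=months)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(pm25, alpha=0.5, label="monthly PM2.5")
ax1.plot(pm25.rolling(12).mean(), label="12-month moving average")
ax1.legend()

# Death rate by exposure decile -- the raw-data look the critique says
# is missing. No real relationship is built in here.
exposure = rng.gamma(4.0, 4.0, 5000)
death = rng.binomial(1, 0.1, 5000)
decile = pd.qcut(exposure, 10)
pd.Series(death).groupby(decile, observed=True).mean().plot(
    kind="bar", ax=ax2, ylabel="death rate")
plt.tight_layout()
plt.show()
```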
#3 Uncertainty of PM2.5 exposure
This is a key criticism. Given that they could not directly measure PM2.5, the authors had to make a guess. The guess was input as certain and true into the models. That is, the authors did not take into account the uncertainty of the exposure.
The authors used Bayesian exposure models, but only picked the means, medians, or modes of the posterior distribution of PM2.5—and we are never sure which of these point estimates was finally used; there is more than a hint of data snooping here.
What they should have done is to pick a level of exposure implied by the posterior of PM2.5, compute the rest of the model, and set that result aside. They should then have picked another level implied by the posterior, refit the model, saved the result, and so on. Then they could have weighted all these results together (the weights determined by the posteriors), and this weighted combination would be the final answer.
No matter what, the answer derived from this proper analysis will be less certain than what they have shown. It is therefore extremely unlikely that any of the models would have shown statistical significance.
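A sketch of the procedure just described, in Python on simulated data. The exposure posterior here is a placeholder (assumed normal with a known per-person mean and spread); in practice the draws would come from the fitted Bayesian exposure model. Pooling by Rubin’s rules is one standard way to do the weighting, and the pooled uncertainty is necessarily wider than any single-draw standard error.

```python
# Sketch: propagate exposure uncertainty by drawing from the exposure
# posterior, refitting the health model per draw, and pooling. The
# "posterior" here is a stand-in assumption, not the report's model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, n_draws = 1000, 200
x_mean = rng.normal(0, 1, n)             # posterior mean exposure per person
x_sd = np.full(n, 0.8)                   # posterior sd (assumed known here)
y = 0.3 * x_mean + rng.normal(0, 1, n)   # fake health outcome

betas, variances = [], []
for _ in range(n_draws):
    x_draw = rng.normal(x_mean, x_sd)    # one draw from the exposure posterior
    fit = sm.OLS(y, sm.add_constant(x_draw)).fit()
    betas.append(fit.params[1])
    variances.append(fit.bse[1] ** 2)

# Pool with Rubin's rules: total variance = within-draw + between-draw.
b = np.mean(betas)
total_var = np.mean(variances) + (1 + 1 / n_draws) * np.var(betas, ddof=1)
print(b, np.sqrt(total_var))  # wider than any single-draw standard error
```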
Curiously, the authors point out that their kriging estimates of PM2.5 look smooth and conclude that actual values of PM2.5 are smooth. But kriging, by design, produces smooth estimates. Statements like these cause concern that the authors do not fully understand the tools they are using.
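A small demonstration of this point, using scikit-learn’s Gaussian-process regressor as a stand-in for the report’s kriging software: the fitted mean surface is smooth by construction, no matter how rough the observations are, so smooth-looking kriging output says nothing about the smoothness of the true field.

```python
# Sketch: kriging (Gaussian-process regression) returns a smooth mean
# surface by design, even from rough, noisy observations.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0, 10, 40))[:, None]
y = np.sin(X.ravel()) + rng.normal(0, 0.8, 40)   # rough, noisy "truth"

gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.5)).fit(X, y)
Xg = np.linspace(0, 10, 200)[:, None]
y_hat = gp.predict(Xg)
# y_hat varies smoothly no matter how jagged the observations are:
print(np.abs(np.diff(y_hat)).max())  # small increments between grid points
```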
#4 Uncertainty of land use model
The exact same criticism can be made for the land use model. Only point estimates were used, and no account of the uncertainty of land use was made. Once again, and taking into account the previous over-certainties, it is even more likely that none of the models would have shown statistical significance.
#5 Uncertainty of where a person lived
They did not control adequately for where a person lived. This is crucial because it is solely from where a person lived that the authors guessed at PM2.5 exposure. It appears the authors used the last address only: on pp. 41-42 they say, “We assumed that each subject resided at their home address in 1982 throughout the follow-up period to December 2000.”
This will be true for some, but surely not all, persons in the database. There must therefore be large errors in estimating where a person lived. That means large errors in the PM2.5 exposure estimates, and larger errors still relative to actual PM2.5 exposures.
Of course, and once more, this translates into model statements that are too certain.
#6 Uncertainty in dietary and demographic variables
The authors used diet and “beer, wine, and alcohol” self-report variables in their models. They also used Census-derived variables such as percent white residents (in a geographic area). All these variables are notoriously poor. These variables also changed over the period in question, but these changes were not incorporated into the models.
Using these variables as certain in the model, as before, creates over-confidence.
#7 Uncertainty in model diagnostics
Fig. 5 (p. 45) is supposed to be a check on model goodness (for just one model). Why so few points in this plot? Surely the authors have many more observations of PM2.5 than are indicated.
Further, the model performs more poorly at larger values of PM2.5. Figs. 14 (p. 56) and 15 (p. 58) are other model checks. These too indicate very poor performance at higher values of PM2.5.
Since it is the authors’ conclusion that increasing PM2.5 is associated with premature death, poorer model performance at increased levels of PM2.5 calls that conclusion seriously into question.
#8 Other pollutants
The NOx, PM10, etc. models are presented as additional evidence, but they are not. These pollutants are highly correlated to PM2.5, and each is estimated in the same way, so reporting on them in the fashion the authors chose is essentially repeating the same information twice in the guise of independence.
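A minimal simulation of why this is so (illustrative values only): when a second “pollutant” is little more than a rescaled copy of PM2.5 plus noise, fitting the same model to it reproduces nearly the same test statistic, adding no independent evidence.

```python
# Sketch: highly correlated "pollutants" give nearly identical results,
# so fitting each is repetition, not confirmation. Invented values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
pm25 = rng.gamma(4.0, 3.0, n)
nox = 2.0 * pm25 + rng.normal(0, 2, n)   # highly correlated "pollutant"
y = 0.05 * pm25 + rng.normal(0, 1, n)    # outcome driven by pm25 alone

for name, x in [("PM2.5", pm25), ("NOx", nox)]:
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    print(name, round(fit.tvalues[1], 2))  # nearly identical t-statistics
```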
My schooling in computer science and artificial intelligence has sparked an interest in statistics, and I’m trying to educate myself in that area. Your analysis of the study and its mistakes is VERY informative, and I’d encourage you to do something like this on a regular basis.
In some sense, we learn more deeply from our own mistakes — if we bother to investigate them — but in another sense, the whole point of an education is to learn from others’ mistakes and not have to learn everything the hard way. Thanks!
How do you determine if a death is excess? I still can’t figure out how you can tell if a death is premature. Are excess deaths premature, or maybe vice-versa?
Dr. Briggs
Your links to the Jerrett report and the background information both lead to the same location.
Jim,
Thanks—fixed. (After having to repair the database yet one more time.)
Ah the joys of blogging and programming.
Jim
Oops. The links are still broken.
I’m no expert, but I’m pretty sure that instead of all that mathy stuff you wrote, you meant to say that this paper proves that the President sold out by postponing smog rules.
Or something. Maybe you’re just anti-science.
Jim,
The old result must be cached on your browser. Link is correct.
Matt –
I am an expert. I have a PhD in the geosciences and am employed by a county government to study and advise on issues regarding air pollution management.
Dr. Briggs is not anti-science. Quite the contrary. He probably read the report and had many questions. Questions that the authors of the report may not have addressed or seemingly omitted from the publication of their study.
His analysis is definitely worthy of examination.
I hope that Dr. Briggs submits his analysis to CARB and the USEPA, and perhaps even a peer reviewed journal.
We need more analyses like this to put methods under scrutiny that best practices in science prescribe.
These issues are not black and white, despite what the Sierra Club, AND the USEPA might say in their press releases.
– gcap
PS: As far as the President selling out on the ozone (smog) rule…. I view what he did as a political decision that rescinded an earlier, misguided political decision.
I’m not up on my hazard ratios. Googling around, isn’t 1.08 (or anything below about 2) usually considered indistinguishable from random chance?
How do I convert that to something more meaningful to me? Is it simply .08*1000 = 80 deaths per 1000? (per year?)
Ross McKitrick wrote an article mocking supposed danger from PM.
“According to Environment Canada, dust from unpaved roads in Ontario puts a whopping 90,116 tonnes of PM2.5 into our air each year, nearly 130 times the amount from coal-fired power generation. Using the Clean Air Alliance method for computing deaths, particulates from country-road usage kills 40,739 people per year, quite the massacre considering there are only about 90,000 deaths from all causes in Ontario each year. Who knew? That quiet drive up back country roads to the cottage for a weekend of barbecues, cozy fires and marshmallow roasts is a form of genocide.”
http://opinion.financialpost.com/2011/05/16/ontarios-power-trip-the-failure-of-the-green-energy-act/
There seems to be a presumption that particulates are higher in urban areas. It would be interesting to compare mortality figures to areas where agricultural burning is frequent (rice, etc.).
Note the concerns with air quality in Chicago yesterday due to the Minnesota wildfire that has burned some 60,000 acres. Prior to modern fire suppression this area of Minnesota burned 86,000 acres annually. However even this was minuscule compared to the 80 to 100 million acres of tall grass prairie burned every year prior to the late 19th century. These prairie fires not only contributed to particulates but also enormous amounts of NOx. A strong case can be made that our national air standards are actually lower now than the natural background (if we admit Native Americans were part of the natural background).
The hazy days of Indian summer were caused by smoke. It’s a wonder Native Americans were able to survive.
Monitoring data for PM2.5 is a very unsatisfactory tool for investigating the effect of air particulates on health, but there is a temptation to use it for that purpose because it is just about the only data available. Remember that PM2.5 is simply a measurement of the mass concentration of particles smaller than 2.5 microns. This is not an adequate proxy for respirable particles as it fails to distinguish particles that deposit in the lungs from those that are filtered out in the nose.
Particles smaller than 0.5 microns can deposit deep in the lungs where the clearance time will be around 18 months, whereas particles larger than 0.5 microns will deposit in the nose and upper airways where the clearance time is a few days at most. A PM2.5 measurement cannot distinguish between the two cases. This alone should disqualify it as a tool for investigating health effects.
But there is more: chemical composition is very important for health effects, and number concentration may be more relevant than mass concentration, for example for diesel particulates. PM2.5 can say nothing about these.
Against this background, the study cannot be expected to have any value: it may produce spurious correlations, or it may fail to discover real adverse health effects. Better statistical analysis won’t help because the data is inadequate for the task. You can’t make a silk purse out of a sow’s ear. It would be most unwise to draw policy conclusions from such a study.
Thirteen comments and no one has brought up xkcd?
http://xkcd.com/882/
“Particles smaller than 0.5 microns can deposit deep in the lungs where the clearance time will be around 18 months whereas particles larger than 0.5 microns will deposit in the nose and upper air ways where the clearance time is a few days at most. A PM2.5 measurement cannot distinguish between the two cases.”
Why drag physics into it?
“But there is more: chemical composition is very important for health effects, and number concentration may be more relevant than mass concentration, for example for diesel particulates. PM2.5 can say nothing about these.”
Dear God, now you’re dragging chemistry into it.
I suspect that you are out of sympathy with the modern Science-that-needs-no-science.
Matt:
Great piece. More “drunk looking for his keys under the streetlight” research. Perhaps they will invite you out to give a seminar to the authors’ collective on how to do this properly?
You even managed to invent a new word (at least it is for me): urbal!!
More technically, based on your summary I assume they did not include an accurate cumulative measure of exposure to PM2.5 which presumably is the trigger mechanism.
gcapologist,
Sorry, I didn’t think the /sarcasm tags were necessary on that one. 😉
What, are you in the pay of Big Particulate too?!
Great to see a good old stat review. This would be a good PhD qual question, but I am afraid you would be expected to “support”.
That is, a Stat dept qual question, not “climate science”.
The assumption about the proxy is the “killer”. Place of residence for exposure?
Is that like looking for dropped car keys under a street light, because it’s too dark to see where you dropped them in the car park?
If the underlying assumption is wrong, then the absolute best thing that you can discover from such studies is that the assumption is wrong.
Bernd:
Great minds?