# William M. Briggs

### Statistician to the Stars!

#### Category: Statistics (page 1 of 170)

The general theory, methods, and philosophy of the Science of Guessing What Is.

Lovejoy’s new model.

We last met Shaun Lovejoy when he claimed that mankind caused global temperatures to increase. At the 99.9% level, of course.

He’s now saying that the increase which wasn’t observed wasn’t there because of natural variability. But, he assures us, we’re still at fault.

His entire effort is beside the point. If the “pause” wasn’t predicted, then the models are bad and the theories that drive them probably false. It matters not whether such pauses are “natural” or not.

Tell me honestly. Is this sentence in Lovejoy’s newest peer-reviewed foray (“Return periods of global climate fluctuations and the pause”, Geophysical Research Letters) science or politics? “Climate change deniers have been able to dismiss all the model results and attribute the warming to natural causes.”

The reason scientists like Yours Truly have dismissed the veracity of climate models is for the eminently scientific reason that models which cannot make skillful forecasts are bad. And this is so even if you don’t want them to be. Even if you love them. Even if the models are consonant with a cherished and desirable ideology.

Up to a constant, Lovejoy’s curious model says the global temperature is caused by the climate sensitivity (at doubled CO2) times the log of the ratio of the time-varying CO2 concentration to its preindustrial value, all plus the “natural” global temperature.

There is no such thing. I mean, there is no such thing as a natural temperature in the absence of mankind. This is because mankind, like every other plant and animal species ever, has been influencing the climate since its inception. Only a denier would deny this.

Follow me closely. Lovejoy believes he can separate out the effects of humans on temperature and thus estimate what the temperature would be were man not around. Forget that such a quantity is of no interest (to any human being), or that such a task is hugely complex. Such estimates are possible. But so are estimates of temperature assuming the plot from the underrated pre-Winning Charlie Sheen movie The Arrival is true.

Let Lovejoy say what he will of Tnat(t) (as he calls it). Since this is meant to be science, how do we verify that Lovejoy isn’t talking out of his chapeau? How do we verify his conjectures? For that is all they are, conjectures. I mean, I could create my own estimate of Tnat(t), and so could you—and so could anybody. Statistics is a generous, if not a Christian, field. The rule of statistical modeling is, Ask and ye shall receive. How do we tell which estimate is correct?

But—there’s always a but in science—we might believe Lovejoy was on to something if, and only if, his odd model were able to predict new data, data he had never before seen. Has he done this?

His Figure shown above (global temp) might be taken as a forecast, though. His model forecasts a juicy increase. Upwards and onwards! Anybody want to bet that this is the course the future temperature will actually take? If it doesn’t, Lovejoy is wrong. And no denying it.

After fitting his “GCM-free methodology” model, Lovejoy calculates the chances of seeing certain features in Tnat(t), all of which are conditional on his model and the correctness of Tnat(t). Meaning, if his model is fantasia, so are the probabilities about Tnat(t).

Oh, did I mention that Lovejoy first smoothed his time series? Yup: “a 1-2-1 running filter” (see here and here for more on why not to do this).

Lovejoy concludes his opus with the words, “We may still be battling the climate skeptic arguments that the models are untrustworthy and that the variability is mostly natural in origin.”

Listen: if the GCMs (not just Lovejoy’s curious entry) made bad forecasts, they are bad models. It matters not that they “missed” some “natural variability.” The point is they made bad forecasts. That means they misidentified whatever it was that caused the temperature to take the values it did. That may be “natural variability” or things done by mankind. But it must be something. It doesn’t even matter if Lovejoy’s model is right: the GCMs were wrong.

He says the observed “pause” “has a convincing statistical explanation.” It has Lovejoy’s explanation. But I, or you, could build another model and show that the “pause” does not have a convincing statistical explanation.

Besides, who gives a fig-and-a-half for statistical explanations? We want causal explanations. We want to know why things happen. We already know that they happened.

A valid identity.

A reader asked about my take on the Kaya wars that are flaming at Anthony Watts’s place.

Here is the form of the Kaya Identity, which is to say, the Kaya non-equation:

$Y = X_1 \times \frac{X_2}{X_1} \times \frac{X_3}{X_2} \times \dots \times \frac{X_{n}}{X_{n-1}} \times \frac{Y}{X_n}$,

The Y and the Xis are numbers, and the choice is free, given the limitations of algebra (no Xi may equal 0). Try it and see: for fun, let $X_i = i$. It works. A perfectly harmless manipulation.
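The telescoping is easy to check by machine. Here is a minimal sketch (mine, not from any paper) confirming that the product collapses to Y for any nonzero choices of the Xis:

```python
import math

def kaya(y, xs):
    """Telescoping product X1 * (X2/X1) * ... * (Xn/Xn-1) * (Y/Xn)."""
    terms = [xs[0]] + [xs[i] / xs[i - 1] for i in range(1, len(xs))] + [y / xs[-1]]
    prod = 1.0
    for t in terms:
        prod *= t
    return prod

# For fun, let X_i = i, with an arbitrary Y (the value is irrelevant):
assert math.isclose(kaya(400.0, [1.0, 2.0, 3.0, 4.0]), 400.0)

# Any nonzero numbers whatsoever will do -- the identity cannot fail:
assert math.isclose(kaya(400.0, [7.3, -2.0, 0.001, 9e9]), 400.0)
```

Which is the whole point: since the identity holds no matter what you plug in, it constrains nothing and can tell us nothing about the world.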

There is no explicit word about causality in the Kaya. The Xis aren’t necessarily causing the Y or each other. If we wanted to know about what caused Y, and we believed that the Xis were in the causal path of Y, we wouldn’t set up an identity but an equation which looked like this:

$Y = f(X,\beta)$.

Where X is a vector and the vector of (known and unknown) parameters β may be larger or smaller than X. Notice that Y does not appear on the right hand side. We are solving for Y here. One possibility (and probably too simple a one for most Y) is a linear equation:

$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$.

This is not regression. This is a causal model: it says Y will certainly change by $\beta_i$ when $X_i$ increases by 1 unit. Regression is a probability relationship where we first assume $Y \sim N(\mu, \sigma)$ and then substitute μ for Y on the left hand side. Regression says Y might, not that it certainly will, change.
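To make the distinction concrete, here is an illustrative sketch with made-up βs (nothing here is from Lovejoy or Kaya): the causal model pins Y down exactly, while the regression only shifts the central parameter of a normal distribution from which Y is drawn.

```python
import random

beta = [1.0, 2.0, -0.5]  # beta_0, beta_1, beta_2: invented for illustration

def causal_y(x1, x2):
    # Causal model: Y is fixed the moment the X's are fixed.
    return beta[0] + beta[1] * x1 + beta[2] * x2

def regression_y(x1, x2, sigma=1.0):
    # Regression: mu, not Y, sits on the left-hand side; Y ~ N(mu, sigma).
    mu = beta[0] + beta[1] * x1 + beta[2] * x2
    return random.gauss(mu, sigma)

# Bump x1 by 1 unit: the causal model moves Y by exactly beta_1 ...
assert causal_y(3, 5) - causal_y(2, 5) == beta[1]
# ... while the regression only says Y *might* move: two calls with the
# same inputs will almost surely give different draws.
```

The causal claim is certain; the regression claim is only probabilistic, which is the entire difference.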

Anyway, since the Kaya is an identity we can put anything we like in for the Xis and Y. Let’s try.

$\mbox{CO}_2 = \mbox{Puppies} \times \frac{\mbox{Cats}}{\mbox{Puppies}} \times \frac{\mbox{Meteors}}{\mbox{Cats}} \times \frac{\mbox{Cosmic rays}}{\mbox{Meteors}} \times \frac{\mbox{CO}_2}{\mbox{Cosmic rays}}$,

where each is a number existent or occurring over a year in some suitable units. Notice that I was careful to put things that we know change over time, but I needn’t have. Everything could be static (more or less), like this:

$\mbox{CO}_2 = \mbox{Mountain ranges} \times \frac{\mbox{Continents}}{\mbox{Mountain ranges}} \times \frac{\mbox{Great lakes}}{\mbox{Continents}} \times \frac{\mbox{Gold}}{\mbox{Great lakes}} \times \frac{\mbox{CO}_2}{\mbox{Gold}}$.

Some of these quantities are known for sure, and one, the amount of Gold, is not. But whether we know the value of any of these is immaterial to the Kaya. As long as none are null quantities, and Gold isn’t, we’re in business. Also note that the number of entries was up to me. I could have made the list shorter or longer as I pleased.

What do any of these things have to do with CO2? Who said the items had to have anything to do with CO2? Who said I had to use CO2? Insisting that the Xis are causative of Y and using the Kaya and not an equation is doing it, as they say, the hard way.

But we can certainly manufacture cause-like stories. Puppies eat, and their food both requires and releases CO2. Cats, too. Meteorites often have carbon in them, and boy do they disturb the atmosphere; lots of cloud nuclei strewn hither and thither during their journey. Same thing with cosmic rays. Plus, these energetic creatures affect life, and life is important for understanding carbon. This is just off the top of the head; spend some serious time and you can spin this tale out to saga length. Peer reviewed, of course.

The same thing can be done with the fixed Xs. Or with any items you care to put into the Kaya. As long as you stay away from 0, you’re in business.

Here’s what Kaya himself put (adapted to just the USA):

$\mbox{CO}_2 = \mbox{Population} \times \frac{\mbox{GDP}}{\mbox{Population}} \times \frac{\mbox{Energy}}{\mbox{GDP}} \times \frac{\mbox{CO}_2}{\mbox{Energy}}$.

This is just as valid as the examples above, though this one seems more popular with economists. But then economists are prone to tying everything to GDP, which is a number, and economists love numbers (those without uncertainty, that is), often preferring them over reality. Never mind.

For an example of the Kaya in action, see the Pielke Jr video embedded at this link starting at around 21 minutes. Pielke appears to believe that the Kaya has something to do with causation and that he and fellow economists have captured all they need know about human beings and carbon.

“The good news from an analytical standpoint is that there’s nothing else. There’s no other levers that you can use out there. This is comprehensive. You may wish there was some rabbit you can pull out of a hat because the good news is also the bad news: this is all you have.”

That’s false, as we know from the Puppies example (recall we can add puppies or gold or whatever to the Kaya). And these items—GDP, etc.—are far from all we know about humans and carbon, though Pielke calls it an “extremely powerful tool for policy analysis.” For instance, he says (around 20 min. mark) that, as a lever governments can pull, “Less people, all else equal, equals less emissions” (I believe the guillotine had a similar lever). Another lever the government can yank, he says, is to purposely create poverty, i.e. “Limit generation of wealth” (does that include the wealth of government, one wonders?).

These claims are not quite false, but not quite true, either. More people mean more energy is used, but there’s also a greater chance for more innovation in, say, creating more efficient energy sources. And more people also means more food, and food is a terrific carbon sequestration vehicle, to say it in economic-speak. (Incidentally, one reason that there are more people is that there is more food.)

Now there’s nothing wrong with grappling with crude ratios like Energy/GDP to have some rough, first-blush idea of the amount of energy that is now required to generate such-and-such-a-sized economy, but as for the energy required to drive a future economy, who knows? Nobody in 1990 predicted Google. The Kaya is not a forecasting tool. And since it doesn’t carry any measure of uncertainty, and since every term is mixed up causally with every other term, nobody knows how much credence to give it.

And we can’t bypass the hard work of actually estimating the amount of carbon released and sequestered, both now and in the future. Yet the Kaya is mute on what causes CO2. GDP, after all, doesn’t cause CO2. That’s impossible.

The Kaya should be replaced with a probability model/equation, which can tell us how much change in GDP might be associated with a change in CO2.

That model ought to be under the same constraint as climate models. If it can’t make skillful predictions of future data, we shouldn’t believe it. Right? And how good are economists at forecasting the GDP or energy use one to two decades out?

Update I should have mentioned this above, but it might be that in some problems we know the last ratio $Y/X_n$ (and each of the other ratios) but do not know Y. The Kaya can then be used to calculate Y. But in the case of CO2, we do not know that last ratio.

Don Knuth. The equations below are beautiful because of him.

Party trick for you. I’m thinking of a number between 1 and 4. Can you guess it?

Two? Nope. Three? Nope. And not one or four either.

I know what the number is, you don’t. That makes it, to you, truly random. To me, it’s completely known and as non-random as you can get. Here, then, is one instance of a truly random number.

The number, incidentally, was e, the base of the so-called natural logarithm. It’s a number that creeps in everywhere and is approximately equal to 2.718282, which is certainly between 1 and 4, but it’s exactly equal to:

$e = \sum_{n=0}^\infty \frac{1}{n!}$.

The sum all the way out to infinity means it’s going to take forever and a day, literally, to know each and every digit of e, but the only thing stopping me from this knowledge is laziness. If I set to it, I could make pretty good progress, though I’d always be infinitely far away from complete understanding.
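And not much laziness needs overcoming. A quick sketch: summing just the first fifteen terms of the series already agrees with e to better than ten decimal places, because n! grows so fast.

```python
import math

# Partial sums of sum_{n>=0} 1/n!; the factorial in the denominator
# makes the series converge at a furious rate.
total, fact = 0.0, 1
for n in range(15):
    if n > 0:
        fact *= n  # fact == n! at this point
    total += 1.0 / fact

assert abs(total - math.e) < 1e-10  # 15 terms suffice for ~12 digits
```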

Now I came across a curious and fun little book by Donald Knuth, everybody’s Great Uncle in computer science, called Things a Computer Scientist Rarely Talks About whose dust flap started with the words, “How does a computer scientist understand infinity? What can probability theory teach us about free will? Can mathematical notions be used to enhance one’s personal understanding of the Bible?” Intriguing, no?

Knuth, the creator of TeX and author of The Art of Computer Programming among many, many other things, is Lutheran and devout. He had the idea to “randomly” sample every book of the Bible at the chapter 3, verse 16 mark, and to investigate in depth what he found there. Boy, howdy, did he mean everything. No exegete was as thorough; in this very limited and curious sense, anyway. He wrote 3:16 to describe what he learned. Things is a series of lectures he gave in 1999 about the writing of 3:16 (a book about a book).

It was Knuth’s use of the word random that was of interest. He, an expert in so-called random algorithms, sometimes meant random as a synonym of uniform, other times for unbiased, and still others for unknown.

“I decided that one interesting way to choose a fairly random verse out of each book of the Bible would be to look at chapter 3, verse 16.” “It’s important that if you’re working with a random sample, you mustn’t rig the data.” “True randomization clearly leads to a better sample than the result of a fixed deterministic choice…The other reason was that when you roll dice there’s a temptation to cheat.” “If I were an astronomer, I would love to look at random points in the sky.” “…I think I would base it somehow on the digits of pi (π), because π has now been calculated to billions of digits and they seem to be quite random.”

Are they? Like e, π is one of those numbers that crop up in unexpected places. But what can Knuth mean by “quite random”? What can degrees of randomness mean? In principle, using this formula, we can calculate every single digit of π:

$\pi = \sum_{k = 0}^{\infty}\left[ \frac{1}{16^k} \left( \frac{4}{8k + 1} - \frac{2}{8k + 4} - \frac{1}{8k + 5} - \frac{1}{8k + 6} \right) \right]$.

The remarkable thing about this equation is that we can figure the n-th digit of π without having to compute any digit which came before. All it takes is time, just like in calculating the digits of e.
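For the curious, here is a quick check of the series itself (the digit-extraction trick, which pulls out the n-th hexadecimal digit directly, takes a little more machinery than this simple partial sum):

```python
import math

def bbp_partial(K):
    """Partial sum of the Bailey-Borwein-Plouffe series for pi;
    the error shrinks roughly like 16**-K."""
    s = 0.0
    for k in range(K):
        s += (1.0 / 16**k) * (4 / (8*k + 1) - 2 / (8*k + 4)
                              - 1 / (8*k + 5) - 1 / (8*k + 6))
    return s

# Ten terms already match pi to about twelve decimal places.
assert abs(bbp_partial(10) - math.pi) < 1e-11
```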

Since we have a formula, we cannot say that the digits of π are unknown or unpredictable. There they all are: laid bare in a simple equation. I mean, it would be incorrect to say that the digits are “random” except in the sense that before we calculate them, we don’t know them. They are perfectly predictable, though it will take infinite time to get to them all.

Here Knuth seems to mean, as many mean, random as a synonym for transcendental. Loosely, a transcendental number is one whose digits go on forever without exactly repeating, like e or π; mathematicians say these numbers aren’t algebraic, meaning they are not the root of any polynomial with rational coefficients. But it does not mean, as we have seen, that formulas for them do not exist. Clearly some formulas do exist.

As in coin flips, we might try to harvest “random” numbers from nature, but here random is a synonym for unpredictable by me because some thing or things caused these outcomes. And this holds for quantum mechanical outcomes, where some thing or things still causes the events, but (in some instances) we are barred from discovering what.

We’re full circle. The only definition of random that sticks is unknown.

The Lancet’s crack team of editors.

Here’s the title of a big new peer-reviewed paper in The Lancet:

Take your time and answer this question (you will be graded): TRUE or FALSE, scientists measured the effects of air pollution on mortality of a group of folks in Europe.

Come on. After seeing the words effects of air pollution on mortality, what else can you say but TRUE?

It is FALSE, of course. The three or four dozen researchers listed as authors never measured, not even once, the amount of air “pollution” any person was exposed to. Further, every single author knew that the title was false. And so did every editor.

So why was it allowed? What about the children!

No, it was our old friend The Epidemiologist Fallacy, a.k.a. the ecological fallacy. Nothing is better at generating papers—the currency in academia—than Old Reliable. Using it is vastly cheaper than relying on reality, which often lets scientists down (right, Gav?). I beg you will read the linked article to understand this ubiquitous menace and driver of scientism.

Not only wasn’t air “pollution” (dust, mostly) measured on individuals, but the proxies of air “pollution” weren’t even measured at the same time as mortality. And not only that, but, well, read the letter, which has it all.

The three of us submitted, fixed, and resubmitted a letter which explained the shortcomings of the Beelen et al. paper, not asking that it be withdrawn—if journals withdrew epidemiologist fallacy papers, there would be oceans of blank pages—but to highlight the false claims made.

Alas, observation rarely trumps theory (right, Gav?). The Lancet decided not to publish and to let the paper stand, doubtless reasoning that since so many others used the epidemiologist fallacy, and got away with it, there was no reason Beelen shouldn’t, too. And anyway, it’s embarrassing to admit to error.

The epidemiologist’s fallacy – yet another example

Yours Truly

Pieternella S. Verhoeven
Associate professor at the Roosevelt Academy, Middelburg, the Netherlands

Jaap C. Hanekamp
Associate professor at the Roosevelt Academy, Middelburg, the Netherlands
Adjunct Associate professor University of Massachusetts, Environmental Health Sciences, Amherst, US
Chair of the Chemical Food Safety & Toxicity Working Group of the Global Harmonization Initiative

Beelen et al.’s paper carries a peculiar title considering that the authors never did what they claim: exposure to air pollution was never measured on any individual. It is only poorly guessed at, and not even guessed at over the right time.

The land-use regression models, which guessed the different kinds of pollution, were calculated using data from October 2008 through May 2011. Yet the agglomerated studies ran from 1992 through 2007, with most from the 1990s. So even were the pollution estimates correct, the exposure would have had to operate backwards in time. Besides, it cannot be claimed that pollution from 2008–2011 accurately represents pollution in the 1990s, because of weather dependency.

The “variance explained” by the land-use models is 57–89%, meaning the guesses are often wrong. Nevertheless, the guessed values at the participants’ residences are presented as actual exposures, unreasonably leaving the participants ‘fixed’ in their residences within the study timeframes.

Furthermore, the considerable error in the ‘exposure’ estimates is not encapsulated in the statistical analysis: the guesses are taken as fixed without accompanying plus-or-minuses. If that were done, the study’s results would have been rendered insignificant.

Additionally, the cities with the highest guessed pollution (Athens and Rome) showed no effect, or even a slight negative one, for PM2.5. Strangely, the one location (VHM&PP) which found a significant effect was accorded the largest weight (ten times any other) in the meta-analysis.

Overall, the paper’s conclusions are vitiated by treating the exposure guesses as error free, inflating one (of eight) slim results in favour of the proposed hypothesis.

Profound thanks

I cannot say enough about the generosity of my benefactor, who not only footed the bill for my trip, but wined and dined and rummed our Secret Society most luxuriously. I had a small taste of how (dead broke) Hillary Clinton must regularly feel. A forty-eight hour prince. I am deeply grateful.

I am also still recovering. Hence the tardiness of this post.

Weather weenie

Ever hear that term before? Describes—most lovingly, of course—the kind of guy who has memorized storms in the same way a baseball fanatic can name the batting averages, number of home runs, and ERAs of the lineup of the 1957 Detroit Tigers. And the 1956, and 1955, and, well, you get it.

Joe Bastardi, growing up, used to have posters of hurricanes on his walls. He still knows their names, 500 mb thicknesses, minimum pressures, wind speeds, landfalls, and everything else you would never think anybody could remember. And he can rattle them off faster than the EPA can create new regulations.

Does it really matter to the public, dear reader, that the USA is suffering a paucity of these tropical storms? And that this scarcity repeats itself over just about any climatological or oceanographic characteristic of importance? Tornadoes are down. Sea level isn’t doing squat. Ice is icier. Droughts, especially considering those had before rampaging global warming took over, are down and less severe. The maximum maximum temperatures were seen long ago (#2 was in Death Valley, good old USA, back in the 20s or 30s). Even polar bears are back full blast eating cute seals—and each other (nasty animals, really).

But what does everybody who doesn’t bother to think think? That we are spiraling down, down, down to the gloomy depths.

Who says advertising doesn’t work? You denier.

This just in

One of my favorite sessions was with everybody’s favorite villain Marc Morano, Steve Goddard (going by his real name), and Russell Cook. How blogs can help; that sort of thing. Shout out to machinist and Cajun musician Greg Olsen, who very wisely read a certain statistics book.

Goddard has newspaper clippings from all over the world from as far back as the mid nineteenth century showing that hysteria is an old and ever-present acquaintance. First the world was going to end in ice, then in fire, then back to ice, and sometimes both at once. Something always had to be done.

The lesson is that lessons are never learned; therefore, we can expect that when this scare ends it will rapidly be replaced by another.

Sustainability

Sam Karnick’s opinion (he’s from Heartland) is that the environmentalists have learned at least one lesson. Global Warming was too specific. It made definite predictions: the world will hot up. And when it didn’t, as it hasn’t, it became obvious that the scare must die.

Thus the trick is not to be specific. Thus sustainability.

What a wonderful word! Beautifully unspecific and utterly without content. Don’t do that, that’s not sustainable. Our manufacturing process is based on sustainability principles. Sustainable Pittsburgh’s Green Workplace Challenge.

Everything is at once sustainable and unsustainable. You will never know when you are guilty of unsustainability. Just think of the layers and layers of bureaucracy that will be required to inform you!

We had here a contest for the next environmental scare, but I hereby cancel that and award the banner to Karnick. Sustainability is bound to be a winner.

Am I blue?

Saw a trailer for a new film called Blue by Jeff King. Proof complete that Reagan was right when he said the eight scariest words in the English language are “I’m an environmentalist and I’m here to help.” Or something.

Do just a little harm

Best story was how Lord Monckton punked one of the UN climate conferences. He slipped into an empty seat (Burma’s/Myanmar’s) just as the chairman asked if anybody representing a group had any comments. So his Lordship pushed the button for his mike and said he represented the Asian Coastal Cooperative Institute (or something; made up on the spot) and told the audience that seeing there had been no global warming for over sixteen years and that all the predictions had been a bust, and that it would be cheaper and more efficient to adapt to any warming rather than try to prevent it, shouldn’t we reexamine our priorities? Response from the chair? Not a valid participant. Brave, brave.

And speaking of his Lordship, here he is speaking. Quite a performance—as usual. (The entire video is worth watching, but if you’re in a hurry, LM starts at around 32 minutes in.) Careful watchers will recognize some of the names.

His best line? The Greens are too Yellow to admit they’re Red. Or maybe the joke starting at around 33 minutes. Or maybe the joke starting at 38 minutes. Or the curious picture at 1:01:00.

Conference surprises

How good Willis Eschenbach looks in a suit. Ouch! How three Craigs ended up in one session (Craig Loehle, Craig Idso, Craig Rucker). What are the chances of that? (Regular readers had better be able to answer, or else.) How many people forget or skip over Dwight Eisenhower’s equation (which Cato’s Pat Michaels recalled for us) Science + Big government money = Bad news. How coincidences are everywhere. Roy Spencer and his wife grew up in “The Soo”, which is Sault Ste. Marie, Michigan, where I was stationed as a weather forecaster for the NWS back in ’93.

Pony up

If you do like these conferences, and wish them to continue, and you are deep of pockets, do as Joe Bast suggested and slide them a few.

More to come

This has only been a précis. Here’s a link to all the talks, which were recorded.

I haven’t been blogging on global warming as much as I did four or five years ago, but that’s because it takes a lot of time, and my impression was that people were caring less. And people are caring less. But the power hungry are not. Our dear leader is but one example of many (though the poor soul’s intellectual capabilities do not allow him to rise above the level of sarcastic cliché).

But circumstances might soon allow me to return to this line of inquiry. Stay tuned.

Update

Total amount gambled: \$0.00.