William M. Briggs

Statistician to the Stars!


Netherlands Temperature Controversy: Or, Yet Again, How Not To Do Time Series

Today, a lovely illustration of all the errors in handling time series we have been discussing for years. I’m sure that after today nobody will make these mistakes ever again. (Actually, I predict it will be a miracle if even 10% read as far as the end. Who wants to work that hard?)

Thanks to our friend Marcel Crok, author and boss of the blog The State of the Climate, who brings us the story of Frans Dijkstra, a gentleman who managed to slip one by the goalie in the Dutch paper de Volkskrant, which Crok told me is one of their “left wing quality newspapers”.

Dijkstra rightly pointed out the obvious: not much interesting was happening to the temperature these last 17, 18 years. To illustrate his point and as a for instance, Dijkstra showed temperature anomalies for De Bilt. About this Crok said, “all hell broke loose.”

That the world is not to be doomed by heat is not the sort of news the bien pensant wish to hear, including one Stephan Okhuijsen (we do not comment on his haircut), who ran to his blog and accused Dijkstra of lying (“Liegen met grafieken”, or “Lying with graphs”). A statistician called Jan van Rongen joined in and said Dijkstra couldn’t be right because an R² van Rongen calculated was too small.

Let’s don’t take anybody’s word for this and look at the matter ourselves. The record of De Bilt is on line, which is to say the “homogenized” data is on line. What we’re going to see is not the actual temperatures, but the output from a sort of model. Thus comes our first lesson.

Lesson 1 Never homogenize.

In the notes to the data it said in 1950 there was “relocation combined with a transition of the hut”. Know what that means? It means that the data before 1950 is not to be married to the data after that date. Every time you move a thermometer, or make adjustments to its workings, you start a new series. The old one dies, a new one begins.

If you say the mixed marriage of splicing the disjoint series does not matter, you are making a judgment. Is it true? How can you prove it? It doesn’t seem true on its face. Significance tests are circular arguments here. After the marriage, you are left with unquantifiable uncertainty.

This data had three other changes, all in the operation of the instrument, the last in 1993. This creates, so far, four time series now spliced together.

Then something really odd happened: “warming trend of 0.11°C per century caused by urban warming” was removed. This leads to our second lesson.

Lesson 2 Carry all uncertainty forward.

Why weren’t 0.08°C or 0.16°C per century used? Is it certainly true that a perfectly linear trend of 0.11°C per century was caused by urban warming? No, it is not certainly true. There is some doubt. That doubt should, but doesn’t, accompany the data. The data we’re looking at is not the data, but only a guess of it. And why remove what people felt? Nobody experienced the trend-removed temperatures; they experienced the temperature.

If you make any kind of statistical judgment, which includes instrument changes and relocations, you must always state the uncertainty of the resulting data. If you don’t, any analysis you conduct “downstream” will be too certain. Confidence intervals and posteriors will be too narrow, p-values too small, and so on.

That means everything I’m about to show you is too certain. By how much? I have no idea.
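Here is a minimal sketch of what carrying the doubt forward could look like. It is my illustration, not anything from the notes to the data: it assumes the urban adjustment is a straight line anchored at a reference year, and the function name `remove_linear_trend`, the reference year, and the raw reading of 10.0 are all made up. The three rates are the ones asked about in this lesson.

```python
def remove_linear_trend(raw_temp, year, rate_per_century, ref_year):
    """Subtract an assumed linear warming (deg C per century) from a raw reading.

    Hypothetical helper: the real adjustment procedure is not described
    in enough detail here to reproduce it.
    """
    return raw_temp - rate_per_century * (year - ref_year) / 100.0


# Candidate urban-warming rates, in deg C per century.
candidate_rates = [0.08, 0.11, 0.16]

# One made-up raw reading, a century after the (assumed) reference year.
adjusted = {
    rate: remove_linear_trend(10.0, 2001, rate, 1901)
    for rate in candidate_rates
}

# The "adjusted temperature" is a range, not a single number.
spread = max(adjusted.values()) - min(adjusted.values())
print(adjusted, spread)
```

The point of returning a dictionary keyed by rate is that every downstream calculation can be run once per candidate, so the doubt about the adjustment stays attached to the answer instead of vanishing.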

Lesson 3 Look at the data.

Here it is (click on all figures for larger images, or right click and open them in new windows). Monthly “temperatures” (the scare quotes are to remind you of the first two lessons, but since they are cumbrous, I drop them hereon in).

Monthly data from De Bilt.


Bounces around a bit, no? Some especially cold temps in the 40s and 50s, and some mildly warmer ones in the 90s and 00s. Mostly a lot of dull to-ing and fro-ing. Meh. Since Dijkstra looked from 1997 on, we will too.

Same as before, but only from 1997.


And there it is. Not much more we can do until we learn our next lesson.

Lesson 4 Define your question.

Everybody is intensely interested in “trends”. What is a “trend”? That is the question, the answer of which is: many different things. It could mean (A) the temperature has gone up more often than it has gone down, (B) that it is higher at the end than at the beginning, (C) that the arithmetic mean of the latter half is higher than the mean of the first half, (D) that the series increased on average at more or less the same rate, or (E) many other things. Most statisticians, perhaps anxious to show off their skills, say (F) whether a trend parameter in a probability model exhibits “significance.”

All definitions except (F) make sense. With (A)-(E) all we have to do is look: if the data meets the definition, the trend is there; if not, not. End of story. Probability models are not needed to tell us what happened: the data alone is enough to tell us what happened.

Since 55% of the values went up, there is certainly an upward trend if trend means more data going up than down. October 1997 was 9.6C, October 2014 13.3C, so if trend meant (B) then there was certainly an upward trend. If upward trend meant a higher average in the second half, there was certainly a downward trend (10.51C versus 10.49C). Did the series increase at a more or less constant rate? Maybe. What’s “more or less constant” mean? Month by month? Januaries had an upward (A) trend and a downward (B) and (C). Junes had downward (A), (B), and (C) trends. I leave it as a reader exercise to devise new (and justifiable) definitions.
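Definitions (A), (B), and (C) are simple enough to write down as yes-or-no questions. What follows is a minimal sketch under my own naming (`trend_a`, `trend_b`, `trend_c` are not from anybody's package), and the `toy` list is invented to show the definitions disagreeing; it is not the De Bilt record.

```python
from statistics import mean


def trend_a(values):
    """(A): did the series go up more often than it went down?"""
    steps = list(zip(values, values[1:]))
    ups = sum(1 for prev, cur in steps if cur > prev)
    downs = sum(1 for prev, cur in steps if cur < prev)
    return ups > downs


def trend_b(values):
    """(B): is the last value higher than the first?"""
    return values[-1] > values[0]


def trend_c(values):
    """(C): is the mean of the latter half higher than the mean of the first half?"""
    half = len(values) // 2
    return mean(values[half:]) > mean(values[:half])


# Invented numbers: mostly small rises, one big drop in the middle.
toy = [9.6, 10.5, 11.0, 11.5, 12.0, 7.0, 8.0, 9.0, 10.0, 9.7]

print(trend_a(toy), trend_b(toy), trend_c(toy))
```

No probability model appears anywhere: each function just looks at the numbers and answers the question it was asked, which is why the answers can differ.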

“But wait, Briggs. Look at all those ups and downs! They’re annoying! They confuse me. Can’t we get rid of them?”

Why? That’s what the data is. Why should we remove the data? What would we replace it with, something that is not the data? Years of experience have taught me people really hate time series data and are as anxious to replace their data as a Texan is to get into Luby’s on a Sunday morning after church. This brings us to our next lesson.

Lesson 5 Only the data is the data.

Now I can’t blame Dijkstra for doing what he did next, because it’s habitual. He created “anomalies”, which is to say, he replaced the data with something that isn’t the data. Everybody does this. His anomalies take the average of each month’s temperature from 1961-1990 and subtract them from all the other months. This is what you get.

Same, but now for anomalies.


What makes the interval 1961-1990 so special? Nothing at all. It’s ad hoc, as it always must be. What happens if we changed this 30-year-block to another 30-year-block? Good question, that. You get this:

All possible 30-year-block anomalies.


These are all the possible anomalies you get when using every possible 30-year-block in the dataset at hand. The black line is the one from 1961-1990 (it’s lower than most but not all others because the period 1997-2014 has monthly values higher than most other periods). Quite a window of possible pictures, no?

Yes. Which is the correct one? None and all. And that’s just the 30-year-blocks. Why not try 20 years? Or 10? Or 40? You get the idea. We are uncertain of which picture is best, so recalling Lesson 2, we should carry all uncertainty forward.

How? That depends. What we should do is to use whatever definition of a trend we agreed upon and ask it of every set of anomalies. Each will give an unambiguous answer “yes” or “no”. That’ll give us some idea of the effect of moving the block. But then we have to remember we can try other widths. And lastly we must remember that we’re looking at anomalies and not data. Why didn’t we just ask our trend question of the real data and skip all this screwy playing around? Clearly, you have never tried to publish a peer-reviewed paper.
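For the curious, the every-possible-block exercise can be sketched in a few lines. This is a sketch under loud assumptions: the series is synthetic (a seasonal cycle plus seeded noise from `synthetic_monthly`, a helper I made up), the years 1901-2014 are borrowed only to get the same number of blocks, and the function names are mine. It is not the code behind the figures.

```python
import math
import random


def synthetic_monthly(first_year, last_year, seed=0):
    """Made-up monthly 'temperatures': seasonal cycle plus noise. Not real data."""
    rng = random.Random(seed)
    data = {}
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            seasonal = 10.0 - 8.0 * math.cos(2 * math.pi * (month - 1) / 12)
            data[(year, month)] = seasonal + rng.gauss(0.0, 1.5)
    return data


def climatology(data, base_start, base_len=30):
    """Mean of each calendar month over the block starting at base_start."""
    years = range(base_start, base_start + base_len)
    return {m: sum(data[(y, m)] for y in years) / base_len for m in range(1, 13)}


def anomalies(data, base_start, base_len=30):
    """Every value minus the block mean for its calendar month."""
    clim = climatology(data, base_start, base_len)
    return {(y, m): t - clim[m] for (y, m), t in data.items()}


def all_block_starts(first_year, last_year, base_len=30):
    """Start years of every block of base_len years that fits in the record."""
    return range(first_year, last_year - base_len + 2)


data = synthetic_monthly(1901, 2014)
starts = all_block_starts(1901, 2014)

# The same month gets a different "anomaly" for every choice of block.
one_month = [anomalies(data, start)[(2014, 10)] for start in starts]
print(len(starts), min(one_month), max(one_month))
```

To carry the uncertainty forward, loop over `starts`, ask the agreed trend question of each set of anomalies, and report how many say yes. Repeating with `base_len` set to 10, 20, or 40 handles the other widths.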

Lesson 6 The model is not the data.

The model most often used is a linear regression line plotted over the anomalies. Many, many other models are possible, the choice subject to the whim of the researcher (as we’ll see). But since we don’t like to go against convention, we’ll use a straight line too. That gives us this:

Same as before, but with all possible regression lines.


Each blue line indicates a negative coefficient in a model (red would have shown if any were positive; if we start from 1996 red shows). One model for every possible anomaly block. None were “statistically significant” (an awful term). The modeled decrease per decade was anywhere from 0.11 to 0.08 C. So which block is used makes a difference in how much modeled trend there is.

Notice carefully how none of the blue lines are the data. Neither, for that matter, are the grey lines. The data we left behind long ago. What have these blue lines to do with the price of scones in Amsterdam? Another good question. Have we already forgotten that all we had to do was (1) agree on a definition of trend and (2) look at the actual data to see if it were there? I bet we have.

And say, wasn’t it kind of arbitrary to draw a regression line starting in 1997? Why not start in 1998? Or 1996? Or whatever? Let’s try:

These models are awful.


This is the series of regression lines one gets by starting separately at every month from January 1990 to December 2012, each line running through October 2014 (so the last start date still leaves about two years of data to go into the model). Solid lines are “statistically significant”: red means increase, blue decrease.

This picture is brilliant for two reasons, one simple, one shocking. The simple is that we can get positive or negative trends by picking various start dates (and stop; but I didn’t do that here). That means if I’m anxious to tell a story, all I need is a little creativity. The first step in my tale will be to hasten past the real data and onto something which isn’t the data, of course (like we did).

This picture is just for the 1961-1990 block. Different ones would have resulted if I had used different blocks. I didn’t do it, because by now you get the idea.
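The start-date game is easy to play at home. Below is a minimal sketch, with assumptions stated plainly: the `toy` series is an invented hump (it rises, peaks, and falls), the names `ols_slope` and `slopes_by_start` are mine, and no significance is computed. It only shows that the sign of a fitted straight line depends on where you choose to begin.

```python
def ols_slope(ys):
    """Least-squares slope of ys against the time index 0, 1, 2, ..."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx


def slopes_by_start(ys, min_points=24):
    """Fit one line per start index, always ending at the last point.

    min_points=24 keeps at least two years of monthly values in each fit.
    """
    last_start = len(ys) - min_points
    return {start: ols_slope(ys[start:]) for start in range(last_start + 1)}


# Invented series: rises until index 90, then falls. Not temperature data.
toy = [-((i - 90) ** 2) / 100 for i in range(120)]

slopes = slopes_by_start(toy)
rising = [s for s, b in slopes.items() if b > 0]
falling = [s for s, b in slopes.items() if b < 0]
print(len(rising), len(falling))
```

Same series, same end point, and both stories are available. Pick an early start for an "increase", a late one for a "decrease".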

Now for the shocking conclusion. Ready?

Usually time series mavens will draw a regression line starting from some arbitrary point (like we did) and end at the last point available. This regression line is a model. It says the data should behave like the model; perhaps the model even says the data is caused by the structure of the model (somehow). If cause isn’t in it, why use the model?

But the model also logically implies that the data before the arbitrary point should have conformed to the model. Do you follow? The start point was arbitrary. The modeler thought a straight line was the thing to do, that a straight line is the best explanation of the data. That means the data that came before the start point should look like the model, too.

Does it? You bet it doesn’t. Look at all those absurd lines, particularly among the increases! Each of these models is correct if we have chosen the correct starting point. The obvious absurdity means the straight line model stinks. So who cares whether some parameter within that model exhibits a wee p-value or not? The model has nothing to do with reality (even less when we realize that the anomaly block is arbitrary and the anomalies aren’t the data and even the data is “homogenized”; we could have insisted a different regression line belonged to the period before our arbitrary start point, but that sounds like desperation). The model is not the data! That brings us to our final lesson.

Lesson 7 Don’t use statistics unless you have to.

Who who knows anything about how actual temperatures are caused would have thought a straight line a good fit? The question answers itself. There was no reason to use statistics on this data, or on most time series. If we wanted to know whether there was a “trend”, we had simply to define “trend” then look.

The only reason to use statistics is to use models to predict data never before seen. If our anomaly regression or other modeled line was any good, it will make skillful forecasts. Let’s wait and see if it does. The experience we have just had indicates we should not be very hopeful. There is no reason in the world to replace the actual data with a model and then make judgments about “what happened” based on the model. The model did not happen, the data did.

Most statistical models stink and they are never checked on new data, the only true test.
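Checking a model on data it has never seen is not hard, which makes the neglect all the stranger. Here is a minimal sketch, assuming that "skill" means beating a naive forecast (here persistence: tomorrow equals today) on held-out values by mean absolute error. The `toy` series is invented and deliberately curved, so the straight line is expected to lose; with real data you have to wait and see. All names are mine.

```python
def fit_line(ys):
    """Least-squares intercept and slope of ys against index 0, 1, 2, ..."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_abs_error(forecasts, actuals):
    """Average absolute gap between what was predicted and what happened."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


# Invented series: rises until index 90, then falls. Not temperature data.
toy = [-((i - 90) ** 2) / 100 for i in range(120)]
train, future = toy[:80], toy[80:]

# The model sees only the training part, then must predict the rest.
intercept, slope = fit_line(train)
line_forecast = [intercept + slope * x for x in range(80, 120)]

# Naive competitor: repeat the last value seen.
persistence_forecast = [train[-1]] * len(future)

mae_line = mean_abs_error(line_forecast, future)
mae_persistence = mean_abs_error(persistence_forecast, future)
has_skill = mae_line < mae_persistence
print(mae_line, mae_persistence, has_skill)
```

The comparison against a naive forecast matters: a model that cannot beat "same as last time" has told you nothing you did not already know.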

Homework Dijkstra also showed a picture of all the homogenized data (1901-2014) over which he plotted a modeled (non-straight) line. Okhuijsen and van Rongen did that and more; van Rongen additionally used a technique called loess to supply another modeled line. Criticize these moves using the lessons learned. Bonus points for using the word “reification” when criticizing van Rongen’s analysis. Extra bonus points for quoting from me about smoothing time series.

Update See also Don’t Use Statistics Unless You Have To.

The Day I Made The Front Page!

I am the fellow on the left, age 15.


In case you can’t see the fine print, it reads:

FRESH PUDDLE ICE formed from the January rains collapsed under the weight of a car driven by Adam Kennedy, above right, Sunday on Otsego Lake. Kennedy and friend Matt Briggs, both of Gaylord, examine the spot where the car’s wheels rest on thick lake ice. With an assist from lake resident Wayne Miller, a tow truck on shore south of the state park arrived and Kennedy attached the 300-foot cable to the trailer hitch. Miller paced the car’s distance from shore at 288 feet before the vehicle eased out.

Now that’s what I call news!

Sign of the old times.


The date is Thursday 24 January 1980. We lived on a sort of bluff at the south end of Otsego Lake and were sitting having Sunday dinner (Banquet Chicken, probably) and we saw this guy spinning donuts on the ice. Somebody wondered if the ice was too thin.

Kerplunk. Just to show that you can’t trust anything you read, I didn’t know Adam from Adam, though the paper says we were pals. And how thick could the ice have been if the car had gone through? Global Warming was a menace even then! We sure were examining the spot where the car nearly sank into oblivion, though. Our postures positively radiate examinationness, or perhaps it is examinationativity. Anyway, not much else we could have done until the tow truck arrived.

The sticker on the side of the car, incidentally, is for Alpenfest 76. The town at all times of the year dresses itself up in an ersatz Bavarianism, but during the third week of July the residents, who were predominately Polish, join in for the thing which is Alpenfest. There is the burning of the Boog, die grosse kaffeepause, the singing of Edelweiss, a Queen contest which is always the subject of intense betting, booths of cutesy country crap (crafts! I meant crafts) on the Alpenstrasse (main street), a parade in which I marched as part of the High School band, and carnival rides.

The carnies used to have the Dime Game in which bettors placed dimes on colored squares and where somebody tossed a racquetball into a pen with colored holes. The prize was candy. Loved it. But the State in its role of in loco parentis shut it down because it looked too much like gambling. Or maybe it’s because the State treasures its monopoly on that sport. Or maybe it was standard American puritanism. Whichever way, it’s gone.

Careful readers will have noted the absence of ice fishing houses in the picture. Most of these were on the north side of the lake anyway, but many took them off because of the rain.

But there’s no mistaking the foggy grayness which is a permanent fixture of Northern Michigan winters. If it wasn’t snowing, it was about to. Gaylord is the high and snowiest point in the lower peninsula. Lots of lake effect. Sunny days in winter were rare, but boy were they pretty. Perfect time to go cross-country skiing.

Or hunting. My friend Chuck Coonrod (whose dad, then a school bus driver, is coincidentally mentioned in this same issue of the stately Herald Times) would head up the hill behind his house and go and kill animals. I still remember the first time I baked a liberally salted and peppered squirrel in the oven. It was good!

Behind his house was a steep, winding hill. Chuck had metal saucers on which we would sit and launch ourselves into near oblivion. This was still in the day of cloth snow suits. The end of a good afternoon saw us drenched and steaming sitting by his kitchen table snacking on the bacon his mother always had by the side of the stove.

Soft spots in the lake weren’t the only danger. My nephew had a video which I am unable to rediscover which showed how in the spring the wind would push chunks of ice onto the land. An inexorable flow, like cold lava. It doesn’t sound like much, but all that weight does damage. I got my leg stuck once and thought it was going to break off.

Good thing we can simulate all this stuff on a computer now. So much safer.

Update The horrific error (placed there by my enemies) of mislocating Alpenfest to the wrong month has been fixed. Consider this a belated trigger warning.

The Scientific Ethicist: Mathematics & Logic Edition

The Scientific Ethicist, PhD


This week, three letters from concerned readers.

Can I Skip College?

Dear Scientific Ethicist,

I am a junior in high school and will graduate in the first semester of my senior year. Someday I would like to be a stay-at-home mom. I have no interest in going to college. I feel it would be a waste of money for me to go when I don’t intend to use my degree.

To say my parents are disappointed in me over this is putting it mildly. They have a life planned for me that includes college. I would also like to move away to somewhere where it’s warm year-round, and they don’t like that idea either.

How do I make them understand that this is MY life and everything will be OK?

Uninterested in Idaho

Dear Uninterested,

This is obviously related to the Fundamental Theorem of Calculus. Let me quote Wikipedia, “The first part of the theorem, sometimes called the first fundamental theorem of calculus, is that an indefinite integral of a function can be reversed by differentiation. This part of the theorem is also important because it guarantees the existence of antiderivatives for continuous functions.

The second part, sometimes called the second fundamental theorem of calculus, is that the definite integral of a function can be computed by using any one of its infinitely many antiderivatives. This part of the theorem has key practical applications because it markedly simplifies the computation of definite integrals.”

As you can see, the rest follows easily. That’s the power of mathematics!

The Scientific Ethicist

P.S. See also the first, second, and third laws of thermodynamics in reference to your comment about heat.

Dating Woes

Dear Scientific Ethicist,

The school year has started and many high school girls like me are faced with a similar problem: how to politely decline when a boy asks you to a dance.

Whether it be homecoming, winter formal or prom, some boys go all out and ask girls in elaborate and creative ways. I don’t know what to do in these situations if I don’t want to go with the boy who is asking me. I feel bad saying “no” because of all the work they put into it, and also sometimes there is an audience watching. Should I just go anyway?

Saratoga Teen

Dear Saratoga,

Meta logic is the answer here, especially formal systems. A formal system must have a finite alphabet, a listing of the strict rules of grammar (exceptions aren’t allowed), a specified list of inference rules, and finally a set of indubitable axioms. The latter may be made up because, of course, science has no way of externally checking the validity of any set of axioms.

The point for you, and I’m sure you already see it, is that since you can create this formal system any way you like, the next time you attend a formal you can act any way you like. Logic guarantees this.

Truly there is nothing more logical than logic!

The Scientific Ethicist

Social Media Prayers

Dear Scientific Ethicist,

I frequently receive requests via Facebook and other social media sites asking for prayers for people who are ill or suffering a loss. I’m not a religious person, but I would like to acknowledge their pain and extend my sympathy. Any suggestions?

Challenged in Tucson

Dear Challenged,

Have you considered that e is irrational? Every schoolgirl ethicist knows that

$e = \sum_{n = 0}^{\infty} \frac{1}{n!}.$

Now if e were rational, it would have the form a/b where the two numbers are integers, and where obviously b does not equal 1. Then

$\frac{1}{1} + \frac{1}{1} < e = \frac{1}{1} + \frac{1}{1} + \frac{1}{1\cdot2} + \frac{1}{1\cdot2\cdot3} + \cdots < \frac{1}{1} + \frac{1}{1} + \frac{1}{1\cdot2} + \frac{1}{1\cdot2\cdot2} + \cdots = 3$. Well, we repeat a procedure like this, working with infinite series, manipulating this way and that, and we finally conclude that e cannot be rational.
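Skeptical schoolgirls may check the sandwich for themselves. The sketch below is only a numerical sanity check of the bounds, using exact fractions; it is not a proof of irrationality, and the function names `partial_sum` and `geometric_bound` are my own. The bound works by replacing each n! (for n at least 1) with the smaller number 2^(n-1), which makes every term at least as large.

```python
from fractions import Fraction
from math import factorial


def partial_sum(terms):
    """Exact sum of 1/n! for n = 0, 1, ..., terms - 1."""
    return sum(Fraction(1, factorial(n)) for n in range(terms))


def geometric_bound(terms):
    """Same number of terms, with n! replaced by 2**(n - 1) for n >= 1."""
    return 1 + sum(Fraction(1, 2 ** (n - 1)) for n in range(1, terms))


# Every partial sum with at least three terms sits strictly between 2 and 3.
for terms in range(3, 21):
    assert 2 < partial_sum(terms) <= geometric_bound(terms) < 3

print(float(partial_sum(20)), float(geometric_bound(20)))
```

Using `Fraction` keeps the arithmetic exact, so the inequalities are checked on the actual rational partial sums and not on rounded decimals.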

But you can be, using math, logic, and science!

The Scientific Ethicist

Be sure not to miss other penetrating installments of The Scientific Ethicist. Or send in your questions today!

Summary Against Modern Thought: God Is Not The Universe

This may be proved in three ways. The first...


See the first post in this series for an explanation and guide of our tour of Summa Contra Gentiles. All posts are under the category SAMT.

Previous post.

Pantheism is the belief that the universe (or multiverse or whatever is all that exists) is identical with God. It is an ancient and current belief. See inter alia Star Wars or attend any yoga class. Atheists speak like pantheists (see discussions about “spontaneous” effects, creation from “nothing”, etc.).

Chapter 26: That God Is Not The Formal Being Of All Things

1 FROM the foregoing we are able to refute the error of some who have asserted that God is nothing else than the formal being of everything.[1]

2 For this being is divided into substantial and accidental being. Now the divine being is neither the being of a substance nor the being of an accident, as shown above.[2] Therefore it is impossible for God to be the being whereby everything is formally.[i]

3 Again. Things are not distinct from one another in that they have being, since in this they all agree. If, then, things differ from one another, it follows that either being itself is specified by certain differences added thereto, so that different things have a specifically different being, or that things differ in that being itself is attached to specifically different natures. But the former of these is impossible, because an addition cannot be attached to being in the same way as a difference is added to a genus, as already stated.[3] It remains, therefore, that things differ because they have different natures, to which being is attached in different ways. Now the divine being is not attached to another nature, but is the nature itself, as shown above.[4] If, therefore, the divine being were the formal being of all things, it would follow that all things are simply one…[ii]

5 Further. That which is common to many is not something besides those many except only logically: thus animal is not something besides Socrates and Plato and other animals except as considered by the mind, which apprehends the form of animal as divested of all that specifies, and individualizes it: for man is that which is truly an animal, else it would follow that in Socrates and Plato there are several animals, namely animal in general, man in general, and Plato himself.[iii] Much less therefore being itself in general is something apart from all things that have being; except only as apprehended by the mind. If therefore God is being in general, He will not be an individual thing except only as apprehended in the mind. Now it has been shown above[6] that God is something not merely in the intellect, but in reality. Therefore God is not the common being of all.[iv]

6 Again. Generation is essentially the way to being, and corruption the way to not-being. For the term of generation is the form, and that of corruption privation, for no other reason than because the form makes a thing to be, and corruption makes a thing not to be, for supposing a certain form not to give being, that which received that form would not be said to be generated. If, then, God were the formal being of all things it would follow that He is the term of generation. Which is false, since He is eternal, as we have shown above.[7][v]

7 Moreover. It would follow that the being of every thing has been from eternity: wherefore there would be neither generation nor corruption. For if there were, it would follow that a thing acquires anew a being already pre-existing. Either then it is acquired by something already existing, or else by something nowise pre-existing. In the first case, since according to the above supposition all existing things have the same being, it would follow that the thing which is said to be generated, receives not a new being but a new mode of being, and therefore is not generated but altered. If on the other hand the thing nowise existed before, it would follow that it is made out of nothing, and this is contrary to the essence of generation. Consequently this supposition would wholly do away with generation and corruption: and therefore it is clear that it is impossible…[vi]

We skip the next six arguments, which refute an error not of main interest to us.

————————————————————————————

[1] Sum. Th. P. I., Q. iii., A. 8.
[2] Ch. xxv.
[3] Ch. xxv.
[4] Ch. xxii.
[5] Ch. xv.
[6] Ch. xiii.
[7] Ch. xv.

[i] “This being” is the pantheistic deity, if it existed. Obviously, the universe is made of parts, is in potential, and all those things we already know God cannot be. This is probably the simplest proof in the whole book! So obvious is this that we’ll skip around the remaining arguments, though there is plenty there that is of interest.

[ii] There was some confusion about this in the past. If you exist and I exist (and we do) then we both share existence, or being. But after that, we begin to differ. That’s all this means, and Aquinas draws the implication in the next sentence. There are not different kinds of to exist. The takeaway point is that in God existence is essence, or nature.

[iii] That is, we can know the essence of animal, and other essences, too! Once you grasp this seemingly simple point, boy howdy do things change.

[iv] In other words, God cannot be a thing which only exists in your imagination.

[v] A good review: things which are in existence, have being, have form and matter. Take away the form of you and what is left? Nothing but dust. The form of man is his soul.

[vi] I find this argument beautiful. Put another way around, if the universe were God, then nothing could change; things change; therefore the universe is not God. Being cannot alter into new ways of being Being, and nothing can come from nothing. If God were the universe, it would be a dull constant unchanging void with not even a seething quantum “vacuum” to liven it up.


© 2016 William M. Briggs
