William M. Briggs

Statistician to the Stars!

Author: Briggs

Upcoming Probability & Statistics Talks! The Climate, Over-Certainty & More!

Update Classic Posts page link fixed.

When & Where

I have no idea. As soon as somebody hires me to give them.

Since I am out of the system, I don’t know anybody and thus can’t ask the right personages, whoever they may be, so I’m asking you, dear readers, on the wee small chance you might be the right person.

Titles of some talks:

  • The Over-Certainty Pandemic,
  • Science, Not Scientism,
  • Everything You Believe About Statistics is Wrong,
  • Top Seven Fallacies in Probability and Statistics,
  • What Predictive Analytics Should Be,
  • Global Warming Isn’t So Hot.

Custom talks, seminars, classes and so forth made on request. See the Classic Posts page for sample videos.

Blog Changes: Classic Posts

Following some good advice from our friend John Cook, I did some tweaking on the layout. Of most interest is the fixing of the Classic Posts page, which is now linked on the right sidebar.

Navigation is easier, and after I winnow some of the older material on statistics, it will be easier still.

The Contact Me page has the old Hire Me and Talks page and also information about subscribing to the blog. See the Contribute page about contributing guest posts. Any regular reader is a colleague.

I’ve also cleaned up the Who is WMB page. Pay special attention to the words “I am wholly independent; i.e., I have no position. I depend on you, dear reader, for my livelihood. I do not jest. Spread the word.”

Be careful who you spread it to. I was lined up at a major you-know-the-name think tank for a position on exposing the statistical errors in global warming, but it turned out upon vetting, so I was point-blank told, that a VP at the fine organization was himself same-sex attracted and took wild exception to my position on so-called same-sex marriage. So my job was called off.

What have the two things, global warming and the philosophy of natural law, to do with one another? “Nothing” is the answer I’m looking for. But politics is politics.

This reminds me of the best advice I can give new academics: keep your mouth shut and toe the line until you get tenure. But by then, of course, it might be too late.

Update For climate fans, it seems appropriate here to re-present the running total of all consideration I have received for my work on global warming: $0 (rounded to the nearest dollar). I only wish the conspiracy fans were right about Big Oil passing out money like the government does grants to true believers.

Time Series

I’ve been asked by people via email about where folks can go to read more about Time Series, particularly those new and shiny methods written about in “Netherlands Temperature Controversy: Or, Yet Again, How Not To Do Time Series”. Here.

And in my upcoming book—to be published by…?

I am the only one I know who writes about these things. All statistics books I know are interested, as is natural, in discussing zippy keen methods and not the philosophy. Not only with time series, but with all sorts of analyses. It’s not that the philosophy is entirely neglected, but it’s a poor cousin next to the mathematics.

Incidentally, I was very proud of us with that post. Many people, whether they agree or disagree, are finally starting to understand what I’m saying. If you understood that post, then you’ll finally get this one: “The True Meaning Of Statistical Models.” I’ll redo this one (yet again) maybe tomorrow or Monday. Not repost, redo. There’s always another way to state things.

What Book?

It’s being worked on, slowly and surely. A well known publisher is considering it. If they reject it, which is likely on the premise that most publishers reject most manuscripts, I’ll try with another, then another, then back to self publishing. I’ll comfort myself with the memory that Principia Mathematica was also self published.

On the other hand, the PM, though it was influential, wasn’t read! (I dare you to try.)

Always End With A Joke

Briggs: Jimmy, where are you going tomorrow?

Durante: I’m goin’ to the insane asylum.

Briggs: An insane asylum?

Durante: Yeah, I’m gonna get me a ravin’ beauty.

Pascal’s Pensées, A Tour: I

Since our walk through Summa Contra Gentiles is going so well, why not let’s do the same with Pascal’s sketchbook on what we can now call Thinking Thursdays. We’ll use the Dutton Edition, freely available at Project Gutenberg. (I’m removing that edition’s footnotes.)

Update Comments fixed.

1

The difference between the mathematical and the intuitive mind1.—In the one the principles are palpable, but removed from ordinary use; so that for want of habit it is difficult to turn one’s mind in that direction: but if one turns it thither ever so little, one sees the principles fully, and one must have a quite inaccurate mind who reasons wrongly from principles so plain that it is almost impossible they should escape notice.

But in the intuitive mind the principles are found in common use, and are before the eyes of everybody. One has only to look, and no effort is necessary; it is only a question of good eyesight, but it must be good, for the principles are so subtle and so numerous, that it is almost impossible but that some escape notice. Now the omission of one principle leads to error; thus one must have very clear sight to see all the principles, and in the next place an accurate mind not to draw false deductions from known principles.

All mathematicians would then be intuitive if they had clear sight, for they do not reason incorrectly from principles known to them; and intuitive minds would be mathematical if they could turn their eyes to the principles of mathematics to which they are unused.2

The reason, therefore, that some intuitive minds are not mathematical is that they cannot at all turn their attention to the principles of mathematics. But the reason that mathematicians are not intuitive is that they do not see what is before them, and that, accustomed to the exact and plain principles of mathematics, and not reasoning till they have well inspected and arranged their principles, they are lost in matters of intuition where the principles do not allow of such arrangement. They are scarcely seen; they are felt rather than seen; there is the greatest difficulty in making them felt by those who do not of themselves perceive them. These principles are so fine and so numerous that a very delicate and very clear sense is needed to perceive them, and to judge rightly and justly when they are perceived, without for the most part being able to demonstrate them in order as in mathematics; because the principles are not known to us in the same way, and because it would be an endless matter to undertake it. We must see the matter at once, at one glance, and not by a process of reasoning, at least to a certain degree. And thus it is rare that mathematicians are intuitive, and that men of intuition are mathematicians, because mathematicians wish to treat matters of intuition mathematically, and make themselves ridiculous, wishing to begin with definitions and then with axioms, which is not the way to proceed in this kind of reasoning. Not that the mind does not do so, but it does it tacitly, naturally, and without technical rules; for the expression of it is beyond all men, and only a few can feel it.3

Intuitive minds, on the contrary, being thus accustomed to judge at a single glance, are so astonished when they are presented with propositions of which they understand nothing, and the way to which is through definitions and axioms so sterile, and which they are not accustomed to see thus in detail, that they are repelled and disheartened.

But dull minds are never either intuitive or mathematical.

Mathematicians who are only mathematicians have exact minds, provided all things are explained to them by means of definitions and axioms; otherwise they are inaccurate and insufferable, for they are only right when the principles are quite clear.

And men of intuition who are only intuitive cannot have the patience to reach to first principles of things speculative and conceptual, which they have never seen in the world, and which are altogether out of the common.4

——————————————————————————————

1From Allan Bloom The Closing of the American Mind: How Higher Education Has Failed Democracy and Impoverished the Souls of Today’s Students (p. 52):

Every Frenchman is born, or at least early on becomes, Cartesian [the mathematician above] or Pascalian [the intuitive]…Descartes and Pascal represent a choice between reason and revelation, science and piety, the choice from which everything else follows…These great opponents whom no synthesis can unite—the opposition between bon sens and faith against all odds—set in motion a dualism…

It was, therefore, very French of Tocqueville to say that the Americans’ method of thought was Cartesian…

2The great fallacy is to suppose we can do with only one of these types (even inside one body). American and British thought plunges headlong into the mathematical—we are all Cartesians here. This isn’t a new observation. Tocqueville said “each American appeals to the individual exercise of his own understanding alone. America is therefore one of the countries in the world where philosophy is least studied, and where the precepts of Descartes are best applied…they follow his maxims because this very social condition naturally disposes their understanding to adopt them.”

Strict Cartesianism leads to scientism and the worship of rationality and reason as if these could live without intellection, what Pascal called intuition. No mathematician could even begin to think without intellection. Intuition, used in this special sense, is necessary and prior to logic, mathematics, and ratio. Axioms, for instance, are not provided by rationality. Pure rationality is always incomplete. I’ll have much more to say about this in the coming weeks.

3It is well to put here the fallacy that says that because our intuitions sometimes fail us, they always do. Sometimes our mathematical reason also fails us, but nobody would claim that therefore all of mathematics should be tossed or is suspect (except radical skeptics; paradoxically, personages only found in Western universities).

4Relying only on one leads to rank pedantry, sterility, and blind alleys.

Netherlands Temperature Controversy: Or, Yet Again, How Not To Do Time Series

Today, a lovely illustration of all the errors in handling time series we have been discussing for years. I’m sure that after today nobody will make these mistakes ever again. (Actually, I predict it will be a miracle if even 10% read as far as the end. Who wants to work that hard?)

Thanks to our friend Marcel Crok, author and boss of the blog The State of the Climate, who brings us the story of Frans Dijkstra, a gentleman who managed to slip one by the goalie in the Dutch paper de Volkskrant, which Crok told me is one of their “left wing quality newspapers”.

Dijkstra rightly pointed out the obvious: not much interesting was happening to the temperature these last 17, 18 years. To illustrate his point and as a for instance, Dijkstra showed temperature anomalies for De Bilt. About this Crok said, “all hell broke loose.”

That the world is not to be doomed by heat is not the sort of news the bien pensant wish to hear, including one Stephan Okhuijsen (we do not comment on his haircut), who ran to his blog and accused Dijkstra of lying (“Liegen met grafieken”). A statistician called Jan van Rongen joined in and said Dijkstra couldn’t be right because an R² van Rongen calculated was too small.

Let’s don’t take anybody’s word for this and look at the matter ourselves. The record of De Bilt is on line, which is to say the “homogenized” data is on line. What we’re going to see is not the actual temperatures, but the output from a sort of model. Thus comes our first lesson.

Lesson 1 Never homogenize.

In the notes to the data it said in 1950 there was “relocation combined with a transition of the hut”. Know what that means? It means that the data before 1950 is not to be married to the data after that date. Every time you move a thermometer, or make adjustments to its workings, you start a new series. The old one dies, a new one begins.

If you say the mixed marriage of splicing the disjoint series does not matter, you are making a judgment. Is it true? How can you prove it? It doesn’t seem true on its face. Significance tests are circular arguments here. After the marriage, you are left with unquantifiable uncertainty.

This data had three other changes, all in the operation of the instrument, the last in 1993. This creates, so far, four time series now spliced together.

Then something really odd happened: “warming trend of 0.11°C per century caused by urban warming” was removed. This leads to our second lesson.

Lesson 2 Carry all uncertainty forward.

Why weren’t 0.08°C or 0.16°C per century used? Is it certainly true that a perfectly linear trend of 0.11°C per century was caused by urban warming? No, it is not certainly true. There is some doubt. That doubt should, but doesn’t, accompany the data. The data we’re looking at is not the data, but only a guess of it. And why remove what people felt? Nobody experienced the trend-removed temperatures, they experienced the temperature.

If you make any kind of statistical judgment, which includes instrument changes and relocations, you must always state the uncertainty of the resulting data. If you don’t, any analysis you conduct “downstream” will be too certain. Confidence intervals and posteriors will be too narrow, p-values too small, and so on.

That means everything I’m about to show you is too certain. By how much? I have no idea.
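As a small illustration of Lesson 2, here is a minimal sketch of what carrying one doubt forward could look like. Everything in it is an assumption made for the example: that the urban adjustment is a straight-line subtraction growing from zero at the start of the record, that 0.08 and 0.16 are as defensible as 0.11, and that the raw reading and the 113-year span are the made-up numbers shown. None of it is the procedure actually applied to the De Bilt record.

```python
# Candidate urban-warming trends in degrees C per century. Only 0.11 comes from
# the data notes; 0.08 and 0.16 are the alternatives asked about in the text.
CANDIDATE_TRENDS = [0.08, 0.11, 0.16]


def adjusted(raw_temp, years_elapsed, trend_per_century):
    """Subtract an ASSUMED linear urban trend that starts at zero in year 0."""
    return raw_temp - trend_per_century * years_elapsed / 100.0


raw_temp = 13.0       # hypothetical raw reading, degrees C
years_elapsed = 113   # hypothetical span of the record, in years

results = [adjusted(raw_temp, years_elapsed, t) for t in CANDIDATE_TRENDS]
low, high = min(results), max(results)

# Report the whole range, not a single "corrected" number.
print(f"adjusted value lies somewhere in [{low:.2f}, {high:.2f}] C")
```

The only point is that the downstream number arrives as a range instead of a single value. A fuller treatment might weight the candidates and push the range through every later calculation.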

Lesson 3 Look at the data.

Here it is (click on all figures for larger images, or right click and open them in new windows). Monthly “temperatures” (the scare quotes are to remind you of the first two lessons, but since they are cumbrous, I drop them hereon in).

Monthly data from De Bilt.

Bounces around a bit, no? Some especially cold temps in the 40s and 50s, and some mildly warmer ones in the 90s and 00s. Mostly a lot of dull to-ing and fro-ing. Meh. Since Dijkstra looked from 1997 on, we will too.

Same as before, but only from 1997.

And there it is. Not much more we can do until we learn our next lesson.

Lesson 4 Define your question.

Everybody is intensely interested in “trends”. What is a “trend”? That is the question, the answer of which is: many different things. It could mean (A) the temperature has gone up more often than it has gone down, (B) that it is higher at the end than at the beginning, (C) that the arithmetic mean of the latter half is higher than the mean of the first half, (D) that the series increased on average at more or less the same rate, or (E) many other things. Most statisticians, perhaps anxious to show off their skills, say (F) whether a trend parameter in a probability model exhibits “significance.”

All definitions except (F) make sense. With (A)-(E) all we have to do is look: if the data meets the definition, the trend is there; if not, not. End of story. Probability models are not needed to tell us what happened: the data alone is enough to tell us what happened.

Since 55% of the values went up, there is certainly an upward trend if trend means more data going up than down. October 1997 was 9.6C, October 2014 13.3C, so if trend meant (B) then there was certainly an upward trend. If upward trend meant a higher average in the second half, there was certainly a downward trend (10.51C versus 10.49C). Did the series increase at a more or less constant rate? Maybe. What’s “more or less constant” mean? Month by month? Januaries had an upward (A) trend and a downward (B) and (C). Junes had downward (A), (B), and (C) trends. I leave it as a reader exercise to devise new (and justifiable) definitions.
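Definitions (A) through (C) need nothing beyond counting and averaging. Here is a minimal sketch in Python, run on a made-up series (not the De Bilt values) chosen so that the definitions disagree. The function names are mine.

```python
from statistics import mean


def trend_a(series):
    """(A) Did the series go up more often than it went down?"""
    ups = sum(1 for prev, cur in zip(series, series[1:]) if cur > prev)
    downs = sum(1 for prev, cur in zip(series, series[1:]) if cur < prev)
    return ups > downs


def trend_b(series):
    """(B) Is the last value higher than the first?"""
    return series[-1] > series[0]


def trend_c(series):
    """(C) Is the mean of the latter half higher than the mean of the first half?"""
    half = len(series) // 2
    return mean(series[half:]) > mean(series[:half])


# Made-up values for illustration only, NOT the De Bilt record.
toy = [10, 11, 12, 13, 14, 5, 6, 7, 8, 10.5]

answers = {"A": trend_a(toy), "B": trend_b(toy), "C": trend_c(toy)}
print(answers)  # {'A': True, 'B': True, 'C': False}
```

Same series, three definitions, two different answers. No probability model was consulted.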

“But wait, Briggs. Look at all those ups and downs! They’re annoying! They confuse me. Can’t we get rid of them?”

Why? That’s what the data is. Why should we remove the data? What would we replace it with, something that is not the data? Years of experience have taught me people really hate time series data and are as anxious to replace their data as a Texan is to get into Luby’s on a Sunday morning after church. This brings us to our next lesson.

Lesson 5 Only the data is the data.

Now I can’t blame Dijkstra for doing what he did next, because it’s habitual. He created “anomalies”, which is to say, he replaced the data with something that isn’t the data. Everybody does this. His anomalies take the average of each month’s temperature from 1961-1990 and subtract them from all the other months. This is what you get.

Same, but now for anomalies.

What makes the interval 1961-1990 so special? Nothing at all. It’s ad hoc, as it always must be. What happens if we changed this 30-year-block to another 30-year-block? Good question, that: this:

All possible 30-year-block anomalies.

These are all the possible anomalies you get when using every possible 30-year-block in the dataset at hand. The black line is the one from 1961-1990 (it’s lower than most but not all others because the period 1997-2014 has monthly values higher than most other periods). Quite a window of possible pictures, no?

Yes. Which is the correct one? None and all. And that’s just the 30-year-blocks. Why not try 20 years? Or 10? Or 40? You get the idea. We are uncertain of which picture is best, so recalling Lesson 2, we should carry all uncertainty forward.

How? That depends. What we should do is to use whatever definition of a trend we agreed upon and ask it of every set of anomalies. Each will give an unambiguous answer “yes” or “no”. That’ll give us some idea of the effect of moving the block. But then we have to remember we can try other widths. And lastly we must remember that we’re looking at anomalies and not data. Why didn’t we just ask our trend question of the real data and skip all this screwy playing around? Clearly, you have never tried to publish a peer-reviewed paper.
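The procedure just described (build the anomalies for every 30-year block, then ask the agreed trend question of each) can be sketched as below. The data are synthetic, a seasonal cycle plus noise, and are an assumption for illustration, not the De Bilt record. Definition (A) is used as the agreed question only as an example, and the function names are mine.

```python
import math
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# SYNTHETIC monthly "temperatures" keyed by (year, month): a seasonal cycle
# plus noise. An assumption for illustration, NOT the De Bilt record.
FIRST_YEAR, LAST_YEAR = 1901, 2014
temps = {
    (y, m): 10.0 + 7.0 * math.sin(2 * math.pi * (m - 4) / 12) + random.gauss(0, 1.5)
    for y in range(FIRST_YEAR, LAST_YEAR + 1)
    for m in range(1, 13)
}


def monthly_means(data, start, width=30):
    """Average of each calendar month over the block start..start+width-1."""
    return {
        m: mean(data[(y, m)] for y in range(start, start + width))
        for m in range(1, 13)
    }


def anomalies(data, base_start, from_year, width=30):
    """Values from from_year onward, minus the base block's monthly means."""
    base = monthly_means(data, base_start, width)
    keys = sorted(k for k in data if k[0] >= from_year)
    return [data[k] - base[k[1]] for k in keys]


def went_up_more_often(series):
    """Definition (A): more month-to-month rises than falls."""
    ups = sum(b > a for a, b in zip(series, series[1:]))
    downs = sum(b < a for a, b in zip(series, series[1:]))
    return ups > downs


# Every full 30-year block that fits in the record: 1901-1930 ... 1985-2014.
block_starts = range(FIRST_YEAR, LAST_YEAR - 30 + 2)
verdicts = {
    start: went_up_more_often(anomalies(temps, start, from_year=1997))
    for start in block_starts
}

yes = sum(verdicts.values())
print(f"{len(verdicts)} blocks: {yes} say 'yes', {len(verdicts) - yes} say 'no'")
```

Changing the `width` argument repeats the exercise for 10-, 20- or 40-year blocks. Each combination is one more picture to carry forward.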

Lesson 6 The model is not the data.

The model most often used is a linear regression line plotted over the anomalies. Many, many other models are possible, the choice subject to the whim of the researcher (as we’ll see). But since we don’t like to go against convention, we’ll use a straight line too. That gives us this:

Same as before, but with all possible regression lines.

Each blue line indicates a negative coefficient in a model (red would have shown if any were positive; if we start from 1996 red shows). One model for every possible anomaly block. None were “statistically significant” (an awful term). The modeled decrease per decade was anywhere from 0.11 to 0.08 C. So which block is used makes a difference in how much modeled trend there is.
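For readers who want to see how the block choice leaks into the slope, here is a minimal sketch that fits an ordinary least-squares line to the anomalies from every 30-year block. The data are synthetic (seasonal cycle plus noise) and are an assumption for illustration, so the numbers it prints say nothing about De Bilt. The function names are mine.

```python
import math
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# SYNTHETIC monthly values keyed by (year, month). NOT the De Bilt record.
FIRST_YEAR, LAST_YEAR, WIDTH = 1901, 2014, 30
temps = {
    (y, m): 10.0 + 7.0 * math.sin(2 * math.pi * (m - 4) / 12) + random.gauss(0, 1.5)
    for y in range(FIRST_YEAR, LAST_YEAR + 1)
    for m in range(1, 13)
}


def fit_slope(ys):
    """Ordinary least-squares slope of ys against 0, 1, 2, ... (per time step)."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx


def anomalies(data, base_start, from_year):
    """Values from from_year onward, minus the base block's monthly means."""
    base = {
        m: mean(data[(y, m)] for y in range(base_start, base_start + WIDTH))
        for m in range(1, 13)
    }
    keys = sorted(k for k in data if k[0] >= from_year)
    return [data[k] - base[k[1]] for k in keys]


# One straight-line model per anomaly block; 120 months per decade.
slopes_per_decade = {
    start: 120 * fit_slope(anomalies(temps, start, from_year=1997))
    for start in range(FIRST_YEAR, LAST_YEAR - WIDTH + 2)
}

low, high = min(slopes_per_decade.values()), max(slopes_per_decade.values())
print(f"{len(slopes_per_decade)} models, slopes from {low:+.3f} to {high:+.3f} C per decade")
```

The underlying values never change between models. Only the block subtracted from them does, and the slope moves with it.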

Notice carefully how none of the blue lines are the data. Neither, for that matter, are the grey lines. The data we left behind long ago. What have these blue lines to do with the price of scones in Amsterdam? Another good question. Have we already forgotten that all we had to do was (1) agree on a definition of trend and (2) look at the actual data to see if it were there? I bet we have.

And say, wasn’t it kind of arbitrary to draw a regression line starting in 1997? Why not start in 1998? Or 1996? Or whatever? Let’s try:

These models are awful.

This is the series of regression lines one gets by starting a separate model at each month from January 1990 to December 2012 (so even the last has about two years of data going into it), with every line running through October 2014. Solid lines are “statistically significant”: red means increase, blue decrease.

This picture is brilliant for two reasons, one simple, one shocking. The simple is that we can get positive or negative trends by picking various start dates (and stop; but I didn’t do that here). That means if I’m anxious to tell a story, all I need is a little creativity. The first step in my tale will be to hasten past the real data and onto something which isn’t the data, of course (like we did).

This picture is just for the 1961-1990 block. Different ones would have resulted if I had used different blocks. I didn’t do it, because by now you get the idea.
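Here is a minimal sketch of that start-date game, run on a synthetic series (a slow cycle plus a little noise, with no straight-line trend built in) and not on the De Bilt anomalies. The series, the 24-point minimum and the function name are my assumptions for illustration.

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# SYNTHETIC series: a slow cycle plus a little noise. NOT the De Bilt record.
N = 240
series = [math.sin(2 * math.pi * t / 120) + random.gauss(0, 0.05) for t in range(N)]


def fit_slope(ys):
    """Ordinary least-squares slope of ys against 0, 1, 2, ... (per time step)."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx


MIN_POINTS = 24  # the shortest model still gets two "years" of points

# One model per start date; every model runs to the last point available.
slope_by_start = {
    start: fit_slope(series[start:]) for start in range(0, N - MIN_POINTS + 1)
}

rising = [s for s, b in slope_by_start.items() if b > 0]
falling = [s for s, b in slope_by_start.items() if b < 0]
print(f"{len(rising)} start dates give an increase, {len(falling)} give a decrease")
```

Both lists come out non-empty on this series, so the storyteller may pick whichever start date suits the tale.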

Now for the shocking conclusion. Ready?

Usually time series mavens will draw a regression line starting from some arbitrary point (like we did) and end at the last point available. This regression line is a model. It says the data should behave like the model; perhaps the model even says the data is caused by the structure of the model (somehow). If cause isn’t in it, why use the model?

But the model also logically implies that the data before the arbitrary point should have conformed to the model. Do you follow? The start point was arbitrary. The modeler thought a straight line was the thing to do, that a straight line is the best explanation of the data. That means the data that came before the start point should look like the model, too.
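One way to check that implication is to fit the line from the arbitrary start point, run it backward, and compare its misses before and after. Here is a minimal sketch on a synthetic series (an assumption for illustration, not the De Bilt record). The start point of 210 and the function names are mine.

```python
import math
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# SYNTHETIC series: a slow cycle plus a little noise. NOT the De Bilt record.
N = 240
series = [math.sin(2 * math.pi * t / 120) + random.gauss(0, 0.05) for t in range(N)]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(xs, ys, slope, intercept):
    """Average absolute miss of the line against the values."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)) / len(xs)


start = 210  # the arbitrary start point
xs_after = list(range(start, N))
xs_before = list(range(0, start))

# Fit only on the points from the start onward ...
slope, intercept = fit_line(xs_after, series[start:])

# ... then ask the same line about the points that came before.
err_after = mean_abs_error(xs_after, series[start:], slope, intercept)
err_before = mean_abs_error(xs_before, series[:start], slope, intercept)
print(f"mean miss after start: {err_after:.2f}, before start: {err_before:.2f}")
```

On this series the line misses far more before the start point than after it, which is the check the model's own logic invites.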

Does it? You bet it doesn’t. Look at all those absurd lines, particularly among the increases! Each of these models is correct if we have chosen the correct starting point. The obvious absurdity means the straight line model stinks. So who cares whether some parameter within that model exhibits a wee p-value or not? The model has nothing to do with reality (even less when we realize that the anomaly block is arbitrary and the anomalies aren’t the data and even the data is “homogenized”; we could have insisted a different regression line belonged to the period before our arbitrary start point, but that sounds like desperation). The model is not the data! That brings us to our final lesson.

Lesson 7 Don’t use statistics unless you have to.

Who who knows anything about how actual temperatures are caused would have thought a straight line a good fit? The question answers itself. There was no reason to use statistics on this data, or on most time series. If we wanted to know whether there was a “trend”, we had simply to define “trend” then look.

The only reason to use statistics is to use models to predict data never before seen. If our anomaly regression or other modeled line was any good, it will make skillful forecasts. Let’s wait and see if it does. The experience we have just had indicates we should not be very hopeful. There is no reason in the world to replace the actual data with a model and then make judgments about “what happened” based on the model. The model did not happen, the data did.
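A wait-and-see check can be rehearsed by hiding the most recent data from the model and scoring its forecast against a naive reference. The sketch below is a minimal version on synthetic data, under my own assumptions: the skill score is one minus the ratio of mean squared errors, and the reference forecast simply repeats the last twelve months the model saw. The post does not specify either choice.

```python
import math
import random

random.seed(4)  # fixed seed so the sketch is repeatable

# SYNTHETIC monthly series: seasonal cycle plus noise. NOT the De Bilt record.
series = [
    10.0 + 7.0 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 1.5)
    for t in range(240)
]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mse(actual, forecast):
    """Mean squared error between two equal-length sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def skill(actual, forecast, reference):
    """1 - MSE(forecast) / MSE(reference); positive only if the model beats the reference."""
    return 1.0 - mse(actual, forecast) / mse(actual, reference)


CUTOFF = 216  # the model sees only the first 216 months
train, held_out = series[:CUTOFF], series[CUTOFF:]

# Straight-line model fitted on the training months, extended past the cutoff.
slope, intercept = fit_line(list(range(CUTOFF)), train)
line_forecast = [intercept + slope * t for t in range(CUTOFF, len(series))]

# Reference forecast (my choice): repeat the last 12 training months.
last_year = train[-12:]
reference = [last_year[i % 12] for i in range(len(held_out))]

score = skill(held_out, line_forecast, reference)
print(f"skill of the straight line vs. repeating last year: {score:.2f}")
```

A score at or below zero means the model did no better than the naive guess on data it had never seen.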

Most statistical models stink and they are never checked on new data, the only true test.

Homework Dijkstra also showed a picture of all the homogenized data (1901-2014) over which he plotted a modeled (non-straight) line. Okhuijsen and van Rongen did that and more; van Rongen additionally used a technique called loess to supply another modeled line. Criticize these moves using the lessons learned. Bonus points for using the word “reification” when criticizing van Rongen’s analysis. Extra bonus points for quoting from me about smoothing time series.

The Day I Made The Front Page!

I am the fellow on the left, age 15.

In case you can’t see the fine print, it reads:

FRESH PUDDLE ICE formed from the January rains collapsed under the weight of a car driven by Adam Kennedy, above right, Sunday on Otsego Lake. Kennedy and friend Matt Briggs, both of Gaylord, examine the spot where the car’s wheels rest on thick lake ice. With an assist from lake resident Wayne Miller, a tow truck on shore south of the state park arrived and Kennedy attached the 300-foot cable to the trailer hitch. Miller paced the car’s distance from shore at 288 feet before the vehicle eased out.

Now that’s what I call news!

Sign of the old times.

The date is Thursday 24 January 1980. We lived on a sort of bluff at the south end of Otsego Lake and were sitting having Sunday dinner (Banquet Chicken, probably) and we saw this guy spinning donuts on the ice. Somebody wondered if the ice was too thin.

Kerplunk. Just to show that you can’t trust anything you read, I didn’t know Adam from Adam, though the paper says we were pals. And how thick could the ice have been if the car had gone through? Global Warming was a menace even then! We sure were examining the spot where the car nearly sank into oblivion, though. Our postures positively radiate examinationness, or perhaps it is examinationativity. Anyway, not much else we could have done until the tow truck arrived.

The sticker on the side of the car, incidentally, is for Alpenfest 76. The town at all times of the year dresses itself up in an ersatz Bavarianism, but during the third week of July the residents, who were predominately Polish, join in for the thing which is Alpenfest. There is the burning of the Boog, die grosse kaffeepause, the singing of Edelweiss, a Queen contest which is always the subject of intense betting, booths of cutesy country crap (crafts! I meant crafts) on the Alpenstrasse (main street), a parade in which I marched as part of the High School band, and carnival rides.

The carnies used to have the Dime Game in which bettors placed dimes on colored squares and where somebody tossed a racquetball into a pen with colored holes. The prize was candy. Loved it. But the State in its role of in loco parentis shut it down because it looked too much like gambling. Or maybe it’s because the State treasures its monopoly on that sport. Or maybe it was standard American puritanism. Whichever way, it’s gone.

Careful readers will have noted the absence of ice fishing houses in the picture. Most of these were on the north side of the lake anyway, but many took them off because of the rain.

But there’s no mistaking the foggy grayness which is a permanent fixture of Northern Michigan winters. If it wasn’t snowing, it was about to. Gaylord is the highest and snowiest point in the lower peninsula. Lots of lake effect. Sunny days in winter were rare, but boy were they pretty. Perfect time to go cross-country skiing.

Or hunting. My friend Chuck Coonrod (whose dad, then a school bus driver, is coincidentally mentioned in this same issue of the stately Herald Times) would head up the hill behind his house and go and kill animals. I still remember the first time I baked a liberally salted and peppered squirrel in the oven. It was good!

Behind his house was a steep, winding hill. Chuck had metal saucers on which we would sit and launch ourselves into near oblivion. This was still in the day of cloth snow suits. The end of a good afternoon saw us drenched and steaming sitting by his kitchen table snacking on the bacon his mother always had by the side of the stove.

Soft spots in the lake weren’t the only danger. My nephew had a video which I am unable to rediscover which showed how in the spring the wind would push chunks of ice onto the land. An inexorable flow, like cold lava. It doesn’t sound much, but all that weight does damage. I got my leg stuck once and thought it was going to break off.

Good thing we can simulate all this stuff on a computer now. So much safer.

Update The horrific error (placed there by my enemies) of mislocating Alpenfest to the wrong month has been fixed. Consider this a belated trigger warning.

© 2014 William M. Briggs
