William M. Briggs

Statistician to the Stars!

Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

Please Don’t Smooth Your (Social Media) Data!

Friends don’t let friends smooth.

Don’t

Don’t smooth your data and then use that smoothed data as input to other analysis. You will fool yourself. You will make over-confident decisions. It is the wrong thing to do. It is a mistake. It is a guarantee of over-certainty. I don’t know how to put it more plainly. Lord knows I have tried. See below for a non-success story.

Smoothing means any kind of modeling, which includes running means, just-plain-means, filtering of any kind, regression, wavelets, Fourier analysis, ARIMA, GARCH; in short, any type of function where actual data comes in and something that is not data comes out.

Do not use the something-that-is-not-data as if it is data. This is a sin.

Don’t believe me. Try it yourself. The picture is from an upcoming paper I and some friends are writing.

It shows two simulated normal noise time series, with successively higher amounts of smoothing applied by a k-rolling mean. From top left clockwise: k = 1, 10, 20, 30; a k = 1 corresponds to no smoothing. The original time series are shown faintly for comparison. The correlation between the two series is indicated in the title.

More smoothing equals higher correlations. Since there are no causes between these series, the correlation should be hovering around 0, which it is in the first panel. And that correlation stays near 0—for the original real not fake un-smoothed data. But if you calculate the correlation between the smoothed series…the sky’s the limit!

Now it is not true that smoothing will increase the correlation between two smoothed series in each and every instance. It might be that (in absolute value), for your one-time smoothing, correlation decreases or stays put. But it usually will increase, and usually by a lot.
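If you do want to try it yourself, here is a minimal sketch in Python with NumPy. It is not the code behind the picture: the series length (200), the number of repetitions (500), the seed, and the function names are all assumptions picked for illustration. It averages the absolute correlation over many independent pairs of normal noise series, so that the "usually, but not always" point shows up as an average instead of hanging on one lucky draw.

```python
import numpy as np


def rolling_mean(x, k):
    """k-point running mean; k = 1 returns the series unchanged."""
    return np.convolve(x, np.ones(k) / k, mode="valid")


def smoothed_correlation(x, y, k):
    """Pearson correlation between the k-smoothed versions of x and y."""
    return np.corrcoef(rolling_mean(x, k), rolling_mean(y, k))[0, 1]


def mean_abs_correlation(k, n=200, trials=500, seed=0):
    """Average |correlation| over many independent pairs of noise series.

    n, trials and seed are illustrative assumptions, not values from the post.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.standard_normal(n)  # two unrelated noise series
        y = rng.standard_normal(n)
        total += abs(smoothed_correlation(x, y, k))
    return total / trials


if __name__ == "__main__":
    for k in (1, 10, 20, 30):
        print(f"k = {k:2d}: mean |correlation| = {mean_abs_correlation(k):.3f}")
```

The series never have anything to do with one another, so any growth in the printed averages as k grows comes from the smoothing alone. Swap in your own smoother for `rolling_mean` to see how other filters behave.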

Why? Imagine any two straight lines with non-zero slopes. These two straight lines will have perfect Pearson correlation, either +1 or -1. Regression and other measures will also show perfect agreement. The proof of this is trivial, and I leave it as an exercise (don’t be lazy; try it). Smoothing makes time series data look more like straight lines, as the pictures show. Simple as that.
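For those who did the exercise and want to compare notes, one way the argument can go is sketched below. The notation is mine and only for illustration: the two lines are written as functions of a common time index t, with intercepts a and c and non-zero slopes b and d, and t is assumed to take at least two distinct values so that its variance is positive.

```latex
\begin{align*}
x_t &= a + b\,t, \qquad y_t = c + d\,t, \qquad b \neq 0,\ d \neq 0 \\
\operatorname{Cov}(x, y) &= b\,d\,\operatorname{Var}(t), \qquad
\operatorname{Var}(x) = b^2\,\operatorname{Var}(t), \qquad
\operatorname{Var}(y) = d^2\,\operatorname{Var}(t) \\
r &= \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}}
   = \frac{b\,d\,\operatorname{Var}(t)}{|b|\,|d|\,\operatorname{Var}(t)}
   = \operatorname{sign}(b\,d) = \pm 1
\end{align*}
```

The intercepts drop out entirely and only the signs of the slopes matter: slopes of the same sign give +1, slopes of opposite sign give -1.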

There are all manner of fine points I’m skipping which would make wonderful Masters projects. Just what kind of data and what kind of smoothing and what statistical measures are affected and by what magnitude? All these questions are quantifiable and will make for fun puzzles. My experience with actual data and actual smoothing and typical measures shows that the magnitude is large.

It happens

Now, without betraying any confidences, let me tell you of the latest in a long and growing string of bad examples. Two companies, one internationally known for their quantitative prowess, another even better known for its ability to make vast wads of money. Call them A (stats) and B (client). I did not work for either A or B, but know and advised certain parties.

B advertised and wondered how much of an effect this had on its measure of success. A said they could tell, using sophisticated Bayesian models incorporating social media data.

Social media!

Wowzee! Tell people you have busted open the secrets of social media and they will dump buckets of cold cash on you. Hint: everybody who says they have it figured out is either exaggerating to themselves or to their clients. (Say, that’s a pretty bold statement.)

Anyway, smoothing occurred. And correlations greater than 0.95 were boasted of. I’m not kidding about this number. Company A really did brag of enormous “impacts” of its smoothed measures. And Company B believed them—because they wanted to believe. Sophisticated Bayesian models incorporating social media data! How could you go wrong?

The real correlations, using unsmoothed data, were near 0. Just as you’d expect them to be for such noisy data as “social media” predicting a company’s measure of success. Do you really think Twitter streams contain magic?

I told all involved. I explained pictures like those above. I was emphatic and clear. I stood neither to gain nor lose regardless of the decision. Only two people (at B) believed me, neither of whom were in a position to make decisions.

At least I am comforted that Reality is my friend here. The companies will eventually realize, but probably never admit, that their measures are spurious. Because they will realize but not admit, these measures will be quietly abandoned…

…As soon as the next computer self-programmed big data machine learning artificially intelligent smart-phone-data algorithm comes along and seduces them.

Humanae Vitae & The Synod: Theories And Predictions

Selfish genes theory does not predict these.

This originally ran 14 May 2014, but since this weekend is Number Six’s big show, I thought it well to have another look. The title is New & Improved! See the Update at the bottom.

Old predictions

Theories are useful to explain and to predict. Any theory can explain, but only true or likely true theories can skillfully predict.

For instance, Uncle Bob explains that your car won’t start because of Gremlins. His theory, which he drags out on all State occasions, does explain. But it is, as I hope is obvious, a theory which is useless to make predictions.

Two obvious examples. If you say the sun will peek above the horizon at 6:32:17 AM because gravitational theory says it will, and the sun does its duty, your theory has something going for it, especially if the theory makes lots of accurate predictions. And if you say, and say each year for two decades, that the planet’s global average temperature will soar to “unprecedented” heights, yet the temperature misbehaves and stays put, your theory is likely false.

Now how about these predictions, made in 1968, on what would happen were contraception to be embraced (which, of course, it has been). This embracement will:

  1. “[O]pen wide the way for marital infidelity and a general lowering of moral standards”. Nailed it.
  2. Especially in the young, “[A] man who grows accustomed to the use of contraceptive methods may forget the reverence due to a woman”. Nailed it.
  3. He will “reduce her to being a mere instrument for the satisfaction of his own desires”. Hooked-up nailed it.
  4. He will be “no longer considering her as his partner whom he should surround with care and affection.” Nailed it.
  5. “Finally, careful consideration should be given to the danger of this power passing into the hands of those public authorities who care little for the precepts of the moral law. Who will blame a government which in its attempt to resolve the problems affecting an entire country resorts to the same measures as are regarded as lawful by married people in the solution of a particular family difficulty?” Oh my, oh my, is that one nailed.
  6. “Who will prevent public authorities from favoring those contraceptive methods which they consider more effective?” None, that’s who: another hit.
  7. “Should [the government] regard this as necessary, they may even impose their use on everyone.” HHS mandate, anybody? Nailed it again.
  8. “It could well happen, therefore, that when people, either individually or in family or social life, experience the inherent difficulties of the divine law and are determined to avoid them, they may give into the hands of public authorities the power to intervene in the most personal and intimate responsibility of husband and wife.” Smack! Pow! Wow! Capital-N ailed it.

Who is this guy, this prescient sage, who, drawing from some theory, foretold the world of 2014 so accurately? Well, his name was Paul, and as he came from a long line of Pauls in the same Institution to which he was appointed Leader, he called himself Number 6. One thing we know, given his batting average, we should accord the theory which created these predictions pretty high weight.

Right?

Number 6’s theory also explains the popularity of divorce, out-of-wedlock births, and the rise in the belief of individuals’ “unlimited dominion” over “his specifically sexual faculties.” Number 6 didn’t actually specify these as predictions, though, taking them “as read.”

History

Now the most fascinating thing about these predictions is how they came to be made. What happened was this. Number 6’s predecessor called on a Commission of experts, who met and deliberated from 1963 to 1966 (Number 6 boosted Commission membership halfway through), giving a report to Number 6 two years before he made his predictions. Nobody was in any rush.

The Commission was loaded with sober academics and had the support of a good portion of the leadership of Number 6’s Institution. Word leaked out, as word always leaks out, about the Commission’s efforts and opinions, and this excited popular and media support. The Commission, not wanting to be on the wrong side of history, favored contraception. After all, this was a different world than that world which came before this world: or something.

After several years of glowing expectations, most expected Number 6 to endorse the Commission’s report. He did not.

Boy!, did tempers flare. To say the free-for-all crowd were displeased is a massive understatement. Number 6 and his Institution were ridiculed in the press and in academia and, indeed, by some leaders in Number 6’s Institution. One academic member of the Commission called Number 6’s predictions “that horrible document.” A prominent leader in Number 6’s Institution publicly charged Number 6 with “an anti-collegial act”. Ouch.

That fellow was far from alone. Many other leaders and groups of leaders castigated Number 6 openly. These dissidents went so far as to tell the common folk to ignore Number 6 and do what they please. And they did. And where they did do as they pleased, it was found that the Institution lost members.

Of course, it is not often remembered, perhaps willfully, that Number 6’s theory made stunningly accurate predictions, whereas his enemies’ theories made inaccurate ones.

The reason for that digression is important because again Number 6’s Institution will meet to discuss matters pertaining to human sexuality and the family. The meeting will go for at least two years. Experts will be confided in. Reports will be written.

As before, the press and a sizable chunk of leadership are on the side of liberalization; they particularly favor giving the nod to divorce but also to so-called same-sex marriage and perhaps even abortion. The world has changed, these people say, and therefore the Institution must also change, to become something that is not the Institution.

The Institution’s new leader Francis is being groomed by the liberalizers as the man with the plan, as somebody who is willing to set aside the old truths for new ones. These new dissidents are in the habit of parsing every public word of Francis’s to find in them support for their new truths. So adept are they at this that almost before Francis is done speaking, a news item or blog post is up saying, “Change we can believe in is coming!”

New predictions

My guess, working within Number 6’s theory, is that whichever leader is in charge after the family synod is over will support tradition. The ban on contraception will be upheld. Marriage will be, as it can only be, declared to be between one man and one woman. Sodomy will still be a sin. Divorce will still be forbidden and not supported in the Institution’s activities. Abortion, if mentioned, will still be condemned.

The howling which will greet the announcement that there cannot be new truths, but only Truth, will be wondrous to behold, especially since, as before, liberalizers expect the vote to go with them. New dissidents will arise who, again as before, will tell people to ignore official proclamations and do what they want.

What will happen to rebellious families is obvious: more of what Number 6 said, a decrease in interest in marriage, increased state control over all things sexual, recognition that children belong to the state and not “parents”, and because of the dissolution of the family, an increase in support of euthanasia.

And people, even seeing the accuracy of these predictions, will still largely disbelieve the theory.

Update 14 May. It’s coincidence day at WMBriggs.com: Are Our Relationships Threatening The State?

Update 18 October 2014. Not that I want to brag, but it looks like the Truth Theory is still holding strong. But what a week!

The question is whether, after the conclusion of next year’s synod, Pope Francis will emulate his brother Number 6, or will he seek more worldly pastures?

I predict the former, in the following sense. I say Tradition holds, whether Pope Francis wants it to or not. It won’t matter what he or what anybody else wants, sin will still be sin. Doctrine will remain unchanged. Now, how will this Great Continuance happen? I have no idea. But I am reminded of the tale of Arius, a priest who led one of the Church’s earliest heresies, a man whose power of convincing other Church fathers waxed and waned, but which never deserted him, and who, on his way to a final crucial meeting where he might have convinced others of his fallacy, had this happen to him:

It was then Saturday, and Arius was expecting to assemble with the church on the day following: but divine retribution overtook his daring criminalities. For going out of the imperial palace, attended by a crowd of Eusebian partisans like guards, he paraded proudly through the midst of the city, attracting the notice of all the people. As he approached the place called Constantine’s Forum, where the column of porphyry is erected, a terror arising from the remorse of conscience seized Arius, and with the terror a violent relaxation of the bowels: he therefore enquired whether there was a convenient place near, and being directed to the back of Constantine’s Forum, he hastened thither. Soon after a faintness came over him, and together with the evacuations his bowels protruded, followed by a copious hemorrhage, and the descent of the smaller intestines: moreover portions of his spleen and liver were brought off in the effusion of blood, so that he almost immediately died. The scene of this catastrophe still is shown at Constantinople, as I have said, behind the shambles in the colonnade: and by persons going by pointing the finger at the place, there is a perpetual remembrance preserved of this extraordinary kind of death.

The answer thus comes via Peter Kreeft, who is fond of quoting a Southern Baptist minister who managed to sum up the lessons of the Bible in four words. “I’m God. You’re not.”

Global Warming’s Shark Jumping Moment

Weeeeeeeeeeeeeee!

Travel day today, so something light and airy—and incomplete.

President Obama, presumably sacrificing a tee time, went to keyboard and typed these words:

This is a big moment in the fight against climate change—stick it to climate change deniers by adding your name: http://t.co/fkCzkiMhFw

Now you may think the problem lies in the words stick it to climate change deniers, but you would be wrong. Mr Obama is a politician and stick it is political language, and typical language at that. It is the weak man who allows himself to be offended easily. Add to that that it probably wasn’t our dear leader but some staffer or other hack who wrote the tweet, then there is nothing unusual with stick it.

No. The shark jump is in adding your name. Add your name to what? A petition.

Climate science by petition! While global warming has always had a thick dressing of ideology, this converts the matter completely from science to pure politics. What we have here is the triumph of pragmatism and democracy, the disastrous idea that truth can be had by vote, finally applied to science. Here’s the petition pleading:

Deniers and deep-pocketed polluters make it pretty hard to get anything done on climate change—but here’s one meaningful way you can fight them: The EPA is collecting comments on President Obama’s climate plan, and it’s our chance to show public support.

This is one of the decisive moments in the fight against climate change. Collecting comments gives the EPA a chance to see what ordinary people have to say about this important issue. (Don’t worry—they hear from the special interests on every day that ends in Y.)

What ordinary people have to say about global warming is, as far as the EPA goes, meaningless. What value is it to collect the opinions of those unaware of fluid physics about the value of that physics? Of course, it is of some use to ask people how potential regulations might affect them, but about the science behind the regulations there is none.

Yet BarackObama.com says

The other side thinks they can win this fight simply by shouting the loudest, and they have a lot of money to back it up. What they don’t have is a whole lot of people—genuine voices standing up for what’s right. And we’ve proved time and again that, when we raise our voices together, we can take on even the most powerful interests.

Forget the canard about money (I’m still waiting for my check from Big Oil), and forget the idea that the people currently in power need to fight “powerful interests”. What is the point of “voices”? Will these voices fix model parameterizations? Will crowd wisdom tell us the proper role of cloud feedback? Should we turn to social media to guess proxy temperature reconstructions?

I’m writing in a hurry and can’t develop the idea, but somehow I’m put in mind of those early sci-fi movies, where groups of concerned scientists and government officials would gather around a table to discuss the alien giant bug crisis. Sleeves were rolled up. Uniforms worn. Discussions ranged. But never once was there the idea of putting ideas to a vote among the populace.

Incidentally, if you add your email (I made one up) to the Science Petition, you are asked to give money to Mr Obama to “help defeat the dirty special interests”. Sheesh.

Truth, Knowledge, Belief, & Gettier Problems

"Hey, you never know."

Proof Isn’t All That

The first section can be skipped for those who know what necessary versus conditional truth is.

I recall an anecdote about John von Neumann which had a fellow asking von Neumann for the proof of some mathematical proposition. Von Neumann asked the fellow which of several other theorems the fellow might already know, and he mentioned two, whereupon von Neumann proved the proposition twice, along the two different paths. Implicit in the story is that he could have proven it upon the other paths as well.

We don’t know what this proposition was, so call it X. Since X is necessarily true, we can have knowledge of it, where knowledge, as some philosophers define it, is “justified true belief.” They’d say the justification comes from the proof and the belief comes from us as an act of our intellect.

But does truth come from the proof? Von Neumann showed there were many different ways of knowing a proposition was true, but the multiplicity did not add to the truth of X. X was just true, and always was, regardless of whether anybody knew it or believed it. So there is a difference between the truth of some thing and our knowing it; or rather, there seems to be a difference in the justification of our belief the thing is true and its truth.

Let’s clarify. Take our old standby argument with premises E = “All Martians wear hats and George is a Martian” relative to the proposition Y = “George wears a hat.” Y given E is true; that is to say, we know that Y given E is true, that it follows. We may therefore believe Y given E, as a sort of joint proposition, say, Y-given-E. But Y by itself, sans E, is not a necessary truth. Neither is E by itself a necessary truth. But Y-given-E is. Y therefore is a conditional truth, given or accepting or believing or having faith that E.

A necessary truth is one which is true no matter what. Take non-contradiction. It cannot be true that Z = “X is true and so simultaneously is not-X true”. In other (and confusing) words, not-Z is true. There isn’t any way to think that Z (except, as many do, by changing it so that Z is no longer Z, and then forgetting they made changes). Why is Z false? Who knows? God made it that way. Why is it true that W = “For every natural number r, r = r”? I have no idea. God made it that way. What is our justification for believing W? Faith? Or is it that we’re too light in gray matter to discover a proof or, worse, a counter proof?

Actually, we do have reasons for believing not-Z and W. That we cannot think of how Z is true is a dandy reason for thinking it false, and all experience is that for every natural number r, r does indeed equal r. Induction supplies the rest. From our senses to the truth!

All this is just a sketch, which we needed for the real meat which follows.

Get Gettier

A man hears his wife say she bought him a lottery ticket and he thinks to himself, R = “I now have a chance to win”. Unbeknownst to him, his wife was teasing. We know this, his wife knows this, but the man does not. The man accepts his wife’s word, conditional on which he believes R. R given the premise “Wife bought ticket” is thus a conditional truth. A believable truth, too, given he accepts (unconditionally) his wife’s word. R is not necessarily true, however, as is obvious.

Now Edmund Gettier famously claimed there were situations in which a person has a justified true belief, yet that belief did not meet the test of knowledge. Our lottery situation isn’t quite what he had in mind, because everybody would agree that R is a conditional but not necessary truth. To make this a “Gettier problem”, let’s add the premise “The man’s mother bought him a ticket for the same drawing but told nobody”. It is clear that R is now true, say Gettier followers, and the man should believe it, but his claim doesn’t rise to the level of knowledge because his accepting R is based on his believing something which is false in fact (his wife’s joke).

But R is still a conditional truth to us and to the mother, who know of her actions. R, being contingent, can never be a necessary truth.

Gettier “problems”, I think, are based on forgetfulness. We forget who knows what and we forget what question is being asked of the evidence. To the man, R is conditionally true based on one set of premises, and to us it is conditionally false based on one set of evidence (just the wife’s statement) and conditionally true based on another set (adding the mother’s). R is never true in the necessary sense. Plus, there are any number of premises which can exist, and which can be believed, that make it conditionally true. Even conditioned on the premise D = “I, the man in this example, bought my own ticket”, R is still only conditionally and not necessarily true.

In short, Gettier “problems” aren’t. This, incidentally, is one of the few cases where symbolic logic helps; I mean, being able to write the story down in symbols makes it much easier to see what goes where and who knows what, so that it is less easy to slip up.
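As a rough illustration of writing the story down in symbols, here is a small Python sketch of the lottery example. Everything in it is an assumption made for illustration: the premise labels, the function name `r_given`, and above all the simplifying rule that R follows exactly when at least one accepted premise says somebody bought the man a ticket.

```python
# Premises accepted by each party in the lottery story.
# True means the party accepts the premise as fact, False means they know it is false.
premises = {
    "the man": {"wife bought ticket": True},
    "us, wife's statement only": {"wife bought ticket": False},
    "us, adding the mother": {
        "wife bought ticket": False,
        "mother bought ticket": True,
    },
}


def r_given(accepted):
    """R = 'the man has a chance to win', judged only on the accepted premises.

    Simplifying assumption for this sketch: R holds exactly when at least one
    accepted premise says somebody bought him a ticket.
    """
    return any(accepted.values())


if __name__ == "__main__":
    for who, accepted in premises.items():
        print(f"R given the premises of {who}: {r_given(accepted)}")
```

R is never evaluated on its own here, only relative to a stated set of premises, which is the bookkeeping that makes it harder to forget who knows what.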

Homework

I’m taking this example from Wikipedia, which (yes) does a good job explaining the set up.

The [justified true belief] account of knowledge is the claim that knowledge can be conceptually analyzed as justified true belief — which is to say that the meaning of sentences such as “Smith knows that it rained today” can be given with the following set of necessary and sufficient conditions:

A subject S knows that a proposition P is true if and only if:

  1. P is true, and
  2. S believes that P is true, and
  3. S is justified in believing that P is true

Recall von Neumann’s example and that X being true and anybody knowing X and the proof or belief of X are not the same thing. And also note that this definition mistakenly forgets to emphasize whether P is a conditional or necessary truth.

Here is a Gettier problem (also Wikipedia):

Smith has applied for a job, but, it is claimed, has a justified belief that “Jones will get the job”. He also has a justified belief that “Jones has 10 coins in his pocket”. Smith therefore (justifiably) concludes (by the rule of the transitivity of identity) that “the man who will get the job has 10 coins in his pocket”.

In fact, Jones does not get the job. Instead, Smith does. However, as it happens, Smith (unknowingly and by sheer chance) also had 10 coins in his pocket. So his belief that “the man who will get the job has 10 coins in his pocket” was justified and true. But it does not appear to be knowledge.

What has gone wrong?

Don’t Say “Natural Variability”

Word that the climate of doom we were promised (repeatedly) has not obtained has begun leaking out. Climatologists have known this for quite some time, but now even environmental activists are beginning to realize the horrible truth that their worst fears have not been realized.

The excuses have thus begun.

We have already learned “Don’t say ‘Hiatus’” because that is to speak nonsensically. Saying there is a “pause” or “hiatus” assumes the models which predicted the doom which did not happen were somehow right after all, and that it is Reality itself that is in error.

It cannot be in the models we currently possess, because these models did not foresee what actually happened. The incontrovertible evidence is that these models are wrong. That they should not, in their current state, be trusted. That whatever they say is subject to extreme reasonable rational doubt. That decisions should not be made based upon their predictions (except the decision to produce better models).

To say there is a “pause” is to say that the models were right after all, even though Reality differed from the models. To say there is a “hiatus” is to say Theory is better than Reality. This is to commit the Deadly Sin of Reification.

One of the excuses is that the models were right after all, but the missing high temperature they predicted is actually in hiding. Sort of like in those movies where the Leader sneaks out of his palace or house and mixes with the ordinary people, and thus he learns What’s Really Important. That is, Global Warming has realized that people are important, too, and has given up its nefarious plans. Or something.

Anyway, the “in hiding” excuse can’t be right, not exactly, because the models already swore they took into account all the sources of heat, including the oceans. Obviously the models were wrong and they didn’t take some thing or things into account. What’s wrong, though, is anybody’s guess. Because some thing or things are wrong, however, it does not mean the thing you guess was wrong was the thing that was wrong. To prove it, you’ll have to redo the models and reforecast the future. Then we wait and see. In the meantime, keep quiet.

One thing we know with certainty is that the thing (in error) cannot be natural variability.

Natural variability, sisters and brothers, is what the models said they could predict skillfully. The models did not skillfully predict natural variability. Natural variability just is, in this sense, what the temperature does.

There is another sense of the phrase, though, a kind of enviro-religious sense that people might be using, which is, “What the temperature would do in absence of humans”. Now that is a valid thing to study. Only trouble is, it’s counterfactual. We can produce answers by the grant-load, but we’ll never know, or that is, we can never verify, whether any of them are true.

Because why? Because, of course, we humans are here and have been here. There is no way to remove our influence (or the influence of any species), so there is no way to know with certainty what the climate would be like without us. Of course, we might make reasonable guesses about what a never-were-humans climate would look like. But we would know those guesses are reasonable only after we can create models that can skillfully predict what the climate will look like with us. Yet, as said, we’d never be able to verify those guesses because, of course, here we are.

Humans (and ants, aardvarks, and antelopes) are an integral part of the climate. All creatures influence the climate to some degree (get it? get it?). We are thus part of nature, thus part of real natural variability.

It was never a question whether humans influenced climate, for the answer was always yes; instead, the real science lay in understanding how we affect it. And how everything else affects it. And we’ll know we’ve done a good job with those questions (with understanding “natural variability”, that is) once we can produce good forecasts.

© 2014 William M. Briggs