

September 18, 2017

Signal + Noise vs. Signal — Important Update

(Figure caption: If we imagine these are atmospheric concentrations or stock-price anomalies, this is a terrific example of reification, or replacing what did happen with what did not.)

Update I see that I failed below to demonstrate the ubiquity of the problem. So your homework is to search “testing trend time series” and similar terms and discover for yourself. Any kind of hypothesis test used on a time series counts.

My impetus was in reading an article about a paper some colleagues and I wrote about atmospheric ammonia. The author wrote, “The statistical correlation between hourly ammonia concentrations between measurement stations is weak due to large variability in local agricultural practice and in weather conditions. If data are aggregated to longer time scales, correlations between stations clearly increase due to the removal of noise at the hourly timescale.”

There’s the belief in “noise”, which does not exist, and there’s also the second (bigger) mistake: measuring the correlation of time series after smoothing, which increases the correlation (in absolute value), as has been proved here and in Uncertainty many, many, many times. This happens even for two strings of absolutely unrelated, made-up numbers. Try it yourself.

So just look for mentions of “noise” in stock prices and the like, and see if I’m right about the scale of the problem.

Original article

Two weeks ago the high temperature on the wee island upon which I live was 82F (given my extreme sloth, I am making all details up).

Now for the non-trick question: What was the high temperature experienced by those who went out and about on that day?

If you are a subscriber to the signal+noise form of time series modeling, then your answer might be 78F, or perhaps 85F, or even some other figure altogether. But if you endorse the signal form of time series modeling, you will say 82F.

Switch examples. Three days back, the price of the Briggs Empire stock closed at $52 (there is only one share). Query: what was the cost of the stock at the close of the day?

Signal+noise folks might say $42.50, whereas signal people will say $52.

Another example. I was sitting at the radio AM DXing, pulling in a station from Claxton, Georgia, WCLA 1470 AM. The announcer came on and through the heavy static I thought I heard him give the final digit of a phone number as “scquatch”, or perhaps it was “hixsith”.

Here are two questions: (1) What number did I hear? (2) What number did the announcer say?

The signal+noise folks will hear question (1) but give the answer to (2) (they will answer (2) twice), whereas the signal folks will answer (1) with “scquatch or hixsith”, and answer (2) by saying, “Hey signal+noise guys, a little help here?”

We have three different “time series”: temperature, stock price, radio audio. It should be obvious that everybody experiences the “numbers” or “values” of each of these series as they happen. If it is 82F outside, you feel the 82F and not another number (and don’t give me grief about fictional “heat indexes”); if the price is $52, that is what you will pay; if you hear “scquatch”, that is what you hear. You do not experience some other value to which ignorable noise has been added.

For any time series (and “any” includes our three), some thing or things caused each value. A whole host of physical states caused the 82 degrees; the mental and monetary states of a host of individuals caused the $52; a man’s voice plus antenna plus myriad other physical states (ionization of certain layers of the atmosphere, etc.) caused “scquatch” to emerge from the radio’s speakers.

In each case, if we knew—really knew—what these causes were, we would not only know the values, which we already knew because we experienced them, but we could predict with certainty what the coming values would be. Yet this list of causes will really only be available in artificial circumstances, such as simulations.

Of the three examples, there was only one in which there was a true signal hidden by “noise”, where noise is defined as that which is not signal. Temperature and stock price were pure signal. But all three are routinely treated in time series analysis as if they were composed of signal+noise. This mistake is caused by the Deadly Sin of Reification.

No model of any kind is needed for temperature and stock price; yet models are often introduced. You will see, indeed it is vanishingly rare not to see, a graph of temperature or price over-plotted with a model, perhaps a running mean or some other kind of smoother, like a regression line. Funny thing about these graphs: the values will be fuzzed out or printed in light ink, while the model appears bold, bright, and thick. The implication is always that the model is reality and the values a corrupted form of reality. Whereas the opposite is true.

The radio audio needs a model to guess what the underlying reality was given the observed value. We do not pretend in these models to have identified the causes of the reality (of the values), only that the model is conditionally useful for putting probabilities on possible real values. These models are seen as correlational, and nobody is confused. (Actual models, depending on the level of sophistication, may have causal components, but since the number of causes will be great in most applications, these models are still mostly correlational.)

We agreed there will be many causes of temperature and stock price values. One of the causes of temperature is not season—how could the word “autumn” cause a temperature?—though we may condition on season (or date) to help us quantify our uncertainty in values. Season is not a cause, because we know there are causes of season, and that putting “season” (or date) into a model is only a crude proxy for knowledge of these causes.

Given an interest in season, we might display a model which characterizes the average (or some other measure) of uncertainty we might have in temperature values by season (or date), and from this various things might be learned. We could certainly use such a model to predict temperature. We could even say that our 82F was a value so many degrees higher or lower than some seasonal measure. But that will not make the 82F less real.
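A sketch of such a conditioning model, with every number invented (in keeping with the sloth confessed above): group made-up daily highs by season and summarize the uncertainty in temperature given the season.

```python
import random
import statistics

# Made-up daily high temperatures (F) by season; the means and spread
# are invented for illustration, just like the 82F in the example.
random.seed(2)
season_means = {"winter": 35, "spring": 60, "summer": 82, "autumn": 58}
temps = {s: [random.gauss(m, 6) for _ in range(90)] for s, m in season_means.items()}

# Conditioning on season: quantify uncertainty in temperature given the season.
for season, xs in temps.items():
    deciles = statistics.quantiles(xs, n=10)
    lo, hi = deciles[0], deciles[-1]
    print(f"{season:>6}: typical {statistics.mean(xs):5.1f}F, "
          f"80% of days in [{lo:.1f}, {hi:.1f}]F")
```

The model quantifies uncertainty given the season; it says nothing about “autumn” causing anything, and an observed 82F remains exactly as real whether it sits above or below the seasonal summary.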

That 82F was not some “real” seasonal value corrupted by “noise”. It cannot be because season is not a cause: amount of solar insolation, atmospheric moisture content, entrainment of surrounding air, and on and on are causes, but not season.

Meteorologists do attempt a run at causes in their dynamic models, measuring some causes directly and others by proxy and still others by gross parameterization, but these dynamical models do not make the mistake of speaking of signal+noise. They will say the temperature was 82F because of this-and-such. But this will never be because some pure signal was overridden by polluting noise.

The gist is this. We do not need statistical models to tell us what happened, to tell us what values were experienced, because we already know these. Statistical models are almost always nothing but gross parameterizations and are thus only useful in making predictions; they should only be used to guess the unknown. We certainly do not need them to tell us what happened, and this includes saying whether a “trend” was observed. We need only define “trend” and then just look.

Why carp about this? Because the signal+noise view brings in the Deadly Sin of Reification (especially in stock prices, where everybody is an after-the-fact expert), and that sin leads to the worse sin of over-certainty. And we all know where that leads.


“But, Briggs. What if we measured temperature with error?”

Great question. Then we are in the radio audio case, where we want to guess what the real values were given our observation. There will be uncertainty in these guesses, some plus-or-minus to every supposed value. This uncertainty must always be carried “downstream” in all analyses of the values, though it very often isn’t. Guessing temperatures by proxy is a good example.

I have more on this topic in Uncertainty: The Soul of Modeling, Probability & Statistics.

September 1, 2017

Taleb’s Curious Views On Probability — Part III: Ergodicity & All That

Read Part I, Part II

Ergodic in probability has a technical definition. Without going into mathematical details (which are fine except possibly when applied), a “sequence” is defined as a run of measurements of some observable. A sub-sequence is a portion of the sequence.

Here is where belief that probability is ontic causes trouble. First, no real sequence is of infinite length, thus no sub-sequence can be infinite. The observations are measurements, as said, of real things, say, stock prices. The measurements do not possess any properties beyond those in the things themselves, i.e. prices of stocks. The measurements do not have a mean in the sense of a parameter from a probability model; of course, arithmetic averages can be calculated from any observed sequence. But the measurements do not possess any parameter from any probability distribution that may be used to represent uncertainty in them. The measurements do not possess probability. This we learned in Part I.

With me?

Ergodic, or ergodicity, is the property that any sub-sequence of the measurements possesses the same probability characteristics as the entire sequence, or as other sub-sequences. Since no part of any real sequence possesses any probability characteristics in any ontic sense, the term is of no use in reality, however useful it might be in imagining infinite sequences of mathematical objects.

We might find some use for ergodicity, rescue it as it were, in the following way. A set of assumptions M, i.e. a model, is used to make predictions of a sequence up to some point t. After t, we might amend these assumptions, to say Mt, and make new predictions. Why this change at t? Only because there is some new assumption (or observation etc.) which impinged upon your mind.

Example: Use M for stock price y; at time t, the stock splits, and so M is amended to Mt to incorporate knowledge of the split. If M ever changes (because your assumptions, premises, etc. do), however often, through time, in practice we do not have ergodicity. In this sense, ergodicity is just like probability in being purely epistemic. But since we know we changed M, we don’t need to label that change “ergodic activity at time t”.

Make sense?

Of course, since real sequences do not possess, in the ontic sense, ergodicity, there is no point in going and looking for it. You cannot find what doesn’t exist. For real sequences, you are always welcome to change your assumptions at any time. In this sense, it is you that creates practical ergodicity when you change M, which is how you know it’s there.

How do you know to change M? How indeed! That is ever the problem. There is no universal solution, save discovering the causes of y (which for stock prices isn’t going to happen).

Back to Taleb. His use of the term appears to assume the mathematical definition, which says probability exists; e.g. he says things like “detect when ergodicity is violated”. This is not only Taleb, of course, but most users of probability models. The error is common. It is why Taleb’s examples about ergodicity aren’t quite coherent. But it’s not his fault.

Switch to our last topic, repetition of exposure. This allows Taleb to run back to the precautionary principle he loves so well.

If one claimed that there is “statistical evidence that the plane is safe”, with a 98% confidence level (statistics are meaningless without such confidence), and acted on it, practically no experienced pilot would be alive today. In my war with the Monsanto machine, the advocates of genetically modified organisms (transgenics) kept countering me with benefit analyses (which were often bogus and doctored up), not tail risk analyses for repeated exposures.

Only frequentist statistics need confidence (and all readers of Uncertainty know the frequentist theory fails on multiple fronts, and is useful nowhere). Predictive probability does not.

It is true, and obvious, that if there is a risk in an act, repeating the act increases the overall risk.

What risk is there in, say, eating a GMO BLT? I have no idea, and neither does Taleb. There are well known benefits, though, as there always are when bacon is involved. Even if I knew of a risk, it may be that the cumulative benefits outweigh the cumulative risks. But I know of no risks save that “GMOs might hurt me”.

That statement is actually a tautology: it is equivalent to “GMOs might hurt me and they might not hurt me.” As an assumption in a model of S = “GMOs will hurt me”, it is therefore of no use. Tautologies never add information; they are like multiplying by 1. S does not have a probability without assumptions.

I might, as Taleb likes to do in the precautionary principle (review!), use different assumptions, say, “Monsanto’s lawyers are jerks and their GMOs cause, when the circumstances are in place, small amounts of damage when eaten.” With that, we can form a medium to high probability that S is true, especially upon repeated exposure (it would be certainty, and not only high probability, except for that “circumstances” condition).

Now Monsanto’s lawyers are jerks. Suing because Monsanto’s DNA wanders via natural pollination into some poor innocent farmer’s field is evil and shouldn’t be allowed. But from these truths it does not follow Monsanto’s GMOs cause harm. You need more than just suspicions that they might cause harm, because “might” is a tautology.

It’s enough for Taleb, because he wants you to consider not only the harm that GMOs (or global warming) will cause you, but will cause all of humanity plus its pet parakeets. Yet he offers (as far as I can see) nothing more than the tautology as evidence for S, and however many times you multiply a tautology, it is still a tautology in the end. A thousand “might harms” is still one “might or might not harm”.
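The arithmetic behind both points can be sketched in a few lines. If a per-act harm probability p is assumed (and it must be assumed; it is not a property of the BLT), the probability of at least one harm in n independent repetitions is 1 − (1 − p)^n. But if all we have is the tautology, p is anywhere in [0, 1], and so is the cumulative risk, no matter how large n grows. All the numbers below are made up for illustration.

```python
def cumulative_risk(p, n):
    """Probability of at least one harm in n independent repetitions,
    each with assumed per-act harm probability p."""
    return 1 - (1 - p) ** n

# A known small per-act risk compounds quickly with repetition...
for n in (1, 10, 100, 1000):
    print(f"p=0.001, n={n:4d}: cumulative risk = {cumulative_risk(0.001, n):.3f}")

# ...but the tautology pins p nowhere in [0, 1], so for any n the
# cumulative risk is likewise anywhere in [0, 1]:
print(cumulative_risk(0.0, 1000), cumulative_risk(1.0, 1))
```

A thousand repetitions of “might or might not” still spans the whole interval; only an actual assumption about p makes the compounding bite.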

If you are determined to prove GMOs cause harm, you need to demonstrate how. And then you still haven’t demonstrated that the benefits of them outweigh these harms. There will be no one-size-fits-all decision there.

August 29, 2017

Taleb’s Curious Views On Probability — Part II: Skin in the Game

Skin in the game

Read Part I

It is in one sense fortunate that the mathematical, or rather quantitative, roots of probability began with gambling. Routine gambles are easy to understand, and the calculations not only easy, but as models have great applicability to actual events. All know the story of how quantitative probability flourished, and flourishes, from these beginnings.

On the other hand, it has been difficult for probability to remember its more robust, fuller, and certainly more supportive roots, which are non-quantitative. That gambles were easily quantifiable and made for skillful models produced the false idea that all probability is, or should be, quantitative. And this led to the main error, discussed last time, that probability exists. It also produced a second error, which I won’t examine here (but have at length in Uncertainty), that probability is subjective.

Given the rules of craps—our premises—we can deduce the probability of winning and losing. We can also apply this model to real dice. And the same is true for card games, slot machines, and so on. These models have been found to work well. But even casinos change out worn dice and bent cards knowing the models are no longer as applicable.
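As a concrete instance, the pass-line win probability in craps can be deduced exactly from the rules, with no data at all. A sketch, with the dice premised fair:

```python
from fractions import Fraction
from collections import Counter

# Ways to roll each sum with two fair dice (a premise, not an observation).
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))

def p(s):
    """Deduced probability of rolling sum s on the come-out roll."""
    return Fraction(ways[s], 36)

# Pass-line rules: 7 or 11 wins immediately; 2, 3, 12 lose; any other
# sum becomes the "point", which must be re-rolled before a 7 appears.
win = p(7) + p(11)
for point in (4, 5, 6, 8, 9, 10):
    # P(point before 7) = ways(point) / (ways(point) + ways(7))
    win += p(point) * Fraction(ways[point], ways[point] + ways[7])

print(win, float(win))
```

The deduction gives 244/495, just under 0.493; that deduced-from-premises gap below one half is exactly what the casino’s edge rests on.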

These models work well for single gamblers (with assumed fortunes), but they cannot be applied to groups of gamblers, because how much, how long, and how many people gamble cannot be captured by the simple premises. Here I agree with Taleb when he says about groups of gamblers, “Some may lose, some may win, and we can infer at the end of the day what the [casino’s] ‘edge’ is, that is, calculate the returns simply by counting the money left with the people who return.” This observational data is used to infer premises for a model beyond the premises available per game (which are easy).

Taleb continues: “We can thus figure out if the casino is properly pricing the odds.” The odds for each single game are deduced, so that means, at first glance, that the overall odds are also correct. But sometimes it pays for casinos to change single-game odds. If few win at some slot machine, few will use it (after word spreads); likewise, if one pays off well, more will use it. Observed behavior can help slide the single-game deduced odds to entice more gambling. Since behavior is volatile, so will be these models.

I also agree—everybody agrees—with Taleb that when a gambler goes bust he must stop playing. For some reason he calls going bust an “uncle point” (crying uncle?). Everybody also knows that when a certain gambler reaches an “uncle point”, other gamblers might still have money. This seems to be something of a revelation to Taleb, though, who calls the models applied to groups of gamblers “ensemble probability” models, and those applied to single gamblers (with known or assumed fortunes) “time probability” models.

Taleb then argues, what isn’t a secret, that sometimes people use the wrong model. They’ll use a single-gambler model for a market (group), and a group model for a single-gambler. I don’t think this often happens, however, not with stocks, anyway, with so much money involved.

He says, “I effectively organized all my life around the point that sequence matters and the presence of ruin does not allow cost-benefit analyses; but it never hit me that the flaw in decision theory was so deep.”

Well, of course, in the presence of ruin, i.e. if one is ruined, the cost-benefit analysis is not flawed; it is as easy as can be. That the possibility of ruin exists does not reveal a flaw in decision theory, either.

I agree that decision theory has many flaws, but I see them differently. Many formal quantitative methods allow for impossible values (infinities or other large numbers), or they assume probabilities are real or they conflate probability and decision. Probability is not decision.

Taleb is concerned with “tails”, which is to say, large values. Now actual observed large values may or may not be well modeled; often they are not, and then Taleb’s criticism is spot on. For instance, normal distributions are as overused as the word “like” is in ordinary conversation. Other times there are possibilities in decision analysis for “tail” values that can’t be seen, and that’s a flaw with either the probability model or decision criterion (or both).

Somehow Taleb believes people, unless they possess genius, cannot figure probability if they do not have “skin in the game”, his favorite marketing phrase. This is false, as is obvious. People who do not give a rat’s rear about an outcome are less likely to attend to the problem as closely as those who do care, which is clear enough. But having money on the line does not bring the psychic gift of probability awareness. Indeed, gamblers with much “skin in the game” are apt to be the worst estimators.

That’s enough for Part II. I’ll wrap it up in Part III, Ergodicity and all that.

August 28, 2017

Taleb’s Curious Views On Probability — Part I: Probability Does Not Exist

Ye Olde Statistician points us to an essay (a book chapter?) by our old pal Nassim Nicholas Taleb called “The Logic of Risk Taking“. Let’s examine it.

You, dear reader, do not have a probability of being flattened while crossing the street. Nobody does. Nobody has any probability of anything. Nothing has a probability of anything.

The reason is this (quoting de Finetti in word and typeface): PROBABILITY DOES NOT EXIST.

You cannot have in abundance or in fraction that which does not exist. Yet Taleb says, “the risk of being killed as a pedestrian is one per 47,000 years.” Ignoring the number, the proposition itself will not sound wrong to most. It is wrong. Since probability does not exist, there is no blanket risk of you being killed as a pedestrian.

Probability, absolutely all of it all of the time, is conditional. You walk to a corner and desire to cross. At this point you must form premises on which to act. You might say, “I might get hit”, which adds nothing to your ability to form a probability of “I will get hit”. (This, and everything else, is proved in Uncertainty.)

You might instead think, “There are no cars coming anywhere”, and form a very low probability of “I will get hit”. Or you might say, “If I hurry, I can make it.” A higher probability.

Now suppose you are an actuary (a statistician with less personality, as the joke goes) and want to guess how many pedestrians will go to their reward next year for having the audacity to cross the street. No easy job, that. Are you limiting this to the once United States? Everywhere? You still need premises to form probabilities of propositions like “There will be X killed”. Which premises?

Well, you might take the number flattened last year and use that as a base for some ad hoc model, which may or may not be useful in making predictions. You could form premises state-by-state, and then feed these into an ad hoc model. Or county-by-county. Or city-by-city. Or individual-by-individual.
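One crude version of such an ad hoc model, purely for illustration: premise that next year’s count follows a Poisson distribution with last year’s (invented) count as the rate. Nothing here is a discovered cause; the rate and the Poisson form are both bare assumptions, and different premises would give different probabilities.

```python
import math

def poisson_predictive(rate, k):
    """P(count = k) under the ad hoc Poisson premise, computed in log
    space to avoid overflow; the rate is an assumption, not a cause."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

last_year = 6000  # invented figure, like all details here
# Probability next year's count falls within 2% of last year's:
lo, hi = int(last_year * 0.98), int(last_year * 1.02)
prob = sum(poisson_predictive(last_year, k) for k in range(lo, hi + 1))
print(f"P(count within 2% of {last_year}) = {prob:.3f}")
```

Change the premises—county-by-county counts, a different distribution, extra data—and the probability changes with them, which is the whole point.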

Have the idea?

Change the premises, i.e. the assumptions, data, and the like, and you change the probability. I don’t know what premises Taleb used to arrive at “one per 47,000 years”, but they must exist somewhere, at least in his imagination.

That probability depends on assumptions is the very point made in the last two articles discussing Taleb and the precautionary principle (here and here). Other words for assumptions and premises are model and theory.

Now suppose you meet the actuary on his lunch hour and he tells you of his recent calculation, a model with various assumptions that led him to state “You’ll be dead street meat at the rate of one per 47,000 years”. This might form your new premise, from which you deduce (circularly) that you have that chance of being killed.

When you get to the intersection, and you insist on using the actuary’s number (he being an expert), it means ignoring all that is before you except that it is an intersection which you will cross. So if you live on a New York City avenue, it means ignoring that Access-a-Ride bus rocketing in your direction towards the red light at which, by law, the texting wild-eyed driver must stop.

If you believe probability exists, and you believe an expert has discovered the probability for your particular situation, and Taleb is an expert, then ignoring circumstance is the rational thing to do; it is the only thing to do and stay consistent with your belief that probability exists.

Or you could chuck the idea that probability exists into the trash heap and hope the Access-a-Ride bus meets that perpendicular oncoming City bus and duels it out with it.

We need this demonstration probability does not exist as a baseline to discuss the remainder of Taleb’s article. The 47,000-year figure, for instance, comes from this:

About every time I discuss the precautionary principle, some overeducated pundit suggests that “we cross the street by taking risks”, so why worry so much about the system? This sophistry usually causes a bit of anger on my part. Aside from the fact that the risk of being killed as a pedestrian is one per 47,000 years, the point is that my death is never the worst case scenario unless it correlates to that of others.

I gather his over-educated pundit meant “We take risks by crossing the street”, which is true—but only on the premise that all actions possess a risk, that all risk is contingent.

I do not know if Taleb believes probability exists; he at times appears to imply it, at other times perhaps not. I’m not familiar enough with his writings to know if he has made a direct statement on the matter. So that if you love Taleb, there’s no reason to become upset with me.

More to come…