William M. Briggs

Demonstration of how smoothing causes inflated certainty (and egos?)

I’ve had a number of requests to show how smoothing inflates certainty, so I’ve created a couple of easy simulations that you can try in the privacy of your own home. The computer code, which I’ll explain later, is below.

The idea is simple.

I am going to simulate two time series, each of 64 “years.” The two series have absolutely nothing to do with one another; they are just made-up, wholly fictional numbers. Any association between these two series would be a coincidence (which we can quantify; more later).
I am then going to smooth these series using off-the-shelf smoothers. I am going to use two kinds:
1. A k-year running mean; the bigger k is, the more smoothing there is;
2. A simple low-pass filter with k coefficients; again the bigger k is, the more smoothing there is.
I am going to let k = 2 for the first simulation, k = 3 for the second, and so on, until k = 12. This will show that increasing the smoothing dramatically inflates our confidence.
I am going to repeat the entire simulation 500 times for each k (and for each smoother) and look at the results of all of them (if we did just one, it probably wouldn’t be interesting).
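A minimal Python sketch of the procedure just described, not the post's own code. It assumes the made-up series are independent standard normal draws (the post does not say how they were generated) and, to stay within the standard library, it uses the Fisher z approximation to the classical correlation test instead of the exact t-based version. The names `simulate` and `correlation_p_value` are illustrative. Setting k = 1 leaves the data unsmoothed, which gives a baseline to compare against.

```python
import math
import random


def running_mean(series, k):
    """k-year running mean over full windows; output has len(series) - k + 1 values."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


def correlation_p_value(x, y):
    """Two-sided p-value for zero Pearson correlation.

    Uses the Fisher z approximation, z = atanh(r) * sqrt(n - 3), instead of the
    exact t-based test. Like the classical test, it assumes the n pairs are
    independent of one another.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(k, n_years=64, n_sims=500, seed=0, smoother=running_mean):
    """p-values from n_sims pairs of unrelated series, each smoothed with window k."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_sims):
        x0 = [rng.gauss(0, 1) for _ in range(n_years)]  # made-up series 1
        x1 = [rng.gauss(0, 1) for _ in range(n_years)]  # made-up series 2, unrelated to x0
        s0, s1 = smoother(x0, k), smoother(x1, k)
        p_values.append(correlation_p_value(s0, s1))
    return p_values


# k = 1 leaves the data untouched and serves as a no-smoothing baseline.
for k in range(1, 13):
    p = simulate(k)
    share = sum(pv < 0.05 for pv in p) / len(p)
    print(f"k = {k:2d}: {share:.1%} of p-values below 0.05")
```

Passing a different function as `smoother` (a low-pass filter, say) repeats the experiment for the second kind of smoother.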

Neither of the smoothers I use is in any way complicated. Fancier smoothers would only make the data smoother still, so we’ll start with the simplest. Make sense? Then let’s go!
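Both smoothers can be written as the same sliding weighted average, differing only in the weights. The sketch below assumes the running mean uses k equal weights and, because the post does not give the low-pass coefficients, uses triangular weights as a hypothetical stand-in. `apply_filter`, `running_mean`, and `low_pass` are illustrative names. Keeping only full windows is what makes each smoothed series k - 1 years shorter than the raw one.

```python
def apply_filter(series, weights):
    """Slide `weights` along `series`, keeping only full windows.

    The output has len(series) - len(weights) + 1 values, which is why
    smoothed series are shorter than the raw ones.
    """
    k = len(weights)
    return [
        sum(w * v for w, v in zip(weights, series[i:i + k]))
        for i in range(len(series) - k + 1)
    ]


def running_mean(series, k):
    """k-year running mean: k equal weights of 1/k."""
    return apply_filter(series, [1.0 / k] * k)


def low_pass(series, k):
    """Hypothetical k-coefficient low-pass filter with triangular weights.

    The post does not say which coefficients were used; triangular weights
    (e.g. 1, 2, 1 for k = 3), normalized to sum to 1, are an assumption.
    """
    raw = [min(j + 1, k - j) for j in range(k)]
    total = sum(raw)
    return apply_filter(series, [w / total for w in raw])


years = list(range(64))
print(len(running_mean(years, 12)))       # 53: eleven years are lost
print(running_mean([1, 2, 3, 4, 5], 2))   # [1.5, 2.5, 3.5, 4.5]
```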

Here, just so you can see what is happening, are the first two series, x0 and x1, plotted together (just one simulation out of the 500). On top of each is the 12-year running mean. You can see the smoother really does smooth the bumps out of the data, right? The last panel of the plot shows the two smoothed series, now called s0 and s1, next to each other. They are shorter because you have to sacrifice some years when smoothing (with a 12-year running mean, 11 of the 64 years are lost).
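To put a number on how thoroughly the smoother irons out the bumps, one can compare the lag-one autocorrelation (how strongly each value predicts the next) before and after smoothing. For independent noise passed through a k-year running mean, neighbouring smoothed values share k - 1 of their k inputs, so the theoretical lag-one autocorrelation is (k - 1)/k, or 11/12 for k = 12. The sketch assumes standard normal noise, and the function names are illustrative. This dependence between neighbouring years is exactly what the classical correlation test ignores, since it treats every year as an independent observation.

```python
import random


def running_mean(series, k):
    """k-year running mean over full windows (k - 1 values are lost)."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


def lag1_autocorrelation(series):
    """Sample correlation between each value and the one that follows it."""
    m = sum(series) / len(series)
    dev = [v - m for v in series]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    return num / sum(d * d for d in dev)


rng = random.Random(3)
x0 = [rng.gauss(0, 1) for _ in range(64)]  # one made-up 64-"year" series
s0 = running_mean(x0, 12)                   # its 12-year running mean (53 values)

# Raw values are independent, so this should be near 0.
print(round(lag1_autocorrelation(x0), 2))
# Theory for the smoothing process is 11/12 (about 0.92); a 53-point sample
# estimate usually comes out somewhat lower, but still large.
print(round(lag1_autocorrelation(s0), 2))
```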

The thing to notice is that the two smoothed series eerily look like they are related! The red line looks like it trails after the black one. Could the black line be some physical process that is driving the red line? No! Remember, these numbers are utterly unrelated. Any relationship we see is in our heads, or was caused by us through poor statistical methodology, and not in the data. How can we quantify this? Through this picture:

This shows boxplots of the classical p-values in a test of correlation between the two smoothed series, one boxplot for each k. Notice the log-10 y-axis. A dotted line has been drawn to show the magic value of 0.05. P-values less than this wondrous number are said to be publishable, and fame and fortune await you if you can get one of these. Boxplots show the range of the data: the solid line in the middle of the box says 50% of the 500 simulations gave p-values less than this number, and 50% gave p-values higher. The upper edge of the box marks the value that 25% of the 500 p-values exceeded, and the lower edge the value that 25% fell below. The outermost top line says 5% of the p-values were greater than this, while the bottommost line indicates that 5% of the p-values were less than this. Think about this before you read on. The colors of the boxplots have been chosen to please Don Cherry.
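The five lines of each boxplot, as described above, are the 5th, 25th, 50th, 75th and 95th percentiles of the 500 p-values. A short sketch of computing them with Python's `statistics.quantiles`; `boxplot_summary` is an illustrative name. The uniform draws below stand in for p-values from a correctly calibrated test on unrelated, unsmoothed data: they are simulated for illustration and are not results from this post.

```python
import math
import random
import statistics


def boxplot_summary(p_values):
    """Percentiles shown by the boxplots: 5, 25, 50, 75 and 95%.

    statistics.quantiles(n=20) returns the 19 cut points at 5%, 10%, ..., 95%,
    so the 5% cut is index 0, the 25% cut index 4, and so on.
    """
    cuts = statistics.quantiles(p_values, n=20)
    return {pct: cuts[pct // 5 - 1] for pct in (5, 25, 50, 75, 95)}


# Under the null hypothesis with independent data, classical p-values are
# uniform on (0, 1), so the 5% line should sit close to 0.05.
rng = random.Random(2)
null_p = [rng.random() for _ in range(500)]
for pct, value in boxplot_summary(null_p).items():
    print(f"{pct:>2}%: p = {value:.3f}  (log10 = {math.log10(value):.2f})")
```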

Now, since we did the test 500 times on series that have nothing to do with each other, we’d expect about 5% of the p-values to fall below the magic number of 0.05. That means the bottommost line of each boxplot should sit somewhere near the dotted horizontal line. If the bottommost line, or worse the box itself, sticks below the dotted line, then the conclusion you make based on the p-value is too certain.
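The 5% expectation is simple binomial arithmetic. If each of the 500 tests independently has a 5% chance of a spurious p-value below 0.05, the count of such p-values has mean 500 × 0.05 = 25 and standard deviation sqrt(500 × 0.05 × 0.95) ≈ 4.87. A small sketch of that arithmetic (`binomial_tail` is an illustrative helper, not a library function):

```python
import math


def binomial_tail(n, p, k_min):
    """P(at least k_min successes in n independent trials with success prob p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


n_sims, alpha = 500, 0.05
expected = n_sims * alpha                         # 25 "significant" results
spread = math.sqrt(n_sims * alpha * (1 - alpha))  # about 4.87
print(expected, round(spread, 2))
# Chance of 50 or more (10% of 500) if nothing but coincidence were at work:
print(binomial_tail(n_sims, alpha, 50))
```

Counts within a few standard deviations of 25 are unremarkable; a 50% rate (about 250 of 500) is nowhere near what coincidence alone would produce.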

Are we too certain here? Yes! Right from the start, at the smallest values of k, and hence with almost no smoothing, we are already way too sure of ourselves. By the time we reach a 10-year window (a commonly used choice in actual data) we are finding spurious “statistically significant” results 50% of the time! The p-values are awful small, too, which many people incorrectly use as a measure of the “strength” of the significance. Well, we can leave that error for another day. The bottom line, however, is clear: smooth, and you are way too sure of yourself.

Now for the low-pass filter. We start with a data plot and then overlay the smoothed data on top. Then we show the two series (just 1 out of the 500, of course) on top of each other. They look like they could be related too, don’t they? Don’t lie. They surely do.

And to prove it, here are the boxplots again. The results are about the same as for the running mean.

What can we conclude from this?

The obvious.

BORING DETAILS FOLLOW
