William M. Briggs

Statistician to the Stars!


Masters Statistics & (Some) Predictions

Here’s a view of the winning scores relative to par since the inception of the tournament.

Difference from par and year.


The early years saw little variability, but since the 1960s there have been many more very high or low scores; variability increased. Tiger Woods had his biggest year in 1997 at 18 under, but poor Zach Johnson, in a small typhoon in 2007, cleared the field at 1 over.

The best projection for this year is a score of 10 under, with a 90% chance it will be between 4 and 15 under.
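The post doesn't say how the projection was made. One simple possibility, sketched here purely as an assumption, is to take the median of past winning scores as the point projection and the empirical 5th and 95th percentiles as a 90% interval. The function name `project_winning_score` is hypothetical.

```python
import statistics

def project_winning_score(scores, coverage=0.90):
    """Return (center, low, high) for a winning score relative to par.

    scores: past winning scores relative to par (negative = under par).
    Assumed method: median as the projection, empirical quantiles for
    the interval. `low` is the more-under-par end of the interval.
    """
    if len(scores) < 2:
        raise ValueError("need at least two past scores")
    center = statistics.median(scores)
    # n=100 returns the 99 cut points for the 1st..99th percentiles
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    tail = round((1 - coverage) / 2 * 100)  # 5 for a 90% interval
    low, high = cuts[tail - 1], cuts[100 - tail - 1]
    return center, low, high
```

Because the plot shows variability changing over the decades, a caller might pass only recent years' scores; that choice is left open here.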

Youth does not have a significant, or at least overwhelming, advantage. Some 72% of the winners were 30 or older, and 14% 40 or older. The oldest was, as everybody knows, Jack Nicklaus, who took home the Green Jacket at 46 in 1986. The youngest was Tiger Woods, just 21 in 1997. There isn’t a clear signal that suggests older or younger players are coming out ahead.

Age of winner by year.


Don’t forget that many of the “mini-trends” visible are from golfers winning more than one title, and necessarily aging in between victories.

Large margins of victory are rare. Tiger Woods had the biggest, a 12-shot margin in 1997, followed by Jack Nicklaus with a 9-shot gap in 1965, and Raymond Floyd in third place with an 8-shot margin in 1976. Ties are common: nearly 21% of the time there is a sudden-death playoff. A 1-shot margin is the most usual outcome, happening 28% of the time, followed by a 2-shot margin at 23% and a 3-shot margin at about 12%. Margins of 4 or more shots occur about 16% of the time.

Margin of victory by year.

Margin of victory by year.

The trend, if any, seems to be for closer margins of victory with the occasional break out.
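The margin-of-victory percentages above are a simple frequency table. A minimal sketch of that tabulation, assuming one margin per tournament with 0 standing for a title decided in a playoff (the encoding and the name `margin_shares` are hypothetical):

```python
from collections import Counter

def bucket(margin):
    """Map a margin in strokes to the categories used in the text."""
    if margin == 0:
        return "playoff"      # tied after regulation
    if margin >= 4:
        return "4+"
    return str(margin)        # "1", "2" or "3"

def margin_shares(margins):
    """Percent of tournaments falling in each margin category."""
    counts = Counter(bucket(m) for m in margins)
    total = len(margins)
    return {k: 100 * v / total for k, v in counts.items()}
```

The shares always add to 100%, which is a quick sanity check on the figures in the paragraph (21 + 28 + 23 + 12 + 16 = 100).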

Here’s another indication that age doesn’t play much of a role. There are no clear signals between age and the difference from par, or between age and the margin of victory (some jittering has been added to this plot to separate close points). Of course, age does play some role. There aren’t any 10-year-olds or 60-year-olds making the cut. Once a player gets past the cut, his age is not of much predictive value, however much it may mean to the player’s aching bones!

Good news for "old" men.

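Jittering only moves points on the picture; any numerical check for a relationship should use the raw values. A minimal sketch, assuming plain lists of ages and scores (the names `jitter` and `pearson` are hypothetical, and correlation is just one simple check, not necessarily what was done here):

```python
import random
import statistics

def jitter(values, width=0.3, seed=0):
    """Copy of values, each nudged by uniform noise in [-width, width].

    For drawing only: keeps tied points (same age, same score)
    from landing exactly on top of one another.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

def pearson(x, y):
    """Pearson correlation, computed on the raw, unjittered values."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```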

Players from these once United States took home about 3 out of every 4 Green Jackets, winning 74% of the time. Perhaps somewhat surprisingly, the next-winningest country is South Africa, with just over 6% of the victories, followed closely by Spain, with about 5%. The Brits took almost 4%, the Germans just under 3%. Only 6 other countries took anything (there were 11 winning countries in all).

Most winners have won only once: 65% of the men who have taken the title never repeated. About 19% won twice, around 10% were three-peaters, two men (Arnold Palmer and Tiger Woods), or 4%, won 4 times, and only one man won 6 (Jack Nicklaus, of course).
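These repeat-winner percentages read as shares of the men who have won, not shares of tournaments. The two differ because a six-time winner counts as one man but six tournaments. A minimal sketch computing both, assuming a list with one winner's name per tournament (the name `repeat_shares` is hypothetical):

```python
from collections import Counter

def repeat_shares(winners):
    """winners: one name per tournament.

    Returns two dicts keyed by career win count:
    percent of distinct winners, and percent of tournaments.
    """
    wins = Counter(winners)               # name -> number of titles
    by_count = Counter(wins.values())     # titles -> number of men
    n_men, n_events = len(wins), len(winners)
    per_winner = {k: 100 * c / n_men for k, c in by_count.items()}
    per_event = {k: 100 * k * c / n_events for k, c in by_count.items()}
    return per_winner, per_event
```

With `["A", "A", "B", "C"]`, two of three men (about 67%) won once, but single winners account for only half of the tournaments.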

Who will win this year’s tourney? I have no idea. But to make a guess, I like this Jordan Spieth fellow, though he’s awfully young. (This is posting on Friday, but was written right before the tournament started.)


P-Value Hacking Is Finally Being Noticed

Fig. 2 from the paper.


Since I’m on the road, all typos today are free of charge.

Some reasonably good news to report. A peer-reviewed paper: “The fickle P value generates irreproducible results” by Lewis Halsey and three others in Nature Methods. They begin with a warning well known to regular readers:

The reliability and reproducibility of science are under scrutiny. However, a major cause of this lack of repeatability is not being considered: the wide sample-to-sample variability in the P value…

[Jumping down quite a bit here.]

Many scientists who are not statisticians do not realize that the power of a test is equally relevant when considering statistically significant results, that is, when the null hypothesis appears to be untenable. This is because the statistical power of the test dramatically affects our capacity to interpret the P value and thus the test result. It may surprise many scientists to discover that interpreting a study result from its P value alone is spurious in all but the most highly powered designs. The reason for this is that unless statistical power is very high, the P value exhibits wide sample-to-sample variability and thus does not reliably indicate the strength of evidence against the null hypothesis.

It do, it do. A short way of saying this is that small samples mislead. Small, that is, in the kinds of studies most scientists are interested in. Small is relative.
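The sample-to-sample wobble of the P value is easy to see by simulation. This is a minimal sketch, not the paper's code: it repeats the same two-group experiment many times with a true effect of 0.5 SD, using a known-variance z-test to stay within Python's standard library. The effect size, group sizes, and the names `simulate_p_values` and `summarize` are all illustrative assumptions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def simulate_p_values(effect, n, reps=2000, seed=1):
    """Sorted two-sided z-test p-values from `reps` identical experiments.

    Assumes both groups are normal with known SD 1, a true mean
    difference of `effect`, and `n` observations per group.
    """
    rng = random.Random(seed)
    se = (2 / n) ** 0.5  # standard error of the difference in means
    pvals = []
    for _ in range(reps):
        a = sum(rng.gauss(effect, 1) for _ in range(n)) / n
        b = sum(rng.gauss(0, 1) for _ in range(n)) / n
        z = (a - b) / se
        pvals.append(2 * (1 - STD_NORMAL.cdf(abs(z))))
    return sorted(pvals)

def summarize(pvals):
    """(share of p < 0.05, i.e. power; 10th percentile; 90th percentile)."""
    k = len(pvals)
    power = sum(p < 0.05 for p in pvals) / k
    return power, pvals[k // 10], pvals[9 * k // 10]
```

Comparing `summarize(simulate_p_values(0.5, 10))` with `summarize(simulate_p_values(0.5, 100))` shows the low-powered design's p-values scattered across most of the unit interval, while the high-powered design's cluster near zero. A t-test would be the more usual choice; the z-test just keeps the sketch dependency-free.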

But, as I’ve said many, many, __________ (fill in that blank) times, p-values are used as ritual. If the p-value is less than the magic number, SIGNIFICANCE is achieved. What a triumph of marketing it was to have chosen that word!

Why is this? As any statistician will tell you, the simplest explanation is usually the best. That’s true here. Why are people devoted to p-values? It isn’t because they understand them. Experience has taught me hardly anybody remembers their definition and limitations, even if they routinely use them—even if they teach their use to others!

Most people are lazy, and scientists are people. If work, especially mental toil, can be avoided, it will be avoided. Not by all, mind, but by most. P-values-as-ritual does the thinking for researchers; it removes the labor. “Submit your data to SPSS” (I hear phrases like this a lot from sociologists). If wee p-values are spit out, success is announced.

Back to the paper:

Indeed most scientists employ the P value as if it were an absolute index of the truth. A low P value is automatically taken as substantial evidence that the data support a real phenomenon. In turn, researchers then assume that a repeat experiment would probably also return a low P value and support the original finding’s validity. Thus, many studies reporting a low P value are never challenged or replicated. These single studies stand alone and are taken to be true. In fact, another similar study with new, different, random observations from the populations would result in different samples and thus could well return a P value that is substantially different, possibly providing much less apparent evidence for the reported finding.

All true.

Replacement? The authors suggest effect size with its plus-or-minus attached. Effect size? That’s the estimate of the parameter inside some model, a number of no (direct) interest. Shifting from p-values to effect sizes won’t help much because effect sizes, since they’re statements of parameters and not observables, exaggerate, too.
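One way to see why parameter statements overstate certainty: under a simple normal model, the interval for the mean (the parameter) shrinks like 1/√n, while the interval for a new observation (an observable) never gets narrower than about ±1.96 standard deviations. A minimal sketch, assuming a normal model and using z = 1.96 instead of a t quantile for simplicity (the name `parameter_vs_predictive` is hypothetical):

```python
import statistics

def parameter_vs_predictive(sample, z=1.96):
    """Approximate 95% intervals under a normal model.

    Returns (interval for the mean, interval for one new observation).
    """
    n = len(sample)
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)
    half_param = z * s / n ** 0.5          # uncertainty in the parameter
    half_pred = z * s * (1 + 1 / n) ** 0.5  # uncertainty in an observable
    return (m - half_param, m + half_param), (m - half_pred, m + half_pred)
```

Both intervals share a center; only their widths differ, and the gap widens as the sample grows.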

The solution is actually simple. Do what physicists do (or used to do). Fit models and use them to make predictions. If the predictions come true, the models are considered good. If they don’t, the models are bad and are abandoned or modified.

Problem with that—it’s called predictive statistics—is that it’s not only hard work, it’s expensive and time consuming. Takes plenty of grunting to come to a reasonable model—and then you have to wait until verifying data comes in! Why, it’s like doing the experiment multiple times. Did somebody mention replication?

P-value hacking, you asked? According to this study:

P-hacking happens when researchers either consciously or unconsciously analyse their data multiple times or in multiple ways until they get a desired result. If p-hacking is common, the exaggerated results could lead to misleading conclusions, even when evidence comes from multiple studies.

Funniest quote comes from one Dr Head (yes): “Many researchers are not aware that certain methods could make some results seem more important than they are. They are just genuinely excited about finding something new and interesting”.

Amen, Head, amen.
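The “analyse their data multiple times or in multiple ways” route to p-hacking can be quantified in its simplest form. This minimal sketch assumes k independent z-tests on data where the null hypothesis is true, stopping at the first p < 0.05. For independent tests the false-positive rate is 1 − 0.95^k, about 40% for k = 10. Real reanalyses of one dataset are correlated, so the inflation is smaller but still above 5%. The name `hacked_false_positive_rate` is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def hacked_false_positive_rate(k, reps=20000, alpha=0.05, seed=2):
    """Share of null experiments in which any of k analyses gives p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        for _ in range(k):
            z = rng.gauss(0, 1)  # test statistic when the null is true
            if 2 * (1 - STD_NORMAL.cdf(abs(z))) < alpha:
                hits += 1
                break            # stop at the first "significant" result
    return hits / reps
```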

Binue Plus! The answer to all will be in my forthcoming book. Updates on this soon.


Thanks to reader Al Perrella for alerting us to this topic.




I’m on the road—and in a chair.


What Evidence Would Persuade You That Man-Made Climate Change Is Real?

How close are the models to reality?


Somebody named Ronald Bailey (he isn’t anybody I know) at the inaptly named Reason asked the good question which heads this post.

Second thought: the question is awful. No scientist I know disputes “Man-Made Climate Change Is Real”. None. What scientists like myself do dispute is that man-caused changes to climate are well understood and predictable. And I have proof.

But the idea behind Reason’s query is still good, even if Bailey himself doesn’t understand much about his subject.

Everybody out of love with any scientific theory should be prepared to say what it would take for amour to flourish. Just as everybody infatuated with a hypothesis should be able to state what would dim their ardor.

This goes not only for skeptics of global-warming-of-doom, but also for proponents. Tell us, if you dare, what would convince you that you’re wrong. I’m answering the question below, but I insist you answer it, too. If your answer is of the form “Shut up”, “I don’t have to answer”, or “Your answer, Briggs, wasn’t to my liking, and here’s why”, as I expect it will be for many progressives, congratulations. You’re a True Believer. (Appell, I’ll even let you answer.)

I have already changed my mind about global warming. I was initially a believer that bad times were on their way. Why? Well, I was young, fresh to the field. I knew how smart my betters were; I knew how wonderfully complex their models could be; I saw the increasing success in weather forecasting and the improvements in short-term (out to a year or so) climate predictions.

The temperatures, back then, were on their way up, too, in accord with what some climatologists were predicting. I never made the mistake, as Bailey does, of counting the same piece of evidence more than once. Rising temperatures were consistent with the theory that increasing CO2 caused increasing temperatures. But a melting glacier is a consequence of that heating; it is not additional proof of the theory. What an elementary mistake to think it was! Likewise, nothing that is a consequence of increased temperatures counts as additional evidence of why the increase happened.
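One way to formalize the double-counting point, offered as a sketch under an explicit assumption: let H be the theory that increasing CO2 caused the warming, T the observation that temperatures rose, and G the observation that a glacier melted. Assume the glacier depends on the theory only through the temperature, so that Pr(G | T, H) = Pr(G | T). Bayes’s theorem then gives

```latex
\Pr(H \mid T, G)
  = \frac{\Pr(G \mid T, H)\,\Pr(H \mid T)}{\Pr(G \mid T)}
  = \frac{\Pr(G \mid T)\,\Pr(H \mid T)}{\Pr(G \mid T)}
  = \Pr(H \mid T).
```

Once the rise in temperature is known, the melting glacier leaves the probability of the theory unchanged. It would count only if it carried information about the cause beyond the temperature rise itself.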

That I saw people making these mistakes, in a big, enthusiastic way, was what started my path back to Truth. How many papers announced “This evil will befall us once the temperature increases past the point of no return”? Thousands; more; they continue in a steady stream. And all of them were taken as evidence that the CO2-theory was right.

That being impossible, and stupid, I began seriously looking into the problem.

That’s when I noticed climate model forecasts had no skill. Before, I had merely taken it for granted that they did. The predictions the models made were not as good as saying “next year will look like last year”, i.e. persistence. The models were poor globally, and even worse locally. The temperatures, for some two decades now, have not been going in the direction the models promised.
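“Skill” here means beating a simple reference forecast. One common way to score it is the mean-squared-error skill score, 1 − MSE(model) / MSE(reference), with persistence as the reference. The post doesn't say which measure was used, so this minimal sketch is an assumption, and the function names are hypothetical. A positive score beats persistence, zero ties it, and a negative score does worse.

```python
def mse(forecasts, observed):
    """Mean squared error between paired forecasts and observations."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, observed)) / len(observed)

def skill_vs_persistence(model_forecasts, observed):
    """MSE skill score of a model relative to persistence.

    observed[t] is the value at time t; model_forecasts[t] is the
    model's forecast of observed[t]. Persistence forecasts observed[t]
    with observed[t - 1], so scoring starts at t = 1.
    """
    target = observed[1:]
    persistence = observed[:-1]
    model = model_forecasts[1:]
    return 1 - mse(model, target) / mse(persistence, target)
```

A model that merely repeats last year's value scores exactly 0, which is the bar the paragraph says the models failed to clear.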

This can only mean that the models were (are) broken. Why? Well, the theory which underlies them must be busted. Where? Who knows? It could be many things, or just one big thing. It’s not my job to find out, either. Though I and some pals of mine have some guesses.

Your car doesn’t start. You can then authoritatively state, “My car is busted.” It would be asinine and unscientific to say, “Even though my car doesn’t start, it really does work and really is taking me places.” Yet that is what supporters of the current models are saying. The models don’t work, but proponents claim they’re still taking us to the future. This is a form of politically correct lunacy.

But therein lies my answer to the question. I would change my mind and believe the models had a good, and not a dismal, handle on reality if they were to start making good predictions. About the future.

I had to add that seemingly unnecessary “About the future” because of the unfortunate habit of some modelers of claiming their models make good “forecasts” of the past. Yes, they do this. It’s called “hindcasting” or “backcasting”. It’s a way of testing model fit against observed data. It can be useful for discovering wild or egregious flaws in models, but no matter how well a model hindcasts, that is no guarantee it will make good future-casts.
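A toy illustration of why a perfect hindcast guarantees nothing. This sketch uses made-up synthetic data (a straight trend plus noise), not climate data, and a deliberately extreme over-tuned model: a polynomial threaded exactly through every past point. It hindcasts with zero error yet forecasts far worse than a plain least-squares line. All names are hypothetical.

```python
import random

def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial that passes exactly through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def line_fit(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def mean_abs_error(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

# Synthetic series: linear trend plus Gaussian noise (illustrative only).
rng = random.Random(3)
xs = list(range(15))
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
past_x, past_y = xs[:10], ys[:10]      # available for hindcasting
future_x, future_y = xs[10:], ys[10:]  # arrives later

a, b = line_fit(past_x, past_y)
hind_poly = mean_abs_error([lagrange_predict(past_x, past_y, x) for x in past_x], past_y)
hind_line = mean_abs_error([a + b * x for x in past_x], past_y)
fore_poly = mean_abs_error([lagrange_predict(past_x, past_y, x) for x in future_x], future_y)
fore_line = mean_abs_error([a + b * x for x in future_x], future_y)
```

The polynomial wins the hindcast (`hind_poly` is zero) and loses the forecast badly. Only the future-cast separates the two models.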

Future-casts, i.e. predictions, are the only test. There is none other. And models have so far failed that test.

But if they were to pass that test, and pass it consistently, then I’d have to believe the models were on to something, and that the theories which drive the models are likely true.


© 2015 William M. Briggs
