Twenty Tips For Interpreting Scientific Claims

It's science!

The title of today’s post is taken from an article of the same name in Nature by William Sutherland, David Spiegelhalter, and Mark Burgman. Several readers asked me to comment.

I’ll assume you’ve read the original. I have kept their points in the same order and with the same wording, and I try not to repeat what they already explained well.

Differences and chance cause variation. Chance can’t and doesn’t cause anything. Chance isn’t a thing; therefore it can never be a cause. Differences don’t cause things per se: things do (sizes of differences can certainly change rates of change). We cannot always identify causal agents, only correlates of change.

No measurement is exact. Well, not quite, but I take their point. Measurement error is vastly more prevalent than acknowledged and almost never accounted for. Leading to…can you guess? Over-certainty.

Bias is rife. Amen and amen. But, just as with telling the public they look ugly in jeans, everybody always thinks it’s the other guy and not them. Yes, you, even you, are biased. Even you. And you. Even if you’re part of a team that won prizes.

Bigger is usually better for sample size. Indeed, except for cost and the possibility of being overwhelmed or misled by errors in the data, bigger is always better.
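
The caveat about errors matters more than it looks. Here is a small Python sketch with made-up numbers: a true value of 10, readings with a spread of 2, and a hypothetical systematic offset of 0.5 built into every reading. As the sample grows, the standard error keeps shrinking, but the estimate settles near 10.5 and not near 10. In this setup a bigger sample only buys more certainty about the wrong answer.

```python
import random
import statistics


def estimate_mean(n, true_value=10.0, noise_sd=2.0, bias=0.5, seed=0):
    """Average n readings that all share a hypothetical systematic bias.

    Returns (estimate, standard_error). Random noise averages away as n
    grows. The shared bias does not.
    """
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    estimate = statistics.fmean(readings)
    standard_error = statistics.stdev(readings) / n ** 0.5
    return estimate, standard_error


for n in (10, 1_000, 100_000):
    est, se = estimate_mean(n)
    print(f"n = {n:>7,}: estimate = {est:.3f}, standard error = {se:.4f} (truth: 10.0)")
```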

Correlation does not imply causation. But the opposite is true: causation causes correlation. People often forget the distinction between ontology and epistemology. This is also my fault for not making the distinction clear more often. Most probability models are epistemological, meaning they say how the probability of some outcome changes given changes in the input variables. The problem comes when people interpret the changing probabilities of the outcome as being caused by the input variables, which is usually not true.
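
The following is a minimal simulation of correlation without any causal link between the two things measured. The variables X, Y and the hidden Z are illustrative assumptions. Z drives both X and Y, and neither affects the other. Even so, their correlation comes out near 0.5. A model predicting Y from X would be perfectly good epistemologically, because knowing X does change our probability for Y. It would still say nothing about what causes Y.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def common_cause_data(n=10_000, seed=1):
    """X and Y both depend on a hidden Z; neither X nor Y affects the other."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)              # the unmeasured common cause
        xs.append(z + rng.gauss(0.0, 1.0))   # X: driven by Z plus its own noise
        ys.append(z + rng.gauss(0.0, 1.0))   # Y: driven by Z plus its own noise
    return xs, ys


xs, ys = common_cause_data()
print(f"correlation between X and Y: {correlation(xs, ys):.2f}")
```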

Regression to the mean can mislead. See this on the so-called Sports Illustrated curse.
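
Here is a sketch of how a "curse" can arise from nothing more than selecting on an unusually good season. It rests on a hypothetical model in which each season's score is fixed skill plus an unpredictable component, with made-up sizes: 1,000 players, and the top 50 picked for the cover. Nothing about any player changes between seasons. Yet the cover group's average falls in the second season, because part of what put them on the cover does not repeat.

```python
import random
import statistics


def sophomore_slump(n_players=1_000, top_k=50, seed=2):
    """Each season's score = fixed true skill + a fresh unpredictable part.

    Returns the mean season-1 and season-2 scores of the players who were
    top_k in season 1. Nothing about the players changes between seasons.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0.0, 1.0) for _ in range(n_players)]
    season1 = [s + rng.gauss(0.0, 1.0) for s in skill]
    season2 = [s + rng.gauss(0.0, 1.0) for s in skill]
    # "Cover athletes": the best performers in season 1 only.
    top = sorted(range(n_players), key=lambda i: season1[i], reverse=True)[:top_k]
    return (statistics.fmean(season1[i] for i in top),
            statistics.fmean(season2[i] for i in top))


first, second = sophomore_slump()
print(f"cover group: season 1 mean {first:.2f}, season 2 mean {second:.2f}")
```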

Extrapolating beyond the data is risky. The reason is that probability models are usually not causal, and even when they are, few people check them for accuracy (everybody checks them for fit, via p-values, posteriors and the like).

Beware the base-rate fallacy. Think of it this way. If you’re forecasting “No rain” for Tucson each day, you’re likely to be right most of the time. But your boast carries little importance. Try the same forecast for Norman, Oklahoma and your accuracy heads south. This is why we should speak of skill (the improvement over naive predictions) instead of accuracy rates. Right, climatologists?
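
To make "skill" concrete, here is one common form of skill score: (accuracy of the forecast minus accuracy of the naive forecast) divided by (1 minus accuracy of the naive forecast). A score of 0 means no improvement on the naive forecast, and 1 means perfect. The Python sketch uses 100 invented days of weather and an invented forecaster, for illustration only.

```python
def accuracy(forecasts, observed):
    """Fraction of days on which the forecast matched what happened."""
    hits = sum(f == o for f, o in zip(forecasts, observed))
    return hits / len(observed)


def skill(forecasts, observed, reference):
    """One common skill-score form: 1 is perfect, 0 is no better than the
    reference forecast, negative is worse. Undefined (division by zero)
    if the reference forecast is already perfect."""
    acc_f = accuracy(forecasts, observed)
    acc_ref = accuracy(reference, observed)
    return (acc_f - acc_ref) / (1.0 - acc_ref)


# Hypothetical dry town: rain (True) on 10 of 100 days.
observed = [True] * 10 + [False] * 90
no_rain = [False] * 100
# Hypothetical forecaster: says rain on 6 of the 10 wet days (missing 4)
# and on 4 of the 90 dry days (false alarms).
forecaster = [True] * 6 + [False] * 4 + [True] * 4 + [False] * 86

print(f"'No rain' accuracy: {accuracy(no_rain, observed):.2f}")
print(f"Forecaster accuracy: {accuracy(forecaster, observed):.2f}")
print(f"Forecaster skill over 'no rain': {skill(forecaster, observed, no_rain):.2f}")

# Hypothetical wet town: rain on 50 of 100 days. The same naive forecast
# is now right only half the time.
wet = [True] * 50 + [False] * 50
print(f"'No rain' accuracy in the wet town: {accuracy(no_rain, wet):.2f}")
```

A 92% hit rate sounds impressive until you see that doing nothing earns 90%. The skill score of 0.20 says the forecaster captured only a fifth of the room left for improvement.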

Controls are important. The only thing wrong with this point is that “important” should read “everything.” The more you can control, the closer your model comes to causality. The problem is that controlling “everything” in human interactions, or in anything contingent, is impossible. It will always, as in always, be possible that something other than what we thought caused the outcome.

Randomization avoids bias. No. Randomization is not a property. It “gives” nothing to your results. “Randomization” belongs to the old days of magical thinking. Rather, assigning control of an experiment to persons without a financial or emotional interest reduces but cannot avoid bias. That residual bias exists is why there are always calls for replication.

Seek replication, not pseudoreplication. And speaking of replication… Listen up sociologists, psychologists, and so on: it is not a replication unless the experiment is repeated in exactly the same way, where the only differences are those things you could not control in the first experiment. “More or less” the same way is not exactly the same way and is therefore not a replication. A mass of published literature on the same subject is only a weak indicator of truth. Who remembers frontal lobotomies, etc., etc., etc.?

Scientists are human. And because they are typically in positions commanding money and people, they fall prey more often to the standard sins.

Significance is significant. No, it is not, or at least not necessarily. “Significance” means attaining a wee p-value, one less than the magic number. And this result may not, and usually does not, have any practical bearing on questions of interest about the thing at hand. Finding a wee p-value is child’s play. Finding something useful to say is far harder.
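
This sketch shows why a wee p-value is child's play. It assumes a two-group z-test with a known standard deviation of 1 and a fixed, practically meaningless observed difference of 0.02 standard deviations. Nothing about the effect changes, only the sample size, and the p-value slides from unremarkable to astronomically small.

```python
import math


def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value of a z-test for an observed difference between two
    group means, assuming a known common standard deviation sd."""
    z = diff / (sd * math.sqrt(2.0 / n_per_group))
    return math.erfc(abs(z) / math.sqrt(2.0))   # P(|Z| >= |z|) for standard normal Z


# The same practically meaningless difference (0.02 standard deviations),
# evaluated at ever larger sample sizes.
for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}: p = {two_sided_p(0.02, 1.0, n):.3g}")
```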

Separate no effect from non-significance. Here I must quote: “The lack of a statistically significant result (say a P-value > 0.05) does not mean that there was no underlying effect: it means that no effect was detected.” This is only partially true. Lack of a wee p-value might mean the effect was there but undetected. On the other hand, the effect might be there and detectable, too. It’s just that p-values are terrible at discovering which situation we’re in. An effect without a wee p-value may still be important. If instead we looked at probability models as they should be looked at, as predictive statements, we could say more.
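
The following simulation shows how often a real effect goes without a wee p-value. Its settings are assumptions: a true difference of 0.3 standard deviations built into the data, 20 subjects per group, and a z-test with known spread. Most of the simulated studies (in theory, about 84%) fail to reach p < 0.05 even though the effect is there by construction. The p-value alone cannot tell those studies apart from ones where there is no effect at all.

```python
import math
import random
import statistics


def share_of_wee_p_values(effect=0.3, n_per_group=20, n_studies=2_000, seed=3):
    """Simulate many small two-group studies in which a real effect of size
    `effect` (in standard deviations) is built in, and return the fraction
    that reach p < 0.05 with a z-test assuming a known sd of 1."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)          # standard error of the difference
    wee = 0
    for _ in range(n_studies):
        control = statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_per_group))
        treated = statistics.fmean(rng.gauss(effect, 1.0) for _ in range(n_per_group))
        z = (treated - control) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
        wee += p < 0.05
    return wee / n_studies


share = share_of_wee_p_values()
print(f"The effect is real in every study, yet only {share:.0%} reached p < 0.05.")
```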

Effect size matters. Wee p-values alone mean nothing. Repeat that until you get sick of repeating it. This is another call for predictive analytics.
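
One way to turn an effect size into a statement a decision maker can use is to ask a predictive question: how likely is it that a new treated person does better than a new untreated one? This sketch assumes normal outcomes with a common spread and treats the group means as known. That is a plug-in shortcut; a fuller predictive analysis would also carry the uncertainty in those parameters. A difference of 0.02 standard deviations is small enough to be practically irrelevant, yet easily "significant" in a large enough sample. It moves this probability barely past a coin flip, to about 0.506.

```python
import math


def prob_new_treated_beats_new_control(diff, sd=1.0):
    """Probability that one new treated person scores higher than one new
    control person, assuming normal outcomes, a common sd, and means treated
    as known (a plug-in shortcut that ignores parameter uncertainty)."""
    x = diff / (sd * math.sqrt(2.0))                     # mean / sd of (treated - control)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # standard normal CDF at x


for diff in (0.02, 0.3, 1.0):
    p = prob_new_treated_beats_new_control(diff)
    print(f"difference of {diff} sd: P(new treated beats new control) = {p:.3f}")
```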

Study relevance limits generalizations. It’s funny how many reporters never read the papers they report on.

Feelings influence risk perception. And this is because feelings are part of what we risk! Money, after all, is only a crude device to measure our feelings. And just because you hate fat people eating transfats does not mean the risk of disease from eating transfats is high. And just because you hate smoking does not mean that “second-hand” smoke is perilously dangerous, etc., etc.

Dependencies change the risks. Try not to look at anything in isolation, unless the thing is amenable to isolation. Dice come to mind. The changes that await us when global warming finally strikes (soon, soon) do not.

Data can be dredged or cherry picked. “Big data” anyone? One thing Big Data guarantees is shocked looks on the faces of managers who were certain sure they had picked up a “significant” signal in their gleaming, massive datasets.
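
Dredging is easy to demonstrate. This sketch uses made-up sizes (200 rows, 100 candidate predictors). Every column is pure noise with no relation to the outcome, yet typically around five in a hundred will clear the p < 0.05 bar. Adding more columns only hands the manager more of these "signals".

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def dredge(n_rows=200, n_predictors=100, seed=4):
    """Correlate a pure-noise outcome with pure-noise predictors and count how
    many correlations reach p < 0.05, using the Fisher z approximation."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
        r = correlation(predictor, outcome)
        z = math.atanh(r) * math.sqrt(n_rows - 3)   # approximately N(0, 1) when unrelated
        p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided p-value
        hits += p < 0.05
    return hits


print(f"{dredge()} of 100 pure-noise predictors came out 'significant' at p < 0.05")
```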

Extreme measurements may mislead. Like I always say, any survey or result is true conditional on the set of premises belonging to the experiment. Vary any of these premises and the result no longer holds. The more premises, i.e. conditions, there are, the greater the chance the results are not meaningful beyond the realm of this single experiment. Journalists often change these premises in their reporting; but to be fair, so do many scientists when summarizing their work. Memorize this.


Twenty Tips For Interpreting Scientific Claims — 35 Comments

  1. I’ll stipulate that what you’ve written is accurate (not necessarily “true”). Then I repeat a request I’ve made repeatedly, both in comments and by email. My question is about the general concept (that is, I’m not specifically concerned with the example case I’ll mention).

    Suppose I am interested in the question “does second hand smoke negatively impact the health of those exposed to it?” and I’m a confirmed Briggsian. What do I do to determine an answer and how do I assess the confidence I should have in that answer? Or, do I throw up my hands and say “there are some things that can never be known through any type of statistical analysis and reasoning and this is one of them?”

  2. Briggs, I want to thank you for the many informative posts you have put up over the years I have been lurking here. Quite an education you have given me. I found your site in 2009, I believe, from a link at Steve M’s website. There was a discussion about a certain Potsdam Institute doctor’s 2007 paper that had rediscovered the triangle filter and involved smoothing of either 11 or 15 or perhaps 29 years. There was a link to a posting you had done (recently revisited) about not performing analysis on smoothed data.

    Again, thank you for the service you perform. While reading your interesting posts the statistics are assimilated almost without effort on my part.

  3. Suppose I am interested in the question “does second hand smoke negatively impact the health of those exposed to it?”

    Not to downrate the usefulness of statistics, but if the effect is so small you need statistics to get the answer then the answer is probably not or too small to worry about. Would you feel the need to run a statistical study to determine if intercepting bullets with the head is a bad idea?

  4. “Suppose I am interested in the question “does second hand smoke negatively impact the health of those exposed to it?””

    I am with DAV on this one. To add to his catching bullets example, how many times would you need to pound your head against a brick wall before you can decide that it’s a bad idea?

  5. Something odd in this sentence? ==> “Journalists often change this premises in their reporting; but to be fair, so do many scientists when summarizing their work.”

  6. RE: “Correlation does not imply causation. But the opposite is true: causation causes correlation.”

    WHAT’s interesting are those situations when the effects of causes are not properly (or at all) recognized; this is commonly due to an intermediate effect acting as a subsequent “cause” that directly/more directly induces the effect that is measured.

    This often leads to “obvious” system behaviors being believed to be understood…until…some unrecognized intermediate effect alters the thus far unrecognized intermediate effect causing very different outcomes to result in response to the same measured/observed ‘causal’ input.

  7. Sheesh. Ok Dav and MattS, I’d said it was an example, not a study I wanted to undertake. To address both of your examples (bullets, headbanging), the negative effects are immediate. For a variety of things (not to say I buy that there is an effect for all things such as second hand smoke, transfats, power lines, etc.) the effect comes after years of exposure. Can’t happen, you say? I counter with mesothelioma. Yes, the ads on TV for lawyers looking to line their pockets are reprehensible examples of egregious profiteering and opportunistic exploitation but, nevertheless, I don’t think you’d argue that asbestos exposure is benign. And yet there are no immediate effects.

    Your points fundamentally imply that “if it’s quite obvious, you don’t need statistics. If it’s not obvious, statistics won’t help.” That’s the feeling I’ve gotten from many of Dr. Briggs’ posts.

    But to go back, suppose there WAS something we didn’t already know from the standpoint of “it’s obvious, everybody knows that” where statistical reasoning and analysis could shed light. How would Dr. Briggs proceed?

  8. Pingback: Twenty Tips For Interpreting Scientific Claims ...

  9. Rob Ryan,

    Everything is potentially harmful or hazardous. To some it’s something as innocent looking as a jar of peanut butter. People have died falling off of sidewalk curbs. Water is a poison if you drink too much of it. In fact, too much of anything is a poison. It doesn’t take much to find something wrong with anything. And that’s the problem.

    Too much ado is made over ridiculously small risks. Recently, a school near me was evacuated and a hazmat team called in (a busload, even) because someone broke a mercury thermometer. I’ve been exposed to far more than what can be found in the typical thermometer, and dentists used to regularly stick it in mouths, but when was the last case of mad-hatter’s syndrome that you know of?

    Is asbestos harmful? Maybe if you ingest or breathe it (for years, because you work with it), but a chunk of it sitting on a shelf isn’t particularly harmful. Same with lead.

    If it isn’t an obvious problem why does it need a solution?

  10. OK, so let’s say (hypothetically) that I agree with you. If that’s the case, then of what use is statistics and how is it used? This blog and the comments are relentless in criticism of alleged misuse of statistics, EXTREMELY light on what it’s properly used for and how to use it properly.

  11. Rob Ryan,

    Well, unfortunately statistics is easily abused. Apparently even Mark Twain noticed this 100+ years ago. However, one good use is when looking for a solution it could indicate heading in the right direction. It can help in forming a model. I’ve used it to form world models for robots. It can help in decision making. To name a few.

  12. Rob,

    “EXTREMELY light on what it’s properly used for and how to use it properly.” Not so, not so, not so.

    I have written post after post after post on predictive statistics, on the philosophy of probability, and on how to think about probability properly.

    I have the idea that some folks are after a formula or formulas which would function much like classical t-tests or Bayes factors. No such formula exists. That is the central lesson I preach. Each problem has to be attacked anew. Of course there are general guidelines and ways of doing predictive probabilities (look up posterior predictive distributions), but each situation has to be analyzed independently. Plus, as I’ve also pointed out many times, the software isn’t there, or isn’t in a form for public use yet. Much expertise is required to start. And then we’re still left with all that non-quantifiable business I keep going on about…

    This reminds me (again) I have to revamp my Classic Posts page to ensure all the right links are in the right place. I’m about a year behind.

  13. Rob,

    “Can’t happen you say? I counter with mesothelioma.”

    It’s completely possible to get mesothelioma with zero asbestos exposure and some people with very high exposures won’t get it.

    Even an order of magnitude increase in a tiny risk is still a tiny risk.

    When you start talking about lifetime risks below 1% anything less than an order of magnitude increase in the risk isn’t worth spending money on fixing.

  14. Agreed. You have done so. But dice, etc. are not what I’m getting at, nor is philosophy. You’ve repeatedly gone over how probabilistic and statistical reasoning are misused, but said very little on how they can be practically used. What is a practical (in a broad sense) question or problem upon which light can be shed by statistical analysis, and how is that light shed?

    I’m not a simpleton looking for “here’s the formula to use, good luck.” But a real example would be helpful.

    But I’ve been consistently reading your blog and the majority of the comments for a couple of years and the general view that I’ve developed is that the overarching message is (paraphrasing Larry talking to Bud in Kill Bill Volume 2). “What are you trying to convince me of? That statistics is as useful as an a _ sh_ _ e … right here (pointing at elbow)? Well, guess what. I think you just convinced me.”

  15. This is an interesting article when considered in conjunction with yesterday’s Hayek theme. The underlying theme of the Nature link is that the state is only limited by the quality of scientific advice the politicians are privy to and that if this advice was improved only good would come of it. This is the usual fatal conceit of the central planner. There is not only the limitation (pretence) of knowledge problem but there is also the inherent corruption of political power, the kleptocratic nature of the state. The answer is not better advice but a sharp reduction in the state’s ability to meddle.

    It is interesting that on the same page there was a link to this article:

    This is as sad as it is funny.

  16. “Bias is rife”: I see this in climate science frequently. The believers who come to skeptic sites will often start shouting “confirmation bias” and assume everyone on the blog is a conservative, free-market fanatic. I have repeatedly explained that bias works on all sides, including those who believe in climate change. This is a concept they just do not seem to grasp. It’s always the “other guy” who is biased, and they are pure as the driven snow.

  17. MattS, I wouldn’t be surprised if you’re correct with respect to mesothelioma, though I’ve heard differently. Sunburn and melanoma? You all have been diligent in showing where statistical reasoning (in your opinions) is not helpful. Where (again, in your opinions) IS it helpful?

    As I said in my original comment, I was merely using an example. You don’t like my examples. Give me one you like, or is the only thing an education in statistics (insofar as it’s presented utilizing Briggsian concepts) is good for developing skill in understanding why all statistical evaluations are wrong?

  18. Briggs, are you not confounding the different types of sampling randomization and replication ? In field sampling we can randomly select which sample instances go into the various treatments or randomly assign treatments to sample-instances. The replicates can be the N within treatment rather than the whole experiment repeated.

    In any case, I agree that the randomness merely minimizes / allows estimation of that aspect of variance & sample selection error/bias. The replication yields an estimate of within-treatment variance.

  19. Regarding your Sports Illustrated article, agreed; however there might be something else at work.

    SI is in the business of selling its expert analysis (i.e., predictions) of upcoming sporting contests for whatever the price of the magazine is. This expert analysis is avatared on their covers. When it turns out that their analysis (prediction) is no better than the average Joe’s, they can do one of two things:

    1) admit their lack of predictive analysis expertise, or

    2) let it be known that, yes, they really are great experts, but their expertise is so powerful that of itself it generates some strange jinx….

    This is known as turning lemons into lemonade.

  20. Rob,

    Statistics is useful in studying the behavior of large complex systems, particularly those resulting from the aggregated behavior of many independent components (population behavior). However, even in this, many overestimate what statistics can determine.

    When it comes to applying statistics to law / government regulation or to problems that cost money to solve there are plenty of problems out there that don’t require statistics to detect. When you have solved every last one of those, then we can reconsider using statistics in these areas.

  21. Rob,

    “Sunburn and melanoma?”

    Nope. While it is possible to associate risk factors with cancer statistically at the population level, even the link between smoking and lung cancer is not as strong as it is made out to be in the press or by anti-smoking activists. There are a significant number of people aged 90+ who were heavy smokers and never got lung cancer.

    There are no causal links to any cancer strong enough that doctors can look at a specific instance of cancer in a specific patient and determine what caused that specific cancer. Until the knowledge to diagnose the cause of specific instances of cancer is developed saying X causes cancer Y is at best disingenuous.

  22. MattS: Argh. I hadn’t intended to make this a discussion of whether smoking (first or second hand) causes lung cancer, asbestos exposure causes mesothelioma, or sunburn causes melanoma. Those were examples, you (all) have gone out of your way to convince me that statistics can’t help here. Great to know.

    In fact, what I wanted was a discussion of what statistical data and reasoning therefrom can tell us about anything. What I’ve gleaned from the comments is “nothing important about anything, i.e., if statistics can tell us anything, it’s so obvious we already know it.”

  23. @Rob: I like to bring up smoking/lung cancer because it is perhaps the most significant correlation ever found that wasn’t a bullet in the head. I bring it up because it is a 2×4 of reality. The correlation factor is somewhere between 22 and 40. The share of people who smoke is still 20%. Most of the people who smoke all their adult lives WILL NOT get lung cancer, and by most I mean > 90%.

    There are clear cases where statistics IS useful. If you are producing cans of RC Cola, you sample the cans to ascertain the quality of the batch. You don’t test every single can destructively. You might continuously check the fill point, diverting 1/1000, 1/10,000, or 1/100,000 of the fill material and analyzing it to avoid having to sample the cans as frequently. If you can test 100% at low cost, though, you do. 100% testing would include scales and optical scanners. The scale weighs each can and rejects anything that doesn’t fit weight specs. The optical scanner can be configured to check level and color.



    When we know what we want, we can look for it.

    This is how things become fubarred. My general rule of thumb is simple. If it says study and has a wee p-value, it starts out with a really damn low usefulness value. If the RR, HR, or OR is above 3, it is worth looking into, if only because such values appear so infrequently. Other than the Smoking/Lung Cancer connection, the only one I have ever seen worth anything is the Oral Cancer/Oral Sex link (RR > 9). If you perform oral sex on someone who has HPV, your chances of getting cancer are much greater. In Herr Briggs’s world and mine, this isn’t a big problem though.

    Ask the lead epidemiologist of a research institution if I am correct and there is a good chance he will tell you I am wrong. For me that old lady who has smoked a cigar every day and had 3 shots of jack to follow it and lives to 120 is much more important to evaluating the “TRUTH”, than the people who died. Figure out why she lived and you learn more than trying to find correlation for why people died. Wee P-Values are the study of death. The really useful information in my humble opinion are the non wee p values that get thrown away. The non wee p values are more likely correct about the non existence of a connection than the wee p values are about an existence.

    Sentences like that DO NOT cause the panties of many women to fly off.

  24. Rob: My intent was not to convince you that statistics cannot help there. It was to point out that if statistics can help there, it’s not in the way we use them now. Most studies start out with the supposition that these things cause cancer and go from there. It is a lot like climate change. Complex systems, with unknown variables, boiled down to a single value that tells us there is causality. That’s a recipe for failure, I think.

    Statistics may point us in a direction, giving us an idea of factors that may be important. It’s when the statistics become the end game that the problem arises. Getting a wee p-value is not the end, just the beginning. It seems like causality may exist, so we research further. Conversely, we may drop studying a factor if there seems to be no correlation or causality. It’s just that science and the government have turned statistics into “reality” when they are not.

    I definitely agree with Brad that we should be studying why people live just as much as we study why they die. One needs to study both life and death, not just death. What do centenarians have in common? It’s entirely possible this question is too complex to be handled by statistical analysis of lifestyles, and it’s possible that we might find patterns via statistics. Just as any other tool in science and math has limits, so do statistics. I think that science and politics have elevated said tool to a level of usefulness that is not real. (I am now picturing the exaggerated infomercial sales pitch for a gadget that “slices, dices, peels the food and cooks it.” There is no tool that fixes or creates everything, statistics included.)

  25. Rob,

    “In fact, what I wanted was a discussion of what statistical data and reasoning therefrom can tell us about anything. What I’ve gleaned from the comments is “nothing important about anything, i.e., if statistics can tell us anything, it’s so obvious we already know it.””

    No, your conclusion is backwards. It’s not that “if statistics can tell us anything, it’s so obvious we already know it” it’s that if you can’t figure it out without statistics it isn’t worth the effort to know it outside of academia.

  26. Pingback: Twenty Tips For Interpreting Scientific Claims ...

  27. I have one tip.

    Statistics doesn’t lie. People can only use statistics to fool you if you don’t know it. The more you know, the less you would need to rely on other people. You just might be able to enjoy many great philosophical or scientific papers published in reputable journals in the future.

    So, learn more about Statistics. Don’t try to learn Statistics from a blogger. Check out the following site – open course for free:

  28. Pingback: the Revision Division

  29. Sheri,

    I think JH is worried that if more people followed me, her occupation would be in danger. A natural reaction. But I’d keep her around for old time’s sake.

  30. Yes, even the blogger is a statistician. A good online course is more organized and rigorous in the way the contents are presented. I would not recommend any of the statistics blogs that I subscribe to. And I am not talking about online resources such as this. The online learning link I recommended in my previous comment is free!

    Briggs, I have no idea what your comments to Sheri were about. What were you drinking? Did you put the cork back in your half-drunk bottle and pop it in the fridge? Left-over wine can make great risotto.

  31. JH: It’s great that you learn by a rigorous and organized course. Not everyone learns best through that method. As long as the individual has the appropriate credentials (and even if he/she does not, if they make sense and adhere to the rules of math and science), I see nothing wrong with learning from a blogger. Maybe having taken organized, rigorous courses, I am more inclined to go with this more “laid-back” way of learning. Plus, I enjoy the comments, something I never saw in an academic setting.

    Everyone learns in their own way. Your courses are fine for those who choose to go with that method. For others, blogs may be the best choice.