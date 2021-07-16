The paper (more like a glorified note, really) “Models Only Say What They’re Told to Say” (by me) will appear in the Springer book Prediction and Causality in Econometrics and Related Topics shortly. I’m providing a PDF of the paper in advance. There is also a coronadoom example or two!
The paper has a fraction of math in it, but none anybody who paid attention in middle school would have trouble with. It is mostly, as with most of my professional work, a matter of philosophy.
Now on that subject, I want to clear up a misunderstanding some have about the necessarily true statement that “all models only say what they’re told to say.”
First, it is indeed necessarily true. The proof is in the paper, and I won’t bore you with it here.
Second, it is neither good nor bad that “all models only say what they’re told to say.” It is just The Way Things Are. Nor does it place any limitation on models or on our understanding of how the world works. Our understanding is models.
Nevertheless, it often serves as a pithy reminder to say “all models only say what they’re told to say” when people are being terrorized by a model, as they were during the (still lingering) coronadoom panic, and as they are in an increasing number of ways.
Models can be good or bad. The ones we’re terrorized with, and called “Denier!” for doubting, are the bad ones. Which is why it’s good to highlight that the model is only saying what its builders want it to say.
Here’s an example from the paper, adapted and modified from The Price of Panic.
Here is a press headline from a Minnesota news source on 13 May 2020: “Updated Model Predicts COVID-19 Peak In Late-July With SAHO Extended Through May; 25K Deaths Possible” [the source is listed in the paper]. This was of course produced during the coronavirus panic.
The article stated:
An updated model from the University of Minnesota and state’s health department is predicting that COVID-19 cases will peak in late-July with 25,000 deaths possible — if the stay-at-home order is extended until the end of May…In Scenario 5 [the model scenario relied upon by the government], the stay-at-home order is extended for all until the end of May. With that happening, the model predicts that the COVID-19 peak will happen on July 27, with the top intensive care units (ICU) demand being 4,000 and 25,000 possible deaths.
Another estimate, Scenario 4, predicts that if the stay-at-home order is extended by a month into mid-July, the peak would occur on July 13 with 3,700 as the top ICU demand and 22,000 possible deaths.
In previous briefings, Minnesota’s Governor Tim Walz had asked Minnesotans not to focus on specific numbers, but rather focus on when the peaks might occur. “Modeling was never meant to provide a number,” Governor Walz said on Wednesday. “It was meant to show trend and direction, that if you social distance you buy more time.”
This is false. Comments like it, made by numerous official sources during the panic, were common. They were all not only false, but misleading, too. That enforced stay-at-home social distancing works was an input to all of these models. We therefore cannot point to model output and say, at least with a straight face, “See? The model says stay-at-home social distancing works. Which is why we need to implement it.” This mistake was made countless times during the crisis. It is detailed at length in [16], including a discussion of how the World Health Organization made the same mistake, relying on a high-school science project to conclude social distancing worked (see [17]) when, as always, this was a premise of the model.
The models reported on here were built saying social distancing reduces death. This assumption was an integral part of the models. It was not a “discovery” of the models, it was a condition of them. The models had to say social distancing worked because they started with the premise social distancing worked.
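To make the point concrete, here is a minimal sketch. It is not any of the Minnesota models: it is a toy SIR epidemic in which every number (transmission rate, recovery rate, infection fatality rate, population, and the size of the distancing effect) is a hypothetical assumption. Distancing enters only as an assumed multiplier on the transmission rate, so whether “with distancing” beats “without” is settled by the sign of that input before the model is ever run.

```python
# Toy discrete-time SIR model (all parameters hypothetical).
# The "effect of distancing" is not discovered by the model: it is the
# assumed_effect input, which scales the transmission rate beta.

def simulate_sir(beta, gamma=0.1, ifr=0.005, population=1_000_000,
                 initial_infected=100, days=365):
    """Run daily Euler steps of an SIR model and return cumulative deaths.

    Deaths are approximated as ifr * (everyone ever infected).
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
    ever_infected = population - s
    return ifr * ever_infected


BASE_BETA = 0.25  # hypothetical daily transmission rate with no distancing

# Positive effect: distancing cuts transmission; zero: it does nothing;
# negative: it somehow increases transmission.
for assumed_effect in (0.4, 0.0, -0.2):
    with_distancing = simulate_sir(BASE_BETA * (1 - assumed_effect))
    without = simulate_sir(BASE_BETA)
    print(f"assumed effect {assumed_effect:+.1f}: "
          f"{without:,.0f} deaths without, {with_distancing:,.0f} with distancing")
```

Set the assumed effect to zero and the two runs are identical; make it negative and distancing “causes” more deaths. The toy model reports whatever effect it was given.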
You cannot “discover” stay-at-home social distancing worked via any model—though it may be discovered via after-the-fact observation. You had to have built in that possibility in the first place. You knew in advance that it worked because that’s what you told the model.
You cannot run the model, wait for the output, run to your Governor and say “The latest model says social distancing works.” If your Governor had any sense he would say, “Didn’t you write the model code? And didn’t the code say somewhere that social distancing worked?”
The rush to embrace these models as if they were oracular says more about the goals and desires of decision makers during the panic than it says about model making.
Incidentally, according to the CDC, as of 13 September, attributed COVID-19 deaths in Minnesota were 1,803, with the peak occurring in mid-May [18]. These represent all attributed deaths, including those where the individuals died of multiple causes. The models relied upon to make decisions therefore overestimated deaths by a factor of at least 12, an enormous and horrendous mistake. The restrictions in Minnesota did stay in place, but were interrupted by the Minneapolis riots, a time when social distancing was not observed. If the models had any bearing on reality, since social distancing did not obtain, the deaths should have been higher than 25,000.
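The “at least 12 times” figure can be checked with simple arithmetic. The sketch below assumes the comparison is between the two death projections quoted in the article (22,000 for Scenario 4, 25,000 for Scenario 5) and the CDC count of 1,803 cited above; the smaller projection gives the lower bound. It compares projections with a count taken on a single date, so it is a rough ratio rather than a formal error measure.

```python
# Projections quoted in the news article versus the CDC-attributed
# Minnesota death count cited in the text.
ATTRIBUTED_DEATHS = 1_803
PROJECTED_DEATHS = {"Scenario 4": 22_000, "Scenario 5": 25_000}


def overprediction_ratio(projected, observed):
    """How many times larger the projection was than what was observed."""
    return projected / observed


for scenario, projected in PROJECTED_DEATHS.items():
    ratio = overprediction_ratio(projected, ATTRIBUTED_DEATHS)
    print(f"{scenario}: {projected:,} projected / {ATTRIBUTED_DEATHS:,} "
          f"attributed = {ratio:.1f} times")
# Scenario 4: 22,000 projected / 1,803 attributed = 12.2 times
# Scenario 5: 25,000 projected / 1,803 attributed = 13.9 times
```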
You can read the rest in the paper.
Dear Briggs. Thank you and God Bless you. It is to be wished that 200 million Americans will read, understand and remember this.
Remember, propaganda is models. Models say what they are told to say.
(Just an aside, when I read the title of the post, I first thought of airhead models in tight dresses that say what they are told to say. I think it applies to these models too.)
Thank you Briggs! I always enjoy, and learn with, your writings.
So Mark Twain was almost right. There are actually FOUR kinds of lies: white lies, damnable lies, statistics, and MODELS.
This is a silly notion. It’s a tautology. Yes, we input equations with parameters and run a program (“model”) and we analyze the output. But that doesn’t mean we know the output when we run the model. No one says “let’s assume that social isolation saves lives and make a model using that assumption to verify that isolation saves lives. Then we’ll enforce an edict to isolate based on the model.” If anyone believes that such a thing happened, there’s no hope for any common ground.
We have models for, for example, gravitational interaction and what rocket thrust, amount of fuel, etc. are needed to put a satellite in a particular orbit, or send a bunch of experiments to Mars. They’re based on a more fundamental set of models of how we’ve observed gravitational forces to behave, how rocket thrust works in various environments (atmosphere, space), aerodynamic drag, etc.
We (a very broad use of “we”; I certainly don’t do this) see what is required to achieve the desired objective. It’s based on our level of confidence in our physics understanding.
So, what about another example where there’s no objective? We understand a lot (certainly not everything) about fluid dynamics. We want to understand what happens to sediment when it washes out of a river into the ocean after a flood. How far out will it go? How will it disperse laterally? What will be its vertical profile? Yes, the equations and the parameters are put in, but the model is run to see what our best understanding of quantities, flow rates, and physics will lead to. Suppose we did this (again, “we” does not mean “me and my team”) and someone asked, “Isn’t there a place in the code/model where you tell the model that the silt will move outward approximately x meters, and that it will be detectable approximately y meters east and z meters west of the river/ocean boundary?” The answer would be “No. We told the model about the river flux, the density of water, the specific gravity of the silt particles, the movement of tides, our best understanding of the physics involved, etc. If we knew enough to plug in x, y, and z, we’d not have bothered to run the model. Of course, we’ll still have to validate it against the observed facts. If it’s clearly erroneous, we’ll know that the properties and parameters were incorrect or that we don’t sufficiently understand the physics of the situation.” That’s very far from “the model only told us what we told it to tell us.”
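The commenter’s sediment example can be sketched in miniature. Assume, hypothetically, a single silt grain released at some depth into a steady, uniform offshore current, with no turbulence, tides, or flocculation. Stokes’ law gives its settling speed from grain size, specific gravity, and water viscosity, and the drift distance follows from that. No distance is typed in anywhere; it is computed from the inputs. Every numerical value below is an illustrative assumption, not a measurement.

```python
G = 9.81  # gravitational acceleration, m/s^2


def stokes_settling_velocity(diameter_m, specific_gravity,
                             fluid_density=1025.0, viscosity=1.08e-3):
    """Terminal settling speed (m/s) of a small sphere in the Stokes regime.

    Defaults are assumed seawater density (kg/m^3) and viscosity (Pa*s).
    """
    particle_density = specific_gravity * 1000.0  # relative to fresh water
    return ((particle_density - fluid_density) * G * diameter_m**2
            / (18.0 * viscosity))


def settling_distance(current_speed, release_depth, diameter_m, specific_gravity):
    """Horizontal drift (m) before the grain reaches the bed, assuming a
    steady uniform current and no turbulence."""
    w = stokes_settling_velocity(diameter_m, specific_gravity)
    time_to_settle = release_depth / w  # seconds
    return current_speed * time_to_settle


distance = settling_distance(current_speed=0.2, release_depth=5.0,
                             diameter_m=20e-6, specific_gravity=2.65)
print(f"A 20-micron grain drifts roughly {distance / 1000:.1f} km before settling")
```

Halving the grain diameter quarters the settling speed and so quadruples the drift, a result nobody wrote into the code directly.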
“No one says “let’s assume that social isolation saves lives and make a model using that assumption to verify that isolation saves lives. Then we’ll enforce an edict to isolate based on the model.” If anyone believes that such a thing happened, there’s no hope for any common ground.”
Oh YES we can! And they do! Just ask any biased polling group or journalist, and then recall that scientists are no different. Even if the scientists want to be honest, the people paying them for results are not, and those people are the same ones paying the journalists and the pollsters and the marketing firms and the legalese departments.
Naturally no-one is saying absolutely everyone is out there to screw you and rig reality. But it exists, and it happens, and there’s plenty of evidence that this is the case here.
Nobody has a reason to rig a rocket model because there’s no incentive to make it fail at doing what it is designed to do. But someday, if society “progresses” crazy enough and the environmentalists believe all those rockets up in space are endangering the planet, which is killing the polar bears, then you will start to see rigged models designed to reduce the use of rockets and demonstrate that rockets are a danger to our planet and our health. The gravity, thrust, and fuel will be irrelevant to putting the rocket up there, except where they can feed the outcome that the rocket going and being up there is bad for us.
Johnno, you’re suggesting that polling groups doing polls and journalists writing articles or doing “investigative reporting” are analogous to the process Dr. Briggs refers to and I mentioned? That doesn’t make sense. Sorry, I don’t believe that the pols go to epidemiologists and say, effectively, “we need you to make a model that shows that isolation saves lives. Payment for the model is contingent on providing the results we described.” I don’t put any group on a pedestal but, without evidence, I’m not willing to accept that.
No one says “let’s assume that social isolation saves lives and make a model using that assumption to verify that isolation saves lives. Then we’ll enforce an edict to isolate based on the model.”
No one says this but they do this. Certainly as our host points out and I can verify, officials in Minnesota did exactly this. At every point their analysis was based on their precious models, but at every stage their models were wrong. Walz claims success since his policies “allowed us to” do better than his models, but this assumes that the models were an accurate reflection of what would happen without his policies (even though they weren’t accurate in the scenarios best matching his policies!) In his press conferences he constantly pointed out whether a model predicted a spike, or whether it said that his new policy would instead save lives. This is exactly how policies were actually carried out. He may not have said “we will do this because our model says it will save lives, when it was created with the assumption that doing this will save lives” but that’s what he actually did.
We have models for, for example, gravitational interaction and what rocket thrust, amount of fuel, etc. are needed to put a satellite in a particular orbit, or send a bunch of experiments to Mars. They’re based on a more fundamental set of models of how we’ve observed gravitational forces to behave, how rocket thrust works in various environments (atmosphere, space), aerodynamic drag, etc.
You are making a common error of physical scientists: because we can discover regular physical laws that allow us to predict what happens in certain constrained situations, and so make good models there, you assume all models must be built on a similar foundation.
But consider what would be necessary to accurately model the spread of a new disease from physical laws: You would have to know how people gather or stay apart, including in response to new policies, news of disease, supply shortages from the disease, etc. You need to know how much distancing actually affects the spread of the disease, which will depend on whether people are outside or inside, ventilation, the type of activity they are involved in, their natural resilience against the disease, etc. If you throw in masks, you must know the material in the mask, how properly the mask is being worn (is it a snug fit, is it fully covering the nose and mouth, etc.), how often people will get sick of wearing them and sneak them off, etc., and that’s without getting into the properties of the disease itself. And all this must be done for a disease which, at the time the models were created, had barely anything known about it (even whether it was properly airborne, spread through droplets from sneezes, or primarily spread in some other fashion was unknown).
But we haven’t even gotten into the effects of increased hospital demand, disruptions to supply chains, public panic, depression, etc.
There is no set of laws that will let you accurately determine the behavior of an entire state under all these factors. So you either pretend those factors don’t exist, which gives a model so removed from reality as to be useless (kind of like saying “we will use this model that says how objects move in a weightless vacuum” to try to predict the motion of an underwater object in a current surrounded by schools of fish), or you must arbitrarily decide that your measures will have a certain preventative effect (the IHME explicitly did the latter with respect to masks).
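The point about arbitrarily assumed preventive effects can be illustrated with the classic SIR final-size relation, z = 1 − exp(−R z), where z is the fraction of the population ever infected and R is the effective reproduction number. In the sketch below, which is hypothetical and not a reconstruction of the IHME model, an intervention is represented only by an assumed fractional cut in transmission e, so that R = R0 (1 − e). Since nobody knew e, the projected attack rate can be made to range from most of the population to essentially nothing by the choice of that one number.

```python
import math


def final_attack_rate(r0, assumed_effect, tol=1e-12, max_iter=10_000):
    """Fraction ever infected from the SIR final-size relation z = 1 - exp(-R z).

    R = r0 * (1 - assumed_effect), where assumed_effect is the (hypothetical)
    fractional reduction in transmission credited to an intervention.
    """
    r_eff = r0 * (1.0 - assumed_effect)
    if r_eff <= 1.0:
        return 0.0  # deterministic SIR: no large outbreak when R <= 1
    z = 1.0  # iterating from 1 converges down to the positive root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r_eff * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


R0 = 2.5  # hypothetical basic reproduction number
for effect in (0.0, 0.2, 0.4, 0.5, 0.7):
    share = final_attack_rate(R0, effect)
    print(f"assumed effect {effect:.0%}: {share:.0%} of the population infected")
```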