How An AI-Machine Learning-Physics-Sociological-Whatever Model Can Be Good For One Man & Bad For Another

A model—which I do not distinguish from a theory—can be useful to one man and useless to the next.

“But Briggs, people don’t treat theories the same as models.”

Right: they do not. They should.

Suppose you are an academic in, oh, I don’t know, England. And you want to pander to your rulers. So you issue a model that says (as you told it to say, because all models only say what they are told to say) THE END IS NIGH.

Only the model turns out to be a complete bust. What you predicted not only did not happen, but something like the opposite did. Any resemblance between your model and Reality did not even reach the level of coincidence, because there is no resemblance between your model and Reality. Monkeys throwing flaming guano on keyboards would have made better predictions than your model.

Ignominy? Embarrassment? An outcome over which you should hang your head in shame and retire to a cave at the bottom of the sea to weep over your faults?

Not at all! For it turns out that your rulers, grateful to be needed, as your model implied, heap praise and love upon you. Your putrescence is not castigated, but rewarded! You are promoted. You are given a raise. You are lauded. Your critics cast into the outer darkness, where there is weeping and gnashing of teeth.

Meanwhile, Joe Bloke, like your rulers, also believed your model. He looked upon it and said, “Lo, the media has called this The Science, and I believe it is.” He quit his job, sold all his worldly goods and headed for the hills to await what you predicted was inevitable. Poor Joe.

The main point to this little tale I already gave: a model that is useful, even gloriously so, to one man, can be utterly useless, and even actively harmful, to a second.

No less important is this: accuracy, the degree to which the model matches Reality, is to a great degree irrelevant to a model’s usefulness.

The point of making a model is to gain some sense of the world. Most of us, rightly, ignore most models. But not all. Any model you favor will be used by you to make some decision, in what you do, or what you fail to do, in your thoughts or in your words (ahem).

It’s not that anybody, or rather everybody, takes a model’s output and treats it as sacrosanct. Each model user manipulates the output into something workable, something that makes sense to him.

So that if a model said, “The probability of X is p”, you might treat this as “X is probably going to happen; I gotta do something about X quick”. Whereas your neighbor might hear the same thing and say, “P? Only p? Pfaagh. I don’t care.” And he might say this even if X means a lot to him, which it might not.

We make use of models. Only rarely should we, or do we, take raw model output and use it directly. It should be, and usually is, filtered and manipulated by whatever decision or use we can get out of the model.

The little example I gave is not exaggerated. Take it seriously. It shows how models can be, and are, used across a full range of circumstance.

This is the answer, but for some it won’t seem satisfying.

Ideally, of course, a model would match Reality exactly, precisely, with no error. But this only happens in complete causal models, which are as rare as a Woke zombie with a sense of humor.

The vast, vast majority of models we use, in every field of science, are not causal but associational. They only announce things like “The probability of X is p”. Even if p is high and X happens (or p is low and X does not happen), the model has not matched Reality. It cannot. It can only give probabilities—which are often not explicitly or numerically stated, but which are there.

That means we can’t use matching Reality as a measure of verification. Something else is needed.

There is the sense, though, that if p is high and X happens (and vice versa), the model is better than if the size of p and X have nothing to do with one another.

This sense can sometimes be quantified, and when it can, it forms the basis of many (proper) “scores”, like the Brier, log-loss, continuous ranked probability score, and so on (look them up; I won’t belabor what “proper” means here).
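A minimal sketch in Python, with invented forecasts and outcomes, shows how two of these scores are computed:

    import math

    # Invented for illustration: the model's announced probabilities for X,
    # and what Reality did (1 = X happened, 0 = it did not).
    forecasts = [0.9, 0.2, 0.7, 0.1]
    outcomes = [1, 0, 0, 0]

    # Brier score: mean squared difference between forecast and outcome.
    # Lower is better; 0 means every forecast was certain and correct.
    brier = sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

    # Log-loss: minus the mean log of the probability given to what happened.
    # It punishes confident wrong forecasts without mercy.
    logloss = -sum(math.log(p) if y == 1 else math.log(1 - p)
                   for p, y in zip(forecasts, outcomes)) / len(forecasts)

    print(f"Brier: {brier:.3f}   Log-loss: {logloss:.3f}")

Notice both are one-size-fits-all: every user of the model gets graded by the same arithmetic, whatever his own stakes.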

All these scores have their uses, especially during model building: they are popular in “machine learning” and “AI” work. But it would be amazing, even astonishing, if these one-size-fits-all scores represented your score, which I call the Money Score (MS). Here’s how that works in a simple situation.

Suppose your doctor has a model for a Dread Disease, which you don’t want to have. He announces the model says you have a probability of p for the DD.

Now if p is “low”, you (if you are like many) will blow it off, and act as if you don’t have the DD. But if p is “high”, you will act like you do have it, at least in the sense of getting more tests, biopsies, and so forth.

If you don’t have the DD, and you blew off the model, all is well, and you benefit, at the least by saving yourself grief. If you do have the DD and you acted on the model’s “high” p, then you gain the advantage of earlier treatment and the like.

But if you don’t have the DD, and the model “p” was high and you thought you did, then you suffer angst, worry, and the pain and costs of those other tests or treatments. And, no, you do not now “know” you don’t have the DD because of the other tests. You would have been better off blowing off the model.

And if you do have the DD, and the model p was “low” and you believed you didn’t have the disease, then you again suffer in the obvious way.

This simple situation can be pictured easily:

                              Have the DD      Don’t have the DD
    Act as if you have it     gain b           lose alpha
    Act as if you don’t       lose beta        gain a

Those gains a and b, the losses alpha and beta, and the p at which you act (or not), you have to fill in. I don’t know what values you have. I’m not even sure I know my own: that would depend on the situation. Even stronger: Not everything can be quantified.

No two people will have the same set of values for (a, b, alpha, beta, p), except by coincidence. Which means, again, that a model that is good for one man can be bad for another.
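To see it in crude numbers, here is a sketch in Python of two men facing the same model and the same announced p. Every gain, loss, and threshold below is invented for illustration, with the labels following the table above; the expected-value reading is one standard way, not the only way, to turn (a, b, alpha, beta) into a decision.

    # Two men, one model, same announced p. All numbers invented.
    # a: gain from correctly blowing the model off
    # b: gain from correctly acting on it
    # alpha: loss from a false alarm; beta: loss from a miss

    def expected_value(p, a, b, alpha, beta, act):
        """Expected gain of a decision, given probability p of having the DD."""
        if act:  # behave as if you have it: gain b if you do, lose alpha if not
            return p * b - (1 - p) * alpha
        # blow it off: gain a if you don't have it, lose beta if you do
        return (1 - p) * a - p * beta

    def should_act(p, a, b, alpha, beta):
        # Acting beats not acting when p*(b + beta) > (1 - p)*(a + alpha),
        # i.e. when p exceeds (a + alpha) / (a + alpha + b + beta).
        return p > (a + alpha) / (a + alpha + b + beta)

    p = 0.3  # the model's announced probability of the Dread Disease

    men = {
        # Tests are cheap for him; missing the DD would be catastrophic.
        "Man 1": dict(a=1, b=50, alpha=2, beta=100),
        # Tests are costly and painful for him; his case of the DD is mild.
        "Man 2": dict(a=5, b=2, alpha=40, beta=1),
    }

    for name, v in men.items():
        print(f"{name}: act? {should_act(p, **v)}; "
              f"EV(act) = {expected_value(p, **v, act=True):+.1f}, "
              f"EV(ignore) = {expected_value(p, **v, act=False):+.1f}")

Run it and the first man profits handsomely by acting on the model, while the second would be punished for doing the same. Same model, same p: good for one man, bad for the other.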

This is why model evaluation is so difficult, and why models which look like bad jokes flourish and are defended.

Incidentally, if you work in modeling (such as “AI”), it behooves you to get to know the values (a, b, alpha, beta, p) of your users, or to demonstrate that one of the simple scores (like log-loss) you use approximates the MS well enough. Of course, today we only did the simplest kind of modeling, but the ideas here extend to all types.


5 Comments

  1. William Wallace

    So it was a good thing for Joe?

  2. Hagfish Bagpipe

    That’s your theory, Briggs. It may be true, or not. I may believe it, or not. I might be able to put it to good use, or bad, regardless. The only sure thing is I’ll make some dumb comment.

  3. Briggs

    Hagfish: Mission Accomplished!

  4. Incitadus

    Reality doesn’t have to be real, you just have to convince enough people that it is:
    the revelation of the method.

  5. Milton Hathaway

    Re “Scoring Rules”, the technical articles in Wikipedia are a constant reminder to me that run-of-the-mill experts are often lousy communicators. After reading and rereading the introductory paragraphs, my reaction was . . . huh?

    Maybe ChatGPT can do better? Or at least I can ask follow-up questions if I’m still confused.

    Me: What is a scoring rule?

    ChatGPT: A scoring rule is a function that assigns a numerical score or utility value to a forecast or prediction made by a model. It is used to evaluate the performance of forecasting models and to compare different models to determine which one is better. The scoring rule takes into account the accuracy and precision of the forecast and how close it is to the actual outcome. The goal of a scoring rule is to incentivize the forecaster to make accurate and precise predictions by penalizing them for incorrect forecasts. Scoring rules are widely used in statistics, machine learning, and decision theory to assess the quality of predictions and to make informed decisions based on them.

    If “Wikipedia Contributor” were a paid position, I’d advise them to find another job, and soon.
