Submitted for your approval, yet another paper. On Probability Leakage, posted at Arxiv. Once you, my beloved readers, have had a go with it, I’ll incorporate your comments in an updated version and send it to a peer-reviewed journal, because there is no better guarantor of truth than peer review.

Only a sketch of the paper is given here. Please read it before commenting: it’ll probably save me re-typing what’s already there.

**Abstract**

The probability leakage of model M with respect to evidence E is defined. Probability leakage is a kind of model error. It occurs when M implies that events, which are impossible given E, have positive probability. Leakage does not imply model falsification. Models with probability leakage cannot be calibrated empirically. Regression models, which are ubiquitous in statistical practice, often evince probability leakage.

**Definition**

The exact definition is this: we model observables y via some model M, possibly conditional on explanatory data x and indexed on (unobservable) parameters. We can derive a posterior distribution on the parameters given the old data (call it z) and assuming the truth of M. Ordinary (and inordinate) interest settles on the posterior of the parameters.

The posterior, or functions of it, are not observable. We can never know whether the posterior says something useful or nonsensical about the world. But because we assume M and we have seen z, it logically follows that the posterior predictive distribution p(y|x,z,M) exists (this “integrates out” the parameters and says something about data not yet seen). This distribution makes observable statements which can be used to verify M.

Now suppose we know, via some evidence E, that y cannot take values outside some set or interval, such as (y_{a},y_{b}). This evidence implies Pr(y < y_{a} | E) = Pr(y > y_{b} | E) = 0. But if, for some value of x (or none if x is null), Pr(y < y_{a} | x, z, M) > 0 or Pr(y > y_{b} | x, z, M) > 0, then we have probability leakage, at least with respect to E.
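As a minimal sketch of this definition, assume the predictive distribution is normal with a known mean and standard deviation. The numbers below are hypothetical and not taken from the paper, and the function name `probability_leakage` is only illustrative. Leakage is then the predictive mass that falls outside the interval E allows.

```python
from statistics import NormalDist


def probability_leakage(mean, sd, lower=None, upper=None):
    """Mass a normal predictive places outside the E-allowed interval (lower, upper).

    A bound of None means E places no restriction on that side.
    """
    predictive = NormalDist(mu=mean, sigma=sd)
    below = predictive.cdf(lower) if lower is not None else 0.0
    above = 1.0 - predictive.cdf(upper) if upper is not None else 0.0
    return below + above


# Hypothetical predictive: mean 1, sd 2, with E saying y cannot be negative.
leak = probability_leakage(mean=1.0, sd=2.0, lower=0.0)
print(round(leak, 3))  # 0.309
```

With no bounds supplied the function returns 0, which matches the case where E rules nothing out.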

The probabilities from the predictive posterior are still true, but with respect to M, z, and x. They are not true with respect to E if there is leakage. This probability leakage is error, but only if we accept E as true. Leakage is a number between 0 (the ideal) and 1 (the model has no overlap with the reality described by E).

**Model Falsification**

The term falsified is often tossed about, but in a strange and loose way. A rigorous definition is this: if a model M says that a certain event cannot happen, then if that event happens the model M is falsified. That is, to be falsified M must say that an event is impossible: not unlikely, but impossible. If M implies that some event is merely unlikely, no matter how small its probability, then M is not falsified if this event obtains. If M implies that the probability of some event is ε > 0 then if this event happens, M is not falsified, period.

Probability leakage does not necessarily falsify M. If there is incomplete probability leakage, M says certain events have probabilities greater than 0, events which E says are impossible (have probabilities equal to 0). If E is true, as we assume it is, then the events M said are possible cannot happen. But to have falsification of M, we need the opposite: M had to say that events which obtained were impossible.
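The asymmetry between leakage and falsification can be sketched in a few lines. This is a simplification: it treats "M says the event is impossible" as "the observed value lies outside M's support", and both supports below are hypothetical.

```python
def is_falsified(support, observed):
    """M is falsified only if an observed value lies outside M's support."""
    low, high = support
    return not (low <= observed <= high)


# A normal M allows every real value, so no observation can falsify it,
# however much probability it leaks onto values E rules out.
normal_support = (float("-inf"), float("inf"))

# A hypothetical M whose support is too narrow can be falsified.
narrow_support = (0.0, 5.0)

print(is_falsified(normal_support, 7.0))  # False
print(is_falsified(narrow_support, 7.0))  # True
```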

Box gave us an aphorism which has been corrupted to (in the oral tradition), “All models are wrong.” We can see that this is false: all models are not wrong, i.e. not all are falsified.

**Calibration**

Calibration has three components: probability, exceedance, marginality. All three are proved impossible if there is probability leakage. If M is to be evaluated by a strictly proper scoring rule, the lack of calibration guarantees that better models than M exist.

**Example**

Statistics as she is practiced—not as we picture her in theoretical perfection—is rife with models exhibiting substantial probability leakage.

Regression is ubiquitous. The regression model M assumes that y is continuous and that uncertainty in y, given some explanatory variables x, is quantified by a normal distribution the parameters of which are represented, usually tacitly, by “flat” (improper) priors. This M has the advantage of mimicking standard frequentist results.

…The logical implication of M is that, for these values of x, there is about a 38% chance for values of y less than 0 (impossible values) at Location A. Even for Location B, there is still a small chance, about 2%, for values of y less than 0 (impossible values).
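To get a feel for these numbers, here is a rough sketch that treats the predictive as exactly normal. That is an assumption: under flat priors the exact predictive is a Student t, which is close to normal only for reasonably large samples. For a normal predictive, Pr(y < 0) depends only on the ratio of the predictive mean to its standard deviation, so the leakage figures can be inverted to show how few standard deviations separate the predictive mean from zero. The helper name `standardized_distance` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, mean 0 and sd 1


def standardized_distance(leakage):
    """Return mean/sd such that a normal predictive has Pr(y < 0) equal to `leakage`.

    Uses Pr(y < 0) = Phi(-mean/sd), so mean/sd = -Phi^{-1}(leakage).
    """
    return -std_normal.inv_cdf(leakage)


ratio_a = standardized_distance(0.38)  # Location A, about 38% leakage
ratio_b = standardized_distance(0.02)  # Location B, about 2% leakage
print(round(ratio_a, 2), round(ratio_b, 2))
```

Under this approximation the Location A predictive mean sits only about a third of a standard deviation above zero, while Location B sits roughly two standard deviations above it.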

**Conclusion**

An objection to the predictive approach is that interest is solely in the posterior; in whether, say, the hypothesis (H) that absentees had an effect on abandonment is true. But notice that the posterior does not say with what probability absentees had an effect: it instead says *if* M is true and given z, the probability that the parameter associated with absentees is this-and-such. If M is not true (it is falsified), then the posterior has no bearing on H. In any case, the posterior does not give us Pr(H|z), it gives Pr(H|M,z). We cannot answer whether H is likely without referencing M, and M implies the predictive posterior.

The area to the left of the vertical line represents probability leakage. The “normal model” says our uncertainty in y is characterized by a normal distribution. The “Location A” and “Location B” curves are from a larger regression model in which one regressor is a categorical variable indicating one of two locations.

Arxiv is great for previewing papers, because I just recalled I stupidly left out some clarifying information that turns everything into delightful mathematical objects (for those who like those kinds of things).

According to E, y lives on some set S_{1}, but if there is leakage M says y lives on S_{1} ∪ S_{0}. That is, Pr(y in S_{1} | M) < Pr(y in S_{1} | E). The inequality is strict. This only implies that there is at least one element of S_{1} in which the probability of y taking that value is different conditional on M and E. Of course, it might be that M and E give different probabilities for every element in S_{1}; but this need not be so. Lastly, if we knew which elements differed, we would of course not have used M. Plus, in many situations, E does not give enough information to say what the probabilities of each element are; it only announces which sets have probability 0.

OK. I’ll bite. What exactly is wrong with using a normal distribution in a limited range if it fits the data (assuming normalization)? How is it any different than using a straight line approximation for the (more or less) linear portions of a sinusoidal curve? I generally use discrete data but I do have one variable where the density plot is quite similar to ‘A’. It would be reasonable to use the normal distribution as an approximation of the actual density for values above x=zero in this case.

DAV,

A normal was used in the (very common) example, with the results seen. What’s wrong is the enormous leakage, meaning *any* decisions based on this model will be poorer than they would have been using a model with no leakage. That was proved by Schervish in his 1989 scoring paper, which proves that calibrated models do better regardless of loss function; and by the proof above that models with probability leakage are not calibrated.

I’ll be visiting the idea of “approximation” soon.

You touched on something that I would like to hear more about…

A model is falsified if P(y|x,m)=0 and P(y|x,E)>0.

As the normal model is ubiquitous, and its range is always infinite, it is not possible for P(y|x,m) to equal zero. You may say that it is effectively zero, but non-zero nonetheless. E.g., models of daily stock market volatility say that a -20% day is a once in a billion year phenomenon. But of course that happened on 10/19/87.

I think I am more in line with DAV on this one. Without context, your paper fails to impress an old electrical engineer who routinely models circuit performance with truncated Gaussian distributions with NO appreciable error (e.g. amplified thermal noise hitting a well designed Analog to Digital converter).

Doug M,

I address this in the paper.

Bill S,

You are actually agreeing with me. And your example is perfect to demonstrate what I mean.

First, I do not claim that the normal model never fails to provide a reasonable approximation. I claim that it often does. I give an example of a common situation—these fill soft science journals, and are used in, e.g., marketing, advertising, and on and on—of a normal model with horrible performance, but which looked good from all classical validation measures.

Second, a truncated normal is not a normal. A truncated normal (truncated to the known bounds of the apparatus under consideration) will have no probability leakage. This is a good thing.

Third, physical situations, for lots of good theoretical reasons, often are well modeled, or well approximated, by normals. But they do well because of another reason you suggest: physical situation models are usually checked against real observations. Many or most models are not.

Fourth (update), this paper does not just refer to normals, but to models of any kind. Normals are just the most used.
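The point about the truncated normal can be sketched numerically. This assumes a normal with a hypothetical mean of 1 and standard deviation of 2, and hypothetical known bounds of 0 and 10; the helper `truncated_cdf` is written out by hand for illustration rather than taken from any library.

```python
from statistics import NormalDist


def truncated_cdf(y, mean, sd, lower, upper):
    """CDF of a normal(mean, sd) truncated to the interval [lower, upper]."""
    base = NormalDist(mu=mean, sigma=sd)
    if y <= lower:
        return 0.0
    if y >= upper:
        return 1.0
    # Renormalize by the mass the untruncated normal puts inside the bounds.
    mass = base.cdf(upper) - base.cdf(lower)
    return (base.cdf(y) - base.cdf(lower)) / mass


mean, sd, lower, upper = 1.0, 2.0, 0.0, 10.0  # hypothetical values
base = NormalDist(mu=mean, sigma=sd)

# Leakage = probability placed outside the bounds known from E.
plain_leak = base.cdf(lower) + (1.0 - base.cdf(upper))
trunc_leak = (truncated_cdf(lower, mean, sd, lower, upper)
              + (1.0 - truncated_cdf(upper, mean, sd, lower, upper)))

print(round(plain_leak, 3), trunc_leak)
```

The truncated version leaks nothing by construction, provided the bounds are the right ones. Truncating at the wrong values trades leakage for the risk of falsification.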

This is more like a definition for *bad modeling* than *model falsification*. Bad modeling! One is supposed to build a model M based on evidence E. The leakage problem as defined is easily avoidable.

BTW, why is the word “leakage” used?

JH,

Your comments are somewhat confused. Perhaps another reading of the full paper will help you?

The definition of falsification is, in fact, rigorous. Have you a counter argument to suggest why this is not the definition of falsification?

As you read, falsification is only with respect to some external evidence E. Too, most probability models in use are not falsified, so this is not a good definition of bad modeling.

You said the leakage is “easily avoidable.” One would think so, but it is not avoided and I would wager that a good many regression models you yourself have used (in teaching and consulting) suffer from it. Care to bet?

Pardon my ignorance, but wouldn’t a Bayesian modelling approach be immune to this particular issue?

Toby,

Everything here is written from a Bayesian perspective, so no.

I am not sure what you mean by external evidence E? Is there so-called internal evidence?

Are you saying that the *some external evidence E* is obtained *after* you build the model M? But that’s not what your example in your paper seems to be illustrating.

Give me a reasonable scenario in which the definition is applied to conclude that a model M is falsified.

Your model M in your example, assuming you use a linear normal model, says that the measure of abandonment can be negative, but the data evidence E indicates it can’t be. So M will never be falsified based on your definition?

Also do you come up with the model before or after the measurement of abandonment (E) is defined and observed? The answer is important because, to me, there is no point talking about falsification or misspecification of any model formed before data is collected.

Let me rephrase bad modeling as *model misspecification*. If I have data evidence E as described in your example, fitting a normal linear regression is a case of model misspecification. Hence Bill suggested a truncated normal.

I can also fit a gamma distribution (a generalized linear model) to the abandonment data. In that case, this gamma model M says that the abandonment measure can only be positive, and so does the data evidence E. Based on your definition of falsification, the gamma model M isn’t / cannot / will never be falsified? It’s not false? We know that this M might still be a misspecified model and might not fit the data well.

No need to bet. I admit to my laziness because I think trying to explain it to my students would cause more confusion instead. Still, the kind of misspecification in your example is easily avoidable.

Please don’t ask me to read the paper again. How about you explain it in a different way?

Darn typos. Ah…

Hey, Mr. Briggs, what are those \rightarrow convergence thingys in Section 2.3 Calibration? They don’t look Bayesian to me. Who interprets probability via the Law of Large Numbers?

Hahaha…

Briggs,

Congratulations on, as always, a thoughtful and clear paper. My only substantive comment is that you write off the continuous model /discrete data leakage issue–“Since all real-world measurements known to us are discrete and finite, yet we so often use continuous probability models to represent our uncertainty in these observables, we must turn a blind eye towards leakage of the kind just noted.”–and I would like to know more! I think this is an interesting issue in terms of prediction.

Often, models we know are wrong do better whether because they are more parsimonious, shrink more, etc. For instance, in some applications of conditional class probability estimation (in statistical learning), restricting the estimates to be in, say, {0.05, 0.10, …, 0.95} can provide better estimates than using estimates in [0,1] even though the probabilities themselves are in [0,1]. Perhaps an analogue in reverse is happening here: by using a necessarily wrong but parsimonious parameterized continuous distribution, we do much better from a predictive standpoint than trying to estimate the actually discrete distribution. Thus, perhaps our continuous and hence “fully leaked” model is actually good. If this is the case, maybe leakage and thus lack of calibration is not so bad (at least this kind of leakage; clearly your call center end-point leakage is pretty bad!). Anyway, a further investigation of continuous/discrete could be interesting.

Incidentally, if we are going to stick with the end-points only, how about the case where you know endpoints exist but you do not know what they are (ie, E tells you there are endpoints but does not identify y_a and y_b)? Again, it could be that by estimating them you do worse than if you pretended they did not exist and used the model. I suppose one response is that this and the above are unlikely to be true asymptotically.

One other semi-substantive comment. If you are going to submit this for truth-guaranteeing peer review, I think there needs to be more of a “so what”. I think some of this is already in there but not presented with as much oomph as necessary: put it all over the abstract, intro, and conclusion! Perhaps, for impact, you may want to get some fMRI data and show that leakage causes thesists’ brains to be squishier ;-)!

A couple of minor points:

[1] Typos: the R’s are different in the third line of Section 1; “is x is” on page 4, “It well to point out” on page 7, “T-distribution” or “t” distribution on page 7 (depends whether you mean the whole vector or a scalar), why residuals in quotation marks on page 7?

[2] Stylistic question: Why Pr and not \mathbb{P}?

[3] What does null x mean? Integrate over the distribution of x? But that is not given. I think it means as if we did not know any x at all (and hence assumed y~N(mu,sigma^2)), but I was not totally clear on this.

[4] I am somewhat confused about the notation at the top of 2.3. Are these arrows denoting n -> \infinity where n is the size of (x_{old}, y_{old}) and what exactly are Q_i and P_i (if not Q and P conditional on x=x_i)? If not, then I am confused why you sum i=1…n.

[5] Maybe plot some of raw data or the model fit in Section 3?

[6] A quibble with the top of page 8: H_{\theta} is assumed entirely unconditional but isn’t there always an implicit condition on the observed data and model (ie, z and M)? I suppose it depends who you ask but I would tend to say so.

Again, thanks for a nice paper and for your daily work on the blog!

A minor quibble regarding “Of course, E could be modified so that it admits of the continuity of y, but is it still true that the probability of actual observable events (in real life applications, with respect to any continuous M) is 0.”:

Wouldn’t it be better, in light of the precision used to measure, regardless of the continuity or otherwise of what is being measured, to give the label “actual observable events” to things like “temperature observed between 35.05 and 35.15”? There doesn’t seem to be any practical difference between modifying E to admit the possibility of continuity of y, and simply being honest about what was actually observed.

Will study and get back to you, but my first impression is that the paper is EXCELLENT!

Reminds me of a conversation (one-sided mainly) that I had with Dr. David B. Duncan (now deceased) at my parents 50th anniversary party in Carmel Valley. Duncan was of course the author of the Duncan Multiple Range Test, a method of comparing distributions. He pointed out to me, with great amusement on his part (and confusion on mine) that such comparisons (models) inherently contain impossible outcomes.

He drew it all out on a napkin. Which I subsequently misplaced. You never know what you’ve got ’til it’s gone.

Jonathan D,

It is a question of verification. You may claim (in your E) that temperatures exist between 35.05 and 35.15 (which, given the evidence we have about statistical physics, they surely do, but they do not take values in the continuum; ultimately, temperature takes only discrete values) but you will never be able to verify any statement you make.

Let me give you a clearer example, one often found in business. Our y is a “Likert scale”, a 1 – 5 (or 1 – 7) response on some question where higher indicates better. This y is often used in regression models. All attempts to make people stop this are futile. The argument I hear from statisticians is that “All I want to know about is the parameters anyway, so these models are close enough.” But we see (in the paper) that this argument fails.

Anyway, to your question, while there are surely attitudes in the “gaps” (between, say, the 1 and 2), we will never know if we are right if we create a model which predicts, say, a y = 1.5.

But what we can and should do is model the actual probability of seeing the only answers we can (and will) observe.

B,

The typos were placed there by my enemies. Curse those scoundrels!

Quite right about the lack of a clear motivation. Not enough words on why this is bad. But for now I’ll say it in one: gambling. Anybody who makes bets (i.e. decisions) on models with probability leakage is setting himself up for a loss.

The question of how we know about the endpoint values is excellent. I’m writing something separate about that. Turns out to be a very interesting problem. Short answer: not all probabilities are quantifiable.

Yes, models with leakage can be good, in some sense. We have seen that not all models with leakage are falsified. So it might be the case that the leaky M before us is the *best* we can do.

Thanks very much for this close reading. This has been enormously helpful to me. Oh, I find $\Pr$ more charming than $\mathbb{P}$: the former is less easily mistaken for a distribution function.

Uncle Mike,

Thanks!

JH,

Good comments on the calibration. Yes, these are (sometimes) frequentist concepts. So these results show that frequentists who resist the Bayesian urge will suffer loss if their models have leakage. Of course there is also a large literature on Bayesian calibration (esp. from Dawid and others).

The leakage I mention might be easily avoided, but it isn’t. This is the point: it is not avoided. People do not even think there is a problem. I have shown there certainly is. You say you don’t want to tell your students of this type of error to avoid confusion. Yet I wonder how many of these students will go out and incorrectly use regression. Most of them? All of them? How many of them know how to fit a gamma generalized linear model?

I had thought the falsification definition was clear, but apparently not. If a model says that a certain event cannot happen, yet we eventually observe the event, then the model is falsified. Simple as that. The regression model M for abandonment is not falsified merely because it predicts negative abandonment. I actually discuss this in the full paper, which makes me suspect (I am surely wrong) that you might not have read this part.

Bill’s truncated normal can easily be falsified if Bill truncates at the wrong values, such that his model covers too narrow a range.

It indicates both Bayesian and frequentist approaches have strengths and weaknesses. It seems to me that a modern statistician should know both methods.

Whether students know how to fit a GLM or GLNM is irrelevant. The chance is that this paper will be refereed by academic statisticians. They are sharp, and want to know what they have learned from the paper.

One can only hope to have a kind referee like *B*.

No, not that your definition is not clear; it’s that your abandonment example points to the intuition: a bad model can result in a large probability leakage.

The example is supposed to tie all your topics indicated by section heads together, isn’t it? So your definition is more like one for bad modeling.

Ha, Bill’s truncated normal! It is actually Tobit’s.

Yes, the truncation might be placed at the wrong values. (Check out the Tobit model.) For the abandonment (y) example, I know for certain that y is positive based on how y is obtained. I’d just truncate at the lower limit of 0 and place no upper limit. There, a model that allows all positive values. So instead of saying that the model can be easily falsified, I’d say one can easily find a model that cannot be falsified; see also *B*’s comments.

So for an adequately postulated model, how can one compute the probability leakage? If you define a measure, you’ll have to be ready to tell how it can be calculated or estimated using the sample.

For the calibration section, I’d suggest that you read the paper about strictly proper scoring rules by Tilmann Gneiting (remember this paper? No, the paper didn’t answer the questions I asked before.), in which the same notations of Q and P are used. Note how they start with a theoretical definition of a strictly proper scoring rule; and then the convergence is presented since a strictly proper score is to be defined for/applied to samples.

JH,

Pretty much everything you have said here is either false or confused.

Probability cannot be defined via either frequentism or Bayes: only one of these (or perhaps neither) is right. To say that both have “strengths and weaknesses” is false. If Bayes is true, to the extent that frequentist methods have strengths it is when they approximate Bayesian truth. If frequentism is true, to the extent Bayes methods have strengths it is when they approximate frequentist truth. If both are false, then we should be busy finding the true definition. Since I claim, and believe I have proved, that only Bayes is true, that is what I advocate using.

The definition of falsifiable is perfectly clear. If you cannot understand it, well, I’m sorry about that.

You seem to be arguing that because you, JH, know how to make a good model that probability leakage is not problematic or does not exist. This is fallacious. My claim is that, very often, models with large probability leakage are used in actual practice. I further claim that many who use these poor models do so because they feel they are “close enough.” I am showing why they are not close enough. You are claiming that because better models exist that poor ones do not. This is very strange. In order to refute my claim, you must show that models with leakage are not in fact in use. Good luck.

You have also admitted that you often use models that evince probability leakage so as not to “confuse” your students. Students will go out and use bad models because they have learned no alternatives.

You asked for an example of a falsified model, and I suggested that Bill’s model, if truncated incorrectly, is easily falsified. This is a simple mathematical demonstration (a proof, if you like) of falsification. You don’t accept this simple claim because, again, you say that better models exist. I hope you see the fallacy in this. It is the same mistake you made earlier.

Probability leakage can only in some instances be computed with z alone or when x is null. If you knew what new x to expect, then you can calculate leakage unambiguously; else you have to model x. If x is null—as I explain in the paper, which I’m beginning to suspect…but never mind—then probability leakage can be computed in advance. I say more about this in the paper.

Tilmann Gneiting is a good friend of mine. We sat on the AMS Probability & Stats committee together. I have contributed discussions to work of his in JASA and elsewhere. I have seen him present the paper on calibration twice, have discussed it with him many times, and have, as a matter of fact, faithfully represented his work (giving him full credit) in my paper. I have also written papers myself on calibration. If you will notice in my paper, I also mention proper scores and calibration. You are also confusing models with the scores: you need to read his paper closer. Lastly, since you say nothing about them, I assume that you agree that my proofs in this paper are correct.

**Update** Since you are a frequentist, I would love to have from you a guest post defending the confidence interval. I could then rebut. Deal?

Mr. Briggs,

Please read my comments again. (I don’t tell my students to do so, it’s usually easier and more helpful to explain things again.) Let me not imagine what went through your mind when you wrote your comments. For example, neither did I say that probability leakage is not a problem, nor did I say the definition is not clear.

The strictly scoring paper did a wonderful job in presenting the materials. I am suggesting that you see how your friend presents the materials. Your conclusion of my confusion is not called for. I challenge you to explain the paper to your readers! If you want to engage in such an accusation statement of no logical reasoning and evidence, let me throw one at you! If you understood the paper, you’d know it didn’t answer my question in a previous post!

Have you noticed that I haven’t actively participated in your discussion of p-values? Have I ever claimed that I am a frequentist? What are the qualifications for being a Bayesian? Do I have to preach about the philosophy to be one?

If people want to know about the debate about the two approaches, google it, there are more than enough discussions out there. I would rather read a new paper to learn something new, which is one of the reasons I put forth my comments on your paper!

(Sorry about typos as I am not using a regular keyboard.)

JH,

I am now not certain, except for your unobjectionable claim that my writing could be better, if you have any disagreements with me.

I have no disagreements but questions that I think would be worth exploring. No, I don’t think you have sold and explored your ideas well, just to be honest. Isn’t this why you post your papers? Seek honest opinions. I am quite good at offering compliments too. I love your blog… does this help improve your paper? ^_^

Just to be clear, when I said that I have no disagreement, I mean that there is nothing to disagree with in your paper. And it doesn’t mean that I agree with your ridiculous comments below.

Pretty much everything you have said in the above comments is based on your imaginations.

For example, your proofs in the paper are correct? You have proofs in your paper? An anecdotal calculation is a proof? A proof of what? A bad model can result in a big probability leakage! It’s a mathematical exercise problem in a mathematical class for the Bayesian chapters.

Your comment on frequentist and Bayesian methods has just destroyed your calibration section.

No kidding. Why would anyone disagree with that superficially? But if you think deeper, if you know you’ve truncated incorrectly, yes, it’s false to start with. But if you don’t know you’ve truncated correctly, can you easily falsify it? Here is another claim for which you never provide evidence, yet you expect others to provide evidence/proof for their claims.

And you said,

Where did I say that a better models exist, hence I don’t accept this claim? Simply your imagination!

Too many statements of no logical reasoning and evidence in your response.

But if you don’t know *whether* you’ve truncated correctly, can you easily falsify the model?

JH,

Knowing in advance whether a model will be falsified is a separate question of knowing if a model is falsified. Many models are posited which, before being applied to new data, are not known to be falsified. This is why they are posited. After new data comes in, these models might become falsified.

I gave the rigorous criterion of falsification. Many statisticians (Bayesian Andrew Gelman on his blog, others) incorrectly state what falsification is. Read the reference for an example of such a misunderstanding (the one from Dawid, a Bayesian).

There is a large literature on calibration with which I suggest you are unfamiliar, particularly Bayesian arguments for and against the usefulness of calibration, especially with regard to decision analysis (proper scores and all that). There is at least one proof in my paper where I show that any model with probability leakage cannot be calibrated. Perhaps this eluded you because I did not label the start of the proof with the word “theorem.” Nevertheless, it is there, and it is a valid proof.

I claim that many, very many, models in actual use evince probability leakage. Do you dispute this claim?

I further claim, and prove, that probability leakage causes people to misinterpret evidence for and against parameter-based hypotheses. Do you dispute this claim?

Though I did enjoy reading the strictly proper scoring paper, it didn’t answer my questions as you claimed that it would, which is disappointing.

So, don’t quote any references for me to study please; briefly describe what those papers say, just as you should do in your paper. Don’t suspect what I don’t know; please, tell me what you know.

Simply saying that Gelman incorrectly states what falsification is or asking people to read the references doesn’t make you more correct in your claims.

Show me first your evidence to support your claims, and then I’ll refute them.

Still, a bad model can result in a large probability leakage and of course, can lead to wrong conclusions (not misinterpret evidence). What’s new here?

How about thinking about how one can estimate/compute the probability leakage for an adequately postulated model? I mean, a model that is properly posited based on the observed data, not one that the software allows you to (see http://wmbriggs.com/blog/?p=4381).

My question below stays unanswered.

If you don’t know whether you’ve truncated correctly, can you easily falsify the model? Yes or No?

JH,

You did not answer any of the questions I posed. This was not generous of you. Nevertheless, I will answer yours.

This was answered when I said: “Knowing in advance whether a model will be falsified is a separate question of knowing if a model is falsified. Many models are posited which, before being applied to new data, are not known to be falsified. This is why they are posited. After new data comes in, these models might become falsified.”

This is not a question, merely an unfocused accusation. Which exact question about calibration do you have? And you did not answer whether you have found flaws in my proofs that models with probability leakage cannot be calibrated. You cannot keep evading this.

Papers—any paper—are not textbooks. They are not expected to re-define every concept from scratch; they expect an educated audience aware of the topic at hand. I expect that my audience are well versed in model verification (to include calibration, proper scores, and so forth).

This is true. My stating that others are in error, even though I point to a source which unambiguously shows that they are if you would care to read it, does not prove that these others are in error.

There are two claims I am making: (1) that people often make mistakes about what falsification is; and (2) what falsification is. As I have been repeatedly asking, do you claim that my definition is false or flawed? If so, prove it. Claim (1) is proved below.

If you are claiming you knew all about probability leakage before you saw my definition, well, it would be ungentlemanly of me to dispute you. I can only answer that I have never before seen my definition in print (not the term, of course: the calculation). Plus there are many users of statistical models who are unaware of leakage and the harm that it can cause. And you have admitted that you do not teach your students of this concept so as not to “confuse” them. You never answered how many of your students go on to use models with leakage.

You also did not answer my question about parameter-based hypotheses. This is a part of “what’s new.” That people who make inferences of models with leakage (or which are falsified) are too sure of themselves. And I can quantify by how much.

Again, I go over this in the paper. I ask you to find the relevant passage.

——————————————

To prove Claim (1), I need only show that some Bayesians mistake what falsifiability is. You have already proven, by your first question, that you do not understand it. Here is a sample of quotes from very prominent sources.

“I falsify models all the time.” Gelman, speaking of probability models in the sense of giving probabilities to events that E says are impossible; hence falsification is impossible. http://www.stat.columbia.edu/~cook/movabletype/

“[T]he probability that the ‘truth’ is expressible in the language of probability theory…is vanishingly small, so we should conclude a priori that all theories are falsified.” Same reference.

“[P]assing such a test does not in itself render [a] theory `proven’ or `true’ in any sense—indeed, from a thoroughgoing falsificationist standpoint (perhaps even more thoroughgoing than Popper himself would have accepted), we can dispense with such concepts altogether.” Dan Navarro. Same link, but in the story “One more time on Bayes, Popper, and Kuhn.”

“A theory that makes purportedly meaningful assertions that cannot be falsified by any other observation is ‘metaphysical.’ Whatever other valuable properties such a theory may have, it would not, in Popper’s view, qualify as a scientific theory.” He also says in the same paper, “Causality does not exist.” These are from the Dawid reference you could not be troubled to read.

(I am experiencing déjà vu), so if new data come in, it might be falsified but not easily falsified? Attach a probability to each of the two words “might” and “easily” for me. Yes, I am nitpicking.

Well, again, whether or what students/practitioners do and can do really has nothing to do with the statistical merits of a paper. However, it may be relevant if you want to send your paper to Journal of Statistics Education or The American Statistician.

If you know the papers already, you should use them to demonstrate your points. And if you could not even be troubled to briefly summarize the key points of the papers, how could you expect anyone to be bothered to read the papers? Do you enjoy this kind of useless assertion? I don’t.

The effects of model misspecification have been studied intensively, at least in classical statistics, e.g., the book “Limited Dependent and Qualitative Variables in Econometrics” by Maddala (1983). A book that an undergraduate student won’t be able to understand. It examines the effects of misspecification in the same situation as the abandonment example, and therefore the solution of the Tobit model. Old but a great book.

The probability leakage to me is an effect of model misspecification. As Bayesians always follow behind what classical statisticians do in certain areas of research, I imagine the effects of model misspecification have also been researched intensively. I am not writing the paper, it’s up to you to study the literature on how it’s done.

Sorry, it’s neither a question nor an unfocused accusation. Check out the post in which you referred me to the strictly proper scoring paper.

I’ll let you Bayesians fight amongst yourselves. Gelman is a great Bayesian statistician though.

My last comment on this post.

JH,

You have not answered any of my questions. Again. Nevertheless, I will answer yours (again). I will number and re-put my questions to you in separate headings, so that we don’t lose track of them. Please answer each number, so we know where we are.

**Falsification:**

(1) The statements about falsification were proofs of what falsification means and what it does not. Do you agree with these claims or not?

(2) All I claim is that falsification is this and such. I gave a criterion which may be applied to any model as a test. Do you in fact agree or disagree with this criterion? No evasions, please.

(3) I also answered your query with references that people do in fact misunderstand falsification. Your answer is that “I’ll let you Bayesians fight…” This is not an answer at all, of course. It was and is irrelevant whether any people do in fact misunderstand falsification for my definition to be true or false. Nevertheless, since you first asked, do you agree or not that some prominent statisticians mistake what falsification is?

(4) Further, at your request, I gave a practical example of a model which might be falsified. I cannot attach a probability that this model “might” be falsified, nor do I need to, since all I wanted to demonstrate was that it was possible to falsify a model by the criterion which I outlined. As I have repeatedly stated, if you knew in advance that a model was falsified (via external evidence E), you would not use it. You only use it because you do not know (have not proved) that it is in fact falsified. Do you agree with this or not?

**Probability leakage:**

(5) The reference you cite does not include any notion of probability leakage. If you claim it does, show exactly where: page numbers and quotations, please.

(6) Second, you have not acknowledged whether you agree with my claim that many, very many, models in actual use evince probability leakage. Do so, or prove me wrong.

(7) It is false to say that undergraduates would not understand model misspecification (though they might not understand that particular book). I claim that undergraduates can be taught the concept of falsification and probability leakage. I further claim that you, and many others, did not and do not in practice consider probability leakage as model misspecification. If you dispute this, prove I am wrong.

You say, “As Bayesians always follow behind what classical statisticians do in certain areas of research, I imagine the effects of model misspecification have also been researched intensively.” There are two claims here: one is false, the other true.

(8) Bayesians do not always follow behind what classical statisticians do. This is a silly claim. But you are welcome to prove it.

(9) Model misspecification has indeed been researched intensively. Probability leakage does not figure in these investigations. If you claim it has been, prove it.

Probability leakage is a kind of model misspecification, but it is more than that, as the next section shows.

**Probability leakage and falsification:**

(10) I claim to show that models with leakage are not necessarily falsified. Do you dispute this? If so, prove my argument is wrong.

**Parameter-based hypotheses:**

(11) I asked you several times whether you can show if my claim that parameter-based hypotheses in models with probability leakage are in error or have misstated probabilities. This is a very important claim I make. Is it in error or not? If it is, prove it (show that my argument contains an error).

**Calibration:**

(12) I ask you again, can you show where my claim that models which evince probability leakage cannot be calibrated is wrong? If you cannot, say so.

If you have an exact question about calibration, state it and I will answer.

**Journals:**

(13) Plus, I never had any intention of submitting this paper (suitably modified) to the journals you mentioned. What made you think I would?

Here is what you do all the time: you don’t show your evidence, but when people disagree with you, you demand evidence. What kind of game is that?

**Falsification:**

This paper is worth reading. http://www.stat.columbia.edu/~gelman/research/published/philosophy.pdf

You definitely can define what falsification means however you wish, but using one example to demonstrate the negation of it? Do philosophers do that?

Again, your falsification requires external evidence E, it’s not clear what E is in your abandonment example (is this the example you are talking about? I have started forgetting some details in your paper, sorry.)

The fact that a probability leakage can only be calculated for specific cases is not too attractive to me.

**Probability leakage:**

(6) No, the book doesn’t use the probability leakage. Of course, you may name whatever you want for the probability you defined. Naming it differently doesn’t mean it has not been explored. A good literature review can show people you have done your research thoroughly.

All I know is that I postulate a model based on the sample/data evidence, I can’t guarantee the future data would falsify my model at all based on your definition of falsification. I accept the uncertainty as I am no fortune teller.

I have no idea whether many models evince probability leakage. It might be true because some Bayesians would blindly use models because the software allows them to, without knowing how one can properly postulate a model.

Where is your evidence that many models evince probability leakage, besides the ONE poorly postulated model of the standard normal being fit to the abandonment data?

(7) That’s right. Undergraduate students can understand the concept of misspecification. Just as they can easily understand why the results of least squares regression (which requires no normality assumption) can only apply to x values that fall in the range of the observed data. However, to understand Maddala’s book? I doubt it.

Provide me evidence and then I shall see if your claims are worth responding. Yes, my imagination might be false, no doubt about that.

Again, If you want to sell your paper, it’s up to you to research thoroughly. If no one has studied it, then you have a new idea to market, and that’s great. I have submitted a paper that the referee told me my ideas had already been studied. Bummer.

(8) “Bayesians do not always follow behind what classical statisticians do. This is a silly claim. But you are welcome to prove it.”

Of course, it’s silly because you fail to read the entire sentence: “…in certain areas of research.” Let’s see how many Bayesian analysis papers contain “likelihood functions” built by Bayesians. Not too many. Yes, the analysis framework might be Bayesian. I’ll get back to you tomorrow when I am in my office, and if I have time.

(9) Maybe you are right in saying that probability leakage does not figure in these investigations. It’s up to you to find out; see (6) also.

Please fix my html code for boldface. Missing a html angle bracket?

**Probability leakage and falsification:**

(10) “I claim to show that models with leakage are not necessarily falsified. Do you dispute this? If so, prove my argument is wrong.”

So models with leakage are not necessarily falsified. By your definition, what’s to dispute here? You’ve got to make it so that the conclusions sound more sophisticated.

**Parameter-based hypotheses:**

(11) “I asked you several times whether you can show if my claim that parameter-based hypotheses in models with probability leakage are in error or have misstated probabilities. This is a very important claim I make. Is it in error or not? If it is, prove it (show that my argument contains an error).”

Again, demonstrate your claim to me first. What exactly do you mean by “parameter-based hypotheses in models”? The assumption of a particular prior distribution?

**Calibration:**

(12) Well, I guess I have to read the paper again.

If your external E means future data, then of course, one cannot calibrate the model NOW, one has to wait for the future data! Not fun at all.

The left side of the following can’t converge to a sample mean of Q_{i}s.

\frac{\sum P_i(y)}{n} \rightarrow \frac{\sum Q_i(y)}{n}

Instead, you should write that the difference converges to zero. If the external evidence means the future data, then you might want to use another notation m for its sample size.
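Written out, the alternative form suggested here would read as follows. This is only a sketch of the suggestion: the index range i = 1, …, n and the limit n → ∞ are assumptions not stated in the expression above.

\frac{1}{n}\sum_{i=1}^{n} P_i(y) - \frac{1}{n}\sum_{i=1}^{n} Q_i(y) \rightarrow 0 \quad \text{as } n \rightarrow \infty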

In addition to the trivial mistakes, it’s not clear to me why the convergence is correct.

If the evidence E means observed data and if we know M with respect to E evinces probability leakage, then no need to prove it. Self-evident at least to me anyway. ^_^ This simply begs for the question of how once can detect that probability leakage. If you cannot, not sure how valuable this claim is.

**Journals:**

(13) “Plus, I never had any intention of submitting this paper (suitably modified) to the journals you mentioned. What made you think I would?”

I didn’t say that you would. I said that IF you want to sell your papers based on what students know or don’t know, those might be appropriate journals.

Is it my English? Or is it your over-active imagination again and again and again and again and again and again (Sixth time is the charm)?

JH,

Last comment, eh?

You say, “Well, I guess I have to read the paper again.” That much we can agree on.

(1) You did not answer.

(2) You did not answer.

(3) You did not answer.

(4) You did not answer.

(5) You did not answer.

(6) You admit the reference you supply does not speak of probability leakage. Thank you.

You then say, “A good literature review can show people you have done your research thoroughly.” I cite 19 papers, which is about average. You are not familiar with this literature. I am. Others who are as familiar as I am will see that these citations are adequate.

As I said earlier, a paper is not a textbook. It cannot teach every subject from scratch. If you are truly interested in this topic, you have to do the homework to make intelligible criticisms.

“Where is your evidence that many models evince probability leakage, besides the ONE poorly postulated model of the standard normal being fit to the abandonment data?”

This is a good question. Look to any (empirical) psychology, sociology, education, etc. journal to find examples aplenty. I further have more than a decade of experience in consulting and have found a vast number of marketing statistics examples use models with leakage. Look to some of the very examples you yourself use in regression.

(7) You now admit that undergraduates can understand misspecification. Yet you say you won’t teach them because they cannot understand it. Which is it?

“Again, If you want to sell your paper, it’s up to you to research thoroughly.” This is a sorry thing to say, mere intimation that my work has been sloppy. When in fact all you have shown is that you are unfamiliar with Bayesian theory.

(8) Good idea to retreat here, as you did.

(9) I am right, yes. Thanks.

(10) “So models with leakage are not necessarily falsified. By your definition, what’s to dispute here? You’ve got to make it so that the conclusions sound more sophisticated.”

You’re letting your pique show. I have given citations (in comments and in the paper itself) to show what is new. You did not know, before I told you, that models with probability leakage are not necessarily falsified. Many others also do not know.

(11) I am sorry that you cannot understand this claim. Perhaps after reading through the bibliography I provided, my question will make more sense.

(12) You did not answer.

“Instead, you should write that the difference converges to zero.” This is false. I should write it just as I did.

“In addition to the trivial mistakes, it’s not clear to me why the convergence is correct.” This is true: I mean, I agree it is not clear to you why the convergence is correct. Remember that Gneiting paper I cited? Now is the time to read it.

“If the evidence E means observed data and if we know M with respect to E evinces probability leakage, then no need to prove it. Self-evident at least to me anyway. ^_^ This simply begs for the question of how once can detect that probability leakage. If you cannot, not sure how valuable this claim is.”

This is a good point. We statisticians do a poor job thinking about evidence. We are very good at the math, but not so good on the epistemology. Anyway, the technical definition of probability leakage you now acknowledge to be true.

But it is false to say that this “simply begs the question of how once [sic] can detect probability leakage.” You misuse the term “begging the question.” Plus, I have given examples and guidance on how to calculate leakage.

(13) Can’t admit you were wrong, eh? I understand. It can be painful.

**New question:**

(14) Have you ever

(a) computed a predictive posterior distribution (explained in my paper)

(b) computed probability leakage?

(8) There is nothing to retreat.

(9) Did I say or imply that you are wrong? Let’s recapture here.

JH: “The probability leakage to me is an effect of model misspecification…. I imagine the effects of model misspecification have also been researched intensively…, it’s up to you to study the literature on how it’s done.”

Briggs: “Model misspecification has indeed been researched intensively. Probability leakage does not figure in these investigations. If you claim it has been, prove it.”

JH: “Maybe you are right in saying that probability leakage does not figure in these investigations. It’s up to you to find out; see (6) also.”

And your conclusion:

Briggs: “I am right, yes. Thanks.”

(You were probably smiling a big wide grin when you typed this.)

So you are right if you say so. However, what is the logical reasoning behind this conclusion?

(6) “You admit the reference you supply does not speak of probability leakage.”

I never said it did.

…

(7) Understanding the CONCEPT of misspecification is different from understanding model misspecification.

(12) As n goes to infinity, a function of n will converge to… another function of n? Is an empirical distribution discrete or continuous? Does its inverse function exist?

Oh, begging the question has come to mean “raising the question.”

JH,

(1) You still did not answer.

(2) You still did not answer.

(3) You still did not answer.

(4) You still did not answer.

(5) You still did not answer.

(6) You clearly intimated that the concept of probability leakage was if not in that book then known elsewhere. I asked for references to where you had seen it. You cited this book.

(7) You say, “Understanding the CONCEPT of misspecification is different from understanding model misspecification.” This is false.

(8) We are done here, it looks. Thank you.

(9) The “logical point” is that I have claimed that this concept has not been elucidated in the literature before. You have not refuted this claim.

(10) Same.

(11) Same.

(12) Again, this is explained in Gneiting’s paper (and in other sources).

(13) You clearly intimated that I would submit to one of these journals. I obviously never had any intention of doing so.

(14) You did not answer. Or rather, it appears you have and the answer is “No.”

My intention was pure curiosity. If you had computed probability leakage before, then you would be able to say so.

(15) “Have you read all the papers quoted in your paper?” Yes. You have not, by your own admission.

(12) Ah… you really don’t see the mistakes I have pointed out in (12). Gneiting and Raftery don’t make those mistakes! I know you claim that you’ve read it, but here is the paper. http://www.stat.washington.edu/research/reports/2004/tr463.pdf

Please consider my previous comment again.

Note how they start with a theoretical definition of a strictly proper score; the convergence is then presented because a strictly proper score is to be defined for, and applied to, samples.

(9) Yep, here is the game you play.

Briggs: “I claim that God exists (doesn’t exist). If you refute it, submit your evidence.”

JH: “Maybe there is evidence, but it’s up to you to figure out.” (She simply has no interest in the claim.)

Briggs: “You have no evidence, hence my claim that God exists (doesn’t exist) is right.”

(7) FALSE? How about becoming an academic?

Use either one of the following as my answers to the rest of the questions:

“So what?”

“Whatever.” (You’d read my replies obscurely anyway. You can’t think straight! I have cast several magic spells on you. ^_^)

JH,

(1) – (6) Nothing

(7) Have you a post to offer me?

(8) Nothing

(9) Logic is a terrible, unforgiving mistress.

(10) – (11) Nothing

(12) We’re out of sequence. A minor typo on my part, perhaps. But no answer about the proof that models with probability leakage cannot be calibrated.