# Weight Of Evidence

Some have the idea that there should be a “weight of evidence” that is separate from probability, or in addition to it. Probability judgments are held more or less strongly, and from that feeling—and it is a feeling, not a thought—comes the idea of weight.

The idea has merit, but in an opposite sense to what most mean.

All probability fits this schema:

Pr(Y|E),

where Y is the proposition of interest (Y can be a mathematical object), and E is the evidence or propositions thought probative of Y.

E is usually a compound proposition, which we might write

E = E_1 E_2 … E_p (all “anded”),

where it is assumed, in the presence of Y, none of the E_i can be deduced from any E_j, and where any of the E_i can itself be complicated (ands, ors, etc.). The numbering is usually somewhat arbitrary.

Now, to switch tack slightly: in mathematics it is well known that some theorems can be proved in any number of ways. Here is a page which collected 122 proofs of the Pythagorean Theorem. I haven’t stepped through all of them, but a cursory look shows that each provides a different line of proof.

So let Y = “a^2 + b^2 = c^2” and E_73 = “Let CE = BC = a, CD = AC = b, F is the intersection of DE and AB….[the rest of the details of proof #73]”, where, as always, the word and symbol definitions and the grammar are part of the evidence.

Then Pr(Y|E_73) = 1. But also Pr(Y|E_1) = 1, Pr(Y|E_2) = 1, …, Pr(Y|E_122) = 1, and so on. We could also write

Pr(Y| E_1 E_2 … E_p) = 1.

And we could strip away some of the E_i from the middle and still be left with the extreme probability. But we have the feeling that, because there are so many different ways of proving this theorem, it’s not just true, but “really true”, which is to say, of profounder importance. The weight of all this evidence doesn’t make Y any truer, because any one proof is sufficient for that. It’s that we sometimes suspect that what we have in hand is not a universal truth, but a local truth.

A local truth is when Pr(Y|WB) = 1 but where W is itself not a necessary truth and where B is our background information, which is necessarily true. That is, a local truth is when Pr(W|B) < 1. For example, W = “Riemann hypothesis”. There are many mathematical theorems which assume this, and show Pr(Y|WB) = 1. But it is still the case that Pr(W|B) < 1 for the B we know. That is, knowing only what we know, B, the Riemann hypothesis is not proved.

A universal truth is when Pr(Y|WB) = 1 and when Pr(W|B) = 1, where B is necessarily true. The Pythagorean Theorem fits into this scheme. Technically, we should write Pr(B|I) = 1, where “I” represents our inductions, such as axioms, but I leave it out to make the notation simpler.
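One way to see the difference between the two cases, offered as a gloss and not as part of the argument above, is the law of total probability. It assumes Pr(W|B) and Pr(Y|W̄B) are numbers, which need not always hold, and W̄ (“W is false”) is a symbol introduced here only for illustration. With Pr(Y|WB) = 1:

```latex
\begin{align*}
\Pr(Y \mid B) &= \Pr(Y \mid W B)\,\Pr(W \mid B) + \Pr(Y \mid \bar{W} B)\,\Pr(\bar{W} \mid B) \\
              &= \Pr(W \mid B) + \Pr(Y \mid \bar{W} B)\,\bigl(1 - \Pr(W \mid B)\bigr) \\
              &\ge \Pr(W \mid B).
\end{align*}
```

When Pr(W|B) = 1 the second term vanishes and Pr(Y|B) = 1, the universal case. When Pr(W|B) < 1, Pr(Y|B) also depends on Pr(Y|W̄B), about which the local proof says nothing.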

If all we had in hand was the simplest earliest evidence for the Pythagorean Theorem, E_1, we’d still have Pr(Y|E_1) = 1, but the theorem, while still proved, would not be as solid in our minds as when we have all E_i. That’s because any particle of E_1 that is removed, making it E*_1, would greatly lessen Pr(Y|E*_1).

It’s the same when the probability of Y is non-extreme, i.e. in the open interval (0,1).

Weight of probability is a judgement of how robust Pr(Y|E) is to the imagined removal of parts of E. Weight of probability is thus a sensitivity measure, saying how stable the probability of Y is with respect to the given evidence.

It can in formal situations be quantified. Assuming some ad hoc probability model with observations D_n = (Y,X)_n, we can compute Pr(Y|X,D_n) and compare it with Pr(Y|X,D_{-i}), where we imagine removing the i-th observation. This gives an idea of the weight of evidence of particular observations. It’s all conditional, though, as removing the i-th observation may have more or less weight depending on the size of n.
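As a rough illustration of the leave-one-out comparison just described, here is a minimal Python sketch. Everything in it is an assumption made for the example: the ad hoc model (a separate Beta(1,1)-Bernoulli predictive for each value of X), the function names `predictive` and `loo_shifts`, and the made-up data sets `small` and `large`.

```python
from fractions import Fraction


def predictive(data, x, a=1, b=1):
    """Pr(Y=1 | X=x, data) under an assumed Beta(a, b)-Bernoulli model,
    fit separately for each value of x. `data` is a list of (y, x) pairs."""
    ys = [y for (y, xi) in data if xi == x]
    return Fraction(sum(ys) + a, len(ys) + a + b)


def loo_shifts(data, x):
    """Pr(Y|X,D_{-i}) - Pr(Y|X,D_n) for each observation i."""
    full = predictive(data, x)
    return [
        predictive(data[:i] + data[i + 1:], x) - full
        for i in range(len(data))
    ]


# Hypothetical observations, written as (y, x) pairs.
small = [(1, 0), (0, 0), (1, 1), (1, 1)]
large = small * 25  # same pattern, n = 100

print(max(abs(s) for s in loo_shifts(small, 1)))  # 1/12
print(max(abs(s) for s in loo_shifts(large, 1)))  # 1/2652
```

Removing one observation moves the probability by 1/12 when n = 4 but only by 1/2652 when n = 100, which is the dependence on n noted above. A different ad hoc model would give different numbers.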

Weight of evidence is important. Now consider Y = “P = NP”, a classic problem in computational complexity theory (you can read about it here).

Just as we had the collection of proofs of the Pythagorean Theorem, there is a similar collection of results that show Pr(Y|E_1) = 1, …, Pr(Y|E_5) = 0, …, Pr(Y|E_8) = undecidable, …

Those aren’t typos! There are 116 different local proofs showing Y is true or false, and in a couple of cases undecidable (it is no problem having undecidable probability: see this).

Now since (this) Y will be necessarily true or false, it must be that at least some of these proofs are local, which means the evidence upon which they rely is itself not necessarily true. Or that a lot of people are making logic mistakes, which includes the chance of different interpretations of definitions. Both are live possibilities. The latter is, I am guessing, more likely correct, because definitions are often tacit and they are always tricky.

In any case, you can see that the weight of probability, or evidence, here is a genuine concern.

Categories: Statistics

### 2 replies »

1. McChuck says:

HTML apparently isn’t playing nice with others this morning:

“All probability fits this schema:

&nbps;&nbps;&nbps;&nbps; Pr(Y|E),

where Y is the proposition of interest (Y can be a mathematical object), and E is the evidence or propositions thought probative of Y.

E is usually a compound proposition, which we might write

&nbps;&nbps;&nbps;&nbps; E = E_1 E_2 … E_p (all “anded”),”

2. Briggs says:

More typos placed by my enemies!

They have been removed.