The Unexpected Hanging Paradox

From reader Neil Taylor comes the request that we examine the unexpected hanging paradox. I’m stealing from Wikipedia, which stole a version from Wolfram.

A judge tells a condemned prisoner that he will be hanged at noon on one weekday in the following week but that the execution will be a surprise to the prisoner. He will not know the day of the hanging until the executioner knocks on his cell door at noon that day.

Having reflected on his sentence, the prisoner draws the conclusion that he will escape from the hanging. His reasoning is in several parts. He begins by concluding that the “surprise hanging” can’t be on Friday, as if he hasn’t been hanged by Thursday, there is only one day left – and so it won’t be a surprise if he’s hanged on Friday. Since the judge’s sentence stipulated that the hanging would be a surprise to him, he concludes it cannot occur on Friday.

He then reasons that the surprise hanging cannot be on Thursday either, because Friday has already been eliminated and if he hasn’t been hanged by Wednesday noon, the hanging must occur on Thursday, making a Thursday hanging not a surprise either. By similar reasoning he concludes that the hanging can also not occur on Wednesday, Tuesday or Monday. Joyfully he retires to his cell confident that the hanging will not occur at all.

The next week, the executioner knocks on the prisoner’s door at noon on Wednesday — which, despite all the above, was an utter surprise to him. Everything the judge said came true.
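The prisoner's backward elimination can be written out as a short loop. This is only a sketch of the reasoning as told in the story; the names `WEEKDAYS`, `prisoner_elimination`, and `is_surprise` are made up for illustration, and "surprise" is assumed to mean simply that the day is not among those the prisoner still expects.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def prisoner_elimination(days):
    """Return the days in the order the prisoner rules them out."""
    candidates = list(days)  # days the prisoner still considers possible
    ruled_out = []
    while candidates:
        # If nothing has happened before the last remaining candidate,
        # that day would be certain, hence no surprise, so he strikes it.
        ruled_out.append(candidates.pop())
    return ruled_out


def is_surprise(day, expected_days):
    """Assumed meaning of surprise: the day is not one the prisoner expects."""
    return day not in expected_days


ruled_out = prisoner_elimination(WEEKDAYS)
expected = [d for d in WEEKDAYS if d not in ruled_out]

print(ruled_out)  # ['Friday', 'Thursday', 'Wednesday', 'Tuesday', 'Monday']
print(is_surprise("Wednesday", expected))  # True
```

Having struck every day, the prisoner expects none of them, which is why the Wednesday knock surprises him.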

Nothing better illustrates the iron-clad fact that all probability is conditional, and that every probability depends on the conditions or premises assumed. In all arguments there are implicit premises about the grammar and the definitions of the words used. Many disputes in logic and probability, such as the Monty Hall problem, are resolved with ease once the words are made explicit.
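To see how much hangs on the implicit premises, here is a rough simulation of the Monty Hall game under two different assumptions about the host. It is a sketch, not a definitive treatment: the function name `switch_win_rate` and the `host_knows` flag are invented here, and the "ignorant host" variant is an assumed alternative reading of the problem, not something stated in the usual wording.

```python
import random


def switch_win_rate(host_knows, trials=100_000, seed=0):
    """Estimate how often switching wins, given a premise about the host."""
    rng = random.Random(seed)
    wins = 0
    games = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        pick = rng.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            # Premise 1: the host knows where the prize is and always
            # opens an unpicked door that hides nothing.
            opened = rng.choice([d for d in others if d != prize])
        else:
            # Premise 2 (assumed variant): the host opens an unpicked door
            # at random; keep only games where no prize happened to show.
            opened = rng.choice(others)
            if opened == prize:
                continue
        games += 1
        switched = next(d for d in others if d != opened)
        wins += switched == prize
    return wins / games


print(switch_win_rate(host_knows=True))   # close to 2/3
print(switch_win_rate(host_knows=False))  # close to 1/2
```

Same doors, same words on the surface, yet the probability of winning by switching moves from about 2/3 to about 1/2 once the premise about the host changes.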

We’re after the truth of different propositions, for the ambiguity starts at the beginning. The first might be A_1 = “The prisoner is hanged next week”; a second A_2 = “The prisoner is hanged next week, but cannot guess with certainty the day”; a third A_3 = “The prisoner is hanged next week, but might guess with certainty the day and thus won’t be surprised, and if he does guess the day he won’t be hanged”.

The prisoner in the example was hoping it was A_3, or maybe something like it. He must have been a student in logic (perhaps at Berkeley), hoping that by his mental labors he could outguess the warden, figure the day, and then reason that, because the warden promised he wouldn’t know the day, his execution would therefore be stayed. But since our condemned con was only a student, he didn’t reckon that the warden would know that the logic student was using logic, would figure the student would reason the way he did, and then would pick a day before Friday and hang that dirty rotten so-and-so.

That is, the student was interpreting the warden’s words with an awful literalness. His interpretation used the same words everybody else gets, but his inference about what those words meant is different from mine. An ordinary SOB like Yours Truly, who is as deserving to be hanged as the student, would interpret the words of the warden to mean that I’d be writing my last blog post next week no matter what, even if I got word of the execution date from the grapevine, even if I heard it from the warden himself in a whisper, even if the Governor called to grant a pardon.

On my thinking, the probability of A_1 is 1, and A_2 and A_3 are of no interest to me, even if their probabilities are 0. Why? Because I reason that the warden was speaking loosely, as wardens do when they want to rub it in and make it sting. For A_2 to be true, both conjuncts have to be true. I already figured the first conjunct is certain, therefore the second won’t mean much to me as I make my way to the gallows.

Next, suppose the student isn’t a student, but a Berkeley professor. He would never take part in riots (not since his knees went) and can’t understand why he’s wearing the stripes. Since he’s a clever fellow, he reasons initially like the student, but then he knows the warden would guess he would reason like the student. At this point he has to take the words of the warden either with that extreme literalness, or like an ordinary man. If like an ordinary man, yet still as a Berkeley professor, he’ll make out his will, leaving everything to the local Masonic chapter, and then begin silently weeping.

But suppose he interprets them literally. How the warden actually thinks makes not one whit, nor even two whits nor three, of difference to the professor’s probability. The warden will do as he may. Probability is a matter of perspective, not necessarily of reality. If the professor thinks the warden is speaking hyper-literally, and will allow the professor to walk free if the professor is genuinely surprised when the headsman walks through the cell door, then it is still up to the professor to figure what the hyper-literal promise means. Whatever the warden will do eventually is immaterial to the probability the professor forms.

Now the prof can figure like the student and reason that the warden would know he would do this. The professor then has an additional implied premise, which is that the warden would think the prof is thinking like a student and not a professor, and so the prof would re-iterate the student’s thinking process, again coming to the conclusion that on no day would he be surprised. But then the professor might think that the warden would know that the professor is a clever fellow, and would figure he’d do the re-iteration, and again surprise the professor on some day.

But then the professor would et cetera. Again, where this stops is irrelevant to what actually happens. Yet it is crucial and decisive to the probability the professor forms. It is he who supplies the implicit premises, and it is his probability. This is true for you, the dear reader, who wants to figure a probability for the professor. You may think the warden is on to the prof, and that the prof in turn is on to the warden, and so on. Your probability would then be different from the professor’s and the warden’s, and the warden’s would differ from yours and the professor’s.

The answer, then, is that there is no single answer. It all depends.


  1. A judge tells a condemned prisoner that he will be hanged at noon on one weekday in the following week but that the execution will be a surprise to the prisoner.
    Loose wording here. How can “the execution” be a surprise if the judge declared it will happen? Only the timing can be a surprise — as explained by the rest of the post.

  2. Monty Hall problem ASSUMES the prize is behind door C and therefore the probability is 2/3 if B is opened. There is nothing that says the prize is not behind A. No one is forced to open B or C if the prize is behind A. The assumption vanishes. I’m rejecting the argument to switch as flawed by an assumption not a fact. (Or more accurately, the change in probability is based on ONE situation, not the possible two.)

  3. In the prisoner’s reasoning, the daily regress from Friday to Thursday to Wednesday, etc., is equivalent to the executioner allowing the prisoner to “expect” or “predict” the execution on any day, and then agreeing not to execute if the prisoner expects it on that day. If that actually is the executioner’s agreement, then the prisoner can indeed avoid the execution by announcing his expectation of the execution every morning. (If the prisoner truly expects his execution every morning, he will most likely die of anxiety before Friday anyway.)

  4. Monty Hall problem ASSUMES the prize is behind door C and therefore the probability is 2/3 if B is opened. There is nothing that says the prize is not behind A.

    It is given that the prize is behind one of the doors (G). If door A is selected P(A hides prize|G)=1/3 and P(B or C hides the prize|G)=2/3=P(A does not hide the prize|G).

    Showing which of B or C is not hiding the prize changes nothing since you already knew one of them didn’t. The information remains the same therefore P(A hides the prize|G) remains 1/3 and P(B or C hides the prize|G) remains 2/3.

    Perhaps I’ve missed your point.

  5. The alleged hanging “paradox” is only a paradox when one constrains oneself to the defined parameters (i.e. when “the words are made explicit” as Briggs put it). The “paradox” evaporates when the problem is tweaked to ferret out underlying basic principles:

    Have the warden say the hanging will occur on some surprise day over the next 30 days. In that scenario one quickly realizes that there are numerous days, weeks on end of them, where a surprise event must occur…but only the last few days where the surprise vanishes because ‘if it hasn’t happened by now, it must happen tomorrow’ (the logic logically applied starting on the last Friday and working backward chronologically).

    By changing the scenario by increasing the period’s range one easily realizes a non-linearity – extrapolating backwards from the last day to determine with certainty when the event must happen, meaning no surprise, only works for a few days, after which that technique fails.

    For the “hanging” or “surprise quiz” scenarios this reveals not a “paradox” (despite the numerous papers attesting to such) but merely a finite puzzle: “How many days before the stated last day must the event happen to be a surprise?”

    That thought experiment prompts one to realize that whatever the period the warden states (assuming he didn’t lie and circumstances cannot violate his conditions), the event must occur (to be a surprise & depending on how one counts) no later than three days before the asserted last possible day.

    Messing around with the scenario into closely related scenarios is sometimes key to ferreting out such non-linearities — this is the exact opposite of Briggs’ assertion, “Many disputes in logic and probability, such as the Monte Hall problem, are resolved with ease once the words are made explicit.” Making such problems explicit helps ensure that underlying features & principles stay hidden.

    The principle applicable in the Monty Hall problem, typically abstract and not at all intuitive to most, similarly becomes almost intuitive for most if the initial number of doors is set at 100, or 10,000, before being winnowed to the usual three.

    Both the paradox & Monty Hall problem have, at their core, the value of additional information. The ‘hanging/quiz paradox’ can be reframed as:

    “If the convicted/student discovers my no-later-than-date for the event, how much time before that date must I make the event happen to ensure a surprise?”

    From the convicted’s/student’s perspective, the value of the additional information — the ultimate no later than date — has a quantifiable value for reducing uncertainty in real-world scenarios. Some statistics courses teach methods for estimating this … for some reason that is never addressed here.

  6. The Monty Hall problem is paradoxical because of the human foible of starting over when the situation becomes a choice between the final two doors.

    The faultiness of the logic becomes more apparent if one realizes that the initial choice partitions the space into two areas, X and Y, one of which (call it X) has only a 1/3 chance of containing the prize. The error is failing to realize that knowing which subareas of Y are barren doesn’t change anything. The probability of the prize being in the Y area remains at 2/3. If you still doubt this, look up the Mythbusters episode where they show this empirically.

    The hanging problem of today’s paradox is pretty much the same. While a Friday hanging might not be a surprise on Friday, it’s only because not being hanged earlier is now a certainty. It’s like opening both of area Y’s doors and determining the prize MUST be on the X side.

    On any other day than Friday, you can’t be certain that today is not the hanging day. It’s the idea that it has to be a surprise that causes the paradox. However, on Thursday, it would still be a surprise to learn the hanging will be tomorrow and not today.

  7. I often have difficulty wrapping my mind around the concept of the probability of a one-time event. For example, the weather forecast calls for an 80% chance of rain tomorrow. What does that mean? The only way I can make sense of it is to imagine counting, over an extended period, the number of times it has rained and not rained on days where the forecast called for an 80% chance of rain. But then it’s no longer a one-time event.

    On the other hand, I have no problem accepting the relevance of probability to the door-picking task in the Monty Hall problem, which is a one-time event.

    Any ideas?

    Extra credit: are pull tabs a game of chance?

  8. Seems to me, if he were hanged on Friday it would be a surprise. If the prisoner wasn’t hanged before then, he’d figure that Friday must be the day, so it would not be a surprise, and so he wouldn’t be hanged, as per the promise. But, surprise! he’s hanged Friday.

  9. Milton,
    I often have difficulty wrapping my mind around the concept of the probability of a one-time event.

    Likely because you don’t see probability as the confidence in the truth of a proposition given the evidence. P(next roll = 2 | die) is the probability of the proposition (next roll = 2) being true, a one-time event. P(winner of next Lakers game = Lakers | some information) is the confidence in the specified outcome of the next game, another one-time event.

    Frequentists think of probability as the ratio of number of times the proposition is true (T) to the number of tries (N). IOW: (P= T/N). This may be fine for estimating the probability of the proposition but causes problems with propositions that can’t be repeated.

    The frequentist approach will arrive at P(next roll = 2 | die) = 1/6 by substituting the expected number of times a die roll would be (2) after an infinite (or very large) number of rolls.

    The non-frequentist however arrives at 1/6 because the confidence in the outcome of any side showing on the next roll is deduced from the number of possible outcomes. In the case of a die, there is only one face labeled (2) and there are six sides. There is no evidence that any side would be preferred so the confidence in side (2) showing is 1/6. The confidence (probability) is deduced directly from the available information without adding suppositions.

    Some people have a hard time with that last sentence and want to keep adding suppositions. A lot like trying to solve 3+1=? by supposing the formula was something else.
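The two routes to 1/6 described in the last comment can be put side by side in a few lines. This is a minimal sketch under stated assumptions: the names `FACES`, `deduced_probability`, and `relative_frequency` are hypothetical, and the simulated rolls assume a fair die, which is itself a premise and not something the simulation discovers.

```python
from fractions import Fraction
import random

FACES = [1, 2, 3, 4, 5, 6]


def deduced_probability(target, faces=FACES):
    """Count the faces matching the target, with no face preferred."""
    matching = sum(1 for face in faces if face == target)
    return Fraction(matching, len(faces))


def relative_frequency(target, rolls=60_000, seed=0):
    """Frequentist-style estimate T/N from simulated rolls of a fair die."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(rolls) if rng.choice(FACES) == target)
    return hits / rolls


print(deduced_probability(2))  # 1/6
print(relative_frequency(2))   # near 0.1667, varies with the seed
```

The deduced value needs no rolls at all, while the relative frequency only approaches it as the number of tries grows.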
