
Poem Codes

This post originally ran 2 September 2009.

My favorite joke: Two cannibals are eating a clown and one says to the other, “qbrf guvf gnfgr shaal gb lbh?” Ha ha!

I tell it in the old style of the Usenet (remember that?1), where the punchline has been obscured by “Rot13”. This device was used back in the relatively pr0n-free internet days (well, not exactly free, but at least still blushing), when naughty jokes and other robust material were encrypted so that the unwary could not stumble upon them. There was the feeling in those olden days that naughtiness, and especially advertising, was illegal and immoral. Rot13 allowed people to side-step naughtiness.

Rot13 is a simple substitution cipher, the kind found in the puzzle sections of newspapers (do they still have those?). These are exceedingly easy to solve: every time you see one letter you substitute for it another, and always the same one. In Rot13, the substitution letter is always 13 away from the actual letter, where we assume the (Latin) alphabet is in the form of a circle (“a” is next to “z” where the circle joins together). So, for example, when you see an “s” it really means “f”; when you see an “a” it is an “n”, and so on. Utilities were available to accept a string of Rot13’ed letters and “un-Rot” them; now we have websites: paste the punchline here.
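For the curious, here is a minimal Python sketch of the rotation just described. The function name `rot13` is my own and not that of any particular utility, and the sketch assumes only the 26 unaccented Latin letters are rotated; everything else (spaces, punctuation) passes through unchanged.

```python
import string


def rot13(text: str) -> str:
    """Rotate each Latin letter 13 places around the alphabet circle."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # Map a..z onto n..z,a..m (and likewise for capitals).
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)


# The two examples from the text: "s" stands for "f", "a" for "n".
print(rot13("s"), rot13("a"))          # f n
# 13 + 13 = 26, so applying Rot13 twice gets you back where you started.
print(rot13(rot13("Hello, Usenet!")))  # Hello, Usenet!
```

Because the alphabet has 26 letters, the same function both hides and reveals a message, which is presumably part of why the scheme caught on.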

Substitution ciphers can be done easily by hand, but this benefit is also a glaring weakness because it is trivial to decrypt them. Rot13 messages were obviously not meant to be unsolvable, but just as obviously, some messages are meant to be. In terms of deciphering, the next step up in difficulty is the transposition cipher, and the most amusing among these is the poem code, used largely in World War II. Most books on cryptography concern themselves with code breaking, but Leo Marks’s Between Silk and Cyanide describes beautifully the creation of cipher schemes like poem codes: I recommend it strongly.

This is how poem codes work. Start with a poem which you have memorized: it needn’t be especially long, nor complete. For example, this fragment from Tennyson’s Ulysses will do: “for my purpose holds to sail beyond the sunset, and the baths of all the western stars until I die.” Select five words as a key from this: say, “for”, “sail”, “all”, “stars”, “die.” String them together and then number the letters in alphabetical order: the first “a” is 1, the second “a” is 2, and so on through the “a”s; the next number goes to “b”, or if there is no “b” then to “c”, and so on until we have numbered all letters. Repeated letters are numbered from left to right. The result:

    f  o  r  s  a  i  l  a  l  l  s  t  a  r  s  d  i  e
    6  12 13 15 1  7  9  2  10 11 16 18 3  14 17 4  8  5
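The numbering rule can be sketched in a few lines of Python. The helper name `number_key` is hypothetical, and the sketch assumes the key is made of lowercase letters only, with repeated letters numbered from left to right as in the table above.

```python
def number_key(key: str) -> list[int]:
    """Number the key letters alphabetically; ties go left to right."""
    # sorted() is stable, so equal letters keep their left-to-right order.
    order = sorted(range(len(key)), key=lambda i: key[i])
    ranks = [0] * len(key)
    for rank, position in enumerate(order, start=1):
        ranks[position] = rank
    return ranks


key = "".join(["for", "sail", "all", "stars", "die"])
print(number_key(key))
# [6, 12, 13, 15, 1, 7, 9, 2, 10, 11, 16, 18, 3, 14, 17, 4, 8, 5]
```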

Now suppose we want to encrypt the message, “We have run out of cigars, situation desperate.” Incidentally, encoding must not be confused with encrypting—our message, for example, may be encoded, “Nothing left for Mark Twain to do, dammit” (where we hope the person hearing this is clever enough to figure it out). Since there are 18 letters in our poem selection, we write out the message in groups of 18 letters, padding the end with nonsense letters, like this:

    1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18
    w  e  h  a  v  e  r  u  n  o  u  t  o  f  c  i  g  a
    r  s  s  i  t  u  a  t  i  o  n  d  e  s  p  e  r  a
    t  e  a  b  c  d  e  f  g  h  i  k  k  l  m  n  o  p

Note the first letter from our poem snippet, an “f”; under it is a 6: the second letter is an “o” and under it is a 12. In our padded, grouped message the 6th column of letters is “eud”, and the 12th is “tdk”. We keep reading off columns in the order given by the key numbers: next the 13th, then the 15th, then the 1st, and so on. It was more or less standard practice to send the encrypted message in groups of five letters, which reduced (but of course did not eliminate) transmission errors. So the first part of our message would be

    eudtd koekc pmwrt.

Try encrypting the entire message by hand! You’ll discover it is quite easy to do; you’ll also see where and how errors arise, an invaluable lesson.
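If you want to check your hand work, here is a rough Python sketch of the whole procedure as described in this post. All names (`number_key`, `encrypt`, the `filler` parameter) are my own inventions, and the sketch assumes that spaces and punctuation are dropped, that the last row is padded from `filler`, and that columns are read off in the order of the key numbers.

```python
import string
from itertools import cycle, islice


def number_key(key: str) -> list[int]:
    """Number the key letters alphabetically; ties go left to right."""
    order = sorted(range(len(key)), key=lambda i: key[i])
    ranks = [0] * len(key)
    for rank, position in enumerate(order, start=1):
        ranks[position] = rank
    return ranks


def encrypt(message: str, key: str, filler: str = string.ascii_lowercase) -> str:
    """Poem-code transposition: fill rows, read columns in key-number order."""
    ranks = number_key(key)
    width = len(key)
    letters = [c for c in message.lower() if c.isalpha()]
    # Pad the final row with nonsense letters so every column is full.
    shortfall = -len(letters) % width
    letters += islice(cycle(filler), shortfall)
    rows = [letters[i:i + width] for i in range(0, len(letters), width)]
    columns = ["".join(row[c] for row in rows) for c in range(width)]
    # The first key number is 6, so the 6th column goes out first, and so on.
    stream = "".join(columns[k - 1] for k in ranks)
    # Transmit in groups of five letters.
    return " ".join(stream[i:i + 5] for i in range(0, len(stream), 5))


key = "forsailallstarsdie"
message = "We have run out of cigars, situation desperate."
# Passing the post's own padding reproduces its grid exactly.
print(encrypt(message, key, filler="abcdefghikklmnop"))
# The output begins: eudtd koekc pmwrt
```

With a different `filler` the padded columns change, so only the letters that come from the real message will match your hand-worked version.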

We still have to tell whoever is on the receiving end which five words we picked as a key: these were the 1st, 6th, 14th, 17th, and 20th. The simplest method is to substitute letters for the positions (“a” for 1, “f” for 6, and so on), so: “afnqt” would be affixed to the encrypted message. The receiver then (roughly) follows the procedure backwards to decrypt the message.
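The receiving end can be sketched the same way. Everything here is illustrative: the function names are hypothetical, the indicator is assumed to be handed over separately from the cipher groups (the post does not say where it is affixed), word positions are assumed not to exceed 26, and the ciphertext is assumed to fill the grid exactly.

```python
import string

ALPHABET = string.ascii_lowercase


def number_key(key: str) -> list[int]:
    """Number the key letters alphabetically; ties go left to right."""
    order = sorted(range(len(key)), key=lambda i: key[i])
    ranks = [0] * len(key)
    for rank, position in enumerate(order, start=1):
        ranks[position] = rank
    return ranks


def indicator(positions: list[int]) -> str:
    """Turn word positions (1-based) into letters: 1 -> a, 6 -> f, ..."""
    return "".join(ALPHABET[p - 1] for p in positions)


def key_from_indicator(group: str, poem: str) -> str:
    """Recover the key by picking the indicated words out of the poem."""
    words = [w.strip(string.punctuation).lower() for w in poem.split()]
    return "".join(words[ALPHABET.index(ch)] for ch in group)


def decrypt(ciphertext: str, key: str) -> str:
    """Undo the transposition: put each column back, then read the rows."""
    ranks = number_key(key)
    stream = ciphertext.replace(" ", "")
    rows = len(stream) // len(key)
    columns = [""] * len(key)
    # The j-th chunk received is the column whose number is ranks[j].
    for j, k in enumerate(ranks):
        columns[k - 1] = stream[j * rows:(j + 1) * rows]
    return "".join(columns[c][r] for r in range(rows) for c in range(len(key)))


poem = ("for my purpose holds to sail beyond the sunset, and the baths "
        "of all the western stars until I die.")
print(indicator([1, 6, 14, 17, 20]))      # afnqt
key = key_from_indicator("afnqt", poem)
print(key)                                # forsailallstarsdie
print(decrypt("eudtd koekc pmwrt raeni geseo ohuni ienaa phsaf slgro aibut fvtc", key))
# wehaverunoutofcigarssituationdesperateabcdefghikklmnop
```

The trailing run of alphabet is the padding; it is left to the human reader to see where the sense stops and the nonsense begins.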

Naturally, all nuance is ignored in this short discussion; but know that poem codes are tough to break for short messages and where the poem has been used to encrypt rarely or only once. If the poem is used often, or if it is well known, then breaking messages encrypted with it is not difficult. Additionally, poem codes have the advantage of ease and memorability, and they require no computational device.

Finally, I teach you a joke known only to a select few (I learned it in the USAF as a crypto-tech; this is one of the many perquisites provided regularly to my dear readers). Hold up your hand with the back of it facing your victim. Wiggle your fingers vigorously and ask, “What is this?” When you hear, “I don’t know,” pull down your ring and forefingers and pinkie, leaving just the second extended, and say, “Cipher for this.”

Update: a challenge! As suggested, here is a code which you are welcome to break.

  agmpw   tdenl   wyecs   eotas   saobn   ynodo   orlet

I’ll post the solution in a week (if reminded).

——————————————-

1Yes, I know it still exists.

Categories: Statistics

8 replies »

  1. Gee, I had to seek help from Excel to comprehend this post.

    t e a b c d e f g h i k k l m n o p

    I think I see how you produce the nonsense letters at the end. Is there any reason why the letter “k” is repeated twice? Ha… with which finger do you type the letter ‘k’?

  2. I’m not sure I’d agree that it’s tough to break for short messages – maybe you should post a challenge?

    The poem doesn’t seem to do much other than provide a shuffling of a sequence of numbers, perhaps with the advantage that by adding some salt to the message you can significantly change the key. It’s almost a shame it doesn’t play a greater role.

    Homophonic substitution ciphers are a lot more fun to crack though! Much more statistical.

  3. It’s tougher to break short messages because you have less text from which to figure out the frequency of letters and the patterns you’d find in English (like ‘e’ being the most common letter by far). The more you have, the closer you get to the actual average. Plus, if you know common English patterns (like ‘qu’) it helps a lot.

    To use an analogy, it’s harder to solve a crime with fewer clues than a complete description.

  4. There are lots of variations on this, too, like using a book, and having the cipher-text pick out phrases or letters from it, while the cryptanalyst doesn’t know what book you are using. I suspect that today, the serious agencies like NSA have all the books digitized and can just try all of them by computer.

    A fun tale is that of how the Venona codes were broken. These used a provably secure technique, one-time pads, but were largely broken by the NSA and its predecessor, over a period of many years. When the NSA published the results some years ago (see link), it also showed that pretty much everyone accused of being a communist spy, such as Ethel Rosenberg, was a communist spy.

    Needless to say, the Soviets didn’t actually use a provably secure technique – they misused one.

    https://www.nsa.gov/news-features/declassified-documents/venona/
