So-Called Random Numbers And Encryption: Update With Real-World Example

Psst. Hey, buddy. Word to the wise: “GQNZWZURUDPBFWDMEGBYIXGBGZRSVNHWOIQPPSH GXUVMZ”.

Keep it secret.

That odd string of letters and spaces (well, one space) contains an encrypted message. In order to read it, you need to decrypt it. And in order to do that, you need to have the key. No computer analyzing the frequency of the letters (or characters) will help. Not even a quantum computer. In this case, only the key will allow the message to be read, for this encryption is an example of a “one-time pad.”

A one-time pad encrypts each character in a message with a separate, unrelated key. So there isn’t a key here, but 46 keys (the length of the string). That is, suppose the message was “HELLO” (which it wasn’t). The “H” would be encrypted with the first key, the “E” with the second key, and so on. Knowledge of any key would give zero information about the structure of any other key, other than the obvious information such as the alphabet used. In this case, the “alphabet” is only capital Latin letters and the space.
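To make that concrete, here is a minimal sketch of a toy one-time pad over that 27-character alphabet. It is only an illustration: the per-character keys below come from R’s sample(), which, as we’ll see, is not truly unpredictable.


alphabet = c(LETTERS, ' ')
plaintext = "HELLO"
n = nchar(plaintext)

## one key per character: a full permutation of the alphabet
## (in practice these would come from an unpredictable source)
keys = lapply(1:n, function(i) sample(alphabet, 27))

## encrypt: find each character's position in the plain alphabet,
## output the character at that position in its own key
ciphertext = sapply(1:n, function(i) {
  j = which(substr(plaintext, i, i) == alphabet)
  keys[[i]][j]
})
paste0(ciphertext, collapse = "")

## decrypt: invert the lookup with the same keys
decrypted = sapply(1:n, function(i) alphabet[which(ciphertext[i] == keys[[i]])])
paste0(decrypted, collapse = "")   # "HELLO"

Knowing keys[[1]] tells you nothing about keys[[2]], which is the whole point.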

One-time pads are thus unbreakable—as long as nobody can predict how the keys were generated.

How do we ensure nobody can guess how the keys were made? Some would say “use truly random numbers.”

But there are no such things as “random” numbers, truly or no. Quanta Magazine sort of thinks there are, though there aren’t; not in any ontological sense (thanks to Dan Hughes for the tip). In the epistemological sense, there are infinite collections of random numbers, where random merely means of unknown cause.

What’s the difference? I’m thinking of a number between 5 and 10. What is it? Before you answer, read or refresh your memory with this article: Pick A Random Number From 1-10.

You may get lucky and guess the number I’m thinking of, maybe shading your guess using some information you believe true about me. “Briggs likes prime numbers, but he likes anti-primes more. The only anti-prime between 5 and 10 is 6. So I’ll go with 6.”

The attempt here is to derive or guess the cause of the number I picked. It should be obvious that if you knew the cause, which is to say if you knew the picking mechanism, the number would not be random to you, but known. For somebody who can’t figure the cause, the number is random—in the epistemological sense. The cause still exists, it’s just that you don’t know it.

Quanta Magazine says, “Genuine, verifiable randomness — think of it as the property possessed by a sequence of numbers that makes it impossible to predict the next number in the sequence — is extremely hard to come by.”

Well, that’s it. If you had a string of numbers picked from some “alphabet” (like 0 through 9), r_1, r_2, …, r_{n-1}, then Pr(r_n | r_1, …, r_{n-1}, B) = 1/10, where B is any background information you like, but which includes at least knowledge of the alphabet. And that’s as good as you can do. If you can do better, say Pr(r_n | r_1, …, r_{n-1}, B) = 1/10 + epsilon for some epsilon > 0, then the numbers are more predictable than knowledge of the alphabet alone allows. They are thus said not to be random. It should be clear by now that randomness is entirely an epistemological concept.
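A quick sketch of the idea, assuming nothing but the alphabet: guessing the next digit of a structureless sequence succeeds about one time in ten no matter what rule you use, while a sequence whose cause you can discover can be guessed every time.


## digits from the alphabet 0-9 with no structure we can exploit:
## any guessing rule succeeds about 1/10 of the time
r = sample(0:9, 1e5, replace = TRUE)
guesses = r[-length(r)]          # naive rule: next digit repeats the last
actual  = r[-1]
mean(guesses == actual)          # about 0.10 (varies slightly run to run)

## a sequence whose cause we know: each digit is the last plus one, mod 10
p = (0:9)[(0:(1e5 - 1)) %% 10 + 1]
guesses = (p[-length(p)] + 1) %% 10
actual  = p[-1]
mean(guesses == actual)          # 1: predictable, hence not random to us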

Quantum mechanics thus seems ideal as a method of generating unpredictable numbers, since the cause of quantum mechanical events is hidden from us. So far, anyway. If we’re willing to stop thinking locally, predictability might re-enter the picture. Read this article for one possible avenue. As for now, we can’t do any better than guessing (in certain experiments). QM is thus ideal for generating keys, since nobody can know the cause, and if you can’t know the cause, you can’t guess the key. And without the key, you can’t break the one-time pad, even with a quantum computer.

One-time pads are very easy to write. Since we’ve been using R, though it is not ideal, let’s keep with it. Here is the decryption code.


msg = "GQNZWZURUDPBFWDMEGBYIXGBGZRSVNHWOIQPPSH GXUVMZ"

## decryption algorithm
d.msg = NULL
for (i in 1:nchar(msg)){

  mystery.function(...)

  a = substr(msg,i,i)
  b = sample(c(LETTERS, ' '),27)
  j = which(toupper(a) == b)
  d.msg  = paste0(d.msg,c(LETTERS, ' ')[j])
}

Now this code is not optimized in any way, but I hope it is at least plain. The msg is fed in and the decrypted message is stored in d.msg. The alphabet is c(LETTERS, ' '); i.e. the upper case Latin letters and the space.

The code steps through msg one character at a time, decrypting it by inverting the encryption code. It is a very simple substitution cipher, again where the key changes for each character.
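To see the inversion without spoiling the mystery, here is what one substitution and its undoing look like, using a fixed, made-up permutation in place of the per-character key the real code regenerates every pass through the loop.


alphabet = c(LETTERS, ' ')
b = rev(alphabet)                  # any fixed permutation will do here

a = "H"                            # a plaintext character
j = which(toupper(a) == alphabet)  # its position in the plain alphabet
b[j]                               # encrypted character: "T"

## decryption inverts this: find the encrypted character's position
## in the permuted alphabet, read off the plain alphabet
alphabet[which(b[j] == b)]         # "H" again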

The only mystery is mystery.function(...), which is a one-line built-in R function taking one argument. Its purpose is to modify the sample() function so that it produces different rearrangements of the alphabet.

That should be enough information for you to figure out what that real mystery function is, or might be. Remember, all numbers have causes, and R has a specific algorithm to cause these rearrangements. You don’t need to know this cause in depth, but you do need to figure out what it is in general.

Here is another huge hint, the encryption code:


s = "THIS IS WHERE YOUR MESSAGE GOES IN UPPER OR LOWER CASE WITH NO PUNCTUATION"

## encryption algorithm
msg = NULL
for (i in 1:nchar(s)){

  mystery.function(...)

  a = substr(s,i,i)
  j = which(toupper(a) == c(LETTERS, ' '))
  b = sample(c(LETTERS, ' '),27)
  msg  = paste0(msg,b[j])
}

The mystery.function(...) is identical in both encryption and decryption algorithms. There, I’ve planted enough hints.

Now once you figure out the mystery.function(...), these algorithms would be useless for real messages, since if you can figure it out, so can your enemy. Unless you can swap R’s built-in function with your own that produces unpredictable numbers. (Which is to say, one that forces sample() to work in unpredictable ways.) Then you could transmit all the actual code except for your own mystery function. The code is, after all, trivial, and only does simple substitutions.
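Here is one way that swap might look, sketched with a stand-in key stream. The key.stream below is generated with runif() purely so the code runs; for real use it would be secret, pre-shared, truly unpredictable numbers, 27 per character, and the rest of the code could be public.


alphabet = c(LETTERS, ' ')
s = "HELLO"

## stand-in only: real keys would come from an unpredictable source,
## e.g. hardware noise, written on the pad and shared in advance
key.stream = runif(27 * nchar(s))

msg = NULL
for (i in 1:nchar(s)) {
  k = key.stream[((i - 1) * 27 + 1):(i * 27)]
  b = alphabet[order(k)]           # permutation driven by your numbers
  j = which(toupper(substr(s, i, i)) == alphabet)
  msg = paste0(msg, b[j])
}
msg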

Once you do figure out the mystery function, you will learn the difference between encryption and encoding.

Homework

The alphabet takes only capital letters and spaces. Expand it to take common punctuation and lower case letters.

Even if I didn’t tell anybody I encrypted a message using R, somebody might have guessed that. If they did, explain how they’d go about inferring the cause of the keys.

(Incidentally, the number I was thinking of was 5.2. I reasoned—this is my causal mechanism—that I have played tricks before and that most long-time readers are on to them. I rejected 5.1 as too obvious, and went with 5.2.)

Update: Here is an example of what is almost surely a one-time pad being used to transmit a short message over radio.

https://twitter.com/Spy_Stations/status/1143638364182253586

Using the radio is brilliant because it is virtually impossible to know who has received the message.


9 Comments

  1. Rich

    “The crypts are raiding the liquor store ” (to lower case to get past the spam filter)

    The advantages of retirement …

  2. Dave

    One way to generate truly random numbers is to set up an antenna receiving random noise from home appliances, distant radio stations, and deep space. The noise must be of high enough frequency to oscillate many times between each sample you take of it. Call anything above the midline “one” and anything below it “zero”.

    Later generations of Intel CPUs contain such a “hardware random number generator”, plus a “whitening” algorithm to provide an equal balance of ones and zeroes. But Intel refuses to share any technical details, so you just have to trust that their HRNG doesn’t have any secret backdoors that would allow the NSA to “randomly” generate a key identical to yours.

  3. Bedarz

    Random doesn’t always mean unknown cause. It could mean a fortuitous event, a coincidence.
    And to know whether something is truly a fortuitous, you need to know the causes. Because the fortuitous is defined as the intersection of two distinct chains of causation.
    This random can coexist with maximal knowledge of causes.

  4. A real security company generates random numbers by having a digital camera pointed at a wall full of lava lamps in the lobby. The passage of people through the lobby adds to the randomness.

  5. Nate

    Dang it. My guess was 5.11

  6. True randomness is an elusive concept at best. Real Random has invented a hardware random bit generator that uses physical objects to create a continuously chaotic state in a controlled environment that lives in a stack without the use of an algorithm to deliver Entropy as a Service.

  7. Regarding randomness, very useful, irrespective of causes which often we are not interested in. For example:

    “Randomization is essential; more precisely, any non-trivial privacy guarantee that holds regardless of all present or even future sources of auxiliary information, including other databases, studies, Web sites, on-line communities, gossip, newspapers, government statistics, and so on, requires randomization.”

    “The Algorithmic Foundations of Differential Privacy”
    https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

    Also the DIEHARD battery of tests for nonrandomness are very useful: https://en.wikipedia.org/wiki/Diehard_tests

    oxuvdhwzxrgiqbswqnnb?

    Justin

  8. Ray Kidd

    “The crypts are raiding the liquor store ”

    with trailing spaces

    99% perspiration …1% inspiration!
