
What Random Means In Random Number Generation

Dice throws appear random because their outcomes cannot be predicted naively. But given the physics and the initial conditions, the outcomes can be predicted, and so they are not truly random.

Thinking about one-time pads reminded me to point this out.

It’s simple, really. A “random” number generator spits out a string of numbers or characters from some set, say c1, c2,…, cp, where each of these is a character out of p total. Example: c1 = ‘a’, c2 = ‘b’, etc.

Before the generator starts, using only the premises that the generator can produce only these p characters, we deduce the probability that the first character is ci (for any i in 1 to p) as 1/p.

Start the generator. The characters roll out one at a time. To be considered “random”, this condition must hold:

     Pr( Ct = ci | C1, C2, …, Ct-1, generator premises) = 1/p for all t and all i,

where I mean by this notation that the probability the next character (Ct) will be one of the set is 1/p no matter how far along in the series we are (short of infinity) and no matter what character we consider. The “generator premises” are those which state only the characters c1 through cp are available, etc.

In other words, if there is any information in the series to date that allows us to calculate any probability other than 1/p for Ct, then the series is not “random.” Be clear that “allows us” means “logically possible” and not necessarily practical or in-practice possible. There may be some existence proof which says something like “Given this generator and an output series like this-and-such, the probability of Ct is x, which does not equal 1/p”. We may never see the this-and-such series, but still if the output series is possible, then the output is not “random.”
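
A small sketch of that idea (in Python; the toy generator and its 0.6 repeat probability are invented purely for illustration): a generator that tends to repeat its previous character leaves information in the series, and counting how often each character follows each other character recovers conditional probabilities far from 1/p.

```python
import random
from collections import Counter, defaultdict

chars = ['a', 'b', 'c']                  # p = 3, so "random" requires Pr = 1/3
p = len(chars)

def sticky_generator(n, repeat_prob=0.6, seed=1):
    """Toy generator that repeats its previous character with probability 0.6."""
    rng = random.Random(seed)
    out = [rng.choice(chars)]
    for _ in range(n - 1):
        if rng.random() < repeat_prob:
            out.append(out[-1])          # information leaks from the history
        else:
            out.append(rng.choice([c for c in chars if c != out[-1]]))
    return out

series = sticky_generator(100_000)

# Estimate Pr(next character | previous character) from the series itself.
counts = defaultdict(Counter)
for prev, nxt in zip(series, series[1:]):
    counts[prev][nxt] += 1

for prev in chars:
    total = sum(counts[prev].values())
    print(prev, {c: round(counts[prev][c] / total, 3) for c in chars})
    # repeats show up near 0.6, nowhere near 1/p = 0.333
```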

Now don’t start going all frequentist on me and say, “Look here, Briggs, you fool. I’ve run the generator a finite number of times and the relative frequencies of the observed Ct don’t match 1/p.” I’d reply, “Get back to me when you reach an infinite number of times.”

Run the generator for just one character. The observed relative frequencies will be 0, 0, …, 1, …, 0, where it’s 0s for all ci except the 1 character which showed. What does this prove? Next to nothing. Probability just isn’t relative frequency, but relative frequency can match the probability. Probabilities predict relative frequencies. In that spirit, we know the series is not “random” if we can do a better job predicting the series than saying the probability of the next character is 1/p (for any character).
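
A quick sketch of that last point (Python, with an arbitrary seed used only for reproducibility): for a generator that really does give each of p = 3 characters probability 1/3, a one-character run tells you almost nothing, while longer runs drift toward 1/p. The relative frequencies follow the probability, not the other way around.

```python
import random
from collections import Counter

chars = ['a', 'b', 'c']
p = len(chars)
rng = random.Random(2024)                # seed chosen arbitrarily, for reproducibility

for n in (1, 10, 1_000, 100_000):
    run = [rng.choice(chars) for _ in range(n)]
    freqs = Counter(run)
    print(n, {c: round(freqs[c] / n, 3) for c in chars}, "target:", round(1 / p, 3))
```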

But I see the idea of relative frequency is still alluring. Perhaps this is why. There is in math the idea of a “normal” number, unfortunately named because “normal” in probability means something else. A normal string or number is one in which each digit/character appears equally often, yea even unto infinity. (Strictly speaking, full normality also requires every block of digits to appear with its expected frequency; the examples here have only the digit-wise property, sometimes called simple normality.) Examples: 0.111222333444555666777888999000111222… and ‘abcabcabcabc…’ (we know the numbers are limited to digits 0-9, but here I limit the characters to a, b, c).

These normal-looking numbers are in no sense “random”, because if you know where you are you know with certainty what the next digit or character is. Plus there are technical subtleties: a number may be normal in one base (say 10 or 2) but not normal in another base. Here is an example of a number, Stoneham’s constant, which is normal in one base but not another. So “normal” does not imply “random”, but we have the sense that “random” implies “normal”, which it does.
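
A tiny sketch of the ‘abcabc…’ example (Python): each character shows up exactly a third of the time, yet the previous character pins down the next one with certainty, so the conditional probability is 0 or 1 and never 1/p.

```python
from collections import Counter, defaultdict

s = 'abc' * 10_000                       # equal character frequencies, forever

freqs = Counter(s)
print({c: freqs[c] / len(s) for c in 'abc'})     # each is 1/3

follow = defaultdict(Counter)
for a, b in zip(s, s[1:]):
    follow[a][b] += 1
print({a: dict(follow[a]) for a in 'abc'})       # 'a' is always followed by 'b', etc.
```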

Truly “random” numbers are probably (I don’t believe there is a proof) normal in any base. Another way to put it, in terms of information, is to say that the number cannot be compressed (in any base); that is, speaking loosely, it takes just as many characters to write the compressed number as to display the original. The above-linked article gives some hints on the “randomness” of π, which appears normal in many (all?) bases and which cannot be compressed. So where do the digits of π and other transcendental numbers come from? Only God knows.
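
Loosely illustrating the compression point (a sketch only; an off-the-shelf compressor is a crude stand-in for true incompressibility): the repetitive ‘abcabc…’ string squeezes down to almost nothing, while bytes drawn from the operating system’s entropy source barely compress at all.

```python
import os
import zlib

repetitive = ('abc' * 100_000).encode()   # 300,000 bytes of pure pattern
unpredictable = os.urandom(300_000)       # 300,000 bytes from the OS entropy pool

for name, data in [('abcabc...', repetitive), ('os.urandom', unpredictable)]:
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(name, round(ratio, 4))          # near 0 for the pattern, near 1 for the entropy bytes
```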

Last point: many “random” number generators—where by now we see that “random” merely means unknown or unpredictable in some sense—are wholly predictable; they are fully deterministic. Start from a known state (the seed) and the path can be predicted with certainty. These are called “pseudo-random” generators because the numbers only appear unpredictable.

And what does that mean? Appear unpredictable? Well, it means not-random: we can prove that a set of premises exists which predicts the series perfectly. The difference with a truly “random” generator is that we can prove no such set of premises exists.
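
A minimal sketch of that determinism, using a toy linear congruential generator (the constants are the classic Park-Miller “minimal standard” ones, used here only for illustration): two copies started from the same seed walk the identical path, with certainty.

```python
def lcg(seed, a=16807, m=2**31 - 1):
    """Toy linear congruential generator (Park-Miller 'minimal standard')."""
    x = seed
    while True:
        x = (a * x) % m
        yield x

g1, g2 = lcg(12345), lcg(12345)
print([next(g1) for _ in range(5)])
print([next(g2) for _ in range(5)])       # identical: same seed, same path
```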

Categories: Philosophy, Statistics

33 replies

  1. Late last century, there was a suggestion to use those icons of the psychedelic times, lava lamps, as data sources for random number generation.

    I pointed out (in a Usenet newsgroup) that just because they didn’t understand the physics, didn’t make the process random. Of course I got flamed for heresy.

    The encapsulation of a system (the lamp) must reduce the “randomness” of the process because it reduces the influence of “unknown” factors on the system.

    We simplify systems for analysis by notionally or physically excluding (ignoring) factors that are not deemed significant to the analysis. Enclosing a physical system implements selective ignorance unless one remains appreciative of the exclusion of “unknowns”.

  2. There are applications of “random” numbers where true randomness (that is, forever unpredictable) is a bad thing. Cryptography is one. If the output of a cryptographic algorithm were truly random then the receiver would not be able to decrypt it.

    Are the digits of pi really random? The third digit is always a 4. You can compute the rest in sequence, yes? Wouldn’t that mean they are predictable?

  3. DAV,

    Of course, if you and I share the same “random” sequence, then we can encrypt and decrypt at will. Being “random” does not mean that you haven’t memorized the sequence, as people do with digits of pi, but that nobody can predict what comes after the known ones.

  4. Sheri,

    A telepathic one it seems.

    Briggs,

    Yes, a shared list would work until it runs out, but the goal in decoding intercepted encrypted messages is to uncover the underlying pattern, which is the message. The pattern tends to defeat the encryptor. You have to ensure (well, at least try to ensure) never saying the same thing twice, and you also need to overcome the redundancy built into language; but if redundancy is completely eliminated, then the risk of transmission errors garbling the message increases.

    But that’s a bit OT. I wonder, how would you know when you have a truly random number generator? It would seem every one carries the caveat: as far as I know

  5. Sorry. I don’t count that as proof. It could be simply that the underlying pattern in a quantum sequence has yet to be discovered. Is it possible to prove that an underlying pattern in a given sequence is non-deterministic?

  6. Hmm… should have said: Is it possible to prove that a given sequence is non-deterministic? That is, that it possesses no pattern whatsoever.

  7. DAV,

    See Bell’s theorem. And—this is my fault for not making it clear—don’t forget the difference between epistemological and ontological definitions of “random.”

  8. Oh boy! Homework!

    I only note that Bell’s theorem does not rule out the possibility of a superdeterministic theory in the future.

  9. The invocation of super-determinism to get around Bell’s theorem is more a sign of desperation than a theory. Also Briggs won’t like the complete lack of free will.

  10. “The invocation of super-determinism to get around Bell’s theorem is more a sign of desperation than a theory.”

    Desperation? How silly and sad.

    My position is that every theory which ignores hidden variables carries with it the condition “as far as we now know”; anything stronger is argumentum ad ignorantiam. Until it’s conclusively proven that hidden variables do not exist, the possibility of their existence remains.

    It started from the notion that proving a given sequence has no underlying pattern (vs. one imposed by the observer) is impossible. Bell’s theorem falls short because it doesn’t rule out determinism. The claim of inherent randomness is a claim impossible to prove or disprove.

  11. Briggs,

    You state that most random number generators are pseudo random. Aren’t they all pseudo random?

  12. DAV,
    More of a desire than a position, as there is no QM theory of super-determinism. “Until it’s conclusively proven”: Are we talking about physics or mathematics? Hey, I’m beginning to sound like Briggs. If you take that position then we can’t rule out Conan Doyle’s fairies in the garden either. My original comment stands. 🙁

  13. It’s been years, but I do remember that there was a theorem (proved, I believe) that multiplying two random sequences of numbers together could only improve the measures of randomness, or leave them unchanged (with suitable definitions of those terms). In other words, if your random number generator was insufficiently random for your application, you could only improve it by multiplying it by the output of another random number generator. “Do no harm”, as it were.

    I believe that some commercial noise generators took advantage of this principle by multiplying the output of a digital (PRBS) noise generator with an analog (noise diode) noise generator.
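
    I don’t know which theorem is being recalled above, but here is a rough sketch of one version of the idea (Python; the bias values are invented for illustration), phrased for ±1-valued streams, where multiplication plays the role XOR plays for bits: if the two sources are independent, the bias of the product is the product of the biases, so it can only shrink or stay the same.

```python
import random

rng = random.Random(7)                    # arbitrary seed, for reproducibility

def biased_stream(n, p_plus):
    """Stream of +1/-1 values where +1 appears with probability p_plus."""
    return [1 if rng.random() < p_plus else -1 for _ in range(n)]

n = 1_000_000
x = biased_stream(n, 0.60)                # badly biased source (bias 0.20)
y = biased_stream(n, 0.52)                # mildly biased, independent source (bias 0.04)
z = [a * b for a, b in zip(x, y)]         # "multiply" the two streams together

for name, s in [('x', x), ('y', y), ('x*y', z)]:
    print(name, round(sum(s) / n, 4))     # product bias ~ 0.20 * 0.04 = 0.008
```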

  14. “there is no QM theory of super-determinism.”

    Ya don’t say? Who’da thunk?

    “we can’t rule out Conan Doyle’s fairies in the garden either”

    Nope, even if they seem improbable. But “no hidden variables” is saying, “Gosh! I can’t think of any reason so there mustn’t be one!” Bit parochial, no?

  15. Not all random number generators are pseudo random. I have seen random number generators that use thermal (Johnson) noise in a resistor to generate true random numbers. As John Von Neumann said, “Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.” Pseudo random number generators are completely deterministic, and that’s a good thing. I used to run Monte Carlo analysis of circuits, where you perturb the circuit element values to see if the circuit still works OK with worst-case component tolerances. If it goes out of spec you change components and rerun the Monte Carlo analysis from the same starting point, and it goes thru the same sequence of pseudo random numbers, which is exactly what you want. That way you know when you have reached the point where things went awry in the previous run.
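
    A minimal sketch of that workflow (Python; the component value and tolerance are invented for illustration): seed the pseudo-random generator explicitly and a rerun walks through exactly the same perturbations, so a failing case can be revisited after a design change.

```python
import random

def monte_carlo_run(seed, n_trials=5, tolerance=0.05):
    """Perturb a nominal component value within its tolerance (toy example)."""
    rng = random.Random(seed)             # fixed seed -> fully reproducible sequence
    nominal = 1_000.0                     # e.g. a 1 kOhm resistor
    return [round(nominal * (1 + rng.uniform(-tolerance, tolerance)), 2)
            for _ in range(n_trials)]

print(monte_carlo_run(seed=42))
print(monte_carlo_run(seed=42))           # identical perturbations on every rerun
```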

  16. How about an engineering definition: a random number generator is one where the NSA can’t predict future digits. Just kidding.

    The crypto folks have done a lot of work on the problem, no doubt mostly secret. Crypto has a clear need for random number generators that match Briggs’ definition above. It also has a need for pseudo-random generators, where nobody but the intended recipients can predict the sequence, for cases where key sharing (required for truly random sequences) is too much of a burden.

    For fun, look up the FBI/NSA Venona project, where an *almost* one time cipher was mostly broken over the course of decades. The reason it wasn’t truly one-time is a fine tale about the weirdness of communist planning.
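
    Regarding the distinction above between keys nobody can predict and streams only the intended parties can reproduce, a minimal sketch using Python’s standard library (illustrative only, not a recipe for real cryptography):

```python
import random
import secrets

# Unpredictable to any outside observer: drawn from the OS entropy source.
session_key = secrets.token_bytes(16)

# Reproducible by anyone who holds the seed: fine for simulation, and usable as a
# shared stream only if the seed is itself kept secret and never reused.
prng = random.Random("shared secret seed")
shared_stream = bytes(prng.randrange(256) for _ in range(16))

print(session_key.hex())
print(shared_stream.hex())
```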

  17. Prof Briggs –

    I am a little unclear: surely 0.111222333444555666777888999000111222… is not normal, if I am understanding correctly, as the pattern 111222333444555666777888999000 will appear with probability 1 and all other 30-digit strings will not appear at all (probability 0), and so this fails a vital “normality” test.

    I think I can provide some relevant information: the digits of pi can be computed algorithmically, digit by digit, in base 16.

    Link – http://www.math.hmc.edu/funfacts/ffiles/20010.5.shtml

    If you want to know the 1000th, or any other, digit of pi just type it into the algorithm and it is there.

    Does that answer the question “So where do the digits of π and other transcendental numbers come from?”

    It obviously doesn’t answer why!
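
    For the curious, here is a rough sketch of the digit-extraction idea behind that link, the Bailey-Borwein-Plouffe formula (Python): it produces the hexadecimal digit of pi at a given position without generating the earlier digits. This toy double-precision version is only trustworthy for the first digit or so it returns per call.

```python
def _bbp_sum(j, n):
    """Fractional part of sum over k >= 0 of 16**(n - k) / (8*k + j)."""
    s = 0.0
    for k in range(n + 1):                        # big terms: keep only fractions
        s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
    k = n + 1
    while True:                                   # small tail terms
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s = (s + term) % 1.0
        k += 1
    return s

def pi_hex_digit(n):
    """Hex digit of pi at (zero-based) position n after the point: 2, 4, 3, F, ..."""
    x = (4 * _bbp_sum(1, n) - 2 * _bbp_sum(4, n)
         - _bbp_sum(5, n) - _bbp_sum(6, n)) % 1.0
    return "0123456789ABCDEF"[int(16 * x)]

print("".join(pi_hex_digit(n) for n in range(10)))   # 243F6A8885
```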

  18. DAV,
    It looks like you are trying to dismiss modern physics with snark. Since that can’t be true I must assume that you are simply failing to make yourself clear.

  19. Rod Montgomery,

    As you can see from the date, this has been known for a long time. It’s a problem inherent in all (or nearly all) algorithmic generators, with linear congruential generators among the worst. See Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, which was initially published about the same time as the PNAS article.

    Whether a generator is useful depends upon the application. My default is the Mersenne Twister (MT19937) but there are better. As always, there are trade-offs.
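
    For anyone wanting to try that in code, a minimal sketch assuming NumPy is installed: the same Generator interface can be driven by MT19937 or by the newer PCG64 bit generator, so swapping algorithms is a one-line change.

```python
import numpy as np

seed = 12345
rng_mt = np.random.Generator(np.random.MT19937(seed))    # Mersenne Twister
rng_pcg = np.random.Generator(np.random.PCG64(seed))     # NumPy's current default

print(rng_mt.random(3))
print(rng_pcg.random(3))   # different algorithms, each reproducible from its seed
```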

  20. DAV: I am just showing my age. Marsaglia’s result was new when I was first learning about pseudo-random numbers and their pathologies. 😎

  21. WMB (aka Major Briggs) — Years ago I worked at IBM with a bunch of crazies hired by IBM to build the first ever web site for the Olympic Games [note: usual IBM hiring standards were set aside, the only requirement was “smart guys that can really do it” (the Internet), resulting in our group of crazies and misfits]. Returning to work at 0100 hrs one night, I found my friend, let’s call him Jay for short, tinkering with some odd looking program at his computer station. The three stations to his left were all “blue screened”.

    “Hey, Jay, what’ca doin’?” I asked. “Ahhhh, nothing…” he replies.

    “What happened to those stations?” I ask (pointing to the three blue screens). “I’m having trouble with this program…a little. I got it from a physics prof at (local university to remain anonymous),” he says.

    “Jay,” says I, “the diskette contains a virus which has destroyed weeks of work on those stations…and you’re trying it on a fourth?” “Hmmmmm…maybe you’re right”, he replies.

    Please note, Jay has an IQ in the stratosphere somewhere, but lacks something in the merely practical realms.

    “Hey, Jay, what’s that wire there?” I ask, indicating a small wire running from the serial port on his computer down into a desk drawer. His reply of “Nothing!” came way too fast for me.

    Opening the drawer, I find a small box, hastily assembled out of plywood, a connector in one end, wire leading to computer.

    “Jay, come on…what’s this thing? What’s in it?” I ask pleasantly.

    “It’s just a little Geiger counter thingy….measuring radioactive decay…gives a tick each time it detects a decay, you see?” Jay offers.

    “Decay of exactly what? Jay?”

    “Ahhh…..ummmm….well, it’s not very big, really, just a chip sort of, well, not a very big piece. No, look, it’s not dangerous if you put it in a metal desk drawer…..really!” he answers desperately.

    “Jay, you mean radioactive, like uranium or plutonium? Jay, this is a plywood box…plywood, Jay. Plywood does not stop radioactivity…OUT! OUT OUT OUT! Now! Jay, this is IBM International Headquarters for heaven’s sake. Gerstner’s office is 100 yards away. You’ll be fired, the lab will be quarantined,……”

    Once the plywood box had been removed to Jay’s car, moved to the far end of the parking lot, Jay explained carefully that the ONLY source of truly random numbers was the times between decays in a radioactive material. He *needed* a truly random number source for his new project…predicting the future with his new program…..if only he could get a random number generator that was, well, really random……

  22. “Truly “random” numbers are probably (I don’t believe there is a proof) normal in any base.”

    Is a proof even required, since that property is implied in your original definition of ‘random’? If there is, even if only in theory, a base where the digits are predictable, the odds aren’t 1/p and the generator is therefore not random.
