More On Monkeys Typing Shakespeare & Why Chance Isn’t Enough

Way back in 2010, I did the Infinite Monkeys theorem, which says an infinite number of monkeys unintelligently whacking away at typewriters would eventually reproduce Shakespeare. But there aren’t an infinite number of monkeys, or an infinite number of anything, so it’s more fun to do the problem with a finite number.

In that, we learned “According to Bennett, Briggs (no relation), and Triola, Shakespeare penned 884,647 words”, which isn’t that many, in the scheme of things. I estimated that accounted for about 6 million characters (letters and punctuation, from a set of 45 possibilities). This gives the probability a single monkey, hitting a keyboard 6 million times, reproduces Shakespeare at about 2 x 10^{-6,000,000}. That is, 2 divided by 1 followed by 6 million zeros.

A single monkey therefore likely won’t do it. But neither will any finite number available to us. Too many characters, not enough time.

Well, now the problem is official, because the New York Times recently wrote an article about a new paper which finds more or less the same as I did. The paper is “A numerical evaluation of the Finite Monkeys Theorem” by Stephen Woodcock and Jay Falletta in Franklin Open (thanks to JC for the tip).

I’m delighted to report the authors have a sense of humor (how rare!) and open their paper with the Bard’s own words: “Alas, poor ape, how thou sweat’st!”. They also have occasion to quote from (the original) Planet of the Apes, which is always a wise move. They come to the probability 10^{-7,448,357}, which is smaller than mine, but they’re more careful on the precise number of characters, and their probability is for one monkey working constantly over his entire life.

Then it struck me that words are not characters. Monkeys think in images, phantasms, which is to say, something like whole words. What if we allow them to have a go using whole words and not characters? And we can also account for the order of Shakespeare’s works, which we don’t care about. That is, if a monkey whacked out Hamlet first and Henry VI last, we wouldn’t care. We also want to give our monkeys a break and have them go at it only during working hours.

Again, there are 884,647 words, and these are from a “dictionary”, meaning first the list of words in Shakespeare’s head, including any neologisms. One source says there are 28,829 unique words in that set.

Maybe we can get monkeys to manipulate the words like ideograms in a Chinese-room-type experiment. That way we can ignore punctuation and other niceties. After all, if monkeys can type on a pre-set selection of characters, they can also type by hitting pictures of words. Be a big keyboard though. But if we’re having monkeys trying to reproduce “the pearl of English literature”, an expense like that is no bother.

It’s simpler now. Chance of getting the first word right, given the new information, is 1/28,829, and so the chance of getting the entire corpus right (and in the right order) on a single “try” of hitting the ideographic keyboard 884,647 times is

$$\left(\frac{1}{28829}\right)^{884647} = 10^{-3945375}$$.

Call it 10 to the minus 4 million to account for bad-banana induced mishits. At one key a second, that’s 10.24 days for a try, or 245.7 hours. But that’s having a monkey go at it without cease. Let’s ease up and see what happens.
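For anybody who wants to check the exponent without wearing out a calculator, here is a minimal sketch that works in logarithms. It assumes every key press is an independent, uniform pick from the 28,829-word keyboard, at one press per second. The variable names are mine, chosen for illustration.

```python
import math

VOCAB = 28_829    # unique words on the ideographic keyboard (figure quoted above)
WORDS = 884_647   # total words in the corpus (figure quoted above)

# The probability itself underflows any float, so keep only its base-10 log:
# log10((1/VOCAB)**WORDS) = -WORDS * log10(VOCAB)
log10_p_try = -WORDS * math.log10(VOCAB)

# Length of one full try at one key press per second.
hours_per_try = WORDS / 3600
days_per_try = hours_per_try / 24

print(f"log10(p) for one try: {log10_p_try:.1f}")
print(f"one try takes {hours_per_try:.1f} hours, or {days_per_try:.2f} days")
```

Working with the logarithm is the only practical route here, since the probability is far smaller than the smallest number a floating-point variable can hold.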

A work day is, or used to be, 8 hours, with 5 work days a week. Each month has about 4.2 work weeks, and 8 hours a day gives 168 hours a month. So it turns out if we can push the monkeys to have 12 hour days, with weekends and nights off and 15 minute lunches, we can get one try a month out of each monkey. Pretty good! If our monkeys can’t do it, we can use H1-Bs and import some from India—I understand they have a surplus. American monkeys don’t want this kind of work anyway.
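The schedule arithmetic can be sketched the same way. I assume here that the 15 minute lunch comes out of the 12 hour day, and that a month is 4.2 weeks of 5 work days, as above. The helper `monthly_hours` is a hypothetical name for illustration.

```python
WORDS = 884_647        # key presses needed for one try, one per second
DAYS_PER_WEEK = 5
WEEKS_PER_MONTH = 4.2

def monthly_hours(hours_per_day, lunch_hours=0.0):
    """Typing hours available per month for a given daily shift."""
    return (hours_per_day - lunch_hours) * DAYS_PER_WEEK * WEEKS_PER_MONTH

hours_needed = WORDS / 3600                    # hours for one full try
standard = monthly_hours(8)                    # the old-fashioned 8 hour day
pushed = monthly_hours(12, lunch_hours=0.25)   # 12 hour day, 15 minute lunch

print(f"needed per try: {hours_needed:.1f} hours")
print(f"8 hour days:    {standard:.1f} hours a month")
print(f"12 hour days:   {pushed:.2f} hours a month")
print("one try a month fits:", pushed >= hours_needed)
```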

One count puts the number of Indian monkeys at a million, with another big chunk of langurs. If we wanted langurs we’d use Harvard graduates, and we don’t want them. It’s monkeys or bust. With current “immigration” rates, most of the monkeys will be in Canada in about 10 years (hitching rides in luggage, etc.), and at least half will have crossed the southern border during that time. We can make up the other half million by plucking them from the caravans coming north from the Global South.

So we have a million monkeys to work with. A million monkeys can thus do 12 million tries a year. How many tries are necessary? Well, as many as it takes to reproduce Shakespeare. We’ve seen the probability of 1 try. What’s the chance of at least one “hit” in 12 million tries? That’s easy. From the binomial, it’s

$$1 - (1-p)^n$$

where p = 10^{-3945375} and n = 12 million. Don’t bother punching it into your calculator. Too big. But we can ask how big n has to be such that the probability of a match is greater than a half. First let q = 3945375. Then to good approximation (set the previous equation to be greater than 1/2, realize 1 - p = 10^a/10^q for some a very close to but smaller than q, take logs, notice $a-q<0$ with $q-a \approx 10^{-q}/\ln(10)$, and solve for n)

$$n > \ln(2) \times 10^{3945375}.$$

Big number. Still not going to get there. There aren’t monkeys enough in the world, or time enough.
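Here is a rough sketch of the threshold, again in logs, since p is far too small for any float. It leans on the standard small-p approximation 1 - (1-p)^n ≈ 1 - exp(-np), which puts the break-even point at n ≈ ln(2)/p. The 12 million tries a year is the figure from above; the variable names are illustrative.

```python
import math

Q = 3_945_375                 # p = 10**(-Q) for a single try
TRIES_PER_YEAR = 12_000_000   # a million monkeys, one try a month each

# With n*p minuscule, P(at least one hit in n tries) is about n*p.
log10_p_hit_per_year = math.log10(TRIES_PER_YEAR) - Q

# Break-even: 1 - exp(-n*p) > 1/2 needs n > ln(2)/p.
log10_n_half = math.log10(math.log(2)) + Q

# Years of monkey labor to reach that many tries.
log10_years = log10_n_half - math.log10(TRIES_PER_YEAR)

print(f"log10 P(hit within a year):   {log10_p_hit_per_year:.2f}")
print(f"log10 tries for a 50% chance: {log10_n_half:.2f}")
print(f"log10 years for a 50% chance: {log10_years:.2f}")
```

A million monkeys shave only about 7 off an exponent of nearly 4 million.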

Yet all this assumes all of Shakespeare’s works must not only be reproduced, but reproduced in order. According to Wokepedia, our man wrote 39 plays and 154 sonnets and some other poems, all of which go into making up the 884,647 words.

Presumably somebody has the word count for each of these works. I don’t, so I’ll approximate, though it’ll be easy to see how to reproduce what comes next using the proper numbers. One source says sonnets run an average 150 words. Let that be. That gives 150 x 154 = 23,100 words for the sonnets, with 884,647 – 23,100 = 861,547 words left over for the plays. I know they’re not equal length, but let’s here suppose they are. That’s 22,091 words per play, on average.

We can use the same math as above to see that the chance of reproducing a (single) sonnet is

$$\left(\frac{1}{28829}\right)^{150} = 10^{-669}$$.

Do the same kind of thing for each play, and the probability of matching one in a try is

$$\left(\frac{1}{28829}\right)^{22091} = 10^{-98522}$$.

With me?
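If not, the word budget and the two per-work exponents can be sketched in a few lines. It uses the same approximations as the text (154 sonnets of 150 words each, 39 plays of equal length) and assumes uniform, independent word picks. The function `log10_match` is a hypothetical helper.

```python
import math

VOCAB = 28_829
TOTAL_WORDS = 884_647
N_SONNETS, WORDS_PER_SONNET = 154, 150
N_PLAYS = 39

sonnet_words = N_SONNETS * WORDS_PER_SONNET    # words used up by the sonnets
play_words = TOTAL_WORDS - sonnet_words        # what is left for the plays
words_per_play = round(play_words / N_PLAYS)   # equal-length assumption

def log10_match(n_words, vocab=VOCAB):
    """log10 of the chance of typing one specific n_words-long work."""
    return -n_words * math.log10(vocab)

print("words per play:", words_per_play)
print(f"log10 p(sonnet): {log10_match(WORDS_PER_SONNET):.1f}")
print(f"log10 p(play):   {log10_match(words_per_play):.1f}")
```

Swap in the true word count of each work and the same function gives the proper exponents.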

Since there are 154 sonnets, and we know the chance of getting one right in a try, and we don’t care about the order, the chance of getting all sonnets right, in any order, is

$$154! \times 10^{-669\times 154} \approx 10^{-102755},$$

where there are 154! ways to arrange the sonnets, each of which counts as a success, so the factorial multiplies the chance. (154! = 154 x 153 x … x 1, and Stirling’s approximation gives a value of about 10^{271.5}.)

We can do the same for the plays, which gives

$$39! \times 10^{-98522\times 39} \approx 10^{-3842312}.$$

And, of course, we must get both sonnets and plays, but there are two ways to arrange these, which gives at last

$$10^{-3842312}\times 10^{-102755} \approx 10^{-3945067}$$

Notice that this, ignoring the order, is larger than the original probability; i.e.

$$ 10^{-3945067} > 10^{-3945375},$$

where there is a roughly 10^{308} times bigger chance. Itself quite a number, but in the scheme of things, it doesn’t help much.
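For those who want to redo the order-free calculation with their own counts, here is a minimal sketch. It assumes the works are distinct and of equal length, so that any of the n! orderings counts as a success and the ordered probability gets multiplied by n!. Both function names are mine, and `math.lgamma` stands in for Stirling’s approximation.

```python
import math

VOCAB = 28_829

def log10_factorial(n):
    """log10(n!) via the log-gamma function, lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(10)

def log10_any_order(n_works, words_each, vocab=VOCAB):
    """log10 chance of typing n distinct equal-length works in any order."""
    ordered = -n_works * words_each * math.log10(vocab)  # one fixed order
    return ordered + log10_factorial(n_works)            # times n! orderings

plays = log10_any_order(39, 22_091)
print(f"log10(39!) = {log10_factorial(39):.2f}")
print(f"log10 P(all 39 plays, any order) = {plays:.1f}")
```

The exponent lands within a few units of the rounded figure for the plays in the text, because the text rounds the per-play exponent before multiplying by 39. The same call with the sonnet counts handles the sonnets.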

The Real Chance

Every calculation so far hinged on two numbers: the word count of Shakespeare’s oeuvre and the size of his dictionary, which is to say, the limited set of words he chose to use, or coin. That size is the most important number of all, because if we were to ask not about monkeys, but about men, we would not use the same dictionary, but a bigger one.

The list of words of the World Dictionary which Shakespeare drew from is much larger than the 29,000 words he eventually picked. Shakespeare himself added to that World Dictionary with his coinages. Which argues that the size of the WD is, in a sense, infinite. Or at least very large, and a whole lot larger than 29,000.

The obvious conclusion is that if millions of monkeys working gazillions of years cannot produce Shakespeare, the man himself could not produce Shakespeare picking words “randomly”. Intelligence is required. Both to write and comprehend. “Random” molecules bumping into one another isn’t enough to produce this.


13 Comments

  1. Brian (bulaoren)

    How many words could Joe Biden type, coherently?

  2. I appreciate that they had a sense of humor about it but there is no “Finite Monkeys Theorem”, there’s only an “Infinite Monkeys Theorem”. Maybe I’m taking it too seriously but they open their article (which came out in December) by stating that the “Infinite Monkeys Theorem” is misleading and so here’s a debunking of the “Finite Monkeys Theorem”. It just annoyed me because they wrote an article to set up and knock down a straw man, then used it to conclude that something only tangentially related was “misleading”.

    We all understand that it’s an infinite number of monkeys and an infinite span of time, so we don’t need smarmy statisticians pointing out that there aren’t enough monkeys and there isn’t enough time.

    That’s my curmudgeonly take on it.

  3. Brian (bulaoren)

    In an effort to narrow the gap between Finite and Infinite, we might try compiling a list of fluent (public) gibberish speakers, though I’m not sure how that would substantiate the monkey/typewriter model.

  4. Reminds me a lot of stuff I used to do for fun – a great way to lose girlfriends, of course, but interesting. Hope you had fun with it too. &FYI: https://www.opensourceshakespeare.org/statistics/

    However.. in that same long ago I horrified an English lit prof (needed arts credits in two languages to graduate) by doing this on an “attributed to” thing by somebody or another to see whether it might have been by Shakespeare. No clear result – but I had been looking at crypto so I used special monkeys trained to hit some symbols more often than others. e.g. more e’s in English than H’s – more “the”s in Shakespeare than “sir”s. I haven’t done it, but I’d bet that properly trained monkeys would be much more likely to produce Hamlet than the H1B types you’re using..

  5. Hagfish Bagpipe

    That’s all well and good, Briggs, and pretty funny, too, but why not just torture the model until she says what you want her to say? You made a good start on that by having your monkeys type in words rather than characters. But monkeys of the better sort could easily type in whole paragraphs, chapters, verses, or scenes. Or instead of monkeys typing with their measly two arms use some of those Indian multi-armed dudes, with each arm windmilling away at hypersonic speed. Or how about millipedes, with a thousand tiny legs tapping tiny typewriters at warp speed. Get creative! That’s how The Science is done.

    If you do that then even a moron, or a scientist, can concoct a model that produces not only Shakespeare, but also Dan Brown, disco, Liberace, diet coke — whatever you like! Except God. Because the model only rearranges matter. Of course, that’s a feature not a bug and who needs God when you have Random? And if there’s any doubt Random’s in control just look about — random chaos everywhere! Not surprising since for every useful thing Random spits out there’s a stupendous mountain of useless gibberish.

  6. Cary Cotterman

    Insider info says a roomful of chimps write Kamala Harris’s public utterances.

    All the superhero movie scripts of the past thirty-five years could be cranked out by a dozen or so monkeys in a couple of weeks, if they had a good supply of bourbon.

  7. Johnno

    But a finite number of monkeys voting an infinite amount of times will eventually produce utopia, right?

  8. Jim Fedako

    Sorry, Briggs. I’ve watched enough new TV shows and movies, plus read enough social media posts, to know there are 200 words, tops, in modern English. With only ten adjectives and adverbs, most being derivatives of a singular word, starting in f and ending in ck. So I now give monkeys a fifty fifty chance of writing a modern version of Shakespeare.

  9. Uncle Mike

    What’s the probability of Ribulose-1,5-bisphosphate carboxylase/oxygenase, aka RuBisCo, arising by chance from the Primordial Goo?

    RuBisCo is the enzyme that catalyzes the conversion of carbon dioxide to glucose in photosynthesis, and is thus fundamentally essential to Life. It consists of ~5,000 amino acids linked in a specific order (if any one is missing or out of order the enzyme won’t function). There are ~20 naturally occurring amino acids although 12 or so are “common”.

    Thus the chance one functional RuBisCo enzyme formed accidentally in the Goo was 1/(12^5,000). But for multiple copies it required a strand of RNA to assign and link them. RNA codes for amino acids in triplets of the bases adenine, cytosine, guanine and uracil. That is, four bases, select 3, in order, is the permutation 4!/(4-3)! = 24. And 24×12 = 288.

    Thus, for the correct RuBisCo-coding RNA strand to form, by chance in the Goo, was 1/(288^5000) = a very small number, or a probability of almost zero.

    And RuBisCo is just one enzyme out of 10,000 or so absolutely required for Life to exist. Ergo and to wit, the chance of Life arising here or anywhere by accident is essentially zero.

  10. Johnno

    What if instead of real monkeys, scientists did the most sciency thing possible, where they opt to run an artificial simulation using AI monkeys with their presumptions built in, then deliver the fixed outcome to the associated press who’ll trumpet the stunning results?!

  11. Brian (bulaoren)

    Hey Briggs
    Some monkeys ain’t typing at all. What gives?
    While I’m munching on chicharrones and enjoying the afternoon, I certainly and sincerely hope you are well, and that you will soon return to your post.
    Best, Bulaoren

  12. Cookie

    Could the figure be reduced through incentives?

    A raisin drops down a chute when a word is produced?

    Or will this just produce the same word over and over again?

    You add possible combinations into the problem, and the likelihood of one combination over another and a reward attached?

    The likelihood of getting a monkey to sit still at a keyboard in the first place?

    I hate hypotheticals not based in reality.

  13. Uncle Mike’s comment deserves some sort of award.
    Completely off topic: https://blog.bracha.org is a novel blog platform written in https://newspeaklanguage.org that lets creators embed pieces of _code_ that can be evaluated by the end user (and changed and re-evaluated, a REPL if you will).
    He, Gilad Bracha, another world-famous mathematician/computer scientist, extends this even further in https://blog.bracha.org/primordialsoup.html?snapshot=AmpleforthViewer.vfuel&docName=Ozymandias, a computational notebook.
    Hope you like it (and that all of those links don’t get scrubbed).
