Moving Away From The Mysticism Of “Random” Numbers

ET Jaynes in his must-have Probability Theory: The Logic of Science said, “It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought.”

This is profoundly true. And today I bring you an example. (One which I’ve been meaning to get to for a very long time, but events, etc.)

People use a thing called Markov Chain Monte Carlo, and other similar “random”-number generators, to estimate certain mathematical functions. In probability and statistics, these functions are probability distributions. The Gaussian, or “normal”, is a probability distribution. There are an endless number of distributions, and most cannot be computed analytically, or not with any ease.

That is, it’s easy to write down the formula for a “normal” distribution, and use a computer to approximate it numerically. It has to be an approximation because this distribution gives probabilities to every real number, of which there are an uncountable infinity, and all computers are finite. But we can get as close as we want to the real answer given time. And, really, for most applications getting “close enough” is close enough.
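For instance, here is a minimal sketch in Python (the interval and grid size are mine, picked only for illustration) of getting “close enough” by brute force:

```python
import numpy as np

# The "easy to write down" formula for the normal density.
def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Pr(-1 < X < 1) has no closed form, so sum the density over a fine but
# finite grid; a finer grid gets us as close to the true value as we like.
a, b, n = -1.0, 1.0, 100_000
dx = (b - a) / n
midpoints = a + dx * (np.arange(n) + 0.5)
prob = np.sum(normal_pdf(midpoints)) * dx

print(round(prob, 4))  # about 0.6827, the familiar one-standard-deviation probability
```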

Problem is there are a host of distributions where the answer is hard to write into a computer. The distributions are unwieldy and mathematically complex. It turns out, though, that these complex distributions can be thought of as being built from several smaller, less complex, and easily computed distributions. The end result, we must keep in mind, is just a boring probability distribution expressing uncertainty in some (in science) observable.

It was discovered that if we acted “as if” we were “drawing” numbers from those simpler smaller distributions that make up the larger complex one, fair approximations to the complex distributions could be made. These are the MCMC and similar “random” number methods.
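To make that concrete, here is a minimal sketch (the toy model, tuning constants, and names are all mine, not any real package’s) of the Metropolis flavor of MCMC: the “complex” distribution is known only as a product of simple pieces, and the routine pretends to “draw” from it using ordinary pseudo-random numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "complex" distribution: a normal likelihood for some data multiplied
# by a normal prior on its location. Each piece is simple; the product is
# the thing we actually want.
data = np.array([1.2, 0.7, 1.9, 1.4, 0.9])

def log_post(mu):
    log_lik = -0.5 * np.sum((data - mu) ** 2)   # likelihood piece (sigma = 1)
    log_prior = -0.5 * mu ** 2 / 10.0           # vague normal prior piece
    return log_lik + log_prior

# Random-walk Metropolis: propose a nearby value, keep it with probability
# governed by the ratio of the (unnormalized) densities.
mu, draws = 0.0, []
for _ in range(20_000):
    proposal = mu + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_post(proposal) - log_post(mu):
        mu = proposal
    draws.append(mu)

print(np.mean(draws[5_000:]))  # settles near 1.2, the posterior mean of this toy model
```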

That story can be read in full here: The Gremlins Of MCMC: Or, Computer Simulations Are Not What You Think. Gist: “random” only means “unknown”, and in these methods we turn a blind eye to the known to pretend to get the unknown, so as to bless the results as if they were “random.” No, it doesn’t make sense.

The problem with these methods is not that they don’t work. They do, but inefficiently. The problem is they induce the false idea that Nature “picks” probability distributions, and makes “draws” from them. The problem is they create the false idea that probability exists.

This is such a strange thought to have. Especially for ad hoc probability models. The idea that there is a “true distribution”, that not only caused the observations in the past, but is poised to cause new ones in the future, if all is aligned just right, is weird.

Never mind all that; that’s what the above linked post is for. Instead, let’s turn to the cheerful news that Jaynes was right yet again. There is a way to eschew the ponderous “random” number methods, which, in addition to all their philosophical difficulties, are like watching an NRO reader come to realize that that conservative cruise to meet Rich Lowry maybe wasn’t the best way to spend ten thousand dollars. “Random” methods are sloooooooooooooooooow. And resource hogs.

Enter the integrated nested Laplace approximation (INLA). A nonrandomized replacement to approximate those complex distributions.

Now I won’t show the math, since this isn’t a math blog. This article describes it all in nice detail. I will outline the idea, though, because it’s fun and because we have to beat the corpse of “random” methods to as thin a powder as we can to discourage people from the old mistakes.

Reach back and recall your old calculus days. Taylor series approximations to functions ring a bell? Probability distributions are just functions. The idea is you take any complex function and make something like a simple quadratic out of it, tossing out the “higher order” terms. A quadratic is like f(x) = a + bx + cx^2. Easy to compute.
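As a reminder of how that goes (the function and the expansion point below are mine, chosen only for illustration): expand around the peak, keep terms through the quadratic, toss the rest.

```python
import numpy as np

# A fussier function: the log of an (unnormalized) gamma-like density.
def f(x):
    return 3.0 * np.log(x) - 2.0 * x

# The peak is where f'(x) = 3/x - 2 = 0, i.e. x0 = 1.5. Expanding there,
# f(x) ~ f(x0) + 0.5 * f''(x0) * (x - x0)^2, since f'(x0) = 0.
x0 = 1.5
f2 = -3.0 / x0 ** 2                     # second derivative at the peak
quad = lambda x: f(x0) + 0.5 * f2 * (x - x0) ** 2

for x in (1.3, 1.5, 1.8):
    print(x, round(f(x), 4), round(quad(x), 4))   # nearly identical near the peak
```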

A Laplace approximation is like that: expand the log of the distribution around its peak, keep terms through the quadratic, and what remains is the log of an easily computed Gaussian, which stands in for the original. It gets you to close enough scads faster, and orders of magnitude saner.
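Here is the trick in miniature (the toy density and numbers are my own, chosen only for illustration), using the Gaussian stand-in to approximate the area under an awkward density:

```python
import numpy as np

# Unnormalized log-density of a beta-like distribution: k*log(p) + m*log(1-p).
k, m = 7.0, 3.0
log_d = lambda p: k * np.log(p) + m * np.log(1.0 - p)

# Mode from setting the derivative k/p - m/(1-p) to zero.
p0 = k / (k + m)
# Curvature (second derivative) of the log-density at the mode sets the
# spread of the stand-in Gaussian.
curv = -k / p0 ** 2 - m / (1.0 - p0) ** 2
sigma = np.sqrt(-1.0 / curv)

# Laplace answer for the area under the density (the normalizing constant).
laplace_area = np.exp(log_d(p0)) * sigma * np.sqrt(2.0 * np.pi)

# Brute-force numerical answer for comparison.
grid = np.linspace(1e-6, 1.0 - 1e-6, 200_000)
brute_area = np.sum(np.exp(log_d(grid))) * (grid[1] - grid[0])

print(laplace_area, brute_area)  # within roughly 7% here; sharper densities agree more tightly
```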

For a simple regression with a sample size of 100,000, the INLA took less than a minute’s computation time, while a standard MCMC routine took 148 minutes, nearly two and a half hours. Slick.

This INLA is new, and much active coding is still in progress. There is a nice package in R. (If you are on Linux, this may help you.)

But think how much easier to explain to somebody what is going on. There is no mysticism whatsoever about “randomness.”

Buy my new book and learn to argue against the regime: Everything You Believe Is Wrong.

14 Comments

  1. Every single bit of this applies to quantum mechanics (particularly the never-to-be-sufficiently-cursed Copenhagen Interpretation). The major problem is that professors have been teaching that probabilities are real and reality is not for generations. The 2022 Nobel Prize was awarded for this exact thing. (Sorry to harp about it. It’s all related, and it burns me up to see so many “smart” people be so self-delusional, and then to inflict their delusions upon others.)

  2. PhilH

    I suspect the entire QM field is quackery. There is probably an aether or charge field that explains all without dead cats.

  3. David Kindltot

    Once upon a time I got the advice that if I wanted a string of random numbers to work with, I needed to pull a page from the phone book and type in the last 4 digits of the phone numbers in strict order.

    This is rank pragmatism and probably would not pass the test for any serious application, but it always caught my fancy.

  4. Your enemies are stepping up their game. In:

    “This is such a strange thought to have. Especially for ad hoc probability models. The idea that there is a “true distribution”, that not only caused the observations in the past, but is poised to cause new ones in the future, if all is aligned just right, is weird.”

    they seem to have replaced “described/describe” with “caused/cause”.

  5. JH

    The problem with these methods is not that they don’t work. They do, but inefficiently. The problem is they induce the false idea that Nature “picks” probability distributions, and makes “draws” from them. The problem is they create the false idea that probability exists.

    Have you surveyed statisticians about your perceived problems and what it means to say ‘there is a true distribution’ to them? Why would those be problems? I admit I don’t get it, just like I don’t get why it is a problem if I don’t believe in God. If you are arguing philosophically whether a distribution exists, I read somewhere that some philosophers see a distribution as a math object and conclude that it has the status of a right-angle triangle.

    MC methods estimate and the INLA approximates. Undoubtedly, there are studies about biases, how to approximate those pf’s, and for what models the approximations work well.

    Regardless, one still needs to use MC methods to explore the approximate predictive distribution further.

    So much to learn and so little time.

  6. Stewart Basketcase

    If gains in computational efficiency without loss of precision are a consistent feature of INLA then it is a marked improvement over MCMC. That said, the whole “frog and mouse battle” over the epistemological grounding of probabilities leaves me completely cold. Sure, probabilities don’t exist in the real world; but they are a very useful fiction. Perhaps if instead of “probability” we used a neutral word to denote the concept, say “jabberwocky,” it would become less worthy of all this philosophical fighting?

    MCMC and INLA are very different in some respects, but in others they are fairly similar. At the end of the day, both are methods of numerical approximation of posterior jabberwocky distributions. It’s just that MCMC is trying to approximate the whole joint posterior jabberwocky distribution whereas INLA is approximating marginal posterior jabberwocky distributions of the parameters.

  7. JS

    MCMC methods have improved greatly over JAGS. Using Stan (via brms), a linear regression with 100K points fits in about 10 seconds.

  8. JohnK

    Staggering, jaw-dropping, to read some of the learned comments on this post. The question of whether ‘probability’ has causal effects — if not meaningless per se — is ‘philosophical’, trivial, pointless, or irrelevant?

    We’re all doomed.

  9. Briggs

    JohnK,

    To say the least.

  10. JH

    John K,

    I hope your jaw is doing fine. Who questions whether the probability has causal effects or whether such a question is trivial or whatever you said? I know that my imagination is not as stretchy as Briggs’ and the readers’ here. I cannot read between the lines either.

    I do remember deepities like “Probability is not decision” and “Risk is not cause.”

    Does probability have causal effects? What is ‘probability’ to you? If probability doesn’t exist, how can it have causal effects? Are you saying that probability allows us to examine the relationship between cause and effect? Or are you saying that causal effects cannot be cashed out in terms of probability (distributions)?

    Of course, life is full of uncertainty; I always make the best of the information at hand to make decisions. (Though this is just another deepity, it sums up what academic statisticians have been trying to do.)

    lol.

  11. Jerry

    I’m simulating a digital communication channel with nonlinear distortion and white gaussian noise. I need to generate at least 100 rare events for each value of SNR. Sometimes they occur 1 in 100 million, and this might take 48 hrs. running on a fast computer with efficient code. Would these methods help?

  12. DWSWesVirginny

    People mention quantum mechanics, but what about statistical mechanics? I would turn the reader’s attention to some interesting work by the late Mark Kac, who showed concepts like statistical independence occur in many areas of pure, non-probabilistic mathematics. I am referring to his magisterial Statistical Independence in Probability, Analysis and Number Theory. What does this signify as regards the “existence” of probability? I’d be interested in what readers have to say.
