There is nothing magical or mystical about simulations, and “randomness” has nothing to do with any of them. Let’s remove the unrealistic thinking from simulations by understanding them for what they really are.
Uncertainty & Probability Theory: The Logic of Science
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: You must look up and discover a Research Shows paper and see if it conforms to the conditions given below. Have you done it yet?
Lecture
Besides this excerpt from Chapter 6 of Uncertainty, you must read these posts: here and here.
These words from Jaynes are right: “It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought.”
We often hear of “simulating random normal” or “creating stochastic” or “synthetic” variables, or perhaps “drawing” from a normal or some other distribution. Such things form the backbone of many statistical methods, including bootstrapping, Gibbs sampling, Markov Chain Monte Carlo (MCMC), and several others. As with every other mistake about randomness, these methods are wrong in the sense that they encourage loose and even magical thinking about causality, and they are an inefficient use of time. If assiduously applied, these algorithms can give reasonably accurate answers, but those answers don’t mean what people think, and more efficient procedures are available, as Jaynes said.
The way simulations are said to work is that “random” or “stochastic” numbers are input into an algorithm and out pops answers to some mathematical question which is not analytic, that is, which cannot be solved by pencil and paper (or could be, but only with too great a difficulty). Let’s work with an example. One popular way of “generating normals” is to use what’s called a Box-Muller transformation. Any algorithm which needs “normals” can use this procedure. It starts by “generating” two “random independent uniform” numbers $U_1$ and $U_2$ and then calculating this creature:
$$
Z = \sqrt{-2 \ln U_1} \cos(2 \pi U_2),
$$
where $Z$ is now said to be “standard normally distributed.” We don’t need to worry about the math, except to notice that it is written as a causal, or rather determinative, proposition: “If $U_1$ is this and $U_2$ is that, $Z$ is this with certainty.” No uncertainty enters here; $U_1$ and $U_2$ determine $Z$.
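In code the point is even plainer. Here is a minimal R sketch of the transformation written as the deterministic function it is (the name box_muller is mine, for illustration):

```r
# Box-Muller is a deterministic mapping: given U1 and U2, Z is fixed with
# certainty. Same inputs, same output, every time.
box_muller <- function(u1, u2) {
  sqrt(-2 * log(u1)) * cos(2 * pi * u2)
}

box_muller(0.5, 0.1)  # always returns the identical Z for these inputs
```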
As above, random or stochastic means unknown, and nothing more. Yet there is the unfortunate tendency to assume that “randomness” somehow blesses simulations. But since randomness means unknowingness, how can unknowingness influence anything if it isn’t an ontological cause? It can’t. It is felt that if the data being input to simulation algorithms aren’t “random”, or simulated to look “as if” they were random, then the results aren’t legitimate. This is false. Since randomness is not a cause, we cannot “generate” “random” numbers in any sense. We can, of course, make up numbers which are unknown to some people. Example: I’m thinking of a number between 32 and 1400: to you, the number is random, but to me it is generated, i.e. caused, by my feverish brain. (The number, incidentally, is 32.32.)
Since probability is a measure of information, computers cannot generate random numbers (nothing can). What happens, in the context of our math above, is that programmers have created algorithms which *cause* numbers in the interval $(0,1)$ (notice this does not include the end points); not in a regimented way, so that we first see 0.01, then 0.02, and so on, but caused with reference to some complex formula. These formulas, if run long enough, will produce all the numbers in $(0,1)$ at the resolution of the computer (some will be repetitions); infinite resolution is not possible.
Suppose this resolution is 0.01; that is, our resolution is to the nearest hundredth. Then all the numbers 0.01, 0.02, …, 0.99 will eventually show up (again, many will be repeated; of course, we assume the programmer hasn’t left a hole in the sequence). Because the numbers do not show up in order, many fool themselves into thinking the numbers are “random”, and others, wanting to hold to the odd mysticism but understanding the math, call the numbers “pseudo random”, an oxymoron.
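A quick way to see the point in R: fix the generator’s state and the “random” numbers repeat exactly, because a formula, not randomness, causes them.

```r
# "Random" numbers from a computer are caused by a formula: fix the state
# and the identical numbers come out every time.
set.seed(42)
runif(3)
set.seed(42)
runif(3)  # identical to the three numbers above
```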
If we want to use the Box-Muller algorithm, we can sidestep this self-induced (and unnecessary) complexity and simply write down all the numbers in the sequence, i.e. all the pairs in $(0,1)^2$ (since we need $U_1$ and $U_2$) at whatever resolution we have; with our resolution, this is $(0.01, 0.01), (0.01, 0.02), …, (0.99, 0.99)$ (a sequence of pairs of numbers, of length 9801). We then apply the determinative mapping of $(U_1, U_2)$ to $Z$ as given above, which produces (3.028866, 3.010924, …, 1.414971e-01). What it looks like is shown in Figures 1 and 2.
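Before looking at the figures, here is the whole mechanical construction in a few lines of R (the names u, grid, and z are mine; the order in which the pairs are listed is irrelevant to anything that follows):

```r
# Every pair (U1, U2) on the 0.01 grid, mapped deterministically to Z.
u <- seq(0.01, 0.99, by = 0.01)        # 99 values at the chosen resolution
grid <- expand.grid(U1 = u, U2 = u)    # 99 x 99 = 9801 pairs
z <- sqrt(-2 * log(grid$U1)) * cos(2 * pi * grid$U2)
length(z)  # 9801
```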
Figure 1: The 9,801 values of $Z$ for each pair of ($U_1,U_2$) starting from (0.01, 0.01) and progressing to (0.99, 0.99).
Figure 2: The “true” value of the cumulative standard normal distribution in a dashed line, and the value of the ECDF for $Z$ given by the mechanical approach.
Figure 1 shows the mappings of the pairs $(U_1, U_2)$ to $Z$, along the index of the number pairs. If you’ve understood the math above, the oscillation, size, and sign changes are obvious. If not, spend a few moments with this and it will become clear. Figure 2 shows the empirical cumulative distribution function (ECDF) of the mapped $Z$ (solid), overlaid by the (approximate) analytic standard normal distribution (dashed), i.e. the true distribution to high precision (as given by the R function pnorm, which itself relies on an analytical approximation). It is difficult to see any deviation between the two plots.
There is tight overlap between the analytical approximation and the mechanical mapping, except for a slight bump or step in the ECDF at 0, owing to the crude discretization of $(U_1, U_2)$. Computers can do better than the nearest hundredth, of course. Still, the error even at this rough resolution is small. I won’t show it, but even a resolution 5 times cruder (nearest 0.05; number sequence length of 361) is more than good enough for most applications (a resolution of 0.1 is pushing it, but the reader should try it). This picture gives a straightforward, calculate-this-function, pen-and-paper-like analysis, with no strangeness about randomness—and it works.
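For readers who want to reproduce the comparison themselves, something along these lines will do, using the z built in the sketch above:

```r
# Solid: ECDF of the mapped Z. Dashed: R's analytic approximation pnorm().
plot(ecdf(z), main = "Mapped Z vs. standard normal")
curve(pnorm(x), lty = 2, add = TRUE)
```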
What use is $Z$? Suppose we were after, say, “What is the probability that $Z$ is less than -1?” All we have to do is count: the calculation from the ECDF takes the number of “generated” $Z$ less than -1 and divides it by (in this case) 9801, the length of the sequence of $U$ pairs. Simple as that. There are no epistemological difficulties with the interpretation: it comes right from the rule that all probability is conditional on the information supplied.
The analytic approximation to the probability $Z<-1$ is 0.159 (this is our comparator; calculated from R’s standard approximation). With the resolution of 0.01, the direct method shows 0.160, which is close enough for most practical applications. A resolution of 0.05 gives 0.166, and 0.1 gives 0.172. I’m ignoring that we could have shifted $U_1$ or $U_2$ to different start points; I’m not attempting to provide optimal algorithms, only to show that the traditional “random” interpretation is wrong.
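A sketch of that calculation, with a helper (my name for it, prob_at_resolution) which rebuilds the grid at any resolution; it should reproduce the figures quoted above to within rounding:

```r
# P(Z < -1): count the mapped Z below -1 and divide by the number of pairs.
prob_at_resolution <- function(res) {
  u <- seq(res, 1 - res, by = res)
  g <- expand.grid(U1 = u, U2 = u)
  z <- sqrt(-2 * log(g$U1)) * cos(2 * pi * g$U2)
  mean(z < -1)                         # proportion of mapped Z below -1
}

pnorm(-1)                  # analytic comparator, about 0.159
prob_at_resolution(0.01)   # mechanical answer at the 0.01 resolution
prob_at_resolution(0.05)
prob_at_resolution(0.10)
```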
None of these answers come with a plus-or-minus, though. With the 0.01 resolution, all we have is the answer, with no idea of the size of the approximation error. In essence, there is no error, because the probability we have is conditional on the evidence we accepted. However, it is the evidence itself which is in question. If this evidence is not logically equivalent to the function under consideration, then an approximation exists: the approximation runs from the accepted premises (the resolution, the mapping, and so on) to the function of interest. Given our setup (starting points of 0.01 for $U_1$ and $U_2$, and the mapping function above), these are the answers. There is no probability attached to them, because none need be: we are certain of these answers given these premises. But we would like to have some idea of the error of the approximation. We’re cheating here, in a way, because we know the right answer (to a high degree), which in actual problems we won’t. In order to get some notion of how far off that 0.160 is, we’d have to do more pen-and-paper work, engaging in what might be a fair amount of numerical analysis. Of course, for many standard problems, just as in MCMC approaches, this could be worked out in advance.
Contrast the determinative, fixed-pair method to the standard mystical or “simulation” approach. For the latter, we have to specify something like a resolution, which here is the number of times we must “simulate” “normals”; we then collect these and form the estimate of the probability that $Z < -1$. This is done by counting, just as in the fixed-pair method. To make it fair, pick 9801, which is the length of the 0.01-resolution series.
I ran this “simulation” once (using R’s runif function and the Box-Muller transform) and got 0.162; a second time 0.164; a third showed 0.152. There’s a new problem: each run of the “simulation” gives a different answer. Which is the right one? They all are; an unsatisfying but true answer: they are all local or conditional truths. So what will happen if the “simulation” itself is iterated, say 5000 times, where each time we “simulate” 9801 “normals” and each time estimate the probability $Z<-1$, keeping track of all 5000 estimates? That kind of thing is the usual procedure. It turns out 90% of the results are between 0.153 and 0.165, with a median and mean of 0.159, which equals the right answer (to the thousandth). It’s then said there’s a 90% chance the answer we’re after is between 0.153 and 0.165.
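For the curious, this is roughly the exercise in R (the helper name sim_once is mine); exact figures will differ from run to run, which is precisely the point:

```r
# The "simulation" version: 9801 pairs from runif(), pushed through the very
# same Box-Muller mapping; then the whole exercise repeated 5000 times.
sim_once <- function(n = 9801) {
  u1 <- runif(n)
  u2 <- runif(n)
  z  <- sqrt(-2 * log(u1)) * cos(2 * pi * u2)
  mean(z < -1)
}

estimates <- replicate(5000, sim_once())
quantile(estimates, c(0.05, 0.95))   # compare with the 90% interval in the text
c(median = median(estimates), mean = mean(estimates))
```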
This or similarly constructed intervals are used as error bounds, which are “simulated” here, but could and should be calculated mechanically, as in the mapping approach. Notice that the uncertainty in the mystical approach feels greater, because the whole process is opaque and purposely vague. The numbers seem like they’re coming from nowhere. The uncertainty is couched probabilistically, which is distracting.
It took 19 million calculations to get us the simulation answer above, incidentally, rather than the 9801 calculations (more or less) from the mechanical-causative approach. But if we increase the resolution to 0.005 in that approach, the answer is 0.159 at a cost of just under 40,000 calculations. Of course, MCMC fans will discover shortcuts and other optimizations to implement in their procedure; the 19 million may be substantially reducible. The number of calculations is a distraction here anyway, because what we want is to understand why the “simulation” approach works. It does (at some expense) give reasonable answers, as is well known. If we remove the mysticism about randomness and all that, we get Figure 3.
Figure 3: The upper-left plot shows the realized values of $U_1$ from R’s runif function ($U_2$ looks the same), and the upper-right shows the ECDF of the calculated values of $Z$ (solid) and the true standard normal (dashed). The bottom-left shows the sequence of $U_1$ ($U_2$ is the same), and bottom-right shows the ECDF of the calculated and true values of $Z$. Both the simulation and mechanical approach give nearly identical answers to reasonable accuracy.
The upper two plots are the results of the “simulation”, while the bottom two are the mechanical-causal mapping. The bottom two show the empirical cumulative distribution of $U_1$ ($U_2$ is identical) and the subsequent ECDF of the mapped normal distribution, as before. The bump at 0 is there, but is small.
The top left ECDF shows all the “uniforms” spit out by R’s runif() function. The only real difference between this and the ECDF of the mechanical approach is that the “simulation” is at a finer resolution (the first $U$ happened to be 0.01031144, 6 orders of magnitude finer than the mechanical method’s purposely crude 0.01; but the $U$s here are not truly plain-English uniform as they are in the mechanical approach). The subsequent ECDF of $Z$ is also finer. The red lines are the approximate truth, as before.
Here’s what’s revealed by these pictures, the big secret: the “simulation” just is the mechanical approach done more often! After all, the same Box-Muller equation is used to map the “uniforms” to the “normals”. That’s the secret to the success of simulations: the two approaches, causal and simulation, are equivalent!
Which is now no surprise: of course they should be equivalent philosophically. We could have taken the (sorted) $U$s from the “simulation” as if they were the mechanical grid $(U_1, U_2)$ we created and applied the mapping, or we could have pretended the $U$s from the “simulation” were “random” and then applied the mapping. Either way, same answer, which they had to be because there is nothing mysterious in “random” numbers.
The only difference (and advantage) seems to be the built-in error guess from the “simulation”, with its consequent fuzzy interpretation. But we could have a guess of the error from the mechanical algorithm, too, either by numerical-analysis means, as mentioned, or even by computer approximation. One way is this: estimate the quantity using a coarse, then a fine, then a finest grid and measure the rate of change of the estimates; with a little analysis thrown in, this makes a fine estimate of the error. Much work can be done here.
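One crude computational version of that idea, reusing the prob_at_resolution helper sketched earlier:

```r
# Recompute the same quantity on successively finer grids; how quickly the
# answers settle down is a rough gauge of the approximation error.
resolutions <- c(0.05, 0.02, 0.01, 0.005)
sapply(resolutions, prob_at_resolution)
```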
The benefit of the mechanical approach is the demystification of the process. It focuses the mind on the math and reminds us that probability is nothing but a numerical measure of uncertainty, not a live thing which imbues “variables” with life and which by some sorcery gives meaning and authority to results.