Nothing Is Distributed: So-Called Random Variables Do Not Follow Distributions

Wow is this wrong, but common, common.

People say “random” variables “behave” in a certain way as if they have a life of their own. To behave is to act, to be caused, to react. This is reification, perhaps caused by the beauty of the mathematics where, literally, the equations undergo biogenesis. The behavior of these “random” creatures is expressed in language about “distributions.” We hear, “Many things are normally (gamma, Weibull, etc., etc.) distributed”, “Y is binomial”, “Height is normally distributed”, “Independent identically distributed random variables”.

I have seen someone write, “[Click here to] see a normal distribution being created by random chance!” Wolfram MathWorld writes, “A statistical distribution in which the variates occur with probabilities asymptotically matching their ‘true’ underlying statistical distribution is said to be random.” Examples abound.

All of this is wrong and indicates magical thinking. It is to assume murky, occult causes are at work, pushing variables this way and that so that they behave properly. To say about a proposition X that “X is normal” is to ascribe to X a hidden power to be “normal” (or “uniform” or whatever). It is to say that dark forces exist which cause X to be normal, that X somehow knows the values it can take and with what frequency.

This is false. We are only privileged to say things like this: “Given this-and-such set of premises, the probability X takes this value equals that”, where “that” is a deduced value implied by the premises. Probability is a matter of ascribable or quantifiable uncertainty, a logical relation between accepted premises and some specified proposition, and nothing more.

Let S = “Sally’s grade point average is x”. Suppose we have the premise G = “The grade point average will be some number in this set”, where the set is specified. Given our knowledge that people take only a finite number of classes and are graded on a numeric scale, this set will be some discrete collection of numbers from, say, 0 to 4; the number of members of this set will be some finite integer n. Call the numbers of this set g_1, g_2,…, g_n.

The probability of S given G does not exist. This is because x is not a number; it is a mere placeholder, an indication of where to put the number once we have one in mind. It is at this point the mistake is usually made of saying x has some “distribution”. Nearly all researchers say or assume “GPA is normal”; they will say “x is normally distributed.” Now if this is shorthand for “The uncertainty I have in the value of x is quantified by a normal distribution” the shorthand is sensible—but unwarranted. There are no premises which allow us to deduce this conclusion. This is pure subjective probability (and liable to be a rotten approximation).

When they say “x is normally distributed” they imply that x is itself “alive” in some way, that there are forces “out there” that make, i.e. cause, x to take values according to a normal distribution; that maybe even the central limit theorem lurks and causes the individual grades which comprise the GPA to take certain values.

This is all incoherent. Each and every grade Sally received was caused, almost surely, by a myriad of things, probably too many for us to track. But suppose each grade was caused by one thing and the same thing. If we knew this cause, we would know the value of x; x would be deduced from our knowledge of the cause. And the same is true if each grade were caused by two known things; we could deduce x. But since each grade is almost surely the result of hundreds, maybe thousands—maybe more!—causes, we cannot deduce the GPA. The causes are unknown, but they are not random in any mystical sense, where randomness has causative powers.

What can we say in this case? Here is something we deduce: Pr(x = g_1 | G) = Pr(x = g_2 | G), where x = g_1 is shorthand for S = “Sally’s GPA is g_1” (don’t forget this!). This equation results from the so-called symmetry of individual constants, a logical principle. The premise G is equivalent to “We have a device which can take any of n states, g_1, …, g_n, and which must take one state.” From the principle we deduce Pr(x = g_i | G) = 1/n.
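
To make the deduction concrete, here is a minimal sketch in Python; the 0.01-step grid of possible GPAs is an illustrative assumption, not part of the original premises:

```python
# Sketch of the deduction Pr(x = g_i | G) = 1/n. The premise G says
# only that the GPA is one of the numbers g_1, ..., g_n; by the
# symmetry of individual constants each g_i gets equal probability.
from fractions import Fraction

# Hypothetical set for G: GPAs from 0.00 to 4.00 in steps of 0.01
gpas = [round(i * 0.01, 2) for i in range(401)]
n = len(gpas)

# The deduced probability for every member of the set
pr = Fraction(1, n)
print(f"n = {n}, Pr(x = g_i | G) = {pr} for each i")  # 1/401
```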

“Briggs, you fool. That makes GPAs of 0 just as likely as 4. That isn’t possible.”

Is it not? I see you haven’t taught at a large state university. Anyway, the probabilities deduced are correct. What you are doing in your question is adding to G. You are saying to yourself something like “Pr(g_n | G & What I know about typical grades)” which I insist is not equal to Pr(g_n | G). Either way, x does not “have” a distribution.

Homework 1: discover instances of abuse. Homework 2: What’s wrong with the phrase “independent identically distributed random variables”? Hint: a lot.

25 Comments

  1. James

    I used to think that many of your posts were overly pedantic; finding nuances in language that everyone used as shorthand while really understanding the truth. I assumed that most people didn’t really reify these things and perform magical thinking. After lots of commenting at various science sites around the internet, pointing out what you call the epidemiologist fallacy (measuring Z and Y, then claiming that X affects Y due to some “possible” relationship, but never actually touching X with a ruler or a scale), I’m coming around to the belief that you aren’t being unnecessarily pedantic at all!

    I’m always astonished when someone says “well, X is too hard to measure, so we have to use a proxy to X.” When pressed for any evidence that the proxy is actually a proxy, and none can be given, I see retreats to “it’s common sense”. This is especially pronounced when commenting on climate studies. Apparently no one thinks that control groups are needed to actually separate potential causes. The magic of statistics with “controls”, “hierarchical models”, and other alchemical phrases will be able to perfectly separate cause and effect!

    I see your point in this post. Too much thinking of “random variables” as real, directed things will lead to thinking that the math that manipulates those things is equally real. How else does one explain the reactions I see to what should be common sense (see the Feynman video) statements against overly certain statistical modeling?

    As for homework 2, the initial problem I see is how do we know if things that are random (uncertain!) are identical? That sounds pretty certain to me.

  2. Sheri

    James: So agreed! I spend a lot of time asking global warming advocates how the proxies were deduced and if there is any evidence whatsoever that the proxies and reality ever intersect. I usually get no response or a rude response. Same for their interpolation methods for weather stations that don’t exist.

    Homework 1: Not from today, but in the ancient past when students were graded for reasons other than increasing self-esteem, some teachers would grade on a “curve”, that generally being the normal curve. In a class of very smart students, one could actually get 80% right on a test and still receive an “F” for being the lowest score. I think that may have been the origin of the practice of beating up the smart kids! Anyway, we don’t use it now, and it’s probably for the better.

    Homework 2: As noted, random is uncertain at best. Identical would apply only to number sets, I would think. A random distribution elsewhere would have too many underlying factors to be identical, especially if we are describing people and climate, two places this should never be used. It’s also unclear why one would even want “identical” random distributions. I thought random was to get varied distributions. Maybe to make sure these don’t occur?

  3. Gary

    Lately I’ve been getting a lot of ‘500 Internal Server’ error messages here. Do you know why? Better, can you fix it?

  4. Briggs

    Gary,

    Yahoo servers are having problems, which is also affecting the spam filters. Unfortunately, not much I can do about it. I can try and write customer service again, but all I get is some untrained guy in India mailing me back the FAQ.

    James,

    I often feel that same way myself. But I don’t know how else to get across the points I wish to make. The abuses in statistics are so many and so widespread that the only thing that will work is to start over.

    Sheri,

    I’ll answer your and others’ homework answers later.

  5. Nick

    Working in communications systems, by their nature beset by certain physical processes which are typically well-modeled by certain distributions (“Gaussian” noise, “Rayleigh” fading, “Exponential” traffic, “Log-Normal” shadowing), I have committed many of the aforementioned sins. What must I do to repent?

    Yes I have uncertainty, but uncertainty has an uncanny way of taking certain shapes when one looks at large amounts of data.

    When I say height (or GRE scores or general intelligence) is “normally distributed”, I usually mean a) when you look at measurements, it looks pretty normal; and b) this is probably due to the fact that such features are the sum of a large number of random unknown variables, which tends to produce data that looks pretty normal. I do not intend to imply that some Great Gaussian Generator in the Sky is causing samples to be a certain way. Tho’ I wouldn’t rule that out from a proper theological perspective.
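
    A minimal simulation of point (b), assuming only that each feature is the sum of many small independent contributions; the counts here are arbitrary:

    ```python
    # Sums of many small, independent, additive contributions tend to
    # look normal (the central limit theorem at work).
    import numpy as np

    rng = np.random.default_rng(0)

    # 10,000 "measurements", each the sum of 500 small uniform terms
    sums = rng.uniform(-1, 1, size=(10_000, 500)).sum(axis=1)

    # Standardize and compare quantiles with the standard normal's
    z = (sums - sums.mean()) / sums.std()
    print(np.quantile(z, [0.025, 0.5, 0.975]))  # near -1.96, 0, 1.96
    ```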

  6. Excellent job.

    I would describe this problem and its solution a bit differently by stating that the members of the set g_1, g_2,…, g_n are “the ways in which the outcome of an event can occur.” The principle of entropy maximization states that “the entropy of the ‘ways’ is maximized under constraints expressing the available information.” Given that the available information is nil, the entropy is maximized without constraints. This results in assignment of equal probability values to the various ‘ways’. (If the available information were perfect, the probability value of one member of the set would be 1 and the values of the other members would be 0.) Also, rather than stating that the probability values are “deduced” I would state that they are “induced.” Deduction properly describes situations in which the available information is complete, thus not describing the situation in which the available information is nil.

  7. Briggs

    Nick,

    There is not, of course, any such thing as “Gaussian noise”; there are only signals, each caused by one thing or by many things. What we mean is that the causes of the signal before us have certain characteristic features. Of course the mathematics of probability may be used to represent our uncertainty in the signal. But it says nothing about what caused that signal. Randomness is never a cause.

    IQ as measured by test scores, like GPAs, cannot “be” normal. You can use a normal to approximate the uncertainty you have in any person’s IQ—if all you know is that distribution’s parameters (once you know the person’s test score, probability is no longer needed)—and this approximation will be more or less reasonable (and beware of probability leakage).
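
    A minimal sketch of what that leakage looks like, with made-up parameters for a GPA-like quantity known to live between 0 and 4:

    ```python
    # "Probability leakage": a normal approximation to a bounded
    # quantity (like a GPA on a 0-4 scale) assigns nonzero probability
    # to impossible values. The parameters below are hypothetical.
    from scipy.stats import norm

    model = norm(3.0, 0.8)              # hypothetical fitted normal

    leak = model.cdf(0) + model.sf(4)   # mass below 0 plus mass above 4
    print(f"probability given to impossible GPAs: {leak:.4f}")
    ```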

    IQ test scores are caused, almost surely, by many things, most of which are unknown to us. It could be that IQ (or height or the signal, etc.) is caused by different things in different people. So IQ is not a “sum” of unknown “variables.” It is the result of various forces, which might not be the same from person to person, and which might or might not be additive.

    Probability is a way to wave our hand and dismiss causes—and then to misascribe those causes downstream, as we see happen all the time.

  8. Nick

    Indeed: A score is a score, neither random nor unknown. But looking at a large collection of scores gives us a way of thinking (with the usual caveats and quid pro quos) about the population that scored them. If I know that mean male Hmong height is 5’5″ with sigma 2″ (I do not BTW), then that tells me something about male Hmong. Especially about large numbers of them. Of course, any individual Hmong male I meet will have the height he has, which will have been caused by perfectly natural forces. But knowing whether he is near the middle or much higher or lower than the mean still tells me something about him. He might be “pretty tall”… for a Hmong. If I know that male Danes have a mean height of 6’1″, that might give me a clue where to look for good basketball players.

  9. Briggs

    Nick,

    Just so. But we didn’t need normal distributions for that. The danger there is over-certainty. All it would take is a simple look at a few Hmongs to discern their characteristic height. And you don’t need a normal to say Harry the Hmong is tall for a Hmong. As Goldberg says, most stereotypes are true, but the stated causes of those stereotypes are often in error. Why are Hmongs shorter on average than Danes? I have no idea. Diet? Genes?

    Of course if you were a Hmong clothing manufacturer, it would behoove you to go out and measure. (But you’ll find a discrete frequency count superior to a normal approximation.)

    Are there any good Danish basketball players?

  10. DAV

    500 Internal Server

    Seems to be when the site is busy. When I get it, there is often a new comment or post after I finally succeed. Perhaps the number of users at any given time is set too low for the traffic here or posting is blocking all other traffic.

    The site has been slow recently though better than WUWT. Often it’s waiting on twitter. Same with WUWT but there, something at google seems to be the initial bottleneck.

  11. Briggs

    DAV, All,

    I just deactivated the plugins “All in One WP Security” and “W3 Total Cache”.

    Let me know, if you can, if there is any improvement.

  12. Willis Eschenbach

    Matt, usually I can follow your logic, but this one, I don’t get. You say:

    All of this is wrong and indicates magical thinking. It is to assume murky, occult causes are at work, pushing variables this way and that so that they behave properly. To say about a proposition X that “X is normal” is to ascribe to X a hidden power to be “normal” (or “uniform” or whatever). It is to say that dark forces exist which cause X to be normal, that X somehow knows the values it can take and with what frequency.

    I don’t understand the distinction. When I say that the time between breakdown of radioactive particles is mathematically described to an arbitrary accuracy by a normal distribution, I am making a statement of fact. Are you saying that it doesn’t follow a normal distribution? And if so … why?

    The reality is that radioactive substances have a “half-life”. But this doesn’t posit “dark forces” that make radioactive substances decay in a certain way. It doesn’t mean they “know the values they can take”.

    It is merely a mathematical description of observations of the world. It says nothing other than that the math can be used to predict and understand the real world.

    As to whether something “follows” a normal distribution, you seem to be hung up on meaningless semantics. No one I know believes that dark forces make coin flips end up with a binomial distribution, or that coins are deciding to land heads or tails. Saying that coin flips follow a binomial distribution is merely a shorthand statement that coin flips can be described accurately by a binomial distribution. Surely you don’t expect the English language to reflect all the mathematical subtleties of the world, do you? We use such shorthand all the time.

    What am I missing here? What does your formulation add to the idea that natural events occur in what we call “distributions”, and that certain natural observations fit one distribution better than they fit all the rest?

    As always, my best to you,

    w.

  13. Briggs

    Willis,

    Of course nowhere do I claim that mathematics can’t be used to model or approximate reality.

    No, what happens—what the real danger is—is the misattribution of causes. That you don’t know folks who think randomness or chance is an operative force is in your favor and says you know how to pick friends. But I assure you it is common, especially in the so-called soft sciences.

    The problem occurs when somebody says something like “Height is normally distributed” and he expects heights to behave according to the precepts of that mathematical entity. Values of height which do not are somehow at fault, are “outliers”, or are forgotten; the data is blamed but not the model.

    What I owe you are better examples, which I could not fit into 750 words.

    As far as the radioactive breakdown, yes. I do say it does not follow a normal (or any) distribution. Something causes each breakdown. What we can and should say is that our uncertainty in the breakdown is represented by, or is quantified by, a normal (or whatever) distribution. This isn’t meaningless semantics. It keeps probability in its proper place, which is away from the language of causality.

    I’ll give an example of the abuse of causality tomorrow when I review the paper “Nonpolitical Images Evoke Neural Predictors of Political Ideology”.

  14. A Gaussian distribution is normal, or at least what tends to occur, when there are a large number of small independent perturbations, all changing relatively fast compared to the measurement, where their values are additive in effect because the system is linear, and where the outcome is not constrained (within practical limits).

    In all the other cases the normal distribution is abnormal.

    From a climate perspective, the biggest issue is that it’s not simple noddy “Gaussian” variation but 1/f. This means random changes happen on timescales larger than the quantisation of measurement (I just made that up to sound impressive!). In other words, after you start measuring, the distribution changes. So, one starts with a relatively simple distribution, and then that gets bigger and bigger, the longer the period we are considering.
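
    A rough demonstration of that claim, using the standard spectral-shaping construction of 1/f noise (none of this is from the comment itself): unlike white noise, the spread of a 1/f series tends to grow as the observation window lengthens.

    ```python
    # Generate approximate 1/f ("pink") noise by scaling the Fourier
    # amplitudes of white noise by 1/sqrt(f), then watch the standard
    # deviation of ever-longer segments.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 2**16

    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                         # avoid dividing by zero at DC
    pink = np.fft.irfft(spec / np.sqrt(f), n)

    for window in (2**8, 2**12, 2**16):
        # the spread tends to grow with the window, unlike white noise
        print(window, round(float(pink[:window].std()), 3))
    ```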

  15. Willis Eschenbach

    Briggs (Post author)
    13 NOVEMBER 2014 AT 1:04 PM

    As far as the radioactive breakdown, yes. I do say it does not follow a normal (or any) distribution. Something causes each breakdown. What we can and should say is that our uncertainty in the breakdown is represented by, or is quantified by, a normal (or whatever) distribution. This isn’t meaningless semantics. It keeps probability in its proper place, which is away from the language of causality.

    Huh? It seems you object to the word “follow”, as though it implied agency or causality. But if I say “A carbon atom follows a complex path through the carbon cycle”, very few people would take that to mean that the carbon atom is deciding where to go. Nor does it imply any kind of causality.

    Similarly, if I say that the measured time intervals between radioactive breakdown “follows” a normal distribution, I’m not saying that the radioactive atoms decide when to break down. “Follows” is just shorthand for saying that the observations are best described by a normal distribution. It says nothing at all about causality.

    Your objection strikes me like the objections to supermarket signs saying “10 items or less”. Yes, it’s incorrect English, but so what? It is perfectly understood by the hearer, which is the point of a language.

    So I’m still not clear what your point is here. How is saying “radioactive decay times are best described by a normal distribution” better than saying “radioactive decay times follow a normal distribution”? Would you object to saying “Longwave emissions follow Wien’s Law”?

    Still in mystery, my friend,

    w.

  16. DAV

    Willis,

    The distinction is in understanding that it’s the uncertainty of prediction that is following a particular distribution and not the values themselves following the distribution.

    In some ways, it’s akin to saying a ball on top of the hill wants to roll downhill. While (hopefully) there are few people who think a ball actually has wants, there seem to be many people who think the values themselves are following the distributions instead of the uncertainty in the predictions of the values. The distribution is an artifact of the model being used for prediction. Explicitly stating that it’s the uncertainty avoids the reification problem which (inevitably, it seems) leads to thinking the model is the reality.

  17. Willis,

    I have no idea what “Wien’s law” is. I would object to too much sincerity about saying “follows”. There is the general sense that everyone reading Briggs most likely gets. Then there is the specific sense that gets embraced by zealots. They fixate on the one meaning and forget the uncertainty. We can never forget the uncertainty. That uncertainty is why I stutter when getting into conversation about global warming, cooling, climate change, deviation, whatever. I look at the raw data and see uncertainty. I expect the great minds with the big sticks to see it also. Only it appears that they don’t. Briggs sees it. You see it. At least your posts make me feel the same understanding I feel from Briggs, Brignell, et al.

    With temperature I point to enthalpy and scream internally wondering why the charts aren’t in units of enthalpy instead of Anomalous temperature. I wonder how it is that this big beautiful world can be reduced to an anomalous temperature point with error bars smaller than a tenth of a degree. Muller assures me that they are on the right path. Ha. Anyone attempting to analyze a water cycle without enthalpy is a fool.

    But holding tight to specifics is also foolish. But Briggs, Eschenbach, Watts, Brignell, Milloy, and the rest make me think there is still sense in the world.

  18. JohnK

    First, I wonder if physicists and engineers are thinking about the Central Limit Theorem when they loosely say an observed phenomenon “follows” a normal distribution. Matt discusses this in an older post, which I will also link below.

    Second, the probability of seeing a particular radioactive decay time does NOT follow a normal distribution, for at least one, possibly two, reasons.

    a) I don’t know whether physicists believe that a negative, a negatively infinite, or a positively infinite time for radioactive decay is possible, but that is what a normal distribution gives. If physicists are OK with negative and infinite radioactive decay times, then this issue is not pertinent.

    b) Because of this quirk of mathematics, when using the normal to quantify probability, the probability of any number is zero in all problems [whose probability is quantified by the normal]. Unless physicists think that the probability of seeing ANY and ALL radioactive decay times is zero, this second objection is decisive. See the link for more elaboration.
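
    Both objections can be checked in a couple of lines; the parameters below are invented purely to illustrate:

    ```python
    # A normal model for decay times (a) puts real probability on
    # negative times and (b) gives probability zero to any exact time.
    from scipy.stats import norm

    model = norm(loc=2.0, scale=1.5)    # hypothetical decay-time model

    print(model.cdf(0))                 # nonzero mass on negative times
    print(model.cdf(2.0) - model.cdf(2.0))  # Pr(x = exactly 2.0) = 0
    print(model.pdf(2.0))               # a density, not a probability
    ```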

    In my limited personal experience, physicists do not necessarily understand statistics any better than Medical Doctors. For example, they seem just as likely to not grasp the profound difference between the calculation of a parameter and the determination of any actual probability.

    Experimental replication, not advanced statistics acumen, appears to be the cause of greater confidence in the results of physics research. (Though I do remember the 1970s, when quarks, then still highly speculative experimentally, were seen to be regularly appearing, and then disappearing, from high energy labs).

  19. Rich

    If I were to say, “radioactive decay times really follow the exponential distribution not the normal” then I believe I would be making the mistake Briggs describes. And the correction would presumably be, “Our uncertainty in radioactive decay times is better described by the exponential distribution”. The first contains the implication that there is, out there, somehow the ‘real’ distribution and the second doesn’t. I think.

  20. Willis Eschenbach

    DAV
    13 NOVEMBER 2014 AT 2:41 PM

    Willis,

    The distinction is in understanding that it’s the uncertainty of prediction that is following a particular distribution and not the values themselves following the distribution.

    Huh? I don’t get that at all. There’s only one thing that we can measure. That is the values themselves. We measure those values, and we note that they are best described by some particular distribution. In shorthand, we say that they “follow” that distribution.

    However, we can’t measure the “uncertainty of prediction”. All we can measure are the values. So how can the “uncertainty of prediction follow a particular distribution”?

    w.

  21. DAV

    Willis,

    Think about it. The distribution is with respect to a line defined by the model being used, which traces (or should trace) the most probable values given the model. If the model were perfect (that is, it perfectly predicts) then the observations would always be on the line predicted by the model.

    The observations aren’t “following” any distribution. That they appear to do so is merely an artifact caused by using an imperfect model. The distribution is a description of the uncertainties of the model.
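
    A small simulation of this point, with everything invented for illustration: the familiar bell shape sits in the residuals around the model’s predictions, not in the observations themselves.

    ```python
    # Simulate caused observations around a line, fit the line, and
    # inspect the residuals: the "distribution" describes the model's
    # errors, not the values.
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0, 10, 200)
    y = 3.0 * x + 1.0 + rng.normal(0, 2.0, size=x.size)

    slope, intercept = np.polyfit(x, y, 1)      # the model line
    residuals = y - (slope * x + intercept)

    print(round(float(residuals.mean()), 3))    # near 0
    print(round(float(residuals.std()), 3))     # near the noise scale, 2
    ```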

  22. Willis Eschenbach

    DAV
    14 NOVEMBER 2014 AT 1:32 AM

    Willis,

    Think about it. The distribution is with respect to a line defined by the model being used, which traces (or should trace) the most probable values given the model.

    Thanks, DAV. Here’s the procedure.

    I stand there and wait for the radioactive decay. I measure the length of the interval since the last time there was a decay. I do that for a long time. I end up with a list of numbers.

    Then I make a histogram of these numbers. The SHAPE of that histogram is a physical representation of what we call the “distribution” of the data.

    Now, that distribution is inherent in the list of numbers that I wrote down. There is no “model”. There is no “line” around which this distribution is built. The distribution of data is the shape of the histogram, no model required.

    Now, we can squint at that shape and say mmm, looks like a Poisson distribution, or we can say that it looks like a normal distribution. Of course, in the real world it’s usually not a “pure” distribution. It may, for example, be bi-modal, with each of the modes having its own distribution. Now, each of those is a model which fits the observations to a greater or lesser degree, but rarely 100%.

    If the model were perfect (that is, it perfectly predicts) then the observations would always be on the line predicted by the model.

    Not in my world, because the model is the distribution, not a line. In my world, if the model were perfect, the shape of the modeled distribution (normal, Weibull, gamma, whatever) would be identical to the shape of the observed distribution.
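
    A minimal version of that shape comparison, with the “measurements” simulated rather than taken from a real counter: histogram the intervals, fit a candidate distribution, and compare shapes.

    ```python
    # Record inter-decay intervals (simulated here), histogram them,
    # and compare the histogram's shape to a fitted exponential density.
    import numpy as np
    from scipy.stats import expon

    rng = np.random.default_rng(3)
    intervals = rng.exponential(scale=2.0, size=5000)  # stand-in data

    counts, edges = np.histogram(intervals, bins=30, density=True)
    centers = (edges[:-1] + edges[1:]) / 2

    loc, scale = expon.fit(intervals, floc=0)   # candidate "model"
    fitted = expon(loc, scale).pdf(centers)

    # Small shape mismatch: the model fits well, but rarely 100%
    print(round(float(np.abs(counts - fitted).max()), 3))
    ```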

    The observations aren’t “following” any distribution. That they appear to do so is merely an artifact caused by using an imperfect model. The distribution is a description of the uncertainties of the model.

    I don’t see that. The model is something on the order of “normal” or “Poisson”. The distribution of the observations is independent of all models.

    However, I think perhaps I finally get your meaning. For you guys, saying “the observations follow an exponential distribution” somehow implies causation. To you, saying it that way means that the observations are somehow constrained to be a certain shape, say log-normal. The distribution somehow leads in that world, and the observations are somehow forced to follow that lead.

    I get that, but it doesn’t mean that to me. Perhaps it’s because I never took a statistics class in my life, I don’t know. But for me, saying the observations “follow an exponential distribution” doesn’t imply causation. To me that’s just shorthand for saying that the shape of the histogram of the observations is best approximated by an exponential distribution.

    Now, the phrase “the shape of the histogram of the observations is best approximated by an exponential distribution” is clumsy as hell. Languages throw away expressions like that and express them in shorthand, like “follow”. Here’s the important part.

    Often, the shorthand IS NOT LOGICALLY CORRECT … and despite that it remains in common use, because we understand what it means.

    For example, we say “a hundred times smaller”. Everyone who speaks English knows what that means, a hundredth of the size. However, logically, it’s junk. A hundred times anything has to be bigger, not smaller. So why does the phrase endure, when phrases are evolutionary and die when they don’t work?

    Because everyone knows what it means, so the obvious lack of logic doesn’t matter.

    Now, is this the case with the statement “the results of the flipping of 1,000 coins follows a binomial distribution”? Do people understand it doesn’t mean the observations are forced to follow that distribution?

    I don’t know. I know (or at least I think I know) what that means. And I can see that some people might really think that there is some voodoo power that forces random variables to follow some given distribution. Not sure why that might be important, given that people have all kinds of strange ideas about chance and randomness. If that’s the biggest of their misconceptions about chance and randomness, I’d say they’re doing well.

    Anyhow, further comments and suggestions welcome. And Briggsie … as usual, you be styling, a most fascinating topic …

    w.

  23. DAV

    Willis,

    What is a model but Y = F(X), where Y and X could have one or more dimensions? I could argue that you have a model in mind, dT = F(Events), and you are simply trying to get its parameters by forcing Events = 1. You may not think of it as a model; however, it has the same form as one, but let’s not quibble. Do note, though, that the histogram is of a number of observations of the time for a single event.

    Assuming the observations weren’t for yuks, and using what you’ve learned in this step, the simplest model would be to assume that what you’ve already seen is a general rule, and thus to predict the time to another event as the most common value observed. The histogram would give you the uncertainties in that prediction. When you select a distribution (Normal, Poisson, whatever) to express the histogram, you are modeling your (and your model’s) uncertainty and not the distribution of the time to a single event, which can only be a single value.

    This is no different than what is involved in determining the parameters of any model.

  24. Briggs, you say “As far as the radioactive breakdown, yes. I do say it does not follow a normal (or any) distribution. Something causes each breakdown.”
    What reason do you have for believing that last claim?

    With regard to the decay times being predicted to “follow” a distribution (actually not normal but exponential), what this is intended to mean is just that if we use a Geiger counter to observe a sample of a particular radioactive material at two times separated by one half-life, then the number of counts per second in the second measurement will probably be about half of what it was in the first.
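
    In numbers, with a made-up half-life and starting rate, that operational meaning is just this:

    ```python
    # Expected count rate after whole half-lives: N(t) = N0 * 2^(-t/T).
    half_life = 5.0        # hypothetical half-life in minutes
    rate_0 = 1000.0        # counts per second at the first measurement

    for t in (0.0, half_life, 2 * half_life):
        rate = rate_0 * 2 ** (-t / half_life)
        print(f"t = {t:4.1f} min: ~{rate:6.1f} counts/sec")
    ```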

  25. Nullius in Verba

    “And I can see that some people might really think that there is some voodoo power that forces random variables to follow some given distribution. Not sure why that might be important, given that people have all kinds of strange ideas about chance and randomness.”

    There is a distinction but it’s not important – it’s one of those philosophy questions that give the subject a bad reputation with the layman – like solipsism.

    We have physical reality in which processes happen giving rise to observations that observers observe. We have a mathematical model in which processes obeying defined rules are modeled to happen, giving rise to modeled observations observed by modeled observers. If you set the rules up a certain way, what the modeled observers see looks pretty much exactly like what we physical observers see in reality. We don’t know how reality really works behind the scenes – all we have access to is our observations of it – but it is a long-established and so-far reliable observation that the models do give the same results, and can be used to make predictions which are really all we are interested in. Absolute Truth is all very nice, but it doesn’t put dinner on the table.

    Since real observers have no access to the Absolute Truth of the reality behind the observations, it’s not a particularly useful concept for practical purposes. For this reason, our brains are model-driven: we treat the models we construct temporarily as reality. We can also switch smoothly between many different models, sometimes without even realising it, choosing the best one for our current purposes. The entirety of what we call ‘Reality’ is built of such models, without them all we experience consists of meaningless coloured lights and sounds.

    The mathematics of probability is built around one such model. We model processes following certain rules and observers observing them, as described above. We try to work out how much the modeled observers can deduce about the processes – what rules they follow. This deduction is never perfect, and the observer’s knowledge is subjective, limited by what they happen to have observed. We try to come up with the most efficient and accurate methods possible for them in the modeled universe. And then we apply those same techniques back here in reality to our real-world observations. This leads to subjective, observation-limited models that we imagine/hope reflect what’s going on behind the scenes in reality. But we can never know.

    So, in the modeled universe we have random variables following the axioms of probability that – in the modeled world – have ‘true’ parameter values and probabilities. Also in the modeled universe, we have the subjective deductions the observers can make about those true probabilities and parameters, which are the subjective probabilities of our beliefs. In decision theory this concept is called ‘Bayesian Belief’ to distinguish it from probability; see for example the work done in AI on ‘Bayesian Belief Networks’ (BBNs). Although Bayesian Belief obeys the same axioms as true probabilities, and can be calculated with and manipulated using the same calculational methods, it’s a distinct concept. The ‘true probabilities’ (in the modeled universe) describe the process generating the observations, the ‘subjective probabilities’ describe the deductions the modeled observers make about them. They are a measure of the observer’s uncertainty.

    And of course, back in reality all we have access to is the ‘subjective probabilities’ that measure our real uncertainty. We don’t even know for certain whether there are any actual ‘true probabilities’ in reality. It’s for sure that none of those numbers we label as ‘probability’ are ‘true probabilities’ in the above sense when we’re talking about real events.

    Briggs has a bit of a bee in his bonnet about this – insisting that only subjective probabilities really exist and all talk of absolute probability is a misleading reification of unrealistic or physically impossible models. We are all watching the movie projected on the wall of Plato’s cave, trying to suspend disbelief and follow the story, but Briggs is the guy in the back of the movie theatre shouting: ‘It’s a trick! A trick, I say! It’s no more than changing patterns of light reflected off the wall!’ Well, yes. But so what?

    For the purposes of predicting whether Humphrey Bogart gets the girl, we might as well assume it’s real – that he’s up there in front of us. It makes no practical difference whether our model of the world truly reflects the underlying mechanisms that are causing our observations – without the models, we are left alone in a solipsist hell of meaningless lights and sounds. We build our own reality inside our heads.

    Nevertheless, we believe there must be something out there, to project those lights and shadows, and that it follows knowable rules. How else could the models work?

    The only point to all this philosophical meandering is to remind us that there may be more than one model to explain the observations, and that different models have differing degrees of fidelity – a lot of models are simplified to varying degrees, to make calculation easy, and none of them are perfectly accurate. What we can deduce about reality is always approximate. But one can say that without being controversial, or so contemptuous. Not that I’m complaining, mind – it makes the internet more interesting.

    Life’s but a walking shadow, a poor player
    That struts and frets his hour upon the stage
    And then is heard no more. It is a tale
    Told by an idiot, full of sound and fury,
    Signifying nothing.
