**Last Updated** 3 December 2012, 7:24 AM EST.

**What’s the difference between machine learning, deep learning, big data, statistics, decision & risk analysis, probability, fuzzy logic, and all the rest?**

- None, except for terminology, specific goals, and culture. They are all branches of probability, which is to say the understanding and sometimes quantification of uncertainty. Probability itself is an extension of logic.

**So what’s the difference between probability and logic?**

- Not much, except probability deals with uncertainty and logic with certainty. Machine learning, statistics, and all the rest are matters of uncertainty. A statement of logic is a list of premises and a conclusion: either the conclusion follows validly, in which case it is true, or it does not, in which case it is false. A statement of probability is also a list of premises and a conclusion, though usually the conclusion does not follow with certainty.

**In mathematics there are many “logic” theories that have more than one truth value, and not just one universal “logic.” What’s up with that?**

- The study of “logics” is just one more branch of math. Plus, these special many-valued “truth” logics are all evaluated with the standard, Aristotelian two-value logic, sometimes called “meta-logic”, where there is only truth and falsity, right and wrong, yes and no. There is only one logic at base.

**Is probability a branch of philosophy, specifically epistemology?**

- Of course probability is part of epistemology, as evidenced by the enormous number of books and papers written by philosophers on the subject over the period of centuries, most or all of which remain hidden from mathematical practitioners. See *inter alia* Howson & Urbach, or Adams, or Swinburne, Carnap, Hempel, Stove, that guy who just wrote a book on objective Bayes whose name escapes me, and on and on for a long stretch. Look to this space for a bibliography.
Probability can also be pure mathematical manipulation: theorems, proofs, lemmas, papers, tenure, grants. Equations galore! But the very instant you apply that math to propositions (e.g. “More get better with drug A”) you have entered the realm of philosophy, from which there is no escape. Same applies for *applied math*: it’s pure mathematics *until* it’s applied to any external proposition (“How much weight will this bridge hold?”).

**Isn’t fuzzy logic different than probability?**

- No. It sometimes has, like mathematics, many-valued “truths” (but so can probability models), but the theory itself is also evaluated with standard logic like probability. Fuzzy logic in practical applications makes statements of uncertainty or of things which are not certain, and that makes it probability. Fuzzy logic is one of the many rediscoveries of probability, but the best in the sense of possessing a cuddly slogan. Doesn’t *fuzzy* logic sound cute? Meow.

**What is a model?**

- A list of premises said to support some conclusion. Premises are usually propositions like “I observed $x_1 = 12$” or “My uncertainty in the outcome is quantified by this probability distribution”, but they can be as simple as “I have a six-sided object, just one side of which is labeled 6, which when tossed will show only one side.” The conclusions (like premises) are always up to us to plug in: the conclusion arises from our desires and wants. Thus I might choose, with that last proposition in mind, “A 6 shows.” We now have a complete probability model, from which we can *deduce* the conclusion has probability 1/6. Working probability models, such as those described below, are brocaded with more and fancier premises and complex conclusions, but the philosophy is identical.
Physical models, that is, models of physical systems, are squarely within this definition. There is nothing in the framework of a model which insists outcomes must be uncertain, so even so simple a (deterministic) equation as $y = a + bx$ (where $a$ and $b$ are known with certainty) is a model. If the parameters ($a$ and $b$) are not known with certainty, the model switches from deterministic to probabilistic.
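To make the die example and the linear equation concrete, here is a minimal Python sketch. The side labels and the normal distributions placed on $a$ and $b$ are assumptions for illustration only: counting the sides that make “A 6 shows” true deduces 1/6, and the same equation yields one number when $a$ and $b$ are known but a spread of values when they are uncertain.

```python
from fractions import Fraction
import random

# Premises (an assumed encoding): six sides, exactly one labeled 6, one side shows.
SIDES = ["1", "2", "3", "4", "5", "6"]

def prob_of_conclusion(sides, conclusion):
    """Deduce Pr(conclusion | premises) by counting the sides that make the
    conclusion true; no premise favors any side, so each counts equally."""
    favorable = sum(1 for s in sides if conclusion(s))
    return Fraction(favorable, len(sides))

p_six = prob_of_conclusion(SIDES, lambda s: s == "6")
print(p_six)  # 1/6

def deterministic_y(x, a, b):
    """y = a + b*x with a and b known with certainty: a single value."""
    return a + b * x

def probabilistic_y(x, n_draws=10_000, seed=0):
    """Same equation, but a and b uncertain. Hypothetical premises:
    a ~ Normal(1, 0.1) and b ~ Normal(2, 0.1). Returns draws of y."""
    rng = random.Random(seed)
    return [rng.gauss(1.0, 0.1) + rng.gauss(2.0, 0.1) * x for _ in range(n_draws)]
```

Nothing changes in the equation itself; only the premises about $a$ and $b$ decide whether the model is deterministic or probabilistic.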

**Surely exploratory data analysis (EDA) isn’t a model?**

- Yes it is, and don’t call me Shirley. Once a picture, plot, figure, table, or summary is printed and then acted on in the sense of explaining the uncertainty of some proposition, you have premises (the pictures, assumptions) probative toward some conclusion. The model is not a formal mathematical one, but a model it still is.

**What is reification?**

- This is when the ugliness of reality is eschewed in favor of a beautiful model. The model, created by great credentialed brains, is a jewel, an object of adoration so lovely that flaws noted by outsiders are seen as gratuitous insults. The model is such an intellectual achievement that reality, which comes free, is felt an intrusion; the third wheel in the torrid love affair between modeler and model. See, e.g., *climate models*, *econometrics*.

**What’s the difference between probability and decision analysis?**

- A bet made on an uncertain outcome becomes a decision. Given the standard evidence, the probability of throwing a 6 with a die is 1/6, but if you bet a six will show, you have made a decision. The amount of money wagered depends on a host of factors, such as your total fortune, the level of your sanity, whether it is your money or a taxpayer’s, and so forth. Decision analysis is thus the marriage of psychology with probability.
Probability models (in all their varied forms) sometimes become decisions when, instead of telling us the uncertainty of some outcome, the model insists (based on non-deducible evidence) that the outcome will *be* some thing or that it will take a specific value or state. See *machine learning* below.
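The split between probability and decision can be sketched in a few lines of Python. The probability of a six stays 1/6 whatever you wager; the decision needs extra premises: a payout (here “net odds”, an assumption) and a rule for sizing the bet relative to your total fortune. The Kelly criterion below is only one such rule, chosen for illustration; sanity and taxpayer money are left as exercises.

```python
from fractions import Fraction

P_SIX = Fraction(1, 6)  # deduced from the die premises; no bet changes it

def expected_gain(stake, net_odds, p=P_SIX):
    """Expected gain from staking `stake` on a six at `net_odds`-to-1:
    win stake * net_odds with probability p, lose the stake otherwise."""
    return p * stake * net_odds - (1 - p) * stake

def kelly_fraction(net_odds, p=P_SIX):
    """Fraction of total fortune the Kelly rule would wager; 0 if the bet is
    unfavorable. One decision rule among many, not part of probability itself."""
    f = (net_odds * p - (1 - p)) / net_odds
    return max(f, Fraction(0))
```

At 5-to-1 the expected gain is exactly zero and Kelly says bet nothing; at 11-to-1 it says wager 1/11 of your fortune. The probability was 1/6 in both cases.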

**Is all probability quantifiable?**

- We had a saying in the Air Force which began, “Not only no…” This answer applies here with full force. The mad rush to quantify that which is unquantifiable is the primary cause of the fell plague of over-certainty which afflicts mankind.
Example? Premise: “Some X are F & Y is X”. Conclusion: “Y is F”. Only an academic could quantify that conclusion with respect to that (and no other) premise.

**What is a statistical model?**

- Same as a regular model, but with the goal of telling us not about the conclusion or outcome, but about the premises. In a statistical model, some premises will say something like, “I quantify the uncertainty in the outcome with this distribution, which itself has parameters a, b, c, …” The conclusion(s) ignore the outcome *per se* and say things instead like, “The parameter a will take these values…” This is well and good when done in a Bayesian fashion (see *Bayesian* and *frequentism* below), but becomes a spectacular failure when the user forgets he was talking about the parameters and assumes the results speak of the actual outcome.
This all-too-common blunder is the second great cause of over-certainty. It occurs nearly always when using *statistical models*, but only rarely when using *machine learning* or *deep learning* models, whose practitioners usually have the outcomes fixed firmly in mind.
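A rough Python illustration of how large the parameter-versus-outcome gap can be. It assumes a normal model with known spread `sigma`, a flat prior on the mean, and simulated data (all assumptions, not anything from the entry above). The uncertainty in the parameter shrinks like $1/\sqrt{n}$; the uncertainty in the next actual outcome does not.

```python
import math
import random

def posterior_vs_predictive(n, sigma=1.0):
    """Normal model, known sigma, flat prior on the mean mu (assumptions).
    Returns (posterior sd of mu, predictive sd of one new observation)."""
    sd_param = sigma / math.sqrt(n)            # uncertainty about the parameter
    sd_outcome = sigma * math.sqrt(1 + 1 / n)  # uncertainty about the outcome
    return sd_param, sd_outcome

rng = random.Random(1)
data = [rng.gauss(10.0, 1.0) for _ in range(100)]  # simulated, for illustration
xbar = sum(data) / len(data)
sd_param, sd_outcome = posterior_vs_predictive(len(data))
print(f"mu:    {xbar:.2f} +/- {1.96 * sd_param:.2f}")
print(f"new y: {xbar:.2f} +/- {1.96 * sd_outcome:.2f}")
```

With 100 observations the interval for the parameter is roughly ten times narrower than the interval for a new observation; reading the first as if it were the second is the blunder described above.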

**What is a neural network?**

- In statistics they are called non-linear regressions. These are models which take inputs or “x” values and attach multitudinous parameters to them, all in the service of quantifying the uncertainty of some outcome or “y” values, just like any other statistical model. But neural nets sound slick and mysterious. One doesn’t “fit” the parameters of a neural network, as one does in a non-linear regression; one lets the network “learn”, a process which when contemplated puts one in mind of Skynet.
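Written out, a small network is an ordinary parametric regression function. A minimal Python sketch, assuming one input, one hidden layer of `tanh` units, and a least-squares criterion (all illustrative choices; the parameter values are hypothetical):

```python
import math

def tiny_network(x, params):
    """One hidden layer: y_hat = c + sum_j v_j * tanh(w_j * x + b_j).
    This is simply a non-linear regression function with parameters c, w, b, v."""
    c, units = params
    return c + sum(v * math.tanh(w * x + b) for (w, b, v) in units)

def sum_of_squares(params, xs, ys):
    """The usual least-squares criterion; 'learning' means choosing params to shrink it,
    which is what 'fitting' means in non-linear regression."""
    return sum((y - tiny_network(x, params)) ** 2 for x, y in zip(xs, ys))

params = (0.5, [(1.0, 0.0, 2.0), (-0.5, 1.0, 1.0)])  # hypothetical parameter values
print(tiny_network(0.0, params))
```

Any optimizer that lowers `sum_of_squares` does the “learning”; the model itself is the function above.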

**What is machine learning?**

- Statistical modeling, albeit with some “hard code” written into the models more blatantly. A hard code is a rule such as “If $x_{17} < 32$ then y = ‘artichoke’.” Notice there is no uncertainty in that rule: it’s strictly if-then. These hard codes are married to typical uncertainty apparatuses, with the exception that the goal is to make direct statements about the outcome. Machine learning is therefore modeling with uncertainty with a direct view to making decisions.
This is the right approach for many applications, except when the user’s tolerance for uncertainty does not match the modeler’s.
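The marriage of hard code and uncertainty apparatus can be sketched in Python. Everything here is hypothetical: the function name, the dict keyed by variable number (so `x[17]` is $x_{17}$), the pluggable probability model, and the 0.5 default threshold standing in for the modeler’s tolerance for uncertainty.

```python
def artichoke_classifier(x, prob_model, threshold=0.5):
    """Hypothetical machine-learning rule: a hard code plus a probability model.

    x          -- dict mapping variable number to value, so x[17] is x_17
    prob_model -- any function returning Pr(y = 'artichoke' | x)
    threshold  -- tolerance for uncertainty: the probability at which the
                  output flips from one outcome to the other
    """
    if x[17] < 32:          # hard code: no uncertainty, strictly if-then
        return "artichoke"
    p = prob_model(x)       # the uncertainty apparatus
    # The probability is collapsed into a direct statement about the outcome.
    return "artichoke" if p >= threshold else "not artichoke"
```

With Pr = 0.7 the modeler’s threshold of 0.5 says “artichoke”, while a user who needs 0.9 would have said “not artichoke”: the same probability, two different decisions.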

**What is big data?**

- Whatever the labeler wants it to be; data that is not small; a faddish buzzword; a recognition that it’s difficult to store and access massive databases; a false (but with occasional, and temporary, bright truths) hope that if characteristics down to the microsecond are known and stored we can predict everything about that most unpredictable species, human beings. See this *Guardian* article. See also *false hope* (itself contained in the *hubris* entry in any encyclopedia).
Big data is a legitimate computer science topic, where timely access to tidbits buried under mountains of facts is a major concern. It is also of interest to programmers who must take and use these data in the models spoken of above, all in finite time. But more data rather than less does not imply a new or different philosophy of modeling or uncertainty.

**What is data mining?**

- Another name for modeling, but with attempts at automating the modeling process, such that fooling yourself happens faster and with more reliability than when it was done by hand. Data mining can, however, be useful as the first step in a machine learning process, because if the user has *big data*, going through it by hand is not possible.

**What is “deep learning”?**

- The opposite of shallow learning? It is nothing more than the fitting or estimating of the parameters of complex models, which are (to repeat) long lists of human-chosen premises married to human-chosen conclusions. It is also a brilliant marketing term, one of many which flow from the fervid, and very practically minded, field of computer science.
The models are usually a mixture of *neural networks* and hard codes, and the concentration is on the outcomes, so these practices are sound in nature. The dangers are when practitioners either engage in *reification* (man is a loving creature) or when they start believing their own press, as in “If the *New York Times* thinks I’m a genius, I must be.”
The latter is all too apt to happen (and has, many times in the past) because it is to be noted that “deep learning” applications are also simple in the sense that (e.g.) when a human being mouths the sounds *flee* it’s a 50-50 bet the model predicts *free* (perhaps, too, the locutor is saying *free* with an accent; as in what do you call a Japanese lady with one leg shorter than the other? *Irene.*). Accomplishments in this field are thus over-celebrated. In contrast, no model, “deep learning” or otherwise, is going to predict skillfully where and when the next wars will occur for the next fifty years. See *artificial intelligence*, or *AI*.

**What is artificial intelligence?**

- Another name for probability models (but with much hard coding and few statements on uncertainty). Also, see *neural nets* or entries under *New and Improved!*.

**What is Bayesianism?**

- Another name for probability theory, with a hat tip to the God-fearing Reverend Thomas Bayes, who earned naming rights with his eighteenth-century mathematical work.

**What is frequentism?**

- A walking-dead philosophy of probability which, via self-inflicted wounds, handed in its dinner pail about eighty years ago; but the inertia of its followers, who have not yet all fallen, ensures it will be with us for a short while longer.

**What is a p-value?**

- Something best discussed with your urologist? Also an unfortunate frequentist measure, which contributes mightily to the great blunder mentioned above.

**Where can I learn more about these fascinating subjects?**

- Glad you asked: click here for the world’s wonders to be described (scroll down to statistics).

*This is only a draft, folks, with the intention of being a permanently linked resource. I’m bound to have forgotten much. There is even the distinct possibility of a typo. Remind me of my blindnesses below. Still to come: within-page HTML anchors.*

2 December 2012 at 1:01 pm

Predictive Analytics?

Frequentism: study of the long-run performance of short-run calculations.

P-value: originally: the standardized rank of your outcome from a randomized experiment, calculated by considering all possible outcomes of the randomization.
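The randomization definition in this comment can be sketched in Python. The difference in means as the statistic and the one-sided direction are assumptions for illustration; the procedure enumerates every assignment the randomization could have produced and reports the share at least as extreme as the observed one, i.e. the observed outcome’s standardized rank.

```python
from itertools import combinations
from statistics import mean

def randomization_p_value(treated, control):
    """Exact randomization p-value for a difference in means (one-sided).

    Enumerates every way the pooled units could have been assigned to the
    treated group and returns the fraction of assignments whose mean
    difference is at least as large as the observed difference."""
    pooled = list(treated) + list(control)
    k = len(treated)
    observed = mean(treated) - mean(control)
    count = total = 0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(t) - mean(c) >= observed - 1e-12:  # small tolerance for float ties
            count += 1
    return count / total
```

For treated values 5, 6, 7 against controls 1, 2, 3, only one of the 20 possible assignments is as extreme as the one observed, so the value is 1/20.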

2 December 2012 at 1:03 pm

I’m missing something explicit about exploratory data analysis (as opposed to modelling) in this context. The statistics of describing or summarising data rather than “explaining” data or using them to make predictions or decisions.

2 December 2012 at 2:16 pm

As near as I can tell, the only real fundamental difference between Bayesianism and Frequentism is their starting assumptions. Neither set of those assumptions can be compared to what actually is the case in reality. It is rather like: you make your best guess, place your bet, and turn the wheel. In the end, reality makes the decision. You are left with only a hope that your guesses will pay off more than they lose. Not a very good way to build a bridge that stands or an engine that works.

Unfortunately the assumption that burns either approach is the general notion that you cannot know anything for certain but that, at the same time, you can know your uncertainty for certain. Hmmmm…. Uncertainty plus certainty equals 1 every time but their ratio is anywhere between zero and infinity. Place your bets and spin the wheel….

One might say that either approach to estimating the probability of an event will be useful under some circumstances. However, since you cannot be certain of anything, you must compute the probability of them being useful in a given situation. Ah…the ugly head of infinite recursion rises above the dark waters of an impenetrable verbal miasma. The calculation fades to the vanishing point or a stack overflow – whichever comes first. There is no certainty to be found in uncertainty.

One observes that, out of a stone age tribal past, the industrial revolution occurred and produced a marvelous technological civilization. Did it happen just by accident or was it developed and maintained based upon many someones actually knowing a lot of somethings and being capable of using that knowledge? The buildings stand, the rockets reach outer space, the Internet works, and we can all watch stupid reruns on TV until our eyes bleed. None of which was possible for a stone age tribal member who lived in a mud hut or an opportunistic cave. Does this not give evidence that knowledge is possible to man and that statistics (the quantification of uncertainty) is most useful, or at least much used, for its ability to provide a living for academic dilettantes who teach and write papers about it?

2 December 2012 at 2:22 pm

Excellent post! And much needed! Thanks!

2 December 2012 at 2:30 pm

What is reification?

From a comment you made about a year ago I thought you were going to write an article on the subject. I’m eagerly waiting.

2 December 2012 at 3:57 pm

Graphical presentations of data in statistics are not probabilistic. Optimization, which is at the heart of machine learning, involves more than just probability. It’s too simplistic to say they are all branches of probability, imo.

2 December 2012 at 4:23 pm

Wow. This is a really impressive start, Matt; impressive in both depth and breadth.

A meta-level that your presentation makes both more possible and more explicit is the distinction between humans, as it were, being used by models (or using such on others), and human moral responsibility for our models, analysis, and the like.

Which is to say, these forms of analysis are NOT some a-human mechanism, to which we automatically must bow once we, or more likely, some expert, hits ‘Enter’. Statistics, even to begin, inherently demands that WE make some choices. They may only be choices about numerical values for the parameters of a density function – but they’re still choices.

And real choices always have a moral component; that is, a real choice is something that has human responsibility attached to it.

The intellectual quest to firmly identify ALL the choices that we in fact make as we engage in inquiry is also part of the quest to find our human responsibility for both our inquiry and our conclusions.

2 December 2012 at 5:36 pm

Your description of deep learning isn’t correct. The term typically refers to stacking models on top of each other. It’s typically done with neural networks for two reasons: it’s easy to feed the output of one into the inputs of another, and there are techniques to backpropagate model fitting from higher models to the lower ones. However, it can also refer to stacking other sorts of modelling techniques.

Also, machine learning these days is more just a focus on optimization methods rather than “hard codes.” Rule based systems that embody hard codes have fallen out of heavy use, though they are still around. Suffice to say, machine learning is no longer heavily focused on hard codes.

2 December 2012 at 5:49 pm

Alex,

Stacking models is still just modeling, of course. The resultant is identical. Machine learning in its optimizing sense also has the same goal, which is the quantification of uncertainty and the making of decisions. I take your word on the mechanics, of course. The philosophy, however, remains unchanged.

JH,

What you say is true, just as it is true that, say, writing code in R or SAS isn’t probability. It’s only when you seek to interpret the graphs or output when it becomes probability.

Cees de Valk,

See the comment for JH. But this is a good one that I’ll add to the list above. EDA is just a less formal way of understanding uncertainty. In some cases, it’s even better than formal methods, because EDA often leaves things unquantified.

2 December 2012 at 6:43 pm

Your reply reminds me of how some mathematicians and probabilists used to believe that statistics was a branch of mathematics and that they could teach statistics also. Well, after all, one may argue that probability theory is a branch of mathematics and statistics employs probability. Those mathematicians and probabilists could gracefully plug and chug into formulas when teaching an intro statistics class, and tidily write all the probability equations on the blackboard in mathematical statistics courses. Yet, they couldn’t teach how to postulate a model, analyze the data and interpret the results!

2 December 2012 at 7:14 pm

JH,

Amen, sister; preach it! Statistics, which is to say probability, is *not* a branch of mathematics, though much pretty math can be had by it. Probability properly is a branch of philosophy, of epistemology.

2 December 2012 at 8:12 pm

The problem with probability as a branch of epistemology is that epistemology is about how we know what we know. Probability is about what we don’t know. As such probability cannot bridge the gap between knowing and not knowing. It is nothing more than make a guess, place your bets, and then spin the wheel. You can’t even compute the probability of your probability model being a reliable representation of reality. It too is nothing but a guess. It is guesses all the way down and there is no bottom to the guessing.

Presumably, if enough people make enough guesses often enough someone will be right sometime. One catch though, how will they know their guess was right? They won’t. Especially if they don’t know how to know anything for certain. It is simply piling on more guesses on top of a mountain of guesses.

If you can know something for certain, why not start there and build upon it rather than building on what you don’t know? Oh, that won’t work because then the academics who make a living teaching that we probably don’t really know anything would have to get a real job and actually earn their salaries for a change.

2 December 2012 at 9:17 pm

Mr. Briggs,

I don’t think probability properly is a branch of philosophy. Just like mathematics and philosophy of mathematics, there are differences between mathematical theory of probability and philosophical theory of probability.

Statistics is not a branch of mathematics, but without solid backgrounds of mathematics and theories of probability, one would not be able to thoroughly understand some of the nitty-gritty behind statistics and to possibly come up with mathematical solutions to problems. They overlap each other, but none is a subset of the other.

Preach it? This is not politics. Plus, mathematicians don’t go around telling people that statisticians can’t teach combinatorics, ^_^

3 December 2012 at 1:29 am

Dag gon it Briggs.

You wait for the day my wife falls off a ladder to post something I need to read carefully.

Hope to get to this tomorrow.

Got to “model” before I disagreed but won’t fall in the trap of responding too soon.

et alia,

Bet he still does not come close to telling me how to teach a computer the difference between an outlier and corrupt data!

3 December 2012 at 8:07 am

I’m with Lionell. At the very first iteration, every Bayesian becomes a frequentist.

3 December 2012 at 8:11 am

JH, Lionel G,

But of course probability is a subject for epistemology, as evidenced by the enormous number of books and papers written by philosophers on the subject, and over the period of centuries.

(I’ll post this above.) Probability can also be math, pure mathematical manipulation: theorems, proofs, lemmas, papers, tenures, grants. Equations galore! But the very instant you try to apply that math to propositions (e.g. “More get better with drug A”) you have entered the realm of philosophy, from which there is no escape. Same applies for *applied math*.

Bill S,

We all pray your wife is well.

3 December 2012 at 8:14 am

Bill: “Bet [but?] he still does not come close to telling me how to teach a computer the difference between an outlier and corrupt data!”

First, you have to teach it to know the difference between a number and data. Unfortunately, that means you must teach it to understand the full context under which the number was derived and all the treatments you gave the number before you entered the number. Unfortunately, you usually neither know nor understand enough of that context and must discover it and understand it before you can teach it.

Now, it is hard enough to get HUMANS to understand something. Even then, we aren’t very clear on understanding how humans understand or even what understanding means. How then can we teach a computer how to understand anything? We can’t. All we can do is teach it very long lists of what to do with a lot of ones and zeros and then give it long lists of ones and zeros to work with.

Admittedly some of those lists of ones and zeros are useful but they along with the instructions do not represent the computer understanding anything. Whatever is there, we programmers and users put there and we are fully responsible for the result. Then we say “the computer said so” to try to shift the blame to a complicated machine that can only do what it is instructed to do. Most of the time it is nothing but a giant game of make believe.

I have made my living as a professional computer engineer for over 40 years. During that time, I have written some very useful software that was able to accomplish some amazing things. None of it represents the computer understanding the difference between long strings of ones and zeros and data. The computer did exactly and only what I told it to do and not ever what I wanted it to do.

3 December 2012 at 8:22 am

Briggs: But of course probability is a subject for epistemology, as evidenced by the enormous number of books and papers written by philosophers on the subject, and over the period of centuries.

Yes, probability is a SUBJECT FOR epistemology because everything is a subject for epistemology. However, it is NOT a branch of epistemology as initially asserted. Quote: “Probability properly is a branch of philosophy, of epistemology.”

3 December 2012 at 8:27 am

Lionell Griffith,

Is too. See entry above. Bibliography coming.

3 December 2012 at 8:51 am

bob sykes,

You cannot become what cannot be. That is, since frequentism is false, you must be interpreting its methods in a Bayesian way (all the while denying it, of course). Indeed, all frequentists are Bayesians: everybody is and must be. Weak analogy: two parties, anti and pro gravity. Anti gravity party swears gravity false, develops terminology and methods which assert and assume it. But when these fellows lean out the window, they all become secret pro gravity partisans.

Of course, you can claim to be a frequentist and use exclusively its methods, but only in a mechanical way. You must in the end interpret them as Bayesians. I nominate confidence intervals for the frequentist procedure which generates the most hilarity (see classic posts page for entries on this curious object).

You can make mistakes, of course, as many people do (esp. using p-values). The number of ways of making mistakes is limitless.

3 December 2012 at 10:40 am

Briggs,

I don’t disagree that probability is a subject for epistemology!!!

Just because the interpretation of probability results involves the meaning of probability doesn’t make probability a *branch* of philosophy. You are basically ignoring the mathematical theory part that leads you to the conclusions. You might as well say that, ignoring all other ingredients, every subject involving numbers or any math is a branch of mathematics.

Don’t you think that the MCMC sampling methods used in Bayesian analysis are based on frequentist theories?

3 December 2012 at 10:46 am

JH,

Funny you should bring up MCMC, which sounds (and really is) frequentisty. So you have a choice: abandon MCMC or embrace it. More on this coming…

3 December 2012 at 10:54 am

So since “frequentism is false”, I guess you don’t use MCMC methods. That is, do you use conjugate (cookbook) priors only? Otherwise, in a way, bob sykes is correct!

3 December 2012 at 10:57 am

JH,

No, ma’am, he’s not. You skipped over the part where the interpretation must be Bayesian. And the part where I said “abandon.”

3 December 2012 at 12:01 pm

Briggs, no, sir, I didn’t skip anything. I asked you a question.

3 December 2012 at 1:47 pm

JH,

The answer is no.

3 December 2012 at 9:52 pm

Probability? Bah, that is just an incidental spin-off of the theory of sigma-additive measures on sigma-complete Boolean algebras.

(runs for cover)

4 December 2012 at 10:47 am

Stacking models? Is that like multiple regression to simple univariate regression (in all of its various incarnations)? Perhaps next they will discover Type III SS. Or is that like models on those parameter thingies? Sort of like structural or causal equations?

8 December 2012 at 4:02 pm

Bill Raynor: stacking models can be very useful. Think encoder->decoder. You first build an encoder (using a neural net or whatever you like) then use the output of that to build a decoder. Then, treat both as a single model and “tweak” them so that you get a better encoder for your decoder/decoder for your encoder. It works insanely well, and can be used for both dimensionality reduction and classification problems.

Another method is to take a crappy model that does better than guessing, and build another model on top of it that improves the outcome only slightly. Repeat a bunch of times and you should end up with a fairly robust and very accurate mega-model. Basically, you build models to model the residual of the previous model.

Once you go stacked you’ll never go back…? Go boosted or go home? :)
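Will’s second recipe, building each new model on the residuals of the previous ones, can be sketched in plain Python. The weak model (a one-split “stump”), the learning rate, and the number of rounds are assumptions chosen for illustration; this is the simplest squared-error form of boosting, not any particular library’s implementation.

```python
def fit_stump(xs, residuals):
    """Least-squares 'stump': one split on x, predicting the mean residual on
    each side. Assumes xs holds at least two distinct values."""
    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best = None
    for split in sorted(set(xs))[:-1]:  # skip the largest value so both sides are nonempty
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, split, sum(left) / len(left), sum(right) / len(right))
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x <= split else right_mean


def boost(xs, ys, rounds=50, learning_rate=0.1):
    """Each round fits a weak model to what the previous rounds missed and adds
    a shrunken copy of it to the ensemble."""
    models = []

    def predict(x):
        return sum(learning_rate * m(x) for m in models)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        models.append(fit_stump(xs, residuals))
    return predict
```

Each stump on its own is a crude model; the sum of many shrunken stumps fitted to successive residuals is the “mega-model”.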

10 December 2012 at 8:06 am

Maybe in

“…that guy who just wrote a book on objective Bayes whose name escapes me…”

you were referring to Edwin Jaynes and his book “PROBABILITY THEORY: THE LOGIC OF SCIENCE”:

http://omega.albany.edu:8008/JaynesBook.html

10 December 2012 at 9:19 am

“Isn’t fuzzy logic different than probability?

No. It sometimes has, like mathematics, many-valued “truths” (but so can probability models), but the theory itself is also evaluated with standard logic like probability. Fuzzy logic in practical applications makes statements of uncertainty or of things which are not certain, and that makes it probability. Fuzzy logic is one of the many rediscoveries of probability, but the best in the sense of possessing a cuddly slogan. Doesn’t fuzzy logic sound cute? Meow.”

I’m sorry, but this is completely wrong. Fuzzy logic has nothing to do with probability. The easiest way to see this is to consider the following example. Suppose someone gives you two drinks: drink one has a 10% chance of being poison and drink two is poisonous to degree 0.1. Which one would you rather drink? I would think drink number two, since it is poisonous to degree 0.1 and thus not very poisonous at all. Drink number one, on the other hand, has a 10% chance of being poison; hence if you are unlucky enough, you drink poison and die.

The point here is that in probability events are either true or false, i.e. if P(X=x) = p, then X is equal to x in p*100% of the cases. So if someone is big with probability 0.2, then 20% of people are big. In fuzzy logic on the other hand, if F(x) = f, then X is equal to x to degree f in all the cases. Note that I don’t use F(X=x) here since it does not make sense as there are no random variables in fuzzy logic. A fuzzy set is just a mapping from a single object (not a variable) to the degree to which the object corresponds to the concept. If someone is big to degree 0.2, he is not very big. Unlike probability, this does not state anything about the population. In other words, fuzzy logic does not express uncertainty, but rather allows one to calculate with vague quantities, which is something completely different from probability theory.

10 December 2012 at 9:30 am

I find your view somewhat simplistic.

Whilst I agree that a lot of things like fuzzy logic and probability are almost exactly the same, and often other things are just different sides of the same coin, to conflate them all is to rather miss the point and throw lots of babies out with bathwater.

There are often distinct benefits in having these different branches, and each brings additional learning to its area. To consider them all the same means to lose that value.

I don’t know whether or not you have ever read Lakatos’ *Proofs and Refutations*, but I think that he rather makes the argument fairly well.

For me there is still a fundamental difference between saying something IS, and saying something IS LIKELY TO BE.

There are also additional differences in there; for example, to say that Bayesianism is exactly the same as classical probability indicates that you have never met a true Bayesian.

Anyway, for all that, there is still some benefit in looking for the similarity in each of these things, and especially to be able to overcome the bull that academics often put on their area to make it sound unique and thus sell books etc.

10 December 2012 at 10:49 am

Will: Thanks for your clear explanation. My earlier post was referring to an old way of fitting models: fit Y by X1, get the residuals R1 and prediction Y1. Fit R1 by X2 and get residuals R2. Repeat as needed and combine the Yi to get your forecast. That is the basis for Type I SS. The X’s could be separate variables or decompositions of a smaller set of X’s. This is how I learned to do regression back in the ’60s when one did this by hand. Correlated X’s required more steps.
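The fit-then-fit-the-residuals recipe described here can be sketched in Python with hypothetical helper names. With centered, uncorrelated X’s (the case assumed in the usage below) each stage recovers the corresponding multiple-regression coefficient; as noted, correlated X’s need more steps, and this single pass does not handle them.

```python
def simple_fit(xs, ys):
    """Least-squares (intercept, slope) for ys regressed on a single xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def sequential_fit(columns, ys):
    """Fit Y on X1, then the residuals R1 on X2, and so on.
    Returns the per-stage (intercept, slope), the summed stage predictions,
    and the final residuals."""
    residuals = list(ys)
    fitted = [0.0] * len(ys)
    stages = []
    for xs in columns:
        a, b = simple_fit(xs, residuals)
        stage = [a + b * x for x in xs]                  # this stage's prediction
        fitted = [f + s for f, s in zip(fitted, stage)]  # combine the stage predictions
        residuals = [r - s for r, s in zip(residuals, stage)]
        stages.append((a, b))
    return stages, fitted, residuals
```

For instance, with X1 = (−1, −1, 1, 1), X2 = (−1, 1, −1, 1) and Y = 5 + 2·X1 + 3·X2, the stages come out as (5, 2) and (0, 3) and the final residuals are zero.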

Likewise for more general models there is Deming & Stephan raking, which tossed in reweighting (e.g. boosting). That is the basis of IRLS. Again it could be done by hand. The objective was to fill in a lookup table (e.g. prediction), not parameter estimation. Demographers, Actuaries, and others had all sorts of tricks up their sleeves.
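The raking mentioned here is what is now usually called iterative proportional fitting: alternately rescale the rows and the columns of a table until its margins match known totals. A minimal Python sketch, assuming a table of positive counts and row and column targets with the same grand total (the function name and stopping rule are illustrative):

```python
def rake(table, row_targets, col_targets, iterations=100, tol=1e-10):
    """Iterative proportional fitting ('raking') of a positive 2-D table so its
    row and column totals match the targets. Targets must share a grand total."""
    t = [[float(v) for v in row] for row in table]
    for _ in range(iterations):
        for i, target in enumerate(row_targets):   # scale each row to its target
            s = sum(t[i])
            t[i] = [v * target / s for v in t[i]]
        for j, target in enumerate(col_targets):   # scale each column to its target
            s = sum(row[j] for row in t)
            for row in t:
                row[j] *= target / s
        # Columns now match exactly; stop once the rows still match too.
        if all(abs(sum(row) - target) < tol for row, target in zip(t, row_targets)):
            break
    return t
```

Starting from a table of ones with row totals (30, 70) and column totals (40, 60), one pass already gives [[12, 18], [28, 42]]: a filled-in lookup table, with no parameter estimates in sight.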

Finally, SEM and path models took the output from one model and fed it into another model as parameters for the next level. Wright was doing that in the 1920s.

It’s much easier and faster with computers. The least squares and normality assumptions just justified the easier approaches. People knew about the existence of other algorithms, but didn’t have computers to explore them (e.g. most of nonparametrics, exact inference.)

10 December 2012 at 1:37 pm

“The amount of money wagered depends on a host of factors, such as your total fortune, the level of your sanity, whether it is your money or a taxpayer’s, and so forth.”

hahaha, subtle

10 December 2012 at 7:37 pm

I’m not sure your understanding of machine learning is correct.

12 December 2012 at 9:09 am

Sean: Agreed, it is probably not in detail. My background is in statistics. Really interesting developments have been coming out of the ML areas. The general algorithms have been out there for a while though. Everything old is new again.

14 December 2012 at 8:01 am

“Isn’t fuzzy logic different than probability?

No. It sometimes has, like mathematics, many-valued “truths” (but so can probability models), but the theory itself is also evaluated with standard logic like probability.”

This is not true. In probability, the definitions of sets are clear-cut: a thing either is or is not a member of a set. Probability grades measure the likelihood that a thing belongs to a set, given that set membership of that specific object is not known because of ignorance or time (we are waiting for a revealing event, such as the end of a horse race).

In contrast, in fuzzy logic, it is the set membership which is graded. For instance, some temperatures may be defined as having greater or less membership in the set of warm temperatures. Fuzziness has to do with the graded nature of the definitions of sets. No revealing event or passage of time will change that fuzzy set membership.
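The distinction this comment draws can be put side by side in Python. This is a sketch of the commenter’s position, not a ruling on the dispute. The 15 to 25 °C ramp for “warm”, the `min` operator for fuzzy AND (Zadeh’s common choice), and the Normal(20, 3) forecast are all assumptions for illustration.

```python
from statistics import NormalDist

def warm_membership(temp_c, low=15.0, high=25.0):
    """Graded membership in the fuzzy set 'warm': 0 at or below `low`,
    1 at or above `high`, linear in between (an illustrative ramp)."""
    if temp_c <= low:
        return 0.0
    if temp_c >= high:
        return 1.0
    return (temp_c - low) / (high - low)

def fuzzy_and(a, b):
    """Zadeh's min operator, one common choice for fuzzy conjunction."""
    return min(a, b)

# Fuzzy: a temperature known exactly to be 20 °C belongs to 'warm' to degree 0.5,
# and no revealing event will change that.
degree = warm_membership(20.0)

# Probability: a crisp event ('tomorrow is above 20 °C') whose truth is unknown
# until it happens, under an assumed Normal(20, 3) forecast.
prob = 1 - NormalDist(mu=20.0, sigma=3.0).cdf(20.0)
```

Both numbers come out as 0.5, which is the commenter’s point: the same number can carry a graded definition or an uncertainty about a crisp fact. Whether the first is really a species of the second is exactly what the reply that follows disputes.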

14 December 2012 at 10:03 am

Will Dwinnell (and others),

The definition I gave is correct and fuzzy logic is just a rediscovery of probability. The difficulty can be seen by what you say, “the definitions of sets are clear-cut” etc. This is common enough, and I think it is because many feel that all probabilities are quantifiable, and that just is not so. Here’s a fuzzy-probability example, showing that all probabilities are not quantifiable:

Evidence = “Some D are G”. Object = “This d is G”. The probability Pr (Object | Evidence) = not a number. Or if it’s a number, it’s a set; here (0, 1). “Membership” just is probability.