Feb 09 2010

Predictive Statistics: GPA Case Study, Part I

Published under Statistics

This is a follow-up to the Quirk’s article.

Predictive statistics differs from classical (frequentist and Bayesian) practices because it focuses on observables and not metaphysical entities. Observables are the data we can see, touch, smell, feel; things that we can measure.

But what are “metaphysical entities”? Things you cannot see, touch, smell, or feel; things that cannot be measured. “Null” and alternate hypotheses and parameters are among them. Too much attention is paid to these, which limits our ability to make decisions about real data and causes us to be more certain than is warranted.

Here is a simple example, geared to those who have had experience with linear regression.

We want to predict what a student’s college grade point average (CGPA) would be given we know his high school GPA (HGPA), SAT score, and total score (0-10; higher is better) assigned to his letters of recommendation (LTRS). Before looking at the data, conditional on our experience, we would guess that higher HGPAs, SATs, and LTRSs are associated with higher CGPAs.

Always begin by looking at your data:

College Grade Point

The diagonal plots are the variables’ histograms. We already know that LTRS is restricted to 0-10. But CGPA is restricted to 0-4, and HGPA is restricted to 0-4.25 (for this data set), and SAT is restricted to 400-1600 (these are pre-written-component SAT scores). Remember these restrictions.

The off-diagonal plots work like this: the variable in the row is on the y-axis, and the variable in the column is on the x-axis. The important row is the top row. The first plot to the right of the histogram in that row is CGPA (y-axis) by HGPA (x-axis). As we guessed, higher HGPAs are associated with higher CGPAs. You can figure out the rest easily.

Those red lines are guesses at the associations, and since these are close to straight lines we are comfortable with using linear regression to model the data. That’s the usual way to do this: but we should not be comfortable, as we shall see.
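For readers following along in R, here is a minimal sketch of how a plot like the one described above could be drawn. The data are simulated stand-ins (the real data set is not reproduced in this post), the numbers used to generate them are made up, and panel.hist is a helper adapted from the example on R's ?pairs help page. Treat it as an illustration, not as the code behind the figure.

```r
# Simulated stand-in data: NOT the data set used in the post.
set.seed(1)
n <- 100
HGPA <- pmin(4.25, pmax(0, rnorm(n, 3.0, 0.6)))            # restricted to 0-4.25
SAT  <- pmin(1600, pmax(400, round(rnorm(n, 1100, 150))))  # restricted to 400-1600
LTRS <- pmin(10, pmax(0, round(rnorm(n, 6, 1.5))))         # restricted to 0-10
CGPA <- -0.15 + 0.38 * HGPA + 0.0012 * SAT + 0.02 * LTRS + rnorm(n, 0, 0.5)
CGPA <- pmin(4, pmax(0, CGPA))                             # restricted to 0-4
dat <- data.frame(CGPA, HGPA, SAT, LTRS)

# Histogram for the diagonal panels (adapted from the ?pairs example).
panel.hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))               # restore the panel coordinates afterwards
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  nB <- length(h$breaks)
  y <- h$counts / max(h$counts)         # scale bar heights to at most 1
  rect(h$breaks[-nB], 0, h$breaks[-1], y, col = "grey")
}

# Row variable on the y-axis, column variable on the x-axis;
# panel.smooth adds a smoothed guess at each association.
pairs(dat, panel = panel.smooth, diag.panel = panel.hist)
```

Putting CGPA first in the data frame makes it the top row, which is the row the text says to look at.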

This isn’t the place to teach regression—I’m assuming you know it—but if you don’t, you might guess how it works from the way we write it mathematically:

    CGPA = b0 + b1 HGPA + b2 SAT + b3 LTRS + noise

This says that a person’s CGPA is a linear function: start with the number b0 and add to it b1 times his HGPA and add to that b2 times his SAT and so forth. A little “noise” is added to make the whole thing probabilistic. That noise contains all the stuff we did not observe—but could have—and makes up for the difference between the observed CGPA and the first part of the equation.

Anyway, as the sociologists say, “Let’s submit this to SPSS and see what we get.” I’ll use R, but the answers will be the same. What pops out is a table that looks something like this (R, like all software, shows too much precision):

                  Estimate   Std. Error   t value   Pr(>|t|)
    (Intercept)  -0.1532639   0.3229381    -0.475   0.636156
    HGPA          0.3763511   0.1142615     3.294   0.001385
    SAT           0.0012269   0.0003032     4.046   0.000105
    LTRS          0.0226843   0.0509817     0.445   0.657358


By “(Intercept)” R means the b0 in our model; and by HGPA it means b1, and so forth. The estimate is just that: a guess of the value of each bi. The “Std. Error” we can ignore, because it’s incorporated into the “t value”, which is used to test the “null hypothesis” that each bi = 0: and I mean equals, precisely zero. The “Pr(>|t|)” is the infamous p-value of this test.
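A table with these four columns is what R's lm() and summary() produce. The sketch below shows the calls on simulated data, since the real data are not included here. The data frame dat and the coefficients used to generate it are assumptions for illustration only, so the numbers will not match the table above.

```r
# Simulated stand-in data: NOT the data set used in the post.
set.seed(1)
n <- 100
dat <- data.frame(
  HGPA = runif(n, 2, 4.25),
  SAT  = round(runif(n, 800, 1500)),
  LTRS = round(runif(n, 2, 10))
)
# Made-up "true" coefficients plus noise.
dat$CGPA <- with(dat, -0.15 + 0.38 * HGPA + 0.0012 * SAT + 0.02 * LTRS) +
  rnorm(n, 0, 0.4)

# CGPA = b0 + b1 HGPA + b2 SAT + b3 LTRS + noise
fit <- lm(CGPA ~ HGPA + SAT + LTRS, data = dat)

# One row per bi; columns are Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(fit))
round(coef_table, 4)
```

Rounding the table is only cosmetic, but it removes some of the excess precision complained about above.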

Popular mysticism insists that the p-value should be strictly less than 0.05 to be publishable, so that we can authoritatively state the variable is associated with CGPA. And we’re in luck: The p-values for HGPA and SAT are nice and small. We could write our paper and say there was a “statistically significant” association between HGPA, SAT and CGPA. But not for poor LTRS, which has a depressingly large p-value.

Most would stop the analysis here. Some might push just a bit further and conclude that SAT is a “better” predictor because it has a smaller p-value. The more conscientious would glance at the diagnostics (residual plots, R2, AIC, etc.). I won’t show them, but I assure you all these are fine.

Are we happy? Handshakes all around? Break out the organic cigars? Not yet.

Predictive statistics begins with the idea that the data we have in hand is not unknown—which should be trivially obvious. What we do not know is what future data will look like (by “future” I only mean data that we haven’t yet seen; it could have been collected in the past).

Thus, we are not interested in whether or not, say, b2 = 0, or any other value. We want to know answers to questions like this: Given a person has a high HGPA, decent SAT scores, and an average letter-of-recommendation score, what is the probability he will have a high CGPA?

That question is entirely observable: the data that is given—the conditions which form our questions—and the observed outcome are all measurable, real things. We’ll never know—we can never know—whether b2 is zero, or any other value. But we can know whether our prediction of a person’s CGPA is good.
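To make the question concrete, here is one hedged way to put a number on it in R. This is not necessarily the approach this series will take: it uses the textbook predictive distribution for a normal-noise regression (a Student t centred on the fitted value), and so it leans on exactly the regression assumptions we were warned about. The data, the "new student", and the cutoff of 3.0 for a "high" CGPA are all made up for illustration.

```r
# Simulated stand-in data: NOT the data set used in the post.
set.seed(2)
n <- 100
dat <- data.frame(
  HGPA = runif(n, 2, 4.25),
  SAT  = round(runif(n, 800, 1500)),
  LTRS = round(runif(n, 2, 10))
)
dat$CGPA <- with(dat, -0.15 + 0.38 * HGPA + 0.0012 * SAT + 0.02 * LTRS) +
  rnorm(n, 0, 0.4)
fit <- lm(CGPA ~ HGPA + SAT + LTRS, data = dat)

# A hypothetical new student: high HGPA, decent SAT, average letters.
new_student <- data.frame(HGPA = 3.8, SAT = 1200, LTRS = 5)
p <- predict(fit, newdata = new_student, se.fit = TRUE)

# Spread of a NEW observation: uncertainty in the fitted line plus the noise.
pred_scale <- sqrt(p$se.fit^2 + p$residual.scale^2)

# Pr(CGPA > cutoff | new student's scores, old data, normal-noise model)
prob_above <- function(cutoff) {
  as.numeric(1 - pt((cutoff - p$fit) / pred_scale, df = p$df))
}

prob_above(3.0)   # probability of a "high" CGPA, under the assumed cutoff
prob_above(4.0)   # probability of a CGPA above the maximum possible value
```

The second probability is not zero, because nothing in this model knows that CGPA cannot exceed 4.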

Remember when I said we should not be comfortable with the regression assumptions? Tomorrow we see why.


Feb 08 2010

Monday Moanin’

Published under Fun

I stole that title from the late Bob Talbert, an everyman columnist from the late, or possibly undead, Detroit Free Press.

He would use that title whenever he had a column to write but didn’t have sufficient time to put enough similarly themed words together to make 800. It also allowed him to insert pet peeves in print that would otherwise have no home.

So I watched the Superbowl and forced—forced—myself to sit through the entire halftime show. About which I can only say thank the Lord for beer and loud crowds. For without those, what came out of the television speakers would have been unbearably clear.

I didn’t know the name of the fellow in the black hat, but he should be strongly discouraged from ever approaching a microphone again. He was so bad that the only thing that kept the chicken wings I had ingested from making a reappearance was the drunken fellow next to me, who possibly was trying to sing along. Anyway, he was making a lot of noise. But he was loud and I was grateful for him.

I already know the counter argument: “Briggs, you fool. Black Hat used to be a star. He was a Rock God. Have some respect.”

To which I respond: I notice you speak in the past tense. Good he may have been, but good now he is not. Why should all of us be subjected to his caterwauling? Respect? If the NFL was determined to highlight music from bygone days, and wanted to show respect, we would have been infinitely better off had it wheeled out Cab Calloway’s ashes and put on a scratchy recording of “Minnie the Moocher.” I would have sung along to that.

The Tim Tebow ad was ho hum. It started with his old ma talking to the camera. I couldn’t hear it clearly and thought at first it was a soap commercial. I hadn’t realized my mistake until after she had recovered from the blind-side tackle, son and ma lovingly embracing, and the Focus on the Family website was on screen. This was controversial?

Mencken said that the best poems are those about something we know is false but want to be true. It’s the same with TV commercials. This is why ads for “light” beer or “diet” pop always tout their “great” taste.

Let any diet beer inch above freezing even slightly so that it isn’t cold enough to numb your taste buds and you know the ads lie. And never was there a sugar-free pop that didn’t go down like chemical soup.

It’s not just TV. The fast-food chain Panda Express boasts that it serves, “Gourmet Chinese Food.” In their favor, it might be a mistranslation of “glutinous.”

Many people are laughing at this guy, a TV meteorologist from AccuWeather. The title of the video is “Snowpocalypse Now! Meteorologist Freakout.”

What people don’t understand is how weathermen live for—lust after—storms. He is not freaking out. He is loving every minute of it.

He wouldn’t have been so excited had he not lived in such a dull area, meteorologically speaking. How many different ways can you say “Humid and overcast with a 30% chance of afternoon showers”? He finally had something meaty to talk about. “Fourteen to twenty-two inches of snow!” Back in Northern Michigan, we called this a dusting.

I recall this freshman coming to our meteorology program who had saved every weather clipping—those maps with temperature gradients in the back of the newspapers—for five years. He kept them in a binder which he carried everywhere.

If today’s weather conditions were this many millibars, this hot, with the wind coming from that quarter, this kid knew the historical analogue. “Today’s 500 mb flow reminds me of the ‘77 Memorial Day storm that dumped over three inches of precip.”

This young man was not uncommon. His type is known as the weather weenie. These guys come in thinking that all meteorologists do is sit around and talk about the weather—which is true. They do. But they first have to slog through all the math. Years of calculus, physics, PDEs, thermodynamics, equations of motion. Many weenies drop out and change their majors to communications.

It is there they learn their bad habits.


Feb 07 2010

Ayn Rand and the Differences Between Groups

Published under Philosophy, Statistics

Roger Kimball is causing a stink, a predictable yet enjoyable stink, by publishing Anthony Daniels’s review of an Ayn Rand biography in this month’s The New Criterion.

There are two enduring internet subjects on which any negative criticism, no matter how slight, can be counted on to generate tea-cup furies.

The first is Apple computer. For example, question the hubris of Steve Jobs, who last week introduced Apple’s tablet under a looming picture of a stone-carrying Moses descending Sinai, and legions of fanboys will descend upon your site and explain to you just how stupid you are, and why you will always be so since you cannot comprehend the simple logic of how the Israelites would have spent 50% less time wandering had they only been presented their Commandments via the iPad. (In colour!)

It used to be that any negative press of Ron Paul or Obama would produce the same attacks of splenetic fever. But Ron Paul is long dead (so I’ve heard) and the growing perception is that Obama has read from one teleprompter too many.

So we are left with Ayn Rand. Daniels knew the danger going in, which is why he took pains to present ideas of Rand’s which he thought were true. The first: “[S]he was among the first to appreciate that the notion of collective rights (a mirror image of racial discrimination) would ‘disintegrate a country into an institutionalized civil war of pressure groups, each fighting for legislative favors and special privileges at the expense of one another.’”

This was an empirical prediction which experience has verified, and is therefore true, but not yet universally acknowledged.

Daniels also recommends “her observation that ‘Even if it were proved…that the incidence of men of potentially superior brain power is greater among the members of certain races than among the members of others, it would still tell us nothing about any given individual and it would be irrelevant to one’s judgment of him.’”

This is a (comforting) statement of philosophy and is false. Further, any statistician knows that it is false.

Suppose there are two groups, M and N. And to avoid emotion, suppose M and N represent the sales (in dollars) of two rival products. The statement that the evidence shows the “incidence of…superior brain power is greater among the members of” a certain group is translated into “the evidence is that the probability of group M having higher sales than group N is greater than 50%.”

Writing this in traditional notation (for those comfortable with this) gives

    Pr( Sales[M] > Sales[N] | Our Evidence ) > 0.5.

Another way to state this is that if you had to guess which product, M or N, would have greater sales, you would maximize your chance of confirming your guess by saying “M.” This does not, of course, prove that the sales of M will be greater. N can still beat M.

Rand would say that the evidence that it is probable that M > N “would still tell us nothing about any given individual and it would be irrelevant to one’s judgment of him.” Here there is only one individual per group (just the sales M and N), but knowing who that individual is tells us something about that individual and is not irrelevant to our judgment of him.

You might object that Rand obviously meant more than one individual per group. So suppose sales of M and N are generated by salesmen. That is, M has a host of salesmen hawking it and so does N. The number of salesmen in each group need not be equal.

Our equivalent evidence is that the salesmen in M sell more than do the salesmen in N. That is, given this evidence, the probability that a salesperson from M outperforms a salesperson from N is greater than 50%. Notationally,

    Pr( Individual[M] > Individual[N] | Our Evidence ) > 0.5.

If all—pay attention to this “all”—we knew about the two individuals in front of us is that one sold M and the other sold N, then this would tell us something about these given individuals.

Our knowledge of what group these individuals belonged to would be relevant to our judgment of them. Our judgment is that, given the evidence we have of the two people in front of us, the guy that sold M is probably a better salesman than the guy who sold N.

This, again, does not prove guy M is better than guy N. We could learn new evidence that changes our perception: for example, guy M is drunk.

But in general, Rand’s statement is logically false.
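The salesman argument can be checked with a small simulation in R. Everything in the sketch is invented for illustration: the group sizes, the log-normal shape of individual sales, and the 10% edge given to group M. The only point is to show what Pr( Individual[M] > Individual[N] | Our Evidence ) > 0.5 looks like when the two groups overlap heavily.

```r
set.seed(3)

# Hypothetical individual sales (dollars); the groups need not be the same size.
sales_M <- rlnorm(5000, meanlog = log(110), sdlog = 0.5)
sales_N <- rlnorm(4000, meanlog = log(100), sdlog = 0.5)

# Pick one salesman at random from each group, many times over.
pairs_drawn <- 100000
m_pick <- sample(sales_M, pairs_drawn, replace = TRUE)
n_pick <- sample(sales_N, pairs_drawn, replace = TRUE)

# Estimated Pr( Individual[M] > Individual[N] | Our Evidence )
p_M_beats_N <- mean(m_pick > n_pick)
p_M_beats_N

# The same probability worked out directly from the assumed distributions:
# the difference of the two logs is normal with sd 0.5 * sqrt(2).
p_exact <- pnorm(log(110 / 100) / (0.5 * sqrt(2)))
p_exact

# Guy N still beats guy M a large fraction of the time.
1 - p_M_beats_N
```

Knowing only the group, "M" is the better guess, yet the guess fails often; both halves of the argument show up in the numbers.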

Be careful to understand what we proved. We did not prove that, in comparing different groups of humans, there will be measurable differences. Whether or not there are is an empirical, and not a logical, question. This is what statistics is all about.

What we did prove was that if those differences are real, then that information would be relevant to judgments about members in different groups.


Feb 06 2010

What do you guys look like?

Published under Fun

How many of you are there?

According to a mixture of Wordpress and Google Analytics statistics reports, I receive roughly twelve to fifteen hundred hits per day. That’s excluding bots and other riffraff. And that’s not unique visitors, either: it’s page views.

Order of magnitude, it’s about 1000 different people a day. That translates into about 30,000 unique visitors a month (an overestimate, since some people come here using more than one computer, and each is counted as a separate person; plus, lots of you are regulars).

Traffic has been trending up steadily, too. Divide everything by two, and you have a reasonable estimate of last year at this time.

I owe much of this increase to you, my readers. And that’s not my attempt at flattery, either. (If it was flattery, I would have said, “my abnormally intelligent, surely good-looking readers”.)

I know this is true from examining the incoming traffic stats. A chunk of search-engine directed or linked traffic comes from keywords or material from this site’s comments. This means that people are coming here to read what you said.

I am grateful for this, especially as traffic to the site positively correlates with my wallet size. Perhaps hat size, too.

But there are more than a few of you that come regularly and do not comment. To you, I say: speak up! You are surely thick-skinned enough to handle the inevitable ridicule, opprobrium, and excoriation the other readers will heap upon you for your, what they will tell you are your, undoubtedly mistaken views.

Kidding! I’m just kidding. Most of us haven’t killed and eaten anybody in years, so you have nothing to fear. We are nice people.

Where are you from?

Most gratifying is the proportion of non-USA traffic. It is nearly 50% and growing. For example, the majority of people who downloaded my Quirk’s article from this past week were not in the States.

And it’s not just the English-speaking countries, like you’d expect. Visitors are from all over. There is a solid base of folk from Finland (maybe the Northern Michigan connection?), even more from Germany, and a steady supply from Japan. We even had somebody—perhaps lost—from Mongolia.

Remember, if you need a statistician, I’ll go anywhere: I can gesture in several languages.

What’s interesting is where people are not from. Over the past year, I had no visits from the following countries: North Korea, Cuba, Nicaragua, Haiti, Kyrgyzstan, Paraguay, French Guiana, Suriname, Papua New Guinea, Serbia and Montenegro, Somalia, and most of Central-West Africa from Mauritania to D.R. Congo.

I don’t know about Antarctica, since Google doesn’t track it (other than, perhaps, the “not set” continent distinction, of which there were 320 visits). Everywhere else was represented. This story is surely the same for other web sites.

Who are you?

Our regular contributors are largely professional, but that’s the same all over the web. We don’t, for example, attract a computer-illiterate crowd.

What’s not the same is that most of us are familiar with the right-hand rule, understand jokes about rogue 540nm photons, like to be left alone, and know how to pronounce “corpsman.”

We are also, surely, abnormally intelligent and good-looking.

I’m thinking about setting up—strictly for fun—a survey of readers, just so everybody can see who we all are.

Anyway, thanks to everybody for making the site work. Keep sending those links and ideas. And feel free to send the site to your rich relatives who are authorized to sign large-dollar contracts and who might need a statistician.


Feb 06 2010

R Lecture 7: Reading External Data Part II

Published under Podcast, Statistics

10 minutes is a shockingly short period of time!

Today, some common errors you WILL see when you try to read in data. The code below has many typos.

Watch for Windows to mistakenly save your file as “advertising.csv.txt”. Windows might append that “.txt” without your being aware. Open the folder properties in “C:/myR” and unhide the file extensions.
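You can also check for the stray “.txt” from inside R. The sketch below is only an illustration: it works in a temporary folder (the folder variable stands in for your real myR folder) and creates a badly named dummy file there, so that nothing on your machine is touched.

```r
# A throwaway folder standing in for C:/myR (assumption for illustration).
folder <- file.path(tempdir(), "myR")
dir.create(folder, showWarnings = FALSE)

# Pretend Windows saved the download with an extra ".txt".
bad  <- file.path(folder, "advertising.csv.txt")
good <- file.path(folder, "advertising.csv")
writeLines(c("a,b", "1,2"), bad)

# See what is REALLY in the folder, extensions and all.
list.files(folder)

# Is the file there under the name we plan to give read.csv()?
file.exists(good)

# If only the badly named copy exists, rename it.
if (!file.exists(good) && file.exists(bad)) {
  file.rename(bad, good)
}
list.files(folder)
```

On your own machine you would replace folder with "C:/myR" and skip the writeLines() line.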

Sometimes missing values in your spreadsheets have hidden characters, or R gets confused about the data type. Remember, R tries to guess whether each column in your data is a number or a factor (that is, a categorical variable). To avoid difficulties use the argument na.strings="" where “na” means “missing data” inside R.

Notice there is no space between the quotation marks. You can also specify a value that means missing. Perhaps—NEVER DO THIS—you coded missing as “99” or something equally foolish. Then you could use na.strings="99". We’ll talk about data coding another time.
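Here is a small, self-contained sketch of what na.strings changes. The file is a made-up three-row CSV written to a temporary location, and its column names are invented for illustration. One thing worth knowing: blanks in number columns are already treated as missing by default, so the difference shows up in the text column.

```r
# A tiny made-up CSV with a blank region and a "99" used to code missing spend.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("spend,sales,region",
             "10,200,north",
             "12,150,",
             "99,180,east"), csv_path)

# Default: the blank region is read as an empty string, not as missing.
x_default <- read.csv(csv_path)
x_default$region

# na.strings="": the blank region becomes NA.
x_blank <- read.csv(csv_path, na.strings = "")
x_blank$region

# Several codes at once: blanks AND 99 both become NA.
x_both <- read.csv(csv_path, na.strings = c("", "99"))
x_both
summary(x_both)
```

Be aware that na.strings applies to every column, so a legitimate value of 99 anywhere in the file would also be turned into NA. That is one more reason never to code missing data this way.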

Cut and paste or type the following NEW block of text into your myRcode.R file and SAVE it.


# Common errors
# Windows
x = read.csv("C:\myR\advertising.csv")
x = read.csv("C:/myr/advertising.csv")
x = read.csv("C:/myR/advertisng.csv")
# Mac
x = read.csv("~/Desktop/myr/advertising.csv")
x = read.csv("Desktop/myR/advertising.csv")
x = read.csv("~/Desktop/myr/advertisng.csv")
# Linux
x = read.csv("myR/advertising.csv")
x = read.csv("/home/matt/myr/advertising.csv")
x = read.csv("/home/matt/myR/advertisng.csv")
#
# Good code
?read.csv
# Windows
x = read.csv("C:/myR/advertising.csv", na.strings="")
# Mac
x = read.csv("~/Desktop/myR/advertising.csv", na.strings="")
# Linux
x = read.csv("/home/matt/myR/advertising.csv", na.strings="")

We will cut & paste this code from the file myRcode.R into the R command window. EACH TIME REMEMBERING TO HIT THE ENTER KEY (inside R).

R can be downloaded here: R-project.org. A direct link to the CRAN package archive is here.

All videos are on YouTube under the username “mattstat” (wmbriggs was taken). That service imposes a ten-minute limit on videos. Accordingly, lectures are short.

All questions to matt@wmbriggs.com.


Feb 05 2010

Climate Skeptic Conspiracy Strikes!

Published under Climatology

I have never been part of a conspiracy before—there was never the opportunity—so you can imagine how excited I am about finally joining one.

It’s true that, in 1978, I, my sister, and a neighbor once piled bales of hay into a fort behind the garage and started an exclusive, invitation-only club. But we never got past the election of officers, nor did we have a chance to initiate any blood oaths, much to my bitter disappointment.

So I am truly juiced about being back in the game. Clandestine meetings, secret passwords and handshakes, furtive glances across train station waiting rooms. Regular readers know that I already have the fedora. Well, I also have a trench coat. Now I’ll really be able to put them to use. It’s going to be great!

Of course, I haven’t been contacted by any other members of the conspiracy yet. In fact, I have had little communication with anybody about the subject of the conspiracy. But it can’t be much longer until I’m in.

How do I know this? Rajendra Pachauri, chairman of the IPCC and sex novelist, was asked by the Financial Times, “Do you think there is an organised effort to demolish your reputation and the reputation of the IPCC?” He replied,

It doesn’t take a genius to arrive at the conclusion that apparently this is carefully orchestrated. These things are certainly not happening at random.

Carefully orchestrated. Not happening at random. There it is! The first spooky signs of a conspiracy. And I’m going to be in on it.

Incidentally, before we get back to Pachauri, I want to tell the Fellowship (for this is what I imagine their name is) that I have distinct theories about manufacturing and maintaining stealth. I want them to know that I can contribute, that I am worthy.

For example, here’s one hot tip. Sunglasses are out: too cliche and a dead—emphasis on dead—giveaway. And from my years of studying the “progressive” mainstream media, I am an expert on plausible deniability.

More Pachauri:

…I would say [there are] nefarious designs behind people trying to attack me with lies, falsehoods [alleging] that I have business interests…What [the Fellowship] are indulging in is skulduggery of the worst kind. I’m reasonably sure that very soon people will realise the truth and they would also question the credentials of some of the people who are behind them.

I don’t want to get down to a personal level, but all you need to do is look at their backgrounds. They are people who deny the link between smoking and cancer; they are people who say that asbestos is as good as talcum powder—I hope that they apply it to their faces every day—and people who say that the only way to deal with HIV/Aids is to screen the population on a regular basis and isolate those who are infected.

There is clearly a very obvious intent behind this whole thing. I’m certainly not going to be affected by it. I’m totally in the clear. I have absolutely nothing but indifference to what these people are doing.

Despite his comment that he has “nothing but indifference”, The Fellowship has clearly unhinged our man. He wants us to rub asbestos on our faces. Every day! That must be the oddest death fantasy ever publicly admitted.

The FT tried to pin him down, “Who exactly is the ‘they’ that you are pointing to, and what do you think is the purpose of this campaign?” Our man replied, “They are people who deny the existence of the human influence on the earth’s climate.” Which is nobody I know of. I have heard of people—I am one of them—who claim the influence is minimal and not especially worrisome.

But the FT reporter caught scent of the conspiracy. He asked, “Do you think they have other backing?” The reply:

The presumption is since these people are spending so much time trying to write all kinds of malicious articles and indulge in invective, there would probably be some resources that are flowing to them. It’s all part of a pattern. But let me clarify. I have no proof. I can only presume something like this is at work.

So he knows the Fellowship is out there. He even suspects some of the names. The only reason they would be skeptical is because of cash gifts.

But he has no proof! He admittedly has no evidence. Don’t you see what that means?

It must mean that the conspiracy exists. No other conclusion is possible: the logic, so far as I can discover, is air tight.

So I can’t wait to join. I can’t wait especially to partake of some of those malicious monies of which Pachauri spoke. I’m going to be rich!


Feb 04 2010

R Lecture 6: Reading External Data Part I

Published under Podcast, Statistics


10 minutes is a shockingly short period of time!

Go to http://www.wmbriggs.com/book/ and save the advertising.csv file into your myR folder.

Make SURE it is saved as a CSV file and NOT a CSV.TXT file: sometimes Windows thinks it knows more than you.

CSV stands for comma separated value: these are just text files with rows of data with each column separated by a comma. They can be read into any spreadsheet software. If you do not have a spreadsheet, download and use OpenOffice: it’s free and open source.

The read.csv() function reads in a CSV file and places the values from that file into an object. You have to give this object a name. I frequently use “x” because it is short and easy to type. But you can call it anything you like.

The “x” object is called a data frame inside R. In other words, it’s a data set. Why didn’t the R programmers use the name “data.set” instead of “data.frame”? The minds of computer geeks are dark and mysterious.
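To see what a data frame is, it helps to poke at one. The sketch below does not use advertising.csv; it writes a tiny made-up CSV to a temporary file (the column names are invented for illustration) so that it runs anywhere.

```r
# A tiny made-up CSV so the example runs without any downloads.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("week,spend,sales",
             "1,10.5,200",
             "2,12.0,240",
             "3,9.5,180"), csv_path)

x <- read.csv(csv_path)

class(x)    # what kind of object read.csv() handed back
dim(x)      # number of rows, number of columns
names(x)    # column names, taken from the first line of the file
str(x)      # the type R guessed for each column

x$sales               # one column, by name
x[x$spend > 10, ]     # only the rows where spend is above 10
```

The same commands work on the x you get from the real advertising.csv file.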

If you did not put the myR folder where I told you, or if you failed to save the advertising.csv file in that folder, you WILL have problems.

Next time, I’ll show you some of the common problems.

Cut and paste or type the following NEW block of text into your myRcode.R file and SAVE it.


# Data found at:
# http://www.wmbriggs.com/book/advertising.csv
#
# Windows
x = read.csv("C:/myR/advertising.csv")
# Mac
x = read.csv("~/Desktop/myR/advertising.csv")
# Linux
x = read.csv("/home/matt/myR/advertising.csv")
#
summary(x)
plot(x)

We will cut & paste this code from the file myRcode.R into the R command window. EACH TIME REMEMBERING TO HIT THE ENTER KEY (inside R).

R can be downloaded here: R-project.org. A direct link to the CRAN package archive is here.

All videos are on YouTube under the username “mattstat” (wmbriggs was taken). That service imposes a ten-minute limit on videos. Accordingly, lectures are short.

All questions to matt@wmbriggs.com.


Feb 04 2010

Quirk’s: Telling the future from the past: predictive versus classical statistics

Published under Statistics


Today’s post is at Quirk’s, the well known trade journal and marketing research review.

If you want to read the article on-line at Quirk’s, registration is required, but free.

You can also download a PDF copy of the article: Telling the future from the past: predictive versus classical statistics.

If you rely exclusively on classical statistical techniques, then you will often be too certain of yourself. You will derive answers and make decisions in which you are too confident. This is guaranteed. Why? Well, read the article, or stick around. I’ll be writing about this more.


The audience for this paper is working statisticians or marketers and executives who have experience using statistics. Here’s the start:

There is a story about a marketing statistician who was asked by his mother what he was doing. “Modeling for Victoria’s Secret,” he said. “You’re doing no such thing!” she said. She was shocked. She shouldn’t have been, because classical statistics is a lot like modeling lingerie.

A common experience many readers, especially of the male type, have of the Victoria’s Secret catalog is to marvel at how well the models exhibit their wares. A reader, surely fixated on fashion, might closely examine a photograph and say, “This model appears ideal. Her clothing fits perfectly.” Some especially attentive viewers can tell you the measurements of the garments down to the nearest fraction of an inch. They look at a model and announce, “She must not have got that outfit off the rack because there’s almost no chance a ready-made garment would have fit that well. It must have been made for her.” Yet, knowing this, they still buy the clothing hoping that it will do for them—or a close associate—exactly what it did for the model.

Thanks to Jim Dukarm for providing very helpful comments on an earlier draft.


Feb 03 2010

Malthus Was Wrong, But Not Why You Think

Published under Culture, Statistics

It’s hard to think of a historical writer more misunderstood than Thomas Malthus. A week doesn’t go by without somebody dropping his name, but only to show how wrong he was.

Take this Stephen Malanga City Journal article, “Our Vanishing Ultimate Resource: Plummeting birthrates threaten prosperity worldwide. Can America buck the trend?”

Malanga writes that the “media continue to warn us about impending environmental catastrophe and mass starvation caused by an exploding human population. These Malthusian alarms persist even though the last 200 years have proved Malthus completely wrong.”

Malthusian alarms! Well, I don’t blame Malanga, because you cannot find our good Reverend named in any context other than as a failed forecaster in the same vein as Paul “Population Bomb” Ehrlich. Everybody thinks that Malthus predicted doom by overpopulation.

Not so.

Malthus’s theory was a steady state one. He said that a species will breed up to the point at which no more of it can be fed. He made the logically undeniable point that no more of a species can exist than can be supported by the available food supply. The population will increase and stay at those levels and cannot—because there is no food to—go beyond that point. The doom of which you constantly hear is impossible. Stay here until you understand this. This applies to man, too.

What Malthus said was that a species was always at its limit—barring disasters, wars, famines, booms (exceptionally good harvests), “unnatural practices” (by which he meant abortion and homosexuality), and so on. Charles Darwin saw the brilliance of Malthus’s theory and married it to his idea of evolution: that species are always competing for food provided the mechanism to drive evolution. Those that were better at finding food, survived.

But Malthus was wrong about our species, and exactly in the opposite direction you commonly hear. Man has not bred up to the point that he can be supported by the available food supply. Man does not follow the strict theory of evolution.

NOTE: this does not imply that that theory is wrong overall; merely that it is incomplete with respect to our species; e.g., strict Darwinian “selfish genes” theory does not sufficiently explain abortion, altruism, and adoption, to name just the As.

In fact, mankind has turned out to be quite a slacker, survival-of-the-fittest-wise. As things get better, we breed not more, but less. Take a look at this picture, which shows the estimated World population since 1950.

World Population

Looks like nowhere to go but up, right? If so, this is yet another example of how to cheat with statistics. Take a look at the same numbers, but shown as the velocity, or rate of change of population.
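If you want to compute a velocity like this yourself, R's diff() does the work. The series below is NOT real population data: it is a synthetic S-shaped curve, invented only to show how a total that keeps climbing can hide a rate of change that has already peaked.

```r
# Synthetic, made-up "population" (billions): an S-shaped curve, not real data.
year <- 1950:2010
pop  <- 10 / (1 + exp(-(year - 1990) / 25))

# Velocity: change in population per year between consecutive years.
velocity <- diff(pop) / diff(year)
vel_year <- year[-1]          # label each change by the year it ends in

# Side by side: the level, then its rate of change.
op <- par(mfrow = c(1, 2))
plot(year, pop, type = "l", xlab = "Year", ylab = "Population (synthetic)")
plot(vel_year, velocity, type = "l", xlab = "Year", ylab = "Change per year")
par(op)

# The level rises every single year...
all(diff(pop) > 0)
# ...while the velocity peaks partway through and then falls.
peak_year <- vel_year[which.max(velocity)]
peak_year
```

With real numbers you would put your own years and counts into year and pop and leave the rest alone.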

World Population Velocity

That hatchet-notch around 1960 was caused by yet another attempt to create a socialist paradise in China (it’ll work next time, right?). Centrally-planned famine wiped out a good chunk of humanity.

However, Malthus would be at a loss—as we are—to explain the drop-off starting around 1990. True, part of it is due to good old communist stick-to-itiveness: China is vigorously aborting a fairly large fraction of its pre-women, and some pre-men, in its “one-child” policy (they misread Malthus, too).

But weirder is the trend in the West, where the beer is always cold, grocery stores overflow, over 500 channels are on demand, and there is plenty of room to grow. In short: life is good. But people are not celebrating their success in the way they would have in the days before electricity.

Following strict utilitarian principles, some of us are willingly giving up the passing on of our genes. We are not competing for our survival.

Don’t believe it? Then look at Japan. Is there a more technologically advanced civilization? Low crime, more than enough food, and talk about healthy? These people regularly pop out past the century mark. Surely, they must be beavering away producing the next generation. Here are the numbers:

Japan Population

The dip is obvious, even in the raw numbers. And remember: demographic forecasts are almost always right, at least at the decadal level. It’s easy to count people, and breeding new humans takes about a year. Makes it easy to guess what will happen in the short term.

But you don’t have to accept the prediction. Just look at the velocity.

Japan Population Velocity

A line that straight downhill is spooky: it cries out for a cause. It is such a steep slope that it appears there was a national decision, after some initial indecision before the 1970s, to stop having babies.

Can a civilization exhaust itself? Turn so inward and self-indulgent? Is there some hidden virus or amoeba acting to suppress the desire to breed? Maybe an adequate diet—in exact opposition to theory—causes that suppression.

It isn’t just Japan. It’s Italy, Sweden, Germany, Austria, and on and on. Even the “developing” countries show signs of the same disease: the better they get (materially) the less they breed. So far, the US is holding its own and still getting to business. Nobody knows why.

Malanga says it’s because we lack an overly strong government. But if he’s right, and since our government has only grown stronger, then the US will be on the same downward path soon.


Feb 02 2010

R Lecture 5: Reading Built-in Data

Published under Podcast, Statistics


This is the fifth in a series of lectures on R.

10 minutes is a shockingly short period of time!

Today, we read in some datasets that come with R. To list, use the data() command, find what you want, then use the name of the dataset in the data() function. Like data(ToothGrowth).

The question mark ? may be used to ask for help for datasets.

Always try summary() for everything. It will almost always give you what you expect.

Also try plot().
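A few more commands are worth trying on a built-in dataset before moving on. This sketch is an optional extra, not part of the lecture's code block. It relies only on the datasets package that ships with R.

```r
# Names of the datasets that ship in R's "datasets" package.
ds <- data(package = "datasets")$results[, "Item"]
head(ds)
"ToothGrowth" %in% ds      # is the one we want among them?

# Structure: one line per column, with the type R has assigned to it.
str(ToothGrowth)

# First few rows, rather than printing the whole thing.
head(ToothGrowth)

# Average tooth length for each supplement and dose combination.
tg_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
tg_means
```

str() is a good companion to summary(): summary() describes the values, str() tells you what kind of thing each column is.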

Open the myRcode.R file we saved in our myR folder. Windows users have it in their “C” drive. Mac users have it on their Desktop. Linux users have it in their home path.

Reminder: DO NOT USE MICROSOFT WORD.

Cut and paste or type the following NEW block of text into your myRcode.R file and SAVE it.


# Built-in datasets
data()
data(ToothGrowth)
?ToothGrowth
ToothGrowth
summary(ToothGrowth)
plot(ToothGrowth)

We will cut & paste this code from the file myRcode.R into the R command window. EACH TIME REMEMBERING TO HIT THE ENTER KEY (inside R).

R can be downloaded here: R-project.org. A direct link to the CRAN package archive is here.

Next time: finding commands and plotting! The next lesson will appear on the weekend.

All videos are on YouTube under the username “mattstat” (wmbriggs was taken). That service imposes a ten-minute limit on videos. Accordingly, lectures are short.

All questions to matt@wmbriggs.com.

