The Increased And Increasing Chinese Military

More on matters that don’t appear to be as urgent as they are.

From the DoD’s annual report on Chinese military capability, “In 2004, Hu Jintao articulated a mission statement for the armed forces titled, the ‘Historic Missions of the Armed Forces in the New Period of the New Century.’”

What’s mission number one? Defense of the motherland? Securing borders peacefully? Modernization of weaponry? Training of troops? None of these.

Try, “Provide an important guarantee of strength for the party to consolidate its ruling position.” Skeptics and appeasers will say he doesn’t mean what he says. The ghosts of Tiananmen will know the words are true.

Bullet number two: “Provide a strong security guarantee for safeguarding the period of strategic opportunity for national development.” Is that a long way to say lebensraum? And just what is the “period of strategic opportunity”? Does it have an expiration date?

World peace is on the list, but it doesn’t even place, coming in a distant fourth.

Defense Ministry spokesman Yang Yujun ritually condemned the DoD report, saying, “The report does not hold water as it severely distorted the facts.”

Regardless, it is clear that China’s military is growing. It has reached adolescence, voraciously consuming budgets faster than teenagers eat pizzas.

The standard picture is to show spending as a percent of GDP:

Chinese military spending percent GDP

One difficulty is that this is a ratio, the denominator of which is the GDP itself, which in China is expanding rapidly. Notice that the 2011 figure is an estimate based on supposing a heavy increase in the Chinese economy.

This means that pictures which show spending as a percent of GDP in an economy that is bubble-like will tend to underplay the true amount of spending. Pictures like this can also mask increases in spending, as long as spending increases at a lower rate than GDP does, which is the case here.
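The masking effect is plain arithmetic, and a small sketch may make it concrete. The figures below are invented for illustration only (they are not Chinese budget or GDP data), and the helper name `percent_of_gdp` is likewise hypothetical.

```python
def percent_of_gdp(spending, gdp):
    """Return spending as a percentage of GDP."""
    return 100.0 * spending / gdp


# Hypothetical numbers in arbitrary units; NOT actual Chinese figures.
spending_then, gdp_then = 100.0, 5000.0
spending_now = spending_then * 1.10  # spending rises 10%
gdp_now = gdp_then * 1.15            # GDP rises faster, 15%

share_then = percent_of_gdp(spending_then, gdp_then)
share_now = percent_of_gdp(spending_now, gdp_now)

print(f"spending: {spending_then:.0f} -> {spending_now:.0f}")
print(f"share of GDP: {share_then:.2f}% -> {share_now:.2f}%")
```

Spending goes up, yet the share-of-GDP line goes down, simply because the denominator grew faster than the numerator.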

We are told the military spending for 2011 will increase, but that it will increase at a slower rate than it did from 2009 to 2010.

As an amusing aside, the New York Times writes that this increase is a slow down in spending. Why? Probably because the Times is so used to writing that increases which aren’t accelerating are “decreasing” that it cannot think in other terms.

The second, and irremovable, problem is that the numerator uses the number provided by the Politburo. Everybody believes this to be an underestimate.

To help fix that, here’s another look, this time per capita (known) military spending:

Chinese military spending per capita

The increase—which is an increase and not a “slow down”—now appears to be an increase.

The denominator here is population, which is rigorously controlled by the Chinese one-child policy. For completeness, population is shown at the bottom of the post. The track is fairly smooth: the country is now only adding millions a year. Curiously, the deceleration appears to have stopped four years ago. A relaxation of the bureaucratic oversight of the one-child policy? Or just plain bad numbers?

Spending per capita increased in 2000 and again in 2005. It perhaps slowed in 2010, though this may reflect a change in accounting or the deceleration of population increase.

Any way you slice it, China’s military is expanding rapidly. As reported last week, China is still conducting test flights of its prototype J-20 stealth fighter, is in the market for Russian T-50 fighters, and rolled out its newish aircraft carrier, which it will use to menace Taiwan and the Philippines.

The Philippines, incidentally, beefed up its naval presence recently, with a garage-sale purchase of an American Coast Guard cutter (the USCGC Hamilton, as was). They did this because China claims its territorial waters run right up to the shores of the PI. President Aquino begs to differ.

We’ll do more on the DoD report another day, but for now, read this report on our potential non-sale of F-16s to Taiwan.

———————————————————-

Chinese population

Detecting Deceptive Opinion Spam

Ever seen a review like this?

My husband and I satayed for two nights at the Hilton Chicago,and enjoyed every minute of it! The bedrooms are immaculate,and the linnens are very soft. We also appreciated the free wifi,as we could stay in touch with friends while staying in Chicago. The bathroom was quite spacious,and I loved the smell of the shampoo they provided-not like most hotel shampoos. Their service was amazing,and we absolutely loved the beautiful indoor pool. I would recommend staying here to anyone.

Your author has come across dozens that started like this one: with “My husband and I” or “My spouse and I”. Surfing over to Yelp and choosing San Francisco brings up another, “My husband stayed here for a little less than a week and were extremely pleased with the place…”

Turns out there’s a good reason for this similarity: many of these reviews are fake, put there by mercenaries, making as little as $5 for two, necessarily glowing, reviews. The $5 figure is from the New York Times, via A&LD. Bogus “five-star” ratings on sites like Amazon and TripAdvisor turn out to be a large problem.

The glowing notice above is known to be fake because it was solicited via a website that specializes in selling fake reviews (I have no idea whether the Yelp review is real or fake). This solicitation was done as part of a study by Myle Ott and others at Cornell in an effort to develop an algorithm that can detect fakes.

Incidentally, Ott is a computer scientist, and those guys say “train algorithm” when statisticians say “fit model” or physicists say “build model.” All these terms mean exactly the same thing—though, admittedly “training an algorithm” sounds sexier than “fitting a model.” “Training” implies that “learning” can go on indefinitely, while “fitting” implies merely applying some formula. Computer scientists are winning the battle of terminology. They are also—justifiably—winning the battle over the philosophy of modeling, but that’s a story for another day.

Building the algorithm to determine fraudulent reviews is not simple; however, creating the database from which to fit the model is the real trick. One approach was to gather reviews which are too similar, vis à vis plagiarism. Another was to “ask participants to give both their true and untrue views on personal issues (e.g., their stance on the death penalty).” Everybody becomes their own control in this way.

Here, the authors did one better and solicited 400 fake reviews in the same way that fake reviews are solicited by actual websites. They also gathered 400 hoped-to-be-genuine reviews from TripAdvisor. In the end, they had 20 real and 20 fake reviews for 20 different hotels. These were used to fit their model—or train their algorithm, if you will.

One tidbit was the discovery that fake reviews are often written in a hurry. One “took just 5 seconds and contained 114 words.” This of course implies the text was prepared in advance and cut and pasted in. Reviews written by first-time users, or newly created user names, are also more likely to be fake. Sites like TripAdvisor can use these facts as pieces of information to flag a review as genuine or fake.

The models themselves were naive Bayes and support vector machines, both commonly used as classifiers. Classification is the meat and potatoes of statistics (I would say it is the sole reason for its existence; of that, more another time). Logistic regression is classification, as are discriminant analysis, so-called machine learning algorithms, and on and on.

Support vector machines are a kind of non-parametric discriminant analysis. Various combinations of functions of data are produced which spit out whether the given message is likely fake or likely real. If you want to be fancy, you say SVMs “find a high-dimensional separating hyperplane between two groups of data.”

The data is the content of the messages themselves: how long it took them to be written, the number of times the word “I” was used, and so on. For example, deceptive reviews used “experience”, “my husband”, “I”, “feel”, “business”, and “vacation” more than genuine ones.
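For readers who like to see the gears turn, here is a minimal sketch of a word-count naive Bayes classifier of the general kind mentioned above. It is not Ott's code or data: the four toy "reviews" are invented, the function names are hypothetical, and add-one smoothing is an assumption made to keep the arithmetic well behaved.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split on whitespace."""
    return text.lower().split()


def train_naive_bayes(labelled_texts):
    """Count words per label from (label, text) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training texts
    vocab = set()
    for label, text in labelled_texts:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-probability score."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, counts in model["word_counts"].items():
        score = math.log(model["doc_counts"][label] / total_docs)  # log prior
        total_words = sum(counts.values())
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing so unseen pairs never give log(0).
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data; real work needs hundreds of labelled reviews.
toy_reviews = [
    ("fake", "my husband and i loved our vacation experience"),
    ("fake", "i feel my business experience was amazing"),
    ("real", "room was small but the location near the river was good"),
    ("real", "check in was slow and the bathroom floor was cold"),
]
model = train_naive_bayes(toy_reviews)
print(classify(model, "my husband and i feel this vacation was amazing"))
print(classify(model, "the room was cold and the check in was slow"))
```

With this toy data the first test sentence scores as fake and the second as real, which says much about the toy data and nothing about real hotels.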

They got about 90% accuracy on their test data, which is excellent, especially considering that human readers do no better than 50%. Experience says that that high rate won’t be realized on new data. Why?

Well, the model was fit to the data at hand. If new data was exactly like the data at hand, then the new accuracy rate would be the same as the old. But the new data is never exactly like the old data: if it was, it would be a mere copy. It is the inevitable differences between the old and new that account for the decrease in performance.
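A toy illustration of the point, under loud assumptions: the "model" here is a single threshold on the number of times "I" appears, nothing like the naive Bayes or SVM models in the study, and both data sets are invented. The rule is tuned to the old data and then scored on new data that differ a little.

```python
def accuracy(threshold, data):
    """Share of reviews the rule 'fake if count > threshold' labels correctly."""
    correct = sum((count > threshold) == is_fake for count, is_fake in data)
    return correct / len(data)


def fit_threshold(data):
    """Pick the threshold with the best accuracy on the data at hand."""
    candidates = sorted({count for count, _ in data})
    return max(candidates, key=lambda t: accuracy(t, data))


# Invented data: (number of times "I" appears, whether the review is fake).
old_data = [(1, False), (2, False), (2, False), (3, False), (1, False),
            (4, True), (5, True), (6, True), (7, True), (8, True)]
new_data = [(1, False), (2, False), (4, False), (5, False), (3, False),
            (2, True), (3, True), (5, True), (6, True), (7, True)]

threshold = fit_threshold(old_data)
print("threshold:", threshold)
print("accuracy on old data:", accuracy(threshold, old_data))
print("accuracy on new data:", accuracy(threshold, new_data))
```

The rule is perfect on the data it was fit to and noticeably worse on the new batch, even though nothing about the rule changed.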

This wisdom applies not just to Ott’s model, but to all statistical/probabilistic or computer science/fuzzy logic models. The models’ performance is always conditional on the data at hand.

————————————————————————–

Ott has made his data publicly available. Do not download, however, unless you know how to read things like this, “!/.__The/DT ,/,__and/CC ,/,__and/CC ,/,__and/CC ,/,__and/CC ,/,__as/IN ./.__I/PRP ./.__The/DT ./.__Their/PRP$ ./…”

Conversation With A Progressive: The British Riots

Nigel Shocking, just shocking these riots. Wouldn’t you agree?

David No. I’d say rather that they were expected.

Nigel Too right they were. These youth were disaffected. Nothing but products of a society which has given up hope on them. The only surprising thing is that the riots didn’t happen sooner and last longer.

David Hang on a minute. You’re saying that it was society’s fault that these people broke the law?

Nigel Of course. The evidence is everywhere. We have a society in which “bankers’ bonuses, MPs cheating on their expenses, unemployment, government spending cuts, poverty, social inequality” are routine. Society itself is to blame.

David Let us be clear. Do you claim that it was society that caused these people to steal trainers, iPhones, and televisions? To loot and destroy? To set fire to private property and to attack the police?

Nigel That is so.

David And that if it was society that was the cause of all this, the rioters themselves were not to be blamed? That although they perpetrated these criminal acts, they were justified in doing so?

Nigel A brutal truth and unpalatable. But one which is the case. As the Daily Telegraph said, it is our “culture of greed and impunity” that drove these unfortunates to violence.

David Very well. Let us accept your premise as true—that society caused these people to act in criminal fashion—and see where it leads us.

Nigel No tricks, now.

David Heaven forfend! But you would agree, I hope, that society—since it is society which we are investigating—is composed of people who, more or less, share a common culture or at least a culture which is in parts different than other cultures?

Nigel That is so.

David The culture itself is shaped by its members, by those people who live within certain geographic bounds. Its members interact with one another in ways too complex to track completely, but can we agree that it is these interactions that, so to speak, create the culture? And that culture and society mean much the same thing?

Nigel Of course.

David “Society”, then, is all the people living in some defined place. Society is comprised of all its members, and you say that society is to blame for the riots. Therefore, the rioters, since they are part of society, are to blame for their acts? Isn’t that opposite of what we assumed?

Nigel You don’t understand. When I say society is to blame, I do not mean all of society, but a part of it.

David Which part?

Nigel The part which controls the money and power.

David Do not these rioters control some money and some power?

Nigel They do, but only a fraction. It is money-hungry businesses and power-seeking corporations that control most.

David So it is the business owners themselves that were responsible for their shops being vandalized and looted?

Nigel In a roundabout manner of speaking, yes.

David Would you agree that some among those business owners and corporate board members themselves commit crimes? That rioters don’t have a monopoly on lawlessness?

Nigel I more than agree.

David The class of businesses and corporations is not uniform. Some businessmen and some corporations are richer than others.

Nigel This is true.

David Then, according to your hypothesis, those lower in the hierarchy must have been driven to crime by those higher up. Something caused some businessmen to break the law. And the only explanation you have offered is that crimes are committed by unbearable urges caused by being members of the lower classes. It must then be that all crime is caused by a person or small group of persons who sit atop the hierarchy.

Nigel I said nothing about unbearable urges.

David Did you not say, after amendments, that merely being a member of a lower class was what “drove” people to crime? And is not “being driven” another way of saying creating an irresistible urge?

Nigel Whether it does or not is not interesting. I protest your bending of words to suit your own meanings. It was clear that by businesses and corporations I meant an entirely different class of people, one that is wholly apart from those under them.

David So businessmen are not part of the same society? Then, since “society” caused the riots, it must be that either these businessmen or the rioters played the role of foreign invader.

Nigel You’re in the realm of the fanciful. Anyway, we would not be the first to arrive at the conclusion of class warfare for social justice.

David I would remind you that it was Cicero who said that an unjust peace is better than a just war.

Logical Probability And The IPCC’s Ambiguous Forecasts

This post is inspired by Roger Pielke, Jr., as well as Bernie, Matt, and other readers who asked me to have a look at some IPCC probability statements.

Roger Pielke, Jr. quotes from the IPCC’s AR4 report:

The uncertainty guidance provided for the Fourth Assessment Report draws, for the first time, a careful distinction between levels of confidence in scientific understanding and the likelihoods of specific results. This allows authors to express high confidence that an event is extremely unlikely (e.g., rolling a dice twice and getting a six both times), as well as high confidence that an event is about as likely as not (e.g., a tossed coin coming up heads). Confidence and likelihood as used here are distinct concepts but are often linked in practice.

Pielke rightly became perplexed by this language. What could it mean? He asked his readers (and me via email) to consider the following:

Here are some specific definitions to help you answer some questions.

A. “high confidence” means “about 8 out of 10 chance of being correct”.
B. “extremely unlikely” means “less than 5% probability” of the event or outcome
C. “as likely as not” means “33 to 66% probability” of the event or outcome

So here are your questions:

1. If the IPCC says of a die that it has — “high confidence that an event is extremely unlikely (e.g., rolling a dice twice and getting a six both times)” — how should a decision maker interpret this statement in terms of the probability of two sixes being rolled on the next two rolls of the die?

I answered this puzzler on Roger’s blog (Roger showed two questions, but they are the same at base), but I thought it worth developing further here.

The answer is that there is no answer; or rather, that there are an infinity of answers. The IPCC’s language of “high confidence that an event is extremely unlikely” is ambiguous and incomplete.

Remind yourself that all probability is conditional on certain, exactly specified information, evidence, or premises. What are the premises or evidence for a “high confidence that an event is extremely unlikely”?

Our evidence specifies that “high confidence” means that a statement has 0.8 chance. Here, we have 0.8 chance of an “extremely unlikely event,” and our evidence specifies that this event (call it A) has probability less than 0.05.

We have a 0.2 chance missing. That is, there is a 0.8 chance that A is extremely unlikely. But we need the full probability to say what is the probability of A. This must mean that there is a 0.2 chance that A is something other than extremely unlikely. The IPCC does not specify what this “other” than extremely unlikely is, so it could be anything.

We can provide our own evidence to provide a solution. Suppose, just for fun, that the 0.2 chance is for an event A that is merely unlikely, which we specify to mean 0.1 probability. Then we can write a cartoon equation:

      Pr(A | this information) = 0.8 * (Prob < 0.05) + 0.2 * 0.1 = 0.8 * (Prob < 0.05) + 0.02.

And that’s as far as we can go. Whatever 0.8 * (Prob < 0.05) becomes, we add 0.02 to it. The problem is that we do not know what (Prob < 0.05) means. Does it mean “more likely to be 0.05 than 0.01”? Or “equally likely to be any number between 0 and 0.05” or something else entirely? There is no language in the IPCC that allows us to discern which of these (or some other) is true.
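To see how much the answer moves with the reading of (Prob < 0.05), here is a small sketch that plugs three candidate readings into the cartoon equation above. The readings are my own hypothetical choices, each summarized by its average value, and the 0.1 for "merely unlikely" is the just-for-fun number already assumed. Exact fractions are used to avoid floating-point fuzz.

```python
from fractions import Fraction

CONFIDENCE = Fraction(8, 10)        # "high confidence"
REMAINDER = 1 - CONFIDENCE          # the unspecified 0.2
MERELY_UNLIKELY = Fraction(1, 10)   # the just-for-fun value from the text

# Hypothetical readings of (Prob < 0.05), each reduced to its mean.
readings = {
    "certainly 0": Fraction(0),
    "uniform on [0, 0.05]": Fraction(25, 1000),
    "certainly 0.05": Fraction(5, 100),
}

results = {
    name: CONFIDENCE * mean + REMAINDER * MERELY_UNLIKELY
    for name, mean in readings.items()
}

for name, prob in results.items():
    print(f"{name}: Pr(A) = {float(prob):.3f}")
```

Under these assumptions the answer ranges from 0.02 to 0.06, a threefold spread produced by nothing more than how three words are read.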

The IPCC’s language is either sloppy thinking or shrewd politics. Given my experience with actual, working scientists, I tend to believe the former. But if it’s shrewd politics, regardless whether A happens or not, the IPCC has given itself wiggle room to say that it predicted A wouldn’t happen, or that it predicted A wasn’t particularly unlikely.

I say this because though we cannot come to an exact solution, we can find its bounds given the language we do have. First, we know there is a 0.8 chance that (Prob < 0.05): the lowest this can be is 0 (just in case (Prob < 0.05) means 100% certainty of 0), and the highest it can be is 0.05 (just in case (Prob < 0.05) means 100% certainty of 0.05). Thus 0.8 * (Prob < 0.05) is between 0 and 0.04.

Now the 0.2 chance. The probabilities available to us are those between 0.05 and 1 (or so it seems; the language is still ambiguous). This means 0.2 times that probability, whatever it is, must lie between 0.01 and 0.2.

Our solution is then

      Pr(A | our information) in [0.01, 0.24].

Thus, if A did not happen, the IPCC would point to its prediction and say, “See! We told you so. We said A was nearly impossible and it didn’t happen.” But if A did happen, it could say, “Well, A happened, it’s true. But it happens about 1 out of 4 times, which isn’t that unlikely. We can be satisfied with our prediction.”
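For those who want to check the bounds, the arithmetic can be sketched in a few lines. The function name `weighted_interval` is hypothetical, and the intervals are the ones read off the IPCC language above, including the assumption that the leftover 0.2 covers probabilities from 0.05 to 1.

```python
from fractions import Fraction


def weighted_interval(weight, low, high):
    """Scale an interval of probabilities by a fixed weight."""
    return (weight * low, weight * high)


# 0.8 chance that Pr(A) lies somewhere in [0, 0.05].
high_conf = weighted_interval(Fraction(8, 10), Fraction(0), Fraction(5, 100))
# 0.2 chance that Pr(A) lies somewhere in [0.05, 1] (an assumed reading).
leftover = weighted_interval(Fraction(2, 10), Fraction(5, 100), Fraction(1))

bounds = (high_conf[0] + leftover[0], high_conf[1] + leftover[1])
print([float(b) for b in bounds])  # [0.01, 0.24]
```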