Playing catch up

When I was out, a lot of people sent me leads and links which I did not have time to upload. Here are a few of them.

* Many people emailed me this article from the New York Times about the R statistical software platform.

R is my daily bread and butter. It is probably the most used piece of statistical software, in the sense that most statisticians have a copy of it. I am lucky, because I am independent and I can choose to use what I like, and my luck is extended because of the beauty of R. Many statisticians are not so fortunate, however, and are forced to use something ugly, and ridiculously expensive, like SAS. Remind me to tell you sometime why SAS is so awful.

* Tom Hamill sent this Times article about the failure of Value at Risk (VAR) models in the finance industry. Our friend, Black Swan ego-boy Nassim Nicholas Taleb, is there calling everybody idiots and “intellectual charlatans.”

* Tom also sent this along. Just click and then come back. Yes, people have truly lost their minds. Except for scanning for metal weapons, everything about airport security is useless and even harmful as it gives a false sense of security. Liquids? Don’t get me started. Strap a plastic bottle full of explosive fluid to a woman’s leg; as long as she wears a skirt, it gets through, etc., etc.

* My Aunt Kayla sent me this article, subtitled “Who will wear the Chief Probability Officer hat in your organization?” I hereby volunteer to be the CPO at any large company as long as my executive compensation package is at least half as large as any other officer’s. I’m not greedy. I’ll sit in my office all day with my auguries and predict. Incidentally, JMP is software developed by SAS when they realized SAS stunk.

* Harvey Motulsky noticed that Dilbert’s creator Scott Adams is fond of statistical argumentation.

Categories: Fun

15 replies »

  1. “Remind me to tell you sometime why SAS is so awful.”

    Please do so!

    I was only exposed to SAS once and for a very short time before discovering R and never looking back. I really didn’t have the time to form an informed opinion on SAS beyond a first impression of painfully unintuitive syntax…

  2. Welcome back!!!

    Since you’re playing the game I would very much like to know what you think about this latest:

    http://www.nature.com/nature/journal/v457/n7228/full/nature07669.html

    Finally it has been proved that even Antarctica is warming: all the data so far known -that were saying otherwise- are wrong because not analyzed…

    I heard “skeptics” complaining about the usual use of statistics to “analyze” (read instruct/tune) the model so that the outcome supports the request for more funding, but I am not in a position to crunch all the numbers.

    When you can I would really love to know your thoughts… even in private.
    Thanks,

    Marco

  3. R is great for one-off boutique analyses and makes real pretty pictures. It’s not so good for large production systems that have to run “hands-off” 24×7. R chews up memory and leaves lots of copies of the data floating around. SAS chews up disk space and isn’t as flexible or way-cool state-of-the-art. I use them both. Different design, different purposes.

    Actually, JMP was developed by SAS when Apple wouldn’t let SAS have as many file handles as it wanted. Apple won that argument and John Sall went off and created JMP “the Apple way.” Again, it’s really pretty and great for the kids but not so hot for larger systems. Try writing JMP code that writes JMP code…

    The purpose of the computer is to replace humans on the tedious stuff, you know. After you’ve analyzed the umpteenth standard RCB design, they lose their charm.

  4. bill,

    I’ve been running R on several different production platforms for a few years with little trouble. I run it through web interfaces and in batch mode, so memory usage has never been a problem, even less so than with SAS.

    And don’t get me started on SAS’s bloated data architecture.

    Since you can link in C/Fortran code with R, you can obviate any data problems (copies etc.) with R.

    I can’t think of anything to recommend SAS over R except that more people use SAS operationally. But only because of inertia.

  5. We use SAS for mostly ETL stuff, and the inertia involved in changing anything at our company would prevent us from funding MS, SAS, IBM, etc….

    I’ve never used R, but am really interested in how it does ETL and if it integrates with .NET, java, etc…

    Links? Experience?

  6. Briggs,

    Maybe we are using different production systems. Mine takes test description files, generates paperwork (questionnaires and forms), checks and edits/rejects incoming data and then performs a standard set of analyses based on variable types (from the test description files) and experimental design (ditto) and then writes summary reports (LaTeX) and distributes the results to the end-users when released by the statisticians. It does it for several thousand studies/year. The statisticians spend more time designing the studies and doing the one-off analyses that are the “value-added” parts of the job (using SAS, R, or whatever fits…)

    How is SAS’ data architecture bloated? It’s simple, but you can adjust column widths easily, and every PROC takes the same dataset. I like the different types in R, but trying to figure out what particular datatype some of the functions take can cause hair loss.

    Bill.

  7. Matt:
    The thing I noticed on the Amazon link to the Playmobil set was what other stuff people bought who bought this toy set. Take a look! Any explanation for this empirical convergence should make interesting reading.

  8. It’s not the fat tail, or the black swan Taleb needs to worry about, it’s the peacock tail that he’s grown that will prove his downfall.
    “If the best man’s faults were written across his forehead, he would draw his hat over his eyes.”
    The worst man wouldn’t bother.

    Inspector Bernie, you’re on to something,
    The model is for practice, planning and briefing;
    The Shutter speed book, for learning about how to build or fox an X-ray machine, that still uses an electronic shutter with a scintillator: this book could be useful! Perhaps just to fox a CCTV.
    Measuring devices, couldn’t they have gone to the kitchen cupboard for those?
    As for the book about good versus evil, well, I’d say it’s an open and shut case.
    Them’as done this, WANT to be caught.

    It is a bit freaky though, The viewed items aren’t any less incriminating than the bought ones.

  9. Glad to have you back.

    This story, published today, tickled me.

    “A cow with a name produces more milk than one without, scientists at Newcastle University have found.

    Drs Catherine Douglas and Peter Rowlinson have shown that by giving a cow a name and treating her as an individual, farmers can increase their annual milk yield by almost 500 pints.

    The study, published online today in the academic journal Anthrozoos, found that on farms where each cow was called by her name the overall milk yield was higher than on farms where the cattle were herded as a group.”

    http://www.telegraph.co.uk/earth/agriculture/farming/4358115/Cows-with-names-produce-more-milk-scientists-say.html

    Although, statistically, a cow with a name may well produce more, it is the individual care that the farmer gives that actually enables the higher milk yield. Although the naming may be typical of a caring farmer, surely it is an irrelevance, and yet the first line implies that the name is the primary influence, which I suppose is statistically true.

    Odd that a statistically-correct fact can appear to be so misleading.

  10. I always find that when I refer to my bartender by name, the size of my servings goes up. But then so does my tip.

  11. I was fortunate to learn SAS, then S, then S-Plus, and then R, which is derived from the original S of Bell Labs. I recommend Visualizing Data by William S. Cleveland (1993, Hobart Press) to get an idea what object-oriented visual statistical analysis looks like.

    As wonderful as R is (and it is a godsend to statistics), it is still possible to use it to turn out junk analyses. Garbage in, garbage out, as they say.

    Re Taleb. He may be a pompous ass but his message is valuable. Most risk assessment is prone to overconfidence. S**t happens, as they say.

  12. What’s in a name? that which we call a rose by any other name would smell as sweet.
    It yields better results with the ladies not to talk to their breasts, why shouldn’t it work with cows?
    Tell Daisy what she wants to hear and she’ll pay you in kyn.

    So Taleb states the obvious after the event. Shame he uses the black swan as an analogy for something unforeseen. I saw one last time I fed a group of swans. There’s often a black one. His “blind man” imagery that he uses to represent idiocy is odd. Some of the most intelligent men I’ve ever known were totally blind.
    Joe Nocera’s article was refreshing. He speaks as an outsider but with a depth of knowledge enough to explain the topic to the reader. Thanks for linking to it.

  13. Airport security anecdote:

    I was carrying-on a smallish backpack, which I sent through the conveyor belt x-ray scanner successfully. But I myself kept triggering the walk through alarm. Finally after much wondering, the problem was traced to a garden variety thinly metal lined tobacco pouch in my pocket. But when I reached my destination, I was looking through my backpack and was stunned to find 5 .38 cal. “NATO round” bullets sequestered away deep within the folds of one of the pouches! They aren’t small. [I also use the backpack out in the wilds and unload my gun before driving, placing the bullets in the backpack.] I guess I was lucky, but the bullets made it through “security” without a problem….And it’s not really even a very busy Airport which I flew out from.

  14. Excellent article on R.

    If I may be so bold, I would like to add my own lowly experience on improbable occurrences. I’m a civil draftsman for a local Council; as such I design stormwater structures, some of which are rather large. In engineering a rare event storm may be a 1 in 100 year storm event, or for major highways, you might design for 1 in 1000 year flood events.

    The one thing we do recognise is that we aren’t dealing with a 1% possibility of a major event, but that the major event will happen about once every 100 years.
    It will happen.

  15. I had to use punch cards when I learned BASIC programming language in my freshman year. Those who are old enough probably remember how much fun it was to use punch cards. It’s quite different now.

    There is an R package called R Commander that uses a menu driven interface to statistical methods comparable to, say, SAS or SPSS. Google it for more details.
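
A note on reply 14's point about 1-in-100-year storms: the arithmetic behind it can be sketched in a few lines. This is only a rough illustration, not anything the commenter supplied. It assumes each year is independent and carries the same annual chance of 1 divided by the return period, which real rainfall records need not obey. The function name `prob_at_least_one` is made up for this sketch.

```python
def prob_at_least_one(return_period_years: float, horizon_years: int) -> float:
    """Chance of at least one event of the given return period within the
    horizon, ASSUMING independent years that each have an annual
    probability of 1 / return_period_years (a simplification)."""
    annual_p = 1.0 / return_period_years
    # Probability of no event in any single year is (1 - annual_p);
    # over the horizon those chances multiply, and we take the complement.
    return 1.0 - (1.0 - annual_p) ** horizon_years


if __name__ == "__main__":
    # Hypothetical design horizons, in years, for a 1-in-100-year storm.
    for horizon in (1, 10, 50, 100, 500):
        chance = prob_at_least_one(100, horizon)
        print(f"{horizon:>4} years: {chance:.3f}")
```

Under those assumptions the chance in any single year is 1%, but over a 100-year design life it is roughly 63%, and it keeps climbing toward certainty as the horizon grows. That is the sense in which the event "will happen."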
