April 28, 2008 | 20 Comments

Hitting or Pitching. Which wins more games?

By Tim Murray and William Briggs

You obviously need to score runs to win baseball games, and buying better hitters does this for a team. But you also need to keep your opponent from scoring too many runs, and buying better pitchers does this. Good, error-free fielding, all other things being equal, will also help a team keep the runs scored against it low. Most teams cannot afford to buy both the best batters and the best hurlers, so they have to make decisions.

You’re the newly appointed manager for your favorite team. The roster is nearly made out, and you find you have money for one more player. You can buy a hitter to improve your team’s overall batting average (BA) or you can acquire a pitcher to lower your team’s earned run average (ERA). What do you do?

We decided to try and answer this question by looking at the complete data from the 2001 to the 2007 seasons for all teams in Major League Baseball. For each team, we collected the number of regular season Wins, batting average, earned run average, number of errors, league (American or National), and total payroll. We also counted the total runs scored and allowed for each team, but since these statistics were so closely connected with batting average and earned run average, we don’t consider them further.

Payroll is obviously used to buy what teams consider to be the best players, though, as fans know to their grief, those players do not always work out that way. If winning more games were simply a matter of increasing the payroll, the New York Yankees would win every World Series. Thankfully, then, money isn’t everything.

But it is something. This picture shows the payroll by the number of wins, with each team receiving its own color (since this is for seven years, each team appears seven times on this, and all other, plots). The team to the far right in blue are the Yankees, far exceeding any other team in money spent. The club next to them in red are the Boston Red Sox. There is a huge difference in the amount of money spent between teams. The 2006 Florida Marlins spent the least at about $15 million but won a respectable 78 games. They were followed closely by Tampa Bay, which in 2000 spent about $20 million, only rising to $24 million by 2007. Their wins were steady at around 66.

wins by payroll

A horizontal line has been drawn in at 90 games to show that there is still an enormous range of team payrolls for clubs winning at least this impressive number of games. For example, the 2001 Oakland A’s spent only about $34 million to capture 102 games. They increased the payroll a mere $6 million the next year and won 103 games. Oakland, as documented in the book Moneyball by Michael Lewis, didn’t really drop much below 90 games until last year, winning only 76 games while spending the most they ever had, nearly $80 million.

While spending a lot does not guarantee winning the most games in any year, it does help. The Yankees, for example, never dropped below 94 games (in 2007). Boston was the second biggest spender, and it has helped them win at least 82 games a year. However, most teams cannot spend nearly as much as these two. Other teams must be grateful that money isn’t everything.

This second picture explains why money can’t necessarily buy happiness. Each of the three predictive statistics, BA, ERA, and Errors, is plotted against Payroll. A statistical (“nonparametric”) regression line is drawn on each to give a rough, semi-qualitative idea of the relationship of the variables. The signals go in the expected direction: larger payrolls mean, on average, higher BAs, lower ERAs, and lower numbers of Errors. But none of the signals are very strong.

wins by BA, ERA, and Errors

To explain what we mean by that, pick any level of payroll, say $100 million. Then look at the scatter around that number (the points below and above the solid line). With BA, the scatter is just about as wide as the range of team batting averages in the data, which is .240 to .292. The same is true for both ERAs and Errors. Still, there is a general weak trend: spending more money does, very crudely, buy you a better team.

But not much better. For example, if you wanted to spend enough to be 90% sure of upping your team’s batting average 5 points (from the median of .268 to .273), you’d have to shell out an extra $50 million (this is after controlling for League, Errors, and team ERA). That’s a huge increase in team salaries. Even worse, the players you buy would have to have extraordinarily high batting averages to bring the entire team’s average 5 points higher. It’s the same story for ERA and Errors. The point is that predicting what players will do, paying more for those you consider better, and then getting the performance you paid for is not just a tricky business, but an almost impossible one.

This still doesn’t answer what is better, in the sense of predicting more wins: hitting or pitching. Take a look at this picture:

BA, ERA, and Errors frequency by League

This shows fancy, souped-up, “histograms” (called density estimates) for the frequency of BA, ERA, and Errors by League. Higher areas on the graph, like a regular histogram, mean that number is more likely. For example, the most likely value of ERA for teams in the National League is just over 4.0.
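
For readers who want to see what a density estimate does under the hood, here is a minimal sketch in Python of a Gaussian kernel density estimate. The ERA values and the bandwidth are invented for illustration; they are not the data behind the picture, and we are not claiming this is the exact estimator that drew it.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a function estimating the density of `sample` at any point x."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each observation contributes a small Gaussian bump centered on itself.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density

# Hypothetical team ERAs (illustrative only, not the article's data).
era_sample = [3.6, 3.9, 4.0, 4.1, 4.1, 4.3, 4.5, 4.8]
era_density = gaussian_kde(era_sample, bandwidth=0.25)

# Evaluate on a grid; the highest point of the curve marks the most likely ERA.
step = 0.05
grid = [3.0 + step * i for i in range(61)]  # 3.00, 3.05, ..., 6.00
mode = max(grid, key=era_density)

# Crude check that the curve behaves like a density: its area is close to 1.
area = sum(era_density(x) for x in grid) * step
```

A wider bandwidth gives a smoother, flatter curve, and a narrower one gives a bumpier curve, much like changing the bin width of an ordinary histogram.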

It’s clear from these pictures that the American League teams have on average higher ERAs and BAs than do clubs in the National League. Obviously, the designated hitter rule for the American League accounts for most, if not all, of this difference. There don’t seem to be any real differences in Errors between the two Leagues, which makes sense. The League differences between ERA and BA have to be accounted for when answering our main question.

This next series of pictures shows there is even more complexity. The first is a plot, separated by League, of each team’s BA by ERA. There is some weak evidence that as ERA increases, BA drops, especially in the American (A) League, perhaps another remnant of the designated hitter effect. But this isn’t a very strong indicator.

BA by ERA by League

This next picture shows some stronger relationships. The top two panels, again separated by League, are plots of ERA (on the vertical axis) by Errors (on the horizontal axis): as ERA increases, so does the number of Errors. Similarly for BA: as the number of Errors increases, the batting averages of teams tend to decrease. All this evidence means that when a team is bad, it tends to be bad in all three dimensions, and when it is good, it tends to be good in all three dimensions. This is no surprise, of course, but we do have to control for these factors when answering our question.

BA, ERA, by Errors by League

We finally come to our main question, which we answer with a complicated statistical model, one which accounts for all the evidence we have so far demonstrated. The type of model we use accounts for the fact that the number of Wins is a discrete number, by which we mean the total Wins can be 97 or 98, say, but they cannot be 97.4. In technical terms, it is called a quasi-Poisson generalized linear model, a fancy phrase that means that the model is very like a linear regression model, about which you may have heard, but with some twists and extra knobs that allow us to control for our interacting factors and discrete response.
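
To give a feel for what such a model does, here is a minimal sketch in Python. It is not our fitted model: it uses a single predictor (ERA) rather than BA, ERA, Errors, League, and their interactions, the numbers are invented, and the function names are our own. It fits log(expected Wins) = b0 + b1 * ERA by iteratively reweighted least squares, then estimates the dispersion, which is the "quasi" part: the spread of Wins around the fitted curve is estimated from the data instead of being assumed equal to the mean, as a plain Poisson model would have it.

```python
import math

def fit_log_linear(x, y, iterations=25):
    """Fit log(mean) = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the overall mean
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response for the log link; the weights are the fitted means.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        # Solve the 2x2 weighted least-squares equations.
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

def dispersion(x, y, b0, b1):
    """Pearson chi-square divided by the residual degrees of freedom."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    pearson = sum((yi - m) ** 2 / m for yi, m in zip(y, mu))
    return pearson / (len(y) - 2)

# Invented numbers for illustration only (not the article's data).
team_era = [3.6, 3.9, 4.1, 4.3, 4.5, 4.8, 5.1, 5.4]
team_wins = [98, 92, 88, 83, 80, 74, 69, 63]

b0, b1 = fit_log_linear(team_era, team_wins)
phi = dispersion(team_era, team_wins, b0, b1)
predicted = math.exp(b0 + b1 * 4.37)  # predicted wins at an ERA of 4.37
```

A dispersion far from 1 is the signal that a plain Poisson model would misstate the uncertainty; in a quasi-Poisson fit the standard errors are scaled by the square root of that number.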

The answer lies in these complicated-looking pictures. Let’s work through them slowly. First, only look at the top picture, which is the modeled, or predicted number of wins by various batting averages.

Predicted wins

There are two sets of three curves. The brownish is for the National League, and the blueish for the American. Now, in order to predict how many wins a team will have, we have to supply four things: their expected BA, ERA, number of errors, and League. That’s a lot of different numbers, so to simplify somewhat, we will fix the number of Errors at the median observed figure, which is 104. (Changing Errors barely changes the results.)

We still have to plug in a BA, ERA, and League in order to predict the number of wins. We start by plugging in the BA over the range of observed values, but we still have to supply an ERA. In fact, we supply three different ERAs: the observed first quartile, median, and third quartile, which are 4.04, 4.37, and 4.74. For the American League, these are the three blue curves: the top one corresponds to the lowest ERA of 4.04, the middle to the value of 4.37, and the bottom to the highest value of 4.74. To be clear: each point on these curves is the result of four variables: a BA, an ERA, a number of Errors, and a League. From these four variables, we predict the number of wins, which varies as the four variables do.
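
The bookkeeping behind those curves can be sketched in a few lines of Python. The ERA list below is invented, chosen only so that its quartiles land on the three values quoted above; it is not our data, and the variable names are our own. The sketch computes the three ERA levels and then lists every combination of inputs that a fitted model would be asked to predict.

```python
import statistics

# Invented team ERAs, chosen so the quartiles match the values quoted above.
# They are NOT the article's data.
team_era = [3.75, 3.98, 4.04, 4.21, 4.37, 4.52, 4.74, 4.90, 5.20]

# With n=4, statistics.quantiles returns three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(team_era, n=4, method="inclusive")

MEDIAN_ERRORS = 104  # Errors held fixed at the median reported in the text
ba_grid = [round(0.240 + 0.004 * i, 3) for i in range(14)]  # .240 to .292

# One row of model inputs per point on the six curves (2 leagues x 3 ERAs).
inputs = [
    (ba, era, MEDIAN_ERRORS, league)
    for league in ("American", "National")
    for era in (q1, median, q3)
    for ba in ba_grid
]
```

Each tuple in `inputs` is one point on the picture: feed it to the model, get back a predicted number of wins, and connect the points that share a League and an ERA.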

All of these curves sweep upwards, implying the obvious: higher BAs lead to more predicted Wins, regardless of ERA or League. At the lowest BAs, differences in ERA are the largest in the American League. This means that, if your team is hitting very poorly, small variations in pitching account for large changes in the number of games won. To make sure you see this, focus on the very left-most points of the graph, where the BAs are the smallest. Then look at the three blue curves (American League): their three left-most points are widely separated. Moving from a team ERA of 4.74 to 4.04 increases the number of games won from 61 to 78, or 17 more a season, which is of course a lot. But when a team is batting well, while differences in ERA are still important, they are not as influential. These are the right-most blue points on the figure: notice how at the largest BAs, the three curves (again representing different ERAs) are very close together. If a team in the American League is batting very well, improvements in pitching do not account for very many more games won.

That is so for the American League, but perhaps surprisingly not for the National, where the opposite occurs. Differences in ERA are more important for high batting averages, but not as important for low ones: better pitching becomes more crucial as the team bats better. The brown curves spread out more for high BAs, and are tighter at low BAs.

Now let’s look at the bottom picture. This is the same sort of thing, but over the range of ERAs at three fixed levels of BA: .259, .266, and .272. The top curves are the highest BA, and the bottom curves the lowest. Looking first at the American League, we can see that when the team ERA is low, differences in BA do not account for much. In fact, when the team ERAs are the lowest, improvements in batting in the American League make almost no difference at all! When team ERAs are high, changes in BA mean larger differences in numbers of games won: the spread between the blue lines increases as ERA increases.

Again, the situation is opposite for the National League: when the team ERA is low, changes in BA are more important than when team ERAs are high. In this league, when team ERAs are low, good batting can make a big difference in numbers of games won. But when ERAs are high, improvements in batting do not change the number of games won very much.

Once more, we point out that we can draw each of these curves again for different numbers of Errors. We did so, and found that the differences between those curves and the ones we displayed were minimal, but not negligible: for example, adding a whopping 40 errors onto a team that ordinarily commits only 80 costs them, on average, only 2 games a season. Higher BAs or ERAs can mitigate this somewhat, from losing 2 games to only losing about 1 extra game a season. So while Errors are important, they are far from decisive factors in an overall season.

So what should you do?

Look again at the two plots. In the BA plot, the highest number of predicted wins, for a BA of .292 and an ERA of 4.04 (the lowest pictured), is about 104 games for National League teams, and about 100 for American League clubs. But the highest number of predicted wins, looking at the ERA plot, for teams with the lowest ERA of 3.13 and a BA of .272 (the highest pictured), is about 111 games for the National League and 107 games for the American. Conversely, back in the BA plot, those teams with the lowest BAs of .240 and high ERAs of 4.74 won only about 61 games in the American League and 67 in the National. And in the ERA plot, teams with the worst ERAs of 5.71 and lowest BAs of .259 won only about 56 games in the American and 62 in the National.

Clearly, then, pitching is more important than batting overall: more games on average will be won by those clubs who have the lower ERAs than those teams with the higher BAs.

But that isn’t necessarily the answer to our question. Remember that you only have money for one more player. Should you recruit or trade for a better pitcher or batter? It depends on what kind of team you have now. Our team right now has a certain ERA, BA, and expected number of Errors, so what do we do? The final answer is in this last picture.

Effects of ERA and BA

This shows improvement, in either ERA (decreasing) or BA (increasing), on the bottom axis. The other axis shows, for each “unit” of improvement (0.05 for ERA, 0.001 for BA), the additional games won. These are the same, in essence, as the plots above, but they show the data in a different fashion (the same colors still represent the two leagues). The way this figure works is that you pick a certain point, say a BA of .266 or an ERA of 4.34 (which is the same point on the graph), then move upwards (to the right on the horizontal axis) by one “unit” (0.05 for ERA, 0.001 for BA), and then pick off the number of additional games won.

No matter where we are on the graph, ERA easily wins this race, in the sense that buying a better pitcher to improve the ERA wins more games than buying a better batter to improve the BA. This is true for either league. (These pictures are also concocted using the median values of ERA, BA, and Errors, as mentioned above: do not worry if you don’t understand this; the results do not change for the other values.)

So spend your money on the pitcher.

Tim Murray is a student at Central Michigan University. William Briggs is a statistician in New York City.
April 24, 2008 | 11 Comments

CONTEST: Preliminary Discussion of the “Best Internet Conspiracy Theory”

Best Internet Conspiracy Theory
This is the first posting preliminary to the announcement of an Official Contest to find the Best Internet Conspiracy Theory.

The Contest will be officially announced in about one week.

This contest is primarily a public service for those who contribute regularly to sites like,,, etc. Many of those people are forced to spend an inordinate amount of time concocting theories that neatly explain messy world events. This has led to an enormous increase in carpal tunnel and internet addiction syndrome cases worldwide. Thus, we want to provide these overworked souls a handful of ready-made theories to which they can refer. The theories we have in mind are described in the contest rules below.

I will need help in publicizing this Contest, and may need help in judging entries, depending on how many I receive. Volunteers should email me: put “CONTEST” in the subject line.

A sketch of the rules is as follows:

(1) All entries must be shorter than 150 words. Shorter entries will receive more weight than longer ones.

(2) Entries—one per person—must be placed into the Comments Section of the Official Contest Post. No discussion will be allowed on that post; only Contest entries are allowed.

(3) All entries will be judged by the intrinsic awfulness, brevity, completeness of derangement, plausibility, specificity (names named), and potential appeal to the everyday, e.g., Digg reader.

(4) The Contest will last approximately two to three weeks.

(5) A prize, or prizes, to be decided later, will be announced.

(6) An example of an Internet Conspiracy Theory:

Certain scientists discovered a formula, derived from an alien artifact dug up in Area 51, for turning ordinary sea water into limitless, cheap fuel. Green Energies, a subsidiary of, based in the World Trade Center was about to sell this discovery and eliminate Global Warming, when the Oil Companies learned of it. Big Oil contacted George Bush, who ordered the Twin Towers destroyed before the secret could get out. Ron Paul found out about this and was going to expose the entire matter had he won the Republican Nomination, which he would have done except the Mainstream Media ignored him.

Please do NOT post any conspiracy theories now! Save them for the Contest.

April 21, 2008 | 73 Comments

CO2 and Temperature: which predicts which?

Parts of this analysis were suggested by Allan MacRae, who kindly offered comments on the exposition of this article which greatly improved its readability. The article is incomplete, but I wanted to present the style of analysis, which I feel is important, as the method I use eliminates many common errors found in CO2/Temperature studies. Any errors are, of course, entirely my own.

It is an understatement to say that there has been a lot of attention to the relationship of temperature and CO2. Two broad hypotheses are advanced: (Hypothesis 1) As more CO2 is added to the air, through radiative effects, the temperature later rises; and (Hypothesis 2) As temperature increases, through ocean-chemical and biological effects, CO2 is later added to the atmosphere. The two hypotheses have, of course, different consequences which are so well known that I do not repeat them here. Before we begin, however, it is important to emphasize that both or even neither of these hypotheses might be true. More on this below.

The monthly temperature data are from The University of Alabama in Huntsville, and start in January 1980. Temperature is available for different regions: global, Northern Hemisphere, etc. The monthly global CO2 is from NOAA ESRL.

We want to examine the CO2/temperature processes at the finest level allowed by the data, which here is monthly at the time scale, and Northern and Southern Hemisphere and the tropics at the spatial scale. The reason for doing this, and not looking at just yearly global average temperature and CO2, is that any processes that occur at time scales less than a year, or occur only or differently in specific geographic regions, would be lost to us. In particular, it is true that the CO2/temperature process within a year is different in the Northern and Southern hemispheres, because, of course, of the difference in timing of the seasons and changes in land mass. It is also not a priori clear that the CO2/temperature process is the same, even at the yearly scale, across all regions. It will turn out, however, that the differences between the regional and global processes are minimal.

The question we hope to answer is, given the limitations of these data sets, with this small number of years, and ignoring the measurement error of all involved (which might be substantial), does (Hypothesis 1) increasing CO2 now predict positive temperature change later, or does (Hypothesis 2) increasing temperatures now predict positive CO2 change later? Again, this ignores the very real possibility that both of these hypotheses are true (e.g., there is a positive feedback).

During the course of an ordinary year, both Hypotheses 1 and 2 are true at different times, and sometimes neither is true: in the Northern Hemisphere, the temperature and CO2 both increase until about May, after which CO2 falls, though temperature continues to rise. In the Southern Hemisphere, temperature falls in the early months, while CO2 rises, and so on. These well known differences are due to combinations of respiration and changes in orbital forcing.

There are, then, obvious correlations of CO2 and temperature at different monthly lags and in different geographic regions (I use the word “correlation” in its plain English meaning and not in any statistical sense). We are not specifically interested in these correlations, which are well known and expected, and whose role in long-term climate change is minimal. The existence of these correlations presents us with a dilemma, however. It might be that, for either Hypothesis 1 or 2, the time at which either CO2 or temperature changes in response to changes in forcing is less than one year, but disentangling this climate forcing from the expected changes due to seasonality is, while possible, difficult and would require dynamical modeling of some sort (in the language of time series, the seasonal and long-term signals are possibly confounded at time scales less than 1 year).

Therefore, instead of looking at intra-year correlations, we will look at inter-year correlations. This introduces a significant limitation: any real, non-seasonal, correlations at lags of less than 1 year (or at other non-integer yearly time points) will be lost and it will be possible that we are misled in our conclusions (in the language of time series, the “power” on these non-integer-year lags will be aliased onto the 1 year lag). What is gained by this approach, however, is that there is no chance of misinterpreting lags less than one year as being due to a process other than seasonality. However, the main purpose of this article is not to identify the exact dynamical and physical CO2/temperature relationship, nor to identify the lag that best describes it; we just want to know whether Hypothesis 1 or Hypothesis 2 is more likely on time scales greater than 1 year.

Most of us have seen pictures like this one, which shows the monthly CO2 for 1980-1984; also shown is the Northern Hemisphere (NH) temperature anomaly (suitably normalized to fit on the same picture).
Co2 through time
You can immediately see the intra-year CO2 “sawtooth”. This sawtooth makes it difficult to find a functional relationship of CO2 and temperature. I do not want to model this sawtooth, because I worry that whatever model I pick will be inadequate, and I do not immediately know how to carry the uncertainty I have in the model through to the final conclusion about our Hypotheses. I also do not want to smooth the sawtooth, or perform any other mathematical operation on the observed CO2 values within a year, because that tends to inflate measures of association.

Instead, let’s look at CO2 in a different way:
Co2 through time by month
This is yearly CO2 measured within each month: each of the 12 months has its own curve through time. It doesn’t really matter which is which, though the two lowest curves are from the winter months (for those in the NH). What’s going on is still obvious: CO2 is increasing year by year and the rate at which it is doing so is roughly constant regardless of which month we examine.

Looking at the data this way shows that the sawtooth has effectively been eliminated, as long as we examine year-to-year changes within each month through time.
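
A small sketch in Python shows why this works. The series below is synthetic (a steady rise plus a seasonal cycle that repeats exactly each year); it is invented for illustration and is not the NOAA record, and the function name is my own. Comparing each month only with the same month one year earlier cancels anything that repeats every twelve months.

```python
import math

def same_month_changes(monthly, lag_years=1):
    """Change from the same calendar month `lag_years` earlier."""
    lag = 12 * lag_years
    return [monthly[i] - monthly[i - lag] for i in range(lag, len(monthly))]

# Synthetic series: a rise of 1.5 ppm a year plus a repeating seasonal cycle.
# These numbers are invented; they are not measured CO2.
co2 = [
    340.0 + 1.5 * (m / 12.0) + 3.0 * math.sin(2.0 * math.pi * m / 12.0)
    for m in range(60)
]

# Month-to-month changes still carry the sawtooth ...
month_to_month = [co2[i] - co2[i - 1] for i in range(1, len(co2))]

# ... while same-month, year-to-year changes do not.
year_to_year = same_month_changes(co2)
```

In this idealized case every entry of `year_to_year` equals the yearly trend; with real data the seasonal cycle is not perfectly repeatable, so some of it will leak through.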

Suppose we were only interested in Decembers and in no other months. Let us plot the actual December temperature from 1980 to 2006 on the x-axis and on the y-axis plot the increase in CO2 for the years 1981 to 2007. Shown in the thumbnail below is this plot: with black dots for the Southern Hemisphere (SH), red dots for the NH, and green dots for the tropics (redoing the analyses with global or sea surface temperatures instead of separating hemispheres produces nearly indistinguishable results). For example, in one year, the NH temperature anomaly was -0.6: this was followed in the next year by an increase of about 1.5 ppm of CO2 (this is the left-most plot on the figure).
Co2 through time by month
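
The pairing used for this plot can be written out in a few lines of Python. This is a sketch with invented December values, not the data in the figure, and the function name is my own: each year's level of one variable is matched with the change in the other variable over the following year.

```python
def next_year_pairs(level, other):
    """Pair level[t] with the change other[t + 1] - other[t]."""
    return [(level[t], other[t + 1] - other[t]) for t in range(len(level) - 1)]

# Invented December values for five consecutive years (illustration only).
december_temp = [-0.10, 0.05, -0.20, 0.15, 0.30]      # anomaly, degrees
december_co2 = [338.0, 339.5, 340.5, 342.5, 344.0]    # ppm

# Hypothesis 2: temperature now against the CO2 change that follows.
h2_pairs = next_year_pairs(december_temp, december_co2)

# Hypothesis 1: CO2 now against the temperature change that follows.
h1_pairs = next_year_pairs(december_co2, december_temp)
```

With 28 Decembers (1980 to 2007) the same function yields 27 pairs, one fewer than the number of years, because the last year has no following year to supply a change.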

The solid lines are loess lines, which estimate the relationship between temperature and the change in CO2 (the dCO2/dt on the graph). If the loess lines were perfectly straight (and pointed in any direction), we would say the two measures are linearly correlated. The lines aren’t that straight, so the data does not appear to be that well correlated, linearly or otherwise.
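
For the curious, here is a minimal sketch in Python of a smoother in the spirit of loess: at each point it fits a straight line to the nearby data, weighting closer points more heavily. It is an assumption-laden simplification (no robustness iterations, a fixed span of my choosing, invented data) and not the routine that drew the figure.

```python
def loess_point(x, y, x0, frac=0.75):
    """Locally weighted straight-line fit evaluated at x0 (tricube weights)."""
    n = len(x)
    k = max(2, int(round(frac * n)))
    # The bandwidth is the distance from x0 to its k-th nearest neighbour.
    dists = sorted(abs(xi - x0) for xi in x)
    h = dists[k - 1] or 1e-12
    w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        return swy / sw  # fall back to a weighted mean
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0

# Invented temperature anomalies and following-year CO2 changes.
temp = [-0.6, -0.4, -0.2, 0.0, 0.1, 0.3, 0.5, 0.7]
dco2 = [1.5, 1.2, 1.4, 1.3, 1.6, 1.8, 2.1, 2.0]

# The smoothed curve, evaluated at each observed temperature.
curve = [loess_point(temp, dco2, t) for t in temp]
```

If the data fell exactly on a straight line, this smoother would reproduce that line; bends in the curve are the smoother reporting that a single straight line does not describe the data well.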

Click on the figure (do this!) to see the same plot for each of the 12 months (right click on it and open it in a new window so you can follow the discussion). Notice anything? Generally, when temperature increases this year CO2 tends to increase in the following year. Hypothesis 2 is more likely to be true given this picture.

The loess lines are not always straight, which means that a straight-line model, i.e. ordinary correlation, is not always the best model. For example, in Januaries, until the temperature anomalies get to 0 or above, temperature and change in CO2 have almost no relationship; after this point, the relationship becomes positive, i.e., increasing temperatures lead to increases in the change of CO2. The strength of the relationship also depends on the month: the first six months of the year show a strong signal, but the later six show a weakening in the relationship, regardless of where in the world we are.

Coincidence? Now plot the actual December CO2 from 1980 to 2006 on the x-axis and on the y-axis plot the change (increase or decrease) in temperature for the years 1981 to 2007. For example, in one year, the NH CO2 was 340 ppm: this was followed in the next year by a temperature change of about -0.5 degrees (this is the bottom left-most plot on the figure). No real signal here:
Co2 through time by month

Again, click on the figure (do this!) to see all twelve months. There does not appear to be any relationship in any month between CO2 and change in temperature, which weakens our belief in Hypothesis 1.

It may be that it takes two years for a change in CO2 or temperature to force a change in the other. Click here for the two-year lag between temperature and change in CO2; and here for the two-year lag between CO2 and change in temperature. No signals are apparent in either scenario.

As mentioned above, what we did not check are all the other possibilities: CO2 might lead or lag temperature by 9.27, or 18.4 months, for example; or, what is more likely, the two variables might describe a non-linear dynamic relationship with each other. All I am confident of saying is, conditional on this data and its limitations etc., that Hypothesis 2 is more probable than Hypothesis 1, but I won’t say how much more probable.

It is also true that, over this period of time and using this data, CO2 always increased. The cause of this increase sometimes was related to temperature increases (rising temperatures led to more CO2 being released) and sometimes not. We cannot say, using only this data, why else CO2 increased, although we know from other sources that CO2 obviously increased because of human-caused activities.

April 20, 2008 | 8 Comments

It was bound to happen

Remember how you used to cavalierly ignore those “Keep off the Grass” signs in your un-enlightened youth?

Well, you brutal, uncaring, beast.

For it has finally been announced—from Europe, naturally, from the Swiss government-appointed Federal Ethics Committee on Non-Human Biotechnology—that plants have feelings too.

They have authoritatively described interfering with plants without a valid reason as “morally inadmissible.” This means that the next time you carve your and your sweetheart’s names into a tree, it can lead to a nice, long jail sentence. (If the famed Swiss police ever catch you, that is.)

The ethics committee did grudgingly admit, for now, that “all action involving plants for the preservation of the human race was morally justified.” Meaning, I suppose, that it’s still OK to eat them. I probably don’t need to explain to you the fix we’d be in if we could not. But there is only one direction for the Enlightened to go, so stay tuned for an announcement banning the use of “higher” plants, such as maybe corn and tomatoes, for use in the “preservation of the human race.”

The august Swiss body has also found that “genetic modification of a plant did not contradict the idea of its ‘dignity’.” Yes, I can see how a kumquat would not find it an affront to be genetically probed. Until, that is, the kumquat learns how easily this sort of thing can sully one’s reputation. It’s only a matter of time before a lawyer figures this out and brings a case to Brussels.

Just keep all this in mind, think about what you are doing—raise your awareness!—next time you are at the salad bar.