By Tim Murray and William Briggs
You obviously need to score runs to win baseball games, and buying better hitters does this for a team. But you also need to keep your opponent from scoring too many runs, and buying better pitchers does this. Good, error-free fielding, all other things being equal, will also help a team keep the runs scored against it low. Most teams cannot afford to buy both the best batters and the best hurlers, so they have to make decisions.
You’re the newly appointed manager for your favorite team. The roster is nearly made out, and you find you have money for one more player. You can buy a hitter to improve your team’s overall batting average (BA) or you can acquire a pitcher to lower your team’s earned run average (ERA). What do you do?
We decided to try to answer this question by looking at the complete data from the 2001 to the 2007 seasons for all teams in Major League Baseball. For each team, we collected the number of regular season Wins, batting average, earned run average, number of errors, league (American or National), and total payroll. We also counted the total runs scored for and allowed by each team, but since these statistics were so closely connected with batting average and earned run average, we don’t consider them further.
Payroll is obviously used to buy what teams consider to be the best players, though, as fans know to their grief, those players do not always work out that way. If winning more games were simply a matter of increasing the payroll, the New York Yankees would win every World Series. Thankfully, then, money isn’t everything.
But it is something. This picture shows the payroll by the number of wins, with each team receiving its own color (since this is for seven years, each team appears seven times on this, and all other, plots). The team to the far right in blue are the Yankees, far exceeding any other team in money spent. The club next to them in red are the Boston Red Sox. There is a huge difference in the amount of money spent between teams. The 2006 Florida Marlins spent the least at about $15 million but won a respectable 78 games. They were followed closely by Tampa Bay, which in 2000 spent about $20 million, only rising to $24 million by 2007. Their wins were steady at around 66.
A horizontal line has been drawn in at 90 games to show that there is still an enormous range of team payrolls for clubs winning at least this impressive number of games. For example, the 2001 Oakland A’s spent only about $34 million to capture 102 games. They increased the payroll a mere $6 million the next year and won 103 games. Oakland, as documented in the book Moneyball by Michael Lewis, didn’t really drop much below 90 games until last year, winning only 76 games while spending the most they ever had, nearly $80 million.
While spending a lot does not guarantee winning the most games in any year, it does help. The Yankees, for example, never dropped below 94 games (in 2007). Boston was the second biggest spender, and it has helped them win at least 82 games a year. However, most teams cannot spend nearly as much as these two. Other teams must be grateful that money isn’t everything.
This second picture explains why money can’t necessarily buy happiness. Each of the three predictive statistics, BA, ERA, and Errors, is plotted against Payroll. A statistical (“nonparametric”) regression line is drawn on each to give a rough, semi-qualitative idea of the relationship of the variables. The signals go in the expected direction: larger payrolls mean, on average, higher BAs, lower ERAs, and lower numbers of Errors. But none of the signals are very strong.
To explain what we mean by that, pick any level of payroll, say $100 million. Then look at the scatter around that number (the points below and above the solid line). With BA, the scatter is just about as wide as the range of team batting averages in the data, which are .240 to .292. The same is true for both ERAs and Errors. Still, there is a general weak trend: spending more money does, very crudely, buy you a better team.
But not much better. For example, if you wanted to spend enough to be 90% sure of upping your team’s batting average 5 points (from the median of .268 to .273), you’d have to shell out an extra $50 million (this is after controlling for League, Errors, and team ERA). That’s a huge increase in team salaries. Even worse, the players you buy would have to have extraordinarily high batting averages to bring the entire team’s average 5 points higher. It’s the same story for ERA and Errors. The point is that predicting what players will do, and then paying more money for those you consider better in the hope that their actual performance matches, is not just a tricky business but an almost impossible one.
This still doesn’t answer what is better, in the sense of predicting more wins: hitting or pitching. Take a look at this picture:
This shows fancy, souped-up, “histograms” (called density estimates) for the frequency of BA, ERA, and Errors by League. Higher areas on the graph, like a regular histogram, mean that number is more likely. For example, the most likely value of ERA for teams in the National League is just over 4.0.
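For readers curious how such a "souped-up histogram" is built, here is a minimal sketch of a Gaussian kernel density estimate. It is not the code behind the figure: the ERA values, the bandwidth of 0.25, and the function name `gaussian_kde` are all assumptions made up for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x.

    Each sample contributes a small Gaussian bump of width `bandwidth`;
    the estimate is the average of those bumps.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical team ERAs, invented for illustration (NOT the article's data).
eras = [3.6, 3.9, 4.0, 4.1, 4.1, 4.3, 4.6, 5.0]
density = gaussian_kde(eras, bandwidth=0.25)

# Evaluate the curve on a grid and pick off its peak: the "most likely" ERA.
grid = [3.0 + 0.05 * i for i in range(61)]
mode = max(grid, key=density)
print(f"most likely ERA in this toy sample: about {mode:.2f}")
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is the same trade-off as choosing narrow or wide bins in an ordinary histogram.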
It’s clear from these pictures that the American League teams have on average higher ERAs and BAs than do clubs in the National League. Obviously, the designated hitter rule for the American League accounts for most, if not all, of this difference. There don’t seem to be any real differences in Errors between the two Leagues, which makes sense. The League differences in ERA and BA have to be accounted for when answering our main question.
This next series of pictures shows there is even more complexity. The first is a plot, separated by League, of each team’s BA by ERA. There is some weak evidence that as ERA increases, BA drops, especially in the American (A) League, perhaps another remnant of the designated hitter effect. But this isn’t a very strong indicator.
This next picture shows some stronger relationships. The top two panels, again separated by League, are plots of ERA (on the vertical axis) by Errors (on the horizontal axis): as ERA increases, so does the number of Errors. Similarly for BA: as the number of Errors increases, the batting averages of teams tend to decrease. All this evidence means that when a team is bad, it tends to be bad in all three dimensions, and when it is good, it tends to be good in all three dimensions. This is no surprise, of course, but we do have to control for these factors when answering our question.
We finally come to our main question, which we answer with a complicated statistical model, one which accounts for all the evidence we have so far demonstrated. The type of model we use accounts for the fact that the number of Wins is a discrete number, by which we mean the total Wins can be 97 or 98, say, but they cannot be 97.4. In technical terms, it is called a quasi-Poisson generalized linear model, a fancy phrase that means that the model is very like a linear regression model, about which you may have heard, but with some twists and extra knobs that allow us to control for our interacting factors and discrete response.
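For the technically inclined, the general shape of a quasi-Poisson generalized linear model can be written down as follows. This is only a sketch: the article does not list the exact terms, so the log link (the usual default), the symbols, and the placeholder for interaction terms are assumptions.

```latex
\log \mathbb{E}[W_i]
  = \beta_0
  + \beta_1\,\mathrm{BA}_i
  + \beta_2\,\mathrm{ERA}_i
  + \beta_3\,\mathrm{Errors}_i
  + \beta_4\,\mathrm{League}_i
  + (\text{interaction terms}),
\qquad
\operatorname{Var}(W_i) = \phi\,\mathbb{E}[W_i]
```

Here $W_i$ is the number of Wins for team-season $i$. The "quasi" part is the dispersion parameter $\phi$, which lets the spread of Wins differ from what a plain Poisson model (where $\phi = 1$) would force.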
The answer lies in these complicated-looking pictures. Let’s work through them slowly. First, only look at the top picture, which is the modeled, or predicted number of wins by various batting averages.
There are two sets of three curves. The brownish is for the National League, and the blueish for the American. Now, in order to predict how many wins a team will have, we have to supply four things: their expected BA, ERA, number of errors, and League. That’s a lot of different numbers, so to simplify somewhat, we will fix the number of Errors at the median observed figure, which is 104. (Changing Errors barely changes the results.)
We still have to plug in a BA, ERA, and League in order to predict the number of wins. We first start by plugging in the BA over the range of observed values, but we still have to supply an ERA. In fact, we supply three different ERAs: the observed first quartile, median, and third quartile, which are 4.04, 4.37, and 4.74. For the American League, these are the three blue curves: the top one corresponds to the lowest ERA of 4.04, the middle to the value of 4.37, and the bottom to the highest value of 4.74. To be clear: each point on these curves is the result of four variables: a BA, an ERA, a number of Errors, and a League. From these four variables, we predict the number of wins, which varies as the four variables do.
All of these curves sweep upwards, implying the obvious: higher BAs lead to more predicted Wins, regardless of ERA or League. At the lowest BAs, differences in ERA are the largest in the American League. This means that, if your team is hitting very poorly, small variations in pitching account for large changes in the number of games won. To make sure you see this, focus on the very left-most points of the graph, where the BAs are the smallest. Then look at the three blue curves (American League): the three left-most points on the blue curves are widely separated. Moving from a team ERA of 4.74 to 4.04 increases the number of games won from 61 to 78, or 17 more a season, which is of course a lot. But when a team is batting well, while differences in ERA are still important, they are not as influential. These are the right-most blue points on the figure: notice how at the largest BAs, the three curves (again representing different ERAs) are very close together. If a team in the American League is batting very well, improvements in pitching do not account for very many more games won.
That is so for the American League, but perhaps surprisingly not for the National, where the opposite occurs. Differences in ERA are more important for high batting averages, but not as important for low ones: better pitching becomes more crucial as the team bats better. The brown curves spread out more for high BAs, and are tighter at low BAs.
Now let’s look at the bottom picture. This is the same sort of thing, but now over the range of ERAs at three fixed levels of BA: .259, .266, and .272. The top curves are the highest BA, and the bottom curves the lowest. Looking first at the American League, we can see that when the team ERA is low, differences in BA do not account for much. In fact, when the team ERAs are the lowest, improvements in batting in the American League make almost no difference at all! When team ERAs are high, changes in BA mean larger differences in numbers of games won: the spread between the blue lines increases as ERA increases.
Again, the situation is opposite for the National League: when the team ERA is low, changes in BA are more important than when team ERAs are high. In this league, when team ERAs are low, good batting can make a big difference in numbers of games won. But when ERAs are high, improvements in batting do not change the number of games won very much.
Once more, we point out that we can draw each of these three curves again for different numbers of Errors. We did so, and found that the differences between those curves and the ones we displayed were small, though not negligible: for example, adding a whopping 40 errors onto a team that ordinarily only commits 80 costs them, on average, only 2 games a season. Higher BAs or ERAs can mitigate this somewhat, from losing 2 games to only losing about 1 extra game a season. So while Errors are important, they are far from decisive factors in an overall season.
So what should you do?
Look again at the two plots. In the BA plot, the highest number of predicted wins, for a BA of .292 and the ERA of 4.04 (the lowest pictured), is about 104 games for National League teams, and about 100 for American League clubs. But the highest number of predicted wins, looking at the ERA plot, for teams with the lowest ERA of 3.13 and the BA of .272 (the highest pictured), is about 111 games for the National League and 107 games for the American. Conversely, back in the BA plot, those teams with the lowest BAs of .240 and high ERAs of 4.74 won only about 61 games in the American League and 67 in the National. Meanwhile, in the ERA plot, teams with the worst ERAs of 5.71 and lowest BAs of .259 won only about 56 games in the American and 62 in the National.
Clearly, then, pitching is more important than batting overall: more games on average will be won by those clubs who have the lower ERAs than those teams with the higher BAs.
But that isn’t necessarily the answer to our question. Remember that you only have money for one more player. Should you recruit or trade for a better pitcher or batter? It depends on what kind of team you have now. Our team right now has a certain ERA, BA, and expected number of Errors, so what do we do? The final answer is in this last picture.
This shows improvement, in either ERA (decreasing) or BA (increasing), on the bottom axis. The other axis shows, for each “unit” of improvement (0.05 for ERA, 0.001 for BA), the additional games won. These are the same, in essence, as the plots above, but they show the data in a different fashion (the same colors still represent the two leagues). The way this figure works is that you pick a certain point, say a BA of .266 or an ERA of 4.34 (which is the same point on the graph), and then move upwards (to the right on the horizontal axis) by one “unit” (0.05 for ERA, 0.001 for BA) and then pick off the number of additional games won.
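In symbols, the quantity read off the vertical axis can be sketched as a difference of two model predictions. The notation is an assumption introduced here, not the authors': $\widehat{W}$ stands for the model's predicted Wins, with Errors and League held fixed.

```latex
\Delta_{\mathrm{BA}}
  = \widehat{W}(\mathrm{BA} + 0.001,\ \mathrm{ERA}) - \widehat{W}(\mathrm{BA},\ \mathrm{ERA}),
\qquad
\Delta_{\mathrm{ERA}}
  = \widehat{W}(\mathrm{BA},\ \mathrm{ERA} - 0.05) - \widehat{W}(\mathrm{BA},\ \mathrm{ERA})
```

Comparing $\Delta_{\mathrm{ERA}}$ with $\Delta_{\mathrm{BA}}$ at the same starting point is what decides whether one "unit" of pitching or one "unit" of hitting buys more games.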
No matter where we are on the graph, ERA easily wins this race, in the sense that buying a better pitcher to improve the ERA wins more games than buying a better batter to improve the BA. This is true for either league. (These pictures are also concocted using the median values of ERA, BA, and Errors, as mentioned above: do not worry if you don’t understand this; the results do not change for the other values.)
So spend your money on the pitcher.
While your conclusion is probably correct, much of its strength may stem from the fact that ERA is a much better statistical representation of a pitcher’s ability than BA is of offensive ability–you need to use OPS, or better yet R(uns)C(reated)/27 outs or VORP (value over replacement player). And errors is a TERRIBLE measure of fielding proficiency–range factor, or, again, better, zone rating provide some idea of individual fielding, and defensive efficiency rating gives a very good team-wide measure of defensive ability.
There is quite a lot of good statistical work on baseball stats and team wins. This would be a pretty good addition to the field with the proper team-wide stats for offense and fielding.
Uhh, in reference to the last paragraph – exactly how many Andy Warhols does it take to equal one Cy Young?
“Good pitching beats good hitting every time…….and vice versa” attributed to Bob Veale.
The betting lines on baseball games depend on who are the starting pitchers.
Mr Physics,
Thanks for the tips! I’ll see if Tim can get to work on this part. Tim is a fantastic student at Central, and this work arose out of his hard work on a class project in an advanced statistics class.
Briggs
Not a stats guy, but this seems to be similar to the interaction of marginal utility curves in basic Economics.
And yes, the stat categories should be changed to reflect better predictive values.
Finally, total payroll fails to account for the bifurcated salary structure of MLB. Younger players don’t have the ability to become free agents, so they play for less. It would be better to segregate the impact of those players who are/were eligible for free agency to measure return on marginal dollar of free agency spending.
The key question is how do we get the most bang for the marginal dollar spent on a free agent. We should restrict our stats to those generated by players who have sufficient service to be possible free agents (even if they were locked up by the current clubs with long term deals) and restrict our salary numbers to the amounts which have to be spent to procure such players. Then we compare the marginal win shares of free agent-eligible pitching per dollar to the marginal win shares of free agent-eligible hitting per dollar.
[a separate question is whether the dollars would be better spent (long term) on scouting and player development in the minors]
I’d direct Tim to Baseball Prospectus for their PECOTA work. It basically does win predictions using the numbers Mr. Physics mentions above.
Amazingly, baseball’s best statistic may be VORP, and it is completely ignored by announcers, etc.
For those looking for humor around baseball statistics and how they aren’t that accepted yet, please see http://www.firejoemorgan.com.
In general, this is what just about every serious baseball student probably believes (it’s all about the pitching!). But to nitpick, certain pitchers (i.e., starters) will have a much larger impact on ERA than other pitchers (i.e., closers).
On the batting side, his impact would also depend upon where in the order you expect him to hit (lead off gets more at bats than 7th).
Also, it would be interesting on both sides to see what sort of an impact on the team statistics the different roles had. A starter would expect to get something like 1/5 of the starts (but how long can he go?) while a batter expects 1/9 of the at bats.
Well, I guess we’re getting deeper into Moneyball territory now…
Bill James wrote a bunch of “Baseball Abstracts” in the 1980s addressing many of these points.
Pete Palmer wrote “The Hidden Game of Baseball”, also handling baseball in a mathematical manner, though from a different perspective than James.
A better indication of batting strength, from both James and Palmer, is (On Base Average) times (slugging percentage).
That number would roughly give the number of runs per plate appearance. James compared Gene Tenace of the Oakland A’s, who batted about .240 but hit for power and walked a lot, making him one of the team’s stars, to Manny Mota of the Pittsburgh Pirates, who was, in James’ words, “an honest to Ty Cobb 400 hitter” who couldn’t make the starting lineup because of no power and a low walk percentage.
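To make that comparison concrete, here is a small sketch of the on-base-times-slugging figure, using the standard definitions of the two statistics. The two stat lines are hypothetical, invented to mimic a low-average walker with power and a high-average singles hitter; they are not Tenace's or Mota's real numbers.

```python
def on_base_average(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by the plate appearances that count toward it."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)

def slugging(hits, doubles, triples, home_runs, at_bats):
    """Total bases per at bat."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def oba_times_slg(hits, doubles, triples, home_runs, walks, at_bats,
                  hit_by_pitch=0, sac_flies=0):
    """The product the comment proposes as a rough measure of batting strength."""
    oba = on_base_average(hits, walks, hit_by_pitch, at_bats, sac_flies)
    slg = slugging(hits, doubles, triples, home_runs, at_bats)
    return oba * slg

# Hypothetical seasons (made-up numbers, 500 at bats each).
patient_slugger = oba_times_slg(hits=120, doubles=20, triples=2,
                                home_runs=25, walks=100, at_bats=500)  # bats .240
singles_hitter = oba_times_slg(hits=150, doubles=15, triples=2,
                               home_runs=2, walks=25, at_bats=500)     # bats .300

print(f"patient slugger: {patient_slugger:.3f}")
print(f"singles hitter:  {singles_hitter:.3f}")
```

With these invented lines the .240 hitter comes out well ahead of the .300 hitter, which is the point of the comparison: batting average alone hides walks and power.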
Your figure of 61 wins for the 4.74 ERA team vs. 78 wins for the 4.04 ERA team reminds me of another of James’ formulas, which was that win% is roughly equal to
(Runs^1.85)/( Runs^1.85 + (opponent Runs)^1.85)
That doesn’t match your figures above, but the 4.74 ERA team probably had a higher (unearned runs/earned runs) ratio to help make up some of the difference.
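The formula quoted in this comment is easy to try out. The sketch below uses the exponent 1.85 as given in the comment and a 162-game regular season; the function names are made up for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.85):
    """Estimated winning percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)

def expected_wins(runs_scored, runs_allowed, games=162, exponent=1.85):
    """Scale the estimated winning percentage up to a full season."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)

# A team that scores exactly as many runs as it allows is expected to win half.
print(expected_wins(700, 700))

# Hypothetical run totals, for illustration only.
print(round(expected_wins(800, 700), 1))
```

Because only the ratio of runs scored to runs allowed matters, the same function works with season totals or with per-game averages.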
An additional factor may be baseball park effects. I remember when I was a teenager and thought the Houston Astros had great pitching. It took me a few years to realize the pitchers’ and hitters’ stats were biased by “park” effects.
You guys have too much time on your hands. 🙂
Fred,
I’m just trying to distract myself from the Tigers’ dismal season.
Wow, this is great work. From the article it looks like he based his findings on team success and demise rather than on individual players, which I feel most of these posts don’t seem to understand, specifically the ones talking about starters and closers.
Also, ballpark effects shouldn’t hold much weight, due to the fact that the team plays half of its games in different stadiums and the teams in the same division also play close to a sixth of their games in each other’s parks.
This is simply amazing work, as an avid baseball fan and as a Math “fan”. Tim keep up the good job and do work!
Hey, Mr. Briggs, the Tigers are presently on a roll, despite a relative lack of hitting. Do not despair!
Henry,
I was particularly happy to see them demoralize the Yankees this week.
(I admit to skimming, so if I missed something mea culpa).
1. I think your analysis is missing a discussion of $ versus quality as they differ for pitchers and hitters. IOW, sure, getting a pitcher is better, but perhaps the extra money doesn’t buy you that much ERA.
2. Even if it does, perhaps the reason for the discrepancy is a difference in happiness function. Perhaps hitters drive more ball park revenue and thus some trade-off of payment for wins versus payment for fannies in seats is made by the clubs.
I had some additional thoughts on this. Offence and defence are obviously each 50% of the game. Hitting must be 50%; pitching must be less than 50% since fielding also plays a factor, probably about 5%.
In the National League, each non-pitcher is a 1/9 component of offence, or a 1/18 component of the game. In addition, each batter contributes an average of 1/9 × (5%) = 0.556% by fielding. Of course the actual percentage varies by position. The pitcher makes up 45.56% for pitching and fielding plus (1/9)(1/2) for his hitting ability, or 51.11% of the game, versus 5.56% + 0.56% = 6.11% of the game for each hitter. You might argue that all pitchers are poor hitters, but a difference of 0.1 runs/game hitting ability between two opposing pitchers can add up. If a National League pitcher was able to pitch every game, he’d be worth about 8.36 times as much as the average hitter. Pitching an average of 1/5 of all games, the pitcher would be worth about 1.67 times as much as the average batter.
In the American League, the pitcher should be worth only 45.56/51.11 = 89% as much, since he contributes nothing to offense.
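The arithmetic in this comment can be checked in a few lines. The sketch below simply encodes the commenter's own assumptions (offence is half the game, fielding is a guessed 5%, nine men in the lineup); the variable names are invented for illustration. Small rounding differences aside, the totals come out at about 51.1% for the pitcher and 6.1% for a hitter.

```python
# The commenter's assumptions, expressed as shares of the whole game.
OFFENSE_SHARE = 0.50
FIELDING_SHARE = 0.05                    # the commenter's guess
PITCHING_SHARE = 0.50 - FIELDING_SHARE   # what is left of defence
LINEUP_SIZE = 9

# Each player's slice of offence and of fielding.
offense_per_player = OFFENSE_SHARE / LINEUP_SIZE     # 1/18 of the game
fielding_per_player = FIELDING_SHARE / LINEUP_SIZE

hitter_share = offense_per_player + fielding_per_player

# National League pitcher: pitches, fields, and bats.
nl_pitcher_share = PITCHING_SHARE + fielding_per_player + offense_per_player
# American League pitcher: pitches and fields, but does not bat.
al_pitcher_share = PITCHING_SHARE + fielding_per_player

ratio_if_pitching_every_game = nl_pitcher_share / hitter_share
ratio_pitching_one_in_five = ratio_if_pitching_every_game / 5
al_relative_to_nl = al_pitcher_share / nl_pitcher_share

print(f"hitter share:       {hitter_share:.2%}")
print(f"NL pitcher share:   {nl_pitcher_share:.2%}")
print(f"every-game ratio:   {ratio_if_pitching_every_game:.2f}")
print(f"one-in-five ratio:  {ratio_pitching_one_in_five:.2f}")
print(f"AL relative to NL:  {al_relative_to_nl:.0%}")
```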
You might also consider the rich literature by Hunter and Schmidt on the “value of talent” across different occupations. Essentially most of their work involves characterizing the standard deviation (and some examination of nonnormality) of various bell curves for worker output. They publish(ed) a lot in J Appl Psych.
Good job, Tim. Sabermetrics! If agriculture was the birthplace of statistics, baseball is the vacation condo.
Even the obscure Bill James has a baseball job now, with the Bosoxs. It’s a growing opportunity sector for stat grads. More fun than being an actuary.
The best stat job I ever heard of, though, is the stat whiz who rates golf courses for the USGA. Every GC must be evaluated for cross-handicap purposes. Just because two courses are par 72, it doesn’t mean they are equally difficult. Somebody has to fly around rating them, and he/she gets paid to do it. Might be a direction in which to guide the aggressive yet leisure-prone student.
Another, perhaps more realistic stat-leisure field is in market analysis for destination resorts. Every resort draws visitors from certain zip codes, at certain times of the year. Most resorts do not do intensive analysis of their particular markets, nor do they direct their advertising efficiently. Often their marketing is highly subjective rather than analytical. They need stat help. They also need their analysts to visit the resorts and be comped for rooms, meals, and leisure activities, to get familiar with the amenities offered. Aloha Hoi!
Mr. Briggs,
Thanks for sharing! Great info, and I am going to send it to some friends. Hope you are well, and thanks again for the interview.
Mike D: Maybe that is right (I’m not sure). I would say that I talked with the head of Breckenridge, which is a resort with both heavy destination and day skiing. He was aware of the need to understand the two populations and their different needs and even (he wouldn’t quite admit this, but it’s obvious) the differing price sensitivity, ability to exploit, etc. He said, “we’ve paid for all kinds of statistical market research”.
Good start to the discussion, a lot of field still left to plow. I think throwing out some data that just doesn’t matter would also help bring to light critical factors. The mention that slight improvements in team BA matter more for those with already good/great ERAs is interesting. Many teams each year are going through rebuilding and are fielding AAA ballplayers and charging MLB ticket prices. Culling these out of the database would seem to add value to getting crucial results. After all, we are looking for a competitive edge in getting to the playoffs/WS and are not so concerned with data from teams that are just building for the future and do not have much hope of a winning season, i.e., 2019 Marlins, Tigers, Orioles, Mariners, etc. I would look closer at this idea, but maybe cull from the data analysis those teams 20 GB or so. Not all data is good data for answering the particular question of not only “what wins games” but “what is it that takes a team above the pack?” I have a feeling this would bring a sharper focus to showing what really matters to teams truly trying to make it to the post season!