# March Madness Tournament Analysis: Which Seed Has the Best Chance of Winning?

Most people know what the structure of the March Madness basketball tournament looks like. I am not most people. I had to look it up, since what I don't know about college basketball would fill volumes. Nevertheless, I was able to piece together this picture: 64 teams are seeded into four groups; within each group the teams are paired off to play knock-out games, each winner advancing to play another winner, and this continues until one team is left standing.

This structure is similar to the *First Things* Tournament of Novels, in which individual novels were paired off, the winners advancing to the next round, and so forth. The same structure is also used in major-league playoffs.

There is undoubtedly much extant analysis on the statistical properties of these kinds of tournaments, but my knowledge of these results mirrors my acquaintance with basketball. However, we can say some interesting things with just a little effort. The simplest question is what effect seeding has on the chances of winning the tournament.

Suppose teams (or novels, or whatever) enter the tournament with fixed measures of ability, a higher value indicating the superior entity (team, novel, etc.). Then it is easy to see that the best team always wins its tournament. No matter where team number 1 is seeded, it beats every opponent it meets, because its ability is higher than any other team's. Thus, if all entities had fixed strengths, tournaments would be boring, the outcome known before they begin.
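To make the point concrete, here is a minimal Python sketch, assuming the bracket is a list whose adjacent entries meet in each round (the names `play_bracket` and `fixed_ability` are hypothetical, not from the post). Abilities are the integers 1 to 64, so the strongest entrant here is 64 rather than "team number 1"; however the entrants are shuffled, it is always the champion.

```python
import random

def play_bracket(bracket, winner_of):
    """Play a single-elimination bracket whose length is a power of 2.
    Adjacent slots meet in each round; winner_of(a, b) returns the winner."""
    current = list(bracket)
    while len(current) > 1:
        current = [winner_of(current[i], current[i + 1])
                   for i in range(0, len(current), 2)]
    return current[0]

def fixed_ability(a, b):
    # Fixed abilities: the entrant with the higher ability always wins.
    return max(a, b)

rng = random.Random(0)
abilities = list(range(1, 65))  # hypothetical abilities; 64 is the strongest entrant
champions = set()
for _ in range(1000):
    rng.shuffle(abilities)  # a fresh random seeding each time
    champions.add(play_bracket(abilities, fixed_ability))
print(champions)  # {64}
```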

It is more realistic to suppose that teams can be ranked on strength, say their historical win percentage, and that weaker teams (based on this ranking) have a non-zero chance of beating stronger teams. The larger the gap in strength, the more likely it is that the better team beats the weaker one; the closer in strength two teams are, the more any contest between them becomes a toss-up, i.e. the closer the chance of winning comes to 50%.

Teams must be seeded into the "brackets." This can be done in many ways, but I chose two: random seeding, whereby each team, regardless of strength, is placed uniformly at random into the brackets; and seeding by strength, whereby each team is initially matched with its next strongest competitor. That is, the team with the highest strength is initially matched against the team with the second highest strength, the third strongest against the fourth, and so on.
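The two seeding rules can be sketched in a few lines of Python. This assumes, as above, that adjacent bracket slots meet in each round; the post only fixes the first-round pairings under strength seeding, so how later rounds line up is an assumption here. The function names are hypothetical.

```python
import random

def random_seeding(strengths, rng=random):
    """Random seeding: every team lands in a uniformly random bracket slot."""
    bracket = list(strengths)
    rng.shuffle(bracket)
    return bracket

def strength_seeding(strengths):
    """Strength seeding: order teams strongest-first so that, when adjacent
    slots meet, 1st plays 2nd, 3rd plays 4th, and so on."""
    return sorted(strengths, reverse=True)

teams = [0.5, 0.8, 0.2, 0.6]
print(strength_seeding(teams))                   # [0.8, 0.6, 0.5, 0.2]
print(random_seeding(teams, random.Random(1)))   # some permutation of teams
```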

Some measure of strength must be chosen; actually, all we need is a distance. If the distance, measured in terms of strength, between the best and worst teams is large, then we would expect the best team to have a better chance of winning the tournament than if this distance is small. Strength is a number between 0 and 1 (think of it like historical win percentage), and the best team always had a strength of 0.8. I used three distances, 0.6, 0.4, and 0.2, which put the strength of the worst team at 0.2, 0.4, and 0.6 respectively.
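The post does not say how the teams between the best and the worst are spread out. A minimal sketch, assuming evenly spaced strengths (the helper `team_strengths` is hypothetical):

```python
def team_strengths(n_teams, distance, best=0.8):
    """Hypothetical helper: n_teams strengths evenly spaced from `best`
    (the top team) down to `best - distance` (the worst team).
    Requires n_teams >= 2."""
    step = distance / (n_teams - 1)
    return [best - k * step for k in range(n_teams)]

for d in (0.6, 0.4, 0.2):
    s = team_strengths(64, d)
    print(d, round(s[0], 2), round(s[-1], 2))
# prints: 0.6 0.8 0.2 / 0.4 0.8 0.4 / 0.2 0.8 0.6 (one per line)
```

Using actual historical win rates instead, as suggested below, would simply replace this list.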

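Suppose a team with a strength of 0.8 faces a team with a strength of 0.4; then the chance the better team wins is

$$\frac{0.8 - 0.4}{2} + 0.5 = 0.7.$$

In general, when a team of strength $s_a$ faces a team of strength $s_b$, the chance the first team wins is $(s_a - s_b)/2 + 0.5$, and the same rule applies to every match-up.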
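The match-up rule is a one-line function. A minimal Python sketch (the name `win_prob` is my own); with strengths between 0 and 1 the result always stays between 0 and 1, and the two sides' chances add to 1.

```python
def win_prob(s_a, s_b):
    """Chance that a team of strength s_a beats a team of strength s_b."""
    return (s_a - s_b) / 2 + 0.5

print(win_prob(0.8, 0.4))  # about 0.7 (up to floating-point rounding)
print(win_prob(0.4, 0.8))  # about 0.3
print(win_prob(0.8, 0.2))  # about 0.8: the best versus the worst team at distance 0.6
```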

Finally! If there were only 2 teams in the tournament and the distance was 0.6, the chance the best team wins is 80%. If there were 64 teams, and all were randomly seeded, then the chance that the top team wins the entire tournament is about 5% (see the solid black line with open circles).
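Here is a sketch of a random-seeding simulation in Python. It is not the author's R code; it assumes evenly spaced strengths from 0.8 down to 0.8 minus the distance, that adjacent bracket slots meet, and the win rule above, so its numbers need not match the plotted values exactly. The function name is hypothetical.

```python
import random

def simulate_random_seeding(n_teams=64, distance=0.6, n_sims=5000, seed=1):
    """Monte Carlo estimate of each team's title chance under random seeding.

    Assumptions: strengths evenly spaced from 0.8 (team 0) down to 0.8 - distance,
    adjacent bracket slots meet, and P(a beats b) = (s_a - s_b)/2 + 0.5.
    """
    rng = random.Random(seed)
    step = distance / (n_teams - 1)
    strengths = [0.8 - k * step for k in range(n_teams)]  # team 0 is the top team
    wins = [0] * n_teams
    for _ in range(n_sims):
        bracket = list(range(n_teams))
        rng.shuffle(bracket)  # random seeding: a fresh draw every tournament
        while len(bracket) > 1:
            next_round = []
            for a, b in zip(bracket[::2], bracket[1::2]):
                p_a = (strengths[a] - strengths[b]) / 2 + 0.5
                next_round.append(a if rng.random() < p_a else b)
            bracket = next_round
        wins[bracket[0]] += 1
    return [w / n_sims for w in wins]

probs = simulate_random_seeding()
print(f"top team: {probs[0]:.3f}  worst team: {probs[-1]:.3f}")
```

Calling it with `n_teams=2` recovers the two-team case, where the top team's estimated chance should sit near 0.8.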

As the distance between the best and worst teams decreases, so does the chance that the top team takes it all, as we might expect. The top-rated team always has the best chance of winning among all other teams.

The dashed blue line is the "uniform strength line": the chance any team wins if all teams are equally matched at the beginning. Thus, the distance between (say) the black line and the dashed blue line indicates the lift that superior strength imbues. (The lines are not perfectly smooth because each point is based on a simulation of 5000 tournaments.)
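With equal strengths the win rule gives $(s - s)/2 + 0.5 = 0.5$, so every game is a coin flip and the uniform line can be worked out exactly rather than simulated. For $N = 2^k$ teams, a team must win $k$ games in a row:

$$\Pr(\text{a given team wins}) = \left(\tfrac{1}{2}\right)^{k} = \frac{1}{2^{k}} = \frac{1}{N},$$

which is $1/64 \approx 1.6\%$ for 64 teams.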

The other dashed lines indicate the chance that the *worst* team in the tournament wins. It is also non-zero, but always well below that of the best team.

Next comes strength (or positional) seeding. The best team still has the highest chance of winning, but that chance is significantly diminished, and even the worst team now has hope. The (simple) lesson is that if the best teams face each other early, rich blood will be shed, leaving the more anaemic teams in a position to do serious damage.
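To compare the two seeding rules side by side, the simulation can take the seeding rule as a parameter. This is a Python sketch under the same assumptions as before (evenly spaced strengths, adjacent slots meet, the linear win rule); it is not the author's R code, and all names are hypothetical.

```python
import random

def champion_probs(seeding, n_teams=64, distance=0.6, n_sims=5000, seed=1):
    """Monte Carlo estimate of each team's title chance under a seeding rule."""
    rng = random.Random(seed)
    step = distance / (n_teams - 1)
    strengths = [0.8 - k * step for k in range(n_teams)]  # team 0 strongest
    wins = [0] * n_teams
    for _ in range(n_sims):
        bracket = seeding(n_teams, rng)
        while len(bracket) > 1:
            # Adjacent slots meet; a wins with chance (s_a - s_b)/2 + 0.5.
            bracket = [
                a if rng.random() < (strengths[a] - strengths[b]) / 2 + 0.5 else b
                for a, b in zip(bracket[::2], bracket[1::2])
            ]
        wins[bracket[0]] += 1
    return [w / n_sims for w in wins]

def random_seeding(n, rng):
    order = list(range(n))
    rng.shuffle(order)
    return order

def strength_seeding(n, rng):
    # Teams are indexed strongest-first, so 0 meets 1, 2 meets 3, and so on.
    return list(range(n))

for name, rule in [("random", random_seeding), ("strength", strength_seeding)]:
    p = champion_probs(rule)
    print(f"{name:>8}: top {p[0]:.3f}, worst {p[-1]:.3f}")
```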

The next thing to try is best-worst seeding, where the best team is paired with the worst, the second best with the second worst, and so on. Presumably, the best team will have the best chance of winning tournaments like this. Just how much better its chance is than with random seeding, I leave for you as an exercise. (R code for the two simulations above can be downloaded here. Try incorporating actual historical win rates.)
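For the exercise, one way to build a best-worst bracket is sketched below in Python; it only produces the seeding order and leaves the simulation to you. The post fixes just the first-round pairings, so the later-round layout here, which keeps the top teams apart until the late rounds in the familiar pattern where 1 plays 16 and 8 plays 9, is an assumption. The function name is hypothetical.

```python
def best_worst_seeding(n_teams):
    """Bracket order (team 0 strongest) in which every first-round game pairs
    the k-th best team with the k-th worst. n_teams must be a power of 2."""
    order = [0]
    size = 1
    while size < n_teams:
        size *= 2
        # Each existing slot s becomes the game (s, size - 1 - s).
        order = [t for s in order for t in (s, size - 1 - s)]
    return order

print(best_worst_seeding(8))  # [0, 7, 3, 4, 1, 6, 2, 5]
```

Any bracket simulation in which adjacent slots meet can use this list directly as its seeding.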

*This post was inspired by long-time reader J Ferguson.*

For what it’s worth, my wife filled out a bracket without realizing that the teams were seeded at all – she had no knowledge of what teams were expected to win. And of course, she is winning our pool of about 20 people. By a lot.

I think that the NCAA has gotten worse at seeding teams over the last few years, mainly because the best teams have gotten worse and the mediocre teams have gotten better. Seeds 5 through 12 all have basically the same ability.

You should definitely take a look at Raymond’s highly scientific predictions for the 2011 NCAA men’s basketball tournament. He does this every year, with a different way of picking. His knowledge seems similar to that of Briggs.

This year, Raymond basically used BCS (football) rankings. In the past, he’s used things like university president tenure / salary.

A buddy of mine posts some stats that he cooks up each year (see the posts just earlier than that one for the actual stats).

Fun homework…

“The top-rated team always has the best chance of winning among all other teams.”

This isn’t true. In my random seeding tests, the strongest team had the highest probability of winning the tournament only ~60% of the time in the 64 team case. Also, I rewrote the code to compute the probabilities of every team winning the tournament rather than simulating the tournament. I’ll send the code over if you’re interested.
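Computing the probabilities exactly, as described above, can be done by working up the bracket: for each sub-bracket, record the chance that each team emerges from it, then combine neighbouring sub-brackets using the match-up rule. The Python sketch below is a general version of that technique, not mt's code; it assumes adjacent slots meet and the post's linear win rule, and the 64-team check uses hypothetical evenly spaced strengths at distance 0.6.

```python
import random

def exact_title_probs(bracket, strengths):
    """Exact chance each team wins a single-elimination bracket.

    bracket:   team indices in slot order (length a power of 2); adjacent slots meet.
    strengths: strengths[t] is team t's strength; P(a beats b) = (s_a - s_b)/2 + 0.5.
    Returns {team: probability of winning the whole tournament}.
    """
    def beats(a, b):
        return (strengths[a] - strengths[b]) / 2 + 0.5

    # Each sub-bracket is summarised by {team: P(team emerges from it)}.
    level = [{t: 1.0} for t in bracket]
    while len(level) > 1:
        merged = []
        for left, right in zip(level[::2], level[1::2]):
            out = {}
            for a, pa in left.items():
                out[a] = pa * sum(pb * beats(a, b) for b, pb in right.items())
            for b, pb in right.items():
                out[b] = pb * sum(pa * beats(b, a) for a, pa in left.items())
            merged.append(out)
        level = merged
    return level[0]

# Four teams, 0 strongest, strength-seeded (0 v 1, 2 v 3).
print(exact_title_probs([0, 1, 2, 3], [0.8, 0.6, 0.4, 0.2]))
# roughly {0: 0.444, 1: 0.256, 2: 0.204, 3: 0.096}, up to floating-point rounding

# How often is the strongest team the favourite under a random 64-team seeding?
rng = random.Random(0)
strengths = [0.8 - k * 0.6 / 63 for k in range(64)]
trials = 200
favourite = 0
for _ in range(trials):
    order = list(range(64))
    rng.shuffle(order)
    probs = exact_title_probs(order, strengths)
    favourite += max(probs, key=probs.get) == 0
print(favourite / trials)
```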

mt,

I am so proud of you! 10 extra credit points for handing your work in ahead of time. To everybody else, let mt’s diligence be a lesson to you.