# Every Family Has Children Until They Have A Boy: Probability Stumper

The Grand Old Stumper

Reader Bryan Davies quoted a poser from Math Overflow (have we heard of that?) which read:

In a country in which people only want boys every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?

The question is ill-posed as stated because it relies on several unstated assumptions. One set, and the one I’ll use, is that the country begins with no kiddies and with n couples, all the same age, who never die during their reproductive years and who can reproduce at will and do, always on New Year’s Day, a day of celebration. Further, no babies are killed or die before they are born and none are killed or die once they escape into the wild. Babies are born once per year for every couple until success (a boy!). And there is neither immigration nor emigration.

One last assumption is no genetic engineering or other meddling: forget the kind of things that happen in China and India. Why complicate things? Now, something causes each child to be a boy or girl, but we do not know what this something is in each case. Probability is a measure of information, not of biology. Therefore, given there are only two concrete choices, we deduce from our assumptions the probability (which I repeat measures our uncertainty, not the biology) is 1/2 for boys, same for girls.

So, Year 0, there are no boys, no girls and no ratio neither.

Year 1, the uncertainty in the number of boys will (given our assumptions) follow a binomial, characterized with p = 1/2 and n chances. Pr(0 boys | assumptions) = (1/2)^n, Pr(1 boy | assumptions) = n * (1/2)^n, Pr(2 boys | assumptions) = (n choose 2) * (1/2)^n and so forth. The “(1/2)^n” is always there because of a nifty quirk of the binomial with p = 1/2.

The proportion of boys to girls follows right from this. If there are 0 boys, the proportion is 0/n, because there must be, given our assumptions, n girls. The probability of seeing this proportion is the same as seeing 0 boys. And so on for all the other proportions, 1/(n-1), 2/(n-2), etc., except if all boys are born then the proportion is infinite (well, n/0 anyway). Lastly, the most likely occurrence is (if n is even, or within rounding if not) n/2 boys, giving a proportion of 1/1. This follows because p = 1/2.
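
The Year 1 distribution can be tabulated directly. The following is a minimal sketch under the stated assumptions (p = 1/2, one child per couple); the function name `year_one_distribution` and the choice n = 8 are illustrative, not part of the original puzzle.

```python
from fractions import Fraction
from math import comb

def year_one_distribution(n):
    """Return {boys: probability} for n couples, each child a boy with p = 1/2."""
    return {b: Fraction(comb(n, b), 2 ** n) for b in range(n + 1)}

n = 8  # hypothetical number of couples, chosen only for illustration
dist = year_one_distribution(n)

# Each row: boys/girls ratio after Year 1 and the probability of seeing it.
for boys, prob in dist.items():
    girls = n - boys
    print(f"{boys}/{girls}  {prob}  ({float(prob):.4f})")

# The single most likely count of boys is n/2 when n is even.
most_likely = max(dist, key=dist.get)
print("most likely number of boys:", most_likely)
```

Exact fractions are used so the probabilities add to exactly 1; the row for n boys shows the "infinite" n/0 ratio with its probability (1/2)^n.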

Visually (with n = 8), we might have seen this:

BBBGGGGG

With (for boys) b1 = 3 and (for girls) n – b1 = 5, and thus a ratio of 3/5.

Year 2. Those couples who have had a boy exit the competition, the remainder have another go. The uncertainty in the number of boys in this new crop will again be characterized with a binomial with the same p but with n – b1 chances. Again, the most likely outcome, to our knowledge, is (n – b1)/2 new boys. That’s again because p = 1/2.

This year might have given, say, a b2 = 3, thus

BBBGG

The ratio counts Year one’s b1 boys and n – b1 girls plus this year’s crop, for a total of b1 + b2 = 6 boys and (n – b1) + (n – b1 – b2) = 7 girls. The ratio is 6/7.

Year 3 is a repeat, our uncertainty another binomial but with (n – b1 – b2) = 2 chances. The most likely number of boys is 1. Suppose we see two boys:

BB

The total boys is 8, the total girls 7, for a ratio of 8/7.

If you are mathematically inclined, you will notice this ratio is not 1/1, which (I’m guessing) is the answer the examiner wants. It’s close, though.

The reason it is close is that each year the most likely occurrence, to within rounding, is half boys, half girls from the couples who are still going at it. Adding all those halfs up, as it were, gives half-and-half boys and girls as the most likely final outcome. But this isn’t necessarily the outcome.

We could figure the probability of seeing 8 total boys and 7 total girls easily but tediously enough. It involves calculating the probability of seeing 3 boys and 5 girls in Year one and 3 boys and 2 girls in Year two and 2 boys and 0 girls in Year 3. But then we’d have to figure the other ways (if any exist) to get 8b/7g in three years. Conceptually simple, because each combination follows a binomial with known parameters, but, as claimed, tedious to run through.
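
The tedious part is easy to hand to a machine. Here is a rough sketch, assuming the same rules as above (n = 8 couples, p = 1/2, couples drop out once they have a boy); `path_probability` and `total_probability` are hypothetical helper names. The first multiplies the yearly binomials for one specific sequence of boy counts; the second adds up every sequence that ends at a given boy/girl total.

```python
from fractions import Fraction
from itertools import product
from math import comb

def path_probability(n, boys_per_year):
    """Probability of seeing exactly this sequence of yearly boy counts."""
    prob = Fraction(1)
    couples = n
    for b in boys_per_year:
        prob *= Fraction(comb(couples, b), 2 ** couples)
        couples -= b  # couples who got a boy stop trying
    return prob

def total_probability(n, years, boys, girls):
    """Probability of the given totals after `years`, summed over all paths."""
    total = Fraction(0)
    for path in product(range(n + 1), repeat=years):
        couples, girl_count, feasible = n, 0, True
        for b in path:
            if b > couples:
                feasible = False
                break
            couples -= b
            girl_count += couples  # the couples still trying each had a girl
        if feasible and sum(path) == boys and girl_count == girls:
            total += path_probability(n, path)
    return total

one_path = path_probability(8, [3, 3, 2])
all_paths = total_probability(8, 3, 8, 7)
print(one_path, all_paths)
```

Under these assumptions the single path 3, 3, 2 works out to 35/2048, and there are other sequences (for example 2, 5, 1) that also end at 8 boys and 7 girls by the close of Year 3.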

Large n

Now it could have been, for whatever sized n, that Year 1 saw all boys. We know the probability of this is (1/2)^n, which is always greater than 0 for any finite n (which it always will be). Meaning, our knowledge does not preclude an infinite ratio: it could happen, especially with small n.

For large n, we follow the same pattern as above. But eventually two things happen: the kiddies become adults and pair off and begin producing their own children, and eventually the thus created grandpas and grandmas cease their efforts. How many child bearing years does a woman have, after all, assuming she’s pushing out a kid a year?

Ceasing to produce is easy to account for, but figuring the number of new couples is hard, because that depends on—you guessed it—the proportion of extant boys-now-men to girls-now-women. If the proportion of boys and girls is not 1/1, and while this is the most likely it is not certain, then some boys or girls will go marriageless. You then have to assume if they’re going to remain that way (easy), or if the strays can marry the strays which probably will come along the next year (hard, because how long will they remain fecund?).

Now all this is discrete and tedious, but if one has the energy and time it could all be ploughed through. We also have the sense, because of the symmetries and clear assumptions, that the “system” will reach a limit where the proportion of boys to girls is roughly equal.

It may be equal in any year, but it’s more likely, we guess, to be only “near” equal, where we can work out what it means to be near.

Categories: Statistics

### 39 replies »

1. Scotian says:

This is an old chestnut that I remember from my youth. As you note the usual solution assumes a continuum and all sorts of infinities. It would be interesting to run a computer simulation and look for attractors. The sex ratio could be varied, kin avoidance could be added, all sorts of things. I wonder if this has been done. I know that population simulations are very popular in biology.

2. Briggs says:

Scotian,

In this, like in most questions, there isn’t any problem per se introducing continuity, hence infinities. But it’s how one approaches the limit that matters. Is crucial, even. It is never a good idea to jump right to infinity: think of needle dropping problems and the like.

3. Under the (perhaps unlikely) assumption that both sexes are always equally likely (with no tendency of any parental pair to have more girls than boys or vice versa) we have the following:
The expected proportion of first-born children who will be boys is 1/2.
The expected proportion of second-born children who will be boys is 1/2.
etc.

4. More tediously, but still kind of interesting due to its display of a cute summation identity:

The expected fraction of all families that will have just one child is 1/2. Of the rest 1/2 will get a boy on the second try, and so on.

This gives the total expected number of boys equal to the total number of families times 1/2+1/4+1/8+etc=1 (which is too well known to be interesting and in any case it was obvious that, in the absence of twins, every family would have exactly one boy).

But the expected number of girls counted by family size is a bit more interesting. We get none of the one child cases, one in each two child family, two in each threesome and so on. Thus the total number of girls is equal to the total number of families times the sum from n=1 to infinity of n/(2^(n+1)). (Which is also equal to one but may be less well known.)
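
For readers who want the less well known sum spelled out, one standard route (a sketch, not necessarily the one the commenter had in mind) uses the identity \sum n x^n = x/(1-x)^2 for |x| < 1, evaluated at x = 1/2:

```latex
S = \sum_{n=1}^{\infty} \frac{n}{2^{n+1}}
  = \frac{1}{2} \sum_{n=1}^{\infty} n \left(\frac{1}{2}\right)^{n}
  = \frac{1}{2} \cdot \frac{1/2}{\left(1 - 1/2\right)^{2}}
  = \frac{1}{2} \cdot 2
  = 1
```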

PS There is now an apocryphal story about John von Neumann in which he instantly gave the answer 1/2 and when the student who posed the question said “Ah, so you saw how to do it by considering birth order rather than family size” vN replied “No I just used family size and summed the series”

5. Briggs says:

Ed,

Thanks. Interesting thing about “expected values” is that they’re usually not expected. Take the value on a die: “expected value” of 3.5, which is impossible to expect. EVs have some nice mathematical properties, but they’re best not to start with in unfamiliar situations. Can lead you astray. So can infinities, as I mentioned to Scotian.

Suppose, like my Uncle Pat and Aunt Patty (n = 1) you kept trying for a boy, as the rules stated, but quit after 4 kids. They had 4 girls and said enough’s enough. All couples, like I say in the text, must quit sometime. Landsburg assumes they can continue indefinitely, an assumption which would please my Uncle, but is not realistic.

Now (use Landsburg’s chart), the probability of B is 1/2, and GB is 1/4, GGB is 1/8, and GGGB is 1/16. But then the probability of GGGG is also 1/16 (we must write everything that can happen and the probabilities of everything that can happen must add to 1). That’s a B/G ratio of 0/4 = 0, which has (of course) a probability of 1/16.

Again, the right answer depends on when you’re looking. The expected value doesn’t help much unless you’re peering into infinity.

For my Aunt and Uncle, the ratio of B/G was always 0 (0/1, 0/2, 0/3, then 0/4). But then that’s because of their stopping rule (nature builds one in, even if you don’t).

For stopping rule of 0 (no kids), the B/G ratios (probabilities) are 0/0 (1).

For stopping rule of 1, the B/G ratios (probabilities) are 1/0 (1/2), 0/1 (1/2).

For stopping rule of 2, the B/G ratios (probabilities) are 1/0 (1/2), 1/1 (1/4), 0/2 (1/4).

For stopping rule of 3, the B/G ratios (probabilities) are 1/0 (1/2), 1/1 (1/4), 1/2 (1/8), 0/3 (1/8).

Etc., up until, what, 15? 20? Not much more than that, anyway. Notice that with n = 1 the most likely B/G ratio is 1/0. There can never be more than 1 boy, right? The only time the ratio is 1/1 is with stopping rules of at least 2, and each has probability 1/4.
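
The pattern for a single couple can be generated for any stopping rule. A minimal sketch, assuming p = 1/2 and that the couple stops at the first boy or after `cap` children, whichever comes first (`single_family_outcomes` and `cap` are illustrative names):

```python
from fractions import Fraction

def single_family_outcomes(cap):
    """List (boys, girls, probability) for one couple with stopping rule `cap`."""
    outcomes = []
    for girls in range(cap):
        # `girls` girls in a row, then a boy: probability (1/2)^(girls + 1)
        outcomes.append((1, girls, Fraction(1, 2 ** (girls + 1))))
    # No boy in `cap` tries: probability (1/2)^cap
    outcomes.append((0, cap, Fraction(1, 2 ** cap)))
    return outcomes

for cap in range(5):
    row = ", ".join(f"{b}/{g} ({p})" for b, g, p in single_family_outcomes(cap))
    print(f"stopping rule {cap}: {row}")
```

For every stopping rule of 2 or more, the only outcome with equal numbers is one girl then one boy, which is why the 1/1 ratio keeps probability 1/4.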

6. Scotian says:

Alan, the von Neumann story is usually told in reference to a fly and bicycles or trains.

7. Briggs says:

Alan, Scotian,

Hadn’t seen that before. Love it!

8. Yes Scotian, I was just having a bit of fun with it (which is why I said “there is *now* an apocryphal story…”) – but thanks for posting those links to the original version.

9. I don’t think it matters what the limit on family size is. The expected proportion of boys will always be 50% (based again on the firstborn, secondborn, etc breakdown for the simplest explanation).

What does change is the expected total number of children which is twice the number of families if we can keep trying forever but a bit less than that if our maximum number of tries is limited.

[eg with a cap of 3 tries the expectation is that half of the families will get a boy on the first try, of those left half will get a boy on the second try, and half of the rest will get a boy on the third try for an expected number of 1/2+1/4+1/8=7/8 boys per family; and for girls it’ll be none in the one child families, one in each of the families that gets a boy on the second try, two in each of the families that gets a boy on the third try, and three in each of the families that never gets a boy for an expected total of 0(1/2)+1(1/4)+2(1/8)+3(1/8)=7/8 ]
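
The bracketed arithmetic generalizes to any cap. The sketch below, assuming p = 1/2 and independent births, computes the expected boys and girls per family for a cap on tries; `expected_boys_and_girls` is a hypothetical helper name.

```python
from fractions import Fraction

def expected_boys_and_girls(cap):
    """Expected (boys, girls) per family when each family stops at the
    first boy or after `cap` children, whichever comes first."""
    exp_boys = Fraction(0)
    exp_girls = Fraction(0)
    for girls in range(cap):
        p = Fraction(1, 2 ** (girls + 1))  # `girls` girls, then a boy
        exp_boys += p
        exp_girls += girls * p
    exp_girls += cap * Fraction(1, 2 ** cap)  # families that never get a boy
    return exp_boys, exp_girls

for cap in range(6):
    boys, girls = expected_boys_and_girls(cap)
    print(cap, boys, girls)
```

For each cap the two expectations coincide at 1 - (1/2)^cap, which approaches 1 as the cap grows. These are expectations only; they say nothing about how likely any particular family or population is to show equal numbers.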

10. Ed says:

Mr. Briggs, thanks for the reply, I was really looking at the post with other eyes and will look closer to what you’ve stated.

11. Scotian says:

Alan, all the von Neumann stories may be apocryphal. Thus the change in trains versus bicycles and in who asks him the puzzle. Briggs may accuse you of jumping to the continuum too quickly in your finite family size example. This is why I would like to see random walk like simulations under the conditions of a distribution of family size, fertility rates, finite initial population, a realistic boy/girl birth ratio, and so on. As happens with random walks once the initial even sex ratio deviates from half and half due to chance it might take a long time to come back to parity. This might have enormous implications for population dynamics or I might be off my rocker (even chances).

By the way there is that evil word chance again!

12. Timotheos says:

Hmmm, working on the assumption that the families will copulate onto infinity until having a boy, and that we are looking at this from a non-specified time range, I would say that the average number of girls that a family would produce would be the Sum of All Terms from 0 to infinity of the function x*((1/2)^x), which would be equal to 2.

Given that the odds of having a boy is a finite number, it makes no sense to say that there could be an infinite period of time in which only girls, and no boys, were born, since those odds would be equal to the limit of x as x approaches infinity of the function (1/2)^x, which is equal to 0. So the number of boys per family must be equal to 1.

Thus, the ratio of boys to girls is 1 to 2, which makes 2/3rds of the population girls, and 1/3rd of the population boys.

13. Casey says:

So long as the probability of a pregnancy producing either a boy or a girl is independent of previous pregnancy outcomes, the “stopping rule” does not matter, half of pregnancies result in girls and half in boys, so the population proportions will balance similarly. There are lots of descriptions of the reproductive cycle that will give this outcome.

14. David Engle says:

The “stopping rule” certainly does matter. The problem requires that all mothers which have not yet produced a son cannot stop producing girls until they do (and must stop upon producing a single son.) Clearly the population will have more girls.

expectation of girls is ( sum of n from 1 to infinity of ( (1/n)*(1/2)^n ) )

15. “Clearly the population will have more girls.” But, as so often happens, what is “clear” is often NOT true.

16. Scotian says:

Clearly you can not fool mother nature in this manner by such a passive process. My, and I admit trivial point, was to ask about the approach to equilibrium from an initial imbalance and the effect of non-passive factors. This could take just one generation. In a sense the experiment has already been done as many people have done exactly what the puzzle states to no effect. Even a minority of people trying this would be visible in the population, if it was possible. Of course given that the natural boy/girl birth ratio is about 105/100 it might be hard to detect this in practice. To those who state otherwise I will leave it up to you to find the error in your math. Be ruthless.

Of more, not puzzle, interest is the effect not of birth ratio but of reproductive age sex-ratio due to differential mortality or other means of avoiding child bearing, such as birth control, lack of partners, etc. Here one is interested in population dynamics and the conditions under which populations crash. Related to this one can ask what is the probability of a small breeding population producing all boys or all girls? Can small sub-populations evolve that show a strong sex birth selection that is parasitic on the larger population? If any biologist is reading this, do you know of any examples in the biosphere? I know that in some animals sex ratio is temperature dependent and then there are bees.

Of course none of this has anything to do with the solution to the puzzle.

17. David Engle says:

Hmmm… Having slept on it, I realize my earlier comment was wrong. The number of girls and boys in the population will be equal. The answer is not as obvious as I’d thought at first.

Perhaps the simplest argument is correspondence. Each and every mother must have exactly one son, and each and every son must have exactly one mother, therefore there must be a 1:1 correspondence between males and females in the population.

18. Speed says:

This is an example of a statistician making something more difficult and tedious than necessary. Alan Cooper’s answer (the one I came up with after a couple of minutes — I am typing this with one hand as the other is busy patting myself on the back) makes it understandable and therefore teachable to the uninitiated. It is also the type of reasoning that people need to use in daily life and scientists in thinking about and explaining their experiments.

There is still the need for a rigorous solution and clear definition of the experiment but the long explanation will be lost on most people.

19. Briggs says:

Speed,

The only problem with Alan’s answer—admittedly a small one—is that it’s strictly wrong.

The answer, as I’ve tried (perhaps unsuccessfully) to emphasize, depends on when you look. We cannot say with certainty that the answer is 1/2 boys and girls always and everywhere. Not for small n, not for infinite n, not for infinite couplings, neither.

I showed (in comments) that in the situation where n = 1 and with most stopping rules there is only a 1/4 chance of equal numbers of boys and girls.

So while Alan’s summary might guide the intuition toward the right answer (and see my text for a similar guide), it can lead you astray.

Not everything is easy.

20. Speed says:

Briggs wrote, “The only problem with Alan’s answer—admittedly a small one—is that it’s strictly wrong.”

But then there is no strictly right answer. There are a range of possible right answers — some more likely than others. Or the right answer is a range of possible answers, some more likely than others. It is nice to acknowledge this and it could become a terrific teaching moment but is far beyond the scope of the “poser” and I can’t imagine working through the equations at a cocktail party.

It is a problem that can teach middle school students (and non-technically trained adults) how to solve interesting and non-obvious puzzles.

21. Briggs says:

Speed,

“But then there is no strictly right answer.”

No, sir. That is false. There is a strictly, deducibly (yes, deducibly) right answer for every specification of n and stopping rule. See the comments above where I worked out the strictly right answer for an n = 1 and for all stopping rules.

It can also be worked out—and I gave notes how—for larger n. In fact, given our computing power, this should be easy to do. That’s homework for an ambitious reader!

Perhaps the difficulty is what you’re calling a “right answer”. I think maybe (I’m guessing) you say “right answer” means the exact proportion that must occur under certain specified circumstances. This does not exist except a stopping rule of 0. All—as in all—other situations are distributions over possibilities.

It’s too bad we can’t work this out at a cocktail party (but then you’ve never been to one of mine), but that’s life.

22. “It’s too bad we can’t work this out at a cocktail party” – I’ll drink to that! But I’m not sure what I said that you think is wrong. I tried to be sure to refer to the 50% ratio always as an expectation, but maybe I did miss one. Of course, just like the Spanish Inquisition (see http://www.youtube.com/watch?v=Tym0MObFpTI), no one expects the expectation (in fact if I suspected doctored data I might specify exactly 50 Heads in 100 coin tosses as a low p-value event on which to reject the null hypothesis of Boolean trial with 100 reps at p=0.5). But for the ultimate argument that stopping conditions have no effect I like Casey’s argument (even though he unfortunately said “half…will …” rather than “the expectation is that half … will…”). In ANY set of ova attacked by a 50-50 mix of equally powerful blue and pink sperms the expected proportion of male babies is 50%.

23. Briggs says:

Alan,

Apologies. Misspoke in a hurry. I meant “strictly wrong” in the sense there exists an exact proportion with certainty.

24. Timotheos says:

Eeek! I just realized that I set up my summation wrong. I made the subtle mistake of adding together the odds of an outcome times the number of girls it would return, not by the odds of ONLY that number of girls being returned. In effect, I counted everything twice. So my summation should have been x*((1/2)^(x+1)), which is equal to 1 and not 2. And I see that Alan Cooper already beat me to the punch, so now I’m thirsty.

Another slightly more helpful way to write summation would be to take the odds of getting a girl and then add the odds of getting an additional girl, generating the summation sequence 1/2, 1/4, 1/8..etc, which is exactly the same as the sequence for the boys. This has the intuitive advantage of lining up the same numbers for both sexes, which makes the solution a little easier to process.

Still, it may seem strange that the stopping policy had no effect on the proportion. But I think that’s because our intuition is screaming that changing a factor should have SOME effect, and not because it is telling us anything about the proportion. And while instigating the stopping policy does not change the proportion of boys to girls, it certainly does change the number of boys and girls, since we supposed that the couples could copulate ad infinitum for both boys and girls, which would allow for an infinite number of both. Adding in the stopping policy necessitates that the number of boys and girls in each family is finite, so the stopping policy certainly puts a large restriction on the number, but not the proportion, of boys and girls in the population.

25. David Engle says:

The stopping policy does have an effect on the ratio.

Every son must have one mother, but not every mother has a son, and no mother can have more than one son, therefore there must be more mothers than sons.

The problem is that the wording of the puzzle implies that every woman eventually has a male child, and we work out the series summation using this assumption. In reality this isn’t true: some women will choose to have no children, some will fail to find mates, some will die before they succeed in producing a male child. Because of these cases, which aren’t accounted for in the summation, women will outnumber men.

26. Yes, David, if there is a limit on family size then the number of boys will be less than the number of mothers. BUT, even though some mothers may have more than one girl, others will have less, and in fact the expected number of girls will also be less than the number of mothers (to exactly the same extent that the expected number of boys is). The REAL reason why this must be true was provided above by Casey, and the only reason for counting by family size or whatever is in order to understand how arguments which seem to contradict him go wrong.

If you look back at my first answer you will see that it really covers all of these cases. If the maximum allowed number of children per family is N, then, for N=0 the expected numbers of boys and girls are both zero. For N=1 the only child in each family has equal chances of being a boy or girl so the expected numbers of each sex per family are both 1/2. For N=2 the second children (allowed only in those of the families that start with girls) also have equal chances of being boy or girl, so the expected family type frequencies are 1/2 Bonly, 1/4GB and 1/4 GG for an expected number of boys per family of 1/2(1)+1/4(1)=3/4 and of girls 1/4(1)+1/4(2)=3/4. I did the case N=3 in an earlier response, and the general case follows the same pattern.

27. Briggs says:

Alan,

And there’s the weakness of expected values. With n = 1, there is only a 1/4 chance of equal boys and girls for all stopping rules 2 or greater. A pretty wide disparity over the expected value guesstimate.

28. Yes, and for a single one child family there’s ZERO chance of seeing the expected numbers of boys and girls.
But then as Monty Python told us “No One Expects…”

29. Nullius in Verba says:

It doesn’t even need an infinite summation. All the births that actually take place are 50:50 events. So 50% of them (on average) will be boys.

The stopping rule is a complete red herring. The only way it could have an effect is if the birth probabilities for any given family were not independent – for example, if different people had a different (i.e. not 50%) boy/girl birth ratio determined by their genetics, but the mean of these different birth ratios across the population was 50%. Then waiting for a boy would selectively remove those parents with a propensity to have boys, leaving those with the propensity to have girls.

Anyway, the rest of it is just Briggs’ usual point about the expectation not being the outcome. Something made even more obvious by pointing out that if you have 5 parental pairs, the expected number of children is 2.5, which seems rather unlikely to happen. (One would hope.)

To which I think most people would reply: “You know perfectly well what I meant.” It’s not really all that complicated.

30. Nullius in Verba says:

Typo – that should be “2.5 male children”.

31. David Engle says:

“All the births that actually take place are 50:50 events. So 50% of them (on average) will be boys.”

And there is the problem, you are not considering the births that do not take place. In the case where there is no birth at all, the mother (1 girl) out numbers the son that does not exist (0 boys). This has no effect on the ratio of the (still theoretical) growth rate which is 50:50, but until someone’s actually born this population ratio is still 1:0.

The problem is that the imprecise semantics of the question make us confuse the growth ratio and the population ratio, as well as the terms family, mother and female. As is often the case in math, if the original wording were more precise we wouldn’t be arguing at all.

32. David Engle says:

The stopping rule is a red herring in the sense that it does not alter the birth ratio.
but the stopping rule still has a very important effect, it imposes a hard constraint that boys can never out number girls no matter what the other conditions of the system are.

33. aGrimm says:

Matt: Not being a statistician, I’ve always wanted to know what the odds are for/against my family’s lineage. Perhaps you would be so kind?

Great-grandfather: 4 boys
Grandfather: 2 boys
Father: 9 boys (Uncle: 4 boys)
No girls until some dum-dum (me) broke the chain. Trust me, boys are easier to raise and require a lot less toilet paper.

– Thanks

34. aGrimm says:

Matt: I got to thinking that my request (previous post) is rude without at least offering something in exchange. Being retired, the only thing I can offer is some positive feedback as to the influence of your site. First some background: the one statistics course I took 40 years ago, I principally employed to count things radioactive during my career. The course did teach me to recognize, most times, when I was getting my leg pulled by statistical shenanigans. I’ve been following your site since I saw your first posting to Watts Up With That. I have learned a lot more about recognizing statistical shenanigans (Yay p value!), for which I thank you very much. But your influence does not stop with me for I am the go-to guy in my rather extensive circle of family and friends when it comes to science and math questions. I cannot glibly talk statistics, but I’m one of those who has the ability to translate science/math into laymen’s terms. Your debunking of so many things gets translated to this group and hopefully to their circle of family and friends. If I am an example, your influence goes well beyond the list of commenters here. And you should be rightfully proud of your work.

35. Nullius in Verba says:

“And there is the problem, you are not considering the births that do not take place.”

The births that don’t take place have no effect on the population statistics. By definition.

“but the stopping rule still has a very important effect, it imposes a hard constraint that boys can never out number girls no matter what the other conditions of the system are.”

Suppose in the first generation that every family has a boy and stops. What are the populations of boys and girls?

36. David Engle says:

(g+m)/(f+s)

where g is the number of girls who have no children, let’s say 10 in one possible case,
m is the number of mothers who have children, let’s also say 10,
f is the number of fathers, let’s say 5 (yep, this is also possible),
s is the number of sons (which you’re forcing to be equal to m), so also 10

Notice the ratio is 20:15 in favor of girls, and every subsequent generation in this example can never have more boys than girls, ever, NO MATTER how subsequent random births happen.

37. Does no-one ever die in your world, David? It is true (and obvious without any fancy formulas) that if no mother can have more than one male child then the number of males who have ever lived must be less than the number of such females. But if every woman has one male child and then dies, then the number of males will (for one brief lifetime) certainly exceed that of females.

38. Scotian says:

Sorry David but you are wrong. The mistake that most people are making is reading too much significance into the condition that only the birth of a girl leads to another attempt. The condition of the puzzle is identical to any family being given the right to try for an additional child independent of the sex of the previous one based on some sort of lottery. For example, half of the original, then half of that half and so on. It doesn’t even have to be exactly half as it could be determined by coin toss to simulate the randomness of birth. This would not produce families of only one boy but it is equivalent as far as the sex ratio goes and you see that there is no bias for either sex and that boys could very well outnumber girls by chance. After the experiment is complete the children can easily be redistributed into “families” of one boy each. You might argue that the stopping point might leave you with an excess of girls with no boys to balance them, although strictly speaking this can only happen if there are no boys at all, but you can have an excess of boys too. The latter is forgotten by the simple fact that the boys can be used to make up single child families. Once you realize that the puzzle is only about distribution and not sex ratio the rest follows. It is sort of like topology.