So I am getting to the age where a bunch of my friends are having babies. One thing I love to do is turn their pregnancy into a game for me. That is, I love baby pools. Most baby pools require a number of inputs:
1. Gender
2. Date (day and time)
3. Height
4. Weight
Placing is determined by how far off your guesses are in these 4 categories. Without looking at any data, I would guess these are related. For example, if a mother is late, it only makes sense to guess a larger height and weight. Besides guessing on these 4 categories, another key piece of information, just as important if not the most important for success in a baby pool, is keeping track of what others are guessing. Baby pools are like The Price Is Right: it is definitely to your advantage to guess last.
Guessing the gender without any prior knowledge is essentially like flipping an unfair coin.
They (Wikipedia) say the gender ratio of boys to girls is about 105:100. So ignoring the guesses of other players in the game, the better bet on gender will always be boy. The real question is: can we make a better prediction based on information about the couple? This is a tricky question, and to do it justice will require some very specific data on the father's family tree. Hopefully I can find something eventually and we can return to this topic. For now I am going to focus on predicting the due date.
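As a quick aside on the 105:100 figure: a ratio of 105 boys per 100 girls works out to a probability of 105/205, or roughly 51.2%, that any given baby is a boy. Here is a minimal Python sketch of that arithmetic. The function name is my own invention, and the ratio is just the rough figure quoted above, not a fitted value.

```python
def boy_probability(boys_per_100_girls: float = 105.0) -> float:
    """Convert a boys-per-100-girls ratio into P(boy).

    The default of 105 is the rough ratio quoted in the post (an assumption,
    not a measured value for any particular population).
    """
    return boys_per_100_girls / (boys_per_100_girls + 100.0)


p_boy = boy_probability()
print(f"P(boy) is about {p_boy:.3f}")
```

So the coin is unfair, but only barely: guessing boy beats guessing girl by a couple of percentage points.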
So the one website I found that had data on due dates is here. Essentially, based on numerous sources the distribution of pregnancies looks something like this:
It is pretty clear that pregnancy length is not normally distributed, and assuming a usual 40 week due date, about 62% of babies are born on or after their due date. Also, apparently the probability of having a child exactly at 40 weeks is less than 5%. With this plot and the 5% fact in mind, we might expect to see a probability distribution for single days that looks something like this:
Shifting the plot over so that we can think of the plot as days before or after the magic 40 week number we have:
This plot is maybe hard to see, but based on this data it is unlikely the "due date" is even the most probable day for the baby to be born. Weird! Anyway, back to the baby pool. Speaking like an economist, we would have baby pool efficiency if the distribution of guesses looked just like the distribution above. I'm guessing this almost never happens. For example, let's say the distribution of guesses looked something like this:
The approach I would take would be to pick a date with the largest positive difference between the true birth distribution and the baby pool distribution over a short interval. In this case, guessing about 4-7 days late looks like the best chance to win the date portion of the pool.
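The "largest positive difference over a short interval" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name, the window width, and especially the numbers are made-up toy values, not real birth data and not a real pool. The sketch slides a window across the days and keeps the one where the true probability most exceeds the share of guesses.

```python
def best_guess_window(offsets, true_probs, pool_probs, width=2):
    """Return (first_day, last_day, gap) for the best window of `width` days.

    offsets    : days relative to the due date (e.g. -3 .. 3)
    true_probs : probability of birth on each day
    pool_probs : share of pool guesses on each day
    gap        : summed (true - pool) over the window; larger means the
                 window is under-guessed relative to how likely it is
    """
    if not (len(offsets) == len(true_probs) == len(pool_probs)):
        raise ValueError("all inputs must have the same length")
    if width < 1 or width > len(offsets):
        raise ValueError("width must be between 1 and the number of days")

    best = None
    for i in range(len(offsets) - width + 1):
        gap = sum(t - p for t, p in zip(true_probs[i:i + width],
                                        pool_probs[i:i + width]))
        if best is None or gap > best[2]:
            best = (offsets[i], offsets[i + width - 1], gap)
    return best


# Toy numbers only (hypothetical, chosen for illustration).
offsets = [-3, -2, -1, 0, 1, 2, 3]
true_probs = [0.05, 0.10, 0.15, 0.20, 0.20, 0.20, 0.10]
pool_probs = [0.10, 0.20, 0.30, 0.20, 0.10, 0.05, 0.05]

first_day, last_day, gap = best_guess_window(offsets, true_probs, pool_probs)
print(f"Best window: day {first_day} to day {last_day} (gap {gap:.2f})")
```

With these toy inputs the pool crowds its guesses before the due date, so the under-guessed window lands after it. A real pool would need the actual guesses and a proper birth distribution in place of the placeholders.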
At this point I want to make a formal complaint about the lack of birth data that is publicly available. It probably exists out there, but key websites that host health and demographic data like CDC, Census, and WHO in my opinion need to get their act together. I can say hands down it is easier to research the most obscure baseball topic than a topic that is central to so many people's lives. So if any of you statisticians, nurses, or public health people out there know of some good data, I would love to hear from you!




Hey Soma!
This post caught my eye - nice topic choice!
The National Center for Health Statistics might have what you're looking for (www.cdc.gov/nchs/fastats/births.htm). You can find a link on that same site to Vital Statistics for more raw data.
If your interest in childbearing holds, I might also suggest one of my favorite blogs (www.scienceandsensibility.org) for interesting and critical analyses on the research, stats (!), and clinical care of families during pregnancy and birth.
And if you really want to go nuts, I might suggest delving into the stats surrounding genetic testing during pregnancy; the probability of a condition occurring, likelihood of detecting it with our current testing options, odds as they vary by maternal and paternal age, false positive rates, etc.
Best wishes to you and keep up the good work!
Cheers,
Kala K
Mike-
Somehow I bumped into your site and said, "Hey, I know that guy." I'm a fan of the semi-useless baseball splits (i.e. Joe Blow is batting .383 left-handed against righties during day games in away ballparks in the central time zone with runners on 1st and less than 2 outs after the 6th inning with the count in his favor...), and I've enjoyed your posts so far. (You would have lost badly in a pool for our daughter's birth.)
I'm sure you've seen that Cliff Lee is 5-0 with a 0.21 ERA in 5 starts in June and has 3 consecutive shutouts. ESPN also says that only 5 pitchers have gone 5-0 with an ERA of 0.21 or better in a calendar month before since the end of WWII (actually all since Nolan Ryan in 1984). I'm sure that there are many things you could calculate from that...
I was just thinking of how many "Pitcher-months" there have been from 1946-2011, assuming 5 months per season. It would take quite a bit of figuring because of baseball expansion and teams not always having 5-man rotations.
Also, to calculate the odds of throwing a shutout, would you look at the number of scoreless vs. scored-on innings a pitcher has had? I'm thinking that might be more useful than a straight ERA or batting average against. Maybe BAA with RISP would come into play?
Last unrelated thought- Francisco Liriano had never thrown a complete game before his no-hitter. Would that put the odds close to 0 that he would throw a no-hitter, since he had never thrown a complete game? Or could you go by average # of pitches per at-bat and factor in his career high # of pitches to see if it was likely that he COULD throw a complete game?
Anyway-have fun with that-
Christoph
Hey Kala,
Thanks for checking out my post and the heads up on the CDC. I actually really would like to look at some of the genetic stuff in later posts, but that data has been even harder to come by. I checked out the scienceandsensibility blog.....I feel I am now an expert on breastfeeding;) The comparison of competencies of mothers under midwife care and traditional care was interesting though.
Christoph,
First of all, I hope all is well with Adelia. I heard she was born quite premature. I do wonder if those crazy events can be somewhat attributed to genetics, or if it is just another case of regular old randomness.
I will comment on the Liriano no-hitter. The probability of that, I'm sure, was very close to zero, as it is for any pitcher in any particular game. Liriano's would likely be even lower than most, though. Even if you take away his no-hitter, his batting average against is .252, which is not terrible. But, assuming there was no arm-tiring factor, the probability of him throwing a no-hitter would be somewhere around (1-.252)^27 = 0.0003938611. The kicker is he averages almost 17 pitches an inning. That puts him at about 150 pitches to throw a complete game. If he normally gets pulled after 100 pitches, he would also have to cut his pitch count by 33%, which is also extremely unlikely. Anyway, no one was more surprised than I was when I heard Liriano threw a no-hitter.
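For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal Python sketch. It uses the same crude assumption as the estimate itself: 27 independent outs, each with probability 1 minus the batting average against, ignoring walks, errors, and fatigue. The function names are hypothetical, and the 150 and 100 pitch counts are just the round figures from the comment.

```python
def no_hitter_probability(batting_average_against: float, outs: int = 27) -> float:
    """Rough chance of recording `outs` straight outs with no hits.

    Treats each out as an independent at-bat (a simplification, not a
    real model of a game).
    """
    return (1.0 - batting_average_against) ** outs


def required_pitch_cut(complete_game_pitches: float, usual_limit: float) -> float:
    """Fraction by which the pitch count must shrink to finish within the usual limit."""
    return 1.0 - usual_limit / complete_game_pitches


p_no_hitter = no_hitter_probability(0.252)
cut = required_pitch_cut(complete_game_pitches=150, usual_limit=100)

print(f"P(no-hitter) is about {p_no_hitter:.7f}")
print(f"Pitch count would need to drop by about {cut:.0%}")
```

That is roughly a 1 in 2,500 shot per start under these assumptions, before even accounting for the pitch count problem.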
They should just adjust due dates back a week!