Lab post 3

I chose to visualize the top 10 baby names (both male and female) over the course of 10 years. I needed a clear yet engaging way to display the data, so I chose a bar chart race. Rather than showing the top ten baby names per year and sex, the chart uses the cumulative count of each name over the 10 years. For example, 327 babies were named Jessica in 2002, but in my data that year's value is 664, because 337 babies had already been given the same name in 2001 (337 + 327 = 664). This also meant, however, that I had to go through the tedious task of completely reorganizing the data so that Excel could add all of the counts together. So, the data looks a little something like this:
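The cumulative count is simply a running sum of a name's yearly counts. As a minimal sketch (in Python, which was not part of the original workflow; the post did this in Excel), using only the two Jessica figures quoted above:

```python
from itertools import accumulate

# Yearly counts for one name, from the example above: 337 in 2001, 327 in 2002.
jessica_by_year = {2001: 337, 2002: 327}

years = sorted(jessica_by_year)
# accumulate() yields running sums: 337, then 337 + 327.
running_totals = dict(zip(years, accumulate(jessica_by_year[y] for y in years)))
print(running_totals)  # {2001: 337, 2002: 664}
```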

It took me a few tries to get the data right, since I was reorganizing it by hand. Luckily, Excel's SUM function did all the repetitive calculations for me. While doing this, I realized there were a few problems with the data (though they don't show up in the final bar chart). The biggest problem was that if a name didn't appear in the top ten again the following year, its count stayed the same as the year before. Take the name Samantha on the data sheet above: it appeared once in the top ten and never again, so its number never changed. This creates some inaccuracy, as there were definitely babies named Samantha in the following years. However, because the names with this problem didn't stay in the top ten for their sex for long, they never reach the overall top ten shown in the visualization. It made me realize how difficult visualization can be, even with a simple data set. Visualizing DH data in the future will be very interesting, as it usually isn't numerical.
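The carry-forward problem falls out naturally if running totals are kept per name and a snapshot is taken each year. A minimal Python sketch, assuming the data is a list of hypothetical (year, name, count) rows; the 2003 gap below is a hypothetical case for illustration, not a figure from the data set:

```python
from collections import defaultdict


def cumulative_table(rows, years):
    """Build {year: {name: running total}} from (year, name, count) rows.

    A name absent from a year's rows adds nothing that year, so its total
    is carried forward unchanged (the flatline described above).
    """
    per_year = defaultdict(dict)
    for year, name, count in rows:
        per_year[year][name] = count

    totals = {}
    table = {}
    for year in years:
        for name, count in per_year[year].items():
            totals[name] = totals.get(name, 0) + count
        table[year] = dict(totals)  # snapshot of every name seen so far
    return table


# 2001 and 2002 come from the post; in the hypothetical 2003 the name
# drops out of the top-ten rows entirely.
rows = [(2001, "Jessica", 337), (2002, "Jessica", 327)]
table = cumulative_table(rows, years=[2001, 2002, 2003])
print(table[2003]["Jessica"])  # 664
```

Because a missing year contributes nothing, the bar holds still instead of growing, which is the same effect seen with Samantha.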

Once I was done with the data, I got to do the fun part: actually creating the visualization. I still wanted to include sex as a dimension of the data, so I used color to indicate the sex associated with each name. It took some time experimenting with the category settings to make the count the value that changes every year, but I am happy with the final result.

1 thought on “Lab post 3”

  1. This is such a fascinating way to visualize the data! Your description of the Samantha problem makes me wonder about a related phenomenon in the data: are there (usually) fewer girls in the top 10 than boys because there were fewer girls born, or because more girls are given “unique” names that don’t show up? Thanks for putting in the effort to make the race work!
