The Problem with Data…

Ph.D. candidates Andrea Mayer and Jacob Lupfer are working with me to create a website that parallels Gapminder.  Our site contains data from the American states (the site will be unveiled soon, we hope).

Ahh, but data are tricky.  I created a motion chart using State Per Capita Income (PCI) and Poverty Rate data for 1976-2009.  Watching the chart unfold, it became clear that the PCI data after 1990 were screwy (watch it and see for yourself).  I’m working on fixes now.  This is harder than it may seem…

Here’s the problem.  All the PCI data come from the Department of Commerce’s Bureau of Economic Analysis.  The BEA doesn’t have all the state PCI data in a single file, so I would have had to go year-by-year to download it.  Too much work for me.  I figured that someone else would have a data set with many years of data, and I found a couple of reputable (university) sources that did, each citing the BEA as the source.  So I cut and pasted from these files.  But… these data sets turn out to be pretty unreliable, at least when closely examined over time and compared to other data sources.
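One way to catch this kind of trouble before it shows up in a chart is to scan each state's series for year-over-year changes that look implausibly large.  What follows is only a minimal sketch in Python, not the check actually used for this site: the function name `flag_jumps`, the 15% threshold, and the sample numbers are all assumptions made up for illustration, and none of the figures are real BEA data.

```python
def flag_jumps(pci_by_year, max_change=0.15):
    """Return (year, fractional_change) pairs where the change from the
    previous year exceeds max_change in either direction.

    pci_by_year: dict mapping year -> per capita income for one state.
    max_change: hypothetical threshold; 0.15 means 15 percent.
    """
    years = sorted(pci_by_year)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        if curr - prev != 1:
            continue  # a missing year is a different problem; skip the gap
        change = (pci_by_year[curr] - pci_by_year[prev]) / pci_by_year[prev]
        if abs(change) > max_change:
            flagged.append((curr, round(change, 3)))
    return flagged


# Made-up numbers for a single imaginary state (NOT real BEA figures).
example = {1988: 16000, 1989: 17000, 1990: 18000, 1991: 12000, 1992: 19000}
suspect = flag_jumps(example)
print(suspect)
```

A flagged year is not necessarily wrong, but it is a good place to start comparing against a second source.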

Solution: I asked my research assistant, Rose Tutara-Baldauf, to do the hard work of collecting the data year-by-year.  (Oh!  The BEA reports that they don’t even HAVE data for 2001-2009 yet, so I’m not sure where the other sources actually got their data for those years.)  Thanks, Rose.

