Monday, January 21, 2008

Playing With Census Data

The sheer amount of data that the U.S. Census Bureau (http://www.census.gov) puts out is staggering. They provide demographic data from the decennial census (the population count mandated by the Constitution), which today collects far more information than it "has" to.* In addition to the decennial census** they provide information from their American Community Survey (a survey of approximately 78,000 households every year), economic data, poverty data, racial data, geographic data, and more.

The value this has to sociologists and businesses can hardly be measured.

First, sociologists can see basic population counts, and then which classes, geographic areas, races, etc., group together, or how a group's income relates to that of the overall population. Combined with other data, this can be incredibly valuable.

For businesses, knowing the race, median/mean income, class, etc., of your clients is equally valuable. Are most of your customers white, black, Hispanic, or Asian? Do they come from the upper classes, the middle classes, the lower classes, or the underclass? This allows for "target marketing," which means more than knowing where to advertise; it means knowing how best to serve your customers.

For the last year or so I have been working on mapping and then visualizing the data (using color and elevation to represent different variables: income, population, race, whatever). Only in the last two weeks have I been able to create the maps (and those are still imperfect), and I have yet to really start on getting the Census data into a database that correlates with the mapping. But this is an ongoing project, and I am doing it mostly out of curiosity.
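The post doesn't say how values are turned into colors or heights. One common approach is a linear ramp: scale each value into the range 0 to 1 between the smallest and largest values on the map, then blend between two endpoint colors (or multiply by a maximum height). The sketch below assumes exactly that; the function names, the endpoint colors, and the `max_height` parameter are hypothetical choices for illustration, not the method used for these maps.

```python
def normalize(value, vmin, vmax):
    """Scale value into [0, 1] between vmin and vmax, clamping outliers."""
    if vmax == vmin:
        return 0.0
    t = (value - vmin) / (vmax - vmin)
    return min(max(t, 0.0), 1.0)


def value_to_color(value, vmin, vmax, low=(255, 255, 204), high=(128, 0, 38)):
    """Hypothetical linear color ramp: pale yellow for low values, dark red for high."""
    t = normalize(value, vmin, vmax)
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


def value_to_elevation(value, vmin, vmax, max_height=100.0):
    """Hypothetical linear height scale for extruding an area on a 3-D map."""
    return normalize(value, vmin, vmax) * max_height
```

With a variable such as median income, each map area would get `value_to_color(income, lowest_income, highest_income)` as its fill color and `value_to_elevation(...)` as its height, so two variables can be shown at once. A linear ramp is the simplest choice; skewed data like income is often easier to read with quantile or logarithmic breaks instead.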

For anyone who is interested in similar projects, I'll be posting the technical specs on how I've imported the data. For people who just want to know what I am up to, I'll be posting examples when I am able to get that far.

For me, this is a very exciting project.

P.S. I plan to make everything available for free (think GPL for those of you who know what that is) as I am able to make progress.


*Article 1, Section 2 of the U.S. Constitution states:
"Representation and direct Taxes shall be apportioned among the several States which may be included within this Union, according to their respective Numbers ... . The actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall by Law direct."

The Census is the basis for apportioning seats in the House of Representatives among the states (and therefore each state's number of electoral votes); Congressional districts are then drawn within each state, usually by the state itself. However, today the Census Bureau collects far more data than the Constitution mandates (I do not know whether any act of Congress mandates it).


**Surveys generally take a random sample of people. If you see a poll in the news, the number of people surveyed should be about 1,500 for a good survey and 1,000 for an OK one; if it is fewer than that, ignore the survey. This is in contrast to the Census, which tries to count every person in the United States every ten years.
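One way to see where the 1,000 to 1,500 rule of thumb comes from is the standard margin-of-error formula for a proportion estimated from a simple random sample, z * sqrt(p(1 - p) / n). The sketch below assumes simple random sampling, a 95% confidence level (z = 1.96), and the worst case p = 0.5; real polls use weighting and more complex designs, so treat these figures as approximations.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p from a simple random
    sample of size n; p = 0.5 gives the largest (worst-case) margin."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (500, 1000, 1500):
    print(f"n = {n:5d}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

This prints margins of about 4.4, 3.1, and 2.5 percentage points. Because the error shrinks with the square root of the sample size, going from 1,000 to 1,500 respondents buys only a modest improvement, while dropping to 500 widens the margin noticeably.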

