Tuesday, January 29, 2008

Watching Bethel

As promised, here is the most recent time lapse of the construction of the new BU commons (created today, before dawn).



Monday, January 21, 2008

Playing With Census Data

The sheer amount of data that the U.S. Census Bureau (http://www.census.gov) puts out is staggering. They provide demographic data based on the decennial census (the count mandated by the Constitution), which today collects far more information than it "has" to.* In addition to the decennial census** they provide information from their American Community Survey (a survey of approximately 78,000 households every year), economic data, poverty data, racial data, geographic data, and more.

The value this has to sociologists and businesses can hardly be measured.

First, it allows sociologists to see basic population counts, then which classes, geographic areas, races, etc., group together, and how each group's income relates to that of the overall population. Combined with other data, this can be incredibly valuable.

For businesses, knowing the race, median/mean income, class, etc., of your clients is equally valuable. Are most of your customers white, black, Hispanic, or Asian? Do they come from the upper classes, the middle classes, the lower classes, or the underclass? This allows for "target marketing," which means more than knowing where to advertise; it means knowing how to best serve your customers.

For the last year or so I have been working on mapping and then visualizing the data (using colors and elevation to represent different variables: income, population, race, whatever). It's only in the last two weeks that I've been able to create the maps (and those are still imperfect), and I have yet to really start on getting the Census data into a database that correlates with the mapping. But this is an ongoing project. It's mostly out of curiosity.

For anyone who is interested in similar projects, I'll be posting the tech specs on how I've imported the data. For people who just want to know what I am up to, I'll be posting examples -- when I am able to get that far.

For me, this is a very exciting project.

P.S. I plan to make everything available for free (think GPL for those of you who know what that is) as I am able to make progress.


*Article 1, Section 2 of the U.S. Constitution states:
"Representation and direct Taxes shall be apportioned among the several States which may be included within this Union, according to their respective Numbers ... . The actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall by Law direct."

It is from the Census that Congressional districts are drawn (usually on a state level), and the number of electoral votes determined. However, today the Census Bureau collects far more data than the Constitution mandates (I do not know if Congressional law mandates it).


**Surveys generally take a random sample of people. (If you see a survey in the news regarding polls, etc., the number of people surveyed should be about 1,500 for a good survey or 1,000 for an OK survey; with fewer than that, ignore the survey.) This is opposed to the Census, which tries to count every person in the United States every ten years.
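
To put rough numbers on those sample sizes, here is a minimal Python sketch of the textbook margin-of-error formula for a simple random sample at 95% confidence. The function name margin_of_error, the worst-case 50/50 split, and the z-value of 1.96 are my assumptions for illustration; real polls use weighting and other adjustments that this ignores.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    proportion=0.5 is the worst case (widest margin); z=1.96
    corresponds to a 95% confidence level. Both are assumptions.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Compare a few sample sizes, in percentage points.
for n in (500, 1000, 1500):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions, 1,500 people gives roughly plus or minus 2.5 points, 1,000 gives about 3.1, and 500 gives about 4.4, which is one way to see why the smaller surveys are less trustworthy.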


Thursday, January 3, 2008

Politics

So it's finally time to start caring about the 2008 presidential race. 2007 was about the entertainment; now we are finally getting down to "the real thing," kinda.

My dad correctly predicted the Iowa caucus for the Democrats. From some conversations with a student Obama activist, it sounds like Obama probably won because he had more people on the ground -- a lot more people. The polls as of yesterday still put Hillary ahead, and the truth is, she probably still is. Her supporters just didn't go and vote like Obama's did.

The caucus results are more important right now than the polls because, at this stage, you can win more easily simply through voter turnout. Once you hit November, more people get out and vote anyway, so your ability to win by turning out people who normally would not vote goes down. At that point, polls matter quite a bit.

I have my own theory on predicting which party will win the White House: whichever party is more unified at the start. Basically, whichever party has the smaller spread in the caucuses. Unfortunately, I quickly found out this is hard to verify empirically.
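
For anyone who wants to play with the idea, here is a minimal Python sketch of the comparison. It assumes one possible reading of "spread": the share of the caucus vote that did not go to the winner, so a party that rallies behind one candidate has a small spread. That definition, the function names, the party labels, and the numbers are all hypothetical placeholders, not real caucus results.

```python
def spread(shares):
    """Vote share (in points) that did not go to the caucus winner.

    This is only one assumed way to measure how divided a party is.
    """
    return sum(shares) - max(shares)


def more_unified_party(results):
    """Return the party whose caucus had the smallest spread."""
    return min(results, key=lambda party: spread(results[party]))


# Hypothetical vote shares in percent, NOT real caucus results.
results = {
    "Party X": [45.0, 30.0, 25.0],
    "Party Y": [38.0, 30.0, 20.0, 12.0],
}

print(more_unified_party(results))
```

With these made-up numbers the sketch picks Party X, since less of its vote went to candidates other than the winner. A different definition of spread (say, the gap between first and second place) could be swapped in by changing only the spread function.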

I started by looking at numbers for the Iowa caucus, but I can only find them back to 1972 for Democrats and 1976 for Republicans. The other problem is that whenever an incumbent ran, he ran unopposed (at least for all practical purposes). Hence, the theory only works in years when there wasn't an incumbent.

Since 1976 (the first year numbers from Iowa are available for both parties) there have only been three such elections: 1976 (I don't count Ford as a real incumbent), 1988, and 2000. This is not enough data.

Of course there are other ways to find numbers, but it's hard to find good numbers. I don't know where I can find poll results for a similar time period, and if I could, I'd probably have to pay for them. Even then I would have to find a way to compensate for different polling methods and confidence intervals (i.e., statistical problems).

So in the meantime I might have to find a new theory, at least if I want it to be empirical. I might still be right on this one, but it doesn't do a lot of good without the data to back up the prediction.