CompSci 108
Spring 2010
The Software Studio

GEDIVA

Data are everywhere, especially given the increasing commitment by governments and corporations to making their processes transparent. The flip side is that we are overwhelmed by the sheer amount of data available and are often unable to turn it into useful information. However, many studies have shown that when data are analyzed, they can be used to gain a deeper understanding of a domain, or even to make transformative changes based on the patterns they reveal. Thus the main challenge we face is how to turn that data programmatically into information we can use. These web sites are examples of attempts to start that process: Swivel, ManyEyes, and a Visualization Gallery. Google has also jumped into the mix, announcing its own Data Explorer.

Specification

A typical architecture for many program designs divides a program's execution into three stages: input, in which data is provided to the program; process, in which that data is transformed; and output, in which the program displays the results of that transformation. This input/process/output (IPO) model of programming is used in simple programs like this one as well as in million-line programs that forecast the weather or predict stock market fluctuations. Your final program should be clearly separated into three independent modules, each containing one or more classes and each flexible enough to accommodate a variety of options without requiring either of the other modules to change. To do this, you must carefully design an Application Programming Interface (API) that defines the result of each step and how to interact with it, so that the next step can consume that result independently.
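As a rough illustration of the IPO separation described above, here is a minimal Java sketch, assuming a hypothetical API in which every stage exchanges a `List<double[]>` of numeric rows. The names `DataSource`, `DataProcessor`, `DataView`, `CsvTextSource`, `RowSumProcessor`, `TextListView`, and `Pipeline` are invented for this example and are not a required design; a real project would likely agree on a richer data type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical API for the three IPO stages; all names are illustrative.

/** Input stage: reads raw data into rows of numbers. */
interface DataSource {
    List<double[]> read();
}

/** Process stage: transforms one list of rows into another. */
interface DataProcessor {
    List<double[]> process(List<double[]> rows);
}

/** Output stage: presents rows to the user (here, as plain text). */
interface DataView {
    String display(List<double[]> rows);
}

/** Input implementation: parses lines of comma-separated numbers. */
class CsvTextSource implements DataSource {
    private final String text;

    CsvTextSource(String text) {
        this.text = text;
    }

    @Override
    public List<double[]> read() {
        List<double[]> rows = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] cells = line.split(",");
            double[] row = new double[cells.length];
            for (int i = 0; i < cells.length; i++) {
                row[i] = Double.parseDouble(cells[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }
}

/** Process implementation: replaces each row with a one-element row holding its sum. */
class RowSumProcessor implements DataProcessor {
    @Override
    public List<double[]> process(List<double[]> rows) {
        List<double[]> sums = new ArrayList<>();
        for (double[] row : rows) {
            double total = 0;
            for (double v : row) total += v;
            sums.add(new double[] {total});
        }
        return sums;
    }
}

/** Output implementation: one line per row, values separated by ", ". */
class TextListView implements DataView {
    @Override
    public String display(List<double[]> rows) {
        StringBuilder out = new StringBuilder();
        for (double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) out.append(", ");
                out.append(row[i]);
            }
            out.append('\n');
        }
        return out.toString();
    }
}

/** Wires the stages together; it depends only on the three interfaces. */
class Pipeline {
    private final DataSource source;
    private final DataProcessor processor;
    private final DataView view;

    Pipeline(DataSource source, DataProcessor processor, DataView view) {
        this.source = source;
        this.processor = processor;
        this.view = view;
    }

    String run() {
        return view.display(processor.process(source.read()));
    }
}
```

For example, `new Pipeline(new CsvTextSource("1,2\n3,4"), new RowSumProcessor(), new TextListView()).run()` returns `"3.0\n7.0\n"`. Because `Pipeline` depends only on the three interfaces, swapping in a different source, processor, or view requires no change to the other two stages.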

Write a program to allow users to import different data formats and visualize them in different ways.

The Input phase can be viewed as the back-end, or Model, of the program, while the Output phase is the front-end, or View. The Process phase lives mostly in the Model but can have components in the View as well. For this project, one pair of students will work exclusively on the Model and the other pair will work only on the View. The two pairs must agree on an API between them that cannot be changed after the first week. If it must be changed, the change and the reasons for it must be clearly described in a separate document turned in with the final version of the project.
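The Model/View split only works if the View pair can code against the agreed API before the Model pair has finished. Below is a minimal Java sketch of what such an agreement might look like, assuming a hypothetical `DataModel` interface that exposes named numeric columns. `InMemoryModel` and `TextBarChart` are likewise invented names, and a real API would probably need more (non-numeric values, change notification, and so on).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Model/View contract: the View calls only these methods. */
interface DataModel {
    List<String> getColumnNames();

    List<Double> getColumn(String name);
}

/** Model side: one possible implementation, keeping columns in insertion order. */
class InMemoryModel implements DataModel {
    private final Map<String, List<Double>> columns = new LinkedHashMap<>();

    void addColumn(String name, List<Double> values) {
        columns.put(name, new ArrayList<>(values)); // defensive copy
    }

    @Override
    public List<String> getColumnNames() {
        return new ArrayList<>(columns.keySet());
    }

    @Override
    public List<Double> getColumn(String name) {
        List<Double> values = columns.get(name);
        if (values == null) {
            throw new IllegalArgumentException("No such column: " + name);
        }
        return Collections.unmodifiableList(values); // the View cannot mutate Model data
    }
}

/** View side: a crude text bar chart that knows only the DataModel interface. */
class TextBarChart {
    String draw(DataModel model, String column) {
        StringBuilder out = new StringBuilder(column).append('\n');
        for (double value : model.getColumn(column)) {
            long bars = Math.round(value); // values below 0.5 draw an empty bar
            for (long i = 0; i < bars; i++) {
                out.append('#');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Because `TextBarChart` receives a `DataModel` rather than an `InMemoryModel`, the View pair could test it against a stub while the Model pair works on real file readers. For a column `sales` holding 2.0 and 3.0, `draw` returns `"sales\n##\n###\n"`.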

Extensions

These extensions are intended to stretch your design further and to differentiate your program from others in order to capture the global data visualization market. If you want to be considered for a grade in the A range, your team should agree on one area of extensions to focus on. These extensions must further the good design of your program, not simply be hacks added at the last minute. If you do not have time to implement an extension, partial extra credit may be given for an excellent justification of how your design either already supports such a feature or how it would need to be changed to support it.

However, note that the amount of extra credit will be in proportion to the amount of intellectual effort needed to implement the option. For example, adding yet another way to filter key words would not be worth very much because your design should already support it. Of course, a well-tested, perfectly working program that has fewer features (but plenty of clear paths to easy expansion) is always worth more than the leaky kitchen sink. In short, to maximize your grade, you should implement enough variety in your program to clearly demonstrate that your design supports further such extensions.
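One way a design can already support a new filter is to treat each filter as a pluggable object, so that adding one means writing a new implementation rather than editing existing code. The sketch below assumes a hypothetical `FilterRegistry` class that stores named `java.util.function.Predicate` instances; it filters numbers for brevity, but the same shape would work for key words with a `Predicate<String>`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical registry: each filter is a named predicate over values. */
class FilterRegistry {
    private final Map<String, Predicate<Double>> filters = new LinkedHashMap<>();

    /** Adding a filter is a single call; no other class needs to change. */
    void register(String name, Predicate<Double> filter) {
        filters.put(name, filter);
    }

    /** Names in registration order, e.g. for populating a menu in the View. */
    List<String> names() {
        return new ArrayList<>(filters.keySet());
    }

    /** Returns the values that the named filter accepts, in their original order. */
    List<Double> apply(String name, List<Double> values) {
        Predicate<Double> filter = filters.get(name);
        if (filter == null) {
            throw new IllegalArgumentException("Unknown filter: " + name);
        }
        List<Double> kept = new ArrayList<>();
        for (Double v : values) {
            if (filter.test(v)) {
                kept.add(v);
            }
        }
        return kept;
    }
}
```

With this shape, registering `v -> v > 0` under the name `"positive"` adds a filter without touching the readers or the View, which illustrates why one more filter of the same kind should take little intellectual effort.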

Resources