Lab 2: Networks

Taken from Prof. Michael Kearns's Networked Life course.

Picking your Network

With your partners, you should identify a specific source of real-world data, the precise definition of the network (vertices and edges) you plan to extract from this data, and the methodology by which you will extract it.

We will be generous with the term "real-world", which could include data from the domains of biology, sociology, economics and finance, technology, etc. However, it must be a well-defined, objective data source gathered by a third party. An example of an entirely acceptable data source is the recently released corpus of emails exchanged by Enron executives, where it would be natural to examine the network of who exchanged email with whom. An example of an unacceptable data source and network would be "I wrote down a list of all my friends and then connected any pair of them that I thought shared a lot of common interests". This example is too subjective, and the data is not gathered by a third party.

To ensure some minimal level of complexity, we require that your network have at least 12 vertices and at least 12 edges. However, considerably more ambitious networks are encouraged.
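The size requirement can be checked mechanically before you submit. The following is a minimal Python sketch, not part of the assignment: the function name meets_minimum_size is hypothetical, and it assumes the network is undirected, so that a repeated or reversed pair counts as a single edge and self-loops are not counted.

```python
MIN_VERTICES = 12
MIN_EDGES = 12


def meets_minimum_size(vertices, edges):
    """Return True if there are at least 12 distinct vertices and 12 distinct edges.

    Assumption (not stated in the lab): edges are undirected, so (a, b) and
    (b, a) are the same edge, and self-loops (a, a) are not counted.
    """
    distinct_vertices = set(vertices)
    distinct_edges = {frozenset((a, b)) for a, b in edges if a != b}
    return len(distinct_vertices) >= MIN_VERTICES and len(distinct_edges) >= MIN_EDGES


# Toy data: a ring of 12 vertices has exactly 12 edges.
ring = [f"v{i}" for i in range(12)]
ring_edges = [(ring[i], ring[(i + 1) % 12]) for i in range(12)]

print(meets_minimum_size(ring, ring_edges))       # True
print(meets_minimum_size(ring, ring_edges[:-1]))  # False: only 11 edges
```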

By the "methodology" for extracting your network, we mean how you plan to go from the raw data source and your network definition to an actual representation of your network in our simple format (described below; essentially nothing more than a list of all the vertices in your network, followed by a list of all those pairs of vertices that are connected by an edge).
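To make the idea of a methodology concrete, here is a minimal Python sketch for an email-style data source. It assumes the raw data has already been reduced to (sender, recipient) pairs and that two people are connected if either one emailed the other. The function name build_network and the toy records are hypothetical and are not drawn from any real corpus; your own source will need its own parsing step.

```python
def build_network(records):
    """Build an undirected network from (sender, recipient) records.

    Returns a sorted list of vertices and a sorted list of edges, where each
    edge is a pair of names in alphabetical order.
    Assumption: direction and message counts are ignored, and a message
    sent to oneself does not create an edge.
    """
    vertices = set()
    edges = set()
    for sender, recipient in records:
        vertices.update((sender, recipient))
        if sender != recipient:
            # Sorting the pair makes (a, b) and (b, a) the same edge.
            edges.add(tuple(sorted((sender, recipient))))
    return sorted(vertices), sorted(edges)


# Made-up records for illustration only.
records = [
    ("Alice", "Bob"),
    ("Bob", "Alice"),
    ("Alice", "Dora"),
    ("Bob", "Chuck"),
    ("Dora", "Bob"),
]

vertices, edges = build_network(records)
print(vertices)  # ['Alice', 'Bob', 'Chuck', 'Dora']
print(edges)     # [('Alice', 'Bob'), ('Alice', 'Dora'), ('Bob', 'Chuck'), ('Bob', 'Dora')]
```

A write-up that states rules like these explicitly (what counts as a vertex, when two vertices are connected, how duplicates are handled) is what lets a third party rebuild the same network.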

For part 1, you should submit a brief write-up detailing the information described above for your network. If your data source is online, please provide the URLs for the source; feel free to include a small portion of the raw data in your write-up if it would be helpful to do so. Be as precise as possible in all aspects of your write-up, from network definition to methodology. As an informal test, your write-up should be sufficiently precise that a third party could independently create the same network from your description.

Creating the Graph

You will be asked to submit your network as a file in a specific, very straightforward format: a list of the names or IDs of your vertices, followed by a list of the pairs of vertices that are connected by an edge.

The file should be named graph.dot and be in your public_html/s1 directory. The graph should be written in the DOT attributed graph language. As an example:

graph G {
    Alice;
    Bob;
    Chuck;
    Dora;
    Alice -- Bob;
    Alice -- Dora;
    Bob -- Chuck;
    Bob -- Dora;
    label = "Example graph\nby:\nStudent 1\nStudent 2";
}

In this example, we first "declare" four vertices with the names or labels "Alice", "Bob", "Chuck" and "Dora", and then we declare the four edges that connect them. Of course, the names or labels can be anything you like, so we suggest you make them meaningful in the context of your network. You should probably avoid strange characters, underscores, etc. The syntax is important: you need a semicolon after each vertex and edge declaration, you need to use "--" rather than "-", and so on. The label= line gives the name of your graph.
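If your network is too large to type by hand, the file can be generated. The following Python sketch produces text in the format shown in the example; the function name to_dot and its parameters are hypothetical. It assumes every vertex name consists only of letters and digits and starts with a letter, so no quoting is needed, and it rejects any other name rather than trying to escape it.

```python
import re

# Assumption: only simple names (a letter followed by letters or digits) are allowed.
SIMPLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def to_dot(vertices, edges, title, authors):
    """Return the text of an undirected DOT graph in the lab's format."""
    for name in vertices:
        if not SIMPLE_NAME.fullmatch(name):
            raise ValueError(f"vertex name needs cleaning up: {name!r}")
    lines = ["graph G {"]
    for name in vertices:
        lines.append(f"    {name};")
    for a, b in edges:
        lines.append(f"    {a} -- {b};")
    # DOT expects a literal backslash followed by n inside the label string.
    label = "\\n".join([title, "by:"] + list(authors))
    lines.append(f'    label = "{label}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


text = to_dot(
    ["Alice", "Bob", "Chuck", "Dora"],
    [("Alice", "Bob"), ("Alice", "Dora"), ("Bob", "Chuck"), ("Bob", "Dora")],
    "Example graph",
    ["Student 1", "Student 2"],
)
print(text)
```

The printed text can then be saved as graph.dot. Rejecting unusual names keeps the sketch short; a fuller version might instead wrap such names in double quotes.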

Possible Data Sources

Testing Your File

Go to the Grappa 1.2 demo page. Paste the contents of your file into the text box at the bottom. Click the "Press to Submit" button. See the results of your work.


Jeffrey R.N. Forbes
Last modified: Tue Sep 13 08:56:35 EDT 2005