CompSci 1: Lab 3

Lab 3: Networks

Overview

  1. Set up your account with Duke Scrobbler
  2. Learn how to use GUESS
  3. Choose a data source for the Network Construction assignment

Using Duke Scrobbler


Dametrious Peyton, Zach Marshall, and Beth Trushkowsky created Duke Scrobbler, a set of online social network tools for use in our courses, based on Facebook, UPenn's Lifester, and last.fm.

If you have a Facebook account, create a Scrobbler account by logging in. You can delete your Scrobbler account and unlink it from your Facebook account after the lab if you like. This system also allows you to store your music listening profile. To submit your listening history from your computer or iPod, you will use the Duke Scrobbler Client, powered by Audioscrobbler technology. More on this client later.

For now, you should just download your Friends Graph. Right-click View My Friends Graph and save the file as myFriends.graphml.

Using GUESS


Eytan Adar developed GUESS, a tool for analyzing and visualizing networks. Ben Spain then adapted it for our courses as DukeGUESS.

Follow these steps to open DukeGUESS.

  1. Go to the Start menu.
  2. Next go to Programs.
  3. Inside that sub-menu, go to Programming Languages.
  4. Inside that, go to DukeGuess.
  5. Finally, choose the only available option: the DukeGuess folder.
  6. This should open a directory window with all of the DukeGuess material. In this directory, double-click DukeGuess.jar to run the desired program.
You can download the zipped folder for use outside of lab.

We will use DukeGUESS to analyze and visualize networks in this course, so you should work through the steps in the DukeGUESS tutorial.

Working with your friend graph

  1. In GUESS, open the myFriends.graphml file that you saved earlier. Export it to GDF format by choosing the Export GDF... option in the File menu, and name the exported file myFriends.gdf. You will submit this file for this week's lab and also use it in next week's lab.
  2. Type
    g.nodes
    into the interpreter. What do you see?
  3. Instead of using your actual Facebook logins or IDs, each node in the graph is denoted by a digital fingerprint like isfGSN_i6zXA, generated by a hash function. Why do we use that instead of your actual ID?
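
To make the idea of a fingerprint concrete, here is a minimal Python sketch. It is not Duke Scrobbler's actual scheme, which this page does not describe: the salt, the choice of SHA-256, the 12-character length, and the `fingerprint` helper are all assumptions made for illustration.

```python
import base64
import hashlib

# Hypothetical secret kept on the server; not Duke Scrobbler's real value.
SALT = b"example-salt"


def fingerprint(user_id, length=12):
    """Map an ID to a short, URL-safe token that does not reveal the ID."""
    digest = hashlib.sha256(SALT + user_id.encode("utf-8")).digest()
    token = base64.urlsafe_b64encode(digest).decode("ascii")
    return token[:length]


# Two made-up IDs that differ by a single digit.
a = fingerprint("1234567")
b = fingerprint("1234568")
print(a, b)
```

The same ID always yields the same token, so the graph stays consistent, while similar IDs yield unrelated-looking tokens.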

Building a Network

Taken from Prof. Michael Kearns's Networked Life course.

You should identify a specific source of real-world data, the precise definition of the network (vertices and edges) you plan to extract from this data, and the methodology by which you will extract it.

We will be generous with the term "real-world", which could include data from the domains of biology, sociology, economics and finance, technology, etc. However, it must be a well-defined, objective data source gathered by a third party. An example of an entirely acceptable data source is the recently released corpus of emails exchanged by Enron executives, where it would be natural to examine the network of who exchanged email with whom. An example of an unacceptable data source and network would be "I wrote down a list of all my friends and then connected any pair of them that I thought shared a lot of common interests". This example is too subjective, and the data is not gathered by a third party.

To be sure there is some minimal level of complexity to your network, we require that it have at least 12 vertices and at least 12 edges. However, considerably more ambitious networks are encouraged.

By the "methodology" by which you will extract your network, we mean how you plan to go from the raw data source and your defined network to an actual representation of your network in our simple format (see below, but essentially nothing more than a list of all the vertices in your network, followed by a list of all those pairs of vertices that are connected by an edge).
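
As one illustration of such a methodology, the Python sketch below turns raw records into a vertex list and an edge list. The records are made-up (sender, recipient) pairs, and the `extract_network` helper and its rules (undirected edges, duplicates merged, self-addressed mail ignored) are assumptions; your own network definition may call for different choices.

```python
# Hypothetical raw records: (sender, recipient) pairs, e.g. parsed from email headers.
records = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("carol", "dave"),
    ("alice", "bob"),  # repeated message; should not create a second edge
]


def extract_network(records):
    """Return (vertices, edges) for an undirected 'exchanged email' network."""
    vertices = set()
    edges = set()
    for sender, recipient in records:
        if sender == recipient:
            continue  # ignore self-addressed mail
        vertices.update((sender, recipient))
        # Sort the pair so (a, b) and (b, a) count as the same undirected edge.
        edges.add(tuple(sorted((sender, recipient))))
    return sorted(vertices), sorted(edges)


vertices, edges = extract_network(records)
print(vertices)
print(edges)
```

Writing the rules down this explicitly is what lets a third party rebuild the same network from the same data.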

For this part, you should submit a brief write-up detailing the information described above for your network. If your data source is online, please provide the URLs for the source; feel free to include a small portion of the raw data in your write-up if it would be helpful to do so. Be as precise as possible in all aspects of your write-up, from network definition to methodology. As an informal test, your write-up should be sufficiently precise that a third party could independently create the same network from your description.

You may do this section in pairs.

Data Format

Your network's description should be in a file called myNetwork.gdf in the GUESS .gdf file format. A gdf file consists of a section describing all of the vertices and then one describing all of the edges. Consider simplegraph.gdf, a graph that has six nodes (A-F) and eight edges. The file is listed below and a visualization of the graph is to the right.

nodedef> name
A
B
C
D
E
F
edgedef> node1,node2,directed
A,B,true
A,C,true
B,C,true
B,D,true
C,D,true
D,C,true
E,F,true
F,C,true

The vertex section of the graph begins with the following line.

nodedef> name
This line indicates that each subsequent vertex definition line will have the vertex name on it. You can define many other attributes for a vertex, but only a name is required. The edge section of the graph begins with:
edgedef> node1,node2,directed
Edge definitions have a similar structure to vertex definitions. The only required entries are the names of the two nodes to be connected by an edge. The directed attribute indicates whether a particular edge has directionality. False means that the edge is undirected, and true means that the edge is directed, with the first node being the source and the second node being the destination.
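
If you build your network in a program, the format is simple enough to write and read back yourself. The Python sketch below assumes exactly the minimal headers described above (name for vertices; node1, node2, directed for edges) and names without commas. The `to_gdf` and `parse_gdf` helpers are hypothetical and are not part of GUESS.

```python
def to_gdf(vertices, edges, directed=True):
    """Return GDF text: a nodedef section followed by an edgedef section."""
    flag = "true" if directed else "false"
    lines = ["nodedef> name"]
    lines.extend(vertices)
    lines.append("edgedef> node1,node2,directed")
    lines.extend("%s,%s,%s" % (a, b, flag) for a, b in edges)
    return "\n".join(lines) + "\n"


def parse_gdf(text):
    """Read back vertex names and (node1, node2, directed) edges."""
    vertices, edges, section = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("nodedef>"):
            section = "nodes"
        elif line.startswith("edgedef>"):
            section = "edges"
        elif section == "nodes":
            vertices.append(line.split(",")[0])  # name is the first column
        elif section == "edges":
            node1, node2, directed = line.split(",")[:3]
            edges.append((node1, node2, directed == "true"))
    return vertices, edges


# The simplegraph.gdf example: six nodes (A-F) and eight directed edges.
nodes = list("ABCDEF")
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "C"), ("E", "F"), ("F", "C")]
text = to_gdf(nodes, pairs)
parsed_nodes, parsed_edges = parse_gdf(text)
print(text)
```

Saving `text` to myNetwork.gdf and counting `parsed_nodes` and `parsed_edges` gives a quick check that the file has the vertices and edges you intended.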

Possible Data Sources

Submitting

You should submit the following files via the Lab 3 assignment on Blackboard.
  1. myFriends.graphml and myFriends.gdf: Your friends from Facebook extracted from Duke Scrobbler
  2. lab3.txt: Answers for all of the questions from the "Using GUESS" section in a text file
  3. writeup.txt: A file for the "Building a Network" section that lists the specific source of real-world data, the precise definition of the network (vertices and edges) you plan to extract from this data, and the methodology by which you will extract it.
  4. myNetwork.gdf: GUESS data description for the network you specified in writeup.txt
For writeup.txt and myNetwork.gdf, both you and your partner should submit the same file.