Prelab 4

All prelabs should be completed before your assigned lab period. Print this page and bring your responses to lab.

The goals of this lab are:

  1. Analyze and model networks with graphs
  2. Learn the basics of recommender systems

Reading Assignment

All readings are on Blackboard under Course Documents.

Supplemental readings

These readings are also posted on Blackboard and are optional. They will be useful in better understanding the material in lab.

Questions

  1. Facebook questions:
    1. Find the person at Duke in Facebook with the most friends. Document your process.
      
      
      
      
      
      
    2. Find the person with the fewest friends. What does this mean?
      
      
      
      
  2. Amazon.com has a recommendation system that attempts to predict other items (e.g. books, CDs, and DVDs) that a user might be interested in, given the user's profile. Amazon collects data from users implicitly by keeping a record of purchases and also explicitly by asking users to rate items and note what items they already own.

    Here is a list of my Amazon recommendations (PDF). Based on the recommender system article and these recommendations, answer the following questions.

    1. Describe characteristics of the domain space of Amazon's recommender system in terms of the criteria described in Resnick and Varian's article (see Figures 2 and 3).
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    2. What kind of items do you think I bought recently?
      
      
      
      
      
      
      
      
  3. (From Marti Hearst) Different search engines use different metrics for determining document similarity. For example, Google uses the PageRank algorithm to determine the ordering of search results.

    The link below shows a visualization for comparing the search results produced by two search engines. (I've started it off with a sample query; it sometimes takes a few seconds to load.) The dots that are filled in with color are those pages that are found by both search engines; the empty dots are those pages that were found by only one of the two search engines. Each hit is shown in order of its ranking (left to right signifies top to bottom on a search page). The connecting lines signify which pages are found by both engines, and their relative rankings.

    http://www.langreiter.com/exec/yahoo-vs-google.html?q=duke+university

    Name one reason why two search engines might rank the same page differently, and one reason why they might not retrieve the same pages for a given query.

    
    
    
    
    
    
    
    
    
    

    TouchGraph

    Parts taken from Prof. Michael Kearns's Networked Life course.

  4. Before consulting any Web resources on the subject, list the ten universities that seem to you to be most similar to Duke.

    Your list

    1.                                      2.
    
    3.                                      4.
    
    5.                                      6.
    
    7.                                      8.
    
    9.                                     10.
    
    

    Now, visit TouchGraph's GoogleBrowser at http://www.touchgraph.com/TGGoogleBrowser.html and enter "www.duke.edu" as your starting URL. You should also visit Google and simply type "related:www.duke.edu" as your search query. The GoogleBrowser and the "related" search on Google.com give you two different views of the same information in Google's database.

    Take a look at the top ten universities whose Web sites seem to be the most related to Duke's, according to Google's analysis. (This is probably easier in Google itself, since results are listed in order of similarity.) The ranking of "related" pages in Google's database is "Google's list." It can be observed in several ways, but for this question simply use the Google search results for "related:www.duke.edu."

    Pick two or three of the most surprising differences between the top results on the two lists, that is, "Your list" and "Google's list."

    
    
    
    
    
    
    What factors could account for these differences? (Looking at the GoogleBrowser may help you to determine this, since it gives some visual information about who links to whom.) What things were you thinking about when you wrote your list, and what factors are important to Google's analysis?

    
    
    
    
    
    
    
    
    
  5. Pick a university on your list that was not "similar" to Duke according to Google. Find a path of similarity, either using the TouchGraph GoogleBrowser or using Google itself, between Duke and that university. Write down the path that you found. How long did it take you to find it?

    
    
    
    
    
    
    What strategy did you use to find that path?

    
    
    
    
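
Background note for Questions 1 and 5: both can be viewed as graph problems. "Most friends" and "fewest friends" correspond to the nodes with the highest and lowest degree, and a "path of similarity" is a path between two nodes. The Python sketch below is only an illustration under stated assumptions: the names and friendships are made up, the helper functions (build_graph, degree_extremes, shortest_path) are hypothetical, and nothing here reads data from Facebook, Google, or TouchGraph. It is not the expected way to answer the prelab.

```python
from collections import deque

# Hypothetical people and friendships, invented for illustration only.
people = ["ana", "ben", "cy", "dee", "eli", "fay"]
friendships = [
    ("ana", "ben"),
    ("ana", "cy"),
    ("ana", "dee"),
    ("ben", "cy"),
    ("dee", "eli"),
]


def build_graph(people, edges):
    """Build an undirected graph as a dict mapping each person to a set of friends."""
    graph = {person: set() for person in people}
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_extremes(graph):
    """Return (person with most friends, person with fewest friends)."""
    most = max(graph, key=lambda person: len(graph[person]))
    fewest = min(graph, key=lambda person: len(graph[person]))
    return most, fewest


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None if unreachable."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(graph[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                if neighbor == goal:
                    # Walk back through the parents to recover the path.
                    path = [goal]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
    return None


graph = build_graph(people, friendships)
most, fewest = degree_extremes(graph)
path = shortest_path(graph, "ben", "eli")
```

In this toy network a person with zero friends ("fay") is an isolated node, so no path reaches her. Breadth-first search is used because it finds a path with the fewest hops, which is one reasonable reading of "shortest" here.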

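Background note for Question 3: the question mentions PageRank. The Python sketch below shows a simplified textbook version of the idea (repeatedly passing each page's score along its outgoing links, with a damping factor). The four-page link structure, the function name pagerank, and the parameter values (damping of 0.85, 50 iterations) are assumptions chosen for illustration; this is not Google's actual ranking system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Every linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's damped score evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, invented for illustration only.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
top_page = max(ranks, key=ranks.get)
```

In this toy example the page with the most incoming links ("c") ends up with the highest score, and the page nobody links to ("d") ends up with the lowest.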
Last modified: Sun Feb 4 18:42:28 EST 2007