The goals of this lab are:
Here is a list of my Amazon recommendations (PDF). Based on the recommender system article and these recommendations, answer the following questions.
The link below shows a visualization that compares the search results produced by two search engines. (I've started it off with a sample query; it sometimes takes a few seconds to load.) Dots filled in with color are pages found by both search engines; empty dots are pages found by only one of the two. Each hit is plotted in order of its ranking, so left to right corresponds to top to bottom on a results page. The connecting lines join each page found by both engines and show its relative ranking in each.
http://www.langreiter.com/exec/yahoo-vs-google.html?q=duke+university
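The comparison the visualization draws can be sketched roughly as follows. This is only an assumption about what such a tool computes, not its actual code, and the URL lists are made-up placeholders rather than real search results. Pages in `shared` correspond to the filled dots (with their rank in each engine, i.e. the two ends of a connecting line), and `only_a` / `only_b` correspond to the empty dots.

```python
def compare_rankings(results_a, results_b):
    """Compare two ranked lists of URLs (assumed to contain no duplicates).

    Returns (shared, only_a, only_b), where shared maps each URL found by
    both engines to its (rank in A, rank in B), counting from 1 at the top.
    """
    rank_a = {url: i for i, url in enumerate(results_a, start=1)}
    rank_b = {url: i for i, url in enumerate(results_b, start=1)}
    # Filled dots: pages both engines returned, with both of their ranks.
    shared = {url: (rank_a[url], rank_b[url]) for url in results_a if url in rank_b}
    # Empty dots: pages only one engine returned, kept in ranked order.
    only_a = [url for url in results_a if url not in rank_b]
    only_b = [url for url in results_b if url not in rank_a]
    return shared, only_a, only_b


# Hypothetical result lists for the query "duke university".
engine_a = ["duke.edu", "dukechronicle.com", "dukehealth.org", "goduke.com"]
engine_b = ["duke.edu", "goduke.com", "en.wikipedia.org/wiki/Duke_University"]

shared, only_a, only_b = compare_rankings(engine_a, engine_b)
print(shared)  # {'duke.edu': (1, 1), 'goduke.com': (4, 2)}
print(only_a)  # ['dukechronicle.com', 'dukehealth.org']
print(only_b)  # ['en.wikipedia.org/wiki/Duke_University']
```

A page like `goduke.com` above, ranked 4th by one engine and 2nd by the other, is the kind of case the slanted connecting lines highlight.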
Name one reason why two search engines might rank the same page differently, and one reason why they might not retrieve the same pages for a given query.
Before consulting any Web resources on the subject, list the ten universities that seem to you to be most similar to Duke.
Your list
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
Now visit TouchGraph's GoogleBrowser at http://www.touchgraph.com/TGGoogleBrowser.html and enter "www.duke.edu" as your starting URL. You should also visit Google and type "related:www.duke.edu" as your search query. The GoogleBrowser and the "related:" search on Google.com give you two different visualizations of the same information in Google's database.
Take a look at the top ten universities whose Web sites are most related to Duke's according to Google's analysis. (This is probably easier in Google itself, since results are listed in order of similarity.) Call this ranking of "related" pages in Google's database "Google's list." It can be observed in several ways, but for this exercise just use the Google search results for "related:www.duke.edu."
Pick two or three of the most surprising differences between the top results on the two lists, "Your list" and "Google's list."
What factors could account for these differences? (Looking at the GoogleBrowser may help, since it gives some visual information about who links to whom.) What were you thinking about when you wrote your list, and what factors are important to Google's analysis?
Pick a university on your list that was not "similar" to Duke according to Google. Using either the TouchGraph GoogleBrowser or Google itself, find a path of similarity between Duke and that university. Write down the path that you found. How long did it take you to find it?
What strategy did you use to find that path?