Compsci 6, Spring 2012, Recommender How-to

Recommender guidelines and detailed information.

For data files see the code directory or snarf the assignment. You are not given any already-written Python code.

Reading Data

Each data-reading module you write has a function get_data that returns two values: a list and a dictionary. You must write two modules: one for reading data about books/ratings, one for movies/ratings. An overview and example of get_data follows. More details are in the details pages.

Book Ratings

You're given a file books.txt of 55 books and a file bookratings.txt of ratings for each book by 86 students/raters. The ratings for each student are given as 55 numbers where the ith number is the rating for the ith book in books.txt. See details for more information.
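As a rough illustration of the list-and-dictionary shape that get_data returns, here is a minimal sketch for the book data. The file layout is an assumption, not something stated here: it supposes books.txt holds one book per line and bookratings.txt alternates a rater's name on one line with that rater's space-separated ratings on the next. The helper names parse_books and parse_book_ratings are hypothetical; check the details pages for the real format before relying on any of this.

```python
def parse_books(book_lines):
    """Return the list of books, one per non-blank line (assumed layout)."""
    return [line.strip() for line in book_lines if line.strip()]


def parse_book_ratings(rating_lines, num_books):
    """Build {rater: [int, ...]} from alternating name / ratings lines.

    ASSUMPTION: a rater's name is on one line and that rater's ratings,
    separated by whitespace, are on the following line.
    """
    lines = [line.strip() for line in rating_lines if line.strip()]
    rdict = {}
    for i in range(0, len(lines) - 1, 2):
        name = lines[i]
        ratings = [int(x) for x in lines[i + 1].split()]
        if len(ratings) != num_books:
            raise ValueError("%s has %d ratings, expected %d"
                             % (name, len(ratings), num_books))
        rdict[name] = ratings
    return rdict


def get_data(book_file="books.txt", rating_file="bookratings.txt"):
    """Return (list of books, dictionary of rater -> list of ratings)."""
    with open(book_file) as f:
        books = parse_books(f)
    with open(rating_file) as f:
        rdict = parse_book_ratings(f, len(books))
    return books, rdict
```

Splitting the parsing from the file opening makes it possible to try the helpers on a few hand-typed lines before running them on the full files.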

Movie Ratings

The ratings for movies are stored in movieratings.txt in a different format than the book ratings. This file stores more than 22,000 ratings by 453 student-raters for 150 movies. Because the data is sparse, meaning most students don't rate all 150 movies, you'll need to process/read the file differently to create and return the requisite dictionary. For example, in the dictionary each list associated with a student/key must have 150 entries if there are 150 movies. Don't hard-code the value 150; calculate it from what you read in the file.

You'll need to think about how to read this data. There are many ways to do so, but in your code you won't know the total number of movies until after you've read the entire file. You'll need to return the list of movies from get_data; you can create this list using sets and list functions. You'll either need to read the file twice, or store the data in some way as it is read so you can create the dictionary you'll return after reading the entire file.

So, you can either read the file twice, or store a (movie,rating) tuple or list for each line read and process these pairs when creating the dictionary you return. For example, after creating a list of 150 movies you can find the index of the movie "Eclipse" using code like movielist.index("Eclipse").
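The store-then-process approach described above might be sketched as follows. The line format is an assumption made only for illustration: each line is taken to be rater, movie, rating separated by commas, and a movie a student did not rate is filled in as 0. Neither choice is stated here, so check the details pages for the real format. The function name parse_movie_ratings is hypothetical.

```python
def parse_movie_ratings(lines, sep=","):
    """Return (sorted movie list, {rater: [int, ...]}) from rating lines.

    ASSUMPTIONS: each line is "rater<sep>movie<sep>rating", and an
    unrated movie is recorded as 0.
    """
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # First field is the rater, last field is the rating; whatever is
        # in between is the movie title, even if it contains the separator.
        rater, rest = line.split(sep, 1)
        movie, rating = rest.rsplit(sep, 1)
        triples.append((rater.strip(), movie.strip(), int(rating)))

    # The number of movies is only known once every line has been read.
    movies = sorted(set(movie for _, movie, _ in triples))
    position = {movie: i for i, movie in enumerate(movies)}

    rdict = {}
    for rater, movie, rating in triples:
        if rater not in rdict:
            rdict[rater] = [0] * len(movies)
        rdict[rater][position[movie]] = rating
    return movies, rdict
```

The position dictionary plays the same role as calling movies.index(movie) for each rating; with more than 22,000 ratings it avoids scanning the movie list each time, but either gives the same result.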

You'll need to think about how to read, store, and ultimately return a list and a dictionary such that, if there are N movies, rdict[x] is a list of N int values for each rater x that's a key in the dictionary.
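One way to confirm that whatever get_data returns has this shape is a small self-check like the sketch below. The function check_data is hypothetical and not part of the assignment; it only tests the property stated above.

```python
def check_data(items, rdict):
    """Return True if every rater's list has exactly one int per item."""
    n = len(items)
    for rater, ratings in rdict.items():
        if len(ratings) != n:
            return False
        if not all(isinstance(r, int) for r in ratings):
            return False
    return True
```

Calling check_data on the two values returned by get_data before moving on to the rating functions can catch a list that is too short or ratings left as strings.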


Rating Items

You'll need to include a minimum of three functions in the module Recommender.py that you write. You'll also need to call these functions and document the results you get. You must write functions with exactly these signatures (parameters and return values).