Extra Credit: Audio Scrobbler
Due: Wednesday, December 7 (max 10 points)
In this assignment, you will investigate the AudioScrobbler system used to
power the last.fm site and experiment
with music data stored on the Duke
Scrobbler site.
Part 1: Gathering Data
You should have already signed up for AudioScrobbler from Lab 3. In this part of the assignment you will:
- Follow the directions in the Duke Scrobbler documentation to log on to our version of AudioScrobbler.
- Sign up for last.fm and the CompSci 1 group. See the help page for more
information.
Writeup: Record your last.fm and facebook logins.
- Listen to at least 15 songs on your iPod or computer and upload the listening data
to both last.fm and Duke Scrobbler. The songs that you listen to
should reflect your current musical tastes.
Check your
profile page to see if your tracks are displayed. It sometimes takes
a few days for the tracks to show up on last.fm, while tracks should
show up immediately after being submitted with Duke
Scrobbler. See the last.fm help pages or send mail to scobbler-admin@cs.duke.edu for more
information.
- In a paragraph, characterize your music interests. How do
the songs that you listened to reflect those interests?
Part 2: Recommender Systems
You can view everyone's top artist and track lists. last.fm has a number of services set up so that
you can find people with similar musical interests, listen to your
favorite music, and discover new music that you should like.
Most of these services are based around the concept of a neighbor.
Your neighbors are supposed to be people with similar music taste to
you. How are neighbors calculated? Here's what they have to say:
"We have developed an especially perverted type of probabilistic latent
semantic analysis. Profiles are decomposed using a custom algorithm
based on relative popularity of items, then organised using latent
class analysis."
The authors appear to be deliberately vague here, but there is a
great deal of work on such systems. Latent
semantic analysis is often used in collaborative filtering
systems. Collaborative
filtering systems make predictions about the interests of a user by
generalizing from taste information collected by the collective user
community. AudioScrobbler is a type of recommender
system that collects data on user behavior and uses
collaborative filtering to recommend other songs. The
details of latent semantic analysis may be beyond the scope of this
course, but the general ideas are still somewhat accessible.
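As a rough illustration of the general idea, here is a minimal sketch of user-based collaborative filtering in Python. It is not last.fm's algorithm and it does not use latent semantic analysis; it simply compares play-count vectors with cosine similarity. The `plays` data, the user names, and the function names `cosine` and `recommend` are all made up for this example.

```python
from math import sqrt

# Hypothetical play counts: user -> {artist: number of plays}.
plays = {
    "alice": {"Radiohead": 10, "Beck": 5},
    "bob": {"Radiohead": 8, "Beck": 4, "Wilco": 6},
    "carol": {"Outkast": 9, "Wilco": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse play-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, plays):
    """Rank artists the user has not played, weighting each other
    user's play counts by that user's similarity to `user`."""
    scores = {}
    for other, counts in plays.items():
        if other == user:
            continue
        sim = cosine(plays[user], counts)
        for artist, n in counts.items():
            if artist not in plays[user]:
                scores[artist] = scores.get(artist, 0.0) + sim * n
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("alice", plays))
```

In this toy data set, bob's listening overlaps heavily with alice's, so the artist that only bob plays is ranked first for alice, while carol, who shares no artists with alice, contributes nothing to the scores.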
There are two good survey papers on Blackboard:
- Paul Resnick and Hal R. Varian. Recommender Systems (Introduction to
special section). Communications of the ACM, 40(3):56-58,
March 1997.
- Adomavicius, G. and Tuzhilin, A. Toward the next generation of
recommender systems: a survey of the state-of-the-art and possible
extensions. IEEE Transactions on Knowledge and Data
Engineering, 17(6):734-749, 2005.
Using the information from the papers, the data from Duke Scrobbler and the last.fm profiles,
and the help pages and forums on the
last.fm site, answer as many of the following questions as you can,
to the best of your ability.
Duke Scrobbler
- Who would you rank as your closest neighbors on Duke Scrobbler? Why?
- Given the data that is available from Duke Scrobbler and the readings,
describe an algorithm that computes a metric of how
similar two users' musical tastes are.
You should use the View TASTE file utility from Duke Scrobbler to see what the data would look
like for your proposed algorithm. Taste is a system for
making recommendations based on different collaborative filtering
algorithms.
Last.fm
- How similar are your musical tastes to those expressed by your
top neighbor in last.fm? Are many of the songs listed in his or her profile in your
collection as well? What percentage of the Top Tracks - Overall listed in his or
her profile do you own or listen to frequently? What about Top
Artist - Overall?
- If you have a neighbors page, go to it and note your neighbors. Based on the observed neighbor assignments and the readings on recommender
systems, how do you think the match values are calculated? What
information is used from the playlist? What information is used
from the actual music files? I am
not interested in the absolute match values, but rather the
relative values. For example, in what kind of system would
profile A be a closer match than profile B, and vice versa?
- Design an experiment to test your hypothesis from the previous
question.
- Who would be your neighbors from the CompSci 1 group? How could you
create a graph of the connections between users? What would the
vertices and edges represent?
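One possible way to think about the graph question, sketched in Python: treat each user as a vertex and connect two users with a weighted edge when their taste similarity passes a threshold. Everything here is an assumption for illustration: the `top_artists` data, the choice of Jaccard similarity on top-artist sets, the `threshold` value, and the function names. Your own answer may define vertices and edges differently.

```python
from itertools import combinations

# Hypothetical top-artist sets for members of a group.
top_artists = {
    "alice": {"Radiohead", "Beck", "Wilco"},
    "bob": {"Radiohead", "Beck", "Outkast"},
    "carol": {"Outkast", "Kanye West"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(top_artists, threshold=0.2):
    """Return an adjacency map: user -> {neighbor: similarity}.

    Vertices are users; an undirected edge joins two users whose
    Jaccard similarity is at least `threshold`."""
    graph = {user: {} for user in top_artists}
    for u, v in combinations(sorted(top_artists), 2):
        weight = jaccard(top_artists[u], top_artists[v])
        if weight >= threshold:
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


graph = build_graph(top_artists)
for user, neighbors in graph.items():
    print(user, neighbors)
```

With this representation, a user's neighbors are the keys of that user's adjacency map, and raising or lowering `threshold` makes the graph sparser or denser.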
Feedback
- What do you think of the Duke Scrobbler interface and documentation?
How does it compare to last.fm? Would you recommend the use of
Duke Scrobbler to a friend? What tasks did you find
particularly difficult or confusing?
Submitting
Write up your results and submit them as either a Word or PDF document via Blackboard.