Recommender systems have been around for a while. Commercial websites such as amazon.com recommend what they think you should buy, based on your purchasing history. Internet radio stations such as last.fm and Pandora select music that they think you will like, based on your past ratings of songs. All of these systems use an idea called collaborative filtering, in which positive ratings by other users influence the recommendations made to you.
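To make the idea of collaborative filtering concrete, here is a minimal sketch of one common variant: user-based filtering with cosine similarity. The toy data, the user and item names, and the functions `cosine_similarity` and `predict` are all made up for illustration. This is not how any of the sites mentioned above actually work.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann":  {"A": 5, "B": 4, "C": 1},
    "bob":  {"A": 5, "B": 5, "C": 1, "D": 5},
    "cara": {"A": 1, "B": 2, "C": 5, "D": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = cosine_similarity(ratings[user], theirs)
        weighted_sum += w * theirs[item]
        weight_total += w
    return weighted_sum / weight_total if weight_total else None

# ann has not rated D; bob (similar tastes) loved it, cara (different tastes) did not.
print(predict("ann", "D", ratings))
```

Because ann's ratings look much more like bob's than cara's, bob's rating of D carries more weight in the prediction. The algorithms that do well on real data are considerably more elaborate, but many start from this same intuition.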
In 2006, the Netflix Prize contest was announced. Netflix has offered to award one million dollars to the team that can make a 10% improvement in the quality of Netflix's movie recommendations. While the grand prize has not yet been awarded, Netflix offers a $50,000 Progress Prize each year to the team whose results show the most improvement over the previous year's best result.
The Netflix Prize has been underway for a few years now, and there has been significant work on the project. Netflix maintains a series of forums for people to post questions and ideas. Each team that wins a Progress Prize is required to post a detailed paper describing its technique, so there is a wealth of information available about the techniques that people use. At this point, it is very challenging to improve on the results that others have achieved. Nonetheless, teams are doing it. Regardless, implementing algorithms that have been successful (and even unsuccessful) will teach you a lot about how this sort of thing is done. The experience you gain here would be exceedingly useful if you apply to a company that uses collaborative filtering technology. Most online retailers are interested in these ideas.
Can your team win a Progress Prize, or the Grand Prize of 1 million dollars?
For this project, you will implement a system to manage the Netflix training data, run your algorithms, and test your results. Specifically, you'll need to do the following:
One relatively unusual aspect of this comps project is that there is no "front end" to the software that you create: you'll need to do a lot of coding to implement both the underlying engine and the recommendation algorithms, but your output is simply going to be an error measurement.
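The Netflix Prize scores submissions by root mean squared error (RMSE) between predicted and actual ratings, so that is the error measurement your system will most likely report. The following is a minimal sketch of that computation, along with the relative improvement over a baseline. The function names and the sample ratings are hypothetical and are not taken from the contest data.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))

def improvement(baseline_rmse, new_rmse):
    """Fractional reduction in error relative to a baseline (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical held-out ratings and two sets of predictions for them.
actual = [4, 3, 5, 1]
baseline = [3, 3, 3, 3]   # always predict a middling rating
better = [4, 3, 4, 2]     # predictions from some improved algorithm

base_err = rmse(baseline, actual)
new_err = rmse(better, actual)
print(base_err, new_err, improvement(base_err, new_err))
```

Lower RMSE is better. A 10% improvement in this sense means the new error is at most 90% of the baseline error.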
The most useful references will be the forum postings on the Netflix Prize website. The Progress Prizes and Leaderboard will point you toward what has been most successful, and many other postings will point you toward other algorithms. Here are some references to older, more generic recommender system papers: you might find them useful, you might not.
J. Herlocker, J. Konstan, L. Terveen and J. Riedl. Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems 22(1), pp. 5-53, January 2004.
B. Miller, J. Konstan, L. Terveen and J. Riedl. PocketLens: Towards a Personal Recommender System. ACM Transactions on Information Systems 22(3), pp. 437-476, July 2004.
J. Herlocker, J. Konstan and J. Riedl. Explaining Collaborative Filtering Recommendations. Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work, pp. 241-250, December 2-6, 2000.
J. Breese, D. Heckerman, C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence.