Carleton Comps Project: Netflix Prize

Project: Netflix Prize

Advisor: Dave Musicant

Final Results

I. Background

Recommender systems have been around for a while. Commercial websites such as amazon.com recommend what they think you should buy, based on your purchasing history. Internet radio stations such as last.fm and Pandora select music that they think you will like, based on your past ratings of songs. All of these systems use an idea called collaborative filtering, where positive ratings by other users are used to influence the recommendations made to you.

In 2006, the Netflix Prize contest was announced. Netflix has offered to award one million dollars to the team that can make a 10% improvement in the quality of Netflix's movie recommendations. While the grand prize has not been awarded yet, Netflix offers a $50,000 Progress Prize each year to the team whose results show the most improvement over the previous year's best result.

The Netflix Prize has been underway for a few years now, and there has been significant work on the project. Netflix maintains a series of forums for people to post questions and ideas. Each team that wins a Progress Prize is required to post a detailed paper describing its technique, so there is a wealth of information available about the techniques that people use. At this point, it is very challenging to improve on the results that others have achieved. Nonetheless, teams are doing it. Regardless, implementing algorithms that have been successful (and even unsuccessful) will teach you a lot about how this sort of thing is done. The experience you gain here would be exceedingly useful if you apply for a job at a company that uses collaborative filtering technology. Most online retailers are interested in these ideas.

Can your team win a Progress Prize, or the Grand Prize of 1 million dollars?

II. The Project

For this project, you will implement a system to manage the Netflix training data, run your algorithms, and test your results. Specifically, you'll need to do the following:

Implement a framework and API for accessing the Netflix data. This data is BIG: you can fit it all in memory, but it's going to take some work to make it fast and accessible. There are some open-source frameworks for doing this for the Netflix data (e.g., pyflix), but you'll avoid these and build your own. Handling raw data of this size that you need to access very quickly is challenging, and grappling with this challenge yourself is a worthwhile experience.
Summarize and report on the progress and state-of-the-art of the Netflix Prize.
Learn about and implement a variety of recommender system algorithms. The most successful algorithms on Netflix's Leaderboard use variations on Singular Value Decomposition (SVD), but people have used a variety of other algorithms with less success. For purposes of learning how recommendation systems work, it will be valuable for you to implement a number of these techniques as well.
Decide on a programming environment to do all of the above. Perhaps Python is sufficient; perhaps not. Carefully research how much memory you'll need and what you'll need to be able to do quickly. Algorithms such as SVD are provided by numerical computing libraries (such as numpy), but those libraries may not scale to data of this size.
Because the Netflix data is so large, and because you will be dependent on Netflix to validate how well your algorithm is working, you may need a smaller, more accessible dataset with which to experiment. The MovieLens datasets come in a variety of sizes and have friendly licenses.
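
As a rough illustration of the data-access item in the list above, here is a minimal sketch of one way to hold ratings compactly in memory using only Python's standard library. The class name RatingStore, its method names, and the (user, movie, rating) tuple layout are all hypothetical, and the sketch assumes movie IDs fit in 16 bits and ratings fit in 8 bits. It is one possible starting point, not a recommended design.

```python
from array import array
from bisect import bisect_left, bisect_right


class RatingStore:
    """Hypothetical store: three parallel typed arrays, sorted by user ID."""

    def __init__(self, triples):
        # triples: iterable of (user_id, movie_id, rating) tuples (assumed layout).
        # Sorting in memory is fine for a sketch; a full-size loader would
        # likely need a more careful approach.
        rows = sorted(triples)
        self.users = array("i", (u for u, _, _ in rows))    # typically 4 bytes each
        self.movies = array("h", (m for _, m, _ in rows))   # 2 bytes each (assumes small IDs)
        self.ratings = array("b", (r for _, _, r in rows))  # 1 byte each

    def for_user(self, user_id):
        """Return (movie_id, rating) pairs for one user, found by binary search."""
        lo = bisect_left(self.users, user_id)
        hi = bisect_right(self.users, user_id)
        return list(zip(self.movies[lo:hi], self.ratings[lo:hi]))

    def nbytes(self):
        """Bytes used by the three arrays (excluding Python object overhead)."""
        arrays = (self.users, self.movies, self.ratings)
        return sum(len(a) * a.itemsize for a in arrays)


# Made-up ratings, for illustration only.
store = RatingStore([(7, 10, 4), (3, 11, 5), (7, 12, 1), (5, 10, 3)])
print(store.for_user(7))
print(store.nbytes())
```

Typed arrays avoid the per-object overhead of a list of tuples, which matters when deciding whether Python is sufficient. If the full dataset holds on the order of 100 million ratings (a figure assumed here, not stated above), about 7 bytes per rating comes to roughly 700 MB for this layout.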

One relatively unusual aspect of this comps project is that there is no "front end" to the software that you create: you'll need to do a lot of coding to implement both the underlying engine and the recommendation algorithms, but your output is simply going to be an error measurement.
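
To make the idea of an error measurement concrete, here is a small sketch that assumes the measurement is root mean squared error (RMSE), the metric the Netflix Prize used for scoring. The function names and the sample ratings are made up for illustration, and the global-mean predictor is only a trivial baseline to compare real algorithms against.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def global_mean_baseline(train_ratings, n):
    """Predict the mean training rating for each of n held-out ratings."""
    mean = sum(train_ratings) / len(train_ratings)
    return [mean] * n


# Made-up ratings, for illustration only.
train = [4, 5, 1, 3, 2]
held_out = [3, 5, 2]
preds = global_mean_baseline(train, len(held_out))
score = rmse(preds, held_out)
print(round(score, 4))
```

Holding out part of a dataset such as MovieLens and scoring predictions this way gives you a local check that does not depend on submitting results to Netflix.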

III. References

The most useful references will be the forum postings on the Netflix Prize website. The Progress Prizes and Leaderboard will point you toward what has been most successful, and many other postings will point toward other algorithms. Here are some older, more generic recommender system papers: you might find them useful, you might not.

J. Herlocker, J. Konstan, L. Terveen and J. Riedl. Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems 22(1), pp. 5-53, January 2004.

B. Miller, J. Konstan, L. Terveen and J. Riedl. PocketLens: Towards a Personal Recommender System. ACM Transactions on Information Systems 22(3), pp. 437-476, July 2004.

J. Herlocker, J. Konstan and J. Riedl. Explaining Collaborative Filtering Recommendations. Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work, December 2-6, 2000, pp. 241-250.

J. Breese, D. Heckerman, C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence.