Introduction
Recommender Systems
Recommender systems are a hot topic in this age of immense data and web marketing. Shopping online is ubiquitous, but online stores, while eminently searchable, lack the browsing experience of the brick-and-mortar variety. Visiting a bookstore in person, a customer can wander over to the science fiction section and casually look around without a particular author or title in mind. Online stores often offer a browsing option, and some even allow browsing by genre, but the number of options available is still overwhelming.
Commercial sites try to counteract this overload by showing special deals, new options, and staff favorites, but the best marketing angle would be to recommend items that the user is likely to enjoy or need. Unless online stores want to hire psychics, they need a new technology. Recommender systems, a developing area of research within data mining, fit the bill.
Recommender systems use information about a user's past behavior, together with consumption patterns across all users, to recommend new items to that user. Some systems incorporate information about the items in question; others are based only on usage patterns. The latter kind is known as a collaborative filtering system. Instead of asking the user to explicitly pick filters for a search, collaborative filtering uses information about the user's past behavior and the behavior of similar users to make suggestions.
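To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in Python. It is an illustration only, not the system our group built: the toy ratings, the function names, and the choice of cosine similarity with a similarity-weighted average are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann":  {"Dune": 5, "Neuromancer": 4, "Emma": 1},
    "bob":  {"Dune": 4, "Neuromancer": 5, "Foundation": 5},
    "cara": {"Emma": 5, "Persuasion": 4, "Dune": 1, "Foundation": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity restricted to the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank items the user has not rated by a similarity-weighted
    average of the ratings given by other users."""
    weighted_sum = {}  # item -> sum of (similarity * rating)
    weight_total = {}  # item -> sum of similarities
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # no overlap with this user, so no evidence
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not seen
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    predictions = {
        item: weighted_sum[item] / weight_total[item] for item in weighted_sum
    }
    # Highest predicted rating first.
    return sorted(predictions.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for item, score in recommend("ann", ratings):
        print(f"{item}: {score:.2f}")
```

No filters are chosen by the user and nothing about the items themselves is consulted; the ranking comes entirely from usage patterns. With this toy data, "ann" agrees closely with "bob" and disagrees with "cara", so bob's high rating of Foundation outweighs cara's low one and Foundation ranks ahead of Persuasion.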
Our Project
Applications of recommender systems can be found outside the online retail trade, although that is one of the most popular places to find them. The comprehensive exercise ("comps") assignment for our group was to build a collaborative filtering system to recommend courses for students at Carleton College. The end product would allow a current student to enter his or her transcript and, based on which classes had been taken and what grades had been earned, would return a list of classes in which the student would potentially do well. Clearly, there are ethical issues at stake here, as the group would need access to old transcript data from real Carleton students in order to build a working recommender. Preserving privacy with respect to both the comps group and the end user was a serious challenge. After exploring several anonymity-preserving algorithms, the group expanded the project to include datasets of movie ratings provided by MovieLens and Netflix. Also available is an account of the challenges we faced in terms of data storage and the solutions we used.