Memory-based algorithms
Memory-based algorithms approach the collaborative filtering problem by using the entire database. As described by Breese et al. [1], they try to find users who are similar to the active user (the user we want to make predictions for) and use those users' preferences to predict ratings for the active user.
This page covers the general ideas; for specific equations and implementations, consult the Breese et al. paper and/or our code.
Similarity Measurements
In order to measure similarity, we want to find the correlation between two users. This gives us a value from -1 to 1 that describes how alike the two users are. A value of 1 means they rate in exactly the same manner, whereas a value of -1 means they rate in exactly opposite ways (i.e. one high, the other low, or vice versa).
We used two similarity measurements. The first is the Pearson correlation coefficient, the standard correlation statistic for samples adapted to rating data. It measures how much two users vary together from their normal votes - that is, the direction and magnitude of each user's vote relative to that user's voting average. If they vary in the same way on the items they have rated in common, they get a positive correlation; if they vary in opposite ways, they get a negative correlation.
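As an illustration (a minimal sketch, not our actual implementation), the Pearson weight between two users might look like this, assuming each user's ratings are stored as a dictionary mapping item IDs to votes:

```python
import math

def pearson_weight(ratings_a, ratings_b):
    """Pearson correlation between two users (dicts: item ID -> vote).
    Deviations are taken from each user's own voting average and summed
    over the items the two users rated in common."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) * \
          math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0
```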
The other similarity measurement is vector similarity. We can treat two users as vectors in n-dimensional space, where n is the number of items in the database. As with any two vectors, we can compare the angle between them. Intuitively, if the two vectors generally point in the same direction, the users get a positive similarity; if they point in opposite directions, they get a negative similarity. To compute this we take the cosine of the angle between the two vectors, which gives us a value from -1 to 1.
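A sketch of vector similarity in the same style (again an illustration under the same dictionary representation, with unrated items treated as zero):

```python
import math

def cosine_weight(ratings_a, ratings_b):
    """Vector (cosine) similarity between two users. Unrated items count as
    zero, so only co-rated items contribute to the dot product; each user's
    votes are normalized by that user's overall vector length."""
    dot = sum(ratings_a[i] * ratings_b[i] for i in set(ratings_a) & set(ratings_b))
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```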
Predicting Ratings
In order to predict a rating for an item for an active user, we first compute the weights between the active user and all other users. We then take all users with non-zero weights and have each of them "vote" on what they think the active user should rate the item. Users with higher weights count more in the voting process. Once these votes are tallied, we have a predicted vote.
Note that the voting is based on how far a user's rating of an item deviates from that user's average - that is, we predict how far from the active user's average the active user will rate the item. Thus, with a positive correlation, the active user agrees with the other user's deviation on a particular item; with a negative correlation, the active user disagrees (i.e. moves in the opposite direction from the other user's vote).
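A sketch of this prediction step, building on the weight functions above (the function name and the fallback to the active user's average when no overlapping raters exist are our choices for illustration, not taken from the paper):

```python
def predict_vote(active, others, item, weight_fn):
    """Predict the active user's vote on `item`: start from the active user's
    mean vote and add a weighted average of how far each other user's vote on
    `item` deviates from that user's own mean. `weight_fn` is one of the
    similarity functions above."""
    mean_active = sum(active.values()) / len(active)
    weighted_sum, normalizer = 0.0, 0.0
    for other in others:
        if item not in other:
            continue  # this user has not rated the item and cannot vote
        w = weight_fn(active, other)
        if w == 0:
            continue
        mean_other = sum(other.values()) / len(other)
        weighted_sum += w * (other[item] - mean_other)
        normalizer += abs(w)
    if normalizer == 0:
        return mean_active  # no informative neighbours; fall back to the average
    return mean_active + weighted_sum / normalizer
```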
Enhancements
This works relatively well, but there are some enhancements we can make to the weighting techniques.
- Default Voting: It turns out that correlation, as a similarity measurement, does not work very well on sparse data sets: when two users have only a few items in common, their weight tends to be over-emphasized. Default voting simply adds a number of imaginary items that both users have rated in common in order to smooth the weights.
- Inverse User Frequency: The intuition here is that universally popular items tell us less about similarity than rarer items. If everyone liked Star Wars, two users both liking it does not help determine their weight as much as both of them enjoying an obscure independent film. To take this into account, we transform each vote when weighting two users so that commonly rated items are given less importance (see the sketch after this list).
- Case Amplification: This one is simple - we raise each weight to a power so that high weights get higher and low weights get lower (see the sketch after this list). It tends not to help much, but it gives a tiny improvement.
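Two of these transforms are small enough to sketch directly; default voting is omitted here since the number of imaginary items is an implementation detail. These are illustrative helpers, not our exact code:

```python
import math

def amplify(weight, rho=2.5):
    """Case amplification: raise the magnitude of a weight to the power rho
    while keeping its sign, so strong weights dominate and weak ones fade.
    rho = 2.5 is the value reported in the Breese et al. experiments."""
    return weight * abs(weight) ** (rho - 1)

def inverse_user_frequencies(item_counts, n_users):
    """Inverse user frequency factors, log(n / n_j), where n_j is the number
    of users who rated item j. Votes are scaled by these factors before
    weighting, so universally rated items contribute less to the weight."""
    return {item: math.log(n_users / count) for item, count in item_counts.items()}
```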
Advantages
- The quality of the predictions is rather good.
- This is a relatively simple algorithm to implement for any situation.
- It is very easy to update the database: since the algorithm uses the entire database every time it makes a prediction, new ratings are taken into account immediately and nothing needs to be retrained.
Disadvantages
- It uses the entire database every time it makes a prediction, so the whole database needs to be held in memory; otherwise it is very, very slow.
- Even when the database fits in memory, scanning all of it for every prediction is still very slow.
- It can sometimes fail to make a prediction for a given active user and item. This occurs when the active user has no items in common with any of the users who have rated the target item.
- It overfits the data: it treats all of the random variability in people's ratings as meaningful signal, which can be a real problem. In other words, memory-based algorithms do not generalize the data at all.
References
[1] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998.