Carleton Comps Project: Book Recommender

Advisor: Anna Rafferty

Background

Recommendation systems are incredibly common online, suggesting everything from products you might like to movies with personalized ratings. Most of these systems make predictions based on the past behavior of other users (an approach known as collaborative filtering). However, some recommendation systems work directly with the content being recommended, such as sites that analyze the text of news articles to bring you the most relevant ones.
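To make the distinction concrete, here is a minimal Python sketch built on tiny, made-up data. It uses one cosine similarity function to compare two books in two ways: by the ratings other readers gave them (the behavior-based view) or by the words in their descriptions (the content-based view). The titles, ratings, descriptions, and helper names (`rating_vector`, `text_vector`) are hypothetical illustrations, not a real dataset or API.

```python
import math
from collections import Counter

# Hypothetical toy data: reader -> {book title: rating on a 1-5 scale}.
ratings = {
    "ann": {"Dune": 5, "Foundation": 4, "Emma": 1},
    "ben": {"Dune": 4, "Foundation": 5},
    "cara": {"Emma": 5, "Persuasion": 4, "Dune": 2},
}

# Hypothetical toy descriptions standing in for blurbs or reviews.
descriptions = {
    "Dune": "desert planet empire spice politics",
    "Foundation": "galactic empire decline science politics",
    "Emma": "village matchmaking manners romance",
    "Persuasion": "navy captain manners romance",
}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rating_vector(book):
    """Behavior-based view: a book is the set of ratings readers gave it."""
    return {reader: r[book] for reader, r in ratings.items() if book in r}


def text_vector(book):
    """Content-based view: a book is a bag of words from its description."""
    return Counter(descriptions[book].split())


print(cosine(rating_vector("Dune"), rating_vector("Foundation")))  # about 0.93
print(cosine(text_vector("Dune"), text_vector("Foundation")))  # about 0.4
```

This sketch treats unrated books as zeros and weights every word equally, which keeps it short. A real system would more likely mean-center ratings and use TF-IDF weighting over a much larger vocabulary.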

A few book recommendation systems already exist, such as GoodReads or whatshouldireadnext.com, but there is plenty of room for improvement. These systems rely primarily on user behavior or publisher-provided genres rather than using natural language processing to make sense of additional information such as reviews. Books are a challenging domain because there are so many of them and most existing datasets are relatively sparse. For example, readers tend to rate a much smaller fraction of the available books than movie watchers do of the available movies.

The large amount of data available about books (short description paragraphs on Amazon, professional reviews, reader reviews on sites like Amazon or GoodReads, ratings data) has great potential to help readers find their next book. It could also be used to create representations of books that let readers explore and visualize the various types of books and how they relate to one another. I envision a system that can recommend books to me based on a few books or authors I select, and that can specifically find books likely to be similar to them.

The project

In more detail, in this project you will:

Study the literature on clustering and recommendation systems.
Develop an engine that takes a few books or authors from a user and displays an ordered list of suggested next books to read (perhaps with the system's predictions of the user's ratings of these books).
Evaluate your engine's performance.
(If time permits) Create a visualization that allows the user to easily see the structure of the space of books.

Activities will include evaluating the available data, using natural language processing to extract information from book reviews or summaries, implementing a recommendation system, and determining how to integrate textual information with behavioral data (e.g., matrices of which books have been rated by which readers). A major part of this project will be researching what has already been done in this space and then building on that work. The primary goal is to create a system that successfully recommends relevant books, including evaluating how well it predicts users' ratings on an existing book review dataset.
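One simple way to integrate textual and behavioral data is to blend a ratings-based similarity with a text-based similarity using a weight. A reader's rating of a book can then be predicted as a similarity-weighted average of that reader's ratings of other books (item-based prediction). The Python sketch below does this on made-up data and evaluates it by hiding each known rating in turn and reporting root-mean-square error (RMSE). The data, the `alpha` weight, the 3.0 fallback, and the names `hybrid_similarity`, `predict`, and `rmse_leave_one_out` are all hypothetical. This is one possible baseline, not a prescribed method.

```python
import math
from collections import Counter

# Hypothetical toy data; a real project would load an existing review dataset.
ratings = {
    "ann": {"Dune": 5, "Foundation": 4, "Emma": 1},
    "ben": {"Dune": 4, "Foundation": 5, "Persuasion": 2},
    "cara": {"Emma": 5, "Persuasion": 4, "Dune": 2},
}
descriptions = {
    "Dune": "desert planet empire spice politics",
    "Foundation": "galactic empire decline science politics",
    "Emma": "village matchmaking manners romance",
    "Persuasion": "navy captain manners romance",
}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_similarity(x, y, alpha=0.5):
    """Blend ratings-based and text-based similarity; alpha is a hypothetical weight."""
    behavioral = cosine(
        {u: r[x] for u, r in ratings.items() if x in r},
        {u: r[y] for u, r in ratings.items() if y in r},
    )
    textual = cosine(Counter(descriptions[x].split()), Counter(descriptions[y].split()))
    return alpha * behavioral + (1 - alpha) * textual


def predict(user, book, alpha=0.5):
    """Similarity-weighted average of the user's ratings of other books."""
    num = den = 0.0
    for other, rating in ratings[user].items():
        if other == book:
            continue
        sim = hybrid_similarity(book, other, alpha)
        num += sim * rating
        den += sim  # similarities here are never negative
    return num / den if den > 0 else 3.0  # assumed fallback: midpoint of a 1-5 scale


def rmse_leave_one_out(alpha=0.5):
    """Hide each known rating in turn, predict it, and report RMSE."""
    squared_errors = []
    for user in ratings:
        for book in list(ratings[user]):
            true_rating = ratings[user].pop(book)  # hide it so it cannot leak into the prediction
            squared_errors.append((predict(user, book, alpha) - true_rating) ** 2)
            ratings[user][book] = true_rating  # restore before moving on
    return math.sqrt(sum(squared_errors) / len(squared_errors))


for a in (0.0, 0.5, 1.0):
    print(f"alpha={a}: RMSE={rmse_leave_one_out(a):.3f}")
```

Comparing `rmse_leave_one_out(alpha)` across several values of `alpha` is one way to check whether the text adds anything beyond the ratings matrix. On a real dataset, `alpha` should be chosen on a separate validation split rather than on the ratings used to report the final error.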

Once you have a basic recommendation system working with some evaluation of how well it performs, there are lots of places to push further on this project, such as the visualization and ways of incorporating additional natural language analyses to make better recommendations.

Deliverables

At a minimum, you will create a command line tool that takes in books or authors and returns an ordered list of suggested next books. You will also write a short evaluation describing the system's performance on the existing dataset. As time permits, you may add ways to visualize the data about the books, but this task is very much secondary to developing an algorithm that performs well on the recommendation task.
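Here is a minimal Python sketch of such a command line tool. It assumes a hypothetical in-memory `CATALOG` of titles and short descriptions in place of a real dataset, and ranks every book the user did not name by its average word-overlap cosine similarity to the books given on the command line. The catalog, the `-k` option, and the scoring rule are illustrative assumptions. A real engine would plug in whatever recommendation model the project develops.

```python
import argparse
import math
from collections import Counter

# Hypothetical stand-in catalog; a real tool would load titles and text from a dataset.
CATALOG = {
    "Dune": "desert planet empire spice politics",
    "Foundation": "galactic empire decline science politics",
    "Hyperion": "pilgrims planet time tombs science",
    "Emma": "village matchmaking manners romance",
    "Persuasion": "navy captain manners romance",
}


def cosine(a, b):
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(seeds, k):
    """Rank non-seed books by mean similarity to the seed books; return the top k."""
    seed_vecs = [Counter(CATALOG[s].split()) for s in seeds]
    scores = {}
    for title, text in CATALOG.items():
        if title in seeds:
            continue
        vec = Counter(text.split())
        scores[title] = sum(cosine(vec, s) for s in seed_vecs) / len(seed_vecs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Suggest the next books to read.")
    parser.add_argument("books", nargs="*", help="titles of books you liked")
    parser.add_argument("-k", type=int, default=3, help="number of suggestions")
    args = parser.parse_args(argv)
    if not args.books:  # nothing to go on: show usage instead of failing
        parser.print_help()
        return []
    unknown = [b for b in args.books if b not in CATALOG]
    if unknown:
        parser.error("unknown title(s): " + ", ".join(unknown))
    ranked = recommend(args.books, args.k)
    for rank, (title, score) in enumerate(ranked, start=1):
        print(f"{rank}. {title} ({score:.2f})")
    return ranked


if __name__ == "__main__":
    main()
```

If this were saved as `recommend.py` (a hypothetical file name), running `python recommend.py Dune -k 2` would print Foundation first, followed by Hyperion. Returning the ranked list from `main` as well as printing it makes the tool easy to reuse when writing the evaluation.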

Recommended experience

In this project, you'll be doing intense (and interesting!) algorithmic analysis to determine which algorithms work and what is likely to succeed given the characteristics particular to recommendation on book datasets. You don't need previous experience with this kind of work, but you do need to be willing (and hopefully excited!) to engage in these analyses. Previous experience working with large datasets may be helpful but is not necessary. Courses that may be useful, though not required, include Algorithms, Advanced Algorithms, Artificial Intelligence, Data Mining, Computational Models of Cognition, Natural Language Processing, and Linear Algebra.

References/inspiration

Below are a few papers about existing work in recommender systems, and in book recommendations in particular. Note that these references are intended only as a minimal starting point for your literature search; they are certainly not the only, nor necessarily the best, sources of ideas. You will be finding and reading many additional papers!

Adomavicius, G., & Tuzhilin, A. (2011). Context-aware recommender systems. In Recommender Systems Handbook (pp. 217-253). Springer US.
Huang, Z., Chung, W., Ong, T. H., & Chen, H. (2002). A graph-based recommender system for digital library. In Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 65-73). ACM.
Ziegler, C. N., McNee, S. M., Konstan, J. A., & Lausen, G. (2005). Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web (pp. 22-32). ACM.