2023–24 Projects:
Advisor: Anna Rafferty
Recommendation systems are incredibly common online, from products you might like to movies with personalized ratings. Most of these systems focus on making predictions based on the past behavior of other users. However, there are also recommendation systems that directly work with the content that's being recommended, such as sites that analyze the text of news articles to bring you the articles that are most relevant.
Book recommendations are an area where there are a few existing systems, such as GoodReads or whatshouldireadnext.com, but there's plenty of room for improvement. These systems primarily use user behaviors or genres provided by the publisher, rather than using natural language processing to make sense of additional information, such as reviews. The domain of books is challenging due to the huge number of books and the fact that most existing datasets have relatively sparse information. For example, readers tend to rate a much smaller portion of all books than movie watchers do of all movies.
The large amount of data available about books (short description paragraphs on Amazon, professional reviews, reader reviews on sites like Amazon or GoodReads, ratings data) has a great deal of potential to help readers find their next book and to create representations of books that let readers explore and visualize the various types of books and their relations to one another. I envision a system that can recommend books to me based on my selection of a few books or authors, and that can specifically find books that are likely to be similar to these books.
In more detail, your activities in this project will include evaluating the available data, using natural language processing to extract information from book reviews or summaries, implementing a recommendation system, and determining how to integrate textual information and behavioral data (e.g., matrices of which books have been rated by which readers). A major part of this project will be researching and discovering what's already been done in this space, and then building upon this work. The primary goal is to create a system that can successfully recommend relevant books, including evaluating how well your system can predict users' ratings on an existing book review dataset.
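To make the idea of extracting information from book text a little more concrete, here is a minimal sketch of one possible content-based approach: represent each book's description as a TF-IDF vector and rank other books by cosine similarity to the books a reader selected. This is only an illustration under stated assumptions, not the method the project prescribes. The toy catalog, the function names (tfidf_vectors, cosine, recommend), and the choice of TF-IDF itself are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy catalog (illustrative data only): title -> short description.
BOOKS = {
    "Dune": "desert planet empire spice politics prophecy",
    "Foundation": "galactic empire collapse mathematics politics prediction",
    "Emma": "english village matchmaking romance comedy manners",
    "Persuasion": "english navy romance second chance manners",
}


def tfidf_vectors(docs):
    """Map each title to a sparse TF-IDF vector (dict of term -> weight)."""
    tokenized = {title: text.lower().split() for title, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions does each term appear?
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[title] = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(seeds, vectors, k=3):
    """Rank books not in `seeds` by similarity to the summed seed vectors."""
    profile = {}
    for title in seeds:
        for term, weight in vectors[title].items():
            profile[term] = profile.get(term, 0.0) + weight
    ranked = sorted(
        ((cosine(profile, vec), title)
         for title, vec in vectors.items() if title not in seeds),
        reverse=True,
    )
    return [(title, score) for score, title in ranked[:k]]


for title, score in recommend(["Dune"], tfidf_vectors(BOOKS)):
    print(f"{title}\t{score:.2f}")
```

A real system would likely need better tokenization, far more text per book, and some way of combining these text-based scores with the behavioral ratings data.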
Once you have a basic recommendation system working with some evaluation of how well it performs, there are lots of places to push further on this project, such as the visualization and ways of incorporating additional natural language analyses to make better recommendations.
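As one example of what "some evaluation" might look like, the sketch below computes root-mean-square error (RMSE) between predicted and actual ratings on held-out data, using a trivial baseline that always predicts the mean training rating. RMSE is an assumption here, chosen because it is a common metric for rating prediction; the ratings and the baseline are made up for illustration.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical ratings on a 1-5 scale (illustrative data only).
train_ratings = [5.0, 3.0, 4.0, 4.0]
held_out_ratings = [5.0, 2.0, 4.0]

# Baseline: predict the mean training rating for every held-out item.
baseline = sum(train_ratings) / len(train_ratings)
baseline_predictions = [baseline] * len(held_out_ratings)

print(f"baseline RMSE: {rmse(baseline_predictions, held_out_ratings):.3f}")
```

Any recommender you build should at least beat a simple baseline like this one on the same held-out ratings.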
At a minimum, you will create a command line tool that takes in books or authors and returns an ordered list of suggested next books. Additionally, you will write a short evaluation describing the system's performance on the existing dataset. As time permits, you may add in ways to visualize the data about the books, but this task is very much secondary to developing an algorithm that is effective and performs well on the recommendation task.
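The shape of such a command line tool might look something like the sketch below. Everything in it is a placeholder: the catalog, the tag-overlap (Jaccard) scoring, and the names suggest and main are hypothetical stand-ins for whatever recommendation algorithm you actually develop.

```python
import argparse
import sys

# Hypothetical catalog (illustrative only): title -> set of descriptive tags.
CATALOG = {
    "Dune": {"science-fiction", "politics", "desert"},
    "Foundation": {"science-fiction", "politics", "empire"},
    "Emma": {"romance", "comedy", "regency"},
    "Persuasion": {"romance", "regency", "navy"},
}


def suggest(seeds, catalog, limit):
    """Rank unseen books by Jaccard overlap with the seeds' combined tags."""
    profile = set().union(*(catalog[title] for title in seeds))
    scored = [
        (title, len(profile & tags) / len(profile | tags))
        for title, tags in catalog.items()
        if title not in seeds
    ]
    # Highest score first; break ties alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Suggest next books.")
    parser.add_argument("books", nargs="+", help="titles you have enjoyed")
    parser.add_argument("-n", "--limit", type=int, default=3,
                        help="number of suggestions to print")
    args = parser.parse_args(argv)

    unknown = [title for title in args.books if title not in CATALOG]
    if unknown:
        parser.error("unknown title(s): " + ", ".join(unknown))

    results = suggest(args.books, CATALOG, args.limit)
    for rank, (title, score) in enumerate(results, start=1):
        print(f"{rank}. {title} (score {score:.2f})")
    return results


if __name__ == "__main__":
    # Fall back to a demo query when no arguments are given.
    main(sys.argv[1:] or ["Dune", "--limit", "2"])
```

Keeping the scoring in its own function (here suggest) makes it straightforward to swap in a better algorithm later without touching the argument parsing.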
In this project, you'll be doing intense (and interesting!) algorithmic analysis to determine what algorithms work and what's likely to be successful based on characteristics particular to recommendation on book datasets. You don't need to have previous experience with this kind of work, but you do need to be willing (and hopefully excited!) to be engaged in these analyses. Previous experience working with large datasets may be helpful but is not necessary. Some courses that may be useful but are not required are Algorithms, Advanced Algorithms, Artificial Intelligence, Data Mining, Computational Models of Cognition, Natural Language Processing, and Linear Algebra.
Below are a few papers about existing work in recommender systems and book recommendations in particular. Note that these references are intended to provide a minimal start for your literature search: they are certainly not the only sources of ideas, nor necessarily the best ones. You will be finding and reading many additional papers!