For our comps project, we replicated the machine learning paper A Gang of Bandits, which introduces GOB.Lin, a new type of contextual multi-armed bandit algorithm that can incorporate additional information about users in the form of a social network. You can find our paper here and our GitHub here.

Abstract

Recommendation systems are vital in maximizing user enjoyment: on the many entertainment and service platforms available, there is an abundance of content for users to choose from. Some platforms, however, have no preexisting information about what their users would like. This raises a natural question: how good can a recommendation be when we start with next to no information about our users, as opposed to services run by companies such as Amazon or Netflix, which hold vast amounts of data about every user? We can formalize this question as a multi-armed bandit problem. Recent research has sought to improve multi-armed bandit algorithms through better use of the available information. A Gang of Bandits presents a novel algorithm that combines social network information with contextual information to learn users' preferences more quickly and optimize cumulative payoff. This paper explores our process of replicating the experiments and results from A Gang of Bandits in order to validate the paper's findings.
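To make the bandit setting concrete, here is a minimal sketch of a contextual bandit learner in the LinUCB style, the linear-bandit family that GOB.Lin extends with social network information. The class name, parameters, and simulation below are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

class LinUCB:
    """Linear contextual bandit with upper-confidence-bound exploration.

    A sketch of the LinUCB family: maintain a ridge-regression estimate of
    the user's hidden preference vector, and pick the arm whose optimistic
    score (estimate plus confidence bonus) is highest.
    """

    def __init__(self, d, alpha=1.0):
        self.alpha = alpha       # exploration strength (illustrative default)
        self.A = np.eye(d)       # d x d design matrix with ridge prior
        self.b = np.zeros(d)     # accumulated reward-weighted contexts

    def choose(self, contexts):
        """Pick the index of the arm with the highest upper confidence bound."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b   # current estimate of the preference vector
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Fold the observed (context, reward) pair into the statistics."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

In a simulated run against a user with a fixed hidden preference vector, the learner starts with no information and, round by round, concentrates its choices on the arms that user actually prefers; GOB.Lin speeds up exactly this learning phase by sharing information between socially connected users.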

Figure 1: Hal the Multi-Armed Robot Bandit