2023–24 Projects:
Advisor: Layla Oesper
Clustering algorithms take in a collection of data (with no labels) and are able to identify groupings of items in the data that share similar features. Numerous clustering algorithms exist. But one popular such algorithm is the k-means clustering algorithm (along with related algorithms k-medians and k-medoids). This algorithm aims to minimize the total variance amongst items clustering together, for a provided number of clusters. This approach has been used in a number of different domains including document clustering, identifying regions of cities with higher crime rates, or identifying cancer patients with different molecular profiles. However, in recent years there has been in increased focus on whether such clustering can be considered “fair” when considering certain subgroups in the data (e.g., demographic groups).
A number of modifications to the k-means (and related clustering algorithms) have been proposed in recent years to ensure that their output is “fair”. For example, Ghadiri et al. recently proposed a modification to the k-means objective function that changes how cluster centers are chosen. Alternatively, Chierichetti et al. introduced the concept of a fairlet - a minimal set of data items that satisfy fair representation while approximately preserving the clustering objective. However, there are many challenges associated with such approaches. For example, how do they affect the computational complexity of the clustering algorithm and what types of fairness do they work with?
In this project you will investigate approaches to incorporating fairness into the k-means algorithm (and related clustering algorithms). In particular you will:
Courses that could be useful for this project include Algorithms, Machine Learning and Making Decisions with AI. Some readings may assume basic knowledge of derivatives from Calculus as well.
Below are a few papers containing information about fairness in clustering. These are only intended to provide you a minimal start for your literature search - they are certainly not the only nor necessarily the best sources for ideas. You will be finding and reading many additional papers!
Fall Term, MWF 5A