Advisor: Eric Alexander
“Context matters!” So goes the refrain from countless people who feel that their words or works are not being accurately represented, from scientists to celebrities to politicians to professional athletes. In some ways, such complaints have been exacerbated by the era of social media, in which soundbites and headlines can propagate through a network of millions of people in minutes, with the potential for added noise or mutation at every step of the way. On the other hand, it is possible that “out-of-context” has just become a convenient excuse for someone who has said something they wish they hadn't. Whether or not relevant context is missing in a given instance may be something we can identify computationally.
In this project, we will use natural language processing techniques to investigate how the availability of context (or lack thereof) has influenced the way that quotes get shared, as well as how they are generated. Specifically, we will investigate interviews conducted with athletes, coaches, and other participants after sporting events. Such interviews are often contractually mandated, which has caused some athletes to become particularly tight-lipped in protest. At the other end of the spectrum, many athletes in recent years have taken it upon themselves to use this platform to advocate for social change. We will investigate how these trends and others may affect what athletes talk about and how they talk about it.
As an initial stage of this project, we will identify datasets that allow us to explore this topic, including interview transcripts, popular press articles, and posts on social media platforms. We will then apply algorithmic techniques such as n-gram analysis, named-entity recognition, topic modeling, and sentiment analysis to see what conclusions we can draw. Some questions we may ask include:
The deliverables for this project will depend in part on the specific questions we decide to pursue. The output might take the form of a user-facing tool for exploring this data, or it might take the form of a model with accompanying visuals and a write-up detailing what our model has shown.
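To give a flavor of the simplest technique mentioned above, here is a minimal sketch of n-gram counting over an interview transcript, using only the Python standard library. The function names (`tokenize`, `ngram_counts`) and the sample quote are made up for illustration; the actual project may well use different tooling, tokenization, and data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, n=2):
    """Count every run of n consecutive tokens in the text."""
    tokens = tokenize(text)
    # zip over n staggered copies of the token list to form the n-grams
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Hypothetical transcript snippet, invented for this example
quote = "We take it one game at a time. One game at a time, that's all."
counts = ngram_counts(quote, n=2)

# Show the most frequent bigrams in the snippet
for gram, freq in counts.most_common(3):
    print(" ".join(gram), freq)
```

Repeated phrases such as "one game at a time" show up as high-frequency n-grams, which is one rough way we might start to spot stock answers in contractually mandated interviews.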
Natural Language Processing (CS 324) would be valuable, but few if any of you have had the opportunity to take it! As such, its prereqs (Data Structures, Math of CS/Math Structures) will be necessary. Experience in other subfields of AI/machine learning may be helpful, but is not required.