Quotes in Context

Advisor: Eric Alexander

Background

“Context matters!” So goes the refrain from countless people who feel that their words or works are not being accurately represented, from scientists to celebrities to politicians to professional athletes. In some ways, such complaints have been exacerbated by the era of social media, in which soundbites and headlines can propagate through a network of millions of people in minutes, with the potential for added noise or mutation at every step of the way. On the other hand, it is possible that “out-of-context” has just become a convenient excuse for someone who has said something they wish they hadn't. Instances where relevant context is missing (or not) may be something we can identify computationally.

The project

In this project, we will use natural language processing techniques to investigate how the availability of context (or lack thereof) has influenced the way that quotes get shared, as well as how they are generated. Specifically, we will investigate interviews conducted with athletes, coaches, and other participants after sporting events. Such interviews are often contractually mandated, which has caused some athletes to become particularly tight-lipped in protest. At the other end of the spectrum, many athletes in recent years have taken it upon themselves to use this platform to advocate for social change. We will investigate how these trends and others may affect what athletes talk about and how they talk about it.

As an initial stage of this project, we will identify datasets that allow us to explore this topic, including interview transcripts, popular press articles, and posts on social media platforms. We will then apply algorithmic techniques such as n-gram analysis, named-entity recognition, topic modeling, and sentiment analysis to see what conclusions we can draw. Some questions we may ask include:

Do complaints of quotes being taken out of context hold water? That is, are the quotes from interviews that appear in articles and social media representative of everything that was said in those interviews?
Has there been a change over time in how people (both those answering questions and those asking them) approach interviews? Does this correlate with the rise of social media, social justice movements, etc.?
Is there any difference among outlets in what types of language get reported when covering the same interview(s)?
Do we observe different behavior between different individuals and groups? (E.g., coaches vs. players.) Do people in different positions seem more or less comfortable offering candid responses?
Can we identify upticks in engagement around a story (e.g., through social media data) based on the language used? Is it possible to predict which quotes are likely to be shared widely?
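
To make the first question a little more concrete, here is a minimal sketch of one way n-grams could be used to compare a published quote against the full interview it came from. It is only an illustration under stated assumptions, not the project's chosen method: the function names, the choice of cosine similarity, and the sample transcript are all hypothetical, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n=1):
    """Count how often each n-gram occurs in the text."""
    return Counter(ngrams(tokenize(text), n))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(a[gram] * b[gram] for gram in set(a) & set(b))
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example data (an assumption for illustration, not real interview text).
transcript = (
    "We played hard tonight. The defense was great. "
    "I am proud of this team."
)
quote = "The defense was great."

# A low score would suggest the quote's wording covers little of the interview.
score = cosine_similarity(ngram_counts(quote), ngram_counts(transcript))
print(f"Unigram similarity between quote and transcript: {score:.2f}")
```

A word-overlap score like this says nothing about meaning or tone, so in practice it would likely be one signal alongside the named-entity, topic, and sentiment measures mentioned above.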

Deliverables

The deliverables for this project will depend in part on the specific questions we decide to pursue. The output might take the form of a user-facing tool for exploring this data, or of a model with accompanying visuals and a write-up detailing what the model has shown.

Recommended experience

Natural Language Processing (CS 324) would be valuable, but few if any of you have had the opportunity to take it! As such, its prereqs (Data Structures, Math of CS/Math Structures) will be necessary. Experience in other subfields of AI/machine learning may be helpful, but is not required.