2023–24 Projects:
Advisor: Sneha Narayan
Every article on Wikipedia is created by a group of editors who sequentially build upon one another’s work. Often this is a slow and steady process: a few editors might go back and forth over a couple of years to develop the article for Northfield High School, as you can see from the revision logs here. Other times, editors might be responding to a breaking news event, as on the page for the Mueller report (revision logs here). This kind of editing is often characterized by a flurry of activity right around when significant events take place (in this case, on April 18th, when the report was released publicly).
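One simple way to make a "flurry of activity" concrete is to count revisions per calendar day and look for the busiest days. The sketch below assumes you already have a list of revision timestamps as ISO-8601 strings (for example, exported from a page's revision history). The function name `edits_per_day` and the sample timestamps are made up for illustration and are not real revision data.

```python
from collections import Counter
from datetime import datetime


def edits_per_day(timestamps):
    """Count revisions per calendar day from ISO-8601 timestamp strings.

    Hypothetical helper: assumes timestamps look like "2024-01-02T10:15:00".
    """
    return Counter(datetime.fromisoformat(ts).date() for ts in timestamps)


# Made-up timestamps for illustration only (not real Wikipedia data).
sample = [
    "2024-01-01T09:00:00",
    "2024-01-02T10:15:00",
    "2024-01-02T10:20:00",
    "2024-01-02T13:05:00",
    "2024-01-03T08:30:00",
]

counts = edits_per_day(sample)
busiest_day, n_edits = counts.most_common(1)[0]
print(busiest_day, n_edits)  # 2024-01-02 3
```

A steady article would show a fairly flat distribution of counts, while a breaking news article would show a sharp spike around the event date.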
Articles about breaking news events are particularly tricky to write because the facts on the ground are constantly changing, and new sources containing information that might warrant inclusion regularly appear. At the same time, editors need to work quickly to create a synthesis that reflects what is currently known, while maintaining the encyclopedia’s commitment to presenting a ‘neutral point of view’ on the events. In this project, you will investigate how the process of co-creating an article on a breaking news topic like the Mueller report might look very different from the process of creating an article on a topic like Northfield High School.
How can we meaningfully characterize different patterns of activity that lead to the creation of article pages? One approach is to look at article trajectories, i.e. graph structures that represent the sequence of editor interactions on an article page. Figure 1 presents an example of what this looks like: editor A may create an article, making the first revision to the page, after which editor B comes along and revises A's version. This action is represented by a directed edge from editor A to editor B in the article trajectory graph. As the sequence of edits progresses, we can build more detailed graphs that show how the article evolved through this chain of revisions. These article trajectory graphs can then be analyzed to answer questions about how articles about breaking news events might follow different trajectories than articles about, say, historical events.
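As a rough illustration of the idea, the sketch below turns an ordered list of revision authors into weighted directed edges, where an edge (u, v) means editor v revised the page immediately after editor u. This is a simplified sketch, not necessarily the exact construction used in the paper: the function name `trajectory_edges` and the choice to drop consecutive edits by the same editor (via the hypothetical `include_self_loops` flag) are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def trajectory_edges(
    editors: List[str], include_self_loops: bool = False
) -> Dict[Tuple[str, str], int]:
    """Build weighted directed edges from an ordered list of revision authors.

    editors[i] is the editor who made revision i, oldest first. Each pair of
    consecutive revisions adds an edge from the earlier editor to the later one.
    """
    edges: Counter = Counter()
    for prev, curr in zip(editors, editors[1:]):
        if prev == curr and not include_self_loops:
            # Assumption: an editor revising their own edit adds no edge.
            continue
        edges[(prev, curr)] += 1
    return dict(edges)


# Hypothetical history: A creates the page, B revises it, A returns, and so on.
history = ["A", "B", "A", "C", "C", "B"]
print(trajectory_edges(history))
# {('A', 'B'): 1, ('B', 'A'): 1, ('A', 'C'): 1, ('C', 'B'): 1}
```

The resulting edge list can be loaded into a graph library to compute structural measures, and comparing those measures across breaking news and non-breaking articles is the kind of analysis the project involves.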
Much of the work you’ll do on this project will involve replicating the work in this paper, which introduces the article trajectory approach and examines a corpus of Wikipedia articles about plane crashes. While you would be replicating the authors’ methods and approach, I would like you to identify a different category of breaking news events to conduct these analyses on and see how the original findings hold up. In particular, you will:
All seniors are likely to have the necessary background for the computational parts of these projects. If you have taken any statistics classes that introduced hypothesis testing (e.g. any course that might count towards the Statistics major, or a quantitative analysis course in one of the social sciences), that might be helpful for the analysis portion, but it is not a prerequisite. I’d expect you to be interested in learning and thinking about applications of computational techniques to problems motivated by the social sciences, but beyond a willingness to read papers in these areas, no specific social science background is required.
You can skim the paper this project is based on to get a sense of the kinds of analysis you will be doing, and the related work you’ll be reading for background.
Keegan, B., Gergle, D., & Contractor, N. (2012, August). Staying in the loop: structure and dynamics of Wikipedia's breaking news collaborations. In Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration (p. 1). ACM.