Revisionist History: Predicting Wikipedia Article Quality with Edit Histories

Authors

Narun Raman, Nathaniel Sauerberg, Jonah Fisher, and Addison Partida.

Advisor

Sneha Narayan

Term

Winter/Spring 2020

Abstract

We present a novel model for article quality classification based on structural properties of a network representation of the article's edit history. Inspired by Keegan et al. (2012), we create article trajectory networks, in which nodes correspond to individual editors and edges join the authors of consecutive revisions. Using distance-, betweenness-, and clustering-based metrics computed on these networks, along with general properties such as the number of editors and article length, we predict which of six quality classes (Stub, Start, C-Class, B-Class, GA, FA) an article belongs to, attaining a classification accuracy of 49.35% on a uniform sample of articles. This accuracy is comparable to that of models that align their predictors more directly with Wikipedia's quality class criteria, such as Warncke-Wang et al.'s "Actionable Model" (42.5% accuracy) and Halfaker's ORES model (62.9% accuracy). These results suggest that accurate quality classification should consider the structures of collaboration underlying an article's creation, and not only the characteristics of its public version at a particular point in time.
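To make the network construction concrete, the following is a minimal sketch of how an article trajectory network could be built from a chronological list of revision authors, together with one clustering-based metric. It is an illustration under stated assumptions and not the project's actual pipeline: the function names (`build_trajectory_network`, `local_clustering`, `average_clustering`) are hypothetical, edges are treated as undirected, and consecutive revisions by the same editor are assumed to add no edge.

```python
from itertools import combinations


def build_trajectory_network(revision_editors):
    """Build an undirected adjacency map from a chronological list of editors.

    Assumption: each pair of consecutive revisions by different editors
    contributes one undirected edge; self-loops are skipped.
    """
    adjacency = {}
    for editor in revision_editors:
        adjacency.setdefault(editor, set())  # every editor becomes a node
    for prev, curr in zip(revision_editors, revision_editors[1:]):
        if prev != curr:
            adjacency[prev].add(curr)
            adjacency[curr].add(prev)
    return adjacency


def local_clustering(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adjacency):
    """Mean local clustering coefficient over all editors in the network."""
    if not adjacency:
        return 0.0
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Hypothetical edit history: one editor name per revision, oldest first.
history = ["alice", "bob", "carol", "alice", "dave"]
network = build_trajectory_network(history)
features = {
    "num_editors": len(network),
    "num_edges": sum(len(nbrs) for nbrs in network.values()) // 2,
    "avg_clustering": average_clustering(network),
}
print(features)
```

In a sketch like this, the resulting feature dictionary (extended with distance- and betweenness-based metrics and general properties such as article length) would form one row of the input to a six-class classifier.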