2023–24 Projects:
Sometimes you're reading Wikipedia and a sentence, or just a fragment, sticks out like a sore thumb. Maybe it's a non sequitur, maybe the grammar or style doesn't match the rest of the paragraph, or maybe it makes a dubious claim. Whatever it is, if you're a conscientious Wikipedia editor, your impulse may be to investigate it, so you can fix it. But that's easier said than done. To give you a concrete example, a while ago I was reading the article on the “Westward Expansion Trails” — the Oregon Trail and its pals — and came across this passage (all formatting original):
The Oregon Trail was the only practical way for settlers in wagons without tools, livestock, or supplies to cross the mountains. Many believe that without the trail most of the American west would today be part of Canada or Mexico. It is now know as the great Ele road. During the twenty-five years 1841–1866, 250,000 to 650,000 people "pulled-up-stakes" and headed west.
Now, the whole article needed work, but there was something really fishy about that bolded sentence about “the great Ele road”. Again, the bolding was in the original article. I set about investigating when this text was added to the article, what other changes were made in the same edit, who made the edit, and what else they'd done on Wikipedia. Was this editor trustworthy? Did they just make a mistake by putting the whole sentence in bold and using the wrong verb tense, but otherwise know what they were talking about? Or were they some random vandal?
In order to answer my questions, I had to manually apply the bisection method to the article history, looking at page after page of previous versions, until I finally found the one edit where this sentence was inserted. Turns out the editor had never written anything else on Wikipedia, so I took the sentence out of the article and copied it to an archive page, with an explanation of why I thought it was vandalism.
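For the curious, the manual bisection I did by hand can be sketched in a few lines of Python. This is only an illustration, not a piece of the tool: the function name `first_revision_containing` and the idea of having every revision's text already loaded into a list (oldest first) are assumptions made for the example. It also assumes the phrase, once inserted, stays in every later revision.

```python
from typing import Sequence


def first_revision_containing(revisions: Sequence[str], phrase: str) -> int:
    """Return the index of the earliest revision whose text contains `phrase`.

    `revisions` is a hypothetical list of full revision texts, oldest first.
    Assumption: once the phrase appears, it is present in all later revisions.
    """
    if not revisions or phrase not in revisions[-1]:
        raise ValueError("phrase must appear in the latest revision")

    lo, hi = 0, len(revisions) - 1  # invariant: revisions[hi] contains phrase
    while lo < hi:
        mid = (lo + hi) // 2
        if phrase in revisions[mid]:
            hi = mid        # inserted at mid or earlier
        else:
            lo = mid + 1    # inserted after mid
    return lo


history = [
    "The Oregon Trail was a route west.",
    "The Oregon Trail was a route west. Many settlers used it.",
    "The Oregon Trail was a route west. It is now know as the great Ele road. Many settlers used it.",
    "The Oregon Trail was the only practical way west. It is now know as the great Ele road. Many settlers used it.",
]
print(first_revision_containing(history, "great Ele road"))
```

If a passage was added, removed, and re-added (say, during an edit war), the assumption breaks and bisection will land on one of the insertions, not necessarily the first. A real tool would probably need to walk the history more carefully.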
The whole process took about twenty minutes. Don't we have computers for stuff like this? I envision a user in a similar situation being able to highlight the passage in question, right-click, and bring up a little panel that shows only those edits that have affected the selected passage. The user could navigate around in this panel, without losing their place in the original article, to see the content of these edits, information about other edits by the same users (and whether any of them were deemed to be vandalism by other Wikipedia editors), the “historical context” of each edit (that is, which other edits came before and after it, whether it looks like there was an edit war going on, etc.), guidance about the trustworthiness of the editors in question, and so on. If I'd had a tool like this, I could've made my decision in seconds, and moved on with my life. Instead here I am writing a comps project proposal about it. =)
Your task is to create a tool that enables various reverse-lookup operations from the current text of an article to relevant aspects of the article history. Questions an editor might like to be able to investigate include:
There are loads of features you could add to your tool to support these kinds of investigations, and others. Your tool could be a browser plugin, a Wikipedia user extension, a Greasemonkey script, or even something else. It's not desirable to create a separate web application (or a standalone desktop program), since the user should have immediate access to your tool while browsing Wikipedia.
In order to do a good job on this project, you will need to research published computational techniques for interpreting document differences in general, as well as for detecting edit-warring and other editing behaviors by examining edit histories.
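As a starting point for that research, here is a rough Python sketch of two very simple ideas: listing the lines an edit added, using the standard library's `difflib`, and flagging "identity reverts", where an edit restores the exact text of an earlier revision. The function names and the in-memory list of revision texts are assumptions for illustration. Exact-match revert detection is just one crude heuristic, not a full edit-war detector, and published techniques go well beyond it.

```python
import difflib
import hashlib
from typing import Dict, List, Sequence, Tuple


def added_lines(old: str, new: str) -> List[str]:
    """Return the lines present in `new` but not in `old`, per difflib.ndiff."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def find_identity_reverts(revisions: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (reverting_index, restored_index) pairs.

    A revision counts as a revert if its text is byte-for-byte identical to
    an earlier revision that is not its immediate predecessor.
    """
    last_seen: Dict[str, int] = {}  # content digest -> most recent index
    reverts: List[Tuple[int, int]] = []
    for i, text in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and j < i - 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


versions = ["A", "B", "A", "B", "C"]  # two editors flipping the text back and forth
print(find_identity_reverts(versions))
print(added_lines("line one\nline two", "line one\nline two\nline three"))
```

Several reverts between the same pair of texts in a short span, as in `versions` above, is the kind of pattern that might suggest an edit war, though who made each edit and when would matter too.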