Fix My Bug

Overview
(Begin here!)

The goal of our project was to design an IDE plug-in that suggests fixes for simple compile errors. Our approach builds on a 2010 paper that suggested solutions to errors based on how other users had fixed them before; we took a similar approach, but drew on a much larger dataset to build a database of buggy code snippets and their associated fixes. We focused specifically on compile errors rather than runtime errors, because they were much easier to isolate, both in the end user’s code and in our own dataset.

Data

We used data from the Blackbox Data Collection Project, which holds the data generated by users of the BlueJ IDE. The Blackbox database stores a treasure trove of IDE data, including compile logs (the success or failure of each compile, plus the error messages displayed to the programmer), source code, and user session info. We downloaded information about each compile into a local SQLite database, which we then queried to isolate failed compiles and their corresponding fixes.
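As a rough illustration of what "isolating failed compiles and their corresponding fixes" can look like, the sketch below pairs each failed compile with the next successful compile of the same file. This pairing rule is an assumption made for illustration, not a description of our actual queries, and the names `FixPairs`, `CompileEvent`, `BugFix`, and `fileId` are hypothetical rather than part of the Blackbox schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pair a failed compile with the next successful
// compile of the same file. Field names are illustrative only.
final class FixPairs {

    static final class CompileEvent {
        final String fileId;   // which source file was compiled (assumed key)
        final boolean success; // did the compile succeed?
        final String source;   // source code at the time of the compile

        CompileEvent(String fileId, boolean success, String source) {
            this.fileId = fileId;
            this.success = success;
            this.source = source;
        }
    }

    static final class BugFix {
        final String buggy;
        final String fixed;

        BugFix(String buggy, String fixed) {
            this.buggy = buggy;
            this.fixed = fixed;
        }
    }

    // Events are assumed to be sorted by time already.
    static List<BugFix> extract(List<CompileEvent> events) {
        Map<String, CompileEvent> lastFailure = new HashMap<>();
        List<BugFix> pairs = new ArrayList<>();
        for (CompileEvent event : events) {
            if (!event.success) {
                // Remember only the most recent failure for this file.
                lastFailure.put(event.fileId, event);
            } else {
                // A success closes out any pending failure for this file.
                CompileEvent failed = lastFailure.remove(event.fileId);
                if (failed != null) {
                    pairs.add(new BugFix(failed.source, event.source));
                }
            }
        }
        return pairs;
    }
}
```

Keeping only the most recent failure per file means that a run of repeated failed attempts yields a single pair, built from the last broken version and the first working one.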

Plugin

We integrated our project into the Eclipse IDE as a plugin, which you can download from our downloads page. This had two key advantages. First, it made it easy for our program to access the user’s code. Second, it let us make the user experience as streamlined as possible: fixes can be inserted into the user’s code with the press of a button.

Database Matching

We use a simple N-gram count to find entries in our database that are similar to the user’s buggy code. This is very fast, but prone to inaccuracies and false positives. We therefore compute a more precise, edit-distance-style alignment score for the top 100 N-gram results, using the Smith-Waterman algorithm. We then return the fifteen results whose buggy code snippets most closely resemble the user’s error.
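The two-stage search described above can be sketched as follows. This is a minimal illustration, not our production code: the class name `SnippetMatcher`, the method names, and the scoring constants (match +2, mismatch -1, gap -1) are all assumptions, and snippets are represented as plain lists of token strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a coarse N-gram filter followed by
// Smith-Waterman re-ranking over token sequences.
final class SnippetMatcher {

    // Count every run of n consecutive tokens.
    static Map<String, Integer> ngramCounts(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    // Cheap similarity: size of the multiset overlap of the two N-gram sets.
    static int sharedNgrams(List<String> a, List<String> b, int n) {
        Map<String, Integer> countsA = ngramCounts(a, n);
        Map<String, Integer> countsB = ngramCounts(b, n);
        int shared = 0;
        for (Map.Entry<String, Integer> e : countsA.entrySet()) {
            shared += Math.min(e.getValue(), countsB.getOrDefault(e.getKey(), 0));
        }
        return shared;
    }

    // Smith-Waterman local alignment score with a linear gap penalty.
    // The scoring constants are illustrative assumptions.
    static int smithWaterman(List<String> a, List<String> b) {
        final int match = 2, mismatch = -1, gap = -1;
        int[][] h = new int[a.size() + 1][b.size() + 1];
        int best = 0;
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.get(i - 1).equals(b.get(j - 1)) ? match : mismatch);
                int up = h[i - 1][j] + gap;
                int left = h[i][j - 1] + gap;
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    // Returns database indices: shortlist by N-gram overlap, then re-rank
    // the shortlist by alignment score and keep the best resultSize.
    static List<Integer> bestMatches(List<String> query, List<List<String>> database,
                                     int n, int shortlistSize, int resultSize) {
        final int[] coarse = new int[database.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < database.size(); i++) {
            coarse[i] = sharedNgrams(query, database.get(i), n);
            order.add(i);
        }
        order.sort((x, y) -> Integer.compare(coarse[y], coarse[x]));
        List<Integer> shortlist =
                new ArrayList<>(order.subList(0, Math.min(shortlistSize, order.size())));

        final int[] fine = new int[database.size()];
        for (int i : shortlist) {
            fine[i] = smithWaterman(query, database.get(i));
        }
        shortlist.sort((x, y) -> Integer.compare(fine[y], fine[x]));
        return new ArrayList<>(shortlist.subList(0, Math.min(resultSize, shortlist.size())));
    }
}
```

With the numbers from the text, a call would look like `bestMatches(query, database, n, 100, 15)`. The quadratic-time alignment runs only on the shortlist, which is what keeps the overall search fast.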

Tokenizing & Harmonizing

To compare the user’s buggy code to the buggy code in our database despite differences in function and variable names, we convert both into a series of tokens using the ANTLR lexer, so that specific names no longer affect the comparison.
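To show the idea without pulling in ANTLR, the sketch below uses a small regular-expression tokenizer as a stand-in for the real lexer. It is far cruder than ANTLR's Java lexer (for example, `==` comes out as two tokens and comments are not handled), the keyword list is deliberately partial, and the placeholder names `ID`, `NUM`, and `STR` are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based stand-in for the ANTLR lexer: it replaces
// identifiers and literals with placeholders so that only the shape
// of the code remains.
final class TokenNormalizer {

    // Partial keyword list, for illustration only.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "int", "double", "boolean", "char", "void", "if", "else", "for",
            "while", "return", "new", "class", "public", "private", "static"));

    // Alternatives: identifier/keyword, number, string literal, any other symbol.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<id>[A-Za-z_$][A-Za-z0-9_$]*)"
            + "|(?<num>\\d+(?:\\.\\d+)?)"
            + "|(?<str>\"(?:\\\\.|[^\"\\\\])*\")"
            + "|(?<sym>\\S)");

    static List<String> normalize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            if (m.group("id") != null) {
                String word = m.group("id");
                // Keywords are kept; every other name becomes ID.
                tokens.add(KEYWORDS.contains(word) ? word : "ID");
            } else if (m.group("num") != null) {
                tokens.add("NUM");
            } else if (m.group("str") != null) {
                tokens.add("STR");
            } else {
                tokens.add(m.group("sym"));
            }
        }
        return tokens;
    }
}
```

Under this scheme `int count = 42;` and `int total = 7;` normalize to the same token sequence, which is what lets two snippets match despite different names and values.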

Once we find a fix, we convert the tokenized snippet back into Java code, inserting the user’s variable and function names. The fix may involve the insertion of a new variable or value, which is difficult to convert into Java since we don’t have a variable name or value to match with it; in this case we print out a tag that gives the user all the information we have, allowing them to decide the right name or value to give the inserted token.
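A minimal sketch of this conversion step is shown below. It assumes, purely for illustration, that each placeholder in a stored fix carries an index (`ID_0`, `NUM_0`, and so on) and that a map from placeholder to the user's original text is available; the class name `FixRenderer` and the tag wording are hypothetical. Placeholders with no entry in the map stand for newly inserted tokens and are rendered as a tag for the user to fill in.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: turn a tokenized fix back into Java text,
// restoring the user's names where they are known.
final class FixRenderer {

    // Assumed placeholder convention: ID_<n> for names, NUM_<n> for values.
    static boolean isPlaceholder(String token) {
        return token.startsWith("ID_") || token.startsWith("NUM_");
    }

    static String render(List<String> fixTokens, Map<String, String> userNames) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : fixTokens) {
            if (userNames.containsKey(token)) {
                // Known placeholder: substitute the user's own name or value.
                out.add(userNames.get(token));
            } else if (isPlaceholder(token)) {
                // Inserted token with nothing to match: emit a tag instead.
                out.add("<" + token + ": choose a name or value>");
            } else {
                // Keywords and punctuation pass through unchanged.
                out.add(token);
            }
        }
        return out.toString();
    }
}
```

For example, rendering the tokens `int ID_0 = NUM_0 ;` with only `ID_0` mapped to `count` produces `int count = <NUM_0: choose a name or value> ;`, leaving the missing value for the user to supply.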