Carleton chemistry professor Deborah Gross studies the small airborne particles present in the air we breathe, to better understand their implications for human health and climate change. A number of years ago, she and I worked together to build the software tool Enchilada: that impressive acronym stands for Environmental Chemistry through Intelligent Atmospheric Data Analysis. Enchilada lets Deborah and her colleagues process data from her monitoring equipment and perform many kinds of analysis on it. For example, given a dataset of airborne particles collected at a major pollution site, Enchilada offers several visualizations to help the user understand what the composition of the pollution might be. It can also run clustering algorithms to help the user find prevalent categories of particles in the dataset.
Enchilada hasn't been actively worked on in about 8 years. In fact, we haven't done a release or even rebuilt it during that time. It needs help. I suspect the last time we built a release we were using Java 5 or 6. The first step in this project will simply be to revive Enchilada: get it building with modern development tools and with current versions of Java.
Enchilada uses a database system to store all of its data. Currently, it uses Microsoft SQL Server. We chose SQL Server when the project first started, which was a very long time ago. At the time, we tried a few different systems, and on the Windows platform (which is where our users were), SQL Server was clearly faster than the alternatives we tried. That said, SQL Server has a number of disadvantages. It costs additional money; it requires a separate install, since we can't bundle it with Enchilada; and it prevents Enchilada from working on Macs.
Since we chose SQL Server back in 2005, the database world has changed dramatically. A variety of NoSQL database systems have appeared and surged in popularity, and in some situations they can be considerably faster. The existing open-source relational alternatives to SQL Server have also gotten much faster.
The main goal of this project will be to learn how a variety of open-source database systems work (both relational and NoSQL), how to tune and optimize them, and to implement each one within Enchilada to see whether performance can be improved. If all goes well, the hope is to release a modern, updated version of Enchilada that uses a faster open-source database instead of SQL Server. Regardless, the real point of this project is to use Enchilada as a base for a comparative study of database systems. By the time you're done with this project, your team will have a wealth of experience in setting up, implementing, comparing, and understanding a variety of cutting-edge database tools.
You'll be expected to produce the following deliverables associated with your project:
CS 334 (Database Systems) isn't required to work on this project, but it would help a lot.
The user community for Enchilada is generally only interested in running it on Windows, so we have only done releases targeting that operating system. Enchilada is written in Java, so in principle it could run on other platforms, but the user interface code has been tweaked fairly heavily to work well under Windows. It does not currently run well elsewhere, though that could be fixed. Likewise, SQL Server requires Windows (or, apparently, Linux, though we haven't tried it). SQL Server will not run on Mac OS.
This project will therefore require significant development work on the Windows operating system. This can be done in our labs (we can set up computers in CMC 307) or on your own laptop if it runs Windows. Porting Enchilada to the Mac is an option (see above), but you should expect it to take time. Regardless, the key benchmarks against SQL Server would need to be run on Windows computers.