Supervised machine learning techniques (neural networks, decision trees, support vector machines, etc.) are amazingly useful in many real-world applications. They have been used by doctors to help determine whether someone is ill, and by the credit industry to decide whether someone should receive a credit card, to name just a few applications. A key desirable property of machine learning systems is explainability. If someone is rejected for credit, or determined to be ill, that person wants to know why. An answer such as "our algorithm says so" isn't very satisfying. Some machine learning algorithms, such as decision trees, produce models that are simple enough to serve as good explanations.
The support vector machine (SVM) is a machine learning algorithm that has been shown to be highly effective. In many applications, it provides more accurate results than simpler techniques such as decision trees. SVMs, however, produce a "black box" classifier for which it is difficult to provide a simple explanation. A variety of techniques exist for approximating the outcome of an SVM in a way that makes the result more explainable, but the approximation tends to be less accurate. I have an idea for how to make SVMs considerably more explainable without needing to approximate.
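To make the contrast concrete, here is a minimal sketch using scikit-learn (my choice of library here is an assumption; any ML toolkit would illustrate the same point). A trained decision tree can be printed as human-readable if/then rules, while a kernel SVM's decision function is a weighted sum of kernel evaluations against its support vectors, which offers no comparably simple explanation.

```python
# Contrast: decision-tree explainability vs. SVM opacity.
# Dataset and library choices are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A decision tree yields readable if/then rules directly.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree)
print(rules)  # threshold tests on features, one branch per line

# A nonlinear-kernel SVM exposes no such rules: its "explanation"
# is just the set of support vectors and their kernel weights.
svm = SVC(kernel="rbf").fit(X, y)
print(svm.n_support_)  # counts of support vectors per class, not rules
```

Rule-extraction methods such as those in the readings below try to recover tree-like rules from the trained SVM after the fact, at some cost in fidelity.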
This project is new machine learning research. As far as I know, the technique that we will implement has never been done before. Therefore, the ultimate goal is to attempt to write up and publish this work (assuming it works). After we have implemented and experimented with this new technique, we will write a paper to be submitted to a major machine learning / data mining conference. KDD (Knowledge Discovery and Data Mining) has a deadline of mid-to-late February for submitting papers. There will therefore be some pressure to finish some aspects of the project a little sooner than the typical comps schedule allows, but on the plus side, your comps crunch time won't come at the end of the term.
It is true that since no one has tried this before, the results are somewhat unpredictable. The technique might work better than I expect, or it might work considerably worse. One of the challenges of doing original research is not quite knowing how it will turn out!
Part of the project will involve building a graphical user interface that users can experiment with in order to understand the explanations produced by the SVMs. This software would be released to the community at large, to coincide with submission of the research paper.
Here is a list of readings covering the concepts and technologies that will be necessary.
Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 2nd edition. Chapter 20.6: currently available from textbook web page.
David R. Musicant. Data Mining via Mathematical Programming and Machine Learning. (Chapters 1 and 2). Available linked from my web page.
Glenn Fung, Sathyakama Sandilya, and Bharat Rao. Rule extraction for linear support vector machines. Computer Aided Diagnosis & Therapy Solutions, Siemens Medical Solutions, June 2004. Accepted for oral presentation at KDD 2005.
Haydemar Nunez, Cecilio Angulo, and Andreu Catala. Rule extraction from support vector machines. ESANN 2002 proceedings, European Symposium on Artificial Neural Networks, Bruges, Belgium, 24-26 April 2002, pp. 107-112.