2023–24 Projects:
Spam, or unsolicited commercial email, is a common annoyance, and the amount that arrives in our inboxes seems to increase steadily. Fortunately, a number of good techniques have been developed for distinguishing spam from non-spam. Free tools such as SpamAssassin are becoming popular, as are a variety of commercial tools. However, most of these tools are designed to run on mail servers. For users without superuser access to their servers, the only option is to run email software with spam filtering built in. This project will involve writing an email proxy server that runs on a user's local machine and sits between the user's email software of choice and the actual email server, so that spam can be filtered no matter which email client is used.
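As a rough illustration of the filtering step such a proxy might apply to each message before handing it to the email client, here is a minimal sketch in Python. It assumes a small naive Bayes classifier with add-one smoothing and marks each message with an X-Spam-Flag header; the names NaiveBayesFilter and tag_message, the tokenizer, and the choice of header are all hypothetical, and the project description does not prescribe any particular classifier or tagging scheme. The network side of the proxy is left out.

```python
import math
import re
from collections import Counter
from email import message_from_string


class NaiveBayesFilter:
    """Tiny multinomial naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase word-like tokens; a real filter would do far more here.
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        self.counts[label].update(self.tokenize(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text), up to a shared constant."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        tokens = self.tokenize(text)
        score = 0.0
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values())
            # Class prior, smoothed so an untrained class never gives log(0).
            logp = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokens:
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                logp += math.log(
                    (self.counts[label][tok] + 1) / (total + len(vocab) + 1)
                )
            score += sign * logp
        return score


def tag_message(raw, clf):
    """Parse a raw message, classify it, and return it with a flag header added."""
    msg = message_from_string(raw)
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # skip multipart bodies
    text = msg.get("Subject", "") + " " + body
    msg["X-Spam-Flag"] = "YES" if clf.spam_log_odds(text) > 0 else "NO"
    return msg.as_string()
```

In a setup like this, the email client could be configured with an ordinary filtering rule on the added header, which is what would keep the approach independent of the client in use.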
Here is a list of the concepts and technologies that will be necessary.
H. Drucker, D. Wu, and V. N. Vapnik, "Support Vector Machines for Spam Categorization," IEEE Transactions on Neural Networks, Vol. 10, No. 5, Sep. 1999.
M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz, "A Bayesian Approach to Filtering Junk E-Mail," in Proc. AAAI-1998, Jul. 1998.
Jason D. M. Rennie, "ifile: An Application of Machine Learning to E-Mail Filtering," in Proc. KDD-2000 Text Mining Workshop, Aug. 2000.
X. Carreras and L. Marquez, "Boosting Trees for Anti-Spam Email Filtering," in Proc. Euro Conference Recent Advances in NLP (RANLP-2001), Sep. 2001.