CS 395: Data Mining
Syllabus

Instructor Information

Textbook

Important Dates

Class Website

Your Grade

Collaboration

You are encouraged to work together, given the following ground rules:
  1. Non-computer assignments: You may work with other people, but each of you must turn in your own assignment.
  2. Computer assignments: You may work on these in pairs, if you wish. Include both partners' names in the documentation at the top. Make sure to cite any ideas you get from other people, websites, books, papers, or any other references.
  3. Take-home exams: Do these completely on your own. You can discuss them only with me.
  4. Final project: You may do this in pairs, if you wish.

Programming Environment

You may use any programming language you wish, as long as it is supported on our departmental machines and you give me clear instructions on how to compile, run, and test your code.

Homework Policy

Details

We will cover the following topics:

Papers

J. Friedman. Data Mining and Statistics: What's the Connection?

N. Roussopoulos, S. Kelley, and F. Vincent. Nearest Neighbor Queries. SIGMOD 1995.

J. Gehrke, R. Ramakrishnan, and V. Ganti. RAINFOREST - A Framework for Fast Decision Tree Construction of Large Datasets. Proc. of the 24th Int'l Conference on Very Large Databases.

P. Bradley, U. Fayyad, and C. Reina. Scaling Clustering Algorithms to Large Databases. KDD 1998.

S. Guha, R. Rastogi, and K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases. Information Systems, 26(1), 2001, Elsevier Science Ltd.

R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. Proc. of the 20th Int'l Conference on Very Large Databases.

L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web.

S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Mining the link structure of the World Wide Web. IEEE Computer, August 1999.

W. Wang, J. Yang, and R. Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. Proceedings of the 23rd VLDB Conference.

J. Breese, D. Heckerman, and C. Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence.

M. Berry, S. Dumais, and T. Letsche. Computational Methods for Intelligent Information Access. Proc. of the 1995 ACM/IEEE Supercomputing Conference.

Knowledge Discovery in Databases vs. Personal Privacy, symposium organized by Gregory Piatetsky-Shapiro.