CS 395: Data Mining
Syllabus
Instructor Information
- Instructor: Dave Musicant
- Office: CMC 326
- Email: dmusican _AT_ carleton.edu
- Office phone: (507) 646-4369
- Office hours: Mondays 4-5; Tuesdays 9-11; Wednesdays 4-5
Textbook
- Data Mining: Introductory and Advanced Topics, Margaret H. Dunham,
Prentice Hall, 2003.
Important Dates
- Take home exam 1: Assigned Monday, 4/28. Due Friday, 5/2 in class.
- Take home exam 2: Assigned Friday, 5/30. Due Wednesday, 6/4 in class.
- Final project due: Monday, 6/9, end of last exam.
Class Website
Your Grade
- Assignments: 40%
- Take home exam 1: 20%
- Take home exam 2: 20%
- Class project: 20%
Collaboration
You are encouraged to work together, given the following ground rules:
- Non-computer assignments: You may work with other people, but each of
you must turn in your own assignment.
- Computer assignments: You may work together on these in pairs, if
you wish. Include everyone's names in documentation at the top. Make sure
to cite any ideas you get from other people, websites, books, papers, or
any other references.
- Take-home exams: Do these completely on your own. You can discuss
them only with me.
- Final project: You may do this in pairs, if you wish.
Programming Environment
You may use any programming language that you wish, so long as it is supported
on our departmental machines and you provide me with ample instructions on
how to compile, run, and test your code.
Homework Policy
- Each assignment will have a specific due time. An assignment turned in
up to one day after the due time will be docked 25%. An assignment turned in
more than one day but no more than two days after the due time will be docked
50%. An assignment turned in any time after that, up until the last day of
classes, will be docked 75%. This same policy applies to take-home exams.
- College policy dictates that there can be no grace period on the final
project.
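As a rough illustration of the late-penalty schedule above, here is a minimal Python sketch. The function name `late_penalty` and the example dates are hypothetical. It assumes that "one day" means 24 hours after the due time and that a submission exactly on a boundary falls in the lighter tier; the policy does not spell out either point, so treat this as one reading of it rather than the official rule. It does not model the cutoff at the last day of classes.

```python
from datetime import datetime, timedelta


def late_penalty(due: datetime, submitted: datetime) -> float:
    """Return the fraction of credit docked (0.0, 0.25, 0.50, or 0.75).

    Assumption: a "day" is a 24-hour window measured from the due time,
    and boundary submissions get the lighter penalty.
    """
    if submitted <= due:
        return 0.0  # on time, nothing docked
    lateness = submitted - due
    if lateness <= timedelta(days=1):
        return 0.25  # late, but within one day
    if lateness <= timedelta(days=2):
        return 0.50  # more than one day, within two days
    return 0.75  # any later submission (last-day-of-classes cutoff not modeled)


# Hypothetical due time, used only for illustration.
due = datetime(2003, 5, 2, 12, 0)
print(late_penalty(due, due + timedelta(hours=3)))   # within one day
print(late_penalty(due, due + timedelta(hours=30)))  # within two days
print(late_penalty(due, due + timedelta(days=5)))    # later still
```

For example, under this reading an assignment handed in 30 hours after the due time would be docked 50%.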
Details
We will cover the following topics:
- Introduction (Dunham chapters 1-3)
- Classification and Regression Techniques (Dunham chapter 4 + supplemental
readings)
- Clustering (Dunham chapter 5 + supplemental readings)
- Association Rules (Dunham chapter 6 + supplemental readings)
- Web Mining (Dunham chapter 7 + supplemental readings)
- Spatial Mining (Dunham chapter 8 + supplemental readings)
- Temporal Mining (Dunham chapter 9 + supplemental readings)
- Collaborative Filtering (supplemental readings)
- Text Mining (supplemental readings)
Papers
- J. Friedman. Data Mining and Statistics: What's the Connection?
- N. Roussopoulos, S. Kelley, and F. Vincent. Nearest Neighbor Queries.
SIGMOD 1995.
- J. Gehrke, R. Ramakrishnan, and V. Ganti. RAINFOREST - A Framework for Fast
Decision Tree Construction of Large Datasets. Proc. of the 24th Int'l Conference
on Very Large Databases.
- P. Bradley, U. Fayyad, and C. Reina. Scaling Clustering Algorithms to Large
Databases. KDD 1998.
- S. Guha, R. Rastogi, and K. Shim. CURE: An Efficient Clustering Algorithm for
Large Databases. Information Systems, 26(1), 2001, Elsevier Science Ltd.
- R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules.
Proc. of the 20th Int'l Conference on Very Large Databases.
- L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking:
Bringing Order to the Web.
- S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan,
S. Rajagopalan, and A. Tomkins. Mining the link structure of the World Wide Web.
IEEE Computer, August 1999.
- W. Wang, J. Yang, and R. Muntz. STING: A Statistical Information Grid Approach
to Spatial Data Mining. Proceedings of the 23rd VLDB Conference.
- J. Breese, D. Heckerman, and C. Kadie. Empirical Analysis of Predictive
Algorithms for Collaborative Filtering. Proceedings of the 14th Conference on
Uncertainty in Artificial Intelligence.
- M. Berry, S. Dumais, and T. Letsche. Computational Methods for Intelligent
Information Access. Proc. of the 1995 ACM/IEEE Supercomputing Conference.
- Knowledge Discovery in Databases vs. Personal Privacy, symposium organized by
Gregory Piatetsky-Shapiro.