Data Mining Readings
As each reading is assigned, you should do the following:
- Carefully read through the assigned paper or book chapter.
- Before the class for which the paper is due, post to the course
Caucus conference at least one question that you have about the reading,
or a comment on something you think is interesting.
- At the same time, also post to the course Caucus conference a sample
take-home exam question. I will use a subset of these on the exams.
For class on Wednesday, 4/2:
J. Friedman. Data Mining and
Statistics: What's the Connection?
For class on Monday, 4/7:
N. Roussopoulos, S. Kelley, and F. Vincent. Nearest
Neighbor Queries. SIGMOD 1995.
For class on Monday, 4/14:
J. Gehrke, R. Ramakrishnan, and V. Ganti. RAINFOREST -
A Framework for Fast Decision Tree Construction of Large Datasets.
For class on Wednesday, 4/23:
P. Bradley, U. Fayyad, and C. Reina. Scaling Clustering
Algorithms to Large Databases.
For class on Wednesday, 5/7:
S. Guha, R. Rastogi, and K. Shim. CURE: An Efficient Clustering
Algorithm for Large Databases.
For class on Monday, 5/19:
R. Agrawal and R. Srikant. Fast
Algorithms for Mining Association Rules.
This is a long paper. Focus on the Apriori algorithm, and to a lesser degree
on the AprioriTID algorithm. Don't spend much time reviewing AIS and SETM.