Data Mining Readings

As each reading is assigned, you should do the following:

Carefully read through the assigned paper or book chapter.
Before the class for which the reading is due, post to the course Caucus conference at least one question that you have about the reading, or a comment on something you found interesting.
At the same time, post a sample take-home exam question to the course Caucus conference. I will use a subset of these on the exams.

For class on Wednesday, 4/2:
J. Friedman. Data Mining and Statistics: What's the Connection?

For class on Monday, 4/7:
N. Roussopoulos, S. Kelley, and F. Vincent. Nearest Neighbor Queries. SIGMOD 1995.

For class on Monday, 4/14:
J. Gehrke, R. Ramakrishnan, and V. Ganti. RAINFOREST - A Framework for Fast Decision Tree Construction of Large Datasets.

For class on Wednesday, 4/23:
P. Bradley, U. Fayyad, and C. Reina. Scaling Clustering Algorithms to Large Databases.

For class on Wednesday, 5/7:
S. Guha, R. Rastogi, and K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases.

For class on Monday, 5/19:
R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules.
This is a long paper. Focus on the Apriori algorithm, and to a lesser degree on the AprioriTID algorithm. Don't spend much time reviewing AIS and SETM.