CS 395 Assignment: Clustering

Let's cluster!

Use a random subset of the census-income data consisting of 1000 points, using the continuous features as usual (but not instance-weight, and not year). Create software that uses the k-means clustering algorithm to find clusters in the data. You should try varying the number of clusters, and choose an appropriate number. Use two different techniques for picking initial seeds, and determine how the choice of technique affects your results. Turn in on paper an explanation of the methodologies that you used, the results that you found, and a clear description of the final clusters that you discovered.
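As a starting point, here is a minimal sketch of k-means with two interchangeable seeding techniques. It is one possible shape for the software, not a required design. The function names (sq_dist, seed_random, seed_farthest_first, kmeans) are made up for illustration, the two seeding techniques shown are only examples of what you might compare, and the demo runs on synthetic points rather than the census data. The sketch also assumes the features have already been put on comparable scales.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seed_random(points, k, rng):
    """Seeding technique 1 (illustrative): k data points drawn without replacement."""
    return [list(p) for p in rng.sample(points, k)]

def seed_farthest_first(points, k, rng):
    """Seeding technique 2 (illustrative): start from one random point, then
    repeatedly add the point farthest from the seeds chosen so far."""
    seeds = [list(rng.choice(points))]
    while len(seeds) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, s) for s in seeds))
        seeds.append(list(farthest))
    return seeds

def kmeans(points, k, seed_fn, rng, max_iter=100):
    """Run k-means; return (labels, centers, sum of squared errors)."""
    centers = seed_fn(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in data: three Gaussian blobs in 2-D (NOT the census data).
    data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in [(0, 0), (8, 0), (4, 7)] for _ in range(50)]
    for k in range(1, 7):
        for name, seed_fn in [("random", seed_random),
                              ("farthest-first", seed_farthest_first)]:
            _, _, sse = kmeans(data, k, seed_fn, random.Random(k))
            print(f"k={k} seeds={name:14s} SSE={sse:.1f}")
```

Comparing the sum of squared errors across values of k and across seeding techniques is one way to support your choice of cluster count. Because the error generally keeps shrinking as k grows, look for the point where adding clusters stops helping much rather than for the smallest value.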

Similarly, use k-means to cluster the voting-records dataset. Leave out the classification attribute that indicates political party. In addition to presenting your results, examine whether the clusters that k-means found correlate in any way with political party.
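One way to examine the correlation with party is to cross-tabulate the cluster assignments against the party labels after clustering is finished. The sketch below is only an illustration: the function names (cross_tab, purity) and the toy labels are hypothetical, and purity is one possible summary measure among several. The party labels are used only for this comparison, never as input to k-means.

```python
from collections import Counter

def cross_tab(cluster_labels, party_labels):
    """Count how many members of each party fall in each cluster."""
    table = {}
    for cluster, party in zip(cluster_labels, party_labels):
        table.setdefault(cluster, Counter())[party] += 1
    return table

def purity(table):
    """Fraction of points that belong to the majority party of their cluster."""
    total = sum(sum(counts.values()) for counts in table.values())
    majority = sum(max(counts.values()) for counts in table.values())
    return majority / total

if __name__ == "__main__":
    # Toy labels for illustration only (NOT results from the voting-records data).
    clusters = [0, 0, 0, 1, 1, 1]
    parties = ["democrat", "democrat", "republican",
               "republican", "republican", "republican"]
    table = cross_tab(clusters, parties)
    for cluster in sorted(table):
        print(cluster, dict(table[cluster]))
    print("purity:", round(purity(table), 3))
```

A purity near 1 means that each cluster is dominated by one party. Compare it against the share of the larger party in the whole dataset, since that is the purity a single all-inclusive cluster would achieve.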

(Update for next time: emphasize that classes are unknown when clustering, and use a different dataset with no classification.)