CS 395 Assignment: Clustering
Let's cluster!
Use a random subset of the census-income data consisting of 1000 points,
using the continuous features as usual (but not instance-weight, and not
year). Create software that uses the k-means clustering algorithm to find
clusters in the data. You should try varying the number of clusters, and choose
an appropriate number. Use two different techniques for picking initial seeds,
and determine how the choice of technique affects your results. Turn in on paper an explanation
of the methodologies that you used, the results that you found, and a clear
description of the final clusters that you discovered.
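
To make the task concrete, here is a minimal sketch of one way the pieces could fit together. It is not a required design. The function names are made up for illustration, the toy 2-D points merely stand in for the 1000-point census subset, and "farthest-first" is just one possible choice for the second seeding technique. Comparing the within-cluster sum of squared distances (SSE) across values of k and across seeding techniques is one common way to approach the questions above.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def random_seeds(points, k, rng):
    """Seeding technique A: k distinct data points chosen uniformly at random."""
    return [list(p) for p in rng.sample(points, k)]

def farthest_first_seeds(points, k, rng):
    """Seeding technique B (an assumed choice): start from one random point,
    then repeatedly add the point farthest from its nearest chosen seed."""
    seeds = [list(rng.choice(points))]
    while len(seeds) < k:
        far = max(points, key=lambda p: min(sq_dist(p, s) for s in seeds))
        seeds.append(list(far))
    return seeds

def kmeans(points, k, seed_fn, rng, max_iter=100):
    """Plain k-means. Returns (centers, labels, sse)."""
    centers = seed_fn(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, sse

# Toy stand-in data: two made-up 2-D blobs, NOT the census data.
data_rng = random.Random(0)
toy = ([(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(50)] +
       [(data_rng.uniform(9, 11), data_rng.uniform(9, 11)) for _ in range(50)])

for k in range(1, 5):
    for name, seed_fn in [("random", random_seeds),
                          ("farthest-first", farthest_first_seeds)]:
        _, _, sse = kmeans(toy, k, seed_fn, random.Random(k))
        print(f"k={k} seeding={name:15s} SSE={sse:.2f}")
```

On real data the continuous features have very different ranges, so you may want to rescale them before computing distances; whether and how to do so is part of the methodology you should explain.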
Similarly, cluster using k-means on the voting-records dataset. Leave out
the classification that indicates the political party. In addition to presenting
your results, examine whether or not the clusters that k-means found in any
way correlate with political party.
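
One possible way to examine that correlation, sketched under assumptions: after clustering without the party column, tabulate how many members of each party landed in each cluster, and summarize the table with a purity score (the fraction of records that belong to the majority party of their cluster). The helper names and the tiny label lists below are hypothetical, made up purely for illustration; they are not results from the voting-records dataset.

```python
from collections import Counter

def crosstab(cluster_labels, parties):
    """Count how many records of each party fall into each cluster."""
    table = {}
    for cluster, party in zip(cluster_labels, parties):
        table.setdefault(cluster, Counter())[party] += 1
    return table

def purity(table):
    """Fraction of records that match the majority party of their cluster."""
    total = sum(sum(row.values()) for row in table.values())
    return sum(max(row.values()) for row in table.values()) / total

# Made-up example: cluster ids from k-means, party labels held out until now.
example_clusters = [0, 0, 1, 1, 1]
example_parties = ["democrat", "democrat", "republican", "republican", "democrat"]

table = crosstab(example_clusters, example_parties)
for cluster in sorted(table):
    print(cluster, dict(table[cluster]))
print("purity:", purity(table))
```

A purity near 1 suggests the clusters line up with party, while a value close to the share of the largest party suggests they do not. Purity tends to rise as the number of clusters grows, so compare it only across runs with the same k.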
(Update for next time: emphasize that classes are unknown when
clustering, use a different dataset with no classification)