CS 395 Assignment #2: k-Nearest Neighbor

For this assignment, you will run a variety of k-Nearest Neighbor variations.

Create a subset of the file census-income.data consisting of the first 10,000 points. Likewise, create a subset of the file census-income.test containing the first 10,000 points. For both subsets, include only the continuous attributes and the classification (-50000, 50000+). Using your subset of census-income.data as a training set, train a k-nearest neighbor classifier and measure performance on both the training set and the test set. You should vary your technique in the following ways:
Report your results. Specifically, submit plots showing how accuracy on both the training set and the test set changes as k varies from 1 to 20, for each of the three distances. What do you find?
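
As a starting point, the sweep over k could be organized roughly as in the sketch below. It is not a reference solution. It assumes the continuous attributes have already been loaded into NumPy arrays and the two classes encoded as 0 and 1. The three metrics shown (Euclidean, Manhattan, Chebyshev) are stand-ins chosen for illustration, since the distances are not named in this text, and the function names, the batch parameter, and the tie-breaking rule (ties go to class 0) are likewise assumptions.

```python
import numpy as np

def pairwise_distances(A, B, metric):
    """Distance from every row of A to every row of B, shape (len(A), len(B))."""
    diff = A[:, None, :] - B[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    if metric == "chebyshev":
        return np.abs(diff).max(axis=2)
    raise ValueError(f"unknown metric: {metric}")

def knn_predict(X_train, y_train, X_query, k, metric="euclidean", batch=200):
    """Majority vote among the k nearest training points (labels assumed 0/1).

    Queries are processed in batches so the full distance matrix for
    10,000 x 10,000 points never has to sit in memory at once.
    """
    preds = np.empty(len(X_query), dtype=y_train.dtype)
    for start in range(0, len(X_query), batch):
        D = pairwise_distances(X_query[start:start + batch], X_train, metric)
        nearest = np.argsort(D, axis=1, kind="stable")[:, :k]
        votes = y_train[nearest]  # shape (batch, k)
        # Assumption: an exact tie (possible for even k) goes to class 0.
        preds[start:start + batch] = (votes.mean(axis=1) > 0.5).astype(y_train.dtype)
    return preds

def accuracy_sweep(X_train, y_train, X_test, y_test,
                   ks=range(1, 21), metric="euclidean"):
    """Return a list of (k, training accuracy, test accuracy) triples."""
    results = []
    for k in ks:
        train_pred = knn_predict(X_train, y_train, X_train, k, metric)
        test_pred = knn_predict(X_train, y_train, X_test, k, metric)
        results.append((k,
                        float((train_pred == y_train).mean()),
                        float((test_pred == y_test).mean())))
    return results
```

Calling accuracy_sweep once per metric yields the numbers needed for one plot per distance. The sketch recomputes distances for every k, which is simple but slow; sorting the neighbors once and reusing them for all k would be a natural refinement.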

Turn in, on paper, the plots that you generate along with a description of what you found.