CS 395 Assignment #2: k-Nearest Neighbor
For this assignment, you will experiment with several variations of the k-nearest neighbor classifier.
Create a subset of the file census-income.data consisting of the first 10,000
points. Likewise, create a subset of the file census-income.test containing
the first 10,000 points. For both subsets, include only the continuous attributes
and the classification (-50000, 50000+). Using your subset of census-income.data
as a training set, train a k-nearest neighbor classifier and measure performance
on both the training set and the test set. You should vary your technique
in the following ways:
- Try it for k = 1, 2, 3, ..., 20
- Try it with Euclidean distance, Manhattan distance, and cosine distance
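
The setup described above can be sketched roughly as follows. This is only one possible approach in plain Python, not a required implementation. The names `load_subset`, `knn_predict`, and the `continuous_cols` parameter are hypothetical; you must work out for yourself which column positions hold the continuous attributes. The sketch also assumes that the class label is the last field on each line, that a zero vector has cosine distance 1.0, and that voting ties go to the label of the nearer neighbor.

```python
import csv
import math
from collections import Counter
from itertools import islice


def load_subset(lines, continuous_cols, limit=10000):
    """Read the first `limit` non-empty records from an iterable of CSV lines.

    `continuous_cols` (hypothetical parameter) lists the column indices to
    keep as features. Assumption: the class label is the last field.
    """
    rows = (r for r in csv.reader(lines, skipinitialspace=True) if r)
    X, y = [], []
    for row in islice(rows, limit):
        X.append([float(row[c]) for c in continuous_cols])
        y.append(row[-1].strip())
    return X, y


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: a zero vector is treated as distance 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k, distance):
    """Return the majority label among the k training points nearest to query."""
    # Sort training indices by distance; equal distances keep training order.
    order = sorted(range(len(train_X)),
                   key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Equal vote counts resolve to the label seen first, i.e. the nearer one.
    return votes.most_common(1)[0][0]
```

To use it on a file, pass an open file object as `lines`, for example `load_subset(open("census-income.data"), continuous_cols=...)`, filling in the column indices yourself. Sorting every training point for every query is simple but slow for 10,000 by 10,000 comparisons; a faster selection method is a reasonable optimization.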
Report your results. Specifically, submit plots of training set and test set
accuracy as k varies from 1 to 20, for each of the three distances. What do
you find?
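
One way to collect the numbers behind these plots is sketched below. The names `accuracy`, `sweep_k`, and `make_predictor` are hypothetical: `make_predictor(k)` is assumed to return a function that maps one feature vector to a predicted label, built from whatever classifier and distance you are testing. The result can then be handed to any plotting tool.

```python
def accuracy(predict, X, y):
    """Fraction of points in X whose predicted label matches y."""
    correct = sum(1 for xi, yi in zip(X, y) if predict(xi) == yi)
    return correct / len(X)


def sweep_k(make_predictor, train, test, ks=range(1, 21)):
    """Return {k: (training accuracy, test accuracy)} for each k in ks.

    `train` and `test` are (X, y) pairs. `make_predictor` is a hypothetical
    factory: make_predictor(k) returns a function from a point to a label.
    """
    train_X, train_y = train
    test_X, test_y = test
    results = {}
    for k in ks:
        predict = make_predictor(k)
        results[k] = (accuracy(predict, train_X, train_y),
                      accuracy(predict, test_X, test_y))
    return results
```

Running `sweep_k` once per distance gives three such tables, one for each curve pair (training and test) that the plots call for.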
You should turn in, on paper, the plots that you generate, as well as a description
of what you have found.