CS 395 Assignment: Decision Trees
For this assignment, you will build decision trees using several algorithm variants.
I have placed a dataset that reflects congressional voting records in the
directory /Accounts/courses/cs395/dmusican/voting-records. Use the RainForest
algorithms RF-Write, RF-Read, and simple RF-Hybrid to build a decision tree
on this dataset using the ID3 criterion. Because the dataset is small enough
to fit entirely in memory, you should simulate a memory limit.
Keep it simple: assume that exactly four AVC-groups can fit in memory at
once. Report the following information:
- Let R be the time to read a tuple into memory and W the time to write a
tuple to disk. In terms of R and W, how much time does each algorithm above
take on this dataset?
- What does the final decision tree look like? You should use the entire
dataset as a training set.
- What is your training accuracy?
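
One way to report the time asked for in the first item is to tally tuple reads and writes as your code runs and print the total as an expression in R and W. The sketch below is only an illustration: the class name IOCost, its methods, and the tuple count of 100 are hypothetical and not part of the assignment, and the sequence of scans shown is not the cost of any particular RainForest algorithm.

```python
from dataclasses import dataclass


@dataclass
class IOCost:
    """Tallies simulated tuple reads and writes (hypothetical helper)."""
    reads: int = 0
    writes: int = 0

    def read(self, n_tuples: int) -> None:
        # Charge one R per tuple read into memory.
        self.reads += n_tuples

    def write(self, n_tuples: int) -> None:
        # Charge one W per tuple written to disk.
        self.writes += n_tuples

    def __str__(self) -> str:
        return f"{self.reads}R + {self.writes}W"


# Made-up workload: scan 100 tuples, write them out as partitions,
# then scan the partitions once more.
cost = IOCost()
cost.read(100)
cost.write(100)
cost.read(100)
print(cost)  # prints "200R + 100W"
```

Calling read and write at every point where your simulation would touch the disk keeps the accounting separate from the tree-building logic.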
Assuming that this entire dataset is used as training data, what training
accuracy is achieved? What does your decision tree look like?
For RF-Hybrid, you should use the simple form described in the first paragraph
of section 4.3 of the Gehrke paper.
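
As a reference point for the ID3 criterion mentioned earlier, the sketch below computes information gain for one attribute directly from its AVC-set, that is, from counts of (attribute value, class label) pairs. The function names, the in-memory tuple layout, and the toy rows are assumptions made for illustration; they do not describe the format of the voting-records files.

```python
from collections import Counter, defaultdict
from math import log2


def build_avc_set(tuples, attr_index, label_index):
    """Count class labels per attribute value for a single attribute."""
    avc = defaultdict(Counter)
    for row in tuples:
        avc[row[attr_index]][row[label_index]] += 1
    return avc


def entropy(class_counts):
    """Shannon entropy (base 2) of a Counter of class labels."""
    total = sum(class_counts.values())
    return -sum((c / total) * log2(c / total)
                for c in class_counts.values() if c > 0)


def information_gain(avc):
    """Entropy before the split minus weighted entropy after it."""
    overall = Counter()
    for counts in avc.values():
        overall.update(counts)
    total = sum(overall.values())
    remainder = sum(sum(counts.values()) / total * entropy(counts)
                    for counts in avc.values())
    return entropy(overall) - remainder


# Toy rows (hypothetical): column 0 is one vote, column 1 is the class label.
rows = [("y", "democrat"), ("y", "democrat"),
        ("n", "republican"), ("n", "republican")]
gain = information_gain(build_avc_set(rows, attr_index=0, label_index=1))
print(gain)
```

Because the gain is computed from counts alone, the same function works whether the AVC-set was built from an in-memory scan or from a partition read back from disk. In the toy rows the vote separates the classes perfectly, so the gain equals the full one bit of class entropy.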