Write a program in Lisp, Java, or C++ to implement the ID3 decision tree algorithm. You should read in a space delimited dataset and output your decision tree and the training set accuracy in some readable format. For example, here is the tennis dataset. The first line will contain the names of the fields:
outlook temperature humidity wind playtennis sunny hot high FALSE no sunny hot high TRUE no overcast hot high FALSE yes rainy mild high FALSE yes rainy cool normal FALSE yes rainy cool normal TRUE no overcast cool normal TRUE yes sunny mild high FALSE no sunny cool normal FALSE yes rainy mild normal FALSE yes sunny mild normal TRUE yes overcast mild high TRUE yes overcast hot normal FALSE yes rainy mild high TRUE no
The last column is the classification attribute, and will always
contain contain the values yes
or no
.
For output, you might want to do something like Weka does: turn the decision tree on its side, and use indentation to show levels of the tree as it grows from the left. This is probably easier than drawing the tree growing down from the top of the screen.
outlook = sunny | humidity = high: no | humidity = normal: yes outlook = overcast: yes outlook = rainy | windy = TRUE: no | windy = FALSE: yes Training set accuracy = 100%
If you choose to code in Lisp, look up the
make-hash-table
function. You may find it useful. If you
code in Java, you may find that the HashMap
helps
similarly.