Turn in your results electronically, either as a text file or in a well-accepted word processor file format. This is an individual assignment.

  1. "Overfitting with decision trees is often avoided by pruning them." Explain why this is true using the computational learning theory results that we developed regarding the size of the hypothesis space (which are also explained in section 18.5 of your textbook).
  2. The statement given in part 1 above might lead one to believe that one should prune as much as possible. How is it that the results we proved in class lead one to conclude that one should prune decision trees, yet the same results don't suggest that one should prune too far?
  3. At the top of page 715, your textbook uses the phrase "consistent hypothesis." I don't think your textbook explicitly defines this. What is a consistent hypothesis? How does it connect with the terminology we used in class? (A small illustration appears after the questions.)
  4. We showed in class that P(H_bad has a "spy") <= |H| a^n, where n is the number of training examples. We want to keep this probability low. Why? (A sketch of the derivation appears after the questions.)
  5. The previous question suggests we wish to keep the bound on the right-hand side of the inequality above as low as possible. Since a is a number less than 1, that can be done by making a small. This would seem to lead to the following seemingly sound yet ridiculous conclusion: since a is the accuracy of the classifier, and we want the bound low, we should strive to get a as low as possible. Untangle this logic.
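
For reference on question 1, the sketch below computes |H| for Boolean decision trees and plugs it into the bound from question 4. It is illustrative only: the accuracy threshold a = 0.9 and example count n = 100 are hypothetical, and it uses the simplifying assumption that a tree pruned to depth d behaves like a tree over only d Boolean attributes, so it represents at most 2^(2^d) distinct functions.

    # Illustrative sketch: how pruning shrinks |H|, and how |H| enters
    # the bound |H| * a**n from question 4. All numbers are hypothetical.

    def num_boolean_functions(num_attrs):
        # Number of distinct Boolean functions on num_attrs Boolean inputs.
        return 2 ** (2 ** num_attrs)

    def spy_bound(h_size, a, n):
        # Upper bound |H| * a**n on P(H_bad has a "spy").
        return h_size * a ** n

    a = 0.9   # hypothetical accuracy threshold defining H_bad
    n = 100   # hypothetical number of training examples
    for depth in (2, 4, 6):
        h_size = num_boolean_functions(depth)
        print(f"depth {depth}: |H| = {h_size:.4g}, bound = {spy_bound(h_size, a, n):.4g}")

Notice how the bound degrades as the allowed depth grows: the deeper (less pruned) the trees, the larger |H|, and the more examples it takes to drive the bound back down.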
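
For question 3, a minimal illustration of what "consistent" means operationally; the hypothesis h and the four (XOR) examples are hypothetical.

    # A hypothesis is consistent if it agrees with every training example.
    examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def h(x):
        # Hypothetical hypothesis: XOR of the two attributes.
        return x[0] ^ x[1]

    consistent = all(h(x) == y for x, y in examples)
    print(consistent)  # True: h agrees with every example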
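
For questions 4 and 5, here is a sketch of the standard derivation behind such a bound, under the assumptions that the n training examples are drawn independently and that H_bad is the set of hypotheses whose true accuracy is below a. For any fixed h in H_bad:

    P(h agrees with one random example) <= a
    P(h agrees with all n examples)     <= a^n

and summing over every hypothesis in H_bad (the union bound):

    P(H_bad has a "spy") <= |H_bad| a^n <= |H| a^n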