CS 201: Data Structures

Searching in Java: how fast are Java's collection classes?

Partners: you may work alone or with a partner of your choosing

Note to future users of this assignment. The basic idea is solid. However, the students ran into a consistent problem analyzing dependency on N (the size of the dictionary). It was common to see T decreasing as a function of N for binary search, HashMap, and TreeMap. This was because (1) their timing loops included both the reading of the file to be spell-checked and the printing of misspelled words, and (2) they used subsets of a larger dictionary file. Thus, a smaller dictionary meant lots more printing of misspellings, and the printing time swamped the search times for the faster algorithms. When I tried it myself, I used the full words.txt as my smallest dictionary, adding "words" like goat0001, goat0002, goat0003, etc. to increase the dictionary size from there without changing the list of misspellings. So my printing time remained constant, and I was also using much larger N's than they were. My results looked like they came right out of the textbook. I would definitely run this assignment again, but I would tell them how to handle the dictionary (or just tell them not to print results) to avoid this problem.

In computer science, searching refers to the process of determining whether a given collection of objects contains an object with specified properties. The collection often consists of a identically typed objects, each containing a key that identifies the object, and a search consists of the attempt to find the particular object that matches a specified search key. For example, if you search a database of Person objects by full name, the particular full name you're looking for (e.g. "Elmer Fudd") is your search key.

Often, searching is even simpler than that: you have a list of strings, and you want to know whether your search string is in the list. That's the case where the object and its key are one and the same, and it's also the case we're going to look at now.

Java provides several collection classes for storing objects. For this assignment, you will perform timing exeriments to compare the search speeds of four Java search structures:

The question at hand is this. Given a collection of N strings stored in one of these four ways, and a collection of K search strings, what can you say about the relationship between N, K, and the time it takes to search for K strings?

What to hand in

Hand in via Moodle:

Suggestions and offerings

As always...

Start early, ask questions, and have fun!