One application of searching is spell-checking. Here, the database is a dictionary, and one search is performed for each word in the text being checked. For this assignment, you will use a rudimentary spell-checker (spell.c) to test various implementations of a dictionary. That is, your job will be to write several versions of a dictionary data type, and collect performance data and memory usage information about each version in the context of spell-checking.
I have provided you with one implementation of a dictionary in dictionary.c and dictionary.h. My dictionary stores the words in an array of strings (that is, a 2-dimensional array of char's), sorted in alphabetical order. To search this dictionary, I use binary search. One of the first things you should do is test my program by compiling dictionary.c and spell.c together into an executable called, say, spell. To run this program, type spell dictionaryfile textfile, where dictionaryfile is the name of a file containing the words in the dictionary, and textfile is the name of the file you want checked for spelling errors. Two dictionary files you could use are words.txt and littlewords.txt (the latter of which is named for the lengths of its words, not the number of words).
By Monday, 5/19/97, hand in .h and .c files for a dictionary implementation based on a hash table. Also, hand in ON PAPER a discussion of the relative performance of the four implementations (my binary search plus your linear search, AVL tree, and hash table). To test performance, you should first choose a dictionary file and a text file to test on all four implementations. Then, run and time each of your implementations. Also, compute how much memory each implementation uses to store the dictionary (printing out values of "sizeof(...)" can help here). You may, of course, try many dictionary/textfile pairs and report your results. In any case, the report you hand in should include not just your data, but also your interpretations of the data, with an emphasis on the relative merits of the various dictionary implementations.
I have set MAX_WORD_LENGTH to 24 in my implementation of the dictionary. You can set it smaller for littlewords.txt.
You should not change spell.c or the prototypes for the four Dictionary functions. You will need to change dictionary.h to introduce your new types, and perhaps new functions called by the four essential functions, but don't change the prototypes for InitializeDictionary, AddToDictionary, IsInDictionary, or PrintDictionary.
The dictionary files I have provided you are both alphabetized, with one word per line. words.txt contains 24,259 words of length up to 22 (that's electroencephalography, by the way), and includes each letter as a word (not just "I" and "a") as well as a lot of proper nouns and acronyms ("AAA", for example). Unfortunately, although it does include "jump", it doesn't include "jumped," "jumping," "jumper," or "jumps." The other dictionary file, littlewords.txt, consists of 79,339 words of length 8 and under. It loses out on the long words, but it does a very good job of getting all the forms of short words (including not only the forms of "jump" listed above, but also "jumpier," "jumpiest," and my favorite, "jumpily").
I don't know about you, but I'd be inclined to create a small dictionary file and a small text file for testing my code at intermediate stages.
Start early, keep in touch, and have fun.