Each spell-checker should use more-or-less the same main program. Its structure, in pseudo-code, might look like this:
Open the dictionary file. For each word in the dictionary add the word to the list or hash table. Close the dictionary file. Open the big monster text file. For each word in the BMTF make the word lower case, and * search the list or hash table for the word. (For debugging, maybe print misspelled words or at least count them.) Close the BMTF.
Note that many of these pseudo-code lines will be handled by function calls in the real code.
The central purpose of this assignment is to introduce you to the performance differences for binary search and hash tables. Both are wonderful search techniques, but how do they compare in theory and in practice? That's the big question. To answer this question, you'll need to time the searching operations. To do this, first comment out the line with the *. That is, do absolutely everything except the searching, and time the program using the UNIX "time" command. Then uncomment the * line and run the program again, timing the result. The difference should be a decent measure of the total searching time. Note that the BMTF really needs to be big so that the reported search time is not 0.
You will need to declare an array of Strings, but you won't be able to declare such a big one inside any function. The reason for this is that the system stack, where all local variables live, has a maximum size that cannot accomodate 80,000 String objects. So I'm going to do what all CS teachers take a solemn oath never to do--I'm going to tell you to use a global variable. If the CS Police find out, I'm history, so try to keep it quiet. Anyway, try this:
#include#include String dictionary[80000]; int main( void ) { etc.
That's about all I'm going to tell you about this one. You need to be able to build the dictionary, one word at an alphabetical time, and you need to be able to search for a word in the dictionary using binary search. That's not binary search tree--just binary search.
#includeHere, "HashNode" is a linked-list node type defined by you in the header file hashtable.h. I'm using Figure 5.5 on page 185 as my model here. Weiss is a bit sneakier, putting some old code (his "List" type) to use in his implementation of an open hash table. You may use your judgement on this.#include #include "hashtable.h" HashNode *dictionary[160000]; int main( void ) { etc.
The big decision you'll need to make for this spell-checker is what your hash function will be. It should be chosen to give you a good spread of words through the hash table. Note that adding up the ASCII values of the characters will be no good--no word would come even close to a hash value of 10,000, let alone 160,000.
That's about all I have to say about the hash table version.
Try your code on a little dictionary before you attack the huge one.
Watch your account's quota. The big dictionary takes up a pretty big chunk of it. The UNIX command "ls -l" shows you the sizes in bytes of all your files. Note that you should also remove old executables, because they take up an obscene amount of space.
You may take code from Weiss, but you should cite it if you do.
As always, start early, keep in touch, and have fun.