CS 127 Spelling Checker

Assigned on Friday, 10/25.
Design document due Monday, 10/28 in class.
Project due Due Friday, 11/1 by 4 PM. Be aware that the labs in the CMC close at 5 PM.

The Assignment

For this assignment you will write a spell checker, by storing a dictionary as a sorted list of words. Its structure, in pseudocode, might look like this:
        Open the dictionary file.
        For each word in the dictionary
           add the word to the list.
        Close the dictionary file.

        Open the big monster text file (BMTF).
        For each word in the BMTF
           make proper adjustments to case of letters.
           search the list for the word.
           if word is misspelled, store it.
        Close the BMTF.

        Display all misspelled words to the screen in alphabetical order.
Note that many of these pseudocode lines will be handled by function calls in the real code.

The Dictionary

The central purpose of this assignment is to write an application which uses binary search. While you're debugging and testing your code at first, MAKE SURE to use a small dictionary file and a small text file. Otherwise, you'll put too much strain on our servers. Later, you'll need to try it on big files. I've placed a big dictionary in the file /Accounts/courses/cs127/dmusican/web2.txt . Try the UNIX command "wc" on it, and you will discover that this dictionary has 234,936 words in it. The words are sorted (alphabetically, not lexicographically), and one per line. You will need to build the dictionary in memory, and be able to search for a word in the dictionary using binary search.

Please do as much of the large scale testing as you can on lab machines or your own, and not on prism.

The Input

You should not use standard input (cin) to obtain either of the files. Your program should ask the user for the location of the dictionary and the location of the document to be checked.

Capital letters, punctuation, and all that jazz

The dictionary mentioned above contains words with upper and lower case letters.

In order to keep this problem from becoming a terribly tedious exercise in details, you should use the following over-simplified rules.

Punctuation:

If a word in your document is not at the beginning of a sentence: If a word in your document is at the beginning of a sentence:

The Output

You should list all misspelled words to the screen in alphabetical order.

Implementation Details

You will need to decide how to store the dictionary and the misspelled words. They're both rather similar in that they both store sorted strings. On the other hand, the dictionary is static whereas the misspelled words are dynamic. Choose appropriate data structures to deal with each of the two situations. Justify your decision with documentation in your code.

Frequently Asked Questions

Q: "What kind of dictionary is this? Why isn't <insert my favorite word here> in the dictionary?"
A: You get what you pay for. I found this dictionary for free on the web. Consider it to be the authoritative source. If a word is not in web2.txt, it is misspelled.

Good luck!