CS 127, Project 5: Spelling Checker

Project due Due Monday, 10/23/00
Design document due Wednesday, 10/18/00

The Assignment

For this assignment you will write a spell checker, by storing a dictionary as a sorted list of words. Its structure, in pseudocode, might look like this:
        Open the dictionary file.
        For each word in the dictionary
                add the word to the list.
        Close the dictionary file.

        Open the big monster text file.
        For each word in the BMTF
                make proper adjustments to case of letters.
                search the list for the word.
                if word is misspelled, store it.
        Close the BMTF.

        Display all misspelled words to the screen in alphabetical order.
Note that many of these pseudocode lines will be handled by function calls in the real code.

The Dictionary

The central purpose of this assignment is to write an application which uses binary search. While you're debugging and testing your code at first, you'll want to write a small dictionary file and a small text file. But later, you'll need to try it on big files. I've placed a big dictionary in the file /Accounts/courses/cs127/dmusican/dictionary.txt . Try the UNIX command "wc" on it, and you will discover that this dictionary has 35,441 words in it. It's small as far as dictionaries are concerned, but certainly big enough to make the project interesting. The words are already sorted, in lower case letters, and one per line. You will need to build the dictionary in memory, and be able to search for a word in the dictionary using binary search.

Capital letters and other details

The dictionary mentioned above contains words with all lower case letters. A real document, of course, contains words with both capital and lower case letters. You should assume that any word with a capital letter in it is misspelled unless the first letter is capitalized and the word is the first one in a sentence (i.e., follows a period). Under this circumstance, you should convert the first letter to lower case and check it against the dictionary. If this word actually is misspelled, you should report it back to the user in all lower case letters. This way, the user knows that the problem is actually in the spelling of the word itself, and not just because of the capital letter.

You should remove all punctuation from a word before checking its spelling. If the word is misspelled, report it to the user without the punctuation.

The Output

You should list all misspelled words to the screen in alphabetical order. Each misspelled word should only be listed once.

Implementation Details

You will need to decide how to store the dictionary and the misspelled words. They're both rather similar in that they both store sorted strings. On the other hand, the dictionary is static whereas the misspelled words are dynamic. Choose appropriate data structures to deal with each of the two situations. Justify your decision with documentation in your code.

Design

You need to make some decisions as to how to store and manage the information. To help you stay on track, I'm going to ask you to submit a design document as to how you'll attack the problem. The design document should contain the following information: You should plan on getting each of your classes to function independently. Among the many benefits of this approach is the constant availability of a partial solution that can be handed in for grading, or demonstrated to a customer, or released to the public for early testing.

You should start coding your project before turning in the design document. Rapid prototyping of your ideas is a good way to validate your design.

Good luck!