CS 127, Project 5: Spelling Checker
Project due Due Monday, 10/23/00
Design document due Wednesday, 10/18/00
The Assignment
For this assignment you will write a spell checker, by storing a dictionary
as a sorted list of words. Its structure, in pseudocode, might look like
this:
Open the dictionary file.
For each word in the dictionary
add the word to the list.
Close the dictionary file.
Open the big monster text file.
For each word in the BMTF
make proper adjustments to case of letters.
search the list for the word.
if word is misspelled, store it.
Close the BMTF.
Display all misspelled words to the screen in alphabetical order.
Note that many of these pseudocode lines will be handled by function calls
in the real code.
The Dictionary
The central purpose of this assignment is to write an application which
uses binary search. While you're debugging and testing your code at first,
you'll want to write a small dictionary file and a small text file. But
later, you'll need to try it on big files. I've placed a big dictionary
in the file /Accounts/courses/cs127/dmusican/dictionary.txt .
Try the UNIX command "wc" on it, and you will discover that this dictionary
has 35,441 words in it. It's small as far as dictionaries are concerned,
but certainly big enough to make the project interesting. The words are
already sorted, in lower case letters, and one per line. You will need
to build the dictionary in memory, and be able to search for a word in
the dictionary using binary search.
Capital letters and other details
The dictionary mentioned above contains words with all lower case letters.
A real document, of course, contains words with both capital and lower
case letters. You should assume that any word with a capital letter in
it is misspelled unless the first letter is capitalized and the
word is the first one in a sentence (i.e., follows a period). Under this
circumstance, you should convert the first letter to lower case and check
it against the dictionary. If this word actually is misspelled, you should
report it back to the user in all lower case letters. This way, the user
knows that the problem is actually in the spelling of the word itself,
and not just because of the capital letter.
You should remove all punctuation from a word before checking its spelling.
If the word is misspelled, report it to the user without the punctuation.
The Output
You should list all misspelled words to the screen in alphabetical order.
Each misspelled word should only be listed once.
Implementation Details
You will need to decide how to store the dictionary and the misspelled
words. They're both rather similar in that they both store sorted strings.
On the other hand, the dictionary is static whereas the misspelled words
are dynamic. Choose appropriate data structures to deal with each of the
two situations. Justify your decision with documentation in your code.
Design
You need to make some decisions as to how to store and manage the information.
To help you stay on track, I'm going to ask you to submit a design document
as to how you'll attack the problem. The design document should contain
the following information:
-
Descriptions of the classes you will need to solve the problem, including
variables and methods
-
Prototypes of the methods and other functions that you will need
-
Descriptions of the purpose for each method and function
-
Descriptions of the variables, arrays, etc. you will need to store and
manage the information
-
Pseudocode for each of the functions
-
Estimates of how long it will take you to write each of these functions.
Consider how long it will take to translate the pseudocode into actual
C++ code, then remember to leave lots of time for debugging and testing.
-
Your test plan: how will you test each of these functions individually
to make sure they work?
You should plan on getting each of your classes to function independently.
Among the many benefits of this approach is the constant availability of
a partial solution that can be handed in for grading, or demonstrated to
a customer, or released to the public for early testing.
You should start coding your project before turning in the design
document. Rapid prototyping of your ideas is a good way to validate your
design.
Good luck!