CS 117, Winter 2000
Word frequencies, due 2/16/00
For this assignment, you may talk to other people about
coding strategy, algorithms, etc., but I want each of
you to write your own code.
Submit your code using the
Homework Submission Program.
The Goal
You are going to write a program that
- asks the user for the name of a text file,
- reads the words from the file, keeping track
of their frequencies (that is, how many times each
word appears),
- and prints the complete list of words and their
frequencies, sorted in decreasing order by frequency
(words with the same frequency should be printed
in alphabetical order).
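
The ordering rule in the last item can be written as a single comparison function. Here is a minimal sketch in C++, assuming a struct that holds a word and its counter. The names Word, text, count, and comesBefore are placeholders chosen for illustration, and the Word struct we discussed in class may look different.

```cpp
#include <string>

// Hypothetical record type; the Word struct from class may differ.
struct Word {
    std::string text;  // the word itself, already in lowercase
    int count;         // how many times the word has appeared
};

// Returns true if a should be printed before b:
// the higher count comes first, and words with equal counts
// are ordered alphabetically.
bool comesBefore(const Word& a, const Word& b) {
    if (a.count != b.count) {
        return a.count > b.count;
    }
    return a.text < b.text;
}
```

A function of this shape can be handed to std::sort as its third argument, or called from a sort you write yourself.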
So, for example, if the file contains the text:
The moose and the kudu
frolicked in the
meadow with their friends
the okapi and the gnu.
the output should be:
the 5
and 2
friends 1
frolicked 1
gnu 1
in 1
kudu 1
meadow 1
moose 1
okapi 1
their 1
with 1
Note that punctuation should be removed before counting, and
that case is ignored: "the" and "The" count as the same word.
Advice
This project is the biggest we've done in this class so far.
I recommend that you use the "incremental development" approach
in writing this program. That means that you should
plan a sequence of partial solutions to the problem, each
slightly more ambitious than the one before it.
For example, you might write programs that do the following:
- Get the text file's name from the user, and
print all the words, in lowercase, on the screen.
Don't worry about duplicate words for this program, and
don't count the words. When this works, SAVE A COPY OF
YOUR CODE AND DON'T TOUCH IT AGAIN.
- Get the text file's name, read the words into
an array of structs (like the Word struct we discussed
in class), and then print them out. Again,
don't worry about duplicate words.
SAVE A COPY OF YOUR CODE AND DON'T TOUCH IT AGAIN.
- Read the words into an array of structs. If
you encounter a word that's already in the array,
add one to that word's counter. Otherwise, add
the word to the array and set its counter to 1.
SAVE A COPY OF YOUR CODE AND DON'T TOUCH IT AGAIN.
- Etc.
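
Two pieces of the steps above might look something like the following: cleaning up a word (lowercase, punctuation removed) and counting it in an array of structs. This is only a sketch in C++ under some loud assumptions. The struct layout and the names Word, text, count, normalize, addWord, and capacity are all hypothetical. "Punctuation" is taken to mean every character that is not a letter, so an apostrophe or hyphen inside a word is dropped too, which the assignment does not spell out. The lookup is a simple linear search through the array.

```cpp
#include <cctype>
#include <string>

// Hypothetical record type; the Word struct from class may differ.
struct Word {
    std::string text;
    int count;
};

// Returns a copy of raw with letters converted to lowercase and
// every non-letter character dropped (an assumption about what
// "punctuation" covers).
std::string normalize(const std::string& raw) {
    std::string result;
    for (std::string::size_type i = 0; i < raw.size(); i++) {
        unsigned char c = static_cast<unsigned char>(raw[i]);
        if (std::isalpha(c)) {
            result += static_cast<char>(std::tolower(c));
        }
    }
    return result;
}

// Records one occurrence of text in words[0..numWords-1].
// If text is already in the array, its counter goes up by one;
// otherwise it is added at the end with a counter of 1.
// Returns the new number of distinct words. If the array is
// already full (numWords == capacity), a new word is ignored.
int addWord(Word words[], int numWords, int capacity,
            const std::string& text) {
    for (int i = 0; i < numWords; i++) {
        if (words[i].text == text) {
            words[i].count++;
            return numWords;
        }
    }
    if (numWords < capacity) {
        words[numWords].text = text;
        words[numWords].count = 1;
        numWords++;
    }
    return numWords;
}
```

A token made up entirely of punctuation comes back from normalize as an empty string, so the caller will probably want to skip empty results before calling addWord.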
The idea is to plan a sequence of small steps, each of which
is manageable on its own, that add up to the complete program.
Among the many benefits of this approach is the constant
availability of a partial solution that can be handed in
for grading, or demonstrated to a customer, or released
to the public for early testing.
Start early, keep in touch, and have fun.
Jeff Ondich,
Department of Mathematics and Computer Science,
Carleton College, Northfield, MN 55057
(507) 646-4364,
jondich@carleton.edu