CS 117, Winter 2001
Word frequencies, due 2/21/01
For this assignment, you may talk to other people about
coding strategy, algorithms, etc., but I want each of
you to write your own code.
Submit your code using the
Homework Submission Program.
The Goal
You are going to write a program that
- asks the user for the name of a text file,
- reads the words from the file, keeping track
of their frequencies (that is, how many times each
word appears),
- and prints the complete list of words and their
frequencies, sorted in decreasing order by frequency
(words with the same frequency should be printed
in alphabetical order).
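The required ordering (decreasing frequency, ties broken alphabetically) can be written as a single sort key. Below is a minimal Python sketch. It assumes the counts are already held in a dictionary mapping each word to its count, and the function name `sorted_by_frequency` is hypothetical:

```python
def sorted_by_frequency(counts):
    """Return (word, count) pairs ordered by decreasing count,
    with ties broken alphabetically by word."""
    # Negating the count makes an ascending sort put larger counts first;
    # the word is the secondary key, so equal counts come out A to Z.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = {"gnu": 1, "the": 5, "and": 2, "kudu": 1}
    for word, count in sorted_by_frequency(sample):
        print(word, count)
```

An equivalent approach relies on sort stability. Sort the pairs alphabetically first, then sort again by count with `reverse=True`. Python's sort keeps equal elements in their existing order even when reversing.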
So, for example, if the file contains the text:
The moose and the kudu
frolicked in the
meadow with their friends
the okapi and the gnu.
the output should be:
the 5
and 2
friends 1
frolicked 1
gnu 1
in 1
kudu 1
meadow 1
moose 1
okapi 1
their 1
with 1
Note that punctuation should be removed, and that
"the" and "The" are considered to be the same word.
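Here is one way to put the normalization rule together with counting and sorting, applied to the example text above. This is a sketch, not the required solution. It assumes that "punctuation" means the ASCII characters in Python's `string.punctuation`, so apostrophes and hyphens are also deleted ("don't" becomes "dont"). It also uses the standard library's `collections.Counter` in place of hand-built arrays. The names `normalize` and `word_frequencies` are hypothetical.

```python
import string
from collections import Counter

# Assumption: "punctuation" means the ASCII characters in string.punctuation.
# This also deletes apostrophes and hyphens, so "don't" becomes "dont".
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(token):
    """Lower-case a whitespace-separated token and delete its punctuation."""
    return token.lower().translate(_STRIP_PUNCTUATION)


def word_frequencies(text):
    """Count normalized words, skipping tokens that were pure punctuation."""
    counts = Counter()
    for token in text.split():
        word = normalize(token)
        if word:  # a lone "--" normalizes to the empty string
            counts[word] += 1
    return counts


if __name__ == "__main__":
    example = """The moose and the kudu
frolicked in the
meadow with their friends
the okapi and the gnu."""
    counts = word_frequencies(example)
    # Decreasing count, then alphabetical order for ties.
    for word, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        print(word, count)
```

Running this prints the same twelve lines as the expected output above: "the 5", then "and 2", then the ten words that appear once, in alphabetical order.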
Advice
I recommend that you use the "incremental development" approach
in writing this program. That means that you should
plan a sequence of partial solutions to the problem, each
slightly more ambitious than the one before it.
For example, you might write programs that do the following:
- Get the text file's name from the user, and
print all the words on the screen in lowercase.
Don't worry about duplicate words for this first version of the program, and
don't count the words. When this works, SAVE A COPY OF
YOUR CODE AND DON'T TOUCH IT AGAIN.
- Get the text file's name, read the words into
an array of strings, and then print them out. Again,
don't worry about duplicate words.
SAVE A COPY OF YOUR CODE AND DON'T TOUCH IT AGAIN.
- Read the words into an array of strings. Keep a separate
array of counters. If
you encounter a word that's already in the string array,
add 1 to that word's counter. Otherwise, add
the word to the string array and set the corresponding counter to 1.
SAVE A COPY OF YOUR CODE AND DON'T TOUCH IT AGAIN.
- Etc.
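The third step above (a list of words plus a parallel list of counters) might look like the following. This is a Python sketch for illustration only, with Python lists standing in for the arrays. The names `count_words` and `main` are hypothetical, and punctuation is again assumed to mean `string.punctuation`.

```python
import string

# Assumption: punctuation is the ASCII set in string.punctuation.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def count_words(lines):
    """Parallel lists: words[i] is a distinct word and counters[i]
    is how many times that word has appeared so far."""
    words = []
    counters = []
    for line in lines:
        for token in line.split():
            word = token.lower().translate(_STRIP_PUNCTUATION)
            if not word:
                continue  # token was nothing but punctuation
            if word in words:
                # Already in the word list: add 1 to that word's counter.
                counters[words.index(word)] += 1
            else:
                # New word: append it and start its counter at 1.
                words.append(word)
                counters.append(1)
    return words, counters


def main():
    file_name = input("Text file name: ")
    with open(file_name) as text_file:
        words, counters = count_words(text_file)
    for word, count in zip(words, counters):
        print(word, count)


if __name__ == "__main__":
    main()
```

Each lookup scans the whole word list, so the work grows roughly with the square of the number of distinct words. That is fine for an intermediate version. A later step can add the sorting described in The Goal, or switch to a faster lookup structure.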
The idea is to plan a sequence of small steps, each of which
is manageable on its own, that add up to the complete program.
Among the many benefits of this approach is the constant
availability of a partial solution that can be handed in
for grading, or demonstrated to a customer, or released
to the public for early testing.
Start early, keep in touch, and have fun.
Jeff Ondich,
Department of Mathematics and Computer Science,
Carleton College, Northfield, MN 55057
(507) 646-4364,
jondich@carleton.edu