CS 117, Fall 2000
Project 2: Words, due Friday 9/22/00
When you are done with this assignment, please submit it using the
Homework Submission Program.
The problem
I often want to collect some statistics on my writing. If I'm writing a
proposal that has to be shorter than 2000 words, I will want to count the
number of words in each of my drafts. If I'm trying to write software documentation
at a sixth-grade reading level (tragically, that's what the software experts
usually recommend), I'll want to keep an eye on the number of letters and
syllables in my words, the number of words in my sentences, and the number
of sentences in my paragraphs.
Actually, I just like fooling around with this kind of thing. It pleases
me to know how many words there are in Middlemarch, or how many
times the word "whilst" appears in Hamlet (326398 and 4, respectively).
For this assignment, you will write a program that takes a text file
as input and reports
- the number of words in the file,
- the average number of letters per word,
- the longest word and its length (if there is more than one word of greatest
  length, you only need to report one of them),
- the shortest word and its length (if there is more than one word of least
  length, you only need to report one of them),
- the number of occurrences of the word "the",
- and the word from the file that is alphabetically last (for example, "zyzzyvas"
  would probably win if it were in your file).
For now, we won't look at syllables, sentences, or paragraphs.
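The six statistics above can all be updated one word at a time, so the program never needs to hold the whole file in memory. Here is a minimal sketch of that idea, assuming each word has already been separated from the input and converted to lowercase. The names WordStats, addWord, and summarize are made up for illustration and are not part of the assignment.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the running statistics (illustrative names).
struct WordStats {
    std::size_t count = 0;         // number of words seen so far
    std::size_t totalLetters = 0;  // sum of all word lengths
    std::string longest;           // first word of greatest length
    std::string shortest;          // first word of least length
    std::size_t theCount = 0;      // occurrences of "the"
    std::string last;              // alphabetically last word

    // Average letters per word; 0 when no words have been seen.
    double averageLength() const {
        return count == 0 ? 0.0 : static_cast<double>(totalLetters) / count;
    }
};

// Fold one word into the running statistics.
// Assumes the word is non-empty and already lowercased.
void addWord(WordStats& stats, const std::string& word) {
    if (stats.count == 0 || word.length() > stats.longest.length()) {
        stats.longest = word;
    }
    if (stats.count == 0 || word.length() < stats.shortest.length()) {
        stats.shortest = word;
    }
    if (stats.count == 0 || word > stats.last) {
        stats.last = word;  // std::string compares in dictionary order
    }
    if (word == "the") {
        ++stats.theCount;
    }
    stats.totalLetters += word.length();
    ++stats.count;
}

// Convenience wrapper: compute the statistics for a list of words.
WordStats summarize(const std::vector<std::string>& words) {
    WordStats stats;
    for (const std::string& w : words) {
        addWord(stats, w);
    }
    return stats;
}
```

In a real program, main would call addWord once for each word it reads from standard input. The strict comparisons (> and <) mean that ties for longest or shortest go to the earliest word, which is allowed since only one needs to be reported.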
A little more detail
When you compile your program (g++ -o words words.cpp or something
similar) and run it like so:
words < somefile.txt
your program should produce an easy-to-read report that looks something
like this:
Number of words: 1729
Average word length: 5.72 letters
Longest word: electroencephalography (22 letters)
Shortest word: a (1 letter)
Occurrences of "the": 62
Alphabetically latest word: zygomorphic
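Two small details in the sample report are easy to miss: the average is shown with two digits after the decimal point, and "letter" is singular when the length is 1. The sketch below shows one way to handle both with the standard library. The helper names letterCount, wordLine, and averageLine are hypothetical, and your report does not have to match this layout exactly.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Produces "1 letter" for n == 1 and "<n> letters" otherwise.
std::string letterCount(std::size_t n) {
    std::ostringstream out;
    out << n << (n == 1 ? " letter" : " letters");
    return out.str();
}

// Produces a line such as "Shortest word: a (1 letter)".
std::string wordLine(const std::string& label, const std::string& word) {
    return label + ": " + word + " (" + letterCount(word.length()) + ")";
}

// Produces a line such as "Average word length: 5.72 letters".
// std::fixed with std::setprecision(2) forces exactly two decimal places.
std::string averageLine(double average) {
    std::ostringstream out;
    out << "Average word length: " << std::fixed << std::setprecision(2)
        << average << " letters";
    return out.str();
}
```

Each helper returns a string, so main can print the lines with std::cout in whatever order the report calls for.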
A small offering
You might be surprised to learn that the hardest part of writing this program
is reading one word at a time out of the input file. Punctuation, extra
spaces, ends of lines, and the end of the file itself can all cause trouble
if you aren't careful. With this in mind, I have provided a C++ program,
printwords.cpp, to help you out.
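To give a feel for the trouble spots mentioned above, here is a rough sketch of reading one word at a time. It is not the contents of printwords.cpp, and it makes a simplifying assumption: a word is any unbroken run of letters, and every other character (spaces, punctuation, digits, newlines) separates words. Under that assumption "don't" comes out as "don" and "t". The names readWord and wordsIn are invented for this example.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads the next word from `in` into `word`, converted to lowercase.
// Returns false once the input is exhausted and no word was found.
// Assumption: a word is a maximal run of letters.
bool readWord(std::istream& in, std::string& word) {
    word.clear();
    char ch;
    while (in.get(ch)) {
        // Cast to unsigned char before calling the <cctype> functions.
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            return true;  // a separator ended the current word
        }
        // Otherwise: a separator before any letter, so keep skipping.
    }
    // End of input: the final word may not be followed by a separator.
    return !word.empty();
}

// Small helper for trying readWord on a string instead of a file.
std::vector<std::string> wordsIn(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (readWord(in, word)) {
        words.push_back(word);
    }
    return words;
}
```

Because readWord takes any std::istream, the same function works with std::cin when the program is run with input redirected from a file.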
That's all
Make sure to include a comment at the top of your program giving your name,
the date, and a brief description of what your program does.
Start early, keep in touch, and have fun.
Written originally by Jeff Ondich, Dept. of Mathematics and Computer
Science, Carleton College.
Slightly modified and assigned by Dave Musicant, Department of Mathematics
and Computer Science, Carleton College, Northfield, MN 55057,
(507) 646-4364, dmusican@carleton.edu