CS 117 Assignment, due 4/23/97

Word Lengths

You may work with a partner on this assignment.

The program

Your goal is to write a program that will read the words from a text file and report the frequencies of the various word lengths.

For example, if the file contained the following:


  program: (noun) A magic spell cast over a computer allowing
  it to turn one's input into error messages.

your program should report that there were 2 words of length 1, 2 of length 2, 0 of length 3, 5 of length 4, etc. (Note that I have considered "one's" to be a 5-character word rather than a 4-letter word. Do whatever is easiest when handling words containing apostrophes.)

I would like you to report your results in histogram form, like so:


1:2     xx
2:2     xx
3:0
4:5     xxxxx
5:5     xxxxx
6:0
7:1     x
8:3     xxx

Since it's more important to get the word counting right than to get the histogram looking good, I recommend that you get the counting working first, perhaps reporting the results like this:

Length  Frequency
-----------------
1       2
2       2
3       0
4       5
5       5
6       0
7       1
8       3

Once you have the counting working properly, you can move on to the histogram.
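
As a rough illustration of the histogram step, here is a minimal sketch that prints one row per word length. It assumes the counts are already in the frequency array; the program name, the procedure name PrintHistogram, and the shortened MaxLength of 8 are made up for this demo, and the counts are typed in by hand from the example above rather than read from a file.

```pascal
program HistogramDemo;

const
  MaxLength = 8;   { shortened for the demo; the assignment uses 30 }

type
  IntArray = array[1..MaxLength] of integer;

var
  frequency : IntArray;

{ Hypothetical helper: prints "length:count" followed by one x per word. }
procedure PrintHistogram(var freq : IntArray);
var
  i, j : integer;
begin
  for i := 1 to MaxLength do
  begin
    write(i:1, ':', freq[i]:1, '     ');
    { When freq[i] is 0 this loop runs zero times, so no x is printed. }
    for j := 1 to freq[i] do
      write('x');
    writeln
  end
end;

begin
  { Counts copied by hand from the example sentence in this handout. }
  frequency[1] := 2;
  frequency[2] := 2;
  frequency[3] := 0;
  frequency[4] := 5;
  frequency[5] := 5;
  frequency[6] := 0;
  frequency[7] := 1;
  frequency[8] := 3;
  PrintHistogram(frequency)
end.
```

The fixed run of spaces keeps the x's lined up only while the length and count are single digits; a field width on the write would be one way to handle larger numbers.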

If you run your program on, say, a dictionary file in which there are 2 1-letter words, 52 2-letter words, 514 3-letter words, 2011 4-letter words, 3275 5-letter words, etc., you could make one "x" count for 250 words, and thus produce a histogram that doesn't overflow onto hundreds of lines. You can define the "words-per-x" as a constant using const, or you could have the program decide for itself how many words each x should represent once it has read the file.
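
One possible way to have the program pick the words-per-x value is sketched below. The function name ChooseWordsPerX and the constant MaxColumns (the widest row you are willing to print) are assumptions made for this example, not part of the assignment, and the five counts are the dictionary figures quoted above.

```pascal
program ScaledHistogramDemo;

const
  MaxLength = 5;     { shortened for the demo; the assignment uses 30 }
  MaxColumns = 50;   { assumed limit on the number of x's in a row }

type
  IntArray = array[1..MaxLength] of integer;

var
  frequency : IntArray;
  i, j, wordsPerX : integer;

{ Hypothetical helper: returns the smallest words-per-x value that }
{ keeps the longest row within MaxColumns x's.                     }
function ChooseWordsPerX(var freq : IntArray) : integer;
var
  i, largest : integer;
begin
  largest := 0;
  for i := 1 to MaxLength do
    if freq[i] > largest then
      largest := freq[i];
  if largest <= MaxColumns then
    ChooseWordsPerX := 1
  else
    { Round up so that largest div wordsPerX never exceeds MaxColumns. }
    ChooseWordsPerX := (largest + MaxColumns - 1) div MaxColumns
end;

begin
  { Dictionary counts quoted in the handout. }
  frequency[1] := 2;
  frequency[2] := 52;
  frequency[3] := 514;
  frequency[4] := 2011;
  frequency[5] := 3275;

  wordsPerX := ChooseWordsPerX(frequency);
  writeln('Each x stands for ', wordsPerX:1, ' words');
  for i := 1 to MaxLength do
  begin
    write(i:1, ':', frequency[i]:1, '     ');
    for j := 1 to frequency[i] div wordsPerX do
      write('x');
    writeln
  end
end.
```

If you use a constant instead, replace the call with something like const WordsPerX = 250. Because div rounds down, lengths with fewer words than one x represents (the 2 and the 52 here) get no x's at all, so it is worth printing the numeric count on every row.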

Suggestions

Use an array of integers to keep track of how many words of each length there are. You might declare this array like so:

type     IntArray = array[1..30] of integer;

var      frequency : IntArray;

At the start of your program, you should set frequency[1], frequency[2], etc. all equal to 0. Then, read the words one at a time. For each word, add 1 to frequency[?], where ? is the length of the word. By the end of the run of the program, frequency[1] should contain the number of 1-letter words in the input file, frequency[2] the number of 2-letter words, etc. You may use ReadWord again.

You may assume that no word is longer than 30 letters.
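
The counting steps described above can be sketched roughly as follows. This is only an illustration under some assumptions: it does not use ReadWord, but instead reads standard input one character at a time and treats any run of letters and apostrophes as a word. The helper IsWordChar and the constant MaxLength are names invented for this sketch.

```pascal
program WordLengthCount;

const
  MaxLength = 30;

type
  IntArray = array[1..MaxLength] of integer;

var
  frequency : IntArray;
  ch : char;
  len, i : integer;

{ Hypothetical helper: true for letters and apostrophes, so that }
{ "one's" counts as a 5-character word.                          }
function IsWordChar(c : char) : boolean;
begin
  IsWordChar := ((c >= 'a') and (c <= 'z'))
             or ((c >= 'A') and (c <= 'Z'))
             or (c = '''')
end;

begin
  { Start every counter at zero. }
  for i := 1 to MaxLength do
    frequency[i] := 0;

  len := 0;
  while not eof do
  begin
    read(ch);
    if IsWordChar(ch) then
      len := len + 1
    else
    begin
      { A non-word character ends the current word, if there is one. }
      if (len >= 1) and (len <= MaxLength) then
        frequency[len] := frequency[len] + 1;
      len := 0
    end
  end;

  { Count a final word that runs right up to the end of the input. }
  if (len >= 1) and (len <= MaxLength) then
    frequency[len] := frequency[len] + 1;

  writeln('Length  Frequency');
  writeln('-----------------');
  for i := 1 to MaxLength do
    writeln(i:2, frequency[i]:10)
end.
```

The range check on len is there only so that an unexpectedly long word cannot index outside the array; under the 30-letter assumption it should never be needed.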

You should probably create your own small text file for early testing of your program. Once things are working pretty well, you should try your program on the dictionary file words.txt, or on /LocalLibrary/Intel_LocalLibrary/Literature/ByTitle/CanterburyTales/Group_A/The Milleres Tale, or on any other file you might find in /LocalLibrary/Intel_LocalLibrary/Literature or on the Web. An interesting source of text files is http://www.promo.net/pg/, the home page for Project Gutenberg.

Start early, keep in touch, and have fun.



Jeff Ondich, Department of Mathematics and Computer Science, Carleton College, Northfield, MN 55057
(507) 646-4364, jondich@carleton.edu