CS 117 Assignment, Due 8:30 AM, 1/22/96

Word Lengths

Hand in via HSP.

Your goal is to write a program that will read a dictionary, stored as a list of words in a text file, one word per line, and report the frequencies of the various word lengths.

For example, if the file looked like this:


armadillo
bee
cougar
deer
emu

your program should report that there were 0 words of length 1, 0 of length 2, 2 of length 3, 1 of length 4, etc. One way to display this information would be:


Length  Frequency
-----------------
1       0
2       0
3       2
4       1
5       0
6       1
7       0
8       0
9       1

Here are a couple of suggestions on how to approach this problem. Use an array of integers, such as:


type     intArray = array[1..30] of integer;

var      frequency : intArray;

To begin, you should set frequency[1], frequency[2], etc. all equal to 0. Then, read the words one at a time, reading each word one character at a time. While you read the characters of a word, you should count those characters, using an integer variable named something like wordLength. Once you have computed the length of a word, you can record it with frequency[wordLength] := frequency[wordLength]+1.
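
The bookkeeping described above can be sketched in isolation, before any input is read. This is only an illustration under some assumptions: the program name TallyDemo, the constant maxLength, and the helper procedure tally are made up for this sketch, and the five word lengths are hard-coded from the armadillo/bee/cougar/deer/emu example rather than read from a file.

```pascal
program TallyDemo(output);

const
  maxLength = 30;   { longest word the assignment allows }

type
  intArray = array[1..maxLength] of integer;

var
  frequency : intArray;
  i         : integer;

{ Hypothetical helper: record one word of the given length. }
procedure tally(wordLength : integer);
begin
  frequency[wordLength] := frequency[wordLength] + 1
end;

begin
  { Every counter must start at zero. }
  for i := 1 to maxLength do
    frequency[i] := 0;

  { Lengths of armadillo, bee, cougar, deer, emu, hard-coded here. }
  tally(9);
  tally(3);
  tally(6);
  tally(4);
  tally(3);

  { Show only lengths 1 through 9, as in the sample table. }
  writeln('Length  Frequency');
  writeln('-----------------');
  for i := 1 to 9 do
    writeln(i:1, '       ', frequency[i]:1)
end.
```

In your real program, the calls with hard-coded lengths would be replaced by the lengths you compute while reading the dictionary.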

Watch out for one little quirk. When eoln becomes true, the newline character has not been read yet: it is still in the stream of characters coming from the keyboard (or the file, if you have used "program < datafile" to present the dictionary to your program). You need to get rid of that pending newline character before you can start reading the next line. To do this, just use "readln" with no parameters.
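
Here is one possible shape for the reading loop, showing where the bare readln fits. It is a sketch, not the required solution: the program name ReadLengths is made up, the input is assumed to arrive on standard input (for example via "program < datafile"), and it simply prints each word's length instead of tallying it.

```pascal
program ReadLengths(input, output);

var
  ch         : char;
  wordLength : integer;

begin
  while not eof do
  begin
    { Count the characters on the current line. }
    wordLength := 0;
    while not eoln do
    begin
      read(ch);
      wordLength := wordLength + 1
    end;

    { eoln is true, but the newline is still pending; discard it. }
    readln;

    writeln(wordLength:1)
  end
end.
```

Without the readln, eoln would stay true, the inner loop would never run again, and the outer loop would never reach the end of the file. Note also that an empty line produces a length of 0, which is not a legal index into an array declared as array[1..30].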

You may assume that no word is longer than 30 letters. You may also assume that no line of the dictionary file contains extraneous spaces.

You should probably create your own small dictionary for early testing of your program. Once things are working pretty well, try your program on the file /usr/dict/words, which contains about 25,000 words. Watch out, though: it includes a bunch of "words" that aren't really English words. Each single letter is listed as a 1-letter word, for example, so you should see at least 26 1-letter words.

If you get something like the above working, then you might want to try displaying your data in histogram form. For example, if there are 2 1-letter words, 52 2-letter words, 514 3-letter words, 2011 4-letter words, 3275 5-letter words, etc., you could make one "x" count for 250 words, and produce a histogram like so:


1:2
2:52
3:514   xx
4:2011  xxxxxxxx
5:3275  xxxxxxxxxxxxx

etc.
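
One way to draw such a histogram is sketched below. Several things here are assumptions made for illustration: the names HistogramDemo, wordsPerMark, and maxShown are invented, the five counts are hard-coded from the example above, and the number of x's is computed with div, so any remainder is dropped. The spacing between the count and the x's is also simpler than in the sample, so the columns will not line up exactly.

```pascal
program HistogramDemo(output);

const
  wordsPerMark = 250;   { each 'x' stands for this many words }
  maxShown     = 5;     { only the five sample counts are filled in }

type
  intArray = array[1..30] of integer;

var
  frequency : intArray;
  i, j      : integer;

begin
  { Sample counts from the example, hard-coded for this sketch. }
  frequency[1] := 2;
  frequency[2] := 52;
  frequency[3] := 514;
  frequency[4] := 2011;
  frequency[5] := 3275;

  for i := 1 to maxShown do
  begin
    write(i:1, ':', frequency[i]:1, '  ');

    { Integer division drops the remainder, so 52 words earn no 'x'. }
    for j := 1 to frequency[i] div wordsPerMark do
      write('x');

    writeln
  end
end.
```

If your largest count is much bigger or smaller than these, you may want to choose a different value for wordsPerMark so the longest bar still fits on one line.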


We'll talk more about this program on Wednesday. Start early, keep in touch, and have fun.



Jeff Ondich, Department of Mathematics and Computer Science, Carleton College, Northfield, MN 55057
(507) 663-4364, jondich@carleton.edu