CS 201: Data Structures (Spring 2018)

HW11: Cloud dreamer

Due: Friday, 05/25 at 22:00

1. Goals

To implement a tree-based structure that can function similarly to a map.

2. Setup

This is a pair programming assignment. You will continue with your same partner (unless I have told you otherwise).

Download the starter files and unzip.

3. Introduction

Word clouds, also known as "tag clouds," provide an interesting view of the words used in a document. Here, for example, is a word cloud based on the text of Pride and Prejudice:

The key feature of a word cloud is that words are displayed in a size proportional to the number of times they are used in the text on which the cloud is based. For example, Elizabeth Bennet and Mr. Darcy are quite central to Pride and Prejudice, so these words appear large, as expected.

Very common words, also known as stopwords, are typically not included in the cloud. Otherwise, all English word clouds would be dominated by "the", "and", "a", "in", etc.

For this assignment, you will implement a search tree, with which you will count the non-stopwords that appear in the text file of your choice.

4. Specification

Your program will come in three main pieces:

CloudDreamer specifics

Your CloudDreamer's main method will parse the command-line arguments to support the following three ways of running the program:

TreeWordCountMap specifics

Inside your TreeWordCountMap class, you will have two small nested classes to store (word, count) pairs.

The weird Node above might give you a clue that, instead of implementing a binary search tree, whose code is in your textbook, you will implement a slight variant.

You may recognize that this is similar to a 2-3 tree. However, making the tree balanced is optional.

Finally, your TreeWordCountMap class must include the following public methods:

/** If the specified word is already in this map, then its count is
    increased by one. Otherwise, the word is added to this map with a count
    of 1.
*/
public void increment(String word);

/** Returns true if word is contained in this map. */
public boolean contains(String word);

/** Returns a list of WordCount objects, one per word in this map,
    sorted alphabetically by word.
*/
public List<WordCount> getWordCountsByWord();

/** Returns a list of WordCount objects, one per word in this map,
    sorted in decreasing order by count.
*/
public List<WordCount> getWordCountsByCount();

Make one of your getWordCountsBy* methods O(n). The other can be slower.

Note that the grader and I might test your code using our own main programs, and thus you must adhere to the specifications for WordCount, increment, contains, getWordCountsByWord, and getWordCountsByCount.

(Bail out) If you have trouble getting all this to work, implement a standard binary search tree instead so you can get points for the CloudDreamer part of the assignment (and partial credit for TreeWordCountMap). If you do this, take special care to not be copying code from any source, including the textbook.

5. Code notes

6. Submission and grading

Submit to Moodle a zip archive containing: If you modified WordCloudMaker.java, submit that, too.

Follow the previously established guidelines.

Assignment requirements

This is a partial list of the things that we'll be looking for when evaluating your work (in addition to style):

Grading

This assignment modified from one designed by Sherri Goings and Jeff Ondich.

Start early, have fun, and discuss questions on Moodle.