CS 201: Data Structures (Winter 2018)

Lab4: Compare Hash Code Functions

1. Goals

To work with different hash code functions and gain a better understanding of how characteristics of the hash code relate to collisions.

2. Setting up

3. Hash Code Functions

Read through the four hash code functions that are provided, and make sure you understand what each one of them is doing. They get more complicated with number (i.e., hashCode0() is the simplest and hashCode3() is the most complicated). Change the comments to accurately reflect the methods.

Check your understanding by trying a few words to see whether they have the same hash codes for each function:

That is, in main, calculate the hash codes according to each function for each of these words, and write down in your text file how many collisions you get for each hash code function. Note that this is independent of the size of the hash table you're working with: any collisions would happen regardless of how many buckets (slot in the array) you use if the hash codes are the same.

Next, imagine you have a lot of words (not the four above). Look at each function and decide whether having 1,000 buckets versus 10,000 buckets would possibly affect the number of collisions if you used the compression function compressToSize(int,int). That is, if you gained additional buckets in the hash table, would your hash function actually assign things to those new buckets? Write your answer down for each hash code function.

4. Measuring Collisions

Write a method public static int[] collisionCounter(int numBuckets, String fileName, int hashCodeFunctionToUse) to count how many collisions you would get with a hash table of a given size if you stored each word in the file into a hash table of a certain size. We won't worry about resolving these collisions: we'll just count how many distinct words would go into the same bucket. To do this:

A skeleton implementation of this method that does the file I/O is included in your class already; add your code to that method to make it function properly.

Once you've done this, write three additional methods that compute:

Stubs for these methods are provided for you in the file.

You're now set up to investigate how well each hash code function performs based on the number of buckets available. To investigate, you'll vary three things:

Note that 1543 and 196613 are prime; the others are not.

Write code to perform your experiments in main, and use good style to avoid lots of duplicate code and make it easy to extend if you add more hash functions or array sizes. Write down your results in your text file, and talk to your partner about what they mean.

Note: enable1.txt is a Public Domain word list. You can read more about how it was made and by whom. pride.txt is from Project Gutenberg.

5. Extensions

This lab modified from one designed by Anna Rafferty.