CS 201: Data Structures (Winter 2017)
Lab05: Compare Hash Code Functions
1. Goals
To work with different hash code functions and gain a better understanding of how characteristics of the hash code relate to collisions.
2. Setting up
- Download lab05.zip.
-
You will be working with
HashCodes.java
. - Open an empty text file, and use it to write all your notes from this lab. Email this file and the code you write to your partner so you both have access and can use it for HW13.
3. Hash Code Functions
Read through the four hash code functions that are provided, and make sure
you understand what each one of them is doing. They get more complicated with
number (i.e., hashCode0()
is the simplest and
hashCode3()
is the most complicated). Change the comments to
accurately reflect the methods.
Check your understanding by trying a few words to see whether they have the same hash codes for each function:
- ham
- pea
- pie
- tofu
That is, in main
, calculate the hash codes according to each
function for each of these words, and write down in your text file how many
collisions you get for each hash code function. Note that this is independent
of the size of the hash table you're working with: any collisions would happen
regardless of how many buckets (slot in the array) you use if the hash
codes are the same.
Next, imagine you have a lot of words (not the four above). Look at each
function and decide whether having 1,000 buckets versus 10,000 buckets would
possibly affect the number of collisions if you used the compression function
compressToSize(int,int)
. That is, if you gained additional buckets
in the hash table, would your hash function actually assign things to those new
buckets? Write your answer down for each hash code function.
4. Measuring Collisions
Write a method public static int[] collisionCounter(int numBuckets,
String fileName, int hashCodeFunctionToUse)
to count how many collisions you
would get with a hash table of a given size if you stored each word in the file
into a hash table of a certain size. We won't worry about resolving these
collisions: we'll just count how many distinct words would go into the same
bucket. To do this:
-
First, create an array with length
numBuckets
. This will store the count of how many distinct words go in each bucket. Each bucket should start off storing 0 words. -
Create a
Set
to store whether you've seen a word previously: we don't want to count the same word multiple times. Use the built-in JavaSet
with theHashSet
implementation. (As you might imagine, this is aSet
that uses hashing as an implementation, but that's somewhat irrelevant here.) -
Read through the file, delimiting on white space. For each
String
, check if you've seen it previously. If not, calculate its hash code using whichever function is indicated byhashCodeFunctionToUse
(i.e., ifhashCodeFunctionToUse = 0
, then you should usehashCode0
). Add 1 to the count for the bucket where this word would go, based on the hash code and the compression function (compressToSize(int,int)
). Add the word to yourSet
to indicate that you've now seen it. - At the end, return your array with the counts.
A skeleton implementation of this method that does the file I/O is included in your class already; add your code to that method to make it function properly.
Once you've done this, write three additional methods that compute:
- the proportion of buckets that have no words in them,
- the maximum number of words stored in a single bucket, and
- the average number of words per non-empty bucket.
You're now set up to investigate how well each hash code function performs based on the number of buckets available. To investigate, you'll vary three things:
- the text file you're counting:
words.txt
andpride.txt
; - the hash function you're using; and
- the length of your array: 1048, 1543, 2048, 196613, and 200000.
Write code to perform your experiments in main
, and use good
style to avoid lots of duplicate code and make it easy to extend if you add
more hash functions or array sizes. Write down your results in your text file,
and talk to your partner about what they mean.
5. Extensions
-
Write your own hash code function for
Strings
. Try to get better performance than for the four you were given in this lab. You probably shouldn't expect to beat the built-in hash code function (but if you do, I'd love to hear about it!). -
Discuss with your partner: Does using the memory address of a
String
make sense as a hash function? Why or why not? -
Write a hash code function for a
List
of integers orString
s.
This lab modified from one designed by Anna Rafferty.