CS 201: Data Structures (Spring 2017)
HW12: Hashing
Due: Wednesday, 05/31 at 17:00 (Note unusual due time by college regulation.)
1. Goals
To get a better understanding of hashing.
2. Setup
This is a solo written assignment.
This assignment is primarily a problem set, and should be completed individually. You are welcome to talk with others about the problems, but you must write up your assignment by yourself. If you talk with others about the problems, please note their names at the top of your homework submission.
The problems reference work you did during an in-class lab. You may use any notes from that session, or even turn in parts of your notes from the lab verbatim. You will also need your code from the lab as you will turn it in. You should note your and your partner's name at the top of the lab; if you missed class, you should do the lab on your own. Other than that, you should not use any notes from talks with others while writing up your assignment. If you need notes, it likely means you still don't quite understand how to do the problem. Go back over your notes, ask more questions, or try to solve a similar problem on your own. Then come back a day or so later and try to write up your solution without your notes.
As always, you will submit your work on Moodle. This means that you will need to create a single PDF of your solutions to the written exercises. You have a few options for this:
- Write it by hand and scan it into a PDF. Please make sure it is legible. The library has a scanner. You can use "print to PDF" to turn multiple scans into one PDF file.
- Write it in a word processing program and export or print it to PDF.
- Write it in LaTeX and typeset it as a PDF.
Remember, the goal of your writeup is to communicate your understanding to us, so we can't accept writeups that we cannot read (whether that's due to handwriting, image quality, or file format issues).
3. The assignment
In class, you worked on a lab about hash codes. Make sure you have your notes/code from that lab before starting these questions.
-
If you didn't finish the lab in class, finish it now. Give results for the
number of empty buckets and the average number of items per non-empty bucket
when using 196613 and 200000 buckets with each of the five hash functions (including Java's built-in one)
from both
enable1.txt
andpride.txt
, and explain which hash function you would prefer to use (and why) based on these results. You're welcome to supplement your results with any thing else you measured about collisions in the lab: use your evidence-based reasoning skills! -
Briefly, explain why
hashCode2(String)
from the in-class lab isn't a very good hash function for English words. Give a couple of distinct reasons. - Imagine I decide to have a hash table with 264 buckets. My hash code function maps to each bucket based on the first four letters of the word (e.g., aaaa has its own bucket, aaab has its own bucket, and so on). Explain why this hash function may be inefficient for storing English words.
-
I have a
ZooAnimal
class that has instance variables for the animal's name, species, year of birth, and a path to an image of the animal. Propose a way to implementhashCode()
forZooAnimal
s such that twoZooAnimal
s that are equal (based on homework 7) will return the same hash code andZooAnimal
s that are not equal will tend not to return the same hash code (i.e., just returning the same code for allZooAnimal
s is not a correct answer). You may use any methods you want. -
Imagine we have seven buckets in our table (i.e., our array is of
length 7). We've decided to handle collision resolution via linear probing.
Our hash code takes the first letter of a word and turns it into an integer
based on its position in the alphabet such that
a=1,b=2,c=3,...,z=26
. For compression, we just use modulo the array length.-
Draw the array after adding the following items, in the order below:
group set ring lattice field
-
What indices in the array do I examine if I'm trying to find out if the
hash table contains the word magma? I examine an index if I
access that index (e.g., if I say
array[i]
at some point, then I've examined indexi
).
-
Draw the array after adding the following items, in the order below:
-
Now imagine we have ten buckets in our hash table, and we do collision
resolution using quadratic probing. We use the standard compression function
of the modulus of the array length. We'll be storing integers in our array,
and the hash code function returns the integer mod 10: that is, the
remainder when the integer is divided by 10. Draw the array after adding the
following items, in the order below; assume the hash table does not resize
during these additions:
38 52 47 19 78 67 71 24
-
When implementing a Map using hashing, we usually assume that the object to be stored
has a
hashCode
function and that the compression function is part of the Map implementation. Why don't we have both these functions in the same class: i.e., why wouldn't it make sense for the Map implementation to include code for both the hash code and compression functions, or for each object to implement a hash code function and make sure that hash code returns a value in the correct range? (Make sure you address both cases.) - Extra practice: You do not need to turn these in, but I encourage you to try the self test questions in your textbook related to hashing, especially those related to collision resolution using separate chaining and the different open addressing methods (linear probing, quadratic probing, and double hashing).
4. Submission and grading
Submit to Moodle azip
archive containing:
- a PDF file with your solutions, and
- completed
HashCodes.java
that you used to answer the questions about the lab.
Remember that you must write up your answers alone, without looking at notes from collaborations with others, and you should include the names of any collaborators at the top of your writeup. If you didn't talk with anyone about the problem set, please note that (e.g., "Worked alone").
Grading
- Answers in writeup and clarity of writeup [15 points]
This assignment modified from one designed by Anna Rafferty.
Start early, have fun, and discuss questions on Moodle.