CS 127, Data Structures

Project 7: Experimental CS: Benchmarking Hashing vs. Sorting & Binary Searching

Due Monday, 11/6/00, at 5 PM.

Overview

For this project, you will run an experiment to compare the effectiveness of hashing, vs. the effectiveness of sorting and binary searching. Does the theoretical analysis hold up? We'll see!

Details

1. Create a file, file1, of n random integers.

2. Create a second file, file2 of random integers that contains n/2 integers contained in file1 and n/2 new random integers.

3. The idea for this assignment is to search file1 for each of the integers in file2. Because of the way you created the files half of the key integers in file2 will be found and half will not. You will implement this search in two ways: first by sorting and binary searching and second by a hashing scheme of your choice. The goal is to see if there is a time difference between the two methods.

4. You will need to time the two algorithms. Look at timing.cpp for some sample code on how to do this. Additionally, you will need to keep track of various counts:
 

sort_comp // keeps track of total number of comparisons of keys needed to sort data
bs_comp_success // keeps track of total number of comparisons of keys for all successful searches
bs_comp_fail // keeps track of total number of comparisons of keys for all failed searches
hash_create_count // keeps track of total number of times hash function is called while creating hash table
hash_create_comp // keeps track of total number of comparison of keys needed while creating hash table
hash_create_next // keeps track of total number of times that the hashing scheme has to move to another slot while creating hash table
hash_count // keeps track of total number of times hash function is called while searching hash table
hash_comp // keeps track of total number of comparison of keys needed while searching hash table
hash_next_success // keeps track of total number of times that the hashing scheme has to move to another slot during successful searches
hash_next_fail // keeps track of total number of times that the hashing scheme has to move to another slot during failed searches


     You can use these counts to compute various averages. For example,

                          (sort_comp + bs_comp_success + bs_comp_fail)/n

     measures the average cost (in comparisons) needed to sort and binary search the data.

                                (sum of last 7 counts)/n

     measures the average cost (in comparisons and function calls) needed to create and search the hash table. Think of some other useful averages and report them. For example, you might want to compute separate averages for success and failure, or you might estimate how much more expensive a function call is than a comparison and use this to compute a weighted average. The rough outline of your code should be:

  input file1;

  start clock;
    create sorted list S from file1;   // use efficient mergesort on linked list
                                       // and then put result into an array
    for each i in file2, use binary search to find i in S or discover i not in S
  stop clock;

  start clock;
    create hash table T from file1;
    for each i in file2, use hash table search to find i in T or discover i not in T
  stop clock;

  report times;
  report all counts and averages
 

Submissions

Hand in electronically your search code (in a program called benchmark.cpp), not your integer files or the program that created them. Also turn in a text file called report.txt reporting on the results you got for various n. This report should describe the hashing scheme you used, the times, and an analysis of your average counts.