2. Create a second file, file2 of random integers that contains n/2 integers contained in file1 and n/2 new random integers.
3. The idea for this assignment is to search file1 for each of the integers in file2. Because of the way you created the files half of the key integers in file2 will be found and half will not. You will implement this search in two ways: first by sorting and binary searching and second by a hashing scheme of your choice. The goal is to see if there is a time difference between the two methods.
4. You will need to time the two algorithms. Look at timing.cpp
for some sample code on how to do this. Additionally, you will need to
keep track of various counts:
sort_comp // keeps track of total number of comparisons of keys needed to sort data
bs_comp_success // keeps track of total number of comparisons of keys for all successful searches
bs_comp_fail // keeps track of total number of comparisons of keys for all failed searches
hash_create_count // keeps track of total number of times hash function is called while creating hash table
hash_create_comp // keeps track of total number of comparison of keys needed while creating hash table
hash_create_next // keeps track of total number of times that the hashing scheme has to move to another slot while creating hash table
hash_count // keeps track of total number of times hash function is called while searching hash table
hash_comp // keeps track of total number of comparison of keys needed while searching hash table
hash_next_success // keeps track of total number of times that the hashing scheme has to move to another slot during successful searches
hash_next_fail // keeps track of total number of times that the hashing scheme has to move to another slot during failed searches
You can use these counts to compute various
averages. For example,
(sort_comp + bs_comp_success + bs_comp_fail)/n
measures the average cost (in comparisons) needed to sort and binary search the data.
(sum of last 7 counts)/n
measures the average cost (in comparisons and function calls) needed to create and search the hash table. Think of some other useful averages and report them. For example, you might want to compute separate averages for success and failure, or you might estimate how much more expensive a function call is than a comparison and use this to compute a weighted average. The rough outline of your code should be:
input file1;
start clock;
create sorted list S from file1;
// use efficient mergesort on linked list
// and then put result into an array
for each i in file2, use binary search to find
i in S or discover i not in S
stop clock;
start clock;
create hash table T from file1;
for each i in file2, use hash table search to
find i in T or discover i not in T
stop clock;
report times;
report all counts and averages