2. Create a second file, file2 of random integers that contains n/2 integers contained in file1 and n/2 new random integers.
3. The idea for this assignment is to search file1 for each of the integers in file2. Because of the way you created the files half of the key integers in file2 will be found and half will not. You will implement this search in three ways:
4. You will need to time the two algorithms. Look at timing.cpp for some sample code on how to do this. Additionally, you will need to keep track of various counts that reflect the amount of work that the scheme had to do. For example, for the binary search::
sort_comp // keeps track of total number of comparisons of keys needed to sort dataLikewise, for hashing in an array, you need to keep track of
bs_comp_success // keeps track of total number of comparisons of keys for all successful searches
bs_comp_fail // keeps track of total number of comparisons of keys for all failed searches
hash_create_count // keeps track of total number of times hash function is called while creating hash tableYou should modify these as you see fit, as well as keep track of the appropriate information for hashing by chaining.
hash_create_comp // keeps track of total number of comparison of keys needed while creating hash table
hash_create_next // keeps track of total number of times that the hashing scheme has to move to another slot while creating hash table
hash_count // keeps track of total number of times hash function is called while searching hash table
hash_comp // keeps track of total number of comparison of keys needed while searching hash table
hash_next_success // keeps track of total number of times that the hashing scheme has to move to another slot during successful searches
hash_next_fail // keeps track of total number of times that the hashing scheme has to move to another slot during failed searches
You can use these counts to compute various averages. For example,
(sort_comp + bs_comp_success + bs_comp_fail)/n
measures the average cost (in comparisons) needed to sort and binary search the data.
(sum of last 7 counts)/n
measures the average cost (in comparisons and function calls) needed to create and search the hash table. Think of some other useful averages and report them. For example, you might want to compute separate averages for success and failure, or you might estimate how much more expensive a function call is than a comparison and use this to compute a weighted average. The rough outline of your code should be:
input file1;
start clock;
create sorted list S from file1;
// use efficient mergesort on linked list
// and then put result into an array
for each i in file2, use binary search to find
i in S or discover i not in S
stop clock;
start clock;
create array based hash table T from file1;
for each i in file2, use hash table search to
find i in T or discover i not in T
stop clock;
start clock;
create chaining based hash table T from file1;
for each i in file2, use hash table search to
find i in T or discover i not in T
stop clock;
report times;
report all counts and averages
You should make sure in each of your examples that you choose n large enough so that you can get good experimental results. Comparing two programs that run in 0.05 seconds and 0.07 seconds and concluding that the first one is faster is not good experimental technique - anything could have contributed to the difference! This does not necessarily mean that you should make n as large as possible. A moderately sized n with multiple trials on more than one random data set can be revealing.
Keep n moderate in size so that we don't overburden the servers as you run your experiments. Make sure to keep n small while you're testing your program.