Minibase Assignment #4: Sort-Merge Join

Assigned on Wednesday, May 22.
Due on Monday, June 3 at 5 PM.

Introduction

In this assignment, you will implement the sort-merge join algorithm. You should begin by reading the chapter on Implementation of Relational Operations, in particular, the section on Sort-Merge Join.


Getting Started

Create a directory for the assignment, and change to that directory. Copy the file /usr/local/mini_hwk/assign/SM_Join/src/Makefile to your current directory. To obtain the rest of the files that you will need, type make setup. The files are:

You can find other useful header files in /usr/local/mini_hwk/assign/SM_Join/include .


What You Have to Implement

class sortMerge {
public:

sortMerge(
char *filename1, // Name of heapfile for relation R.
int len_in1, // # of columns in R.
AttrType in1[], // Array containing field types of R.
short t1_str_sizes[], // Array containing size of columns in R.
int join_col_in1, // The join column number of R.
char *filename2, // Name of heapfile for relation S
int len_in2, // # of columns in S.
AttrType in2[], // Array containing field types of S.
short t2_str_sizes[], // Array containing size of columns in S.
int join_col_in2, // The join column number of S.
char* filename3, // Name of heapfile for merged results
int amt_of_mem, // Number of pages available for sorting
TupleOrder order, // Sorting order: Ascending or Descending
Status& s // Status of constructor
);

~sortMerge();
};

The sortMerge constructor joins two relations R and S, represented by the heapfiles filename1 and filename2, respectively, using the sort-merge join algorithm. Note that the columns for relation R (S) are numbered from 0 to len_in1 - 1 (len_in2 - 1). You are to concatenate each matching pair of records and write it into the heapfile filename3. The error layer for the sortMerge class is JOINS.

You will need to use the following classes which are given: Sort, HeapFile, and Scan. You will call the Sort constructor to sort a heapfile. To compare the join columns of two tuples, you will call the function tupleCmp (a commented version of this function is included at the end of SMJTester.C). tupleCmp should be passed two parameters, each of which is a pointer to strings to be compared. Once a scan is opened on a heapfile, the scan cursor can be positioned to any record within the heapfile calling the Scan method position with an RID argument. The next call to the Scan method getNext will proceed from the new cursor position.

You may assume that TupleOrder is always ascending.

Handing in Your Code

Use hsp to hand in the entire directory containing your code. Do not submit the individual files, but rather submit the directory as a whole. For example, if I am sitting in the directory above SM_Join, I would execute the following command:

hsp musicant CS395 SM_Join

Good luck!