Minibase Assignment #3: Index Nested Loops Join
Assigned on Wednesday, May 15.
Due on Wednesday, May 22 at 5 PM.
Introduction
In this assignment, you will implement the index nested loops join algorithm
for equijoins. You will be given libraries for the lower layers (Disk
Space Manager, Buffer Manager, Heap File, and B+ Tree), and some driver routines
to test the code.
You should begin by reading Chapter 12 of the textbook, particularly the
description of Index Nested Loops Join in Section 12.5.1.
Getting Started
Create a directory for the assignment, and change to that directory. Copy
the file /usr/local/mini_hwk/assign/INL_Join/src/Makefile to your
current directory. To obtain the rest of the files that you will need,
type make setup. The files are:
- Makefile: A Makefile to compile your project.
- index_nl.h: This file contains the specification for the class
index_nested_loop, which you have to implement. The implementation should
be in a file that is called index_nl.C. You are free to add functions,
data members, or type definitions to this class.
- main.C, test_driver.C, INLJTester.C: Test programs. You are
given four test cases in INLJTester.C. We will grade your code
based on these four test cases plus three additional test cases.
test_driver.C and main.C contain driver code that runs the
tests.
- correct_output: Output of a correct implementation.
You can find other useful header files in /usr/local/mini_hwk/assign/INL_Join/include
.
What You Have to Implement
class index_nested_loop{
public:
index_nested_loop(
char* filename1, // Name of heap file for relation R
int len_in1, // # of columns in R
AttrType in1[], // Array containing field types of R
short t1_str_sizes[], // Array containing size of columns in R
int join_col_in1, // The join column of R
char* filename2, // Name of heap file for relation S
int len_in2, // # of columns in S
AttrType in2[], // Array containing field types of S
short t2_str_sizes[], // Array containing size of columns in S
int join_col_in2, // The join column of S
char* filename3, // Name of heap file for join results
IndexType in_type, // Index type for inner relation (S)
Status& s // Status of constructor
);
~index_nested_loop();
};
The index_nested_loop constructor joins two relations R and S, represented
by the heap files filename1 and filename2, respectively,
using the index nested loops join algorithm. The columns for relations R
(S) are numbered from 0 to len_in1-1 (len_in2-1). You should
concatenate each matching pair of records and write the concatenated records
into the heap file filename3.
You will need to use the following classes, which are given: HeapFile,
Scan, BTreeFile and BTreeFileScan. You will call BTreeFile methods to build
a B+ tree index for the inner heap file S. Then you scan the outer heap
file R; for each tuple in the outer heap file R, retrieve the matching tuples
in S by scanning the B+ tree on the inner heap file S.
You can assume that the value of the input parameter in_type
will always be B_Index.
The parameter s is a return parameter. If all goes well, the
constructor will finish and return s==OK. If an error occurs,
you should exit the constructor and return an error code in s.
The index_nested_loop destructor should destroy the B+ tree index that
you created for S, in addition to the usual clean up of local and temporary
state.
You are expected to handle variable length keys in this assignment. You
do not need to worry about alignment (if you do not understand what alignment
is, you can safely ignore the rest of this paragraph). You may assume that
the length of all columns will be a multiple of four, so that all columns
are aligned on a four-byte boundary.
Notes on Some of the Functions You Will Use
- HeapFile::insertRecord(char* recPtr, int recLen, RID& outRid)
inserts the record of length recLen pointed to by recPtr
into the heap file and passes back the newly allocated RID of the record
in outRid.
- HeapFile::getRecord(const RID& rid, char* recPtr, int&
recLen) takes in a RID, rid, and passes back the record associated
with rid in the character array pointed to by recPtr. It
also passes back the length of the record in recLen. The caller
is responsible for allocating memory for recPtr.
- Scan::getNext(RID& rid, char* recPtr, int& recLen)
retrieves the next record in a sequential scan of a heap file. Passes
back the RID of the next record in rid, copies the actual record
into the array pointed to by recPtr, and passes back the length of
the record in recLen. The caller is responsible for allocating memory
for recPtr.
- BTreeFile::insert(const void* key, const RID rid) inserts the
pair (key,rid) into a B+ tree.
- BTreeFile::new_scan(const void* lo_key, const void* hi_key)
creates a BTreeFileScan object to retrieve RIDs from a B+ tree
and returns a pointer to this object. De-allocating this object is the responsibility
of the caller.
- IndexFileScan::get_next(RID& rid, void* keyptr) retrieves
the next (key,rid) pair from an index file scan. Passes back the
RID in rid and the key in the memory area pointed to by keyptr
. Allocating memory for keyptr is the responsibility of the caller.
For this assignment, the index file scan we will be using is a BTreeFileScan
.
Error Reporting in the Code
As in previous project assignments, you should follow the
Minibase Error Protocol
. The subsystem code for the index_nested_loop class is JOINS.
Handing in Your Code
Handing in Your Code
Use hsp to hand in the entire directory containing your code. Do not submit
the individual files, but rather submit the directory as a whole. For
example, if I am sitting in the directory above INL_Join, I would execute
the following command:
hsp musicant CS395 INL_Join
Good luck!