Minibase Assignment #3: Nested Loops Join


Introduction

In this assignment, you will implement the "tuple nested loops" and "block nested loops" join algorithms for equijoins. You will be given libraries for the lower layers (Disk Space Manager, Buffer Manager, Heap File), and some driver routines to test the code.

You may find the description of join algorithms in Section 14.4.1 useful.

You can find the API for relevant files by clicking on the appropriate link for Java or C++.


Getting Started

Start by grabbing the Java or C++ zip files and extract to a directory that you have created. The project contains the following files, among others:


Additional Tools That You Have

Make sure to read the Minibase API about the HeapFile, PageIterator, and RecordIterator classes. You should use these classes for iterating over the data in the heap files, rather then writing your own code to do so (unless you have lots of extra time).


Structure of Data Records

A data record is, as usual, an array of bytes (Java) or chars (C++). This record contains multiple fields. The number of fields is indicated by numCols. Each field might have different length, which is indicated by the colSizes array.

Suppose a heap file with the name "infile" stores students' records. Suppose that the format of the student record is as follows:

int sid
byte name[10] (or char name[10])
int age

Then when calling the tupleNestedLoop method, if this is to be the first relation in the join, we set filename="infile", numCols1=3,and colSizes1={4,10,4}.


Internal Design

When carrying out the tuple nested loops algorithm, you should use a nested loop on a RecordIterator object on each heap file.

When carrying out the block nested loops algorithm, you should use a PageIterator to retrieve a page at a time for each relation. Given a page in memory for each relation, use the HFPage record retrieval methods to compare all of the records for the two pages in memory.


Error Reporting in the Code

Throw exceptions appropriately. The API describes what exceptions should be thrown when something goes wrong. If you are coding in C++, take a look at DBFile.h and DBFile.cpp to see how to throw exceptions within Minibase.


Handing in Your Code

If you are working in a team, only one of you should submit the code. Make sure that both of your names appear in program documentation at the top.

If you are coding in C++, we will run purify on your code to make sure that that your program exhibits no memory leaks or other bad memory behavior. You should run purify on your code before turning it in to see if you have memory issues.

Zip up all of your code, and submit it via Moodle.

Good luck, have fun, and ask questions!