Minibase Assignment #1: Buffer Manager

Introduction

In this assignment, you will have to implement a simple buffer management layer for the Minibase database system.

You can find the API for relevant files by clicking on the appropriate link for Java or C++.

The code for the underlying Disk Space Manager will be given to you via the class DBFile. Read carefully the API for the DBFile class, since you will be using this in building your buffer manager. Similarly, read carefully the API for BufferManager, since this is the class you will actually create.


Getting Started

Start by grabbing the Java or C++ zip files and extract to a directory that you have created. The project contains the following files, among others:


The Buffer Manager Interface

You will have to implement a buffer manager that uses the clock replacement policy. The buffer manager interface that you will implement allows a program from higher levels of the database system to:
  1. Allocate and deallocate pages on disk.
  2. Bring a disk page to the buffer pool and pin it.
  3. Unpin a page in the buffer pool.

The list of the specific methods that you need to implement are contained in the Java and C++ APIs.


Design Overview and Implementation Details

The buffer pool is a collection of frames (page-sized sequences of main memory bytes) that is managed by the Buffer Manager. It should be stored as an array bufferPool[] of Page objects. In addition, you should maintain an array frameTable[] of descriptors, one per frame. Each descriptor is a is an object that contains a page id, a filename, a pin count, and a dirty bit.

A simple hash table should be used to figure out what frame a given disk page occupies. Given a page id and an associated filename, the hash table should return which frame number contains that page. In principle, you should use some fancy techique to hash on both page id and filename. I'd recommend just concatenating the two together to make a screen, and hash on that (this is flawed on principle, but sufficient for this assignment). You may use your own hash table code if you wish, but I'd recommend using HashMap (Java) or hash_map (C++). If you're using hash_map on strings in C++, make sure that you follow this example that shows you how to define string equality correctly.

When a page is requested for pinning, the buffer manager should do the following: Check the buffer pool (by using the hash table) to see if it contains the requested page. If the page is not in the pool, it should be brought in as follows:

  1. Choose a frame for replacement, using the Clock replacement policy.
  2. If the frame chosen for replacement is dirty, flush it (i.e., write out the page that it contains to disk, using the DBFile appropriately).
  3. Read the requested page (again, by using DBFile) into the frame chosen for replacement; the pin count and dirty bit for the frame should be initialized to 0 and false, respectively.
  4. Delete the entry for the old page from the Buffer Manager's hash table and insert an entry for the new page. Also, update the entry for this frame in the frameTable array to reflect these changes.
  5. Pin the requested page by incrementing the pin_count in the descriptor for this frame and return a pointer to the page to the requester.

Error Reporting in the Code

Throw exceptions appropriately. The API describes what exceptions should be thrown when something goes wrong. If you are coding in C++, take a look at DBFile.h and DBFile.cpp to see how to throw exceptions within Minibase.


The Clock Replacement Policy

There are many ways of deciding which page to replace when a free frame is needed. Commonly used policies in operating systems are FIFO, MRU and LRU. Even though LRU is a commonly used policy, it has a high overhead and is not the best strategy to use in a number of common cases that occur in database systems (e.g., repeated sequential scans of the same file). Instead, many database systems use the Clock algorithm which approximates LRU behavior with less overhead.


Clock Replacement Diagram

The above figure illustrates the execution of the clock algorithm. Conceptually, all the frames in the buffer pool are arranged in a circle around the face of a clock. Associated with each frame is a referenced bit. Each time a page in the buffer pool is unpinned, the referenced bit of the corresponding frame is set to true. Whenever we need to choose a frame for replacement, the current frame pointer, or the "clock hand" (an integer whose value is between 0 and poolSize-1), is advanced, using modular arithmetic so that it does not go past poolSize-1. This corresponds to the clock hand moving around the face of the clock in a clockwise fashion. For each frame that the clock hand goes past that is unpinned, the referenced bit is examined and then cleared. If the bit had been set, the corresponding frame has been referenced "recently" and is not replaced. On the other hand, if the referenced bit is false, the page is selected for replacement. If the selected buffer frame is dirty (i.e. it has been modified), the page currently occupying the frame is written to disk.


Handing in Your Code

If you are working in a team, only one of you should submit the code. Make sure that both of your names appear in program documentation at the top.

If you are coding in C++, we will run purify on your code to make sure that that your program exhibits no memory leaks or other bad memory behavior. You should run purify on your code before turning it in to see if you have memory issues.

Zip up all of your code, and submit it via Moodle.

Good luck, have fun, and ask questions!