Minibase Assignment #1: Buffer Manager

Assigned on Friday, 9/19.
Due electronically on Wednesday, 10/1, at 4:45 PM.

Introduction

In this assignment, you will have to implement a simple buffer management layer (without support for concurrency control and recovery) for the Minibase database system. The code for the underlying Disk Space Manager will be given. Documentation for Minibase is available at http://www.cs.wisc.edu/coral/minibase/project.html. In particular, you should read the description of the DB class, which you will use extensively in this assignment.

Note: The actual Minibase buffer manager layer differs from the one that you will build in that the actual Minibase buffer manager contains methods to support concurrency control and recovery.


Getting Started

Create a directory for the assignment, and change to that directory. Copy all the files from /Accounts/courses/cs347/proj1/ to your current directory. The project contains the following files, among others:


The Buffer Manager Interface

You will have to implement a buffer manager that uses the clock replacement policy. The buffer manager interface that you will implement allows a program from higher levels of the database system to:
  1. Allocate and deallocate pages on disk.
  2. Bring a disk page to the buffer pool and pin it.
  3. Unpin a page in the buffer pool.

The methods that you have to implement are described below:

class BufMgr {
private:
// You can add as much stuff as you want here.

public:
Page* bufPool;
// The physical buffer pool of pages.
// This is made public just because we need to access it in
// the test driver. It could be private for real use.

BufMgr (int numbuf, Replacer *replacer = 0);
// The constructor allocates "numbuf" pages (frames) for the buffer
// pool in main memory. Do not worry about the second parameter
// ("replacer"). It can always be kept at its default value of 0.

~BufMgr ();
// The destructor should flush all dirty pages in the buffer pool to
// disk before shutting down. It should also deallocate the buffer
// pool in main memory.

Status pinPage (PageId PageId_in_a_DB, Page*& page,int emptyPage=0);
// Check if the page with the given ID is in the buffer pool. If it is,
// increment its pin_count and return a pointer to this page in the
// parameter "page". If the pin_count was 0 before the call, the page
// was a candidate for replacement, but is no longer a candidate.
// If the page is not in the buffer pool, choose a frame (from the set
// of replacement candidates) to hold this page, read the page (using
// the appropriate DB class method), pin it, and return a pointer to the
// page in the parameter "page".
// Must also write out the old page in the chosen frame if it is dirty
// before reading new page.
// You can assume that emptyPage == 0 for this assignment.
// This function, like all other functions in the project, return a
// Status code that indicates success or an error.

Status unpinPage (PageId PageId_in_a_DB, int dirty);
// Unpin the page in the buffer pool with the given ID.
// Should be called with dirty==TRUE if the client has
// modified the page. If so, this call should set the dirty bit
// for this frame. Furthermore, if pin_count>0, should decrement it.
// If pin_count==0 before this call, return an error.

Status newPage (PageId& firstPageId, Page*& firstpage, int howmany=1);
// Allocate a run of pages of size "howmany" on disk and pin the first
// page in memory in the buffer pool.
// Call DB object to allocate the pages only if there is an available
// frame for the first page.
// If the buffer is full, i.e., you cannot find a frame for the first
// page, return an error.

Status freePage (PageId pageId);
// This function should be called to delete a page on disk.
// This function must call the DB class to deallocate the page.

Status flushPage (int pageId);
// Used to flush a particular page of the buffer pool to disk.
// Should call the write_page method of the DB class.

Status flushAllPages ();
// Flush all pages of the buffer pool to disk.
}; //class BufMgr

Design Overview and Implementation Details

The buffer pool is a collection of frames (page-sized sequences of main memory bytes) that is managed by the Buffer Manager. It should be stored as an array bufPool[numbuf] of Page objects. In addition, you should maintain an array bufDescr[numbuf] of descriptors, one per frame. Each descriptor is a record with the following fields: page number, pin_count, dirtybit.

The pin_count field is an integer, page number is a PageId object, and dirtybit is a boolean. The descriptor for a frame describes the page that is stored in that frame. A page is identified by a page number that is generated by the DB class when the page is allocated, and is unique over all pages in the database. The PageId type is defined as an integer type in minirel.h.

A simple hash table should be used to figure out what frame a given disk page occupies. The hash table should be implemented (entirely in main memory) by using an array of pointers to lists of <page number, frame number> pairs. The array is called the directory and each list of pairs is called a bucket. Given a page number, you should apply a hash function to find the directory entry pointing to the bucket that contains the frame number for this page, if the page is in the buffer pool. If you search the bucket and don't find a pair containing this page number, the page is not in the pool. If you find such a pair, it will tell you the frame in which the page resides. This is illustrated in Figure 1:

The hash function should distribute values in the domain of the search field uniformly over the collection of buckets. If we have HTSIZE buckets, numbered 0 through M-1, a hash function h of the form


works well in practice. HTSIZE should be chosen to be a prime number.

You may alternatively use Standard Template Library (STL) to handle the hashing if you wish, and I would encourage you to do so. It's worth learning. Read up on hash_map and its associated iterators.

When a page is requested, the buffer manager should do the following: Check the buffer pool (by using the hash table) to see if it contains the requested page. If the page is not in the pool, it should be brought in as follows:

  1. Choose a frame for replacement, using the Clock replacement policy.
  2. If the frame chosen for replacement is dirty, flush it (i.e., write out the page that it contains to disk, using the appropriate DB class method).
  3. Read the requested page (again, by calling the DB class) into the frame chosen for replacement; the pin_count and dirtybit for the frame should be initialized to 0 and FALSE, respectively.
  4. Delete the entry for the old page from the Buffer Manager's hash table and insert an entry for the new page. Also, update the entry for this frame in the bufDescr array to reflect these changes.
  5. Pin the requested page by incrementing the pin_count in the descriptor for this frame and return a pointer to the page to the requester.

Error Reporting in the Code

Please follow the Minibase Error Protocol. An example illustrating the use of the error protocol is available in the file ErrProc.Sample. It covers much of what you need to know about the protocol. You can also look at new_error.h for more details.

The Clock Replacement Policy

There are many ways of deciding which page to replace when a free frame is needed. Commonly used policies in operating systems are FIFO, MRU and LRU. Even though LRU is a commonly used policy, it has a high overhead and is not the best strategy to use in a number of common cases that occur in database systems (e.g., repeated sequential scans of the same file). Instead, many database systems use the Clock algorithm which approximates LRU behavior with less overhead.


Figure 2: Clock Replacement

The above figure illustrates the execution of the clock algorithm. Conceptually, all the frames in the buffer pool are arranged in a circle around the face of a clock. Associated with each frame is a referenced bit. Each time a page in the buffer pool is unpinned, the referenced bit of the corresponding frame is set to true. Whenever we need to choose a frame for replacement, the current frame pointer, or the "clock hand" (an integer whose value is between 0 and numbuf-1), is advanced, using modular arithmetic so that it does not go past numbuf-1. This corresponds to the clock hand moving around the face of the clock in a clockwise fashion. For each frame that the clock hand goes past that is unpinned, the referenced bit is examined and then cleared. If the bit had been set, the corresponding frame has been referenced "recently" and is not replaced. On the other hand, if the referenced bit is false, the page is selected for replacement. If the selected buffer frame is dirty (i.e. it has been modified), the page currently occupying the frame is written to disk.


Handing in Your Code

If you are working in a team (and please do!), only one of you should submit the code. Make sure that both of your names appear in program documentation at the top.

I will run insure on your code to make sure that that your program exhibits no memory leaks or other bad memory behavior. You should run insure on your code before turning it in to see if you have memory issues.

Use hsp to hand in the entire directory containing your code. Do not submit the individual files, but rather submit the directory as a whole. For example, if I am sitting in the directory above proj1, I would execute the following command:

hsp musicant CS347 proj1

Good luck!