CS 334 Final Assignment

This is a team assignment, if you wish it to be.

We've briefly mentioned that in recent years, a number of new database systems have arisen to challenge the dominance of the relational model. They're loosely grouped under the label "NoSQL", a vague name that is generally used to mean "something other than relational." For this assignment, you will choose a NoSQL database system and write a paper comparing and contrasting it with relational database systems. (Your paper should be a minimum of 5 pages, single-spaced, with 1 inch margins; the page count does not include references.)

Choices

There are a large number of systems out there to choose from. To help narrow things down some, here are some major categories of NoSQL database systems. You can click the link on each to see which ones are the most popular. Note that we'll be spending a day in class on MongoDB in particular. If you choose MongoDB, I will be expecting you to research content well above and beyond what happens in class.

Key-value stores

The starting idea for a key-value store is essentially a persistent version of a hash table: given a key, you can look up a stored value. Useful for quickly looking up images, objects, and so on.
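
To make the "hash table that survives on disk" idea concrete, here is a minimal sketch using Python's standard shelve module. It is only an illustration of the concept, not how any particular key-value system is built; the key name and the stored bytes are made up for the example.

```python
import os
import shelve
import tempfile


def demo_key_value_store():
    """Store a value under a key, close the store, reopen it, and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kv_demo")

        # First "session": write a value under a key, then close the file.
        with shelve.open(path) as store:
            store["user:42:avatar"] = b"fake image bytes"  # hypothetical key and value

        # Second "session": the value is still there because it lives on disk.
        with shelve.open(path) as store:
            return store["user:42:avatar"]


if __name__ == "__main__":
    print(demo_key_value_store())
```

Note that the only way to find a value here is to already know its key, which is the main thing to contrast with a relational query.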

Wide column stores

These are designed for storing sparse matrices, where it would make sense for a row and a column together to serve as a key. Useful for results from web crawlers, recommendation systems, and other forms of sparse data.
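
As a rough sketch of the sparse-matrix idea, the class below stores only the cells that actually have a value, addressed by a row key and a column name together. The class name, method names, and sample data are invented for illustration; real wide column stores add much more (column families, timestamps, on-disk layout).

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse table: each row holds only the columns it actually uses."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # Missing cells cost no storage; they simply are not present.
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))


if __name__ == "__main__":
    crawl = SparseTable()
    crawl.put("example.com/a", "title", "Page A")
    crawl.put("example.com/a", "link:example.com/b", "see also")
    crawl.put("example.com/b", "title", "Page B")
    print(crawl.columns("example.com/a"))
    print(crawl.get("example.com/b", "link:example.com/a"))
```

Different rows can have entirely different sets of columns, which is the point of contrast with a relational table's fixed schema.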

Graph databases

These are database systems designed to hold graphs, i.e. collections of data items and the connections between them. Useful for a variety of kinds of networks and applications (social networks, co-authorship, fraud detection, etc.).
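
The kind of question a graph database is built for can be sketched with a plain adjacency structure and a breadth-first search. This is an illustration under simple assumptions (an in-memory dictionary of neighbor sets, made-up author names), not a description of any graph database's storage or query engine.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return every node reachable from start in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node for node in distance if node != start}


if __name__ == "__main__":
    # Hypothetical co-authorship network: an edge means "wrote a paper together".
    coauthors = {
        "ada": {"bob"},
        "bob": {"ada", "cy"},
        "cy": {"bob", "dee"},
        "dee": {"cy"},
    }
    print(sorted(within_hops(coauthors, "ada", 2)))
```

In a relational system the same question would take one self-join per hop, which is a useful comparison to explore in your paper.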

Document stores

These resemble key-value stores, but the values themselves have a hierarchy and structure of their own. Useful for web content, publishing, document search, etc.
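
To illustrate values that have "a hierarchy and structure of their own," the sketch below keeps nested documents under keys and reaches inside them with a dotted path. The helper name get_path and the sample article are assumptions made up for this example; actual document stores have their own query syntax.

```python
def get_path(document, dotted_path, default=None):
    """Follow a dotted path such as 'author.name' into a nested dictionary."""
    current = document
    for part in dotted_path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current


if __name__ == "__main__":
    # The key looks up the whole document; the path looks inside it.
    articles = {
        "article:1": {
            "title": "NoSQL overview",
            "author": {"name": "Pat", "affiliation": {"dept": "CS"}},
            "tags": ["databases", "survey"],
        }
    }
    doc = articles["article:1"]
    print(get_path(doc, "author.affiliation.dept"))
    print(get_path(doc, "author.email", default="unknown"))
```

Unlike a plain key-value store, the system can see inside the value, so it can query and index on fields such as the author's name.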

Native XML databases

These are systems that store the data directly in XML as opposed to some other internal format, which has significant advantages regarding interoperability with other systems. You can use the large set of XML tools available to operate directly on the data, in addition to what the database does for you.
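
As one small example of operating on XML data with general-purpose tools, the sketch below uses Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The catalog contents and the function name are invented for illustration, and a native XML database would typically offer a fuller query language than this subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, made up for this example.
CATALOG_XML = """
<catalog>
  <book year="2004"><title>First Book</title></book>
  <book year="2010"><title>Second Book</title></book>
  <book year="2004"><title>Third Book</title></book>
</catalog>
"""


def titles_for_year(xml_text, year):
    """Return the titles of all books whose year attribute matches."""
    root = ET.fromstring(xml_text)
    # ElementTree accepts attribute predicates of the form [@name='value'].
    matches = root.findall(f"./book[@year='{year}']/title")
    return [element.text for element in matches]


if __name__ == "__main__":
    print(titles_for_year(CATALOG_XML, "2004"))
```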

What the paper should contain

I want you to show me that you can take everything that you've learned about database systems this term, and apply that knowledge to learning about a new one. In my mind, the perfect paper would be one that effectively repeats the content of our course, but does so instead in the context of a different database system. Specifically, here are questions I would like to see answered within your paper:

  • Why does this database system exist? What does it try to do differently from a relational database system?
  • How does its fundamental data model differ from the relational model? (We did some contrasting with the hierarchical and network models early on as one example of how to do this.)
  • If your database system has a query language and if it is different from SQL, how is it different? Why?
  • How does the database system work internally? How is data stored? Does it index? How so? What sorts of algorithms dominate?
  • How are queries to the database system evaluated and optimized?
  • Does the database system support transactions? If so, how does its transaction support differ from that of a relational system?

The above is likely more than you can cover in your paper in the time that you have. I would therefore like you to be able to say at least something in passing about all of the above topics, but your paper should focus in depth on how the database works internally, and how that differs from a relational system. Two pages (or more) of your paper should be specifically dedicated to discussion of how the internals of the system work.

Optional video component

If you wish, you may install the database system yourself, and experiment with it. You can import some data, write some queries and/or other code, and interact with it. If you go this route, you would submit a video in which you show and narrate your use of the system, describing what you're doing. This portion of the project, if you wish, could be used to replace portions of the paper that would cover the user side of the database, such as its data model, query language, how you interact with it, and so on. The video portion would not replace the discussion of the database internals, which you would cover in the paper. If you submit a video component, it should be five minutes long, and then you would only need to write 3 pages in your paper instead of 5 pages.

Use Jing to make your video. It is simple to use, and limits your video to 5 minutes, which is perfect. Jing makes it easy to publish your video at screencast.com; you should do that, and include a link to your video in your paper.

References

Your paper should have at least three references. Wikipedia is likely a poor choice. You are welcome to take a look at Wikipedia to get yourself oriented, and possibly to find more sources. That said, my own impression from a quick sampling of the Wikipedia articles for some of these database systems is that they are very high level and lacking in precision.

Deadlines

  • On the last day of class, you should turn in to me a list of the references that you intend to use. I will get you feedback within 24 hours. If you want feedback sooner than that, I'm happy to provide it; submit your references sooner, and I'll respond within 24 hours of whenever you submit them.
  • The paper itself, and optional video, are due at the end of the last final exam. The paper should be submitted via Moodle, and the video should be submitted via Jing to screencast.com. Include the link to the screencast in your paper.

Good luck, and have fun!
