Folder name (in your git repository): books
I assigned you a new random partner on Tuesday, 9/21 (check Slack #announcements).
This project involves several due dates for subtasks.
Take a look at books1.csv, a file full of data about books and authors. There are only a few dozen books in this dataset, so you wouldn't want to base an important book-related application on it. But for learning about how to manipulate datasets like this, a couple dozen books will be plenty.
Note, by the way, that when you look at a CSV file on github.com, the GitHub user interface renders the file in pretty columns, but the file itself is just text and doesn't look pretty when you open it with vim or Atom or whatever. To get your hands on files that are stored in my GitHub repo for this class, you should just clone it:
Back to books1.csv. The file's format is known as comma-separated values (CSV). CSV is a very simple format used to store tables of textual data. Each line of text represents a row in the table, and the fields/columns in each row are separated by commas. These few lines illustrate the principle:

title,publication year,author
The only thing that makes CSV at all tricky is when the data in one of the table cells contains either a comma or a newline character. For example, consider the novel "Right Ho, Jeeves" by P.G. Wodehouse. If you just comma-separate the fields, you get this:
which will make software misinterpret " Jeeves" as the second column of this row, instead of the tail end of the first column. CSV solves this problem with quotation marks:
But of course now you have the question of what happens if the title of your book includes quotation marks. You should read up on how CSV handles these situations.
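Here's a small sketch, using Python's standard csv module, that shows both problems in action. A naive split(",") tears the Wodehouse title in two, while csv.writer wraps fields that contain commas in quotation marks and represents an embedded quotation mark by doubling it. The rows are made-up illustrations, not lines copied from books1.csv.

```python
import csv
import io

# Splitting on commas by hand breaks the title into two fields.
naive = "Right Ho, Jeeves,1934,P.G. Wodehouse".split(",")
print(naive)  # ['Right Ho', ' Jeeves', '1934', 'P.G. Wodehouse']

# Made-up rows: one title contains a comma, one contains quotation marks.
rows = [
    ["Right Ho, Jeeves", "1934", "P.G. Wodehouse"],
    ['A "Quoted" Title', "2000", "Some Author"],
]

# csv.writer quotes only the fields that need it (the default QUOTE_MINIMAL
# policy) and writes an embedded quotation mark as two quotation marks.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text, end="")
# "Right Ho, Jeeves",1934,P.G. Wodehouse
# "A ""Quoted"" Title",2000,Some Author

# csv.reader undoes the quoting, so the round trip recovers the original rows.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed == rows)  # True
```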
For this and possibly future assignments, you'll be using the books1.csv file as your database. Your programs will read data from this file as needed to satisfy the requirements of the assignment. To do this, you'll use Python's csv module.
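A minimal sketch of reading a books file with csv.reader might look like the following. The function name load_books is hypothetical, and so is the layout it assumes: exactly three fields per row in the order title, publication year, author, with no header row and a numeric year. Check the real books1.csv before relying on any of that.

```python
import csv
import io


def load_books(csv_file):
    """Return a list of dicts, one per row of a books CSV file.

    Assumes (hypothetically) that each row has exactly three fields in the
    order title, publication year, author, and that there is no header row.
    """
    books = []
    for row in csv.reader(csv_file):
        if not row:  # csv.reader yields [] for a blank line; skip it
            continue
        title, year, author = row  # raises ValueError if a row has != 3 fields
        # int() also raises ValueError if the year field isn't a whole number
        books.append({"title": title, "year": int(year), "author": author})
    return books


# With the real file, open it with newline="" as the csv docs recommend:
#     with open("books1.csv", newline="") as f:
#         books = load_books(f)
# Here an in-memory sample row stands in for the file.
sample = io.StringIO('"Right Ho, Jeeves",1934,P.G. Wodehouse\n')
for book in load_books(sample):
    print(book["title"], "|", book["year"], "|", book["author"])
```

Because load_books takes any file-like object rather than a filename, you can hand it an io.StringIO in tests and a real open file in your program.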
Writing programs that use command-line arguments to determine their behavior is an important skill. In my day-to-day life as a programmer, I write a lot of short programs (and some long ones) to do all manner of tasks for me. Sometimes in very very short programs that I plan to run exactly once, I'll hard-code input values into the program. Those programs are often like: "Open file something.txt, read its contents, do something with the contents, and print out the results". In cases like this, I'll often just put the "something.txt" right in the code (also known as hard-coding the filename).
But even when I expect to run a program only once, I generally have to run it a few times during debugging, and then I often find that it's more useful than I thought, and I end up running it on multiple different input files, sometimes sorting the output one way, other times sorting the output another way, and so on. In such cases, I always wish I had taken one or two minutes to set up a sensible command-line argument syntax for the program.
For this assignment, you are going to write a command-line tool for extracting information from the books dataset. The assignment will have three phases. First, you will design a command-line interface for the tool. Second, you'll write unit tests for a class that provides access to the books dataset. Then, after revising your design based on feedback from the discussion group, you will implement the resulting interface.
due 11:59PM Thursday, September 23
You can easily imagine many features appropriate for a command-line tool concerned with a books-and-authors dataset. Since this project is less about the utility of the final product than about the techniques we use to create it, we're going to restrict this program to the following features:
Here's what you need to do for this task:
You can use the standard Unix manual pages as a model for how to write a command-line syntax synopsis and a usage statement. Take a look at "man mv", etc.
At the beginning of class on Friday, I will provide feedback on a handful of representative command-line designs and usage statements, after which you should revise your usage.txt before doing Task #3.
due 11:59PM Monday, September 27
One purpose of this multi-task project is to give you an introduction to test-driven development (TDD). Roughly, the process goes like this:
For us, the class in question will be called BooksDataSource, and its purpose will be to provide Python programmers with convenient access to the data in our books dataset.
The trick to writing good unit test suites is to think deeply about the many ways your interfaces might be called. Your tests should, for example, test typical cases, weird cases, and illegal cases. (For a really simple example, a unit test suite for a square-root function ought to include attempts to compute the square-roots of positive integers, positive non-integers, negative numbers, and zero, and depending on the language and the completeness of the interface specification, maybe the square-root of "moose" or other non-numerical input.) You should think hard about the mistakes programmers can make, the bad data users can generate, and the ways malicious programmers might try to exploit errors or omissions in your code.
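Here's a minimal sketch of the square-root test suite described above, written with Python's standard unittest module. It treats math.sqrt as the function under test; in a real TDD cycle you'd write tests like these against your own, not-yet-implemented code. The class name SquareRootTest is just an illustrative choice.

```python
import math
import unittest


class SquareRootTest(unittest.TestCase):
    """Typical, edge, and illegal inputs for a square-root function."""

    def test_positive_integer(self):
        self.assertEqual(math.sqrt(16), 4.0)

    def test_positive_non_integer(self):
        # Compare floats approximately rather than exactly
        self.assertAlmostEqual(math.sqrt(2.25), 1.5)

    def test_zero(self):
        self.assertEqual(math.sqrt(0), 0.0)

    def test_negative_number(self):
        # math.sqrt rejects negative real numbers with a ValueError
        with self.assertRaises(ValueError):
            math.sqrt(-1)

    def test_non_numeric_input(self):
        # A string like "moose" isn't a number, so math.sqrt raises TypeError
        with self.assertRaises(TypeError):
            math.sqrt("moose")
```

If you save this as, say, sqrt_tests.py, you can run it with python3 -m unittest sqrt_tests. Note that each test checks one behavior, so a failure message tells you exactly which kind of input went wrong.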
For Task #2, your jobs are:
Grading rubric for Task #2:
due 11:59PM Saturday, October 2
Time to write the program itself!
Grading rubric for Task #3:
Code reviews will take place in 3-4 separate sessions on October 5 and 6. I'll announce the exact times and locations on our Slack #announcements channel a few days ahead of time.
Here are the instructions for preparing for the code review.
Grading rubric for Task #4:
due 11:59PM Monday, October 11
Here are the instructions for preparing a revision of books.py.
Grading rubric for Task #5:
There are two main approaches to implementing a command-line syntax: handle the command-line in its raw form (i.e. the list of strings sys.argv) or use a Python module designed to make command-line parsing easier.
For extremely simple programs, using sys.argv directly can be the easiest way to go. Here's a simple example of using sys.argv to parse command-line arguments.
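Here's a minimal sketch of the idea. The program name sortwords.py and its syntax, [-r] [WORD ...], are hypothetical, chosen only to show how sys.argv looks from inside a program: a list of strings whose first element is the program's own name.

```python
import sys


def parse_command_line(argv):
    """Parse the hypothetical syntax:  python3 sortwords.py [-r] [WORD ...]

    Returns (words, reverse). Raises ValueError for an unrecognized option.
    """
    args = argv[1:]  # argv[0] is the program's own name
    reverse = False
    if args and args[0] == "-r":
        reverse = True
        args = args[1:]
    for arg in args:
        if arg.startswith("-"):
            raise ValueError(f"unrecognized option {arg}")
    return args, reverse


def main():
    try:
        words, reverse = parse_command_line(sys.argv)
    except ValueError as e:
        print(f"Error: {e}", file=sys.stderr)
        print(f"Usage: {sys.argv[0]} [-r] [WORD ...]", file=sys.stderr)
        sys.exit(1)
    for word in sorted(words, reverse=reverse):
        print(word)


if __name__ == "__main__":
    main()
```

Even in this tiny example, hand-rolled parsing forces decisions (where may -r appear? is -- allowed?) that a parsing module makes for you.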
For any program whose command line is going to have a little bit of complexity, it's usually better to use a module like argparse instead of using sys.argv directly. Here's a brief argparse example that you might find helpful: argparse_example.py.
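For comparison, here's a sketch of a similar hypothetical word-sorting program built on argparse. The options (-r/--reverse and --key with the choices alpha and length) are made up for illustration. argparse generates the -h/--help option and the usage message for you, and on a malformed command line it prints the usage message and exits with status 2.

```python
import argparse


def build_parser():
    """Parser for the hypothetical syntax:
    python3 sortwords.py [-h] [-r] [--key {alpha,length}] [WORD ...]
    """
    parser = argparse.ArgumentParser(description="Print the given words in sorted order.")
    parser.add_argument("words", nargs="*", metavar="WORD", help="words to sort")
    parser.add_argument("-r", "--reverse", action="store_true",
                        help="sort in descending order")
    parser.add_argument("--key", choices=["alpha", "length"], default="alpha",
                        help="sort alphabetically (default) or by word length")
    return parser


def main(argv=None):
    # With argv=None, parse_args reads sys.argv[1:]
    args = build_parser().parse_args(argv)
    key = len if args.key == "length" else None
    for word in sorted(args.words, key=key, reverse=args.reverse):
        print(word)


if __name__ == "__main__":
    main()
```

Passing an explicit list to main (for example, main(["-r", "fig", "apple"])) is a handy way to exercise the parser from a test without touching the real command line.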
There are many command-line-parsing modules for Python: argparse, getopt, docopt, optparse, click, etc. I'm suggesting argparse for this project because it comes standard with any installation of Python, and it is illustrative of the power (and sometimes the frustration) of using a module like this.
(And don't forget Slack—our #questions channel is meant for you!)