CS 257: Software Design

Books: command-line arguments and CSV

Folder name (in your git repository): books

I'll assign you a new random team on Friday, 1/8. That weekend, read the assigned articles, and think about features and command-line syntax before your Discussion Group #1 meeting the following week (Monday or Tuesday).

Goals

The data: books, authors, and comma-separated values

Take a look at books.csv, a file full of data about books and authors. There are only a couple dozen books in this dataset, so you wouldn't want to base an important book-related application on it. But for learning about how to manipulate datasets like this, a couple dozen books will be plenty.

The format of books.csv is known as comma-separated values (CSV). CSV is a very simple format used to store tables of textual data. Each line of text represents a row in the table, and the fields/columns in each row are separated by commas. The few lines below illustrate the principle, with each row in the form "title,publication year,author":

Jane Eyre,1847,Charlotte Brontë (1816-1855)
To Say Nothing of the Dog,1997,Connie Willis (1945-)
The Stone Sky,2017,N.K. Jemisin (1972-)

The only thing that makes CSV at all tricky is when the data in one of the table cells contains either a comma or a newline character. For example, consider the novel "Right Ho, Jeeves" by P.G. Wodehouse. If you just comma-separate the fields, you get this:

Right Ho, Jeeves,1934,Pelham Grenville Wodehouse (1881-1975)

which will make software misinterpret " Jeeves" as the second column of this row, instead of the tail end of the first column. CSV solves this problem with quotation marks:

"Right Ho, Jeeves",1934,Pelham Grenville Wodehouse (1881-1975)

But of course now you have the question of what happens if the title of your book includes quotation marks. You should read up on how CSV handles these situations.
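As a small experiment to go with that reading, here is a sketch that lets Python's csv module show what it does with a field containing both a comma and quotation marks. The title and author in the row are made up for illustration and are not in books.csv. The writer's default settings are used, so other CSV dialects may behave differently.

```python
import csv
import io

# Hypothetical row: the title contains a comma AND quotation marks.
row = ['The "Best" of Times, Vol. 1', '1999', 'Example Author (1950-)']

# Write the row as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerow(row)
encoded = buffer.getvalue()
print(encoded, end='')

# Read the text back to confirm the original fields survive the round trip.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded)
```

With the default settings, the writer wraps the tricky field in quotation marks and doubles each quotation mark inside it, and the reader undoes both steps.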

For this and possibly future assignments, you'll be using the books.csv file as your database. Your programs will read data from this file as needed to satisfy the requirements of the assignment. To do this, you'll use Python's csv module.
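A minimal sketch of reading this kind of data with csv.reader might look like the following. The function name load_books and the choice of tuples are assumptions made for illustration, not a required design, and an in-memory string stands in for books.csv so the example runs anywhere.

```python
import csv
import io

# Stand-in for books.csv, using rows quoted earlier on this page.
SAMPLE = '''Jane Eyre,1847,Charlotte Brontë (1816-1855)
"Right Ho, Jeeves",1934,Pelham Grenville Wodehouse (1881-1975)
The Stone Sky,2017,N.K. Jemisin (1972-)
'''

def load_books(csv_file):
    '''Return a list of (title, year, author) tuples read from an open CSV file.'''
    books = []
    for row in csv.reader(csv_file):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        title, year, author = row
        books.append((title, int(year), author))
    return books

books = load_books(io.StringIO(SAMPLE))
for title, year, author in books:
    print(f'{year}: {title} / {author}')
```

To read the real file instead, you could pass in a file object opened with open('books.csv', newline='', encoding='utf-8'). The csv documentation recommends newline='' so that newlines inside quoted fields are handled correctly.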

Command-line arguments in Python

Writing programs that use command-line arguments to determine their behavior is an important skill. In my day-to-day life as a programmer, I write a lot of short programs (and some long ones) to do all manner of tasks for me. Sometimes, in very short programs that I plan to run exactly once, I'll hard-code input values into the program. Those programs often look like: "Open file something.txt, read its contents, do something with the contents, and print out the results". In cases like this, I'll often just put the "something.txt" right in the code (also known as hard-coding the filename).

But even when I expect to run a program only once, I generally have to run it a few times during debugging, and then I often find that it's more useful than I thought, and I end up running it on multiple different input files, sometimes sorting the output one way, other times sorting the output another way, and so on. In such cases, I always wish I had taken one or two minutes to set up a sensible command-line argument syntax for the program.

For this assignment, you are going to write a command-line tool for extracting information from the books dataset. The assignment will have two phases. First, you will design a command-line interface for the tool. Then, after revising your design based on feedback from your discussion group, you will implement the resulting interface.

What features should your program have?

You can easily imagine many features appropriate for a command-line tool concerned with a books-and-authors dataset. We're going to restrict our work to the following features:

Your jobs

  1. Select one partner's cs257 git repository to work in. Make sure all partners are given push-access to the repository, and that all partners have a local clone.
  2. Create a folder called "books" at the top level of the repository. (Note the very top of this web page, where I have the notation "Folder name: books". For future assignments, you should use that "Folder name" indicator to tell you where to store your work for the assignment. This is how the grader and I will find your work.)
  3. In time for DG1 during Week 2 (Jan 11 or 12), prepare a first draft of your tool's command-line syntax, and write a short usage/help statement for the tool.
  4. By 11:59PM Tuesday, 1/12 (i.e. after DG1): Finalize your tool's command-line syntax and usage statement. Put your usage statement in books/usage.txt. Add, commit, and push this so I can see it in your repository.
  5. By 5:00PM Friday, 1/15: Implement your tool's features, making sure to adhere to the command-line syntax you described in usage.txt. (If you decide to change the command-line syntax after 1/12, then you should also change usage.txt.) Use argparse to implement your command-line interface, and in particular, make sure that your program prints a usage statement (e.g. by printing the contents of usage.txt) if the user requests it (via a suitable help flag) or if what they typed doesn't satisfy your expected command-line syntax.
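One possible way to meet the usage-statement requirement in step 5 is to subclass argparse.ArgumentParser and override its print_help and error methods, as sketched below. The class name UsageFileParser, the stand-in usage text, and the search_string argument are all hypothetical. This is only one approach among several, not the required one.

```python
import argparse
import sys

# Stand-in text. A real tool might instead read it from the file, e.g.
#     with open('usage.txt') as f: usage_text = f.read()
USAGE_TEXT = 'Usage: python3 books.py search_string\n(hypothetical usage statement)'

class UsageFileParser(argparse.ArgumentParser):
    '''An ArgumentParser that prints a fixed usage text for help and for errors.'''

    def __init__(self, usage_text, **kwargs):
        super().__init__(**kwargs)
        self.usage_text = usage_text

    def print_help(self, file=None):
        # Called by the built-in -h/--help flag.
        print(self.usage_text, file=file or sys.stdout)

    def error(self, message):
        # Called by argparse when the command line doesn't match the syntax.
        print(self.usage_text, file=sys.stderr)
        self.exit(2, f'{self.prog}: error: {message}\n')

parser = UsageFileParser(USAGE_TEXT, prog='books.py')
parser.add_argument('search_string')

# An explicit list stands in for the real command line (sys.argv[1:]).
args = parser.parse_args(['Jeeves'])
print(args.search_string)
```

Because argparse adds -h and --help by default, both flags end up printing the same text as a malformed command line does.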

Implementing a command-line interface

There are two main approaches to implementing a command-line syntax: handle the command-line in its raw form (i.e. the list of strings sys.argv) or use a Python module designed to make command-line parsing easier.

For extremely simple programs, using sys.argv directly can be the easiest way to go. Here's a simple example of using sys.argv to parse command-line arguments.
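As a rough sketch of the sys.argv approach (not necessarily the same as the linked example), the program below expects exactly one argument. The script name find_title.py and the function names are made up for illustration.

```python
import sys

def parse_arguments(argv):
    '''Return the search string from argv, or None if the arguments are unusable.

    argv has the same shape as sys.argv: program name first, then the arguments.
    '''
    if len(argv) != 2 or argv[1] in ('-h', '--help'):
        return None
    return argv[1]

def main(argv):
    search_string = parse_arguments(argv)
    if search_string is None:
        print(f'Usage: python3 {argv[0]} search_string', file=sys.stderr)
        return 1
    print(f'Searching for titles containing "{search_string}"')
    return 0

# A real script would call main(sys.argv). Explicit lists keep this demo
# independent of how the file happens to be run.
main(['find_title.py', 'Jeeves'])
main(['find_title.py'])
```

Even with a single argument, the length check and help handling are written by hand. That bookkeeping grows quickly as a command line gains options.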

For any program whose command line is going to have a little bit of complexity, it's usually better to use a module like argparse instead of using sys.argv directly. Here's a brief argparse example that you might find helpful: argparse_example.py.
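For comparison, here is a short argparse sketch, separate from argparse_example.py. Every argument name in it (search_string, --sort-by, --limit) is hypothetical and chosen only to show a positional argument, a restricted choice, and a typed option. It is not a suggested syntax for your tool.

```python
import argparse

def build_parser():
    '''Build a parser for a hypothetical title-search tool (all names are illustrative).'''
    parser = argparse.ArgumentParser(
        prog='example.py',
        description='Search for books whose titles contain a string.')
    # Positional argument: required, identified by position rather than by a flag.
    parser.add_argument('search_string', help='text to look for in book titles')
    # Optional argument restricted to a fixed set of choices.
    parser.add_argument('--sort-by', choices=['title', 'year'], default='title',
                        help='field to sort results by (default: title)')
    # Optional argument that argparse converts to an int.
    parser.add_argument('--limit', type=int, default=None,
                        help='maximum number of results to print')
    return parser

# A real script would call build_parser().parse_args() with no arguments,
# which reads sys.argv[1:]. An explicit list keeps this demo self-contained.
args = build_parser().parse_args(['Jeeves', '--sort-by', 'year', '--limit', '5'])
print(args.search_string, args.sort_by, args.limit)
```

Note that argparse turns the dash in --sort-by into an underscore, so the value is read as args.sort_by. It also generates -h/--help output and error messages from the declarations above.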

There are many command-line-parsing modules for Python: argparse, getopt, docopt, optparse, click, etc. We're using argparse for this project because it comes standard with any installation of Python, and it is illustrative of the power (and sometimes the frustration) of using a module like this.

Constraints and suggestions

Rubric

1 usage.txt present
1 usage.txt makes sense
1 usage.txt supports all three main features required by the assignment, as well as a help feature
1 comment with author names at top of books.py
1 user can get help from the command line (--help or whatever)
3 all three required features work correctly
4 code organization quality, including quality of naming
X/12

Start early, ask questions, and have fun!

(And don't forget Slack—our #questions channel is meant for you!)