CS 257 Assignment

I updated this assignment on Monday, 9/21 in the afternoon. It was confusing to me (let alone to anybody else) as originally written. Changes are in red.

I'll assign you a new random team on Friday, 9/18. This weekend, read the assigned article, and think about features and command-line syntax before your Discussion Group #1 meeting next week (Monday or Tuesday). You can certainly start coding, but recognize that we won't finalize your command-line syntax until your DG#1 meeting.

Goals

Learn how to work with command-line arguments in Python
Learn the basics of reading comma-separated values files using Python's csv module
Think about command-line interface design issues

The data: books, authors, and comma-separated values

Take a look at books.csv, a file full of data about books and authors. There are only a couple dozen books in this dataset, so you wouldn't want to base an important book-related application on it. But for learning about how to manipulate datasets like this, a couple dozen books will be plenty.

The format of books.csv is known as comma-separated values (CSV). CSV is a very simple format used to store tables of textual data. Each line of text represents a row in the table, and the fields/columns in each row are separated by commas. These few lines illustrate the principle: "title,publication year,author"

Jane Eyre,1847,Charlotte Brontë (1816-1855)
To Say Nothing of the Dog,1997,Connie Willis (1945-)
The Stone Sky,2017,N.K. Jemisin (1972-)

The only thing that makes CSV at all tricky is when the data in one of the table cells contains either a comma or a newline character. For example, consider the novel "Right Ho, Jeeves" by P.G. Wodehouse. If you just comma-separate the fields, you get this:

Right Ho, Jeeves,1934,Pelham Grenville Wodehouse (1881-1975)

which will make software misinterpret " Jeeves" as the second column of this row, instead of the tail end of the first column. CSV solves this problem with quotation marks:

"Right Ho, Jeeves",1934,Pelham Grenville Wodehouse (1881-1975)

But of course now you have the question of what happens if the title of your book includes quotation marks. You should read up on how CSV handles these situations.
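As a starting point for that reading, here is a small sketch of how Python's csv module treats commas and quotation marks inside a field. The third row uses a made-up title and author purely for illustration; they are not in books.csv.

```python
import csv
import io

# Three rows of CSV text. The second title contains a comma, and the third
# (a hypothetical title, invented for this example) contains quotation marks,
# which CSV represents by doubling them inside a quoted field.
raw_text = (
    'Jane Eyre,1847,Charlotte Brontë (1816-1855)\n'
    '"Right Ho, Jeeves",1934,Pelham Grenville Wodehouse (1881-1975)\n'
    '"The ""Quoted"" Title",2000,Hypothetical Author (1950-)\n'
)

# csv.reader accepts any iterable of lines; io.StringIO lets us treat the
# string above as if it were an open file.
rows = list(csv.reader(io.StringIO(raw_text)))
for row in rows:
    print(row)

# Going the other way, csv.writer adds the quotation marks where needed.
output = io.StringIO()
csv.writer(output).writerow(['The "Quoted" Title', '2000', 'Hypothetical Author (1950-)'])
written_line = output.getvalue()
print(written_line)
```

Each row comes back as a list of three strings, with the surrounding quotation marks removed and the embedded comma or quotation marks preserved inside the title.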

For this and possibly future assignments, you'll be using the books.csv file as your database. Your programs will read data from this file as needed to satisfy the requirements of the assignment. To do this, you'll use Python's csv module.
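Here is one minimal sketch of loading the file with the csv module. It assumes that every row has exactly three columns (title, publication year, author) and that the file has no header row; check books.csv before relying on either assumption. The function names parse_books and load_books are hypothetical, not something the assignment requires.

```python
import csv

def parse_books(csv_lines):
    """Return a list of (title, year, author) tuples from an iterable of CSV lines."""
    books = []
    for row in csv.reader(csv_lines):
        if len(row) != 3:
            continue  # assumption: silently skip blank or malformed rows
        title, year, author = row
        books.append((title, int(year), author))
    return books

def load_books(csv_file_name):
    """Open the named CSV file and parse it with parse_books."""
    # The csv documentation recommends newline='' for files passed to csv.reader.
    with open(csv_file_name, newline='', encoding='utf-8') as csv_file:
        return parse_books(csv_file)
```

With this in place, something like load_books('books.csv') would give you a list you can search and sort however your features require.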

Command-line arguments in Python

Writing programs that use command-line arguments to determine their behavior is an important skill. In my day-to-day life as a programmer, I write a lot of short programs (and some long ones) to do all manner of tasks for me. Sometimes in very short programs that I plan to run exactly once, I'll hard-code input values into the program. Those programs are often like: "Open file something.txt, read its contents, do something with the contents, and print out the results". In cases like this, I'll often just put the "something.txt" right in the code.

But even when I expect to run a program only once, I generally have to run it a few times during debugging, and then I often find that it's more useful than I thought, and I end up running it on multiple different input files, sometimes sorting the output one way, other times sorting the output another way, and so on. In such cases, I always wish I had taken one or two minutes to set up a sensible command-line argument syntax for the program.

For this assignment, you are going to write a command-line tool for extracting information from the books dataset. The assignment will have two phases. First, you will design a command-line interface for the tool. Then, after revising your design based on feedback from your discussion group, you will implement the resulting interface.

The details

What features should your program have?

You can easily imagine many features appropriate for a command-line tool concerned with a books-and-authors dataset. We're going to restrict our work to the following features:

Given a search string S, print a list of books whose titles contain S (case-insensitive).
Given a search string S, print a list of authors whose names contain S (case-insensitive). For each such author, print a list of the author's books.
Given a range of years A to B, print a list of books published between years A and B, inclusive.
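The three features above could be sketched roughly as follows. This assumes the books are already in memory as (title, year, author) tuples; the function names and the SAMPLE_BOOKS list are hypothetical, and your own design may organize things quite differently.

```python
# A few rows from the examples on this page, as (title, year, author) tuples.
SAMPLE_BOOKS = [
    ('Jane Eyre', 1847, 'Charlotte Brontë (1816-1855)'),
    ('To Say Nothing of the Dog', 1997, 'Connie Willis (1945-)'),
    ('The Stone Sky', 2017, 'N.K. Jemisin (1972-)'),
    ('Right Ho, Jeeves', 1934, 'Pelham Grenville Wodehouse (1881-1975)'),
]

def books_with_title_containing(books, search_string):
    """Books whose titles contain search_string, ignoring case."""
    needle = search_string.lower()
    return [book for book in books if needle in book[0].lower()]

def books_by_authors_containing(books, search_string):
    """Map each author whose name contains search_string to a list of titles."""
    needle = search_string.lower()
    titles_by_author = {}
    for title, year, author in books:
        if needle in author.lower():
            titles_by_author.setdefault(author, []).append(title)
    return titles_by_author

def books_published_between(books, start_year, end_year):
    """Books published from start_year through end_year, inclusive."""
    return [book for book in books if start_year <= book[1] <= end_year]

for title, year, author in books_published_between(SAMPLE_BOOKS, 1900, 1997):
    print(f'{title} ({year}), {author}')
```

Note that the author field in these rows includes birth and death years, so this sketch would also match a search string like "18" against them. Whether that's acceptable is a design decision for you to make.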

What you need to do

Select one partner's cs257-assignments-USERNAME git repository to work in. Make sure all partners are given access to the repository, and that all partners have a local clone.
Create a folder called "books" at the top level of the repository. (Note the very top of this web page, where I have the notation "Folder name: books". For future assignments, you should use that "Folder name" indicator to store your work for the assignment. This is how the grader and I will find your work.)
By 11:59 Tuesday, 9/22: Design your tool's command-line syntax, and write a usage/help statement for it. Put your usage statement in books/usage.txt. Add, commit, and push this so I can see it in your repositories.
By 5:00PM Friday, 9/25: Implement your tool's features, making sure to adhere to the command-line syntax you described in usage.txt. Make sure that your program prints a Usage statement (e.g. by printing the contents of usage.txt) if the users request it (via a suitable help flag) or if what they typed doesn't satisfy your expected command-line syntax.

Phase 1: design your tool's command-line syntax

Write a short list of features ("short" as in 2-3 features) that you want the tool to have. (For example, "print out the list of all authors whose names contain a specified string" or "print out a list of books by the author with the specified name", etc.)
Create a Unix-manual-page-style statement of your tool's command-line syntax. Your proposed syntax should support the features you listed in the previous item.
Be ready to paste your feature list and your command-line syntax into a Google doc at your discussion group meeting on either 9/21 or 9/22.

Phase 2: revise your command-line syntax based on discussion-group feedback, and implement the resulting features. Make sure that your program prints a Usage statement if the users request it (via a suitable help flag) or if what they typed doesn't satisfy your expected command-line syntax. Due 5:00PM 9/25.
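One way the usage requirement might look in code is sketched below. The help flags -h and --help are an assumption (use whatever your usage.txt promises), the function names are hypothetical, and the sketch treats "no arguments at all" as one example of input that doesn't satisfy the syntax.

```python
import sys

HELP_FLAGS = ('-h', '--help')  # assumption: replace with the flags in your usage.txt

def should_print_usage(arguments):
    """True if the user asked for help or supplied no arguments at all."""
    return len(arguments) == 0 or any(flag in arguments for flag in HELP_FLAGS)

def print_usage(usage_file_name='usage.txt'):
    """Print the contents of the usage file, found relative to the current directory."""
    with open(usage_file_name, encoding='utf-8') as usage_file:
        print(usage_file.read())

def main(argv):
    arguments = argv[1:]  # argv[0] is the program name
    if should_print_usage(arguments):
        print_usage()
        return
    # ...otherwise, go on to carry out the requested operation...
```

A real program would end with a call such as main(sys.argv). Because print_usage opens usage.txt by a relative name, this sketch only works when the program is run from inside the books folder.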

Constraints and suggestions

In your usage.txt, when you're writing your command-line syntax synopsis, go ahead and include the "python3" part of what the user would have to type to execute the program. Like so:
python3 books.py some-operation [options]
You may have just one such line in your synopsis, or you might have three (one for each of the features described above). Use the SYNOPSIS section of man-pages for various Unix commands as a rough guide.
You may choose how you want your output to look. Do you want to include the author's name and publication date when you print a book? That's up to you. When you print an author with the author's books, do you indent the books below the author? Do you print a blank line between authors? Again, that's up to you. But please try to make the output easy to read.
You may, but need not, add options to indicate how your program's output will be sorted, displayed, etc.
The official Python documentation for the csv module includes some good, simple example code.
It's also good to search the internet for things like "python csv examples", but be careful to pay attention to cues about the credibility of whatever websites you land on.
When you run a Python program and import the sys module, you have access to a list of strings called sys.argv. This list includes all the command-line arguments: sys.argv[0] is the name of your program file (e.g. "books.py"), sys.argv[1] is the first argument after that (e.g. the input file name for this assignment), etc. I want you to use sys.argv directly for this assignment.
Do not be seduced by the many tutorials and blogs that will encourage you to use the modules argparse, getopt, docopt, optparse, click, etc. to do your command-line parsing. To reiterate the previous bullet point: I want you to manipulate sys.argv directly for this assignment.
But why, Jeff? Command-line parsing without one of those handy modules is so fussy! Why would you make us do that?
- I want you to understand your tools, not just copy code from Stack Overflow or somebody's blog. All of the modules mentioned above are intended to simplify and systematize your access to sys.argv, and I want to make good and sure you understand that at its core, your command line is just a list of strings.
- Once you've done the fussy work once, you'll be better equipped to choose which argument parsing library you would prefer to use.
- I want to set us up for a discussion comparing what command-line parsing looks like in raw form (just with sys.argv) vs. several of the main contender libraries. Not only is it a code cleanliness question, it's also an engineering context question: argparse, getopt, and optparse ship with Python's standard library, while third-party libraries like docopt and click have to be downloaded and installed before you can run a program that uses them. But argparse probably produces the ugliest code...so what's the right choice?
Here's a simple example of using sys.argv to parse command-line arguments.
Don't try to make this program more complicated than necessary. If you're inclined to keep working once you have the program functioning, use your extra energy to make your program as simple and easy to read as possible instead of adding new features.
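To make the "your command line is just a list of strings" point concrete, here is a rough sketch of parsing sys.argv by hand. The syntax it accepts (title S, author S, years A B) is entirely hypothetical and is not a recommended design; the function name parse_command_line is also made up for this example.

```python
import sys

def parse_command_line(argv):
    """Return (operation, parameters), or None if argv doesn't fit the syntax.

    Hypothetical syntax, for illustration only:
        python3 books.py title S
        python3 books.py author S
        python3 books.py years A B
    """
    arguments = argv[1:]  # argv[0] is the program name, e.g. "books.py"
    if len(arguments) == 2 and arguments[0] in ('title', 'author'):
        return (arguments[0], [arguments[1]])
    if len(arguments) == 3 and arguments[0] == 'years':
        # Every element of argv is a string, so the years need converting.
        if arguments[1].isdigit() and arguments[2].isdigit():
            return ('years', [int(arguments[1]), int(arguments[2])])
    return None

parsed = parse_command_line(sys.argv)
if parsed is None:
    print('The arguments did not match the expected syntax.', file=sys.stderr)
else:
    operation, parameters = parsed
    print(operation, parameters)
```

Returning None for anything unrecognized gives the caller a single place to decide to print the usage statement.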

Start early, ask questions, and have fun!

(And don't forget Slack—our #questions channel is meant for you!)

CS 257: Software Design

Books: command-line arguments and CSV

Folder name: books