Work with a partner (or two, if that's how it shakes out). You should plan to stick with this partner for phases 1, 2, and 3 of the Books assignment.
Take a look at books.csv, a file full of data about books and authors. There are only a couple dozen books in this dataset, so you wouldn't want to base an important book-related application on it. But for learning about how to manipulate datasets like this, a couple dozen books will be plenty.
The format of books.csv is known as comma-separated values (CSV). CSV is a very simple format used to store tables of textual data. Each line of text represents a row in the table, and the fields/columns in each row are separated by commas. For example, each line might list its fields in this order: title,publication year,author
CSV gets tricky only when the data in one of the table cells contains a comma or a newline character. For example, consider the novel "Right Ho, Jeeves" by P.G. Wodehouse. If you just comma-separate the fields, the comma inside the title looks like a field separator, so the row appears to have one field too many.
For the Books assignment, phases 1, 2, and 3, you'll be using the books.csv file as your database. Your programs will read data from this file as needed to satisfy the requirements of the assignment. To do this, you'll use Python's csv module.
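Here's a minimal sketch of how the csv module deals with the comma problem described above. The sample row is hypothetical: it assumes a title, publication year, author layout and assumes the title is wrapped in double quotes, which is the usual CSV convention for a field that contains a comma. Check the real books.csv to see exactly how its rows are laid out.

```python
import csv
import io

# Hypothetical sample row (not copied from books.csv).
# The title contains a comma, so it is wrapped in double quotes.
sample = '"Right Ho, Jeeves",1934,P.G. Wodehouse\n'

# Naive approach: split on every comma. The comma inside the title
# is treated as a separator, so we get one field too many.
naive_fields = sample.strip().split(',')
print(len(naive_fields))  # 4

# csv.reader understands the quoting convention, so the title
# comes back as a single field with its comma intact.
# io.StringIO stands in for an open file here.
reader = csv.reader(io.StringIO(sample))
rows = list(reader)
print(rows[0])  # ['Right Ho, Jeeves', '1934', 'P.G. Wodehouse']
```

To read an actual file, you would pass an open file object to csv.reader instead of the io.StringIO stand-in. Note that every field comes back as a string, including the year.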
Writing programs that use command-line arguments to determine their behavior is an important skill. In my day-to-day life as a programmer, I write a lot of short programs (and some long ones) to do all manner of tasks for me. Sometimes, in very short programs that I plan to run exactly once, I'll hard-code input values into the program. Those programs are often like: "Open file something.txt, read its contents, do something with the contents, and print out the results." In cases like this, I'll often just put the "something.txt" right in the code.
But even when I expect to run a program only once, I generally have to run it a few times during debugging. Then I often find that it's more useful than I thought, and I end up running it on several different input files, sometimes sorting the output one way and other times sorting it another way. In such cases, I always wish I had taken the one or two minutes it would have required to set up a sensible command-line argument syntax for the program.
For the Books assignment, all three phases, I will specify a command-line syntax your code must implement. Among other things, this will enable me to automate the testing of the whole class's programs because they'll all follow the same syntax. But more important, a well-designed and correctly implemented command-line syntax will make it easier for you and your users to use your programs, which in turn will make your software more useful to human beings, which is the point of software development.
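To make the idea concrete, here's a rough sketch of the "one or two minutes" of setup, using Python's standard argparse module. This is not the syntax required for the Books assignment. The input_file argument, the --sort option, and the helper names build_parser and order_lines are all made up for illustration.

```python
import argparse


def build_parser():
    # Hypothetical command-line syntax: one required input file
    # plus an optional --sort flag that controls output order.
    parser = argparse.ArgumentParser(
        description="Print the lines of a text file, optionally sorted."
    )
    parser.add_argument("input_file", help="path of the file to read")
    parser.add_argument(
        "--sort",
        choices=["none", "ascending", "descending"],
        default="none",
        help="how to order the output (default: none)",
    )
    return parser


def order_lines(lines, sort_mode):
    # Return a new list of lines ordered according to sort_mode.
    if sort_mode == "ascending":
        return sorted(lines)
    if sort_mode == "descending":
        return sorted(lines, reverse=True)
    return list(lines)


# Demonstration with an explicit argument list, so the sketch runs
# without a real command line or a real file. In an actual program
# you would call build_parser().parse_args() with no arguments,
# which reads from sys.argv.
args = build_parser().parse_args(["something.txt", "--sort", "descending"])
print(args.input_file, args.sort)  # something.txt descending
print(order_lines(["pear", "apple", "fig"], args.sort))  # ['pear', 'fig', 'apple']
```

With a setup like this, switching input files or sort order means changing what you type at the command line rather than editing the code. argparse also generates a --help message from the descriptions for free.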
For phase 1 of the Books assignment, you will:
Start early, ask questions, and have fun!
(And don't forget Slack—our #questions channel is meant for you!)