Lab: convert a CSV dataset to postgresql

This lab will walk you through the process of converting unnormalized CSV data into a better structure for querying via SQL.

The plan

Setup

I'm assuming here that you will be working on stearns. But if you succeed at installing and running postgres on your own computer, you can do this work locally instead.

Design the tables

I'm going to give you a database design for the purposes of this lab. Soon, you will be doing your own database design based on your project's dataset.

The goals of the following database design are:

Here are the tables we're going to use:

CREATE TABLE authors ( id integer NOT NULL, surname text, given_name text, birth_year integer, death_year integer ); CREATE TABLE books ( id integer NOT NULL, title text, publication_year integer ); CREATE TABLE books_authors ( book_id integer, author_id integer );

The books_authors table is an example of something often known as a linking table. The idea is that each row of books_authors concisely expresses the idea that "author X is one of the authors of book Y".

Create the tables in postgres

Create separate csv files for each table

Next, we want to convert the books2.csv file into three separate CSV files, one per table in our database design.

Load the CSV files into their corresponding tables

At the psql prompt, do this:

\copy authors FROM 'authors.csv' DELIMITER ',' CSV NULL AS 'NULL'

and similarly for the other two tables.

Check to make sure it worked

Try these from the psql prompt.

SELECT * FROM authors; SELECT * FROM books; SELECT * FROM books_authors;

Do you see what you expect to see?

Run some interesting queries

Try this at the psql prompt. Try to make sense of what it's asking.

SELECT books.title, books.publication_year FROM books, authors, books_authors WHERE authors.surname = 'Willis' AND authors.id = books_authors.author_id AND books.id = books_authors.book_id;
SELECT authors.given_name, authors.surname, books.title, books.publication_year FROM books, authors, books_authors WHERE books.id = books_authors.book_id AND authors.id = books_authors.author_id;
SELECT authors.given_name, authors.surname, books.title, books.publication_year FROM books, authors, books_authors WHERE books.id = books_authors.book_id AND authors.id = books_authors.author_id AND authors.surname LIKE 'B%' ORDER BY authors.surname, authors.given_name;

Now try to come up with a query that reports all the authors of the book whose title is "Character Limit".

All done!

Feel free to play around with any of this to figure it out better.