This is what I did to create a small PostgreSQL database out of some population data.
Found the data in tabular form on the web at http://www.demographia.com/db-uscity1790.htm
Copied and pasted the data into an Excel spreadsheet. The tabs in the data put the numbers into the right columns (lucky me!), so I just rearranged the two-part table into a single table by cutting and pasting the bottom table to the right of the top table. Compared the city names to make sure everything was lining up properly.
I also had two identical city name columns at this point, so I just deleted the second one.
Saved the spreadsheet as a comma-separated-values (CSV) file called populations.csv.
Opened the CSV file with vi to take a closer look. Noticed some quirks:
echo 'tr "\r" "\n" < $1 > $1.tmp' tr "\r" "\n" < $1 > $1.tmp echo mv $1.tmp $1 mv $1.tmp $1
Designed a simple-minded SQL table, and wrote the CREATE statement to build the table:
DROP TABLE IF EXISTS populations; CREATE TABLE populations ( city text, year int, population int );
Getting the CSV file into PostgreSQL: a bone-headed way.
This approach is to turn my data into a bunch of SQL INSERT statements and then execute those statements.
Combined the DROP/CREATE statements with the INSERT statements into a single file called populations.sql.
Logged into thacker.mathcs.carleton.edu and used the psql command and the login information that Mike Tie sent me to load my table into my PostgreSQL database, like this:
psql -f populations.sql
Note that the installation of psql on thacker assumes my user name is the same as my login name, and prompts me for my password after I type the command. It also dumps status information back to my screen (in this case, I saw a lot of "INSERT 0 1" lines appear after I typed my password).
Finally, I ran the psql command prompt to try a quick test:
psql
and then at the psql prompt:
SELECT * FROM populations WHERE year=1900 ORDER BY population DESC;
Getting the CSV file into PostgreSQL: a smarter way.
Really, there must be a better, more general way than that. Yes, there is.
So, let's convert our data into a CSV file that matches the structure of the database's table, and then use PostgreSQL's built-in CSV loader.
Starting with the CSV file populations.csv that Excel saved and I tweaked as described above, I wrote yet another Python script to convert the population data, which I executed like so:
python populations2tablerows.py < populations.csv > poprows.csv
Logged into thacker
Ran "psql -f createtable.sql" where createtable.sql contains the "CREATE TABLE" SQL command. psql asked me for my PostgreSQL password, as usual.
Ran "psql" to get the "jondich=>" prompt.
Just checking, is my table there? At the psql prompt, type "\dt" (no quotation marks, no semi-colon).
At the psql prompt, enter:
\copy populations FROM '/path/to/poprows.csv' DELIMITER ',' CSV
Note: psql's backslash directives (e.g. "\copy") do not require or allow a semi-colon at the end. SQL queries (e.g. "SELECT * FROM populations") do require a semi-colon.
Did it work?
SELECT * FROM populations WHERE year=1900 ORDER BY population DESC;