You have tried out a couple HTTP-based APIs,
and learned a little bit
how to use an API
from within a Python program. You have also
seen a few basic elements of API implementation with Flask.
Now, you're going to design a simple database and a simple API to give access to that
database, and then implement the API using Flask and PostgreSQL.
This is the data-flow sequence we'll be trying to implement. Client makes a request
to Server, which makes a query of Database, which responds to Server, which responds
to Client.
The data
We're going to use US COVID-19 data from the COVID Tracking Project,
but instead of using their API directly,
you're going to download their all-states data
and load it into a database of your own to provide the data source for your API.
The database schema
Your database will consist of two tables. First, there's the states table:
CREATE TABLE states (
id SERIAL,
name text,
abbreviation text
);
Each row of this table will represent one state, like (13, 'Minnesota', 'MN') or
(53, 'Guam', 'GU').
Note that I'm making up 13 and 53 for this example. Those IDs will be whatever you deem suitable.
Each row of the second table, covid19_days, will store COVID-19 statistics
for one state for one day.
CREATE TABLE covid19_days (
date date,
state_id integer,
new_deaths integer,
new_positive_tests integer,
new_negative_tests integer,
new_hospitalizations integer
);
So, for example, here's Minnesota for 8 Oct 2020:
('2020-10-08', 13, 6, 1271, 14147, 98)
The API
Here are the API endpoints you'll implement:
REQUEST: /state/{state-abbreviation}/daily
RESPONSE: a JSON list of dictionaries, each of which represents the
COVID-19 statistics from the specified state on a single date. Each
dictionary will have the following fields.
date -- YYYY-MM-DD (e.g. "2020-10-08")
state -- upper-case two-letter state abbreviation (e.g. "MN")
deaths -- (integer) the number of deaths on this date
positive -- (integer) the number of positive COVID-19 tests on this date
negative -- (integer) the number of negative COVID-19 tests on this date
hospitalizations -- (integer) the number of new hospitalizations
for COVID-19 on this date
REQUEST: /state/{state-abbreviation}/cumulative
RESPONSE: a single JSON dictionary representing the cumulative
statistics for the specified state. The dictionary will have the
following fields.
start_date -- YYYY-MM-DD (e.g. "2020-10-08")
end_date -- YYYY-MM-DD (e.g. "2020-03-11")
state -- upper-case two-letter state abbreviation (e.g. "MN")
deaths -- (integer) the total number of deaths between the
start and end dates (inclusive)
positive -- (integer) the number of positive COVID-19 tests
between the start and end dates (inclusive)
negative -- (integer) the number of negative COVID-19 tests
between the start and end dates (inclusive)
hospitalizations -- (integer) the number of hospitalizations
between the start and end dates (inclusive)
REQUEST: /states/cumulative?sort=[deaths|cases|hospitalizations]
RESPONSE: a JSON list of dictionaries, each representing the
cumulative COVID-19 statistics for each state. The dictionaries are
sorted in decreasing order of deaths, cases (i.e. positive tests), or hospitalizations,
depending on the value of the GET parameter "sort". If sort is not
present, then the list will be sorted by deaths. Each dictionary
will have the same fields as in the
/state/{state-abbreviation}/cumulative endpoint response.
Your tasks
Suggestions
- Populating your database tables can be
trickier than it looks, but it's an important skill to develop. A few suggestions:
- I recommend doing this by converting the data into two separate CSV files,
one for each table, and then using psql's \copy directive to load the CSV
files into the tables.
- I would start by using a spreadsheet program (e.g. Excel, Numbers, or Google Sheets)
to clear out the clutter. Load all-states-history.csv into the spreadsheet, and delete
all the columns that aren't included in our covid19_days table, and then
export the resulting spreadsheet as a new CSV file, say covid.
- The trickiest bit will be replacing the state abbreviations in the
state column of all-states-history.csv with state ID numbers in the state_id
column of covid19_days.csv. Need ideas about how to do this? Let me know.
- NOTE: there are more than 50 states in this dataset.
- Think ahead of time about how to break your work into small steps. Talk this
through with your partner. Choose steps that are short and easy to tell
when you're done. If you look at your list of steps and see a step that says
"load the data into the database", think again—how can you break that
big step into little steps?
- Focusing on code quality from the beginning will help you complete this assignment
more quickly than if you don't. Pick your names carefully. Decompose your code
into single-purpose functions. If there's a task that needs to be performed in more
than one API endpoint, write a function to perform that task (i.e. try to create
reusable functions). etc.
- Start early. There are enough steps in this project that you should plan to
work on it over multiple days. That gives you enough time to step away, take a walk,
read a book, sleep at night(!), etc. when you get stuck, and then come back fresh.
I can't tell you how many times I have broken through a coding impasse by
going out and walking my dog.
- Pay attention to your personal development process. Keep a list of questions
and lessons learned. Notice what worked for you and what didn't. If you feel uneasy
about a chunk of code (e.g. "this function's parameters are kind of awkward"),
take that feeling seriously! A big part of creating good code architecture is
using your aesthetic judgment, so if some code feels wrong to you, that's important.
- Finding this hard? That's because it's hard! Persistence and a willingness
to ask questions will get you there.
- Have fun! (You're writing an API server. Cool!)