Web Application: selecting a dataset
You will work with your web application team for this assignment.
The web application
For the remainder of the term, you'll be working on various aspects of a database-driven web application. Think about a site like the World Population Clock or Baseball Reference or this super-cool baby name visualizer or Worldometer or IMDb or this ridiculous little toy for movie lovers. All of these concern themselves with various ways of searching and reporting on aspects of complex (or not-so-complex) datasets.
Our application will use a pretty typical setup:
- a browser on the user's machine
- a web application on the server that accepts normal web requests for pages (e.g., index.html) and supporting files (e.g., styles.css, scripts.js, elephant.jpg)
- an API implementation on the server that receives requests and returns JSON-formatted data
- HTML files that present the structure of the content of the web pages
- CSS files that specify the appearance of the web pages
- JavaScript files that enable interactivity that's easier and more pleasant to use than just following links from page to page
- a database that the server application(s) can consult
There are performance, usability, and maintainability/extensibility tradeoffs in this structure that we'll discuss as we go along.
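To make the API piece of this setup a little more concrete, here is a minimal Python sketch of a function that takes a request URL, consults a database, and returns JSON-formatted data. Everything specific in it is an assumption made for illustration: the books table, the /api/books endpoint, the "after" parameter, and the use of SQLite (from Python's standard library) as a stand-in for whatever database and web framework the project actually uses.

```python
import json
import sqlite3
from urllib.parse import urlparse, parse_qs


def make_demo_db():
    """Create a tiny in-memory database with a hypothetical books table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO books (title, year) VALUES (?, ?)",
        [("Emma", 1815), ("Beloved", 1987), ("Middlemarch", 1871)],
    )
    conn.commit()
    return conn


def handle_api_request(conn, url):
    """Map a request URL to a (status_code, json_text) pair.

    Only one hypothetical endpoint is supported:
    /api/books?after=YEAR lists books published after YEAR.
    """
    parsed = urlparse(url)
    if parsed.path != "/api/books":
        return 404, json.dumps({"error": "not found"})

    query = parse_qs(parsed.query)
    try:
        after = int(query.get("after", ["0"])[0])
    except ValueError:
        return 400, json.dumps({"error": "after must be an integer"})

    # A parameterized query keeps user input out of the SQL text itself.
    rows = conn.execute(
        "SELECT title, year FROM books WHERE year > ? ORDER BY year",
        (after,),
    ).fetchall()
    books = [{"title": title, "year": year} for title, year in rows]
    return 200, json.dumps(books)


if __name__ == "__main__":
    conn = make_demo_db()
    status, body = handle_api_request(conn, "/api/books?after=1850")
    print(status, body)
```

In the real application, JavaScript running in the browser would request a URL like this one and use the returned JSON to update the page, rather than loading a whole new HTML page.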
First step: pick a dataset
For the purposes of this project, you're going to start by selecting a dataset suitable for the pedagogical goals of the project. Normally, you would enter into a project knowing what data is involved, since there wouldn't be a project at all unless you or somebody else had an idea for what you wanted to build. But class projects are a little weird. Let's roll with it.
We want data that has the following properties:
- Intrinsically interesting to you, since you'll be working with the data for several weeks, and working with boring data is no fun
- Composed of multiple fundamental entities (like books/authors/publishers or movies/directors/actors/writers or car models/makes/years), so you will get practice doing non-trivial database design
- Supportive of multiple interesting search and display options, so you will get practice doing non-trivial API and user interface design
- Licensed in a way that allows you to use the data for academic purposes, so your project will be legal
- Available for bulk download, preferably in CSV form, so you can convert the data into your database structure on our server
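As a rough illustration of what "convert the data into your database structure" can look like, here is a small Python sketch that reads a flat CSV and splits it into two related tables, one per fundamental entity. The column names, the sample rows, the authors/books schema, and the choice of SQLite are all assumptions made up for this example. Your dataset and our server's database will differ.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a downloaded CSV file.
SAMPLE_CSV = """title,author,year
Emma,Jane Austen,1815
Persuasion,Jane Austen,1817
Beloved,Toni Morrison,1987
"""


def load_books(conn, csv_file):
    """Load a flat CSV into separate authors and books tables."""
    conn.executescript(
        """
        CREATE TABLE authors (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE
        );
        CREATE TABLE books (
            id INTEGER PRIMARY KEY,
            title TEXT,
            year INTEGER,
            author_id INTEGER REFERENCES authors(id)
        );
        """
    )
    for row in csv.DictReader(csv_file):
        # Each author is stored once, even if they appear on many CSV rows.
        conn.execute(
            "INSERT OR IGNORE INTO authors (name) VALUES (?)", (row["author"],)
        )
        author_id = conn.execute(
            "SELECT id FROM authors WHERE name = ?", (row["author"],)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO books (title, year, author_id) VALUES (?, ?, ?)",
            (row["title"], int(row["year"]), author_id),
        )
    conn.commit()


def titles_by_author(conn, name):
    """Return the titles written by the named author, oldest first."""
    rows = conn.execute(
        """
        SELECT books.title
        FROM books JOIN authors ON books.author_id = authors.id
        WHERE authors.name = ?
        ORDER BY books.year
        """,
        (name,),
    ).fetchall()
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_books(conn, io.StringIO(SAMPLE_CSV))
    print(titles_by_author(conn, "Jane Austen"))
```

A dataset with only one kind of entity would need just a single table, which is why the multiple-entities criterion above matters for getting real database design practice.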
Where can you find interesting data?
First, if you do some brainstorming about a website you would find interesting, you can certainly search for a relevant dataset on your own. But if you need some inspiration, the Carleton library has assembled a good list of sites with datasets suitable for this class. Check it out.
Here are some other places that might have something interesting:
- Data is Plural
- World Health Organization
- US government (even though some datasets have been disappearing recently)
- US census
- US Congress
- etc.
In my experience, many students find their way to kaggle.com. That's not necessarily bad, but Kaggle datasets are extremely variable in quality and interest. Many of them are pretty bad or just kinda weird. So don't look only there.
Your tasks
- Find three candidate datasets that meet the criteria above.
- (Due 5:00PM Friday, April 11) On the Slack channel #general, post brief descriptions of your candidate datasets, including links to the sites that have the data. Write a sentence or two about what you like or don't like about each dataset (keeping the criteria listed above in mind).
- Wait for comments from me on Slack by class time Monday, April 14. I plan to respond to your Slack posts as soon as I can as they come in, so you can get feedback more quickly by posting sooner. Once you've settled on a dataset, you can begin the following assignment, due 11:59PM Tuesday, April 15 (roughly "write a short project proposal", but I'll post a more detailed description of this assignment by Friday).