Web Application: selecting a dataset
You will work with your web application team for this assignment.
The web application
For the remainder of the term, you'll be working on various aspects of a database-driven web application. Think about a site like Worldometer or the World Population Clock or Baseball Reference or IMDb or this ridiculous little toy for movie lovers. All of these concern themselves with various ways of searching and reporting on aspects of complex (or not-so-complex) datasets.
Our application will use a pretty typical setup:
- a browser on the user's machine
- a web application on the server that accepts normal web requests for pages (e.g. index.html) and supporting files (e.g. styles.css, scripts.js, dog.jpg)
- an API implementation on the server that receives requests and returns JSON-formatted data
- HTML files that present the structure of the content of the web pages
- CSS files that specify the appearance of the web pages
- JavaScript files that add interactivity, making the site easier and more pleasant to use than just following links from page to page
- a database that the server application(s) can consult
There are performance, usability, and maintainability/extensibility tradeoffs in this structure that we'll discuss as we go along.
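To make the API piece of this setup concrete, here is a minimal sketch in Python using only the standard library. The endpoint `/api/books`, its `author` parameter, the port, and the in-memory `BOOKS` list are all hypothetical stand-ins for whatever your dataset and database provide; the server framework your team ends up using may differ.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory stand-in for the database; a real project
# would query the database instead.
BOOKS = [
    {"id": 1, "title": "Emma", "author": "Jane Austen"},
    {"id": 2, "title": "Persuasion", "author": "Jane Austen"},
    {"id": 3, "title": "Middlemarch", "author": "George Eliot"},
]


def handle_api_request(path):
    """Map a path like /api/books?author=austen to (status code, JSON text)."""
    parsed = urlparse(path)
    if parsed.path != "/api/books":
        return 404, json.dumps({"error": "not found"})
    params = parse_qs(parsed.query)
    # Case-insensitive substring match; an empty author matches every book.
    author = params.get("author", [""])[0].lower()
    matches = [book for book in BOOKS if author in book["author"].lower()]
    return 200, json.dumps(matches)


class ApiHandler(BaseHTTPRequestHandler):
    """Answers GET requests by delegating to handle_api_request."""

    def do_GET(self):
        status, body = handle_api_request(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server(port=5000):
    """Serve the API on localhost until interrupted (port is arbitrary)."""
    HTTPServer(("localhost", port), ApiHandler).serve_forever()
```

Calling `run_server()` and then visiting `http://localhost:5000/api/books?author=austen` in a browser should return the two Austen entries as a JSON array. Keeping the request-to-JSON logic in a plain function, separate from the server class, makes it easy to test without a running server.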
First step: pick a dataset
For the purposes of this project, you're going to start by selecting a dataset suitable for the pedagogical goals of the project. Normally, you would enter into a project knowing what data is involved, since there wouldn't be a project at all unless you or somebody else had an idea for what you wanted to build. But class projects are a little weird. Let's roll with it.
As we talked about last week in our discussion groups, we want data that has the following properties:
- Intrinsically interesting to you, since you'll be working with the data for several weeks, and working with boring data is no fun
- Composed of multiple fundamental entities (like books/authors/publishers or movies/directors/actors/writers), so you will get practice doing non-trivial database design
- Supportive of multiple interesting search and display options, so you will get practice doing non-trivial API and user interface design
- Licensed in a way that allows you to use the data for academic purposes, so your project will be legal
- Available for bulk download, preferably in CSV form, so you can convert the data into your database structure on our server
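The multiple-entities and CSV criteria go together: a downloaded CSV is often one flat table, and part of your database design will be splitting it into separate entities. Here is a minimal sketch of that conversion, assuming a hypothetical CSV with `title`, `year`, and `director` columns and using Python's built-in `sqlite3` as a self-contained stand-in for the database on our server (the table and function names are illustrative only).

```python
import csv
import io
import sqlite3

# Hypothetical flat CSV, as a dataset might arrive: one row per movie,
# with the director's name repeated on every row.
SAMPLE_CSV = """title,year,director
Alien,1979,Ridley Scott
Blade Runner,1982,Ridley Scott
Jaws,1975,Steven Spielberg
"""


def load_movies(csv_file, conn):
    """Split flat CSV rows into separate directors and movies tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS directors ("
        "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER, "
        "director_id INTEGER REFERENCES directors(id))"
    )
    for row in csv.DictReader(csv_file):
        # Store each director once; later rows reuse the existing id.
        conn.execute(
            "INSERT OR IGNORE INTO directors (name) VALUES (?)", (row["director"],)
        )
        (director_id,) = conn.execute(
            "SELECT id FROM directors WHERE name = ?", (row["director"],)
        ).fetchone()
        conn.execute(
            "INSERT INTO movies (title, year, director_id) VALUES (?, ?, ?)",
            (row["title"], int(row["year"]), director_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_movies(io.StringIO(SAMPLE_CSV), conn)
```

After loading, a join such as `SELECT d.name, COUNT(*) FROM movies m JOIN directors d ON m.director_id = d.id GROUP BY d.name` reports each director's movie count, while each director's name is stored only once.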
Where can you find interesting data?
You might be tempted to restrict your focus to Kaggle. You can do that, but Kaggle datasets are extremely variable in quality and interest. So don't stop there.
Here are some other places to look:
Your tasks
- Find three candidate datasets that meet the criteria above.
- (Due 11:59PM Tuesday, October 25) On the Slack channel #general, post brief descriptions of your candidate datasets, including links to the sites that have the data. Say a little something about what you like or don't like about each dataset (keeping the criteria listed above in mind).
- Wait for comments from me on Slack by the end of the day Wednesday, Oct 26. I plan to respond to your Slack posts as they come in, so you can get feedback sooner by posting sooner. Once you've settled on a dataset, you can begin the following assignment, due 11:59PM Friday, Oct 28 (roughly: prepare feature lists and wireframe drawings based on your chosen dataset).