CS257 Software Design Monday, 3 October 2022 + newline characters + Olympics data: an open-ended project - Life offers very few unambiguously correct choices - Which data do you want to include? What are the implications of your possible choices? - What entities deserve their own tables and why? players/athletes/people: name events: sport category + specific Women's Track & Field + 400m nocs how to represent person P participated during games G in event E and had these results (i.e. medal or not) Pretend my tables have athletes: id, name [5, Christine Jacoba Aaftink] events: id, category, name [17, Speed Skating, Speed Skating Women's 500 Meters] no 1 Speed Skating Speed Skating Women's 500 Meters no 2 Speed Skating Speed Skating Men's 500 Meters events table id event_category_id name yes 1 1 Speed Skating Women's 500 Meters yes 2 1 Speed Skating Men's 500 Meters event_categories: id, name 1 Speed Skating 2 Swimming 3 Alpine Skiing games: id, year, season, city [37, 1988, Winter, Albertville] linking table: athlete_id, event_id, games_id, medal [5,17,37,NULL] + Rough-and-ready database normalization - Pick essential entities (whatever that means) & give them their own tables - Look for repeated strings as a guide - Linking tables like books_authors are powerful - Don't forget: you know almost zero SQL so far. It can do lots of things you don't know yet. + Python's csv module revisited - Do a "git pull" in your clone of my cs257_2022_fall repo - Looking through postgresql/olympics-convert.py - python dictionaries - the \ - csv readers and writers - the problem of what to call a row of a csv file