General materials
- Syllabus
- Office hours
- Piazza questions and answers
- Textbook: Mining Massive Datasets, second edition
- Online
- Hardcover
- Some comments about readings
- Partner assignments
Week 1
- Warmup.
- Due Tues, Jan 6, at 11:55 pm.
Week 2
- Reading 2: Sections 3.1.1, 3.2 (opening), 3.2.1, 3.2.4, 3.3 (opening), 3.3.1,
3.3.2. Post a single question or comment.
- Due Mon, Jan 12, before class starts.
- Associated reading: throughout this week we'll be talking about chapter 3
through and including section 3.4.
Week 3
Week 4
- Associated reading: we will be covering chapter 5 on PageRank and associated topics. Specifically, we'll be covering the entire chapter except for 5.2.
- Reading 4: Sections 5.1.2, 5.1.3, and 5.1.4 in the textbook. Post a single
question or comment.
- Due Mon, Jan 26, before class starts.
- Reading 5: Chapter 6 intro, and all of 6.1 (including 6.1.1, 6.1.2, 6.1.3, and
6.1.4). Post a single question or comment.
- Due Fri, Jan 30, before class starts.
Week 5
- Associated reading: we will be covering portions of chapter 6 on frequent itemsets
and association rules. Specifically, we'll be covering 6.1, 6.2, and
6.4. This chapter, which is from an alternative textbook, is also a great place
to look if you want to see the same ideas said differently.
- Reading 6: Section 6.4 intro, 6.4.1, 6.4.2, 6.4.3. Post a single question or comment.
- Due Wed, Feb 4, before class starts.
- Association rules, part 1. To be done with
partner if you have one; see end of assignment for part 1 breakdown.
- Due Thu, Feb 5, at 11:55 pm.
Week 6
- Reading 7: Clustering! In Chapter 8 of the Tan book, read the Chapter 8 intro,
and all of 8.1. (This is the same material as the intro of Chapter 7 of our
usual MMDS textbook, but I think the Tan book does this part better.) Then
read section 7.1.3 in our MMDS text on "The Curse of Dimensionality." Post a
single question or comment about something in any of the reading.
- Due Wed, Feb 11, before class starts.
- Association rules, part 2. To be done with
partner if you have one; see end of assignment for part 2 breakdown.
- Due Wed, Feb 11, at 11:55 pm.
Week 7
- Associated reading: Chapter 7 of our usual MMDS (Mining Massive Datasets)
textbook covers clustering, and we'll be covering section 7.1 (all of
it), 7.2 (all of it), 7.3.1, 7.3.2, 7.3.3, and 7.4 (all of it). As with
association rules, there is also a free chapter from the Tan et al. book, and it
covers some of the same material better than the MMDS book (but not all of it).
I'll be picking and choosing approaches from each text when talking in class. In
the Tan et al. book, we'll be simultaneously covering sections 8.1, 8.2, 8.3, and
possibly some bits of 8.5.
Week 8
- Reading 8: In our usual MMDS textbook, read sections 9.1.1, 9.2 (prologue), 9.2.1,
9.2.2, 9.2.4, and 9.2.5. (You are welcome to read the skipped sections if you like.) Post a single question or comment.
- Due Fri, Feb 27, before class starts.
- Agglomerative Clustering. To be done with
partner if you have one.
- Due Sat, Feb 28, at 11:55 pm, with an automatic extension until Mon,
March 2, at 11:55 pm, for anyone who wants it.
Week 9
- Associated reading: We'll be covering essentially all of Chapter 9 from our
usual MMDS (Mining Massive Datasets) textbook. We may deviate a bit in how we
cover section 9.4 on dimensionality reduction.
Week 10
- Peer evaluations. Submit this form separately for each partner that you worked
with. (If you worked entirely alone all term, you do not need to submit.) If
you forget to submit this, I'll treat it as if you received a negative
evaluation from a partner. Don't forget!
- Due Wed, Mar 11, before class.
Finals Week
- Final project. To be done with partner if you have one.
- Due Mon, Mar 16, at 9:30 pm (end of last final exam). I am forbidden
by college policy to grant any extensions unless you gain approval from the Dean
of Students office.