CS 324: Some comments about readings
Philosophy
Here's some philosophy as to what I hope to achieve with the course readings:
the goal is to prepare your brain for the things we do in class. A number of
educational studies now show that new ideas take a little
time and effort to get into medium- and long-term memory, which is where you need
them to actually be able to work with them. If the first time you see a new idea
is in class, your short-term memory tends to lock up with it, and it's harder to
do interesting things with it. Another take on this concept is summarized here.
To that end, my goal is for these assigned readings to help familiarize you
with the most important ideas for class. There are two key issues I want to make
sure you know about:
- There is lots of content in the textbook that is 100% relevant for class but
that I don't assign via this mechanism, because I think it's more important to keep
the assigned reading small enough that you can definitely do it. But please don't
misinterpret: if I don't assign a reading, that does not mean it is unimportant
for our course. Or, said with fewer negatives, there are
"unassigned" readings that parallel what we do, and they are certainly important
because they mimic what we'll do in class.
- It's sometimes challenging for me to
pick out precisely targeted readings with exactly the content I want you to prep
for class, because the book is not always organized in the same way that I
choose to cover the content. But I'll do my best. I will try to make clear
what portions of the readings are relevant, even if I am not "assigning" them
as pre-class readings.
Your tasks
For each assigned reading, you should post either a question you have about
the reading, or some comment associated with it. You don't need to work very
hard at this; if there's something you don't understand, ask it;
alternatively, if there's something that catches your attention, mention it or
quote it. You don't need to be innovative. In fact, if everyone in the class
has the same question, that's a really important point for me to
notice. Though we're not using Moodle in general for the course website, I've
linked to a Moodle forum, with readings numbered, where you can post questions
and comments.
Some specifics about our text
Finally, some specifics: I'll be skipping assigned readings in chapters 1 and 2 in the textbook.
- Chapter 1 is mostly the typical overview-of-the-field sort of
material. We'll talk about it in class, but I won't be assigning the reading
in particular. The latter part of the chapter covers a number of specific
technical skills. Some of those skills are worthwhile in concept, but we'll go
back and cover them in context as they come up. If you're interested, go ahead
and look at it.
- Chapter 2 is all about MapReduce, a
great framework for doing computing in parallel. It's neat content, and I
thought long and hard about including it. I decided against it (mostly)
because it's fairly time-consuming to work through and it isn't really data
mining. MapReduce is a fantastic distributed programming approach that is
applicable to any sort of algorithm you want to parallelize. Also, the
textbook only very occasionally refers to MapReduce again after this chapter.
On the rare occasions it does, where it's relevant, we'll
talk through it. Again, if you're interested, take a look. If you want to
play around, there's a great simple version of a MapReduce-like framework
called Spark that you can
install on your own computer to do parallel programming with the multiple
cores on your machine. The Spark framework specifically targets Java and
Python (and Scala, if you're curious) as major languages that it supports.
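If you'd like a feel for the MapReduce idea before installing anything, here is a minimal sketch in plain Python. It is not Spark and not the textbook's code; it just counts words in the map, shuffle, reduce style. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs in a single process, whereas a real framework would spread the map and reduce steps across cores or machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real framework, each document could be mapped on a different worker;
    # here we simply loop over them in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    # Each key is reduced independently, which is what makes this step parallelizable.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the end"]
    print(word_count(docs))
```

The point to notice is that map_phase only ever looks at one document and reduce_phase only ever looks at one key, so neither step needs to know about the rest of the data.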