CS 322: Natural Language Processing
Course Information
Textbook
Speech and Language Processing: An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition, 2nd edition, by
Daniel Jurafsky and James H. Martin. Good book.
The Plan
This course will be organized around a sequence of problems
chosen to give you experience with a collection of core
NLP techniques. Our interaction with each problem will
go roughly like this:
1. I'll describe the problem briefly in class.
2. We will discuss approaches to the problem
   based on whatever ideas you have, just to get a feel
   for the space of solution possibilities.
3. I'll introduce a particular approach that I would
   like you to pursue (often, I imagine the class will have
   foreshadowed or outright named the approach I have in
   mind during step 2).
4. You will go out and start working to write code or
   use existing software to solve the problem.
5. While step 4 is going on outside of class, I'll
   lecture on the core solution techniques, answer questions
   about ideas or problems you're having, demonstrate relevant
   software, etc.
6. If appropriate to the problem, you'll collect data
   to evaluate the success of your solution and submit
   a report (including code, if any).
7. We'll spend a class day having each group report on
   its experiences and results.
For most of the problems, I'm going to ask you to work in groups of
two or three, partly to make our wrap-up discussion work better,
and partly because having somebody to bounce ideas off is very
valuable for these sorts of problems. That said, I'll give you
a break or two from partner work.
Here are the problems. Since each one will take somewhere from 3 to 7
class days, we'll probably be able to fit about 5 problems into
the term, but we'll see how it goes. Close to the end of the
term, I'll give you a take-home exam to give you the opportunity
to revisit the core ideas of the course.
- Document Classification. Can n-gram language models
be used to detect the difference between a paragraph from
the Washington Post and a paragraph from an Agatha Christie novel?
This problem will introduce not just n-grams, but also some
essential techniques for evaluating the effectiveness of
NLP algorithms.
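To give you a feel for the classification idea before we get there: train one smoothed n-gram model per source, then label a new paragraph with whichever model assigns it the higher probability. Here's a toy sketch with made-up two-sentence "corpora" standing in for real training data (the actual assignment will use real text and more careful smoothing):

```python
import math
from collections import Counter

def train_bigrams(text):
    """Count unigrams and bigrams from whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def log_prob(text, unigrams, bigrams):
    """Add-one-smoothed bigram log probability of text under a model."""
    tokens = text.lower().split()
    vocab = len(unigrams) + 1  # +1 leaves room for unseen words
    score = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        score += math.log((bigrams[(prev, word)] + 1) /
                          (unigrams[prev] + vocab))
    return score

# Toy stand-ins for the Washington Post and Agatha Christie.
news = "the senate passed the bill on tuesday the vote was close"
novel = "poirot smiled and stroked his moustache the case was closed"

news_model = train_bigrams(news)
novel_model = train_bigrams(novel)

test = "the vote on the bill was close"
guess = ("news" if log_prob(test, *news_model) > log_prob(test, *novel_model)
         else "novel")
print(guess)  # → news
```

Even this crude version picks the right source here, because the test paragraph shares bigrams like "the vote" with the news "corpus."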
- Spelling Suggestions. How do you decide to issue
error messages like "Did you mean 'receive'?" or
"Did you mean 'emu handler'?" There are lots of ways to
do this, but we'll use a dynamic programming technique called
minimum edit distance as part of our solution.
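As a preview of the technique: minimum edit distance fills in a table where cell (i, j) holds the cheapest way to turn the first i characters of one string into the first j characters of the other. A minimal sketch with unit costs (in class we may weight the operations differently):

```python
def edit_distance(source, target):
    """Levenshtein distance via dynamic programming: the minimum number
    of insertions, deletions, and substitutions turning source into target."""
    m, n = len(source), len(target)
    # d[i][j] = distance between source[:i] and target[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of source[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of target[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + sub)    # substitution or match
    return d[m][n]

print(edit_distance("recieve", "receive"))  # → 2 (a transposition costs two unit edits)
```

A spelling suggester can then propose the dictionary words with the smallest distance to the misspelled input.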
- Yoda-fication, Elmo-fication, and Oden-ification of Sentences.
How can you create a tool that will make predictable transformations
of sentences? For Yoda, for example, one might devise a tool to turn sentences
like "He is too impatient" into "Too impatient he is". Just as with
Eliza, we could use a collection of little tricks to pull off any
particular transformation goal. But we'll take a more general route
by first using a parser to determine a parse tree describing the
syntax of a sentence, and then applying tree-transformation rules
to the parse tree to obtain the transformed sentence. (Note that
though this problem has a silly goal, a more complex version of
the same problem could be used to do important parts of machine
translation.)
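To make the tree-transformation step concrete, here's a toy sketch. It assumes parse trees represented as nested lists and a single hand-written rule; the real parser we use will have its own output format and you'll need a whole collection of rules:

```python
def yoda_transform(tree):
    """If the tree looks like (S (NP ...) (VP (VBZ is) (ADJP ...))),
    front the ADJP: "he is too impatient" -> "too impatient he is".
    A single toy rule; a real tool would match many tree shapes."""
    label, *children = tree
    if (label == "S" and len(children) == 2
            and children[0][0] == "NP" and children[1][0] == "VP"):
        np, vp = children
        _, verb, *rest = vp
        if rest and rest[0][0] == "ADJP":
            return ["S", rest[0], np, verb]
    return tree  # no rule applies: leave the tree alone

def words(tree):
    """Read the sentence back off a tree's leaves."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]

# Trees as nested lists: [label, child, child, ...]; leaves are strings.
sentence = ["S",
            ["NP", ["PRP", "he"]],
            ["VP", ["VBZ", "is"], ["ADJP", ["RB", "too"], ["JJ", "impatient"]]]]
print(" ".join(words(yoda_transform(sentence))))  # → too impatient he is
```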
- Interlude: everybody gets to find a cool NLP tool out in the
world and show it to the class.
- Part-of-speech Tagging. Given a sentence, mark each word with
a part of speech (or a list of parts of speech accompanied by
probabilities). Once again, there are many approaches to this
problem, but we will use this problem to motivate a study of
Hidden Markov Models, which are a very important tool in NLP.
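As a preview of where the HMM study leads, here is a bare-bones Viterbi decoder over a two-tag toy model. The probabilities are invented by hand purely for illustration; in the assignment they'd be estimated from a tagged corpus:

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Most likely tag sequence for words under an HMM, via Viterbi."""
    # V[t][s] = (best prob of a path ending in state s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), None) for s in states}]
    for t in range(1, len(words)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            prob = (V[t - 1][prev][0] * trans_p[prev][s]
                    * emit_p[s].get(words[t], 0.0))
            V[t][s] = (prob, prev)
    # Trace backpointers from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    tags = [last]
    for t in range(len(words) - 1, 0, -1):
        last = V[t][last][1]
        tags.append(last)
    return list(reversed(tags))

# Hand-set toy probabilities (real ones come from training data).
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.7, "VERB": 0.3}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"fish": 0.6, "dogs": 0.4},
          "VERB": {"fish": 0.3, "bark": 0.7}}

print(viterbi(["dogs", "fish"], states, start_p, trans_p, emit_p))
```

Note how "fish" gets tagged by context: the model prefers VERB after a NOUN even though "fish" could emit from either state.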
- [Something about Semantic Analysis, not yet decided]
- What Does "it" Mean? (a.k.a. Anaphora Resolution)
Given a sequence of sentences, identify
the pronouns and figure out which noun each pronoun refers to.
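A deliberately naive baseline for this problem: link each pronoun to the most recent preceding noun. This ignores gender, number, and syntax, which is exactly why the problem is interesting, but it gives you something to beat:

```python
def resolve_pronouns(tagged):
    """Naive anaphora resolution: map each pronoun to the most recent
    preceding noun. A baseline only; real resolvers also check gender,
    number, and syntactic constraints."""
    links = {}
    last_noun = None
    for word, tag in tagged:
        if tag == "NOUN":
            last_noun = word
        elif tag == "PRON" and last_noun is not None:
            links[word] = last_noun
    return links

# Tags supplied by hand here; in practice they'd come from a POS tagger
# like the one in the previous problem.
tagged = [("the", "DET"), ("cat", "NOUN"), ("saw", "VERB"),
          ("a", "DET"), ("bird", "NOUN"), ("and", "CONJ"),
          ("it", "PRON"), ("flew", "VERB")]
print(resolve_pronouns(tagged))  # → {'it': 'bird'}
```

Here "most recent noun" happens to be right; a sentence like "the cat saw a bird and it pounced" shows where the heuristic falls down.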
- Real-Word Spelling Error Detection/Correction. Go ahead and reed
a book, take a wok, or bear you're sole.
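One classic attack on real-word errors uses confusion sets: for each word that has a commonly swapped real-word neighbor, pick whichever candidate the language model likes best in context. A hypothetical mini-version, with tiny hand-made confusion sets and a one-sentence "corpus" for the bigram counts (the real thing needs much more of both):

```python
from collections import Counter

# Toy confusion sets: words often swapped for a real-word neighbor.
CONFUSION = {"reed": ["read", "reed"], "wok": ["walk", "wok"]}

# Toy corpus for bigram counts; ties go to the first candidate listed.
corpus = "go read a book take a walk and read it again".split()
bigrams = Counter(zip(corpus, corpus[1:]))

def correct(tokens):
    """For each token with a confusion set, keep the candidate whose
    bigram with the preceding (already-corrected) word is most common."""
    out = [tokens[0]]
    for word in tokens[1:]:
        candidates = CONFUSION.get(word, [word])
        out.append(max(candidates, key=lambda c: bigrams[(out[-1], c)]))
    return out

print(" ".join(correct("go reed a book".split())))  # → go read a book
```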
Grading
Your grade in the course will be determined by your reports
on the problems we work on (85%) plus a take-home exam (15%).