CS 322: Natural Language Processing
Part-of-speech tagging
Due 5:00PM Monday, November 22. Hand in your code, data, and report
to your Courses hand-in folder.
You may work with a partner on this assignment.
For this assignment, you will implement and test a Hidden Markov Model-based
part-of-speech tagger, as discussed in class and in Section 5.5 of the
textbook. Follow this outline:
- Familiarize yourself with the Penn Treebank Project's part-of-speech
tag set. See p. 131 of your textbook. If you want a more detailed
discussion of this tag set, visit the Penn Treebank Project site
and download the detailed description of the tag set.
- Choose some tagged data available online to train a Hidden Markov
Model for part-of-speech tagging. You can find likely candidates by
looking at the Brown Corpus links at the NLTK website. Make sure to
set aside some of the data for testing your model.
- Run your HMM as a part-of-speech tagger on your test data.
- Put your report in a readme.txt file. Your report should include:
  - A description of how you initialized your HMM.
  - A brief description of the code you used to build your
    part-of-speech tagger and how to use it.
  - A list of your test sentences, showing their correct
    taggings and the taggings produced by your tagger.
  - The accuracy of your tagger. That is, the ratio of
    the number of words your tagger tagged correctly to the
    total number of words in your test set.
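
The accuracy figure described in the last report item can be computed with a few lines of code. The following is a minimal sketch, not a required interface: the function name `tagging_accuracy`, the representation of a sentence as a list of (word, tag) pairs, and the two example sentences are all assumptions made for illustration.

```python
def tagging_accuracy(gold_sents, predicted_sents):
    """Return the fraction of words whose predicted tag matches the gold tag.

    Each sentence is assumed to be a list of (word, tag) pairs, and the
    two lists of sentences are assumed to be aligned one-to-one.
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_sents, predicted_sents):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences differ in length")
        for (_, gold_tag), (_, pred_tag) in zip(gold, pred):
            if gold_tag == pred_tag:
                correct += 1
            total += 1
    # Avoid dividing by zero on an empty test set.
    return correct / total if total else 0.0


# Hypothetical test data: five words, one of them tagged incorrectly.
gold = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
        [("dogs", "NNS"), ("bark", "VBP")]]
pred = [[("the", "DT"), ("dog", "NN"), ("barks", "NNS")],
        [("dogs", "NNS"), ("bark", "VBP")]]

print(tagging_accuracy(gold, pred))
```

Note that this counts words across the whole test set rather than averaging per-sentence scores, which matches the definition given above.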
You might find this spreadsheet HMM
useful, or maybe not.
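
The training and tagging steps in the outline above can be sketched roughly as follows. This is one possible shape for the code, not a reference solution: the names `train_hmm`, `log_prob`, and `viterbi` are hypothetical, add-one smoothing is an assumed (and fairly crude) way to handle unseen words and transitions, and the three training sentences are a made-up toy corpus standing in for real tagged data.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # assumed sentence-boundary pseudo-tags


def train_hmm(tagged_sents):
    """Collect transition and emission counts from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    vocab = set()
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
        trans[prev][END] += 1
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of key under the given counts."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence as a list of (word, tag) pairs."""
    trans, emit, tags, vocab = model
    if not words:
        return []
    n_trans = len(tags) + 1  # a tag can be followed by any tag or by END
    n_emit = len(vocab) + 1  # known words plus one slot for unknown words
    # best[i][tag] = (log prob of the best path ending in tag at i, previous tag)
    best = [{t: (log_prob(trans[START], t, n_trans)
                 + log_prob(emit[t], words[0], n_emit), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_trans), p)
                for p in tags)
            column[t] = (score + log_prob(emit[t], words[i], n_emit), prev)
        best.append(column)
    # Pick the best final tag, including the transition into END.
    _, last = max(
        (best[-1][t][0] + log_prob(trans[t], END, n_trans), t) for t in tags)
    # Follow the back-pointers from the last word to the first.
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return list(zip(words, path))


# Toy training data (made up for illustration only).
train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
model = train_hmm(train)
print(viterbi(["a", "cat", "barks"], model))
```

Working in log space avoids numeric underflow on long sentences. With a real corpus you would replace `train` with your training portion of the tagged data and run `viterbi` only on sentences held out for testing.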