CS395: Lexical Analyzer

Due: 1:10PM Friday, January 16

Your job for this assignment is to write a lexical analyzer for a subset of Java, using C or C++.

Rules

What subset of Java?

I want you to choose a subset of Java that allows you to lexically analyze the average CS117 program. For example, you need not include relatively advanced features such as the keywords implements and finally or the ?: operator, but you should definitely recognize the keywords for and public and the <= operator.
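One way to pin down the subset is to write it as lookup tables that the analyzer consults. The sketch below is a minimal C++ version under stated assumptions: the keyword and operator lists, and the names kKeywords, kOperators, isKeyword, isOperator, and longestOperatorAt, are hypothetical choices for illustration, not a required subset. longestOperatorAt shows why <= has to be matched as a single token: it tries the two-character form before the one-character form.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical keyword subset for a CS117-level program; adjust as needed.
// Keywords such as implements and finally are deliberately left out.
const std::unordered_set<std::string> kKeywords = {
    "boolean", "break", "char", "class", "double", "else", "for", "if",
    "int", "new", "private", "public", "return", "static", "void", "while"};

// Hypothetical operator subset, including two-character operators such as <=.
// The ?: operator is omitted.
const std::unordered_set<std::string> kOperators = {
    "+", "-", "*", "/", "%", "=", "==", "!=", "<", "<=",
    ">", ">=", "&&", "||", "!", "++", "--", "+=", "-="};

// True if an identifier-shaped lexeme is really a reserved word.
bool isKeyword(const std::string& lexeme) { return kKeywords.count(lexeme) > 0; }

bool isOperator(const std::string& lexeme) { return kOperators.count(lexeme) > 0; }

// Returns the longest operator starting at text[pos], or "" if there is none.
// Checking the two-character form first makes "<=" one token instead of "<" and "=".
std::string longestOperatorAt(const std::string& text, std::size_t pos) {
    if (pos + 1 < text.size() && isOperator(text.substr(pos, 2))) return text.substr(pos, 2);
    if (pos < text.size() && isOperator(text.substr(pos, 1))) return text.substr(pos, 1);
    return "";
}
```

Keeping the subset in tables means widening it later (adding implements, say) is a one-line change rather than new scanning logic.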

A bit more specifically:

Structure

The heart of your analyzer should be a function with interface:

Token getnexttoken(),

where Token is a struct with a type field and some kind of data field, as discussed in class. Your program will call getnexttoken repeatedly until the type of the returned token is some sort of end-of-input marker. Each call to getnexttoken reads from standard input, and getnexttoken must be the only part of the program that reads from standard input.
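A minimal sketch of the Token struct and of getnexttoken, assuming C++ and a deliberately small subset: the TokenType names, the lexeme data field, the short keyword list, and the helper isKeyword are all hypothetical. Only getnexttoken reads from standard input (through std::cin), it returns a TOK_EOF token as the end-of-input marker, and it reports unexpected characters on standard error. Comments and string literals are not handled.

```cpp
#include <cctype>
#include <cstdio>
#include <iostream>
#include <string>

// Hypothetical token categories; choose whatever set suits your subset.
enum TokenType { TOK_KEYWORD, TOK_IDENT, TOK_INT, TOK_OP, TOK_PUNCT, TOK_ERROR, TOK_EOF };

// The type field plus a data field holding the matched text.
struct Token {
    TokenType type;
    std::string lexeme;
};

// A deliberately short, hypothetical keyword list for this sketch.
static bool isKeyword(const std::string& s) {
    static const char* const kKeywords[] = {"class", "else", "for", "if", "int",
                                            "public", "return", "static", "void", "while"};
    for (const char* k : kKeywords)
        if (s == k) return true;
    return false;
}

// Reads one token from standard input; no other function should touch std::cin.
// Comments and string literals are not handled in this sketch.
Token getnexttoken() {
    int c = std::cin.get();
    while (c != EOF && std::isspace(c)) c = std::cin.get();  // skip whitespace
    if (c == EOF) return {TOK_EOF, ""};

    std::string text(1, static_cast<char>(c));
    if (std::isalpha(c) || c == '_') {  // identifier or keyword
        while (std::isalnum(std::cin.peek()) || std::cin.peek() == '_')
            text += static_cast<char>(std::cin.get());
        return {isKeyword(text) ? TOK_KEYWORD : TOK_IDENT, text};
    }
    if (std::isdigit(c)) {  // integer literal
        while (std::isdigit(std::cin.peek()))
            text += static_cast<char>(std::cin.get());
        return {TOK_INT, text};
    }

    // Try two-character operators first so "<=" becomes one token, not "<" then "=".
    static const char* const kTwoCharOps[] = {"<=", ">=", "==", "!=", "&&", "||", "++", "--"};
    int next = std::cin.peek();
    if (next != EOF) {
        std::string two = text + static_cast<char>(next);
        for (const char* op : kTwoCharOps)
            if (two == op) {
                std::cin.get();  // consume the second character
                return {TOK_OP, two};
            }
    }
    if (std::string("+-*/%=<>!").find(text[0]) != std::string::npos) return {TOK_OP, text};
    if (std::string("(){}[];,.").find(text[0]) != std::string::npos) return {TOK_PUNCT, text};

    // Anything else is a lexical error, reported on standard error.
    std::cerr << "lexical error: unexpected character '" << text << "'\n";
    return {TOK_ERROR, text};
}
```

Each call consumes exactly one token, and peek() supplies the single character of lookahead needed to tell < from <= without reading past the token.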

For now, you will need to write a main program that calls getnexttoken in a loop and prints a human-readable display of the resulting sequence of tokens to standard output. Send error messages, if any, to standard error. Later, the parser will make the calls to getnexttoken, but that's a different assignment.
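The driver loop can be sketched in the same style. To keep the block compilable on its own, it includes a placeholder getnexttoken that only splits input on whitespace; that stand-in, the two-value TokenType, and the names typeName and printTokens are assumptions for illustration. The driver itself never reads standard input, which keeps that job confined to getnexttoken.

```cpp
#include <iostream>
#include <string>

// Stand-in token types; a real analyzer would have many more.
enum TokenType { TOK_WORD, TOK_EOF };
struct Token {
    TokenType type;
    std::string lexeme;
};

// Hypothetical placeholder so this driver compiles by itself: it just returns
// whitespace-separated words. Replace it with your real getnexttoken.
Token getnexttoken() {
    std::string word;
    if (std::cin >> word) return {TOK_WORD, word};
    return {TOK_EOF, ""};
}

// Human-readable name for each token type.
const char* typeName(TokenType t) {
    switch (t) {
        case TOK_WORD: return "WORD";
        case TOK_EOF: return "EOF";
    }
    return "UNKNOWN";
}

// Calls getnexttoken until the end-of-input marker, printing one line per token.
// Returns the number of tokens printed (not counting the final EOF line).
int printTokens(std::ostream& out) {
    int count = 0;
    for (Token t = getnexttoken(); t.type != TOK_EOF; t = getnexttoken()) {
        out << typeName(t.type) << "\t" << t.lexeme << "\n";
        ++count;
    }
    out << "EOF\n";
    return count;
}
```

With the real analyzer in place, main can be as small as a single call to printTokens(std::cout), and the same loop shape carries over when the parser takes over the calls.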

Advice

Plan your development incrementally. Get the basic infrastructure set up early, and then add one little feature at a time. Don't go more than an hour without making sure you can compile and run your partially completed program. An analyzer that handles numbers and keywords correctly and nothing else is better than one that tries to handle everything but fails.

Start early, have fun, and keep in touch.