Your job for this assignment is to write a lexical analyzer for a subset of Java, using C or C++.
I want you to choose a subset of Java that allows you to lexically analyze the average CS117 program. For example, it is not necessary for you to include relatively advanced constructs like implements, finally, or the ?: operator, but you should definitely recognize for, public, and the <= operator.
A bit more specifically:
Operators. You should include both < and <=, since distinguishing these common operators requires lookahead. You would be hard-pressed to write a moderately interesting program without + - * / % && || . = == < <= > >= != ++ --, so include all of those.
Keywords. Again, use the "117 test" to decide on your list of keywords to recognize: class, int, double, public, static, void, for, if, else, while, etc.
Delimiters. Here, you'll want at least { } [ ] ( ) ;.
String (double quotes) and character (single quotes) constants. No compromise here. You'll need to recognize these.
Number constants, both integer and real. Don't worry about leading + and -, since those will be caught as operators. Also, don't worry about scientific notation (e.g. 2.3E-2).
The heart of your analyzer should be a function with interface:
Token getnexttoken(),
where Token is a struct with a type field and some kind of data field, as discussed in class. Your program will call getnexttoken repeatedly, until the type of the returned token is some sort of end-of-input marker. When called, getnexttoken will read from standard input, and you should make sure that getnexttoken is the only portion of the program that is allowed to read from standard input.
For now, you will need to write a main program that loops on calls to getnexttoken and then prints a human-readable display of the sequence of tokens to standard output. Send error messages, if any, to standard error. Later, the calls to getnexttoken will be made by the parser, but that's a different assignment.
Plan your development incrementally. Get the basic infrastructure set up early, and then add one little feature at a time. Don't go more than an hour without making sure you can compile and run your partially completed program. An analyzer that handles numbers and keywords correctly and nothing else is better than one that tries to handle everything but fails.
Start early, have fun, and keep in touch.