COS 371: Programming Languages

Spring 2020

proj8: Interpreter

Due: Thursday, 05/21 at 11:30

1. Goals

To "finish" the interpreter by adding a myriad of missing features.

2. Introduction

This last stage consists of a list of features to add to your interpreter. Most portions within it are independent, which gives you more flexibility at this busy point in the term to possibly divide some of the effort among your team members. Despite, the brevity of description in the section titled "required additional features," don't underestimate the amount of work: there are more required individual pieces to do than any of the other checkpoints, not to mention optional extensions.

Under college policy, I cannot grant any extensions past the end of the final exam time.

3. Tasks

  1. Extend your collection with at least 4 pairs of test files to test the functionality specified in this checkpoint. (More is better!) As before, each pair of files should be named test.eval.input.XX and test.eval.output.XX, where XX is a test number.
  2. Add support for some more special forms and built-in functions as described below.
  3. Pursue some possible extensions. Ideas have been listed at the end of the past few checkpoints. Many more ideas are listed below. Choose some (or come up with some of your own!) and try them.
  4. Modify your Makefile to compile your new interpreter code, if necessary. When I'm in your directory and I execute the command make interpreter, I should get an executable file called interpreter in that directory.
  5. In a file called readme.txt, describe your interpreter, including what (optional) features you implemented, what (if anything) doesn't work, additional thoughts, jokes, etc.

4. Required additional features

Support the correct application of various Scheme expressions.
  1. Primitive procedures: *, -, /, <=, eq?, pair?, and apply.

    The primitive * must handle any number of arguments. The predicate pair? takes one argument. For others, you may choose to support only two arguments (i.e., print "evaluation error" if not). Optionally, implement their correct behaviour. In DrRacket, (/ 5 2) returns a fraction; you should just return the float 2.5. Optionally, implement fractions. Do not confuse this primitive apply and the C function apply.

  2. Library procedures: =, modulo, zero?, equal?, list, and append.

    You may write these in C. However, since these may be derived from other primitives, it may be faster to implement these in Scheme, see below for details.

  3. Special forms: lambda, let*, letrec, and, or, cond, set!, and begin.

    See R5RS or Dybvig for descriptions of these behaviours. Below are notes on some of these.

    • (lambda args body) where args is a single symbol.

      This version of lambda allows the definition of variadic functions.

    • let*, letrec.

      Like let, you may choose to support only a single body expression.

    • (cond list-of-clauses).

      Recall that list-of-clauses is a list of clauses, each of the form (test expr). The last test may be replaced by an else. If all tests evaluate to false and there is no else clause, the return value is unspecified. I recommend VOID_TYPE in this case.

      You do not need to handle the => alternate form for a clause, or clauses with zero or multiple expressions, even though these are all valid Scheme:

      (cond ((cons 1 2) => car))
      (cond (#t 23 42 251))
      (cond (23))
      
    • set!.

      Like define, except it changes an existing binding instead of creating a new one.

5. Highly encouraged possible extensions

Here are some "most bang for the buck" extensions that drastically improve the usability of your interpreter, are actually feasible, and require some amount of ingenuity (not just repetitive routine coding).

Please also review the optional ideas suggested throughout the project. Remember to explain what you have done in a readme.txt file.

  1. Garbage collection.

    Recall that a Scheme program is a list of S-expressions. A simple version of mark-and-sweep that cleans up between evaluating these top-level S-expressions is not very difficult to implement, and can drastically improve your memory usage. (Proper garbage collection when there are still active function calls is much, much harder.)

  2. A simple interface.

    The classic core of an interpreter is the read–eval–print loop, a.k.a. REPL. Adding this functionality to your code allows for interactive usage. For example:

    $ ./interpreter
    > (define factorial
    .   (lambda (n)
    .     (if (= n 0)
    .         1
    .         (* n (factorial (- n 1))))))
    > (factorial 5)
    120
    >
    

    One issue you will run into with this is you get unnecessary printouts of prompts when you are using a file for input instead of typing the code in interactively. You can figure out whether the interpreter is being called interactively using the isatty and fileno functions (man isatty). You should only print the prompts if stdin is interactive.

    You can end an interactive session by typing Ctrl-D (sending an EOF "end-of-file" character), so there is no need for a "quit" command.

  3. The ' shorthand for quote.

  4. The function load.

    The expression (load "tofu.scm") reads in the file and excutes the Scheme code within as if it were typed directly as part of the input.

  5. More built-in functions to manipulate lists.

    Right now, your interpreter is hard to use in part because of a lack of built-in functions. Fix this. But fix this in Scheme, not in C.

    In a file called lists.scm, implement the following functions (refer to R5RS, Dybvig, or Racket reference for specification) using only special forms and primitives that you've implemented (e.g., car, cdr, cons, null?, pair?, and apply).

    It is more fun to do these on your own, but you may refer to the code listed in Dybvig if you are stuck.

    You are not required to handle circular lists (which can be created with mutations).

    caar        ;; (caar '((1 2) (3 4) (5 6) (7 8)))      ==> 1
    cadr        ;; (cadr '((1 2) (3 4) (5 6) (7 8)))      ==> (3 4)
    ...                                           
    cddddr      ;; (cddddr '((1 2) (3 4) (5 6) (7 8)))    ==> ()
    
    list        ;; (list 1 2 3)                           ==> (1 2 3)               
    list?       ;; (cons 1 2)                             ==> #f
    length      ;; (length '(1 2 3))                      ==> 3
    list-ref    ;; (list-ref '(0 1 2 3 4) 3)              ==> 3 
    list-tail   ;; (list-tail '(0 1 2 3 4) 3)             ==> (3 4)
    member      ;; (member 'a '(1 2 3 a b c))             ==> (a b c)
    assq        ;; (assq 2 '((0 a) (1 b) (2 c) (3 d)))    ==> (2 c)
    append      ;; (append '(a b) '(c d))                 ==> (a b c d)
    reverse     ;; (reverse '(1 2 3))                     ==> (3 2 1)
    
    map         ;; (map (lambda (x) (* x x)) '(1 2 3))    ==> (1 4 9)
    filter      ;; (filter odd? '(1 2 3))                 ==> (1 3)
    foldl       ;; (foldl cons '() '(1 2 3))              ==> (3 2 1)
    foldr       ;; (foldr cons '() '(1 2 3))              ==> (1 2 3)
    
  6. More built-in functions with regards to arithmetic.

    Similarly, in a file called math.scm, implement the following functions (refer to R5RS or Dybvig for specification) using only special forms and primitives that you've implemented (e.g., +, -, *, /, and <=).

    Some of these are variadic; while it is more fun to handle variable number of arguments, you may handle just the case with two arguments.

    =        zero?             max         modulo
    >=       positive?         min         floor
    <        negative?         abs         ceiling
    >        even?             gcd         truncate
             odd?              lcm         round
    

6. More possible extensions

Here are some more ideas for possible extensions. These didn't make the cut of being "highly encouraged" perhaps because they are a bit too easy, too hard, not as useful, not as interesting, or too out there.
  1. Auto-load Scheme libraries given as command-line arguments.

    We can then launch the interpreter like

    ./interpreter lists.scm math.scm
    
    so it is easier to use those libraries. Command-line parameters in C work in a similar way as in Java.
    int main(int argc, char *argv[]) {
       ...
    }
    

    Then argc and argv will be the number of command-line arguments and their values, respectively.

  2. Use of the readline library to provide things like history and editing for your REPL.

  3. Proper tail recursion.

    Right now, when you evaluate a tail call, your eval calls apply, which in turn calls eval. Thus, you are using a lot of unnecessary space on the C call stack. Eliminating this is not too hard.

    Unfortunately, even after doing so, you will still be using unnecessary space on the heap. This is where proper garbage collection comes in. To get proper tail recursion and proper garbage collection to work in concert is much, much more difficult.

  4. Implement set-car! and set-cdr!, and fix your list functions to handle circular lists. The hardest part is printing a chain of cons cells with circular references.

  5. Implement proper behaviour for some of the special forms we simplified. Besides the ones we've discussed, these two are particularly useful: named let and variadic lambda with dotted tail.

  6. Implement macros to support user-defined syntax transformations via define-syntax.

  7. Implement delayed evaluation via special forms delay and force.

  8. Compile your interpreter using WebAssembly to put your project on the web!

7. Grading criteria

Correctness is worth 100 points, the remaining 6 items are worth 4 points each; the total is 124 points. (I reserve the right to make minor adjustments if needed.)
  1. correct (and crash-free) evaluation on tests cases: yours, mine, others'
  2. warning-free compilation
  3. correct memory usage: do you have memory leaks? is your code memory efficient?
  4. code and documentation quality: is your code readable? well organized? does your documentation, including your readme.txt, correctly describe the status of your program?
  5. quality of your test cases
  6. performance on stress tests: I will try to crash your program using big numbers with tests such as
    (define range
      (lambda (n)
        (if (zero? n)
            (quote ())
            (append (range (- n 1)) (list n)))))
    
    or Knuth's test
    (define a
      (lambda (k x1 x2 x3 x4 x5)
        (letrec ((b (lambda ()
                      (begin
                        (set! k (- k 1))
                        (a k b x1 x2 x3 x4)))))
          (if (<= k 0)
              (+ (x4) (x5))
              (b)))))
    
    (a 10 (lambda () 1) (lambda () -1) (lambda () -1) (lambda () 1) (lambda () 0))
    
    Please make sure that you can correctly evaluate these two specific stress tests (on small values); otherwise you risk losing all points in this section.
  7. set of optional features implemented

8. Submission

Your code should have no memory leaks or memory errors when run on any input (correct or incorrect) using Valgrind. We will be checking this during grading, so you should test your code via Valgrind yourself.

Follow the submission instructions from the previous checkpoint; tag your commit with proj8 and push the tag to GitHub.

Peer evaluations

I will ask you to do a thorough peer evaluation of you and your teammates. Please fill it out on Moodle shortly after submitting your code. Your project is not considered complete until you hand in the evaluation.

9. Epilogue

There is one more requirement. You must go enjoy your summer after submitting your work. This has been a difficult term, but I very much enjoyed teaching you. If you continue working on your interpreter for fun, feel free to drop by and show me some cool new features.

This assignment was originally created by David Liben-Nowell and has since been updated by Dave Musicant, Laura Effinger-Dean, and Jed Yang.

Start early, have fun, and discuss questions on Moodle.