CS 217: Programming Languages

Programming Assignment 1: SML

Assigned on Monday, January 6.
Due at 5 PM on Monday, January 20.

Use hsp to turn in an entire directory called sml1 (named so for the first SML assignment), that contains the files problem1.sml, problem2.sml, problem3.sml, and problem4.sml. If you need to resubmit your homework, resubmit the entire directory with an additional underscore and a version number. For example, if I need to resubmit my sml1 directory, I would name it sml1_2 (for version 2).

You should not use any functions from SML's basis or utility libraries, as this might defeat the purpose of some of these exercises.

For all parts of this assignment, you may not use side-effects (e.g. !, ref, etc.).

Note that these problems vary significantly in amount of difficulty. You should start early and look at each problem.

Problem 1

For this exercise, you should submit one file called problem1.sml that contains all the functions required below. You may define any helper functions that you wish.

Write SML functions that implement integer binary search trees. A particular node in a binary search tree can be represented with the following dataype:

datatype treeNode = 
  null |	
  node of int * treeNode * treeNode;

A non-empty binary search tree can therefore be thought of as a tuple of three entries. The first entry is a node's value (an integer), the middle is the node's left subtree, and the last is a node's right subtree. Non-empty left and right subtrees are themselves lists of three elements. An empty tree is represented by null. For example, the treeNode

node(5,null,null)

represents a tree whose only node is the leaf 5. The treeNode

node(5, node(3, null, node(4, null, null)), null)

represents the tree whose root node is 5 and has only one child, a left child, 3: this left child has only one child, a right child, 4. In a binary search tree it is always the case that values in the left subtree are less than the root's value, and values in the right subtree are greater than the root's value. Duplicate entries are not allowed.

Write the following functions:

entry(bst)Returns the node value of bst.Undefined if bst is null.

left(bst) right(bst)
These functions return the left subtree, and right subtree, respectively, of bst. They return null if bst is null.

makeBst(elt, left, right)
This function creates a new tree whose root node is elt, left subtree is left, and right subtree is right.

preorder(bst)
Print each node in bst in preorder (the root of bst, then the left subtree in preorder, then the right subtree in preorder).

insert(v, bst)
Returns a new binary search tree identical to bst with v inserted in the appropriate place. Smaller values are inserted to the left of a node, larger values to the right of a node. Discard values found to be duplicates.

Problem 2

For this exercise, you should submit one file called problem2.sml that contains the functions setEquals, union, and intersect. You may define any helper functions that you wish.

In SML sets can be represented as lists. However, unlike lists, the order of values in a set is not significant. Thus both [1, 2, 3] and [3, 2, 1] represent the same set.

(a) Write an SML function setEquals(s1,s2) that tests whether s1 and s2 (represented as lists) are equal. Two sets are equal if they contain exactly the same members, ignoring ordering. In this part you may assume that sets contain only atomic values (numbers, strings, etc., and not lists.) For example

setEquals([1, 2, 3],  [3, 2, 1]) ---> true
setEquals([1, 2], [3, 2, 1]) ---> false
setEquals(["clyde","blinky","sue"], ["sue","clyde","blinky"]) ---> true

You may assume that no sets contain duplicate members, and that s1 and s2 are of the same type.

(b) Two common operations on sets are union and intersection. The union of two sets is the set of all elements that appear in either set (with no repetitions). The intersection of two sets is the set of elements that appear in both sets.

Write SML functions union(s1,s2) and intersect(s1,s2) that implement set union and set intersection. You may again assume that set elements are always atomic, and that there are no duplicates. You may also assume that the s1 and s2 are of the same type.

union([1, 2, 3], [3, 2, 1]) ---> [1, 2, 3]
union([1, 2, 3], [3, 4, 5]) ---> [1, 2, 3, 4, 5]
union(["a", "b", "c"], ["d", "e", "f"]) ---> ["a", "b", "c", "d", "e", "f"]
intersect([1, 2, 3], [3, 2, 1]) ---> [1, 2, 3]
intersect([1, 2, 3], [4, 5, 6]) ---> []
intersect([1, 2, 3], [2, 3, 4, 5, 6]) ---> [2, 3]

The ordering of the elements in your answer may differ from the above.

Problem 3

For this exercise, you should submit one file called problem3.sml that contains all the functions required below You may define any helper functions that you wish.

(a) Write a pair of SML functions, genList(start, stop) and pairSum(numList, num). The function genList will generate a list of consecutive integers, from start to stop. (If start > stop then the empty list is generated.) For example:

genList(1, 5) ---> [1, 2, 3, 4, 5]

The function pairSum tests whether any two adjacent values in numList sum to num. For example,

pairSum([1, 2, 3], 3) ---> true

since 1+2=3. Similarly,

pairSum(genList(1, 100), 1000) ---> false

since no two adjacent integers in the range 1 to 100 can sum to 1000.

(b) An alternative to completely building a complex data structure is lazy evaluation. That is, part of the structure is built along with a suspension - a function that will supply a few more values upon request. The following SML datatype and function produces a sequence of consecutive integers in a lazy manner.

datatype lazyListEntry =                                                                                                            
  entry of int * (unit->lazyListEntry) |                                                                                            
  null;                                                                                                                             
                                                                                                                                    
fun intSeq(start:int, stop:int) : lazyListEntry =                                                                                   
  if start > stop then                                                                                                              
     null                                                                                                                           
  else                                                                                                                              
     entry(start, fn() => intSeq(start+1,stop));

When called, intSeq defines a pair of values, referred to by the tuple contained as a parameter in entry. The first element in the tuple is the first integer in the sequence. The second element in the tuple is a function that, when called, will return another entry. That entry consists of the next integer in the sequence plus another suspension function. When the end of the sequence is reached, the lazyListEntry contains null, indicating that no more values can be produced.

Rewrite your solution to pairSum as a function called pairSumLazy to take a lazy integer sequence (as defined by intSeq) rather than a list as a parameter.

Problem 4

For this exercise, you should submit one file called problem4.sml that contains all the functions required below You may define any helper functions that you wish.

(a) Assume we have a list L of integers. Define a function listify(L) that divides L into one or more sublists so that each sublist contains integers in nondecreasing (sorted) order. That is, if v1 and v2 are adjacent in L and v1 <= v2 then v1 and v2 are adjacent in the same sublist. However if v1 > v2 then v2 ends one sublist and v2 begins the next sublist. For example,

listify([3,5,1,8,9,2,1,0]) returns [[3,5],[1,8,9],[2],[1],[0]]
listify([1,2,3,4,5,6]) returns [[1,2,3,4,5,6]]
listify([5,4,3,2,1]) returns [[5],[4],[3],[2],[1]]

(b) If the output of listify contains a single sublist, then we know the input list L was in sorted order. Otherwise, we can implement a sorting algorithm (a merge sort) as follows. Take the first two sublists produced by listify and merge them into a single sorted list. Take the next two sublists and merge them. Repeat until all sublists are considered. (The last sublist is copied unchanged if it has no partner to merge with). If only one sublist remains, we are done. Otherwise, repeat the process, merging sublists pairwise. At each stage, the number of sublists is cut in half. For example,

[[3,5],[1,8,9],[2],[1],[0]] is reduced to [[1,3,5,8,9],[1,2],[0]]
and then to [[1,1,2,3,5,8,9],[0]] and finally to [[0,1,1,2,3,5,8,9]].

Implement a function mergesort(L) using listify as a subroutine. Here is an example:

mergesort([3,5,1,8,9,2,1,0]) returns [0,1,1,2,3,5,8,9]

(c) Once a list has been sorted using mergesort, testing for duplicates is easy. Implement duplicates(L), which sorts the elements in L using mergesort and returns true if duplicates are present and false if not.

(d) This approach for finding duplicates is somewhat inefficient. If a duplicate appears in L, we would want the function duplicates to return false as soon as possible, even within listify or mergesort if possible. Create a function duplicates2 (and whatever helper functions you may need to write/rewrite) where duplicate checking is done wherever possible as soon as possible. As soon as a duplicate is detected, true should be immediately returned without any further processing of L.

Note that you should not check for duplicates after the sort is done. Duplicate checking should be integral to your implementation of listify and the merge component of your merge sort. You should use SMLofNJ.Cont.callcc and SMLofNJ.Cont.throw to implement the exception mechanism you will need to return true, if you find a duplicate value.

I am indebted to Charles Fischer at University of Wisconsin-Madison for creating these wonderful problems.