CS 252: Algorithms
Exam 3
Due 5:00 PM Monday, 16 March 2009

This is an open-book, open-Internet exam. Do not consult with people (in person, electronically, or otherwise) other than Jeff Ondich about this exam. Justify your answers, and cite your sources.

  1. (8 points) Recurrence proofs, one more time.

    1. Write a careful induction proof that T(n) = O(n - 1) given that:
      T(n) ≤ T(floor(n/2)) + T(ceiling(n/2)) + k
      T(2) ≤ k
      Note that it turns out to be much easier to prove T(n) = O(n - 1) than to prove T(n) = O(n). Also note that you will almost certainly want to separate your proof into two cases: one where n is odd and one where n is even. (A quick numeric sanity check of the recurrence appears after part 2 below.)
    2. Briefly identify an algorithm whose running time can be modeled by the recurrence relation shown above.
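
    As a sanity check only (not a substitute for the induction proof asked for in part 1), the short sketch below tabulates the largest value of T consistent with the recurrence, taking the inequalities with equality and additionally assuming T(1) = 0, which the problem does not state. It then compares those values against k(n - 1).

      from functools import lru_cache

      K = 5  # an arbitrary positive constant standing in for k

      @lru_cache(maxsize=None)
      def worst_T(n):
          # Largest T(n) consistent with the recurrence, taking the
          # inequalities with equality and ASSUMING T(1) = 0 (an extra
          # assumption made only for this numeric check).
          if n <= 1:
              return 0
          if n == 2:
              return K
          return worst_T(n // 2) + worst_T((n + 1) // 2) + K

      for n in range(2, 25):
          assert worst_T(n) <= K * (n - 1), n   # matches the claimed bound
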
  2. (12 points) Intractability.

    Consider the Using Up All the Refrigerator Magnets problem (UUARM) described in exercise 8 on page 508 of your textbook.

    1. Describe in pseudo-code a naive algorithm for solving UUARM, and determine an upper bound for your algorithm's running time (the tighter the better).
    2. Show that UUARM is in NP.
    3. Among the many problems shown to be NP-Complete in Chapter 8, select one that is suitable for reducing to UUARM. Use your selected problem to demonstrate that UUARM is NP-Complete.
  3. (9 points) Network flows, part 1.

    Consider the network with vertex set V = {s, t, a, b, c, d, e} and edge set E = {(s,a,2), (s,c,6), (s,d,3), (a,b,3), (c,a,2), (c,b,2), (c,e,1), (c,d,4), (b,t,7), (d,e,2), (e,t,6)} where an edge (x, y, w) goes from vertex x to vertex y with weight w.
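
    If it helps you keep track of capacities and residual capacities while working by hand, here is one possible way to hold the edge list in code. It is purely illustrative; the variable names are mine, not part of the problem.

      # Capacities of the directed edges (x, y, w): capacity[(x, y)] = w.
      capacity = {
          ('s', 'a'): 2, ('s', 'c'): 6, ('s', 'd'): 3,
          ('a', 'b'): 3,
          ('c', 'a'): 2, ('c', 'b'): 2, ('c', 'e'): 1, ('c', 'd'): 4,
          ('b', 't'): 7,
          ('d', 'e'): 2,
          ('e', 't'): 6,
      }

      # All flow values start at zero.  In the residual graph, edge (x, y)
      # has remaining capacity capacity[(x, y)] - flow[(x, y)], and the
      # reverse edge (y, x) has capacity flow[(x, y)].
      flow = {edge: 0 for edge in capacity}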

    1. Draw the network.
    2. Run the Ford-Fulkerson Algorithm (p. 344 of your textbook) on this network by hand. After initialization and after each iteration of the while loop, show the flow values on each edge of the network, and the residual graph. At the end, report the value of the flow you have computed.
    3. How many distinct cuts does this network have?
    4. Identify a minimum cut, and also report its capacity. How do you know this cut is a minimum cut?
  4. (9 points) Network flows, part 2. (Based on a problem by David Liben-Nowell.)

    Due to slightly over-optimistic sales projections, I find myself with a basement full of boxes of Basque-Aymara/Aymara-Basque translation dictionary software for Apple II computers. Unable to bring myself to simply discard these lovingly crafted floppy disks, I am thrilled to discover that tomorrow, there is going to be an Andes/Pyrenees computer history conference in Ecuador. If I can just get these boxes to Quito by then, my software will support international understanding and good will.

    I have a schedule F of n flights available today. Each flight F_k in this schedule is described by five values F_k = (s_k, t_k, d_k, a_k, c_k):

    s_k = starting city
    t_k = ending city
    d_k = departure time
    a_k = arrival time
    c_k = capacity (number of boxes of my software it can carry)

    Please describe for me an efficient algorithm for figuring out how to get as many boxes as possible from Minneapolis to Quito by tomorrow. You may assume that all the flights run on time, and that it takes no more than one hour to move the boxes from an arriving plane onto a departing plane. Make your algorithm as efficient as possible, and give me an upper bound on its running time. Make your upper bound as tight as possible, and explain where the upper bound came from.
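
    Purely as a starting point for your write-up, and not as part of the required answer, one way to hold the schedule in code is sketched below. The field names mirror the five values defined above; everything else (the record type, the sample flights) is mine.

      from typing import NamedTuple

      class Flight(NamedTuple):
          # One flight F_k = (s_k, t_k, d_k, a_k, c_k) from the schedule.
          start_city: str       # s_k
          end_city: str         # t_k
          departure_time: int   # d_k (say, minutes after midnight)
          arrival_time: int     # a_k
          capacity: int         # c_k: boxes the flight can carry

      # A tiny made-up schedule in this format (not data from the problem):
      schedule = [
          Flight('Minneapolis', 'Miami', 8 * 60, 11 * 60, 40),
          Flight('Miami', 'Quito', 12 * 60 + 30, 17 * 60, 25),
      ]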

  5. (10 points) Word segmentation.

    A few years ago, my dictionary software company was contacted by a company whose business was domain-name speculation. That is, they bought up domain names they considered likely to be valuable, and tried to sell them for higher prices. One of their techniques was to keep an eye on domain names whose owners were allowing (intentionally or neglectfully) their domain registrations to expire. This company contacted us because they wanted to know which of the 43 million soon-to-be-available domain names actually meant something in any of six European languages.

    My job was to take our dictionary data plus the list of 43 million domains, and try to segment each domain into a sequence of words. For example, if the domain was "miseenscene.com", my software should report (French, "mise en scène") and (English, "mi seen scene"). Obviously, the French interpretation is preferable, since "mise en scène" actually means something, while "mi" is an odd-ball little English word of greatest interest to Scrabble players. But assuming we're just looking for sequences of words that actually appear in a dictionary, both of these interpretations are acceptable.

    An additional constraint I applied to this problem was to seek the "best" segmentation when several segmentations were possible. For example (see problem 5 on page 316 of your textbook), consider the string "theyouthevent". Though ["the", "youth", "event"], ["the", "you", "the", "vent"], and ["they", "out", "he", "vent"] are all legal segmentations into English words, the first one is clearly the best on syntactic and semantic grounds. As you work on this problem, you may assume that you have access to a linear-time function goodness(list of strings) that returns an integer measuring the quality of the list of strings. We can assume, for example, that goodness(["the", "youth", "event"]) is a larger number than goodness(["they", "out", "he", "vent"]).
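
    To pin down the interface your pseudo-code in part 1 may assume, here is a minimal sketch of the pieces the problem gives you: a word list and the linear-time goodness function. The names and the placeholder scoring rule are mine, not part of the assignment, and segment is deliberately left unimplemented since writing it is the point of part 1.

      # The word list you may assume; storing it as a set makes membership
      # tests O(1) on average.
      english_words = {"the", "they", "you", "youth", "out", "he",
                       "event", "vent", "big", "moose"}

      def goodness(words):
          # Stand-in for the linear-time scoring function the problem gives
          # you: returns an integer, larger meaning a better segmentation.
          # This scoring rule is a placeholder, not the real one.
          return -len(words)

      def segment(s):
          # The function part 1 asks you to write: return the best
          # segmentation of s as a list of words, or [] if none exists.
          raise NotImplementedError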

    1. Write a pseudo-code recursive algorithm to solve this segmentation problem, assuming you have access to a list of English words. Your algorithm should involve a function that takes a string as input, and reports the best segmentation as output (assuming, as discussed above, that you already have a well-designed goodness function). For example, your function would take "thebigmoose" and return ["the", "big", "moose"]. If the segmentation fails (i.e. the input string has no segmentations into words in your word list), the function should return an empty list.
    2. Discuss the worst-case running time of your recursive algorithm, and provide an upper bound.
    3. Propose a dynamic programming modification to your algorithm. What sorts of data structures will your modification require? Will your modifications improve the performance of the algorithm? If so, explain why, and discuss the extent of the improvement (e.g. if it were to turn an exponential algorithm into a polynomial algorithm, or an n^5 algorithm into an n^3 algorithm, you would want to say so). If not, why not?