Speedup Analysis


This is an individual assignment. You may talk about ideas with others in the class, and if this is a programming assignment you may help each other debug code, but the code you type or the analysis that you write should be your own. Any ideas that you get from online sources should be limited in scope, and you should credit them appropriately via program comments.

Complete the following problems. You should turn in either a carefully written solution showing your work, scanned and submitted, or a PDF generated with a tool such as \(\LaTeX\) or another word processor. Regardless, your work must be neat and clear. The graders will take off points for hard-to-read handwriting or poor organization.

This homework will be graded anonymously via Moodle's anonymous marking feature, so do not include your name anywhere in your submission. If you include credits for help from other students, put them in a separate file called credits.txt. We won't look at that until after we're done grading the rest.

1 Brent's Theorem interpretation

Brent's Theorem comes in a variety of forms. In particular, the version that we developed is the one that this Wikipedia article on parallel algorithms calls an "alternative statement of Brent's Law". It's the last inequality in the article, right above the references. (Some call it a law, some call it a theorem.) Interpret this version of Brent's Theorem in your own words. We proved it in class with some formal logic, but it can also be justified from some pretty basic principles. Explain what it says and why it must be true, from an intuitive perspective.
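For reference, the inequality in question (assuming the article still reads as it did when this assignment was written) is

\[ T_P \le \frac{T_1}{P} + T_\infty, \]

where \(T_1\) is the total work (the time on one processor), \(T_\infty\) is the span (the time with unboundedly many processors), and \(T_P\) is the time on \(P\) processors.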

2 Amdahl's Law is pain

Suppose that the sequential part of a program accounts for 40% of the program's execution time on a single processor. Find a limit for the overall speedup that can be achieved, even if you have an infinite number of processors, assuming the typical Amdahl's Law assumptions.
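As a reminder, the standard statement of Amdahl's Law, with \(S\) denoting the sequential fraction of the single-processor execution time, is

\[ \text{speedup}(P) = \frac{1}{S + \frac{1-S}{P}}. \]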

3 Fixed program with a hard choice

You have a big parallel program to run, which has a fixed amount of work to do, including a fixed sequential portion (i.e., the same scenario as for Amdahl's Law). You have a choice of using a ten-processor computer whose processors each execute one zillion instructions per second, or a single-processor computer that executes five zillion instructions per second. (Zillion is a fictional number. What matters here is that the single-processor computer runs 5 times as many instructions per second as each processor of the ten-processor computer.)

How big or small does the sequential portion of the program need to be to make using the slower parallel computer worthwhile? Your answer should give me a specific upper or lower bound on \(S\), the sequential fraction of the program's work.
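Once you have a candidate bound, you can sanity-check it numerically. The sketch below is one way to do that in Python; the normalization (the whole program is one zillion instructions, speeds measured in zillions of instructions per second) and the trial values of \(S\) are my own choices, and nothing in the problem depends on them.

    # Problem 3 sanity check: 10 processors at 1 zillion instructions/sec each
    # versus 1 processor at 5 zillion instructions/sec.
    # Normalization (my assumption): the program is 1 zillion instructions total.

    def parallel_time(S, procs=10, speed=1.0):
        """Time on the parallel machine; the sequential part uses one processor."""
        return S / speed + (1 - S) / (procs * speed)

    def serial_time(speed=5.0):
        """Time on the fast single-processor machine (S doesn't matter here)."""
        return 1.0 / speed

    for S in (0.05, 0.10, 0.15, 0.20, 0.30):
        winner = "parallel" if parallel_time(S) < serial_time() else "serial"
        print(f"S = {S:.2f}: parallel {parallel_time(S):.4f}, "
              f"serial {serial_time():.4f} -> {winner} wins")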

4 Schiller's Law creation

Both Amdahl's Law and Gustafson's Law are based on the assumption that the amount of sequential work is fixed regardless of the number of processors used. Where they differ is:

  • Amdahl's Law is designed for the scenario that you have a single fixed set of steps to run, regardless of the number of processors. So it assumes that the total quantity of work is constant as processors are added, including the parallel portion.
  • Gustafson's Law is designed for the scenario that you have a fixed sequential portion to run (such as reading a dataset from disk into shared memory, perhaps), but then there is an unbounded amount of parallel work that could be done if you only had the time. So it assumes that the total time to run remains constant as processors are added. This means that the amount of sequential work is constant, but the amount of parallel work increases as more processors are added. (Both speedup formulas are recalled just below.)
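In formulas (these are the standard statements): with \(S\) as in Amdahl's Law above, Amdahl gives speedup \(1/(S + (1-S)/P)\); with \(S\) reinterpreted as the sequential fraction of the fixed run time on the parallel machine, Gustafson's scaled speedup is

\[ \text{speedup}(P) = S + (1-S)P. \]

Your Schiller's Law should be derived in the same spirit.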

For this problem, I want you to create a new "Schiller's Law," which is based on the very common scenario that you have a massive dataset (assume it is effectively infinite), and you want to analyze as many fragments of that dataset as you have time to do; the time to analyze a fragment depends on the size of that fragment. For Schiller's Law, we'll assume, just as we do for Gustafson's Law, that the amount of time our program will run is constant regardless of the number of processors. This is simply the amount of time that we are willing to wait for an answer. However, we will not assume constant sequential work; rather, we will assume that the amount of sequential work done is a direct multiple of the number of processors. For example, if we are using one processor, we'll have \(W\) units of sequential work to do; whereas if we're using \(P\) processors, we'll have a total of \(WP\) units of sequential work to do. (The scenario in my head is that if you have \(P\) processors, you'll be able to analyze \(P\) times as much data, but you've first got to load up your data from disk or from a network connection, and I'm assuming that you just can't do any parallel work in loading your data.)

Schiller's Law should define speedup just as Gustafson's Law does: compare the time on \(P\) processors with the time it would take on 1 processor to do the same total work that \(P\) processors do.

Your answer should clearly state your new Schiller's Law, which should be an expression for speedup in terms of \(P\).
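Once you have derived a closed form, you may want to check it against a direct numerical model of the scenario. The sketch below is one such model; the normalization (one unit of work takes one unit of time on a single processor) and the particular values of \(W\) and \(T\) are my own assumptions for illustration.

    # Numerical model of the Schiller scenario: run time T is fixed, and the
    # sequential work grows linearly with the number of processors P.
    # Normalization (my assumption): 1 unit of work = 1 unit of time on 1 processor.

    def schiller_speedup(P, W=1.0, T=100.0):
        """Empirical speedup: time on 1 processor over time on P processors."""
        seq_work = W * P                  # sequential work scales with P
        assert seq_work <= T, "P is too large for the fixed time budget"
        par_time = T - seq_work           # time left for the parallel phase
        par_work = par_time * P           # work P processors finish in that time
        time_on_1 = seq_work + par_work   # one processor doing the same total work
        return time_on_1 / T              # time on P processors is T by assumption

    for P in (1, 2, 4, 8):
        print(f"P = {P}: speedup = {schiller_speedup(P):.3f}")

Compare these numbers against your formula evaluated at the same \(P\), \(W\), and \(T\).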