**CS 111 f19 - Homework 2 (Pair Programming)** *Due: Friday, October 4, at 9pm* You will need to download the following files to do this homework: [prisoner2.py](prisoner2.py), [prisoner3.py](prisoner3.py), and [history.py](history.py). Your solutions to Problems 1–4 should go in prisoner2.py and your solutions to Problems 5–7 should go in prisoner3.py. You will not need to modify history.py at all, but it must be in the same directory as prisoner2.py and prisoner3.py for them to work. Post questions to the [Homework 2 forum](https://moodle.carleton.edu/mod/forum/view.php?id=486426)! --- # Prisoner's Dilemma Let's begin with a story. >Two suspects are arrested by the police. The police have insufficient >evidence for a conviction, and, having separated the prisoners, visit each of >them to offer the same deal. If one testifies for the prosecution against the >other (defects) and the other remains silent (cooperates), the defector goes >free and the silent accomplice receives the full one-year sentence. If both >remain silent, both prisoners are sentenced to only one month in jail for a >minor charge. If each betrays the other, each receives a three-month >sentence. Each prisoner must choose to betray the other or to remain silent. >Each one is assured that the other would not know about the betrayal before >the end of the investigation. How should the prisoners act? This story is an example of a classic game theoretic problem called the *prisoner's dilemma*. A prisoner's dilemma always involves two *game players*, and each has a choice between *cooperating* and *defecting*. If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. Before going any further, we'll define some game theory notation. In game theory, we differentiate between a *game*, and a *play*. A *game* refers to the set of possible choices and outcomes for the entire range of situations. A *play* refers to a specific set of choices by the players, with the associated outcome for that particular scenario. Thus, in game theory, a *two-person binary-choice game* is represented by a two-by-two matrix. Here is a hypothetical game matrix (which, by the way, does not represent a prisoner's dilemma). | **B** cooperates | **B** defects ---------------- | :--------------: | :-----------: **A** cooperates | **A** gets 5
**B** gets 5 | **A** gets 2
**B** gets 3 **A** defects | **A** gets 3
**B** gets 2 | **A** gets 1
**B** gets 1 The two players in this case are called **A** and **B**, and the choices are called *cooperate* and *defect*. Players **A** and **B** can play a single game by separately (and secretly) choosing either to cooperate or to defect. Once each player has made a choice, they announce it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in the example above, if Player **A** chooses to cooperate while Player **B** defects, then **A** gets 2 points and **B** gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public knowledge; for instance, Player **A** knows before the game even starts that if they and **B** both choose to defect, they will each get 1 point. In an *iterated game*, the two players play repeatedly; thus after finishing one game, **A** and **B** may play another. (Admittedly, there is a little confusion in the terminology here; thus we refer to each iteration as a *play*, which constitutes a single *round* of the larger, iterated game.) There are a number of ways in which iterated games may be played; in the simplest situation, **A** and **B** play for some fixed number of rounds (say 200), and before each round, they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both **A** and **B** are able to study the results of the previous nine rounds. ## An Analysis of a Simple Game Matrix The game depicted by the matrix above is a particularly easy one to analyze. Let's examine the situation from Player **A**'s point of view (Player **B**'s point of view is identical): >"Suppose **B** cooperates. Then I do better by cooperating myself (I receive >five points instead of three). On the other hand, suppose **B** defects. I >still do better by cooperating (since I get two points instead of one). So no >matter what **B** does, I am better off cooperating." Player **B** will, of course, reason the same way, and both will choose to cooperate. In the terminology of game theory, both **A** and **B** have a dominant choice--i.e., a choice that gives a preferred outcome no matter what the other player chooses to do. As noted, the matrix shown above does not represent a prisoner's dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. We'll see an example of a prisoner's dilemma game very shortly. **To re-cap:** in any particular game using the above matrix, we would expect both players to cooperate; and in an iterated game, we would expect both players to cooperate repeatedly, on every round. Now consider the following game matrix: | **B** cooperates | **B** defects ---------------- | :--------------: | :-----------: **A** cooperates | **A** gets 2
**B** gets 2 | **A** gets -5
**B** gets 3 **A** defects | **A** gets 3
**B** gets -5 | **A** gets -1
**B** gets -1 In this case, Players **A** and **B** both have a dominant choice--namely, defection. No matter what Player **B** does, Player **A** will generally improve his own score (or at worst lose a little bit) by defecting, and vice versa. However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of losing one point each, they could win two points each. So the "rational" choice of mutual defection has a puzzling self-destructive flavor. The second matrix is an example of a prisoner's dilemma game situation. Just to formalize the situation, let CC be the number of points won by each player when they both cooperate; let DD be the number of points won when both defect; let CD be the number of points won by the cooperating party when the other defects; and let DC be the number of points won by the defecting party when the other cooperates. Then the prisoner's dilemma situation is characterized by the following conditions: \begin{equation*} DC > CC > DD > CD \end{equation*} \begin{equation*} CC > \frac{DC + CD}{2} \end{equation*} Note how for a given opponent's choice, the player under consideration is always better off defecting. Hence, we see defection is the dominant choice. In the second game matrix we have: \begin{equation*} DC = 5,\,CC = 2,\,DD = -1,\,CD = -3 \end{equation*} so both conditions are met. In the story at the top, you can verify that: \begin{equation*} DC = 0,\,CC = -1,\,DD = -3,\,CD = -12 \end{equation*} Again, these values satisfy the prisoner's dilemma conditions. ## The Two-Player Prisoner's Dilemma Program On this homework you will be exploring possible strategies for both two- and three-player prisoner's dilemma. This will involve both coding these strategies and then describing their behavior. Hence, **you will turn in both your code and a document with your descriptions**. Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner's dilemma game. Here are some examples: + **Betrayer**: a program using the **betrayer** strategy simply defects on every round of every game. + **Friend**: a program using the **friend** strategy cooperates on every round of every game. + **Chaos**: this program cooperates or defects randomly, with an equal chance of each. + **Judge**: this program cooperates on the first round. On all subsequent rounds, **judge** examines the history of the other player's actions, counting the total number of defections and cooperations by the other player. If the other player's defections outnumber their cooperations, **judge** will defect; otherwise this strategy will cooperate. + **Mimic**: this program cooperates on the first round, and then on every subsequent round it mimics the other player's previous move. Thus, if the other player cooperates (defects) in round $n$, then **mimic** will cooperate (defect) in round $n + 1$. All of these strategies are extremely simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game.) Nevertheless, simplicity is not necessarily a disadvantage. In a computer tournament held in the late 1970's by the political scientist Robert Axelrod, the first-prize program employed the **mimic** strategy, and achieved the highest average score in a field of far more complicated programs. A Python program for an iterated prisoner's dilemma game is provided in [prisoner2.py](prisoner2.py) as part of the code for this project. The function play_loop pits two players (or, to be more precise, two *strategies*) against one another for 100 games, then prints out the average score of each player. Player strategies are represented as functions. Each strategy takes two arguments--its own *history* (that is, a representation of all its previous *plays*) and its opponent's *history*. The strategy function returns either the string "c" for *cooperate* or the string "d" for *defect*. At the beginning of an iterated game, each history is empty. As the game progresses, the histories grow to represent the sequence of plays ("c"s and "d"s) from least recent to most recent. Note each strategy must have its own history as its first argument. So in play_loop, strat0 has history0 as its first argument, and strat1 has history1 as its first argument. Each history is represented by a History object. We'll see how to define our own objects towards the end of the term, but for now we'll focus on using them. The seven possible operations on a History object are described [here](history_doc/history.html). These operations are functions associated with a particular object, and are called *methods*. As an example, consider the [get_most_recent](history_doc/history.html#history.History.get_most_recent) method. The description says it *Returns the most recent action*. Remember that actions are either "c" for cooperate or "d" for defect. Thus, it enables you to check our opponent's most recent action, which will be useful for something like the **mimic** strategy. Note that it has one parameter, self. In Python, all methods have this self parameter. All you need to know now is that this parameter is automatically supplied and not something you provide when calling the method. So to get the most recent action of a History stored in the variable other_history, you would do something like ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python last_action = other_history.get_most_recent() ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The values from the game matrix are stored in variable named rubric using something called a *dictionary* (more about these later in the term). print_results uses it to calculate the scores at the end of the iterated game. ### Problem 1 The betrayer strategy is already implemented in prisoner2.py. Implement the other four strategies described above: friend, chaos, judge, and mimic. Use play_loop to play games among the five defined strategies. Notice how a strategy’s performance varies sharply depending on its opponent. For example, friend does quite well against mimic or against another friend, but it loses badly to betrayer. Pay special attention to mimic. Notice how it never beats its opponent--but it never loses badly. Create a 25-element table in which you show the average score for tournaments pitting all possible pairings of the five different strategies: betrayer, friend, mimic, chaos, judge. You can use a spreadsheet like Excel or Google Sheets to make the table or just do it in a plain text file. Describe the behavior you observe for the different strategies. ### Problem 2 Write a new strategy careful_mimic. The strategy should always cooperate unless the opponent defected on both of the previous two rounds. If fewer than two rounds have been played, it should cooperate. Play careful_mimic against betrayer, mimic, and judge. Describe the behavior you observe. ### Problem 3 Write a function make_alternating_strategy which takes as input two strategies (say, strat0 and strat1). make_alternating_strategy should return a strategy which alternates between strat0 and strat1. Use this function to make two new strategies (your choice as to what existing strategies these new ones alternate between). Test them against betrayer, mimic, and judge and describe the performance. ### Problem 4 Write a function gentle, which takes as input a strategy (say strat) and a number between 0 and 1 (call it gentleness_factor). The gentle function should return a strategy that plays the same as strat except: when strat defects, the new strategy should have a gentleness_factor chance of cooperating. (If gentleness_factor is 0, the returned strategy performs exactly the same as strat; if gentleness_factor is 0.5, the returned strategy cooperates half the time that strat defects; if gentleness_factor is 1, the returned strategy performs the same as friend.) Use gentle with a low value for gentleness_factor - say, 0.1 - to create two new strategies: slightly_gentle_betrayer and slightly_gentle_mimic. Test them against betrayer, mimic, and judge and describe the results. ## The Three-Player Prisoner’s Dilemma So far, all of our prisoner's dilemma examples have involved two players (and, indeed, most game-theory research on the prisoner's dilemma has focused on two-player games). But it is possible to create a prisoner's dilemma game that involves three--or even more--players. Strategies from the two-player game do not necessarily extend to a three-person game in a natural way. For example, what does mimic mean? Should the player defect if either of the opponents defected on the previous round? Or only if both opponents defected? And are either of these strategies nearly as effective in the three-player game as mimic is in the two-player game? Before we analyze the three-player game more closely, we must introduce some notation for representing the payoffs. We use a notation similar to that used for the two-player game. For example, we let DCC represent the payoff to a defecting player if both opponents cooperate. Note that the first position represents the player under consideration. The second and third positions represent the opponents. Another example: CCD represents the payoff to a cooperating player if one opponent cooperates and the other opponent defects. Since we assume a symmetric game matrix, CCD could be written as CDC. The choice is arbitrary. Now we are ready to discuss the payoffs for the three-player game. We impose three rules (actually, there is no universal definition for the multi-player prisoner's dilemma. The constraints used here represent one possible version of the three-player prisoner's dilemma): 1. Defection should be the dominant choice for each player. In other words, it should always be better for a player to defect, regardless of what the opponents do. This rule gives three constraints: $DCC > CCC$ $DDD > CDD$ $DCD > CCD$ 2. A player should always be better off if more of his opponents choose to cooperate. This rule gives: $DCC > DCD > DDD$ $CCC > CCD > CDD$ 3. If one player's choice is fixed, the other two players should be left in a two-player prisoner's dilemma. This rule gives the following constraints: $CCD > DDD$ $CCC > DCD$ $CCD > \frac{CDD + DCD}{2}$ $CCC > \frac{CCD + DCC}{2}$ We can satisfy all of these constraints with the following payoffs: $CDD=-5,\,DDD=-2,\,CCD=0,\,DCD=2,\,CCC=3,\,DCC=5$ We have provided Python code for a three-player iterated game in [prisoner3.py](prisoner3.py). It has a play_loop function that takes three strategies as input, keeps track of three histories, and prints out results for three players. ### Problem 6 Write strategies betrayer3, friend3, and chaos3 that will work in a three-player game. Try them out to make sure your code is working. Write two new strategies: tough_mimic and soft_mimic. tough_mimic should defect if either of the opponents defected on the previous round. soft_mimic should defect only if both opponents defected on the previous round. Play tough_mimic, soft_mimic, and chaos3 against each other. Describe the observed behavior of these strategies. ### Problem 7 Write a function make_combined_strategy which takes as input two two-player strategies and a *combining* function. make_combined_strategy should return a three-player strategy that plays one of the two-player strategies against one of the opponents, and the other two-player strategy against the other opponent, then calls the *combining* function on the two two-player results. The *combining* function should take two prisoner's dilemma choices and return a single choice. Here's an example: this call to make_combined_strategy returns a strategy equivalent to tough_mimic in Problem 6. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python from prisoner2 import mimic def tough_combine(choice0, choice1): if choice0 == "d" or choice1 == "d": return "d" return "c" new_strat = make_combined_strategy(mimic, mimic, tough_combine) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The resulting strategy plays mimic against each opponent, and then calls the combining function on the two results. If either of the two two-player strategies has returned "d", then the three-player strategy will also return "d". You can use this example and verify that new_strat has the same behavior as tough_mimic to check your solution. ### Problem 8 (OPTIONAL): The Three-Player Prisoner's Dilemma Tournament As described earlier, Axelrod held a computer tournament to investigate the two-player prisoner's dilemma. We are going to hold a three-player tournament. You get to design a strategy for the tournament. You might submit one of the strategies developed in the project, or develop a completely new one. The only restriction is that the strategy must work against any other legitimate entry. Any strategies that cause the tournament software to crash will be disqualified. You need to: - Include in your homework submission on Moodle a copy of your function in a Python file called tournament.py. Include a brief description of how the strategy works in comments at the top. - The form of the submitted strategy should be a function that takes three arguments: the player's own history list and history lists for each of the other two players. The function should return either a "c" or a "d" for cooperate or defect. - We reserve the right to disqualify any entries that violate the spirit of the prisoner's dilemma game (e.g., by changing an opponent's history). - We strongly suggest that you try out your function before submitting it. The tournament will be a complete one, that is, every strategy plays against every other pair. Each game will consist of 100 rounds. If there are at least three submissions, the results of the tournament will be posted, so proper bragging rights can be observed. # Advice While completing this homework doesn't require a large amount of code, it can be conceptually challenging. It's in your best interest to get together with your partner early, so you have time to think carefully and ask questions. As a guideline, you should try and finish problems 1-4 be Monday or Tuesday next week. You will need to download [history.py](history.py) into the same folder as prisoner2.py and prisoner3.py. When you need to get information about a player's history, refer to the [documentation for the History object](history_doc/history.html). All the operations you will need are described there. Remember that Python takes care of providing the self parameter for you (e.g., get_length is called with no arguments and get_recent_coop is called with one argument). To get a random decimal number between 0 and 1.0, use [random.random()](https://docs.python.org/3.7/library/random.html#random.random). Remember to put import random at the top of your file. Full documentation for the random module is [here](https://docs.python.org/3.7/library/random.html) Functions are data like anything else in Python. They can be assigned to variables and given as arguments to other functions. Problems 3, 4, and 7 ask you to write functions that take strategies (i.e., other functions) as input. If you have a function assigned to the variable strat0, you can call that function like you would any other: strat0(my_history, other_history). The functions you write for these problems also return strategies in the form of new functions. To do this, you can actually define a function within a function. For example, the make_pow_fn below takes a parameter n and returns a function that takes a number and returns that number to the nth power. Note how the function returned by make_pow_fn **remembers** the value of n it was create with. This is an powerful feature of Python and other programming languages that let's us create functions that create more functions. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python def make_pow_fn(n): def pow_fn(x): return x**n return pow_fn pow3 = make_pow_fn(3) pow4 = make_pow_fn(4) print(pow3(2), pow4(2)) # prints 8 16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When play_loop prints out the average scores of the competing strategies, it uses strat0.__name__ (that's strat0. followed by two underscores (_), then name, then two more underscores) and strat1.__name__ (and strat2.__name__ in the three-player version) to label these scores. These names will probably not be very informative for strategies created by make_alternating_strategy, gentle, and make_combined_strategy. To fix this, you can actually modify the .__name__ property. Continuing with the make_pow_fn example, we can see that the functions it creates all have the same .__name__ value: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python def make_pow_fn(n): def pow_fn(x): return x**n return pow_fn pow3 = make_pow_fn(3) pow4 = make_pow_fn(4) print(pow3.__name__) # prints pow_fn print(pow4.__name__) # also prints pow_fn ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ but we can give them different names like this: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python def make_pow_fn(n): def pow_fn(x): return x**n pow_fn.__name__ = "pow" + str(n) return pow_fn pow3 = make_pow_fn(3) pow4 = make_pow_fn(4) print(pow3.__name__) # prints pow3 print(pow4.__name__) # prints pow4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Note that to combine the string "pow" and the integer n into a single piece of text, we had to first convert n to a string using the str function. Then we could use the plus operator (+) to concatenate the two strings (i.e., smush them together into a single new string) and assign the result to pow_fn.__name__. # What to Turn In Submit the following files via the [Moodle Homework 2 page](https://moodle.carleton.edu/mod/assign/view.php?id=486427). Remember that each file should have comments with your name, the date, and a description at the top. If you are working as a pair, please only one person submit. Note: you do **not** need to submit history.py. - A file called prisoner2.py with your solutions to Problems 1 through 4. You should be adding on to the provided [prisoner2.py](prisoner2.py) file, so your submission should also contain unchanged all the code originally in the provided file. - A file called prisoner3.py with your solutions to Problems 5 through 7. You should be adding on to the provided [prisoner3.py](prisoner3.py) file, so your submission should also contain unchanged all the code originally in the provided file. - A report with your descriptions of the behavior of the different strategies. This can include a spreadsheet. - OPTIONALLY a file called tournament.py with a single function that is your entry in the three-player prisoner's dilemma tournament. There should also be a description of your strategy in comments. - A file called feedback.txt with the following information (you get points for answering these questions!): - How many hours you spent outside of class on this homework. - The difficulty of this homework for this point in a 100-level course as: too easy, easy, moderate, challenging, or too hard. - What did you learn on this homework (very briefly)? Rate the educational value relative to the time invested from 1 (low) to 5 (high).