The prisoner's dilemma is a canonical example of a game analyzed in game theory that shows why two purely "rational" individuals might not cooperate, even if it appears that it is in their best interests to do so. It was originally framed by Merrill Flood and Melvin Dresher working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence rewards and gave it the name "prisoner's dilemma" (Poundstone, 1992), presenting it as follows:

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of speaking to or exchanging messages with the other. The police admit they don't have enough evidence to convict the pair on the principal charge. They plan to sentence both to a year in prison on a lesser charge. Simultaneously, the police offer each prisoner a Faustian bargain. Each prisoner is given the opportunity either to betray the other, by testifying that the other committed the crime, or to cooperate with the other by remaining silent. Here's how it goes:

If A and B both betray the other, each of them serves 2 years in prison

If A betrays B but B remains silent, A will be set free and B will serve 3 years in prison (and vice versa)

If A and B both remain silent, both of them will only serve 1 year in prison (on the lesser charge)
It is implied that the prisoners will have no opportunity to reward or punish their partner other than the prison sentences they get, and that their decision will not affect their reputation in the future. Because betraying a partner offers a greater reward than cooperating with them, all purely rational self-interested prisoners would betray the other, and so the only possible outcome for two purely rational prisoners is for them to betray each other.^{[1]} The interesting part of this result is that pursuing individual reward logically leads both of the prisoners to betray, when they would get a better reward if they both cooperated. In reality, humans display a systematic bias towards cooperative behavior in this and similar games, much more so than predicted by simple models of "rational" self-interested action.^{[2]}^{[3]}^{[4]}^{[5]} A model based on a different kind of rationality, where people forecast how the game would be played if they formed coalitions and then maximize their forecasts, has been shown to make better predictions of the rate of cooperation in this and similar games given only the payoffs of the game.^{[6]}
There is also an extended "iterated" version of the game, where the classic game is played over and over between the same prisoners, and consequently, both prisoners continuously have an opportunity to penalize the other for previous decisions. If the number of times the game will be played is known to the players, then (by backward induction) two classically rational players will betray each other repeatedly, for the same reasons as the single shot variant. In an infinite or unknown length game there is no fixed optimum strategy, and Prisoner's Dilemma tournaments have been held to compete and test algorithms.
The prisoner's dilemma game can be used as a model for many real world situations involving cooperative behaviour. In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games: for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it merely difficult or expensive, not necessarily impossible, to coordinate their activities to achieve cooperation.
Contents

1 Strategy for the classic prisoners' dilemma
2 Generalized form
2.1 Special case: Donation game
3 The iterated prisoners' dilemma
3.1 Strategy for the iterated prisoners' dilemma
3.2 Stochastic iterated prisoner's dilemma
3.2.1 Zero-determinant strategies
3.3 Continuous iterated prisoners' dilemma
3.4 Emergence of Stable Strategies
4 Real-life examples
4.1 In environmental studies
4.2 In animals
4.3 In psychology
4.4 In economics
4.5 In sport
4.6 Multiplayer dilemmas
4.7 Arms races
5 Related games
5.1 Closed-bag exchange
5.2 Friend or Foe?
5.3 Iterated snowdrift
6 See also
7 References
8 Further reading
9 External links
Strategy for the classic prisoners' dilemma
The normal game is shown below:

| | Prisoner B stays silent (cooperates) | Prisoner B betrays (defects) |
|---|---|---|
| Prisoner A stays silent (cooperates) | Each serves 1 year | Prisoner A: 3 years; Prisoner B: goes free |
| Prisoner A betrays (defects) | Prisoner A: goes free; Prisoner B: 3 years | Each serves 2 years |

Here, regardless of what the other decides, each prisoner gets a higher payoff by betraying the other ("defecting"). The reasoning involves an argument by dilemma: B will either cooperate or defect. If B cooperates, A should defect, since going free is better than serving 1 year. If B defects, A should also defect, since serving 2 years is better than serving 3. So either way, A should defect. Parallel reasoning will show that B should defect.
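The dominance argument above can be checked mechanically. The following is a minimal Python sketch rather than part of the original presentation: the names `YEARS` and `best_reply` are illustrative, and the sentences from the table are encoded as years in prison, so lower is better.

```python
# Years in prison for (A's move, B's move); "C" = stay silent, "D" = betray.
# Each value is (A's sentence, B's sentence), taken from the table above.
YEARS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (3, 0),
    ("D", "C"): (0, 3),
    ("D", "D"): (2, 2),
}


def best_reply(b_move):
    """Return A's move that minimizes A's own sentence against a fixed move by B."""
    return min("CD", key=lambda a_move: YEARS[(a_move, b_move)][0])


for b_move in "CD":
    print(f"If B plays {b_move}, A's best reply is {best_reply(b_move)}")
```

Both lines of output name "D", which is what it means for betrayal to be a dominant strategy for A; by symmetry the same holds for B.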
In traditional game theory, some very restrictive assumptions on prisoner behaviour are made. It is assumed that both understand the nature of the game, and that despite being members of the same gang, they have no loyalty to each other and will have no opportunity for retribution or reward outside the game. Most importantly, a very narrow interpretation of "rationality" is applied in defining the decision-making strategies of the prisoners. Given these conditions and the payoffs above, prisoner A will betray prisoner B. The game is symmetric, so Prisoner B should act the same way. Since both "rationally" decide to defect, each receives a lower reward than if both were to stay quiet. Traditional game theory results in both players being worse off than if each chose to lessen the sentence of his accomplice at the cost of spending more time in jail himself.
Generalized form
The structure of the traditional Prisoners’ Dilemma can be generalized from its original prisoner setting. Suppose that the two players are represented by the colors, red and blue, and that each player chooses to either "Cooperate" or "Defect".
If both players cooperate, they both receive the reward, R, for cooperating. If Blue defects while Red cooperates, then Blue receives the temptation payoff, T, while Red receives the "sucker's" payoff, S. Similarly, if Blue cooperates while Red defects, then Blue receives the sucker's payoff S while Red receives the temptation payoff T. If both players defect, they both receive the punishment payoff P.
This can be expressed in normal form:
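Canonical PD payoff matrix (each cell lists the row player's payoff, then the column player's)

| | Cooperate | Defect |
|---|---|---|
| Cooperate | R, R | S, T |
| Defect | T, S | P, P |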

and to be a prisoner's dilemma game in the strong sense, the following condition must hold for the payoffs:
T > R > P > S
The payoff relationship R > P implies that mutual cooperation is superior to mutual defection, while the payoff relationships T > R and P > S imply that defection is the dominant strategy for both agents. That is, mutual defection is the only strong Nash equilibrium in the game (i.e., the only outcome from which each player could only do worse by unilaterally changing strategy). The dilemma then is that mutual cooperation yields a better outcome than mutual defection but it is not the rational outcome because the choice to cooperate, at the individual level, is not rational from a self-interested point of view.
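The equilibrium claim can be illustrated by brute force. The sketch below enumerates the four outcomes of the general game and keeps those from which neither player can gain by deviating alone. The function name `pure_nash_equilibria` and the numeric payoffs T=5, R=3, P=1, S=0 are assumptions chosen for illustration; any values with T > R > P > S should give the same result.

```python
from itertools import product


def pure_nash_equilibria(T, R, P, S):
    """List the pure-strategy Nash equilibria of the symmetric 2x2 game."""
    # payoff[(row_move, col_move)] = (row player's payoff, column player's payoff)
    payoff = {
        ("C", "C"): (R, R),
        ("C", "D"): (S, T),
        ("D", "C"): (T, S),
        ("D", "D"): (P, P),
    }
    equilibria = []
    for row, col in product("CD", repeat=2):
        # An outcome is an equilibrium if no unilateral deviation pays more.
        row_ok = all(payoff[(row, col)][0] >= payoff[(alt, col)][0] for alt in "CD")
        col_ok = all(payoff[(row, col)][1] >= payoff[(row, alt)][1] for alt in "CD")
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


print(pure_nash_equilibria(T=5, R=3, P=1, S=0))  # [('D', 'D')]
```

Mutual defection is the only outcome that survives the check, even though both players would prefer (C, C).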
Special case: Donation game
The "donation game"^{[7]} is a form of prisoner's dilemma in which cooperation corresponds to offering the other player a benefit b at a personal cost c with b > c. Defection means offering nothing. The payoff matrix is thus

| | Cooperate | Defect |
|---|---|---|
| Cooperate | b - c, b - c | -c, b |
| Defect | b, -c | 0, 0 |

Note that 2R > T + S (i.e. 2(b - c) > b - c), so the donation game also satisfies the additional condition required of an iterated game (see next section).
The donation game may be applied to markets. Suppose X grows oranges, Y grows apples. The marginal utility of an apple to the orange-grower X is b, which is higher than the marginal utility (c) of an orange, since X has a surplus of oranges and no apples. Similarly, for apple-grower Y, the marginal utility of an orange is b while the marginal utility of an apple is c. If X and Y contract to exchange an apple and an orange, and each fulfills their end of the deal, then each receives a payoff of b - c. If one "defects" and does not deliver as promised, the defector will receive a payoff of b, while the cooperator will lose c. If both defect, then neither one gains or loses anything.
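As a small worked example, the donation game can be mapped onto the general payoffs T, R, P, S and tested against both conditions. This is a sketch under stated assumptions: the helper names are hypothetical, and the requirement c > 0 is an assumption added here (the text only states b > c) so that the strict ordering T > R > P > S holds.

```python
def donation_game(b, c):
    """Return (T, R, P, S) for a donation game with benefit b and cost c."""
    if not b > c > 0:
        raise ValueError("this sketch assumes b > c > 0")
    return b, b - c, 0, -c


def is_prisoners_dilemma(T, R, P, S):
    """Strong-sense condition on the payoffs."""
    return T > R > P > S


def satisfies_iterated_condition(T, R, S):
    """Extra condition 2R > T + S used for the iterated game."""
    return 2 * R > T + S


T, R, P, S = donation_game(b=3, c=1)
print((T, R, P, S))                         # (3, 2, 0, -1)
print(is_prisoners_dilemma(T, R, P, S))     # True
print(satisfies_iterated_condition(T, R, S))  # True
```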
The iterated prisoners' dilemma
If two players play prisoners' dilemma more than once in succession and they remember previous actions of their opponent and change their strategy accordingly, the game is called iterated prisoners' dilemma.
In addition to the general form above, the iterative version also requires that 2R > T + S, to prevent alternating cooperation and defection giving a greater reward than mutual cooperation.
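The role of this condition can be seen with simple arithmetic. In the sketch below (the function name and the numeric payoffs are illustrative assumptions), a pair that takes turns exploiting each other earns (T + S) / 2 per round each, which is compared with the reward R for steady mutual cooperation.

```python
def per_round_average(T, R, S):
    """Average per-round payoff for one player under two joint patterns."""
    mutual_cooperation = R
    alternating = (T + S) / 2  # the pair takes turns playing (D, C) and (C, D)
    return mutual_cooperation, alternating


print(per_round_average(T=5, R=3, S=0))  # (3, 2.5): 2R > T + S holds
print(per_round_average(T=8, R=3, S=0))  # (3, 4.0): 2R > T + S fails
```

In the second case alternation beats mutual cooperation, which is the situation the condition 2R > T + S rules out.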
The iterated prisoners' dilemma game is fundamental to certain theories of human cooperation and trust. On the assumption that the game can model transactions between two people requiring trust, cooperative behaviour in populations may be modeled by a multi-player, iterated version of the game. It has, consequently, fascinated many scholars over the years. In 1975, Grofman and Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoners' dilemma has also been referred to as the "Peace-War game".^{[8]}
If the game is played exactly N times and both players know this, then it is always game-theoretically optimal to defect in all rounds. The only possible Nash equilibrium is to always defect. The proof is inductive: one might as well defect on the last turn, since the opponent will not have a chance to punish the player. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.
Unlike the standard prisoners' dilemma, in the iterated prisoners' dilemma the defection strategy is counterintuitive and fails badly to predict the behavior of human players. Within standard economic theory, though, this is the only correct answer. The superrational strategy in the iterated prisoners' dilemma with fixed N is to cooperate against a superrational opponent, and in the limit of large N, experimental results on strategies agree with the superrational version, not the game-theoretic rational one.
For cooperation to emerge between game theoretic rational players, the total number of rounds N must be random, or at least unknown to the players. In this case 'always defect' may no longer be a strictly dominant strategy, only a Nash equilibrium. Amongst results shown by Robert Aumann in a 1959 paper, rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.
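One standard way to make this concrete, which the text above does not spell out, is to assume that after each round the game continues with a fixed probability `delta` and that both players use a "grim trigger" rule (cooperate until the first defection, then defect forever). Both the continuation probability and the grim trigger rule are assumptions introduced here for illustration, as are the function names and numeric payoffs.

```python
def critical_continuation_probability(T, R, P):
    """Smallest delta at which grim-trigger cooperation is self-enforcing.

    Cooperating forever is worth R / (1 - delta). Defecting once and then
    being punished forever is worth T + delta * P / (1 - delta). Cooperation
    is sustainable when the first is at least the second, which rearranges
    to delta >= (T - R) / (T - P).
    """
    return (T - R) / (T - P)


def cooperation_sustainable(T, R, P, delta):
    """Compare the two expected totals directly for a given delta in [0, 1)."""
    cooperate = R / (1 - delta)
    deviate = T + delta * P / (1 - delta)
    return cooperate >= deviate


print(critical_continuation_probability(T=5, R=3, P=1))  # 0.5
print(cooperation_sustainable(T=5, R=3, P=1, delta=0.6))  # True
print(cooperation_sustainable(T=5, R=3, P=1, delta=0.4))  # False
```

Under these assumptions cooperation can be part of an equilibrium only when the future matters enough, which is one reading of why a known final round undoes it.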
Strategy for the iterated prisoners' dilemma
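Interest in the iterated prisoners' dilemma (IPD) was kindled by Robert Axelrod in his book The Evolution of Cooperation (1984), which reports on tournaments between computer strategies for the game. Axelrod found that greedy strategies tended to do poorly in the long run while more altruistic strategies did better, as judged purely by self-interest. He used this to show a possible mechanism for the evolution of altruistic behaviour from mechanisms that are initially purely selfish, by natural selection.
The winning deterministic strategy was tit for tat, which cooperates on the first move and thereafter does whatever its opponent did on the previous move.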
By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.

Nice

The most important condition is that the strategy must be "nice", that is, it will not defect before its opponent does (this is sometimes referred to as an "optimistic" algorithm). Almost all of the top-scoring strategies were nice; therefore, a purely selfish strategy will not "cheat" on its opponent, for purely self-interested reasons first.

Retaliating

However, Axelrod contended, the successful strategy must not be a blind optimist. It must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate. This is a very bad choice, as "nasty" strategies will ruthlessly exploit such players.

Forgiving

Successful strategies must also be forgiving. Though players will retaliate, they will once again fall back to cooperating if the opponent does not continue to defect. This stops long runs of revenge and counter-revenge, maximizing points.

Non-envious

The last quality is being non-envious, that is not striving to score more than the opponent.
The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents may be. However, in the iterated-PD game the optimal strategy depends upon the strategies of likely opponents, and how they will react to defections and cooperations. For example, consider a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being tit for tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game.
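The "slight disadvantage" of a lone tit for tat player among defectors can be reproduced with a short simulation. This is a minimal sketch, not Axelrod's tournament code: the payoffs T=5, R=3, P=1, S=0, the ten-round match length, and all function names are assumptions made for illustration.

```python
# (my move, opponent's move) -> my payoff, with T=5, R=3, P=1, S=0 (assumed)
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"


def always_defect(my_history, their_history):
    return "D"


def play_match(strategy_a, strategy_b, rounds):
    """Play an iterated game and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play_match(tit_for_tat, always_defect, 10))    # (9, 14)
print(play_match(always_defect, always_defect, 10))  # (10, 10)
print(play_match(tit_for_tat, tit_for_tat, 10))      # (30, 30)
```

Against a defector, tit for tat scores 9 where another defector would have scored 10, the one-point gap coming from the first round. Two tit for tat players, by contrast, earn 30 each.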
In the strategy called Pavlov, or win-stay, lose-switch, a player repeats its previous move after a good outcome and switches after a bad one. If the last round outcome was P,P, a Pavlov player switches strategy the next turn, which means P,P would be considered as a failure to cooperate. For a certain range of parameters, Pavlov beats all other strategies by giving preferential treatment to co-players which resemble Pavlov.
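The Pavlov rule can be written as a function of the previous joint outcome. In this sketch the function names are hypothetical, "winning" is taken to mean receiving R or T (that is, the opponent cooperated), and opening with cooperation is an assumption.

```python
def pavlov(my_last, their_last):
    """Win-stay, lose-switch: repeat the last move after R or T, switch after S or P."""
    if my_last is None:      # first round: open with cooperation (an assumption)
        return "C"
    won = their_last == "C"  # the opponent cooperated, so the payoff was R or T
    if won:
        return my_last
    return "D" if my_last == "C" else "C"


def continue_from(state, rounds):
    """Follow two Pavlov players forward from a given joint outcome."""
    trace = [state]
    for _ in range(rounds):
        a, b = trace[-1]
        trace.append((pavlov(a, b), pavlov(b, a)))
    return trace


# Start just after one player has defected by mistake.
print(continue_from(("D", "C"), 3))
# [('D', 'C'), ('D', 'D'), ('C', 'C'), ('C', 'C')]
```

After the P,P round both players switch, so a pair of Pavlov players returns to mutual cooperation two rounds after a single mistaken defection.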
Deriving the optimal strategy is generally done in two ways:

Bayesian Nash Equilibrium: If the statistical distribution of opposing strategies can be determined (e.g. 50% tit for tat, 50% always cooperate) an optimal counterstrategy can be derived analytically.^{[9]}

Monte Carlo simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a genetic algorithm for finding an optimal strategy). The mix of algorithms in the final population generally depends on the mix in the initial population. The introduction of mutation (random variation during reproduction) lessens the dependency on the initial population; empirical experiments with such systems tend to produce tit for tat players (see for instance Chess 1988), but there is no analytic proof that this will always occur.
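The population approach can be sketched in a deliberately simplified, deterministic form. Instead of individual births and deaths, the code below tracks the share of each strategy and lets shares grow in proportion to average score; there is no randomness and no mutation, so it is not the Monte Carlo procedure itself. The three strategies, the payoffs T=5, R=3, P=1, S=0, the 50-round matches and all names are assumptions.

```python
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

STRATEGIES = {
    "tit_for_tat": lambda mine, theirs: theirs[-1] if theirs else "C",
    "always_defect": lambda mine, theirs: "D",
    "always_cooperate": lambda mine, theirs: "C",
}


def match_score(name_a, name_b, rounds=50):
    """Total score of strategy a in one match against strategy b."""
    a, b = STRATEGIES[name_a], STRATEGIES[name_b]
    hist_a, hist_b, score = [], [], 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        score += PAYOFF[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score


def evolve(shares, generations=200):
    """Fitness-proportional reproduction acting on population shares."""
    names = list(shares)
    score = {(a, b): match_score(a, b) for a in names for b in names}
    for _ in range(generations):
        # Expected score of each strategy against a randomly drawn opponent.
        fitness = {a: sum(shares[b] * score[(a, b)] for b in names) for a in names}
        mean = sum(shares[a] * fitness[a] for a in names)
        shares = {a: shares[a] * fitness[a] / mean for a in names}
    return shares


final = evolve({"tit_for_tat": 1 / 3, "always_defect": 1 / 3, "always_cooperate": 1 / 3})
for name, share in final.items():
    print(f"{name}: {share:.4f}")
```

With these particular assumptions the always-defect share shrinks towards zero and tit for tat ends with the largest share. Other starting mixes or payoffs can behave differently, in line with the dependence on the initial population noted above.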
Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University in England (led by Professor Nicholas Jennings and consisting of Rajdeep Dash, Sarvapali Ramchurn, Alex Rogers, Perukrishnen Vytelingum) introduced a new strategy at the 20th-anniversary iterated prisoners' dilemma competition, which proved to be more successful than tit for tat. This strategy relied on cooperation between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start.^{[10]} Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result,^{[11]} this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.
This strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that the performance of a team was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of minmaxing). In a competition where one has control of only a single player, tit for tat is certainly a better strategy. Because of this new rule, this competition also has little theoretical significance when analysing single agent strategies as compared to Axelrod's seminal tournament. However, it provided the framework for analysing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. In fact, long before this new-rules tournament was played, Richard Dawkins in his book The Selfish Gene pointed out the possibility of such strategies winning if multiple entries were allowed, but he remarked that most probably Axelrod would not have allowed them if they had been submitted. It also relies on circumventing rules about the prisoners' dilemma in that there is no communication allowed between the two players, which the Southampton programs arguably did with their opening "ten move dance" to recognize one another; this only reinforces just how valuable communication can be in shifting the balance of the game.
Stochastic iterated prisoner's dilemma
In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities".^{[12]} In an encounter between player X and player Y, X's strategy is specified by a set of probabilities P of cooperating with Y. P is a function of the outcomes of their previous encounters or some subset thereof. If P is a function of only their most recent n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}, where P_{ab} is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by (ab). For example, if the previous encounter was one in which X cooperated and Y defected, then P_{cd} is the probability that X will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the "tit for tat" strategy written as P=\{1,0,1,0\}, in which X responds as Y did in the previous encounter. Another is the win–stay, lose–switch strategy written as P=\{1,0,0,1\}, in which X responds as in the previous encounter, if it was a "win" (i.e. cc or dc) but changes strategy if it was a loss (i.e. cd or dd). It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy which gives the same statistical results, so that only memory-1 strategies need be considered.^{[12]}
If we define P as the above 4-element strategy vector of X and Q=\{Q_{cc},Q_{cd},Q_{dc},Q_{dd}\} as the 4-element strategy vector of Y, a transition matrix M may be defined for X whose ij-th entry is the probability that the outcome of a particular encounter between X and Y will be j given that the previous encounter was i, where i and j are one of the four outcome indices: cc, cd, dc, or dd. For example, from X's point of view, the probability that the outcome of the present encounter is cd given that the previous encounter was cd is equal to M_{cd,cd}=P_{cd}(1-Q_{dc}). (Note that the indices for Q are from Y's point of view: a cd outcome for X is a dc outcome for Y.) Under these definitions, the iterated prisoner's dilemma qualifies as a stochastic process and M is a stochastic matrix, allowing all of the theory of stochastic processes to be applied.^{[12]}
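The construction of M can be sketched in a few lines of Python. The function and constant names are illustrative; states are ordered (cc, cd, dc, dd) from X's point of view, and each strategy vector is indexed from its owner's point of view, as in the text.

```python
def transition_matrix(P, Q):
    """Build the 4x4 matrix M for states ordered (cc, cd, dc, dd) from X's view.

    P and Q are (P_cc, P_cd, P_dc, P_dd) cooperation probabilities for X and Y.
    """
    # Y sees X's "cd" as "dc" and vice versa, so swap the middle entries of Q.
    q_from_x_view = (Q[0], Q[2], Q[1], Q[3])
    M = []
    for p, q in zip(P, q_from_x_view):
        # Next outcome is cc, cd, dc or dd; the two players move independently.
        M.append([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])
    return M


TIT_FOR_TAT = (1, 0, 1, 0)
WIN_STAY_LOSE_SWITCH = (1, 0, 0, 1)

for row in transition_matrix(TIT_FOR_TAT, WIN_STAY_LOSE_SWITCH):
    print(row)
```

Each row sums to 1, and the entry in row cd, column cd equals P_cd (1 - Q_dc), matching the example in the text.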
One result of stochastic theory is that there exists a stationary vector v for the matrix M such that v\cdot M=v. Without loss of generality, it may be specified that v is normalized so that the sum of its four components is unity. The ij-th entry in M^n will give the probability that the outcome of an encounter between X and Y will be j given that the encounter n steps previous is i. In the limit as n approaches infinity, M^n will converge to a matrix with fixed values, giving the long-term probabilities of an encounter producing j which will be independent of i. In other words the rows of M^\infty will be identical, giving the long-term equilibrium result probabilities of the iterated prisoners dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that v is a stationary vector for M^n and particularly M^\infty, so that each row of M^\infty will be equal to v. Thus the stationary vector specifies the equilibrium outcome probabilities for X. Defining S_x=\{R,S,T,P\} and S_y=\{R,T,S,P\} as the short-term payoff vectors for the {cc,cd,dc,dd} outcomes (from X's point of view), the equilibrium payoffs for X and Y can now be specified as s_x=v\cdot S_x and s_y=v\cdot S_y, allowing the two strategies P and Q to be compared for their long-term payoffs.
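The long-term payoffs can be approximated numerically. The sketch below finds v by repeated multiplication (power iteration), which is one of several possible methods and is an assumption of this example, as are the payoffs T=5, R=3, P=1, S=0 and all names. It assumes the chain settles to a single stationary vector, which is typically the case when all probabilities lie strictly between 0 and 1; deterministic strategies can cycle instead.

```python
def transition_matrix(strategy_x, strategy_y):
    """Rows and columns ordered (cc, cd, dc, dd) from X's point of view."""
    q_from_x_view = (strategy_y[0], strategy_y[2], strategy_y[1], strategy_y[3])
    return [
        [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
        for p, q in zip(strategy_x, q_from_x_view)
    ]


def stationary_vector(M, iterations=5000):
    """Approximate v with v.M = v by power iteration, then normalize."""
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    total = sum(v)
    return [x / total for x in v]


def long_term_payoffs(strategy_x, strategy_y, T=5, R=3, P=1, S=0):
    """Return (s_x, s_y) = (v . S_x, v . S_y)."""
    v = stationary_vector(transition_matrix(strategy_x, strategy_y))
    S_x = (R, S, T, P)  # X's payoff in states cc, cd, dc, dd
    S_y = (R, T, S, P)  # Y's payoff in the same states
    s_x = sum(vi * si for vi, si in zip(v, S_x))
    s_y = sum(vi * si for vi, si in zip(v, S_y))
    return s_x, s_y


print(long_term_payoffs((0.9, 0.1, 0.8, 0.2), (0.5, 0.5, 0.5, 0.5)))
```

For two players who both cooperate with probability 0.5 regardless of history, every outcome is equally likely and each earns (R + S + T + P) / 4, which gives a quick sanity check of the code.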
Zero-determinant strategies
Figure: The relationship between zero-determinant (ZD), cooperating and defecting strategies in the Iterated Prisoner’s Dilemma (IPD). Cooperating strategies always cooperate with other cooperating strategies, and defecting strategies always defect against other defecting strategies. Both contain subsets of strategies that are robust under strong selection, meaning no other memory-1 strategy is selected to invade such strategies when they are resident in a population. Only cooperating strategies contain a subset that are always robust, meaning that no other memory-1 strategy is selected to invade and replace such strategies, under both strong and weak selection. The intersection between ZD and good cooperating strategies is the set of generous ZD strategies. Extortion strategies are the intersection between ZD and non-robust defecting strategies. Tit-for-tat lies at the intersection of cooperating, defecting and ZD strategies.
In 2012, William H. Press and Freeman Dyson published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies.^{[12]} The long-term payoffs for encounters between X and Y can be expressed as the determinant of a matrix which is a function of the two strategies and the short-term payoff vectors: s_x=D(P,Q,S_x) and s_y=D(P,Q,S_y), which do not involve the stationary vector v. Since the determinant function D(P,Q,f) is linear in f, it follows that \alpha s_x+\beta s_y+\gamma=D(P,Q,\alpha S_x+\beta S_y+\gamma U) (where U=\{1,1,1,1\}). Any strategy for which D(P,Q,\alpha S_x+\beta S_y+\gamma U)=0 is by definition a ZD strategy, and the long-term payoffs obey the relation \alpha s_x+\beta s_y+\gamma=0.
Tit-for-tat is a ZD strategy which is "fair" in the sense of not gaining advantage over the other player. However, the ZD space also contains strategies that, in the case of two players, can allow one player to unilaterally set the other player's score or alternatively, force an evolutionary player to achieve a payoff some percentage lower than his own. The extorted player could defect but would thereby hurt himself by getting lower payoff. Thus, extortion solutions turn the iterated prisoner's dilemma into a sort of ultimatum game. Specifically, X is able to choose a strategy for which D(P,Q,\beta S_y+\gamma U)=0, unilaterally setting s_y to a specific value within a particular range of values, independent of Y's strategy, offering an opportunity for X to "extort" player Y (and vice versa). (It turns out that if X tries to set s_x to a particular value, the range of possibilities is much smaller, only consisting of complete cooperation or complete defection.^{[12]})
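The extortion relation can be checked numerically. The sketch below relies on details that are not in the text above and should be read as assumptions: the parametrization of an extortionate strategy as (P_cc - 1, P_cd - 1, P_dc, P_dd) = phi [(S_x - P U) - chi (S_y - P U)], as this editor understands it from Press and Dyson's paper, the payoffs T=5, R=3, P=1, S=0, the choice chi = 3 and phi = 1/26, and power iteration for the stationary vector. All function names are hypothetical.

```python
def transition_matrix(strategy_x, strategy_y):
    """Rows and columns ordered (cc, cd, dc, dd) from X's point of view."""
    q_from_x_view = (strategy_y[0], strategy_y[2], strategy_y[1], strategy_y[3])
    return [
        [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
        for p, q in zip(strategy_x, q_from_x_view)
    ]


def long_term_payoffs(strategy_x, strategy_y, T=5, R=3, P=1, S=0, iterations=5000):
    """Approximate (s_x, s_y) via power iteration on the transition matrix."""
    M = transition_matrix(strategy_x, strategy_y)
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    total = sum(v)
    v = [x / total for x in v]
    s_x = sum(vi * si for vi, si in zip(v, (R, S, T, P)))
    s_y = sum(vi * si for vi, si in zip(v, (R, T, S, P)))
    return s_x, s_y


def extortion_strategy(chi, phi, T=5, R=3, P=1, S=0):
    """Memory-1 strategy intended to enforce s_x - P = chi * (s_y - P)."""
    S_x, S_y = (R, S, T, P), (R, T, S, P)
    shifted = [phi * ((sx - P) - chi * (sy - P)) for sx, sy in zip(S_x, S_y)]
    # Undo the shift (P_cc - 1, P_cd - 1, P_dc, P_dd); phi must keep all
    # four results inside [0, 1] for them to be valid probabilities.
    return (1 + shifted[0], 1 + shifted[1], shifted[2], shifted[3])


x = extortion_strategy(chi=3, phi=1 / 26)
print(x)  # approximately (11/13, 1/2, 7/26, 0)
for y in [(0.7, 0.2, 0.6, 0.3), (0.9, 0.5, 0.5, 0.1), (0.3, 0.3, 0.3, 0.3)]:
    s_x, s_y = long_term_payoffs(x, y)
    print(round(s_x - 1, 6), round(3 * (s_y - 1), 6))  # the two columns agree
```

Whatever memory-1 strategy Y uses in these trials, X's surplus over the punishment payoff comes out as three times Y's, which is the sense in which X "extorts" Y.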
An extension of the IPD is an evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not evolutionarily stable. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).^{[13]}
Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is bigger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and win–stay, lose–switch agents.^{[14]}
While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is both stable and robust. In fact, when the population is not too small, these strategies can supplant any other ZD strategy and even perform well against a broad array of generic strategies for iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the donation game by Alexander Stewart and Joshua Plotkin in 2013.^{[15]} Generous strategies will cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies, which were defined by Akin (2013)^{[16]} to be those for which the player responds to past mutual cooperation with future cooperation and splits expected payoffs equally if she receives at least the cooperative expected payoff. Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.^{[15]}
Continuous iterated prisoners' dilemma
Most work on the iterated prisoners' dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoners' dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd^{[17]} found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoners' dilemma. The basic intuition for this result is straightforward: in a continuous prisoners' dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from assorting with one another. By contrast, in a discrete prisoners' dilemma, tit for tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation rather than a strict dichotomy of cooperation or defection, the continuous prisoners' dilemma may help explain why real-life examples of tit-for-tat-like cooperation are extremely rare in nature (e.g. Hammerstein^{[18]}) even though tit for tat seems robust in theoretical models.
Emergence of Stable Strategies
Players cannot seem to coordinate mutual cooperation, and thus often get locked into the inferior yet stable strategy of defection. In this way, iterated rounds facilitate the evolution of stable strategies.^{[19]} Iterated rounds often produce novel strategies, which have implications for complex social interaction. One such strategy is win-stay, lose-shift. This strategy outperforms a simple tit-for-tat strategy: that is, if you can get away with cheating, repeat that behavior; if you get caught, switch.^{[20]}
Real-life examples
The prisoner setting may seem contrived, but there are in fact many examples in human interaction as well as interactions in nature that have the same payoff matrix. The prisoner's dilemma is therefore of interest to the social sciences such as economics, politics, and sociology, as well as to the biological sciences such as ethology and evolutionary biology. Many natural processes have been abstracted into models in which living beings are engaged in endless games of prisoner's dilemma. This wide applicability of the PD gives the game its substantial importance.
In environmental studies
In environmental studies, the PD is evident in crises such as global climate change. It is argued all countries will benefit from a stable climate, but any single country is often hesitant to curb CO2 emissions. The immediate benefit to an individual country to maintain current behavior is perceived to be greater than the purported eventual benefit to all countries if behavior was changed, therefore explaining the current impasse concerning climate change.^{[21]}
An important difference between climate change politics and the prisoner's dilemma is uncertainty; the extent and pace at which pollution can change climate is not known. The dilemma faced by government is therefore different from the prisoner's dilemma in that the payoffs of cooperation are unknown. This difference suggests states will cooperate much less than in a real iterated prisoner's dilemma, so that the probability of avoiding a possible climate catastrophe is much smaller than that suggested by a game-theoretical analysis of the situation using a real iterated prisoner's dilemma.^{[22]}
Osang and Nandy provide a theoretical explanation with proofs for a regulation-driven win-win situation along the lines of Michael Porter's hypothesis, in which government regulation of competing firms is substantial.^{[23]}
In animals
Cooperative behavior of many animals can be understood as an example of the prisoner's dilemma. Often animals engage in long term partnerships, which can be more specifically modeled as iterated prisoner's dilemma. For example, guppies inspect predators cooperatively in groups, and they are thought to punish noncooperative inspectors by tit for tat strategy.
Vampire bats are social animals that engage in reciprocal food exchange. Applying the payoffs from the prisoner's dilemma can help explain this behavior:^{[24]}

C/C: "Reward: I get blood on my unlucky nights, which saves me from starving. I have to give blood on my lucky nights, which doesn't cost me too much."

D/C: "Temptation: You save my life on my poor night. But then I get the added benefit of not having to pay the slight cost of feeding you on my good night."

C/D: "Sucker's Payoff: I pay the cost of saving your life on my good night. But on my bad night you don't feed me and I run a real risk of starving to death."

D/D: "Punishment: I don't have to pay the slight costs of feeding you on my good nights. But I run a real risk of starving on my poor nights."
In psychology
In

External links

Prisoner's Dilemma (Stanford Encyclopedia of Philosophy)

The Bowerbird's Dilemma: The Prisoner's Dilemma in ornithology – mathematical cartoon by Larry Gonick.

Game Theory 101: Prisoner's Dilemma

Dawkins: Nice Guys Finish First

Play Prisoner's Dilemma on oTree


Further reading

Axelrod, R. (1984). The Evolution of Cooperation. ISBN 0465021212

Bicchieri, Cristina (1993). Rationality and Coordination. Cambridge University Press.

Chess, David M. (December 1988). "Simulating the evolution of behavior: the iterated prisoners' dilemma problem". Complex Systems 2 (6): 663–70.

Dresher, M. (1961). The Mathematics of Games of Strategy: Theory and Applications. Prentice-Hall, Englewood Cliffs, NJ.

Greif, A. (2006). Institutions and the Path to the Modern Economy: Lessons from Medieval Trade. Cambridge University Press, Cambridge, UK.

Rapoport, Anatol and Albert M. Chammah (1965). Prisoner's Dilemma. University of Michigan Press.

References

References

^ Milovsky, Nicholas. "The Basics of Game Theory and Associated Games". Retrieved 11 February 2014.

^ Fehr, Ernst; Fischbacher, Urs (Oct 23, 2003). "The Nature of human altruism". Nature (Nature Publishing Group) 425 (6960): 785–791.

^ Tversky, Amos; Shafir, Eldar (2004). Preference, belief, and similarity: selected writings. Massachusetts Institute of Technology Press.

^ Toh-Kyeong, Ahn; Ostrom, Elinor; Walker, James (Sep 5, 2002). "Incorporating Motivational Heterogeneity into Game-Theoretic Models of Collective Action". Public Choice 117 (3–4). Retrieved February 27, 2013.

^ Oosterbeek, Hessel; Sloof, Randolph; Van de Kuilen, Gus (Dec 3, 2003). "Cultural Differences in Ultimatum Game Experiments: Evidence from a Meta-Analysis". Experimental Economics (Springer Science and Business Media B.V) 7 (2): 171–188.

^ Capraro, V (2013). "A Model of Human Cooperation in Social Dilemmas". PLoS ONE 8 (8): e72427.

^ Hilbe, Christian; Martin A. Nowak and Karl Sigmund (April 2013). "Evolution of extortion in Iterated Prisoner’s Dilemma games". PNAS 110 (17): 6913.

^ Shy, Oz (1995). Industrial Organization: Theory and Applications. Massachusetts Institute of Technology Press.

^ For example see the 2003 study "Bayesian Nash equilibrium; a statistical test of the hypothesis" for discussion of the concept and whether it can apply in real economic or strategic situations (from Tel Aviv University).

^ :: University of Southampton

^ The 2004 Prisoners' Dilemma Tournament Results show ESS simulation. In such a simulation, tit for tat will almost always come to dominate, though nasty strategies will drift in and out of the population because a tit for tat population is penetrable by non-retaliating nice strategies, which in turn are easy prey for the nasty strategies. Richard Dawkins showed that here, no static mix of strategies forms a stable equilibrium and the system will always oscillate between bounds.

^ ^{a} ^{b} ^{c} ^{d} ^{e} Press, William H.; Freeman J. Dyson (2012). "Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent". PNAS Early Edition. Retrieved 26 November 2013.

^ Adami, Christoph; Arend Hintze (2013). "Evolutionary instability of Zero Determinant strategies demonstrates that winning isn't everything". p. 3.

^ Hilbe, Christian; Martin A. Nowak; Karl Sigmund (April 2013). "Evolution of extortion in Iterated Prisoner’s Dilemma games". PNAS 110 (17): 6915–6516. Retrieved 25 November 2013.

^ ^{a} ^{b} Stewart, Alexander J.; Joshua B. Plotkin (2013). "From extortion to generosity, evolution in the Iterated Prisoner’s Dilemma". PNAS Early Edition. Retrieved 25 November 2013.

^ Akin, Ethan (2013). "Stable Cooperative Solutions for the Iterated Prisoner's Dilemma". p. 9.

^ Le, S., Boyd, R. (2007). "Evolutionary Dynamics of the Continuous Iterated Prisoner's Dilemma". Journal of Theoretical Biology 245 (2): 258–267.

^ Hammerstein, P. (2003). Why is reciprocity so rare in social animals? A protestant appeal. In: P. Hammerstein, Editor, Genetic and Cultural Evolution of Cooperation, MIT Press. pp. 83–94.

^ Spaniel, William (2011). Game Theory 101: The Complete Textbook.

^ Nowak, Martin; Karl Sigmund (1993). "A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game". Nature 364.

^ "Markets & Data".

^ Rehmeyer, Julie (2012-10-29). "Game theory suggests current climate negotiations won't avert catastrophe". Science News. Society for Science & the Public.

^ Osang and Nandy 2003

^ Dawkins, Richard (1976). The Selfish Gene. Oxford University Press.

^ George Ainslie (2001). Breakdown of Will.

^ This argument for the development of cooperation through trust is given in The Wisdom of Crowds, where it is argued that long-distance capitalism was able to form around a nucleus of Quakers, who always dealt honourably with their business partners (rather than defecting and reneging on promises – a phenomenon that had discouraged earlier long-term unenforceable overseas contracts). It is argued that dealings with reliable merchants allowed the meme for cooperation to spread to other traders, who spread it further until a high degree of cooperation became a profitable strategy in general commerce.

^

^ ^{a} ^{b} Schneier, Bruce (2012-10-26). "Lance Armstrong and the Prisoners' Dilemma of Doping in Professional Sports - Wired Opinion". Wired.com. Retrieved 2012-10-29.

^ Gokhale CS, Traulsen A. Evolutionary games in the multiverse. Proceedings of the National Academy of Sciences. 2010 Mar 23;107(12):5500–4.

^ "The Volokh Conspiracy » Elinor Ostrom and the Tragedy of the Commons". Volokh.com. 2009-10-12. Retrieved 2011-12-17.

^ Stephen J. Majeski (1984). "Arms races as iterated prisoner's dilemma games". Mathematical and Social Sciences 7 (3): 253–266.

^ – see Ch.29 The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation.

^ Van den Assem, Martijn J. (January 2012). "Split or Steal? Cooperative Behavior When the Stakes Are Large". Management Science 58 (1): 2–20.

^ Kümmerli, Rolf. "'Snowdrift' game tops 'Prisoner's Dilemma' in explaining cooperation". Retrieved 11 April 2012.
Iterated snowdrift
Researchers from the University of Lausanne and the University of Edinburgh have suggested that the "Iterated Snowdrift Game" may more closely reflect real-world social situations. Although this model is actually a chicken game, it will be described here. In this model, the risk of being exploited through defection is lower, and individuals always gain from taking the cooperative choice. The snowdrift game imagines two drivers who are stuck on opposite sides of a snowdrift, each of whom is given the option of shoveling snow to clear a path, or remaining in their car. A player's highest payoff comes from leaving the opponent to clear all the snow by themselves, but the opponent is still nominally rewarded for their work.

This may better reflect real-world scenarios; the researchers give the example of two scientists collaborating on a report, both of whom would benefit if the other worked harder. "But when your collaborator doesn’t do any work, it’s probably better for you to do all the work yourself. You’ll still end up with a completed project."^{[34]}

Example Snowdrift Payouts (A, B)

                B cooperates    B defects
A cooperates    200, 200        100, 300
A defects       300, 100        0, 0

Example PD Payouts (A, B)

                B cooperates    B defects
A cooperates    200, 200        -100, 300
A defects       300, -100       0, 0
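The difference between the snowdrift game and the prisoner's dilemma can be made concrete by computing each player's best response. The sketch below uses the snowdrift payouts quoted in this section and, as an assumption, a prisoner's dilemma variant in which the exploited cooperator receives -100 (a negative sucker's payoff is needed for the matrix to satisfy the PD ordering). The names `SNOWDRIFT`, `PD` and `best_response` are illustrative only.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], Tuple[int, int]]

# Keys are (A's move, B's move); values are (A's payout, B's payout).
SNOWDRIFT: Matrix = {
    ("C", "C"): (200, 200), ("C", "D"): (100, 300),
    ("D", "C"): (300, 100), ("D", "D"): (0, 0),
}
# Assumed PD variant: the lone cooperator ends up with -100.
PD: Matrix = {
    ("C", "C"): (200, 200), ("C", "D"): (-100, 300),
    ("D", "C"): (300, -100), ("D", "D"): (0, 0),
}


def best_response(matrix: Matrix, opponent_move: str) -> str:
    """Move that maximizes player A's payout against a fixed move by B."""
    return max("CD", key=lambda move: matrix[(move, opponent_move)][0])


for name, matrix in (("snowdrift", SNOWDRIFT), ("PD", PD)):
    print(name, {b: best_response(matrix, b) for b in "CD"})
```

In the snowdrift matrix the best reply to a defector is to cooperate (100 beats 0), whereas in the assumed PD matrix defection is the best reply to either move. That is the sense in which the risk of being exploited is lower in the snowdrift game.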
Friend or Foe?
Friend or Foe? is a game show that aired from 2002 to 2005 on the Game Show Network in the USA. It is an example of the prisoner's dilemma game tested on real people, but in an artificial setting. On the game show, three pairs of people compete. When a pair is eliminated, they play a game similar to the prisoner's dilemma to determine how the winnings are split. If they both cooperate (Friend), they share the winnings 50–50. If one cooperates and the other defects (Foe), the defector gets all the winnings and the cooperator gets nothing. If both defect, both leave with nothing. Notice that the payoff matrix is slightly different from the standard one given above, as the payouts for the "both defect" and the "cooperate while the opponent defects" cases are identical. This makes the "both defect" case a weak equilibrium, compared with being a strict equilibrium in the standard prisoner's dilemma. If a contestant knows that their opponent is going to vote "Foe", then their own choice does not affect their own winnings. In a certain sense, Friend or Foe has a payoff model between prisoner's dilemma and the game of Chicken.

The payoff matrix is

             Cooperate    Defect
Cooperate    1, 1         0, 2
Defect       2, 0         0, 0

This payoff matrix has also been used on the British television programmes Trust Me, Shafted, The Bank Job and Golden Balls, and on the American shows Bachelor Pad and Take It All. Game data from the Golden Balls series has been analyzed by a team of economists, who found that cooperation was "surprisingly high" for amounts of money that would seem consequential in the real world, but were comparatively low in the context of the game.^{[33]}
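The weak-versus-strict distinction for the Friend or Foe? payoffs (1, 1 / 0, 2 / 2, 0 / 0, 0) can be checked by enumerating pure-strategy Nash equilibria. This is a minimal sketch: the function name `pure_nash` is hypothetical, and the comparison matrix `STANDARD_PD` is an assumption that encodes the prison sentences from the introduction as negative payoffs.

```python
from itertools import product
from typing import Dict, Tuple

Profile = Tuple[str, str]
Matrix = Dict[Profile, Tuple[int, int]]

FRIEND_OR_FOE: Matrix = {
    ("Friend", "Friend"): (1, 1), ("Friend", "Foe"): (0, 2),
    ("Foe", "Friend"): (2, 0), ("Foe", "Foe"): (0, 0),
}
# Assumed encoding: years in prison from the introduction, as negative payoffs.
STANDARD_PD: Matrix = {
    ("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3), ("D", "D"): (-2, -2),
}


def pure_nash(payoffs: Matrix, moves: Tuple[str, str]) -> Dict[Profile, str]:
    """Map each pure-strategy Nash equilibrium to 'strict' or 'weak'."""
    found = {}
    for a, b in product(moves, repeat=2):
        ua, ub = payoffs[(a, b)]
        # Payoffs each player would get by unilaterally switching moves.
        dev_a = [payoffs[(x, b)][0] for x in moves if x != a]
        dev_b = [payoffs[(a, y)][1] for y in moves if y != b]
        if all(ua >= d for d in dev_a) and all(ub >= d for d in dev_b):
            strict = all(ua > d for d in dev_a) and all(ub > d for d in dev_b)
            found[(a, b)] = "strict" if strict else "weak"
    return found


print(pure_nash(FRIEND_OR_FOE, ("Friend", "Foe")))
print(pure_nash(STANDARD_PD, ("C", "D")))
```

With these inputs, mutual defection is reported as a strict equilibrium in the standard game but only a weak one in Friend or Foe?, because a contestant facing "Foe" gets nothing either way. For the same reason the check also reports the two one-sided outcomes (Friend against Foe) as weak equilibria.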
Related games
Closed-bag exchange
Hofstadter^{[32]} once suggested that people often find problems such as the PD problem easier to understand when it is illustrated in the form of a simple game, or trade-off. One of several examples he used was "closed bag exchange":

Two people meet and exchange closed bags, with the understanding that one of them contains money, and the other contains a purchase. Either player can choose to honor the deal by putting into his or her bag what he or she agreed, or he or she can defect by handing over an empty bag.

In this game, defection is always the best course, implying that rational agents will never play. However, in this case both players cooperating and both players defecting actually give the same result, assuming there are no gains from trade, so chances of mutual cooperation, even in repeated games, are few.
Arms races
The Cold War and similar arms races can be modeled as a Prisoner's Dilemma situation.^{[31]} During the Cold War the opposing alliances of NATO and the Warsaw Pact both had the choice to arm or disarm. From each side's point of view, disarming whilst their opponent continued to arm would have led to military inferiority and possible annihilation. Conversely, arming whilst their opponent disarmed would have led to superiority. If both sides chose to arm, neither could afford to attack the other, but at the high cost of developing and maintaining a nuclear arsenal. If both sides chose to disarm, war would be avoided and there would be no costs.

Although the 'best' overall outcome is for both sides to disarm, the rational course for both sides is to arm, and this is indeed what happened. Both sides poured enormous resources into military research and armament in a war of attrition for the next thirty years, until Soviet President Mikhail Gorbachev and US President Ronald Reagan negotiated arms reductions, and reform in the Soviet Union caused ideological differences to abate.
Multiplayer dilemmas
Many real-life dilemmas involve multiple players.^{[29]} Although metaphorical, Hardin's tragedy of the commons may be viewed as an example of a multiplayer generalization of the PD: each villager makes a choice for personal gain or restraint. The collective result of unanimous (or even frequent) defection is very low payoffs (representing the destruction of the "commons"). A commons dilemma most people can relate to is washing the dishes in a shared house. By not washing dishes an individual can gain by saving their time, but if that behavior is adopted by every resident the collective cost is no clean plates for anyone.

The commons are not always exploited: William Poundstone, in a book about the prisoner's dilemma (see References below), describes a situation in New Zealand where newspaper boxes are left unlocked. It is possible for people to take a paper without paying (defecting) but very few do, feeling that if they do not pay then neither will others, destroying the system. Subsequent research by Elinor Ostrom, winner of the 2009 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, hypothesized that the tragedy of the commons is oversimplified, with the negative outcome driven by outside influences. Without complicating pressures, groups communicate and manage the commons among themselves for their mutual benefit, enforcing social norms to preserve the resource and achieve the maximum good for the group, an example of effecting the best-case outcome for PD.^{[30]}
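The shared-house example can be sketched as an N-player game in which every cooperator pays a private cost and produces a benefit that is split equally among all residents. This is an illustrative model under stated assumptions, not one taken from the cited sources: the function `payoff` and the default values `benefit=3.0` and `cost=1.0` are hypothetical, chosen so that benefit / N < cost < benefit for N = 5.

```python
def payoff(cooperates: bool, other_cooperators: int, n_players: int,
           benefit: float = 3.0, cost: float = 1.0) -> float:
    """Payoff to one player in a hypothetical N-player dilemma.

    Each cooperator pays `cost` and produces `benefit`, which is shared
    equally among all `n_players`, whether or not they cooperated.
    """
    total_cooperators = other_cooperators + (1 if cooperates else 0)
    shared = benefit * total_cooperators / n_players
    return shared - (cost if cooperates else 0.0)


N = 5
# Whatever the others do, defecting pays more for the individual.
for k in range(N):
    print(k, payoff(True, k, N), payoff(False, k, N))

# Yet everyone cooperating beats everyone defecting.
print(payoff(True, N - 1, N), payoff(False, 0, N))
```

Under these assumed parameters each resident gains by defecting regardless of how many others cooperate, yet universal cooperation leaves everyone better off than universal defection, which is the structure the tragedy of the commons shares with the two-player game.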
In sport
Doping in sport has been cited as an example of a prisoner's dilemma.^{[28]}

Two competing athletes have the option to use an illegal and dangerous drug to boost their performance. If neither athlete takes the drug, then neither gains an advantage. If only one does, then that athlete gains a significant advantage over their competitor (reduced only by the legal or medical dangers of having taken the drug). If both athletes take the drug, however, the benefits cancel out and only the drawbacks remain, putting them both in a worse position than if neither had used doping.^{[28]}
In economics
Advertising is sometimes cited as a real-life example of the prisoner’s dilemma. When cigarette advertising was legal in the United States, competing cigarette manufacturers had to decide how much money to spend on advertising. The effectiveness of Firm A’s advertising was partially determined by the advertising conducted by Firm B. Likewise, the profit derived from advertising for Firm B is affected by the advertising conducted by Firm A. If both Firm A and Firm B chose to advertise during a given period, the advertising cancels out, receipts remain constant, and expenses increase due to the cost of advertising. Both firms would benefit from a reduction in advertising. However, should Firm B choose not to advertise, Firm A could benefit greatly by advertising.

Nevertheless, the optimal amount of advertising by one firm depends on how much advertising the other undertakes. As the best strategy is dependent on what the other firm chooses, there is no dominant strategy, which makes it slightly different from a prisoner's dilemma. The outcome is similar, though, in that both firms would be better off were they to advertise less than in the equilibrium. Sometimes cooperative behaviors do emerge in business situations. For instance, cigarette manufacturers endorsed the creation of laws banning cigarette advertising, understanding that this would reduce costs and increase profits across the industry.^{[26]} This analysis is likely to be pertinent in many other business situations involving advertising.

Without enforceable agreements, members of a cartel are also involved in a (multiplayer) prisoners' dilemma.^{[27]} 'Cooperating' typically means keeping prices at a pre-agreed minimum level. 'Defecting' means selling under this minimum level, instantly taking business (and profits) from other cartel members. Antitrust authorities want potential cartel members to mutually defect, ensuring the lowest possible prices for consumers.
In psychology
It has been argued^{[25]} that addiction can be cast as an intertemporal PD problem between the present and future selves of the addict. In this case, defecting means relapsing, and it is easy to see that not defecting both today and in the future is by far the best outcome, and that defecting both today and in the future is the worst outcome. The case where one abstains today but relapses in the future is clearly a bad outcome: in some sense the discipline and self-sacrifice involved in abstaining today have been "wasted" because the future relapse means that the addict is right back where he started and will have to start over (which is quite demoralizing, and makes starting over more difficult). The final case, where one engages in the addictive behavior today while abstaining "tomorrow", will be familiar to anyone who has struggled with an addiction. The problem here is that (as in other PDs) there is an obvious benefit to defecting "today", but tomorrow one will face the same PD, and the same obvious benefit will be present then, ultimately leading to an endless string of defections.

John Gottman, in his research described in "The Science of Trust", defines good relationships as those where partners know not to enter the (D,D) cell, or at least not to get dynamically stuck there in a loop.