INDIVIDUAL VERSUS GROUP SUCCESS IN ROUND-ROBIN TOURNAMENT OF AN ABILITY-BASED TWO-PLAYER GAME

by

Jonathan Asante Nyantakyi

B.Sc. Computer Science, Valley View University, Ghana, 2007

THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

August 2018

© Jonathan Asante Nyantakyi, 2018

Abstract

This thesis is about the development of cooperation and measures of success among self-interested agents in a defined ability-based two-player asymmetric game that is structured as a round-robin tournament. Our research is motivated by the notion that in many systems cooperative behaviour depends on some parameters that are usually not considered in existing research. These include: the balance between individual activity and interaction with others; the impact of agents’ ability levels; and the need to maintain balance between individual and group performance. In this thesis, we examine all these issues by using a defined game-theoretic modelling and simulation framework. Our simulation experiments on six agent group compositions establish some patterns of how an agent’s ability and strategy impact its individual and overall group performance. The results demonstrate that the design framework supports methodical comparative studies of strategy profiles with respect to specific individual and group performance measures.

Contents

Abstract . . . i
List of Figures . . . v
Acknowledgements . . . vii
Dedication . . . viii
1 Introduction . . . 1
2 Background and Related Work . . . 5
  2.1 Multiagent Systems . . . 5
  2.2 Evolution of Cooperation . . . 7
  2.3 Game Theory . . . 7
  2.4 PD Tournament Structures . . . 11
    2.4.1 Axelrod’s Tournament . . . 11
    2.4.2 Random Scheduling Tournaments . . . 12
    2.4.3 Round-Robin Tournament Scheduling . . . 13
    2.4.4 Iterated Round-Robin Tournament Scheduling . . . 14
  2.5 PD Evolutionary Models . . . 15
3 Problem Statement . . . 17
4 The Modelling and Simulation Framework . . . 21
  4.1 The Ability-Based Two-Player Games . . . 22
    4.1.1 Introduction . . . 22
    4.1.2 Motivation . . . 22
    4.1.3 The Ability-based Cooperation (ABC) game . . . 23
    4.1.4 The Extended Ability-based Cooperation (EABC) game . . . 25
    4.1.5 The Design of ABC Payoff Matrix . . . 25
    4.1.6 Game analysis . . . 26
  4.2 Simulator Architecture . . . 28
    4.2.1 The Purpose of the Simulator . . . 28
    4.2.2 Simulator Requirements . . . 28
    4.2.3 Structure of the Simulator . . . 31
    4.2.4 Simulation Engine (SE) . . . 35
    4.2.5 Behaviour of the Simulator . . . 39
    4.2.6 Approaches to Experimentation . . . 51
5 Experimental Results and Evaluation . . . 52
  5.1 Definitions and Parameter Settings . . . 53
    5.1.1 Composition of Agent Group . . . 53
    5.1.2 Game Model . . . 54
    5.1.3 Performance Measures . . . 54
  5.2 Observation and Analysis of Results . . . 57
    5.2.1 Group Composition 1 . . . 57
    5.2.2 Group Composition 2 . . . 59
    5.2.3 Group Composition 3 . . . 61
    5.2.4 Group Composition 4 . . . 63
    5.2.5 Group Composition 5 . . . 65
    5.2.6 Group Composition 6 . . . 67
  5.3 Inter-group Comparisons . . . 68
    5.3.1 Failure Percentage . . . 68
    5.3.2 Group Total Score . . . 70
    5.3.3 Group Average Score . . . 71
    5.3.4 Group Adjusted Average Scores . . . 72
  5.4 Overall Observation and Analysis . . . 73
6 Conclusions and Future Work . . . 74
Bibliography . . . 77
A Appendix . . . 81
  A.1 Graphs and Tables of Results . . . 81
    A.1.1 Group Composition 1 . . . 81
    A.1.2 Group Composition 2 . . . 85
    A.1.3 Group Composition 3 . . . 89
    A.1.4 Group Composition 4 . . . 93
    A.1.5 Group Composition 5 . . . 97
    A.1.6 Group Composition 6 . . . 101

List of Figures

2.1 The Payoff Matrix of the Prisoner’s Dilemma (PD) game . . . 10
4.1 A Use Case representation of Simulator . . . 29
4.2 High-Level Structure of the Simulator . . . 32
4.3 Structure of the User Interface . . . 33
4.4 Structure of the Experiment Manager . . . 35
4.5 Structure of the Simulation Engine . . . 37
4.6 Agent Reasoning Architecture . . . 39
4.7 Nested Simulation Control Structure . . . 41
4.8 Control Structure of Simulation Experiment . . . 43
4.9 Control Structure of a Tournament . . . 44
4.10 Control Structure of a Round . . . 45
4.11 Control Structure of an encounter . . . 46
4.12 About component of the Simulator . . . 47
4.13 SetUp pane of the Simulator . . . 49
4.14 Agent pane of the Simulator . . . 50
5.1 Agents’ performance for different ability levels in group composition 1 . 57
5.2 Agents’ performance for different ability levels in group composition 2 . 59
5.3 Agents’ performance for different ability levels in group composition 3 . 61
5.4 Agents’ performance for different ability levels in group composition 4 . 63
5.5 Agents’ performance for different ability levels in group composition 5 .
65 5.6 Agents’ performance for different ability levels in group composition 6 . 67 5.7 Failure Percentage distribution for different group composition . . . . . 69 5.8 Group Total Score: 3 vs. 4 . . . . . . . . . . . . . . . . . . . . . . . . . 70 5.9 Group Total Score: 5 vs. 6 . . . . . . . . . . . . . . . . . . . . . . . . . 70 5.10 Group Avg Score: 3 vs. 4 . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.11 Group Avg Score: 5 vs. 6 . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.12 Group adjusted average scores . . . . . . . . . . . . . . . . . . . . . . . 72 A.1 Agents’ performance for different ability levels in composition group 1 . 83 A.2 Agents’ performance for different ability levels in composition group 2 . 86 A.3 Agents’ performance for different ability levels in group composition 3 . 90 A.4 Agents’ performance for different ability levels in group composition 4 . 94 A.5 Agents’ performance for different ability levels in group composition 5 . 98 A.6 Agents’ performance for different ability levels in group composition 6 . 102 vi Acknowledgements Foremost, I gratefully thank my supervisor, Dr Jernej Polajnar, for his continuous support of this research, for his patience, motivation, and immense knowledge in this field of study. The constructive comments and guidance greatly helped to make this research possible. My sincere thanks also go to Dr Desanka Polajnar, for her invaluable advice and insightful suggestions to this work. I am thankful to my committee members Dr Chris Opio and Dr David Casperson for their interest in serving on my supervisory committee. vii This Thesis is dedicated to my family and friends Ophelia Akyiaa, Felonie Johnson, Stephen A Takyi, and Mehul Solanki for their endless advice, support, and motivation during the difficult times of this research. viii Chapter 1 Introduction An important aspect of contemporary technological developments is the rise of artificial intelligent systems that are built to autonomously perform increasingly complex and responsible tasks delegated to them. They are often built as multiagent systems (MAS) [Wooldridge, 2009], that is, they consist of interacting intelligent and autonomous units, called agents, which perform individual subtasks but also work together towards common objectives of the system as a whole. Specialized individual agents may be independently designed, and the system may be placed in an operating environment whose characteristics are not entirely known in advance. In anticipation of such circumstances, robust design should enable individual agents to develop cooperation in real time, choosing appropriate strategies depending on the task requirements, perceived environment, and observed behaviour of other agents. These considerations lead in part to a more general question of how and when cooperation develops in a society of self-interested intelligent individuals. This problem, with some variations, is of common interest to many fields, such as evolutionary biol1 ogy and its generalizations to human society, psychology, political science, economics, and philosophy. The primary mathematical tool employed is game theory, and much of existing research is centred around variations of a single two-player game, known as Prisoner’s Dilemma (PD). 
The paradox of PD is that both players receive higher rewards (‘payoffs’) when they choose to cooperate with each other than when they both refuse to cooperate (‘mutual defection’), and yet the game-theoretic optimality criteria (dominant strategy, Nash Equilibrium) lead to mutual defection. This is caused by the fact that the highest payoff is provided to the player who defects against a cooperator, and the lowest for unreciprocated cooperation [Axelrod, 2006]. As aptly observed in [Grinberg et al., 2010], “because of this the PD game represents a conflict between individual and collective rationality.” However, subsequent research revealed that the pessimistic social implications of the initial PD analysis can be overcome by many different techniques, including iterated play among multiple agents with different strategies, controlled acquisition of knowledge about strategies of other players, etc. [Kretz, 2011, Takahashi, 2010, Axelrod, 2006]. The present study is motivated by the observation that cooperation in many natural and artificial systems depends on additional parameters that are largely absent from existing PD-centred research literature. One such aspect is the balance between individual activity and interaction with others. In many areas, such as academic study or research, one can make progress by working individually and by working with others, with the balance often guided by rational factors other than abstract ‘selfishness’ or ‘altruism’. Furthermore, the agents may differ in their levels and types of ability. The decisions on whether, when, and with whom to cooperate may well be informed by one’s own ability and that of the prospective partner. However, the idea that payoffs may 2 depend on characteristics of the players and not only on the game itself does not fit with the fixed-payoff matrix of PD and is largely absent from PD-based research. A practical multiagent system design usually needs to maintain a balance between the individual success of self-interested agents in addressing the individual objectives associated with their subtasks, and the success of the group as a whole in completing the overall task. This type of balance is insufficiently explored as most studies either exclusively focus on maximizing individual payoffs ([Takahashi, 2010, Kretz, 2011]) or emphasize group interest ([Grinberg et al., 2010]). This thesis investigates all three of the above issues within a unified game-theoretic modelling and simulation framework. The framework implements and uses the ABC and EABC games defined in [Polajnar and Polajnar, 2018] specifically for use in this research. ABC is a two-player game in which players can cooperate or defect. It is derived from a base PD game. In every encounter, the players’ payoffs depend on the individual abilities of the two participating players. EABC combines an agent’s ABC payoffs with gains from the agent’s individual activity to form the total individual score. The framework simulator allows the experimenter to define a group composed of players with different abilities, that employ a number of different strategies. They play a round-robin tournament that may be iterated if required. We adopt the iterated round-robin tournament (IRRT) structure as opposed to the more commonly used Axelrod tournament structure. The calculated tournament scores reflect individual success and group success. One purpose of the framework is to investigate the properties of different strategies by which agents decide to cooperate or defect. 
The ‘naive’ strategies include naive 3 cooperators, who always cooperate, and naive defectors, who always defect. The ‘advanced’ strategies have a certain knowledge about the partner. For instance, they may involve knowledge of the partner’s strategy type, or of the partner’s ability level. In this work, we are not concerned with how such knowledge is acquired (there are different techniques explored in the literature) but rather what strategies one can formulate using a specific type of knowledge and how such strategies impact the performance of individuals and groups. The framework allows methodical experimental comparative studies across differently composed groups of agents with varying ability levels and individual strategies. Individual performance can be measured by individual average scores from a completed tournament, in case of groups of same size and similar structure, or by an approximate measure called ‘adjusted average’ that is readily comparable across groups of different sizes. Group performance may be measured by a ‘social welfare’ function such as the group average or the group adjusted average, or by a ‘failure rate’ based on an adopted threshold representing the minimum acceptable individual performance. This allows the experimenter to examine the profile of each individual strategy with respect to its impact upon the individual and group performance. As an illustration, we provide experimental studies of six different group compositions and observation of how individual strategies perform in different contexts. The rest of this thesis is structured as follows. We review the background and other related work together with their limitations in Chapter 2. In Chapter 3, we outline and motivate the problem that we intend to address in this research. We present the models and the simulation framework including its requirements definitions, structure, and behaviour in Chapter 4. The experimental results, analysis and evaluation are presented in Chapter 5. Chapter 6 presents the conclusions and future work. 4 Chapter 2 Background and Related Work This chapter presents the necessary background information and an overview of related previous work in multiagent systems, the evolution of cooperation, game theory, normal-form games, Prisoner’s Dilemma (PD), PD tournament structures, and PD evolutionary models. 2.1 Multiagent Systems Shoham and Leyton-Brown [2009] define multiagent systems as “systems that include multiple autonomous entities with either diverging information or diverging interests or both.” In another study by Wooldridge [2009] that focused mostly on artificial systems, multiagent systems are defined as “systems in which many rational and intelligent agents interact with each other in an environment.” 5 By definition, an agent according to [Ferber and Weiss, 1999] “can be a physical or virtual entity that can act, perceive its environment (in a partial way) and communicate with others, is autonomous and has skills to achieve its goals and tendencies.” Russell and Norvig [2010] in another study consider agents to be rational entities and therefore define a rational agent as “one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome.” In general, agents are autonomous entities that act upon their environment to satisfy their design objectives. Apart from their autonomous behaviour, there are other characteristics that define agents. 
Wooldridge and Jennings [1999] identify common characteristics of agents such as responding to changes in their environment (reactivity), ability to initiate goals, plans and perform goal-oriented actions (proactiveness) and the capability to communicate with other agents or human entities within or outside their environments (social ability). According to Russell and Norvig [2010], the environments that agents might be situated in can have some general properties that are used in their classification. For instance, an environment may be deterministic or non-deterministic. A deterministic environment is one in which the next state of the environment is completely determined by the current state and the action executed by the agent. A non-deterministic environment is one in which the actions of agents on the environment in a given state may result in different possible outcomes, but no probabilities are attached to them. An environment is static if it can change only as a result of agent’s action, or dynamic if it can change due to other processes that influence the environment beyond the agent’s control [Russell and Norvig, 2010]. An environment is considered accessible if the agent can obtain complete information about the environment, and inaccessible 6 otherwise. An environment is classified as discrete when the sets of possible actions and percepts are finite and fixed, or continuous otherwise [Parsons and Wooldridge, 2002]. 2.2 Evolution of Cooperation Political scientist Robert Axelrod and evolutionary biologist Hamilton used gametheoretic modelling to demonstrate how cooperation arises among members of the same species and even among different species [Axelrod and Hamilton, 1981]. Evolution of cooperation studies how within a population of self-interested individuals cooperation can emerge and persist [Axelrod, 2006]. Over the years, studies on the evolution of cooperation have seen applications in fields such as economics [Friedman, 1998], political science [Axelrod, 2006], evolutionary biology [Axelrod and Hamilton, 1981], and multiagent systems. A formal framework to study the evolution of cooperation is game theory, and in particular its developing sub-discipline of evolutionary game theory [Weibull, 1997]. 2.3 Game Theory Game theory, formally introduced by von Neumann and Morgenstern [1944], studies interactions between self-interested agents and further provides the necessary analysis to determine how a player should act in a given situation [Aumann, 1989]. Applications of game theory are being recognized in fields such as economics [Owen, 2013], evolutionary biology [Ernst, 2009], and more recently in multiagent encounters where 7 the overall outcome depends critically on the choices made by all agents in the scenario [Shoham and Leyton-Brown, 2009]. Also, principles from game theory could be applied in developing cooperative and competitive multiagent systems [Pendharkar, 2012]. Games discussed in this study belong to the class of non-cooperative games, which means they model interactions of individual self-interested players which receive individual payoffs as a result of the game. In Non-Cooperative Game Theory, we consider the decision theory for more than one agent and all agents act autonomously without any binding agreements. The key assumption is that every player’s goal is to maximize its individual payoff. 
Depending on the game structure this may lead to adversarial relations (as in zero-sum games), cooperation (as in coordination games), or in most cases to interactions with elements of both conflict and cooperation.¹

¹ In this study, games are not necessarily viewed as adversarial, and so we prefer to use the term “partner” instead of “opponent.”

Definition 2.3.1 (Normal-Form Game): According to Shoham and Leyton-Brown [2009], a (finite, n-person) normal-form game is a tuple (N, A, u), where:

• N is a finite set of n players, indexed by i;
• A = A1 × · · · × An, where Ai is a finite set of actions available to player i. Each vector a = (a1, . . . , an) ∈ A is called an action profile;
• u = (u1, . . . , un), where ui : A → ℝ is a real-valued utility (or payoff) function for player i.

A two-player normal-form game with two strategy options, called Cooperate and Defect, can be represented by a 2 × 2 payoff matrix, as shown in Table 2.1.

                              Player j
                       Cooperate       Defect
Player i   Cooperate   (bi, bj)        (di, aj)
           Defect      (ai, dj)        (ci, cj)

Table 2.1: Payoff-matrix representation of a 2 x 2 game in normal-form

The row player i and column player j receive the payoffs bi and bj respectively when they both cooperate. Similarly, when they both defect, player i receives ci and player j receives cj. In the case where player i defects while player j is trying to cooperate, player i gets the payoff ai and player j gets the payoff dj. On the other hand, if player j defects while player i is cooperating, then player i gets di as the payoff and player j gets aj as his payoff [Beckenkamp et al., 2007]. The two-player matrix game in normal-form shown in Table 2.1 becomes the Prisoner’s Dilemma game if and only if the following conditions are met for both player i’s and j’s payoffs:

a > b > c > d    (2.1)

and

2b > a + d    (2.2)

A normal-form game can exist as either symmetric or asymmetric. According to Shoham and Leyton-Brown [2009], a two-player two-action normal-form game is called a symmetric game if it has the form:

          A        B
   A    x, x     v, u
   B    u, v     y, y

The requirements of symmetric games are that players do not have distinct roles in the game and that the payoffs of individual players do not depend on their identities. Based on this definition, one can say that the general 2 × 2 game in normal-form presented in Table 2.1 is symmetric when the indexed payoffs are equivalent to each other (for example, ai = aj for all i ̸= j) [Beckenkamp et al., 2007]. However, in the asymmetric case, the payoff for a player does depend on the players’ identities and players do have distinct roles. Considering the general 2 × 2 game in normal-form presented in Table 2.1, one possible way it can exist in asymmetric form is when at least one of the payoffs ai to di differs from the corresponding payoff in aj to dj [Beckenkamp et al., 2007].

Prisoner’s Dilemma: The narrative of the Prisoner’s Dilemma describes two arrested criminal suspects who are placed in separate rooms with no way to communicate with each other. Without knowing the other’s intention, each player has to make a decision whether to betray (Defect) or remain silent (Cooperate) [Wooldridge and Jennings, 1999]. The rewards and conditions for the Prisoner’s Dilemma game are shown in Figure 2.1.

            C(2)      D(2)
   C(1)    R, R      S, T
   D(1)    T, S      P, P

   where T > R > P > S and 2R > (T + S)

Figure 2.1: The Payoff Matrix of the Prisoner’s Dilemma (PD) game

The highest reward is for defection when the partner cooperates, and the lowest reward is for cooperation when the partner defects.
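The defining inequalities (2.1)–(2.2), or equivalently the T, R, P, S conditions of Figure 2.1, are easy to check mechanically. The following short Python sketch is an illustration added here (it is not code from the thesis simulator) that tests whether a given set of payoffs constitutes a Prisoner’s Dilemma.

```python
def is_pd(a, b, c, d):
    """Conditions (2.1)-(2.2) for one player's payoffs in Table 2.1:
    a: defecting against a cooperator, b: mutual cooperation,
    c: mutual defection,              d: cooperating against a defector."""
    return a > b > c > d and 2 * b > a + d

def is_pd_trps(T, R, P, S):
    """The same test in the T, R, P, S notation of Figure 2.1."""
    return T > R > P > S and 2 * R > T + S

# Example: the commonly used payoffs T = 5, R = 3, P = 1, S = 0 satisfy both conditions.
assert is_pd_trps(5, 3, 1, 0) and is_pd(5, 3, 1, 0)
```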
The rewards for mutual cooperation (R, R) are higher than the rewards for mutual defection (P, P). When one player cooperates and the other defects, the defector gets the highest payoff, T, and the cooperator gets the lowest, S. Defection is the dominant strategy, which means that regardless of the other player’s choice, the defecting player will do as well or bet10 ter by defecting than by cooperating. Mutual defection is the only Nash Equilibrium, meaning that defection is always the best response to defection by the other player. The only Pareto Optimal strategy profile will be to mutually cooperate; this means that we cannot move from mutual cooperation to any other strategy profile without making any player worse off. An iterated version of the PD game is called Iterated Prisoner’s Dilemma (IPD). In the classical IPD game, players face each other in repeated encounters and therefore have to choose their mutual strategic actions (Cooperation or Defection) repeatedly with the sole motive of increasing their payoffs. The main assumption of the IPD game is that each competing player is totally unaware of the partner’s thoughts except for the moves made by the partner during their previous encounters, which the player may or may not keep in its internal memory depending on its strategy. 2.4 PD Tournament Structures 2.4.1 Axelrod’s Tournament The IPD was first formalized as a public tournament in 1980 by Axelrod to study how cooperation will evolve among self-interested agents. The Axelrod’s tournament has since been perceived as the standard model for the evolution of cooperation [Axelrod, 2006]. In the tournament, each of the n players meet each other player exactly once and in that meeting plays a long series of PD games [Axelrod, 2006]. In the 11 Axelrod’s tournament, varied participating strategies competed against one another and, “Tit-For-Tat”, a simple strategy where a player cooperates first, and then for subsequent games duplicates the partner’s last known action emerged as the winner. Analyses have shown that the success of “Tit-For-Tat” could be mostly attributed to the presence of other strategies in the tournament that were more inclined to cooperate [Axelrod, 2006]. Axelrod’s tournament scheduling, when compared to real life situations where PD game theory might apply has two unrealistic assumptions. These are the absence of information about the behaviour of players other than the current partner, and the scheduling of games where each participant plays a long series of PD games against a single partner and never meets the same partner again [Axelrod, 2006]. 2.4.2 Random Scheduling Tournaments Variations of IPD redefine the settings under which players compete with one another. Random scheduling abstraction proposes an arrangement where players are randomly matched against one another to compete in an IPD tournament. Random scheduling was implemented by Takahashi [2010] to study the sustainability of cooperation in a large community when players are uniformly randomly matched in repeated encounters. Players in each round decide whether to cooperate or defect independent of their own past actions but refer to the given immediate past action of their scheduled partners. The results of the study by Takahashi was that in uniformly randomly matched encounters cooperation can be sustained by an equilibrium [Takahashi, 2010]. 
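The uniform random matching used in such studies can be illustrated by a brief sketch (an added example, not code from the cited work): in each round the population is shuffled and split into disjoint pairs, so that a player’s partner changes randomly from round to round.

```python
import random

def random_matching_round(player_ids):
    """One round of uniformly random matching: shuffle the (even-sized)
    population and pair adjacent players, giving a uniformly random
    perfect matching that is redrawn independently every round."""
    ids = list(player_ids)
    random.shuffle(ids)
    return [(ids[k], ids[k + 1]) for k in range(0, len(ids), 2)]

# Example: one round's pairing of ten players labelled 0..9.
print(random_matching_round(range(10)))
```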
In another study, Heller and Mohlin [2016] showed that cooperation can still be sus12 tained in randomly matched repeated PD environments when in every encounter players are given a fixed number of random partner’s past actions without an assumed time zero at which interaction starts. Observations from the study by Heller and Mohlin was that in random encounters cooperation could evolve and persist even among players that are committed to specific strategies. An actual experiment was conducted at Jiatong University’s Smith Economic Lab, Shanghai, by Chong et al. [2007] to investigate how information and reputation help induce cooperation in a random-matched IPD game. In their research, 144 undergraduate students were divided into groups of 12 and randomly matched to play 60 rounds of IPD with a payoff structure similar to the classic Axelrod’s tournament but with a higher incentive for defection and an ability to request up to 10 levels of partner’s preceding actions. The experimental results indicate that players are more cooperative towards partners that have a high reputation to cooperate. 2.4.3 Round-Robin Tournament Scheduling A complete round-robin tournament of n agents is a tournament in which every agent plays the remaining n - 1 agents. Compared to other scheduling abstractions such as Single Elimination tournaments where pairs of players are matched according to an initial seeding, and the winners of these pairs advance to the next round, while the losers are eliminated after a single loss [Kim et al., 2017], or the Double Elimination tournament where no player is eliminated until he has lost two games [Glenn, 1960], round-robin tournament scheduling provides equal advantage or opportunity for every player to meet all other partners in the game without any form of elimination. 13 Also, the round-robin tournament scheduling is considered to be the fairest way to schedule competitions since there are no elements of luck nor arbitrariness. The final results of round-robin tournaments are accurate because the scheduling abstraction presents results of long period interactions against equal competitions. [Rasmussen and Trick, 2008]. 2.4.4 Iterated Round-Robin Tournament Scheduling The Iterated Round-Robin tournament (IRRT) scheduling abstraction describes a repeated version of the round-robin tournament where each participant meets all other players in turns for a fixed number of times and for each round they play a single PD game. Various studies into repeated round-robin tournaments have been performed to analyze different forms of repeated Prisoner Dilemma games. One of such is the study by Kretz [2011] which explored the development of cooperation among different agent strategies with varying memory capacities and payoff structures. Kretz [2011] implements the Iterated Round-Robin Tournament scheduling to investigate how cooperation and defection will emerge for different numbers of iterations, payoff matrices, and memory sizes of strategies in a study on the emergence of cooperative behaviour. Through computer simulations, the scheduling discipline is a modified Iterated Round-Robin Tournament where strategies play with partners and themselves but with knowledge of up to three levels of the most recent preceding actions of future partners and up to two levels of their own. The conclusion is that payoff matrices have much influence on strategies’ performances and also affect the evolution of cooperation. 
The study restricts the look-up of a partner’s past moves to only three levels, and strategies lack the opportunity to autonomously decide on the 14 number of partner’s past moves required to make a decision. 2.5 PD Evolutionary Models Different studies of the Prisoner’s Dilemma game consider the impact of information gathering on the evolution of cooperation. The Axelrod’s tournament assumes that each competing player is only concerned with the current partner and once the iterated encounter with the same partner ends the player loses all information acquired from the interactions and move on to compete against another competitor. However, there are other models that study the impact of information gathering. Kretz [2011] in his research considered a model that studies the evolution of cooperation among artificial agent strategies in a repeated round-robin tournament of the PD game. A strategy with a memory size n has n + 1 sub-strategies to define the action in the first, second, . . . nth , and any further iteration. Players are given knowledge of up to three levels of the most recent preceding actions of their future partners and up to two levels of their own most recent actions. At the end of every tournament, agents whose total payoffs are below the overall average payoff are eliminated while the remaining agents evolve into the next tournament of the competition. The evolutionary tournament ends if only one strategy remains or all remaining strategies have the same cumulative payoffs. Kretz [2011] in his study observed the results of evolution of different strategies for different levels of Own-to-Opponent memories. 15 In all simulation experiments, there were 10,000 iterations of the round-robin tournament and the results showed that different agent strategies emerged winners for different levels of memory combinations and utility payoffs. Grinberg et al. [2010] also propose a model to study the evolution of cooperation among human participants. The main aim of that model is to represent and to predict existing game dynamics and also study the evolution of cooperation among 10 participants randomly paired to play 100 rounds of the PD Game. These individuals are randomly assigned 5 agent strategies and competed at different Cooperative Index (CI) levels from 0.1 to 0.9. The CI level of a game is defined to be (R - P) / (T - S). Using agents’ total payoffs and the number of mutual cooperations as measures of success, agents evolve after every 100 rounds of play and the top 5 agents procreate and proceed to the next tournament whilst the bottom 5 agents are eliminated. Grinberg et al. [2010] argue that with the ability for agents to predict future subjective payoffs for both cooperation and defection moves there will be a strong positive correlation between CI levels and evolution of cooperation among independent self-interested agents. 16 Chapter 3 Problem Statement Advancements in distributed intelligent systems have enabled us to delegate increasingly complex tasks to heterogeneous multiagent systems that may involve software agents and robots of various types in interactions that usually include cooperative behaviour. In particular, this motivates investigation into how a society of independently designed artificial agents can be constructed to perform a specific task in a dynamic environment whose behaviour is not fully predictable. 
Such agents should be self-interested enough to ensure that their design objectives are achieved (individual success) but must be enabled to proactively develop cooperation through interactions with other members of the society in order to achieve the overall group goal (group success). The predominant part of modelling and simulation research on the development of cooperation in multiagent systems has been based on game-theoretic concepts, primarily on the Prisoner’s Dilemma (PD), and to a lesser extent on other two-player games (for 17 example, public goods [Santos et al., 2008], snow drift [Souza et al., 2009]). To the best of our knowledge most such studies have emphasized either individual success ([Axelrod, 2006], [Kretz, 2011]) or group success ([Santos et al., 2008]). There have not been many studies that balance these two performance criteria. Yet in practical task-oriented systems individual and group success are often closely related, complement and influence each other. Our study aims at combining both measures of success within a single model and exploring both types of system performance within the same setup of a simulation experiment. In many practical systems, the participating agents will have different levels of ability and methods of decision making. This is not adequately reflected in existing systems of game theoretic studies. Some models (for example, [Ridinger and McBride, 2016]) include abilities conducive to better strategy choices but do not model the skills relevant to efficiency in mainstream activities. Yet we observe that in many agent groups such mainstream abilities of individual agents impact the interaction patterns and the resulting individual and group success. An illustrative example that motivates our research is a class of students in a software course, where each individual develops a piece of software to the same specification (for example, a compiler for a specified language), done as a sequence of phases. The students are allowed to cooperate by exchanging testing strategies and test cases, with the intent to speed up the group progress. Thus, the progress in design and coding is measured individually and reflects the individual’s ability, while the progress in testing depends also on the patterns of cooperation among players of varying individual abilities. The total progress of an individual in class is a balanced combination of the two components. 18 In each phase of the project, any two students have an opportunity to interact, analogous to a round-robin tournament; and since the project has a sequence of phases, an Iterated Round-Robin Tournament (IRRT) arises as a natural encounter schedule. This is one of many examples where for realistic modelling of repetitive interactions IRRT emerges as preferable to the widely used Axelrod tournament [Axelrod, 2006] pattern, in which each pair of players meet once and play a long series of games. We further note that the example motivates efforts to develop strategies that enhance both individual and group success. While individual success may naturally be measured by players’ individual scores, group success could be measured in a number of ways. Immediate possibilities in the current example are the class average score (that is, the ‘social welfare’ concept of game theory) and a low failure rate with respect to an established failure threshold modelling the minimum acceptable level of individual success. 
In realistic agent societies where there are cooperation and defection interactions, it is possible that agents may acquire a certain level of knowledge about their partners and may use such information to achieve their objectives. We intend to introduce and study the impact of information gathering among different combinations of agent strategies. Most models in the reviewed literature implement information gathering through direct observance of partner’s action during interactions (for example, [Axelrod, 2006]) or access to a central historical repository (for example, [Takahashi, 2010], [Kretz, 2011], [Grinberg et al., 2010]). Our study includes strategies based on knowledge of other agents’ properties but does not address the underlying knowledge acquisition mechanisms. We are interested in the strategic effect of information gath- 19 ering and not concerned with the mechanisms and techniques of acquiring information. When pursuing the above research ideas in a game-theoretic setting, we note specific limitations of the Prisoner’s Dilemma as the basic underlying game structure. The first observation is that all scoring in PD-based models typically comes from interaction, while progress in real systems may also be achieved through individual activity without interaction. Second, central to the dilemma in question is the fact that mutual cooperation is more rewarding than mutual defection while an even higher reward is collected by defecting against the cooperator, with the latter facing the lowest payoff. This ranking of interaction payoffs may not be realistic in all contexts. In particular, when agents perform their mainstream tasks individually, with varying abilities, they may or may not benefit from a specific instance of mutual cooperation (as might occur in the illustrative example above). The main objective of this research is to establish a game-theoretic modelling and simulation framework that supports methodical experimental exploration of how different strategies of cooperation vs. defection impact both individual and group success in multiagent systems with different agent abilities. The agents perform individual activities and also interact with other agents, with both components contributing to their individual scores. The two-player game chosen to model the basic encounter must incorporate the influence of individual abilities upon player payoffs. The design of agents’ behaviour should allow advanced strategies that take advantage of knowledge about partners’ properties (notably, their strategies or individual abilities). The framework should let the experimenter evaluate the performance of different strategies with respect to their impact upon individual or group success, and also compare how a specific strategy performs in groups with different strategy compositions. 20 Chapter 4 The Modelling and Simulation Framework This chapter presents an asymmetric two-player cooperation game, its model design, and the architecture of the software simulator used to implement the game. Section 4.1 describes the Ability-based Cooperation (ABC) game and the Extended ABC (EABC) game as defined in [Polajnar and Polajnar, 2018]. The architecture of the simulator in terms of its requirements (purpose, definition, functional and non-functional, and component requirements) is explained in Section 4.2. Following that, we consider the structure of the simulator in Section 4.3 and present the behaviour of the different components of the simulator architecture in Section 4.4. 
Section 4.5 details some notes on simulator implementation while Section 4.6 elaborates our different approaches to experimentation.

4.1 The Ability-Based Two-Player Games

4.1.1 Introduction

The Ability-based Cooperation (ABC) game is an asymmetric two-player game derived from the Prisoner’s Dilemma (PD), in which the payoffs depend on the abilities of participating players. In general, an ABC game is derived from a “base” PD game and an assignment of abilities to players. As in PD, the players of ABC choose between cooperation and defection, but the payoffs depend on players’ abilities, and thus the ability distribution impacts the player motivation and outcomes. An ABC’s payoff matrix may itself behave like PD in parts of the game’s parameter space. We further introduce and study the notion of the Extended ABC (EABC) game, in which players pursue ability-based individual activities and also interact through the ABC game.

4.1.2 Motivation

The general idea is that in a system of n agents with different abilities, each agent performs an individual activity whose reward is proportional to the agent’s ability. In addition, the agents interact with each other playing an asymmetric two-player game in which they can cooperate (C) or defect (D). The payoffs for cooperation or defection are influenced by the abilities of the individual partners. The agent’s total score is a weighted sum of its earnings through individual activity and its payoffs from the interactions with other players.

For an illustrative example, consider a class of students in a software course, where each individual develops a piece of software to the same specification (for example, a compiler for a specified language), done as a sequence of phases. The progress in design and coding is individual and reflects the individual ability. The students are allowed to cooperate by exchanging testing strategies and test cases, with the intent to speed up the group progress and add a cooperative dimension to the software development activity. The progress in testing thus partly depends on cooperation patterns. In game-theoretic terms, the class activity can be represented as an iterated tournament of scheduled mutual EABC encounters, with each project phase corresponding to an iteration. In each encounter, a student can receive test information from the other player (C) or receive nothing (D); likewise, the student can provide test information to the other player (C) or provide nothing (D). The model establishes a framework for comparative studies of how different ability-based strategies impact cooperation and how they affect the individual and group success in a multiagent system.

4.1.3 The Ability-based Cooperation (ABC) game

Definition

An ABC game is constructed from a given Prisoner’s Dilemma (PD) game G, with ability values assigned to its players. The construction is as follows. We adopt as the base PD game any Prisoner’s Dilemma game for players i and j:

                         Player j
                      C           D
Player i   C        R, R        S, T
           D        T, S        P, P

that satisfies the following two conditions:

T > R > P > S = 0    (4.1)

2R > S + T    (4.2)

Next, we assign positive real values ai and aj, called abilities, to the players i and j respectively. Then the ABC game Gij is the two-player game with the following payoff matrix:

                            Player j
                      C                      D
Player i   C        aj R, ai R             S, ai R + aj P
           D        aj R + ai P, S         ai P, aj P

Interpretation

The ABC game represents a situation in which each player pursues an individual activity and also interacts with the other player.
The ABC game formally describes the interaction and resulting payoff. The Extended ABC (EABC) game, to be defined next, also includes the scores from the individual activity, and defines a balanced sum of individual activity score and the ABC game payoff as the total score. The intuitive motivation and interpretation for the ABC game definition are situated in that context.

4.1.4 The Extended Ability-based Cooperation (EABC) game

Definition

An EABC game Eij is a quadruple Eij = (G, ai, aj, α), where G is a base PD game, ai and aj are positive real numbers representing player abilities, and α ∈ [0, 1] is the balancing factor. The triple (G, ai, aj) is used to construct the ABC game Gij. The score of player i is then defined as

si(j) = α ai + (1 − α) pi(j)    (4.3)

where pi(j) is i’s payoff from its Gij game with j, and the value of α is used to balance the relative impact of the two components.

4.1.5 The Design of ABC Payoff Matrix

Let G be a PD game with payoff parameters T > R > P > S = 0. For each pair of players (i, j), the asymmetric game Gij is defined with payoffs for i as follows:

Rij = aj R    The idea is that the higher the ability of the cooperation partner j, the higher is i’s reward for mutual cooperation with that partner.

Pij = ai P    By not engaging in cooperation, the player i saves some resources (for example, time) that can be used for individual progress, commensurate with i’s ability (as expressed by the coefficient ai).

Tij = aj R + ai P    When i defects while j cooperates, i gets both the advantage of j’s cooperation (as in Rij above) and the advantage of its own non-engagement (as in Pij above).

Sij = 0    i does not benefit from this situation; all other payoffs are positive.

The definitions of Rji, Pji, Tji, and Sji (the payoffs for j) are symmetric to the above.

4.1.6 Game analysis

The ABC game has been derived from Prisoner’s Dilemma and bears some similarity to it. Unlike the usual version of PD, ABC is asymmetric; however, asymmetric versions of PD have also been studied [Ahn et al., 2007, Sheposh and Gallo Jr., 1973, Murnighan, 1991]. The first question is when ABC is actually an asymmetric PD. According to the definition of asymmetric PD [Beckenkamp et al., 2007], ABC will be PD when all of the following conditions hold:

Tij > Rij > Pij > Sij    and    Tji > Rji > Pji > Sji    (4.4)

2Rij > Tij + Sij    and    2Rji > Tji + Sji    (4.5)

After the substitution of values from the ABC payoff matrix definition, it immediately follows that all four conditions in (4.4, 4.5) hold if and only if the following two conditions hold:

aj R > ai P    and    ai R > aj P    (4.6)

One can thus formulate the following result:

Proposition 1. Let Gij be an ABC game whose base payoff values for mutual cooperation and mutual defection are R and P respectively, and whose player abilities are ai and aj. Then Gij is an instance of Asymmetric Prisoner’s Dilemma if and only if

P/R < ai/aj < R/P    (4.7)

Let us now consider a round-robin tournament with n players (n > 1), where each player i plays one EABC game with every other player j according to a predefined tournament schedule. All players’ ability values belong to an interval [a, ā], where a > 0. The next proposition is immediate from (4.7).

Proposition 2. Consider a set of n agents (n > 1), playing in a round-robin tournament based on EABC games. Let each player i have the ability ai ∈ [a, ā], where ā ≥ a > 0. If

ā/a < R/P    (4.8)

then every ABC game played in the tournament is an Asymmetric Prisoner’s Dilemma game.
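To make the construction above concrete, the following Python sketch (an illustration, not the thesis simulator’s code) builds player i’s ABC payoffs from Section 4.1.5, computes the EABC score of equation (4.3), and evaluates the Proposition 1 condition (4.7).

```python
def abc_payoffs_for_i(R, P, a_i, a_j):
    """Player i's payoffs in the ABC game G_ij, keyed by (action_i, action_j)."""
    return {
        ('C', 'C'): a_j * R,            # R_ij: reward scaled by partner j's ability
        ('D', 'D'): a_i * P,            # P_ij: gain from non-engagement, scaled by own ability
        ('D', 'C'): a_j * R + a_i * P,  # T_ij: partner's cooperation plus own non-engagement
        ('C', 'D'): 0.0,                # S_ij: unreciprocated cooperation
    }

def eabc_score(alpha, a_i, p_i):
    """Equation (4.3): s_i = alpha * a_i + (1 - alpha) * p_i."""
    return alpha * a_i + (1 - alpha) * p_i

def is_asymmetric_pd(R, P, a_i, a_j):
    """Proposition 1 / condition (4.7): P/R < a_i/a_j < R/P."""
    return P / R < a_i / a_j < R / P

# Example: base PD with R = 3, P = 1, abilities a_i = 1.2 and a_j = 0.8, alpha = 0.5.
p_cc = abc_payoffs_for_i(3, 1, 1.2, 0.8)[('C', 'C')]   # i's payoff under mutual cooperation
print(eabc_score(0.5, 1.2, p_cc), is_asymmetric_pd(3, 1, 1.2, 0.8))
```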
4.2 Simulator Architecture

4.2.1 The Purpose of the Simulator

The purpose of our discrete-event simulator is to provide the framework necessary for the experimentation of a specific agent-based model involving the interaction of multiple agents engaged in a two-player asymmetric game structured in a round-robin tournament. This is a research simulator composed of different modules, some of which have been developed and implemented in other previous studies.

4.2.2 Simulator Requirements

The list of requirements for the simulator is divided into two main categories: functional requirements and non-functional requirements. In the rest of this section, different components of each category are described.

Requirements Definition

The requirements of the simulator outlined in this study provide an abstract description of the services provided by the simulator and the constraints under which the simulator operates. The requirements definition has been categorized into two different kinds: functional requirements and non-functional requirements. The remainder of this section elaborates each of these requirements definitions in the context of the simulator described.

Functional Requirements of the Simulator

The functional requirements of the simulator consider the requirements that are related to the conceptual functions of the simulator without considering how the features are interoperated. The Use Case diagram shown in Figure 4.1 describes the functional requirements of the simulator.

Figure 4.1: A Use Case representation of Simulator

1. Set up and Run Simulation: The experimenter should be able to set up and run a number of different simulation experiments to study the results of different measures of success. Setting up and running a simulation may include the following:

(a) Set up Simulation: This includes:

i. Agent Parameters: The experimenter should be able to select the desired number of agents required to run in the simulation experiment and assign to each agent its level of ability.

ii. Tournament Structure Parameters: The experimenter of the simulator must be able to choose the number of tournament iterations for each experiment, the base payoff matrix, and the balancing factor.

iii. Game Parameters: The simulation environment shall comply with other game parameters such as the information gathering approach and the approach to evolution specified for each experiment.

(b) Run Simulation: The experimenter will be able to run a simulation experiment. This functional requirement includes the ability of the simulator to start and stop the simulation experiment at any time at the request of the experimenter in order to observe simulation results. Also, the simulator shall comply with the experimenter’s decision to step through the simulation experiment. To step through the simulation run means the simulator simulates one round and then stores a snapshot for that round until the experimenter decides to perform another step, run, or stop the simulation.
(c) Save Simulation: The experimenter shall be able to save the results of simulation experiments in terms of agents’ performance as statistics, progressive charts, or simulation logs.

2. Display Simulation Results: The experimenter should be able to display the results of each simulation experiment. The Use Case depiction of this functional requirement includes the following:

(a) Display Graph of Simulation Results: The simulator shall be able to comply with the experimenter’s request to display a graphical representation of the simulation results indicating, for each tournament, agents’ scores, average score, and maximum and minimum scores.

(b) Display Simulation Logs: The simulator shall, upon request from the experimenter, provide access to a log of agents’ actions, strategies, and scores.

(c) Display Measures of Success Results: The experimenter of the simulator should be able to view the results for different measures of success such as individual and group success.

Non-Functional Requirements of the Simulator

The non-functional requirements of the simulator define the qualitative specifications, including its properties and constraints on the simulator as a whole. These requirements ensure the simulator’s quality and maintainability [Sommerville, 2011]. The simulator described in this thesis meets the non-functional requirements of Efficiency and the potential ability to execute Parallel Runs. The efficiency of the simulator measures its productivity, while parallel runs ensure faster simulation executions.

4.2.3 Structure of the Simulator

The high-level design of the simulator is described in this sub-section. We explore the organization of interrelated elements of the simulator to provide the basis for its detailed design and implementation.

Figure 4.2: High-Level Structure of the Simulator

The structure of the simulator can be divided into eight main components. These are the User Interface (UI), Experiment Manager (EM), SetUp Repository (SR), Simulation Engine (SE), Results Repository (RR), Historical Information Manager (HIM), Historical Information Repository (HIR), and Agent, as shown in Figure 4.2.

User Interface (UI)

As a very important component of the simulator’s design, the User Interface executes actively to ensure an easy and interactive use of the simulator by the experimenter. Before a simulation experiment, the User Interface provides an Input Interface for the experimenter to set up the number of agents, assign agents’ strategies, and initialize or update other simulation setup parameters in the SetUp Repository. At the end of a simulation, the experimenter can use the Results Interface sub-component of the User Interface to query and display specific results of the simulation. Figure 4.3 shows the interrelated components of the User Interface.

Figure 4.3: Structure of the User Interface

Experiment Manager (EM)

The Experiment Manager, upon activation from the User Interface, performs the function of validating assigned strategies, setting up ability levels and other simulation input parameters stored in the Setup Repository.
At the end of each experiment, the Experiment Manager retrieves the results from the Simulation Engine and stores them in the Results Repository to be displayed to the Experimenter.

The Experiment Manager can be functionally divided into two sub-components, as shown in Figure 4.4. These are the Agents Setup Manager and the Parameter Configuration Manager. The Agents Setup Manager validates the number of agents in the experiment and also ensures that there is an accurate assignment of strategies and agents’ ability levels as specified by the experimenter in the setup repository. The Parameter Configuration Manager, on the other hand, validates and assigns all other parameter inputs of the simulation environment, such as the payoff matrix, the number of tournament iterations, and the balancing factor stored in the setup repository. These validated inputs and assigned parameters are then sent to the Simulation Engine and Agent components respectively to begin the experiment. In the case of an invalid simulation input, the Experiment Manager alerts the experimenter through the User Interface to make the necessary corrections to invalid input parameters before the simulation begins.

Figure 4.4: Structure of the Experiment Manager

SetUp Repository (SR)

The SetUp Repository (SR) serves as a storage component for configuration values. Inputs for the experiment are stored in the SetUp Repository to be accessed by the Experiment Manager for validation and assignments before simulation experiments begin.

4.2.4 Simulation Engine (SE)

The design of the Simulation Engine (SE) shown in Figure 4.5 assumes an active simulation component capable of supporting concurrent discrete-event simulation of our model defined in Section 4.1. Its main responsibility is to begin and control the execution of simulation experiments after receiving the input and a signal from the Experiment Manager. For each simulation run, the Simulation Engine starts by activating the Scheduler. The Scheduler plans events for execution and also provides the arrangement for agents to compete at their assigned time-steps while ensuring that the round-robin scheduling abstraction adopted for this model is maintained. The agent interaction schedule created by the Scheduler is then made available to all other sub-components of the Simulation Engine.

For each tournament in the simulation experiment, the Tournament Handler component determines the total number of rounds of games that must be played based on the number of agents competing in the tournament. In addition, the Tournament Handler ensures that each agent participates in an equal number of encounters before proceeding to the next tournament or the end of the simulation. In every tournament, agents engage in r rounds of encounters. The Round Handler is responsible for ensuring that for each round in a specific tournament participants are paired with different partners. Also, the Round Handler manages agents’ performance and payoff profiles for each round. The Match Manager provides the capacity for encounters scheduled in each round to be executed independently of one another. Also, it ensures that the right scores are assigned to every agent in the simulation experiment based on their actions and the payoff matrix.
Figure 4.5: Structure of the Simulation Engine

Historical Information Manager (HIM)

At the end of every encounter, the Historical Information Manager (HIM) receives the scores and actions of the paired agents from the Simulation Engine in order to update the Historical Information Repository. Furthermore, this active component of the simulator serves the function of responding to all queries by agents pertaining to information on current or future partners.

Historical Information Repository (HIR)

The Historical Information Repository (HIR) is essentially a storage unit for all strategies, abilities, scores, and past actions of agents in the simulation experiment. This passive component is updated at the end of every round by the Historical Information Manager. Information stored in the Historical Information Repository can only be accessed by agents after submitting requests to the Historical Information Manager.

Agent

An agent in this model is a goal-oriented, autonomous, and intelligent component of the simulator that competes by submitting actions to the Simulation Engine in every scheduled encounter. During every encounter against a partner, the agent perceives, reasons, and performs an action upon request from the Simulation Engine. In order to decide whether to cooperate or defect against a scheduled partner, the agent may request information about the partner, or about other agents in the tournament, by submitting a request to the Historical Information Manager. The agent then refers to its strategy algorithm and acts appropriately in order to achieve its desired goals. Figure 4.6 shows the interrelated components of the agent.

Figure 4.6: Agent Reasoning Architecture

4.2.5 Behaviour of the Simulator

This section discusses the behaviour of the essential components of the simulator. The behaviour of elements such as the Simulation Engine (SE), User Interface (UI), and Agent will be analyzed in the context of their functions, control operators, inputs, and outputs.

Simulation Engine

The design of the simulator follows a sequence of discrete events decomposed into a number of encounters that are executed in each round of a tournament by the Simulation Engine. The number of encounters scheduled per round and the number of rounds played in each tournament typically depend on the number of agents in an experiment run. At the beginning of the experiment, the agents selected by the experimenter are assigned their initial strategies, broadly categorized into Naive Cooperators, Naive Defectors, Selective Cooperators, and Limited Cooperators (Fixed and Ratio). Apart from the assignment of strategies, agents are also assigned individual ability levels. These agents are then scheduled in a round-robin tournament that requires them to submit actions in every repeated round until the end of the experiment. In experiments requiring multiple rounds, the simulator supports the ability of some agents to request and retain summaries of partners' information.
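The strategy categories just mentioned reduce to simple decision rules over whatever partner information an agent can obtain from the Historical Information Manager. The sketch below is a hypothetical rendering of those rules as they are defined in Section 5.1.1; the function names and the PartnerInfo record are ours, and a real agent would populate such a record from HIM queries rather than receive it directly.

from dataclasses import dataclass

COOPERATE, DEFECT = "C", "D"
R, P = 8, 2          # base reward and punishment payoffs used in the experiments

@dataclass
class PartnerInfo:            # hypothetical record of gathered partner information
    ability: float
    is_defector: bool         # whether the partner is known to play Naive Defector

def naive_cooperator(my_ability: float, partner: PartnerInfo) -> str:
    return COOPERATE                      # cooperates in all situations

def naive_defector(my_ability: float, partner: PartnerInfo) -> str:
    return DEFECT                         # defects in all situations

def selective_cooperator(my_ability: float, partner: PartnerInfo) -> str:
    # defects only against a partner identified as a defector
    return DEFECT if partner.is_defector else COOPERATE

def limitedc_fixed(my_ability: float, partner: PartnerInfo, threshold: float = 8) -> str:
    # cooperates only with partners at or above the fixed ability threshold of 8
    return COOPERATE if partner.ability >= threshold else DEFECT

def limitedc_ratio(my_ability: float, partner: PartnerInfo) -> str:
    # cooperates only when a_j * R / (a_i * P) >= 1, i.e. when the partner's
    # cooperation reward is not outweighed by the agent's own defection reward
    return COOPERATE if partner.ability * R / (my_ability * P) >= 1 else DEFECT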
A general control structure for the Simulation Engine is shown in Figure 4.7.

Figure 4.7: Nested Simulation Control Structure

In a specific simulation experiment, the Simulation Engine begins its function by activating the Scheduler, which initializes the current experiment configuration values received from the Experiment Manager (EM). After this, the Scheduler assigns to every agent an AgentId by which it is identified throughout the experiment. At the end of the experiment, the Scheduler calculates agents' total scores, arranges their performance in a leaderboard based on abilities, and signals the Experiment Manager to store the results in the Results Repository, while at the same time requesting the Experiment Manager to submit configuration values for the next experiment. The Scheduler's control structure is shown in Figure 4.8.

Figure 4.8: Control Structure of Simulation Experiment

After setting up configuration values and scheduling agents, the Scheduler invokes the Tournament Handler to initiate, coordinate, and control the activities of agents for every tournament specified in the current experiment. In an experiment with $n$ players, where $n$ is always an even number, the Tournament Handler ensures that there are $\frac{n}{2}(n-1)$ games in every tournament and that each player participates in $(n-1)$ games. At the end of a tournament, the Tournament Handler reports summary statistics of the maximum score, average score, minimum score, and players' progress. In multiple tournaments, the Tournament Handler ensures that all agents participate in an equal number of games before initiating the next tournament, as shown in Figure 4.9.

Figure 4.9: Control Structure of a Tournament

A single tournament may consist of multiple rounds of encounters, depending on the number of agents. Each round is managed by the Round Handler, which monitors the activities of agents in every round. The Round Handler ensures that every agent participates in a single encounter against a different partner in every round and submits a single action against its scheduled partner. The action can be either to cooperate or to defect. The control structure of the Round Handler is shown in Figure 4.10.

Figure 4.10: Control Structure of a Round

Encounters between agents and their partners are controlled by the Match Handler. The Match Handler initializes a new encounter, schedules the paired agents, and requests them to submit their actions. Based on the agents' submitted actions, the Match Handler assigns scores to the respective agents. At the end of every scheduled encounter, the Match Handler reports on the agents' strategies, actions, and scores. In addition, the Match Handler returns the calculated scores to both agents. Figure 4.11 describes the flow structure of the Match Handler and agents' actions during an encounter.

Figure 4.11: Control Structure of an encounter
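Read together, Figures 4.7 through 4.11 describe four nested loops: an experiment iterates over tournaments, a tournament over rounds, a round over encounters, and an encounter collects one action from each paired agent and converts the pair of actions into scores. The following fragment is only a schematic restatement of that nesting, with placeholder callables (schedule_rounds, play_encounter) standing in for the Scheduler and Match Handler; it is not the simulator's implementation.

def run_experiment(agents, num_tournaments, schedule_rounds, play_encounter):
    """Schematic of the nested control structure (Figure 4.7).

    schedule_rounds(agents) -> list of rounds, each a list of (i, j) pairings
    play_encounter(i, j)    -> (score_i, score_j) for one encounter
    """
    totals = {a: 0.0 for a in agents}
    for _ in range(num_tournaments):                  # Tournament Handler
        for round_pairs in schedule_rounds(agents):   # Round Handler
            for i, j in round_pairs:                  # Match Handler
                s_i, s_j = play_encounter(i, j)       # actions -> payoffs -> scores
                totals[i] += s_i
                totals[j] += s_j
    # Scheduler: leaderboard of total scores at the end of the experiment
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)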
The UI

The User Interface (UI) component of the simulator is responsible for providing interactive access between the experimenter and the simulator. As an important component of the simulator, the User Interface has the goal of ensuring easy, efficient, and user-friendly use of the system. The implementation of this component has multiple tabs that perform different functions: About, Setup, Agent, and Display. The functionalities of these tabs are briefly explained below.

About

The About tab provides a brief description of the simulator, a list of agent strategies, the experiment's input parameters, and other general information, as shown in Figure 4.12.

Figure 4.12: About component of the Simulator

Setup

The Setup tab, which is mainly controlled by the Experiment Manager, provides access for the experimenter to define, validate, and update the settings required for the simulation experiment. The experiment settings specified in the Setup tab include the payoff values (Temptation to Defect, Reward, Sucker's Payoff, and Punishment for Defection), the Number of Tournaments, the Information Request Limit, and the Uncertainty Level, which may be varied to study their effects on agents' performance and the evolution of cooperation. Alternatively, the experimenter may decide to upload the experiment's setup using the Command Line Interface (CLI). Figure 4.13 shows the Setup tab of the simulator.

Figure 4.13: SetUp pane of the Simulator

Agent pane

This pane provides the experimenter access to specify the total number of agents in the experiment as well as to select the different strategies (Naive Cooperators, Naive Defectors, Limited Cooperators (Fixed and Ratio), and Selective Cooperators) that must be assigned to agents. As shown in Figure 4.14, the Agent pane also specifies the minimum and maximum ability intervals. An experimenter may instead decide to specify the individual ability levels.

Figure 4.14: Agent pane of the Simulator

4.2.6 Approaches to Experimentation

The game-theoretic agent-based model described in this research makes it possible to study, through simulation experiments, agents' strategic behaviour and to measure their success in terms of individual and group performance. We examine these measures of success by considering the impact of agents' abilities on the development of cooperation and on individual and group measures of success, as described in our ABC model. We also consider the comparable effect of information gathering among agents with abilities in the defined EABC model.

Chapter 5

Experimental Results and Evaluation

This chapter presents the results and evaluations of the different simulation experiments conducted using the Extended ABC (EABC) model defined in Section 4.1. Our evaluation of agents' individual and group performance profiles highlights the potential impact of agents' abilities and information gathering on individual and collective group performance. Section 5.1 provides the definitions and parameter settings for the different compositions of agent groups, the game model details, and the performance measures. In Section 5.2, we discuss our observations and analysis of the different simulation experiments, and then further our discussion with inter-group comparisons in Section 5.3. The chapter concludes with an overall analysis of our observations in Section 5.4.

5.1 Definitions and Parameter Settings

5.1.1 Composition of Agent Group

We explore 6 different compositions of agent groups, corresponding to the number of different simulation experiments conducted in this study. Each group structure consists of n agents, determined as the product of 6 different ability levels (12, 10, 8, 6, 4, 2) and a varying number of agent strategies. For example, a composition of 6 ability levels and 2 agent strategies results in a total of 12 agents, as illustrated in the sketch below.
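A minimal sketch of this arithmetic, assuming a simple (strategy, ability) tuple representation of agents that is introduced here purely for illustration:

from itertools import product

ABILITY_LEVELS = [12, 10, 8, 6, 4, 2]

def build_group(strategies):
    """Return one (strategy, ability) entry per agent: n = k * q agents."""
    return [(s, a) for s, a in product(strategies, ABILITY_LEVELS)]

group1 = build_group(["NaiveC", "NaiveD"])                              # 12 agents
group6 = build_group(["NaiveC", "NaiveD", "SelectiveC", "LimitedC-R"])  # 24 agents
assert len(group1) == 12 and len(group6) == 24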
We define the different kinds of agent strategies below, and Table 5.1 displays the compositions of the agent groups.

• Naive Cooperator: also known as NaiveC; always cooperates with its partner.

• Naive Defector: also called NaiveD in our experiments; defects in all situations against its partner.

• Selective Cooperator: has the capacity to determine if the partner is a defector; the Selective Cooperator, also called SelectiveC, always defects against a Defector, but always cooperates otherwise.

• LimitedC-Fixed: also referred to as LimitedC-F; can determine the partner's ability. It always cooperates if the partner's ability is equal to or above a fixed threshold of 8, and otherwise always defects.

• LimitedC-Ratio: shares some characteristics with the LimitedC-F strategy and can determine the partner's ability. It always cooperates if $\frac{a_j R}{a_i P} \geq 1$; otherwise, it defects. This strategy is referred to as LimitedC-R in our simulation experiments.

Group 1: NaiveC, NaiveD
Group 2: NaiveC, NaiveD, SelectiveC
Group 3: NaiveC, NaiveD, LimitedC-F
Group 4: NaiveC, NaiveD, LimitedC-R
Group 5: NaiveC, NaiveD, SelectiveC, LimitedC-F
Group 6: NaiveC, NaiveD, SelectiveC, LimitedC-R

Table 5.1: Compositions of Agent Groups

5.1.2 Game Model

Following the definitions detailed in Section 4.1.3, we adopt the base Prisoner's Dilemma (PD) game in its symmetric form, with payoff values T = 10, R = 8, P = 2, S = 0, to discuss Ability-Based Cooperation (ABC). Unlike the classical PD game, the ABC game has the payoff values for a participating player $i$ against a partner $j$ given by $T_{ij} = a_j R + a_i P$, $R_{ij} = a_j R$, $P_{ij} = a_i P$, and $S_{ij} = 0$. The EABC includes the following parameters:

• Agent's score: the score agent $i$ receives after an encounter with agent $j$. This is defined as $s_i^{(j)} = \alpha a_i + (1 - \alpha) p_i^{(j)}$, where $p_i^{(j)}$ is the payoff of agent $i$ vs. $j$.

• Balancing factor: a constant parameter $\alpha \in [0, 1]$ that regulates the relative impact of the two components of the score, namely the ability component and the payoff. In all simulation experiments, $\alpha$ was set to 0.9.

• Round-Robin tournament structure: provides a schedule of pairwise encounters that gives every agent an equal opportunity to meet all other partners in the game.

5.1.3 Performance Measures

In this study, we explore two measures of success: individual success and group success. These are discussed separately below.

Measures of Individual Success

Let the multiagent system consist of $n = k \cdot q$ agents, where $k$ is the number of strategies and $q$ is the number of ability levels. Each agent $i$ has ability $a_i$. Each of the strategies is played by exactly one agent of each ability level. For an agent $i$ with ability $a_i$, we explore the following measures of individual success (a small computational sketch follows this list):

• Total score: An agent $i$'s total score for the tournament is calculated as $s_i^{tot} = \sum_{j=1, j \neq i}^{n} s_i^{(j)}$.

• Average score: We define the average score of an agent $i$ as $s_i^{avg} = \frac{s_i^{tot}}{n-1}$.

• Adjusted total score: An agent $i$ with ability $a_i$ has an adjusted total score, calculated as the total score from all games with partners whose abilities differ from $a_i$. Formally, $s_i^{atot} = \sum_{j=1, a_j \neq a_i}^{n} s_i^{(j)}$.

• Adjusted average score: This is an approximate performance measure which allows direct comparisons of the individual performance of an agent with a given ability and strategy across groups of different sizes. It is the adjusted total score of an agent $i$ divided by the number of games against partners whose abilities differ from $a_i$. Thus, $s_i^{aavg} = \frac{s_i^{atot}}{k(q-1)}$.
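To make the scoring pipeline concrete, the following sketch restates the ABC payoff rule, the EABC score with balancing factor $\alpha = 0.9$, and the four individual measures defined above. It is an illustrative reimplementation rather than the simulator's code, and the names abc_payoffs, eabc_score, and individual_measures are ours. The embedded example reproduces the 22.80 and 10.80 scores reported for the ability-12 NaiveD/NaiveC pairing in Table A.2 of the appendix.

from typing import List, Tuple

T, R, P, S = 10, 8, 2, 0     # base PD payoffs used in all experiments
ALPHA = 0.9                  # balancing factor

def abc_payoffs(a_i: float, a_j: float, act_i: str, act_j: str) -> Tuple[float, float]:
    """Ability-based payoffs (p_i, p_j) for one encounter:
    T_ij = a_j*R + a_i*P, R_ij = a_j*R, P_ij = a_i*P, S_ij = 0."""
    def one_side(a_me, a_other, me, other):
        if me == "C" and other == "C":
            return a_other * R            # mutual cooperation
        if me == "D" and other == "D":
            return a_me * P               # mutual defection
        if me == "D" and other == "C":
            return a_other * R + a_me * P # temptation
        return S                          # cooperated against a defector
    return one_side(a_i, a_j, act_i, act_j), one_side(a_j, a_i, act_j, act_i)

def eabc_score(a_i: float, p_ij: float, alpha: float = ALPHA) -> float:
    """s_i^(j) = alpha * a_i + (1 - alpha) * p_i^(j)."""
    return alpha * a_i + (1 - alpha) * p_ij

# Example: NaiveD (ability 12) exploiting NaiveC (ability 12)
p_d, p_c = abc_payoffs(12, 12, "D", "C")
assert abs(eabc_score(12, p_d) - 22.8) < 1e-6   # matches Table A.2
assert abs(eabc_score(12, p_c) - 10.8) < 1e-6

def individual_measures(scores: List[float], same_ability_scores: List[float],
                        k: int, q: int) -> dict:
    """Total, average, adjusted total, and adjusted average scores for one agent.
    `scores` holds s_i^(j) over all n-1 partners; `same_ability_scores` holds the
    subset against partners with the same ability a_i."""
    s_tot = sum(scores)
    s_atot = s_tot - sum(same_ability_scores)
    return {"total": s_tot,
            "average": s_tot / (k * q - 1),
            "adjusted_total": s_atot,
            "adjusted_average": s_atot / (k * (q - 1))}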
Measures of Group Success

For each group composed of agent strategies, we also examine the following measures of group success (a complementary sketch of these group-level computations appears at the end of Section 5.2.1):

• Failure threshold: refers to the experimenter-set limit representing the minimum acceptable level of individual performance. It is applied uniformly in all group compositions in our study. In each experiment, we use the failure threshold to determine the failure percentage.

• Failure percentage: the percentage of agents in a group that fall below the failure threshold. As a measure of group success, we compare the failure percentage between groups. A lower group failure percentage indicates higher group success, and vice versa.

• Group total score: the sum of the total scores of all agents in a specific group structure. This measure of success is suited to comparing group performance between compositions with an equal number of agents, encounters, and strategies. It is particularly useful for comparing the influence of two or more different strategies in similar group structures.

• Group average score: the sum of the total scores of all agents in the group structure divided by the number of agents. It is used under the same conditions as the group total score to compare the overall average performance among groups.

• Group adjusted total score: the sum of the adjusted total scores of all agents in a specific group structure. It serves only as an auxiliary measure to derive the group adjusted average score, since it cannot be used as a performance measure to compare group compositions of different sizes.

• Group adjusted average score: the sum of the adjusted total scores of all agents in the group structure divided by the number of agents. This measure shows the overall average success of the group members in each composition.

5.2 Observation and Analysis of Results

We present the simulation results and analyze our observations for the different compositions of group structures as follows.

5.2.1 Group Composition 1

Figure 5.1: Agents' performance for different ability levels in group composition 1

Individual Success

• Increasing the abilities of agents also increases their individual performance. Thus, agents with higher abilities, in general, perform better than those with lower abilities within the same strategy.

• Individually, Naive Cooperators at every ability level perform worse than Naive Defectors. This follows from the easily verified fact that defection is the dominant strategy in the ABC game (regardless of whether the game happens to be an asymmetric PD).

• The gap between Naive Defectors and Naive Cooperators increases as abilities grow. This is because $i$'s defection reward $a_i P$ increases with $i$'s own ability, while $i$'s cooperation reward $a_j R$ is independent of $i$'s own ability.

Group Success

• At a general failure threshold of 5, we observe that none of the 12 agents fall below the threshold. This is because lower ability Naive Defectors gain more from exploiting Naive Cooperators, while lower ability Naive Cooperators gain an interaction reward from higher ability cooperators.

• The group adjusted average score for this composition is 98, representing a very low group performance.

• Naive Cooperators do not fare well in this group because the interaction reward for cooperators is almost independent of the player's own ability.
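As a concreteness check, the group-level measures can be recomputed from the per-agent scores of this composition. The sketch below is illustrative (the function name group_measures and the variable names are ours); the per-agent values are copied from Table A.1 in the appendix, and the computation reproduces both the reported group adjusted average of 98 and the observation that no agent falls below the failure threshold of 5.

def group_measures(totals, adjusted_totals, adjusted_averages, threshold=5.0):
    """Group-level measures from Section 5.1.3. The failure percentage is taken
    over the agents' adjusted average scores, as in this study."""
    n = len(totals)
    return {
        "failure_percentage": 100.0 * sum(s < threshold for s in adjusted_averages) / n,
        "group_total": sum(totals),
        "group_average": sum(totals) / n,
        "group_adjusted_total": sum(adjusted_totals),
        "group_adjusted_average": sum(adjusted_totals) / n,
    }

# Group composition 1 (Table A.1): total, adjusted total, and adjusted average scores
totals_g1 = [142.8, 178.8, 124.6, 154.6, 106.4, 130.4, 88.2, 106.2, 70.0, 82.0, 51.8, 57.8]
adj_totals_g1 = [132.0, 156.0, 115.6, 135.6, 99.2, 115.2, 82.8, 94.8, 66.4, 74.4, 50.0, 54.0]
adj_avgs_g1 = [13.2, 15.6, 11.56, 13.56, 9.92, 11.52, 8.28, 9.48, 6.64, 7.44, 5.0, 5.4]

g1 = group_measures(totals_g1, adj_totals_g1, adj_avgs_g1)
assert abs(g1["group_adjusted_average"] - 98.0) < 1e-6   # value reported in Section 5.2.1
assert g1["failure_percentage"] == 0.0                   # no agent below the failure threshold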
5.2.2 Group Composition 2

Figure 5.2: Agents' performance for different ability levels in group composition 2

Individual Success

• Naive Defectors with lower abilities (≤ 6) perform worse than all other agents at the same levels. This is because Naive Defectors in this group structure face defections from Selective Cooperators at all ability levels.

• The presence of Selective Cooperators increases the individual performance of Naive Cooperators by doubling the number of their cooperating partners, and decreases the individual performance of Naive Defectors by doubling the number of their defecting partners.

• We observe an increasing gap between Selective Cooperators and Naive Cooperators because each Selective Cooperator defects against 1/3 of the population, with an interaction gain proportional to its own ability.

Group Success

• The impact of Selective Cooperators reduces the overall group failure percentage.

• The group adjusted average score, which is similar to social welfare in game theory and represents the overall success of group members for this composition of agents, is 150.5.

• The increased group success arises because Selective Cooperators cooperate with 2/3 of the population irrespective of their ability levels.

• The most supportive environment for individual Naive Cooperators is one that contains Selective Cooperators.

5.2.3 Group Composition 3

Figure 5.3: Agents' performance for different ability levels in group composition 3

Individual Success

• Higher ability Naive Defectors perform better in this group than in groups 1 and 2 due to gains from interaction with both Naive Cooperator and LimitedC-Fixed strategies at all levels.

• The LimitedC-Fixed strategy reduces the interaction gains of all lower ability agents and causes them to record low individual performance.

• Naive Cooperators with abilities ≥ 8 gain from twice the number of cooperators in this group compared with group 1.

• LimitedC-Fixed does not seem to be the best strategy in this composition because of its approach of cooperating or defecting based on a fixed ability threshold. However, it is also not the worst strategy in this group structure.

Group Success

• The failure percentage is higher in this group than in groups 1 and 2 because the LimitedC-Fixed strategy defects against 1/2 of the population based on abilities.

• The group adjusted average score, representing the overall success of group members in this composition, is 146.1. This is slightly below the group performance of composition 2 but higher than that of composition 1.

• The presence of the LimitedC-Fixed strategy significantly reduces the contribution of low ability agents toward group success.

5.2.4 Group Composition 4

Figure 5.4: Agents' performance for different ability levels in group composition 4

Individual Success

• At a = 2, the Naive Defector performs slightly better due to the benefit of individual progress from non-interaction and the cooperative behaviour of other non-defectors with abilities ≤ 8.
• We observe a slightly increasing gap between LimitedC-Ratio agents and Naive Cooperators, because LimitedC-Ratio agents with ability a ≥ 10 lack the incentive to cooperate with lower ability agents when they can benefit from both partners' cooperation and the individual reward for non-engagement.

• Unlike LimitedC-Fixed agents in group 3, LimitedC-Ratio agents improve group performance but record low individual scores.

Group Success

• Compared to group 2, the individually rational approach of the LimitedC-Ratio strategy toward cooperation and defection significantly reduces the overall group failure percentage.

• The group adjusted average score for this composition is 155.3. This is higher than group 3 because the individually rational approach of the LimitedC-Ratio strategy increases overall group performance.

• Compared to group 3, this group records a higher contribution from low ability agents toward group success, due to the significant impact of the LimitedC-Ratio strategy.

5.2.5 Group Composition 5

Figure 5.5: Agents' performance for different ability levels in group composition 5

Individual Success

• The higher ability LimitedC-Fixed agents are the best performers in this group because of gains from cooperating with high ability cooperators and gains from defecting against low ability agents.

• The lowest ability Naive Defector records the overall lowest score in this study due to defections from 3/4 of the group population.

• We observe an increasing gap between Selective Cooperators and Naive Cooperators because:

– Selective Cooperators' interaction reward for defection increases with the player's own ability.

– Naive Cooperators' benefit from Selective Cooperators and Naive Cooperators is almost independent of their own abilities.

• Although Naive Defectors may not benefit significantly from interaction with Selective Cooperators and LimitedC-Fixed strategies, higher ability defectors perform well due to higher individual activity.

Group Success

• A low ability Naive Defector that faces mutual non-engagement against 75% of the group population falls below the failure threshold.

• The group adjusted average score is 200.35. This is essentially because Selective Cooperators promote cooperation among Naive Cooperators and the LimitedC-Fixed strategy supports all agents with higher abilities.

• Only agents with higher ability levels contribute significantly to group success, with very minimal support from lower ability agents.

5.2.6 Group Composition 6

Figure 5.6: Agents' performance for different ability levels in group composition 6

Individual Success

• The individual performance of higher ability LimitedC-Ratio agents is lower than that of the LimitedC-Fixed agents in group 5 because:

– LimitedC-Ratio agents only defect when the reward for mutual cooperation is lower than that for mutual defection.

– LimitedC-Fixed agents defect against all agents with abilities ≤ 6.

• The LimitedC-Ratio strategy does not dominate in individual performance but significantly increases group performance.

• Naive Cooperators perform better in this group than in all other groups.

Group Success

• We observe the same failure percentage as in group 5. However, the agent below the failure threshold performs better in this group than in group 5.
• Consistent with our expectation, the group adjusted average score for this composition is 209.55, the highest in our study so far. This is because Selective Cooperators promote cooperation with all agents except Naive Defectors, and LimitedC-Ratio agents use an ability-based, individually rational technique to promote group success.

• The individually rational approach of the LimitedC-Ratio strategy may reduce these agents' own individual performance, but it significantly increases group performance.

5.3 Inter-group Comparisons

We now compare the performance of each of the six group compositions examined in this study. The purpose of the inter-group comparison is to determine and analyze the overall nature of the different group compositions and also to compare the performance levels of two or more groups.

5.3.1 Failure Percentage

For all group compositions, we measure the percentage of agents that scored an adjusted average value below the failure threshold. We assume a failure threshold of 5 and analyze the failure percentage for all 6 group structures, as shown in Figure 5.7. We observe the following:

• The more generalized approach of the LimitedC-Fixed strategy, which defects against all agents with abilities a < 8, reduces the performance of lower ability agents and therefore increases the failure percentage to 16.67%.

• The Selective Cooperators' collective rationality in supporting cooperation reduces the failure percentage in group 2, which is lower than the higher failure percentage observed in group 3.

• Both groups 5 and 6 record only 1 agent (the low ability Naive Defector) below the failure threshold because of the presence of advanced strategies. However, with regard to individual performance, group 6 is safer for the low ability Naive Defector because it performs better there than in group 5.

Figure 5.7: Failure Percentage distribution for different group compositions

5.3.2 Group Total Score

We compare the group total score of group compositions 3 and 5, which include the LimitedC-Fixed strategy, with that of groups 4 and 6, which include the LimitedC-Ratio strategy, due to their shared similarities in the number of agents and the other kinds of strategies. Our observation, as shown in Figure 5.8, indicates that the overall group total score in group 4 (3173.4) is slightly higher than in group 3 (2993.4). Similarly, a comparison between groups 5 and 6, as shown in Figure 5.9, indicates that group structure 6 reports a group total score of 5785.2 whilst group 5 only records 5542.8. The higher group total scores in compositions 4 and 6, compared to 3 and 5 respectively, arise because the LimitedC-Ratio strategy, which is present in groups 4 and 6, cooperates with some lower-ability defectors against which the LimitedC-Fixed strategy defects.

Figure 5.8: Group Total Score: 3 vs. 4

Figure 5.9: Group Total Score: 5 vs. 6

5.3.3 Group Average Score

The group average score provides an exact method to compare group success between groups 3 vs. 4 and groups 5 vs. 6. We observe group average scores of 176.3 and 241.05 in groups 4 and 6 respectively. These are slightly higher than the comparable group average scores of 166.3 and 230.95 reported in groups 3 and 5 respectively. Figures 5.10 and 5.11 display our results.

Figure 5.10: Group Avg Score: 3 vs. 4

Figure 5.11: Group Avg Score: 5 vs. 6
5.3.4 Group Adjusted Average Scores

Figure 5.12: Group adjusted average scores

Across all 6 group structures, we measure and compare the performance of each group based on their group adjusted average scores. We observe that:

• The combination of the Selective Cooperator and LimitedC-Ratio strategies in group 6 promotes cooperation and results in the highest group adjusted average score of 209.55. This is higher than the adjusted average score observed in group 5 (200.35).

• The individually rational approach of the LimitedC-Ratio strategy gives group 4 a higher adjusted average score than group 3.

• The adjusted average score of group 2 (150.5) is higher than that of group 3 (146.1) but slightly lower than that of group 4 (155.3).

• The lowest group adjusted average score (98) was recorded in group 1 because it consists of only naive agents that make decisions without any form of information gathering mechanism.

5.4 Overall Observation and Analysis

• In all group compositions, we observe that increasing ability levels positively affects the individual scores of agents due to the high impact of individual activity.

• In all instances, we have observed that the presence of Selective Cooperators positively favours the performance of Naive Cooperators but reduces the individual performance of Naive Defectors.

• The use of acquired information assists advanced agents in increasing their individual performance and also supports the development of cooperation. This was observed in group compositions 2, 3, 4, 5 and 6.

• A generalized approach to cooperation and defection based on ability levels increases the group failure percentage. Group success increases when agents that cooperate or defect based on abilities decide individually.

• Naively cooperating or defecting does not promote group success, even though it may support the performance of some individuals.

• A comparison of the group adjusted average performance between group 2 and group 4 leads us to conclude that group success is not only about naively helping cooperators but more about using an advanced approach to cooperation that also considers ability levels.

• Furthermore, we can infer from the adjusted average score of group composition 6 that an appropriate proportion of strategies that cooperate based on gathered information about partners' strategies and ability levels increases the overall group success.

Chapter 6

Conclusions and Future Work

In this thesis, we have introduced, implemented, and investigated a novel modelling and simulation framework for the study of cooperation among self-interested players in multiagent systems (MAS). As the basis for our game-theoretic modelling, we adopt the Ability Based Cooperation (ABC) game and the Extended ABC (EABC) game defined within our research group at UNBC. The framework allows for the modelling of several common aspects of natural and artificial systems that, to the best of our knowledge, have not been adequately addressed in the existing literature. The specific contributions of the framework include:

• Balancing individual activity and interaction. A player in our study can make progress through individual activity or by interaction with others.

• The abilities of players are included in the modelling. The scores from both individual activities and interactions depend on the individual abilities of the players.
This means that the payoff a player receives from an encounter is based on the ability level of the player and/or its partner and the base payoff.

• We define several advanced agent strategies that use knowledge about partners' abilities or strategies in deciding whether or not to cooperate. Our interest is in the strategic effect of acquired information and not in the mechanism for information acquisition.

• There is a balance between individual and group success. The players' abilities, strategies, and the group composition influence both individual scores and the success of the group as a whole. Our framework provides the possibility to measure both kinds of success. Individual strategies can be designed with the intent to achieve a specific impact with respect to individual and group performance.

• We evaluate individual strategies as to how much they contribute to the individual success of the players and to group success. This is done to observe the influence of information gathering on individual achievement and overall group performance.

We have demonstrated the properties of this framework with a comparative experimental study of six group compositions, each with a different strategy combination and a full range of individual abilities. For each strategy, the study allows us to analyze its performance in different group compositions, its impact upon other strategies, and its impact upon group performance. The results lead us to conclude that the framework is a suitable tool for such studies, and we expect that it could be successfully applied to more complex strategies and group compositions.

In future studies, we can:

• explore higher levels of modelling where different mechanisms of information gathering can be examined and their impact on individual and group success analyzed;

• introduce and observe increasingly complex groups of agent strategies, such as an advanced strategy that seeks to promote ability development among group members;

• investigate the influence of other experimenter-controlled limitations, such as the payoff matrix, on individual and group performance.

Bibliography

Michael Wooldridge. An Introduction to Multiagent Systems. John Wiley & Sons, June 2009.

Robert M. Axelrod. The Evolution of Cooperation. Basic Books Inc., New York, December 2006.

Maurice Grinberg, Evgenia Hristova, and Emilian Lalev. Models for cooperative decisions in prisoner's dilemma. In S. Nefti and J. Gray, editors, Advances in Cognitive Systems, chapter 7, pages 169–210. Institute of Engineering and Technology: London, 2010.

Tobias Kretz. A round-robin tournament of the iterated prisoner's dilemma with complete memory-size-three strategies. Complex Systems, 19(4):363–389, January 2011.

Satoru Takahashi. Community enforcement when players observe partners' past play. Journal of Economic Theory, 145(1):42–62, January 2010.

Jernej Polajnar and Desa Polajnar. An asymmetric two-player cooperation game based on player abilities. Modelling suggestions for Jonathan Asante Nyantakyi, unpublished short paper, UNBC, June 2018.

Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009.

Jacques Ferber and Gerhard Weiss. Multi-Agent Systems: An Introduction to Distributed Artificial Intelligence, volume 1. Addison-Wesley, Reading, 1999.

Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, third edition, 2010.

Michael Wooldridge and Nicholas R. Jennings.
The cooperative problem-solving process. Journal of Logic and Computation, 9(4):563–592, 1999. Simon Parsons and Michael Wooldridge. Game theory and decision theory in multi-agent systems. Autonomous Agents and Multi-Agent Systems, 5(3):243–254, September 2002. Robert Axelrod and William D. Hamilton. The evolution of cooperation. Science Journal, 211(27):1390–1396, March 1981. Daniel Friedman. On economic applications of evolutionary game theory. Journal of Evolutionary Economics, 8(1):15–43, 1998. Jörgen W. Weibull. Evolutionary Game Theory. MIT press, 1997. John von Neumann and Oskar Morgenstern. Theory of Games and Economic Behaviour. Princeton University Press, 1944. Robert J. Aumann. Game theory. In John Eatwell, Murray Milgate, and Peter Newman, editors, The New Palgrave Game Theory, volume 4 of The New Palgrave, chapter 1, pages 1–53. Palgrave Macmillan UK, 1 edition, 1989. Guillermo Owen. Applications of game theory to economics. International Game Theory Review, 15(03):8, September 2013. 78 Zach Ernst. Game theory in evolutionary biology. Philosophy after Darwin: Classic and Contemporary Readings, pages 464–76, 2009. Parag C. Pendharkar. Game theoretical applications for multi-agent systems. Expert Systems with Applications, 39(1):273–279, January 2012. Martin Beckenkamp, Heike Hennig-Schmidt, and Frank P. Maier-Rigaud. Cooperation in symmetric and asymmetric prisoner’s dilemma games. Max Planck Institute for Research on Collective Goods, 25:40, March 2007. Yuval Heller and Erik Mohlin. Observations on cooperation. Social Science Research Network, pages 1–62, April 2016. S. Chong, J. Humble, G. Kendall, J. Li, and X. Yao. Iterated prisoner’s dilemma and evolutionary game theory. In Graham Kendall, Xin Yao, and Yew Chong, Siang, editors, The Iterated Prisoners Dilemma: 20 Years On, volume 20 of Advances in Neural Computations, chapter 2, pages 23–62. Singapore: World Scientific, 2007. M. P. Kim, W. Suksompong, and V. V. Williams. Who can win a single-elimination tournament? Society for Industrial and Applied Mathematics Journal on Discrete Mathematics, 31(3):1751–1764, 2017. W. A. Glenn. A comparison of the effectiveness of tournaments. Biometrika, 47(3/4): 253–262, 1960. Rasmus V. Rasmussen and Michael A Trick. Round robin scheduling–a survey. European Journal of Operational Research, 188(3):617–636, June 2008. F. C. Santos, M. D. Santos, and J. M. Pacheco. Social diversity promotes the emergence of cooperation in public goods games. Nature, 454(7201):213, 2008. 79 M. O. Souza, J. M. Pacheco, and F. C. Santos. Evolution of cooperation under nperson snowdrift games. Journal of Theoretical Biology, 260(4):581–588, 2009. Garret Ridinger and Michael McBride. Theory of mind ability and cooperation in the prisoner’s dilemma. Technical report, Working paper, 2016. Toh-Kyeong Ahn, Myungsuk Lee, Lore Ruttan, and James Walker. Asymmetric payoffs in simultaneous and sequential prisoner’s dilemma games. Public Choice, 132(3-4):353–366, 2007. J. P. Sheposh and Philip S. Gallo Jr. Asymmetry of payoff structure and cooperative behaviour in the prisoner’s dilemma game. Journal of Conflict Resolution, 17(2): 321–333, 1973. Keith J. Murnighan. Cooperating when you know your outcomes will differ. Simulation & Gaming, 22(4):463–475, 1991. Ian Sommerville. Software Engineering. International Computer Science. Pearson, ninth edition, 2011. ISBN 0321210263. 
80 Appendix A A.1 Graphs and Tables of Results A.1.1 Group Composition 1 Parameter Settings • Agent strategies (k = 2 ): NaiveC = 6, NaiveD = 6 • Ability levels (q = 6 ): 12, 10, 8, 6, 4, 2 • Number of agents (n = k*q): 12 • Base payoffs: T = 10, R = 8, P = 2, S = 0 • Balancing factor (α): = 0.9 81 • Number of tournament iterations: 1 Formulas • Individual Scores of agent i: (j) (j) (j) – Score vs. agent j : si = αai + (1 − α)pi where pi is the payoff of i vs. j. – Total score: stot i = n ∑ (j) si j=1 j̸=i stot i – Average score: savg = n−1 i – Adjusted total score: satot = i n ∑ (j) si j=1 aj ̸=ai satot i . – Adjusted average score: saavg = k(q−1) i • Group Scores: – Group total score: stot = n ∑ stot i 1 tot – Group average score: savg = s n – Group adjusted total score: satot = n ∑ satot i 1 – Group adjusted average score: s aavg Experimental Results 82 atot = sn 20 NAIVE C NAIVE D Adjusted Average Scores 15 10 5 0 0 2 4 6 8 10 12 14 Agents' Abilities Figure A.1: Agents’ performance for different ability levels in composition group 1 ID Strategy 1 2 3 4 5 6 7 8 9 10 11 12 NaiveC NaiveD NaiveC NaiveD NaiveC NaiveD NaiveC NaiveD NaiveC NaiveD NaiveC NaiveD Ability Total Score Average Scores 12 12 10 10 8 8 6 6 4 4 2 2 142.80 178.80 124.60 154.60 106.40 130.40 88.20 106.20 70.00 82.00 51.80 57.80 12.98 16.25 11.33 14.05 9.67 11.85 8.02 9.65 6.36 7.45 4.71 5.25 Adjusted Total Scores 132.00 156.00 115.60 135.60 99.20 115.20 82.80 94.80 66.40 74.40 50.00 54.00 Adjusted Average Scores 13.20 15.60 11.56 13.56 9.92 11.52 8.28 9.48 6.64 7.44 5.00 5.40 Table A.1: Agents’ average and adjusted average scores for Group Composition 1 83 1 NC 1 2 3 4 5 6 7 8 9 10 11 12 NC ND NC ND NC ND NC ND NC ND NC ND 22.80 18.60 20.60 16.80 18.40 15.00 16.20 13.20 14.00 11.40 11.80 2 ND 10.80 9.00 11.00 7.20 8.80 5.40 6.60 3.60 4.40 1.80 2.20 3 NC 18.80 21.20 19.00 15.20 16.80 13.40 14.60 11.60 12.40 9.80 10.20 4 ND 10.80 13.20 9.00 7.20 8.80 5.40 6.60 3.60 4.40 1.80 2.20 5 NC 17.20 19.60 15.40 17.40 15.20 11.80 13.00 10.00 10.80 8.20 8.60 6 ND 10.80 13.20 9.00 11.00 7.20 5.40 6.60 3.60 4.40 1.80 2.20 7 NC 15.60 18.00 13.80 15.80 12.00 13.60 11.40 8.40 9.20 6.60 7.00 8 ND 10.80 13.20 9.00 11.00 7.20 8.80 5.40 3.60 4.40 1.80 2.20 9 NC 14.00 16.40 12.20 14.20 10.40 12.00 8.60 9.80 7.60 5.00 5.40 84 Table A.2: Agents’ scores Group Composition 1 10 ND 10.80 13.20 9.00 11.00 7.20 8.80 5.40 6.60 3.60 1.80 2.20 11 NC 12.40 14.80 10.60 12.60 8.80 10.40 7.00 8.20 5.20 6.00 3.80 12 ND 10.80 13.20 9.00 11.00 7.20 8.80 5.40 6.60 3.60 4.40 1.80 A.1.2 Group Composition 2 Parameter Settings • Agent strategies (k = 3 ): NaiveC = 6, NaiveD = 6, SelectiveC = 6 • Ability levels (q = 6 ): 12, 10, 8, 6, 4, 2 • Number of agents (n = k*q): 18 • Base payoffs: T = 10, R = 8, P = 2, S = 0 • Balancing factor (α): = 0.9 • Number of tournament iterations: 1 Formulas • Individual Scores of agent i: (j) (j) (j) – Score vs. agent j : si = αai + (1 − α)pi where pi is the payoff of i vs. j. – Total score: stot i = n ∑ (j) si j=1 j̸=i stot i – Average score: savg = n−1 i – Adjusted total score: satot = i n ∑ (j) si j=1 aj ̸=ai satot i – Adjusted average score: saavg = k(q−1) . 
i • Group Scores: – Group total score: stot = n ∑ stot i 1 tot – Group average score: savg = s n 85 – Group adjusted total score: satot = n ∑ satot i 1 atot – Group adjusted average score: saavg = s n Experimental Results 20 NAIVE C NAIVE D SELECTIVEC Adjusted Average Scores 15 10 5 0 0 2 4 6 8 10 12 14 Agents' Abilities Figure A.2: Agents’ performance for different ability levels in composition group 2 86 ID Strategy Ability Total Score Average Scores 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 NaiveC NaiveD SelectiveC NaiveC NaiveD SelectiveC NaiveC NaiveD SelectiveC NaiveC NaiveD SelectiveC NaiveC NaiveD SelectiveC NaiveC NaiveD SelectiveC 12 12 12 10 10 10 8 8 8 6 6 6 4 4 4 2 2 2 241.20 258.00 255.60 212.20 220.60 224.20 183.20 183.20 192.80 154.20 145.80 161.40 125.20 108.40 130.00 96.20 71.00 98.60 14.19 15.18 15.04 12.48 12.98 13.19 10.78 10.78 11.34 9.07 8.58 9.49 7.36 6.38 7.65 5.66 4.18 5.80 Total Adjusted Scores 210.00 222.00 222.00 186.20 190.60 196.20 162.40 159.20 170.40 138.60 127.80 144.60 114.80 96.40 118.80 91.00 65.00 93.00 Average Adjusted Scores 14.00 14.80 14.80 12.41 12.71 13.08 10.83 10.61 11.36 9.24 8.52 9.64 7.65 6.43 7.92 6.07 4.33 6.20 Table A.3: Agents’ average and adjusted average scores for Group Composition 2 87 1 NC 88 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 NC ND SC NC ND SC NC ND SC NC ND SC NC ND SC NC ND SC 22.80 20.40 18.60 20.60 18.60 16.80 18.40 16.80 15.00 16.20 15.00 13.20 14.00 13.20 11.40 11.80 11.40 2 ND 10.80 13.20 9.00 11.00 11.00 7.20 8.80 8.80 5.40 6.60 6.60 3.60 4.40 4.40 1.80 2.20 2.20 3 SC 20.40 13.20 18.60 11.00 18.60 16.80 8.80 16.80 15.00 6.60 15.00 13.20 4.40 13.20 11.40 2.20 11.40 4 NC 18.80 21.20 18.80 19.00 17.00 15.20 16.80 15.20 13.40 14.60 13.40 11.60 12.40 11.60 9.80 10.20 9.80 5 ND 10.80 13.20 13.20 9.00 11.00 7.20 8.80 8.80 5.40 6.60 6.60 3.60 4.40 4.40 1.80 2.20 2.20 6 SC 18.80 13.20 18.80 17.00 11.00 15.20 8.80 15.20 13.40 6.60 13.40 11.60 4.40 11.60 9.80 2.20 9.80 7 NC 17.20 19.60 17.20 15.40 17.40 15.40 15.20 13.60 11.80 13.00 11.80 10.00 10.80 10.00 8.20 8.60 8.20 8 ND 10.80 13.20 13.20 9.00 11.00 11.00 7.20 8.80 5.40 6.60 6.60 3.60 4.40 4.40 1.80 2.20 2.20 9 SC 17.20 13.20 17.20 15.40 11.00 15.40 13.60 8.80 11.80 6.60 11.80 10.00 4.40 10.00 8.20 2.20 8.20 10 NC 15.60 18.00 15.60 13.80 15.80 13.80 12.00 13.60 12.00 11 ND 10.80 13.20 13.20 9.00 11.00 11.00 7.20 8.80 8.80 5.40 11.40 10.20 8.40 9.20 8.40 6.60 7.00 6.60 Table A.4: Agents’ scores Group Composition 2 6.60 3.60 4.40 4.40 1.80 2.20 2.20 12 SC 15.60 13.20 15.60 13.80 11.00 13.80 12.00 8.80 12.00 10.20 6.60 8.40 4.40 8.40 6.60 2.20 6.60 13 NC 14.00 16.40 14.00 12.20 14.20 12.20 10.40 12.00 10.40 8.60 9.80 8.60 7.60 6.80 5.00 5.40 5.00 14 ND 10.80 13.20 13.20 9.00 11.00 11.00 7.20 8.80 8.80 5.40 6.60 6.60 3.60 4.40 1.80 2.20 2.20 15 SC 14.00 13.20 14.00 12.20 11.00 12.20 10.40 8.80 10.40 8.60 6.60 8.60 6.80 4.40 5.00 2.20 5.00 16 NC 12.40 14.80 12.40 10.60 12.60 10.60 8.80 10.40 8.80 7.00 8.20 7.00 5.20 6.00 5.20 3.80 3.40 17 ND 10.80 13.20 13.20 9.00 11.00 11.00 7.20 8.80 8.80 5.40 6.60 6.60 3.60 4.40 4.40 1.80 2.20 18 SC 12.40 13.20 12.40 10.60 11.00 10.60 8.80 8.80 8.80 7.00 6.60 7.00 5.20 4.40 5.20 3.40 2.20 A.1.3 Group Composition 3 Parameter Settings • Agent strategies (k = 3 ): NaiveC = 6, NaiveD = 6, LimitedC-Fixed = 6 • Ability levels (q = 6 ): 12, 10, 8, 6, 4, 2 • Number of agents (n = k*q): 18 • Base payoffs: T = 10, R = 8, P = 2, S = 0 • Balancing factor (α): = 0.9 • Number of tournament iterations: 1 Formulas • Individual Scores of agent i: (j) 
(j) (j) – Score vs. agent j : si = αai + (1 − α)pi where pi is the payoff of i vs. j. – Total score: stot i = n ∑ (j) si j=1 j̸=i stot i – Average score: savg = n−1 i – Adjusted total score: satot = i n ∑ (j) si j=1 aj ̸=ai satot i – Adjusted average score: saavg = k(q−1) . i • Group Scores: – Group total score: stot = n ∑ stot i 1 tot – Group average score: savg = s n 89 – Group adjusted total score: satot = n ∑ satot i 1 atot – Group adjusted average score: saavg = s n Experimental Results 20 NAIVE C NAIVE D LIMITEDC-F Adjusted Average Scores 15 10 5 0 0 2 4 6 8 10 12 14 Agents' Abilities Figure A.3: Agents’ performance for different ability levels in group composition 3 90 ID Strategy 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 NaiveC NaiveD LC-F NaiveC NaiveD LC-F NaiveC NaiveD LC-F NaiveC NaiveD LC-F NaiveC NaiveD LC-F NaiveC NaiveD LC-F Ability Total Score Average Scores 12 12 12 10 10 10 8 8 8 6 6 6 4 4 4 2 2 2 241.20 291.60 262.80 212.20 254.20 230.20 183.20 216.80 197.60 120.60 145.80 135.00 91.60 108.40 101.20 62.60 71.00 67.40 14.19 17.15 15.46 12.48 14.95 13.54 10.78 12.75 11.62 7.09 8.58 7.94 5.39 6.38 5.95 3.68 4.18 3.96 Total Adjusted Scores 210.00 246.00 231.60 186.20 216.20 204.20 162.40 186.40 176.80 109.80 127.80 117.00 84.40 96.40 89.20 59.00 65.00 61.40 Average Adjusted Scores 14.00 16.40 15.44 12.41 14.41 13.61 10.83 12.43 11.79 7.32 8.52 7.80 5.63 6.43 5.95 3.93 4.33 4.09 Table A.5: Agents’ average and adjusted average scores for Group Composition 3 91 1 NC 92 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 NC ND LC-F NC ND LC-F NC ND LC-F NC ND LC-F NC ND LC-F NC ND LC-F 22.8 20.40 18.60 20.60 18.60 16.80 18.40 16.80 15.00 16.20 15.00 13.20 14.00 13.20 11.40 11.80 11.40 2 ND 10.80 10.80 9.00 11.00 9.00 7.20 8.80 7.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 3 LC-F 20.40 22.8 18.60 20.60 18.60 16.80 18.40 16.80 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 4 NC 18.80 21.2 18.80 19.00 17.00 15.20 16.80 15.20 13.40 14.60 13.40 11.60 12.40 11.60 9.80 10.20 9.80 5 ND 10.80 13.2 10.80 9.00 9.00 7.20 8.80 7.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 6 LC-F 18.80 21.2 18.80 17.00 19.00 15.20 16.80 15.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 7 NC 17.20 19.6 17.20 15.40 17.40 15.40 15.20 13.60 11.80 13.00 11.80 10.00 10.80 10.00 8.20 8.60 8.20 8 ND 10.80 13.2 10.80 9.00 11.00 9.00 7.20 7.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 9 LC-F 17.20 19.6 17.20 15.40 17.40 15.40 13.60 15.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 10 NC 15.60 18 18.00 13.80 15.80 15.80 12.00 13.60 13.60 11.40 11.40 8.40 9.20 9.20 6.60 7.00 7.00 11 ND 10.80 13.2 13.20 9.00 11.00 11.00 7.20 8.80 8.80 5.40 12 LC-F 15.60 18 18.00 13.80 15.80 15.80 12.00 13.60 13.60 5.40 6.60 6.60 3.60 4.40 4.40 1.80 2.20 2.20 Table A.6: Agents’ match scores for Group Composition 3 3.60 4.40 4.40 1.80 2.20 2.20 13 NC 14.00 16.4 16.40 12.20 14.20 14.20 10.40 12.00 12.00 8.60 9.80 9.80 7.60 7.60 5.00 5.40 5.40 14 ND 10.80 13.2 13.20 9.00 11.00 11.00 7.20 8.80 8.80 5.40 6.60 6.60 3.60 4.40 1.80 2.20 2.20 15 LC-F 14.00 16.4 16.40 12.20 14.20 14.20 10.40 12.00 12.00 5.40 6.60 6.60 3.60 4.40 1.80 2.20 2.20 16 NC 12.40 14.8 14.80 10.60 12.60 12.60 8.80 10.40 10.40 7.00 8.20 8.20 5.20 6.00 6.00 3.80 3.80 17 ND 10.8 13.2 13.20 9.00 11.00 11.00 7.20 8.80 8.80 5.40 6.60 6.60 3.60 4.40 4.40 1.80 2.20 18 LC-F 12.40 14.8 14.80 10.60 12.60 12.60 8.80 10.40 10.40 5.4 0 6.60 6.60 3.60 4.40 4.40 1.80 2.20 A.1.4 Group Composition 4 Parameter Settings • Agent strategies (k = 3 ): NaiveC = 6, NaiveD = 6, LimitedC-Ratio = 
6 • Ability levels (q = 6 ): 12, 10, 8, 6, 4, 2 • Number of agents (n = k*q): 18 • Base payoffs: T = 10, R = 8, P = 2, S = 0 • Balancing factor (α): = 0.9 • Number of tournament iterations: 1 Formulas • Individual Scores of agent i: (j) (j) (j) – Score vs. agent j : si = αai + (1 − α)pi where pi is the payoff of i vs. j. – Total score: stot i = n ∑ (j) si j=1 j̸=i stot i – Average score: savg = n−1 i – Adjusted total score: satot = i n ∑ (j) si j=1 aj ̸=ai satot i – Adjusted average score: saavg = k(q−1) . i • Group Scores: – Group total score: stot = n ∑ stot i 1 tot – Group average score: savg = s n 93 – Group adjusted total score: satot = n ∑ satot i 1 atot – Group adjusted average score: saavg = s n Experimental Results 20 NAIVE C NAIVE D LIMITEDC-R Adjusted Average Scores 15 10 5 0 0 2 4 6 8 10 12 14 Agents' Abilities Figure A.4: Agents’ performance for different ability levels in group composition 4 94 ID Strategy Ability Total Score Average Scores 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 NaiveC NaiveD LC-R NaiveC NaiveD LC-R NaiveC NaiveD LC-R NaiveC NaiveD LC-R NaiveC NaiveD LC-R NaiveC NaiveD LC-R 12 12 12 10 10 10 8 8 8 6 6 6 4 4 4 2 2 2 241.20 291.60 248.40 212.20 254.20 218.20 183.20 216.80 183.20 154.20 179.40 154.20 125.20 142.00 125.20 78.60 87.00 78.60 14.19 17.15 14.61 12.48 14.95 12.84 10.78 12.75 10.78 9.07 10.55 9.07 7.36 8.35 7.36 4.62 5.12 4.62 Total Adjusted Scores 210.00 246.00 217.20 186.20 216.20 192.20 162.40 186.40 162.40 138.60 156.60 138.60 114.80 126.80 114.80 73.40 79.40 73.40 Average Adjusted Scores 14.00 16.40 14.48 12.41 14.41 12.81 10.83 12.43 10.83 9.24 10.44 9.24 7.65 8.45 7.65 4.89 5.29 4.89 Table A.7: Agents’ average and adjusted average scores for Group Composition 4 95 1 NC 96 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 NC ND LC-R NC ND LC-R NC ND LC-R NC ND LC-R NC ND LC-R NC ND LC-R 22.80 20.40 18.60 20.60 18.60 16.80 18.40 16.80 15.00 16.20 15.00 13.20 14.00 13.20 11.40 11.80 11.40 2 ND 10.80 10.80 9.00 11.00 9.00 7.20 8.80 7.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 3 LC-R 20.40 22.80 18.60 20.60 18.60 16.80 18.40 16.80 15.00 16.20 15.00 13.20 14.00 13.20 1.80 2.20 1.80 4 NC 18.80 21.20 18.80 19.00 17.00 15.20 16.80 15.20 13.40 14.60 13.40 11.60 12.40 11.60 9.80 10.20 9.80 5 ND 10.80 13.20 10.80 9.00 9.00 7.20 8.80 7.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 6 LC-R 18.80 21.20 18.80 17.00 19.00 15.20 16.80 15.20 13.40 14.60 13.40 11.60 12.40 11.60 1.80 2.20 1.80 7 NC 17.20 19.60 17.20 15.40 17.40 15.40 15.20 13.60 11.80 13.00 11.80 10.00 10.80 10.00 8.20 8.60 8.20 8 ND 10.80 13.20 10.80 9.00 11.00 9.00 7.20 7.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 2.20 1.80 9 LC-R 17.20 19.60 17.20 15.40 17.40 15.40 13.60 15.20 11.80 13.00 11.80 10.00 10.80 10.00 8.20 8.60 8.20 10 NC 15.60 18.00 15.60 13.80 15.80 13.80 12.00 13.60 12.00 11.40 10.20 8.40 9.20 8.40 6.60 7.00 6.60 11 ND 10.80 13.20 10.80 9.00 11.00 9.00 7.20 8.80 7.20 5.40 5.40 3.60 4.40 3.60 1.80 2.20 1.80 Table A.8: Agents’ match scores for Group Composition 4 12 LC-R 15.60 18.00 15.60 13.80 15.80 13.80 12.00 13.60 12.00 10.20 11.40 8.40 9.20 8.40 6.60 7.00 6.60 13 NC 14.00 16.40 14.00 12.20 14.20 12.20 10.40 12.00 10.40 8.60 9.80 8.60 7.60 6.80 5.00 5.40 5.00 14 ND 10.80 13.20 10.80 9.00 11.00 9.00 7.20 8.80 7.20 5.40 6.60 5.40 3.60 3.60 1.80 2.20 1.80 15 LC-R 14.00 16.40 14.00 12.20 14.20 12.20 10.40 12.00 10.40 8.60 9.80 8.60 6.80 7.60 5.00 5.40 5.00 16 NC 12.40 14.80 14.80 10.60 12.60 12.60 8.80 10.40 8.80 7.00 8.20 7.00 5.20 6.00 5.20 3.80 3.40 17 ND 10.80 13.20 13.20 9.00 
11.00 11.00 7.20 8.80 7.20 5.40 6.60 5.40 3.60 4.40 3.60 1.80 1.80 18 LC-R 12.40 14.80 14.80 10.60 12.60 12.60 8.80 10.40 8.80 7.00 8.20 7.00 5.20 6.00 5.20 3.40 3.80 A.1.5 Group Composition 5 Parameter Settings • Agent strategies (k = 4 ): NaiveC = 6, NaiveD = 6, SelectiveC = 6, LimitedCFixed = 6 • Ability levels (q = 6 ): 12, 10, 8, 6, 4, 2 • Number of agents (n = k*q): 24 • Base payoffs: T = 10, R = 8, P = 2, S = 0 • Balancing factor (α): = 0.9 • Number of tournament iterations: 1 Formulas • Individual Scores of agent i: (j) (j) (j) – Score vs. agent j : si = αai + (1 − α)pi where pi is the payoff of i vs. j. – Total score: stot i = n ∑ (j) si j=1 j̸=i stot i – Average score: savg = n−1 i – Adjusted total score: satot = i n ∑ (j) si j=1 aj ̸=ai satot i – Adjusted average score: saavg = k(q−1) . i • Group Scores: – Group total score: stot = n ∑ stot i 1 97 tot – Group average score: savg = s n – Group adjusted total score: satot = n ∑ satot i 1 atot – Group adjusted average score: saavg = s n Experimental Results 20 NAIVE C NAIVE D SELECTEDC LIMITEDC-F Adjusted Average Scores 15 10 5 0 0 2 4 6 8 10 12 14 Agents Abilities Figure A.5: Agents’ performance for different ability levels in group composition 5 98 ID Strategy Abilities Total Score 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 NaiveC NaiveD SelectiveC LC-F NaiveC NaiveD SelectiveC LC-F NaiveC NaiveD SelectiveC LC-F NaiveC NaiveD SelectiveC LC-F NaiveC NaiveD SelectiveC LC-F NaiveC NaiveD SelectiveC LC-F 12 12 12 12 10 10 10 10 8 8 8 8 6 6 6 6 4 4 4 4 2 2 2 2 339.60 370.80 354.00 368.40 299.80 320.20 311.80 323.80 260.00 269.60 269.60 279.20 186.60 185.40 193.80 204.60 146.80 134.80 151.60 158.80 107.00 84.20 109.40 113.00 Average Scores 14.77 16.12 15.39 16.02 13.03 13.92 13.56 14.08 11.30 11.72 11.72 12.14 8.11 8.06 8.43 8.90 6.38 5.86 6.59 6.90 4.65 3.66 4.76 4.91 Adjusted Total Score 288.00 312.00 300.00 316.80 256.80 271.20 266.80 280.80 225.60 230.40 233.60 244.80 165.60 160.80 171.60 175.20 132.80 118.40 136.80 139.20 100.00 76.00 102.00 103.20 Adjusted Average Score 14.40 15.60 15.00 15.84 12.84 13.56 13.34 14.04 11.28 11.52 11.68 12.24 8.28 8.04 8.58 8.76 6.64 5.92 6.84 6.96 5.00 3.80 5.10 5.16 Table A.9: Agents’ average and adjusted average scores for Group Composition 5 99 1 NC 100 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 NC ND SC LC-F NC ND SC LC-F NC ND SC LC-F NC ND SC LC-F NC ND SC LC-F NC ND SC LC-F 22.80 20.40 20.40 18.60 20.60 18.60 18.60 16.80 18.40 16.80 16.80 15.00 16.20 15.00 15.00 13.20 14.00 13.20 13.20 11.40 11.80 11.40 11.40 2 ND 10.80 13.20 10.80 9.00 11.00 11.00 9.00 7.20 8.80 8.80 7.20 5.40 6.60 6.60 5.40 3.60 4.40 4.40 3.60 1.80 2.20 2.20 1.80 3 SC 20.40 13.20 20.40 18.60 11.00 18.60 18.60 16.80 8.80 16.80 16.80 15.00 6.60 15.00 15.00 13.20 4.40 13.20 13.20 11.40 2.20 11.40 11.40 4 LC-F 20.40 22.80 20.40 18.60 20.60 18.60 18.60 16.80 18.40 16.80 16.80 5.40 6.60 5.40 5.40 3.60 4.40 3.60 3.60 1.80 2.20 1.80 1.80 5 NC 18.80 21.20 18.80 18.80 19.00 17.00 17.00 15.20 16.80 15.20 15.20 13.40 14.60 13.40 13.40 11.60 12.40 11.60 11.60 9.80 10.20 9.80 9.80 6 ND 10.80 13.20 13.20 10.80 9.00 11.00 9.00 7.20 8.80 8.80 7.20 5.40 6.60 6.60 5.40 3.60 4.40 4.40 3.60 1.80 2.20 2.20 1.80 7 SC 18.80 13.20 18.80 18.80 17.00 11.00 17.00 15.20 8.80 15.20 15.20 13.40 6.60 13.40 13.40 11.60 4.40 11.60 11.60 9.80 2.20 9.80 9.80 8 LC-F 18.80 21.20 18.80 18.80 17.00 19.00 17.00 15.20 16.80 15.20 15.20 5.40 6.60 5.40 5.40 3.60 4.40 3.60 3.60 1.80 2.20 1.80 1.80 9 NC 17.20 19.60 17.20 17.20 
15.40 17.40 15.40 15.40 15.20 13.60 13.60 11.80 13.00 11.80 11.80 10.00 10.80 10.00 10.00 8.20 8.60 8.20 8.20 10 ND 10.80 13.20 13.20 10.80 9.00 11.00 11.00 9.00 7.20 8.80 7.20 5.40 6.60 6.60 5.40 3.60 4.40 4.40 3.60 1.80 2.20 2.20 1.80 11 SC 17.20 13.20 17.20 17.20 15.40 11.00 15.40 15.40 13.60 8.80 13.60 11.80 6.60 11.80 11.80 10.00 4.40 10.00 10.00 8.20 2.20 8.20 8.20 12 LC-F 17.20 19.60 17.20 17.20 15.40 17.40 15.40 15.40 13.60 15.20 13.60 5.40 6.60 5.40 5.40 3.60 4.40 3.60 3.60 1.80 2.20 1.80 1.80 13 NC 15.60 18.00 15.60 18.00 13.80 15.80 13.80 15.80 12.00 13.60 12.00 13.60 11.40 10.20 11.40 8.40 9.20 8.40 9.20 6.60 7.00 6.60 7.00 14 ND 10.80 13.20 13.20 13.20 9.00 11.00 11.00 11.00 7.20 8.80 8.80 8.80 5.40 6.60 6.60 3.60 4.40 4.40 4.40 1.80 2.20 2.20 2.20 15 SC 15.60 13.20 15.60 18.00 13.80 11.00 13.80 15.80 12.00 8.80 12.00 13.60 10.20 6.60 11.40 8.40 4.40 8.40 9.20 6.60 2.20 6.60 7.00 16 LC-F 15.60 18.00 15.60 18.00 13.80 15.80 13.80 15.80 12.00 13.60 12.00 13.60 5.40 6.60 5.40 3.60 4.40 3.60 4.40 1.80 2.20 1.80 2.20 17 NC 14.00 16.40 14.00 16.40 12.20 14.20 12.20 14.20 10.40 12.00 10.40 12.00 8.60 9.80 8.60 9.80 7.60 6.80 7.60 5.00 5.40 5.00 5.40 18 ND 10.80 13.20 13.20 13.20 9.00 11.00 11.00 11.00 7.20 8.80 8.80 8.80 5.40 6.60 6.60 6.60 3.60 4.40 4.40 1.80 2.20 2.20 2.20 Table A.10: Agents’ match scores for Group Composition 5 19 SC 14.00 13.20 14.00 16.40 12.20 11.00 12.20 14.20 10.40 8.80 10.40 12.00 8.60 6.60 8.60 9.80 6.80 4.40 7.60 5.00 2.20 5.00 5.40 20 LC-F 14.00 16.40 14.00 16.40 12.20 14.20 12.20 14.20 10.40 12.00 10.40 12.00 5.40 6.60 5.40 6.60 3.60 4.40 3.60 1.80 2.20 1.80 2.20 21 NC 12.40 14.80 12.40 14.80 10.60 12.60 10.60 12.60 8.80 10.40 8.80 10.40 7.00 8.20 7.00 8.20 5.20 6.00 5.20 6.00 3.80 3.40 3.80 22 ND 10.80 13.20 13.20 13.20 9.00 11.00 11.00 11.00 7.20 8.80 8.80 8.80 5.40 6.60 6.60 6.60 3.60 4.40 4.40 4.40 1.80 2.20 2.20 23 SC 12.40 13.20 12.40 14.80 10.60 11.00 10.60 12.60 8.80 8.80 8.80 10.40 7.00 6.60 7.00 8.20 5.20 4.40 5.20 6.00 3.40 2.20 3.80 24 LC-F 12.40 14.80 12.40 14.80 10.60 12.60 10.60 12.60 8.80 10.40 8.80 10.40 5.40 6.60 5.40 6.60 3.60 4.40 3.60 4.40 1.80 2.20 1.80 A.1.6 Group Composition 6 Parameter Settings • Agent strategies (k = 4 ): NaiveC = 6, NaiveD = 6, SelectiveC = 6, LimitedCRatio = 6 • Ability levels (q = 6 ): 12, 10, 8, 6, 4, 2 • Number of agents (n = k*q): 24 • Base payoffs: T = 10, R = 8, P = 2, S = 0 • Balancing factor (α): = 0.9 • Number of tournament iterations: 1 Formulas • Individual Scores of agent i: (j) (j) (j) – Score vs. agent j : si = αai + (1 − α)pi where pi is the payoff of i vs. j. – Total score: stot i = n ∑ (j) si j=1 j̸=i stot i – Average score: savg = n−1 i – Adjusted total score: satot = i n ∑ (j) si j=1 aj ̸=ai satot i – Adjusted average score: saavg = k(q−1) . 
A.1.6 Group Composition 6

Parameter Settings

• Agent strategies (k = 4): NaiveC = 6, NaiveD = 6, SelectiveC = 6, LimitedCRatio = 6
• Ability levels (q = 6): 12, 10, 8, 6, 4, 2
• Number of agents (n = k*q): 24
• Base payoffs: T = 10, R = 8, P = 2, S = 0
• Balancing factor: α = 0.9
• Number of tournament iterations: 1

Formulas

• Individual Scores of agent i:
  – Score vs. agent j: $s_i^{(j)} = \alpha a_i + (1 - \alpha)\, p_i^{(j)}$, where $p_i^{(j)}$ is the payoff of i vs. j.
  – Total score: $s_i^{tot} = \sum_{j=1,\, j \neq i}^{n} s_i^{(j)}$
  – Average score: $s_i^{avg} = \frac{s_i^{tot}}{n-1}$
  – Adjusted total score: $s_i^{atot} = \sum_{j=1,\, a_j \neq a_i}^{n} s_i^{(j)}$
  – Adjusted average score: $s_i^{aavg} = \frac{s_i^{atot}}{k(q-1)}$
• Group Scores:
  – Group total score: $s^{tot} = \sum_{i=1}^{n} s_i^{tot}$
  – Group average score: $s^{avg} = \frac{s^{tot}}{n}$
  – Group adjusted total score: $s^{atot} = \sum_{i=1}^{n} s_i^{atot}$
  – Group adjusted average score: $s^{aavg} = \frac{s^{atot}}{n}$

Experimental Results

[Figure A.6: Agents' performance for different ability levels in group composition 6 — adjusted average score plotted against agent ability for the NaiveC, NaiveD, SelectiveC and LimitedC-R strategies.]

ID  Strategy    Ability  Total Score  Average Score  Adjusted Total Score  Adjusted Average Score
1   NaiveC      12       339.60       14.77          288.00                14.40
2   NaiveD      12       370.80       16.12          312.00                15.60
3   SelectiveC  12       354.00       15.39          300.00                15.00
4   LC-R        12       349.20       15.18          297.60                14.88
5   NaiveC      10       299.80       13.03          256.80                12.84
6   NaiveD      10       320.20       13.92          271.20                13.56
7   SelectiveC  10       311.80       13.56          266.80                13.34
8   LC-R        10       307.80       13.38          264.80                13.24
9   NaiveC      8        260.00       11.30          225.60                11.28
10  NaiveD      8        269.60       11.72          230.40                11.52
11  SelectiveC  8        269.60       11.72          233.60                11.68
12  LC-R        8        260.00       11.30          225.60                11.28
13  NaiveC      6        220.20       9.57           194.40                9.72
14  NaiveD      6        219.00       9.52           189.60                9.48
15  SelectiveC  6        227.40       9.89           200.40                10.02
16  LC-R        6        220.20       9.57           194.40                9.72
17  NaiveC      4        180.40       7.84           163.20                8.16
18  NaiveD      4        168.40       7.32           148.80                7.44
19  SelectiveC  4        185.20       8.05           167.20                8.36
20  LC-R        4        180.40       7.84           163.20                8.16
21  NaiveC      2        123.00       5.35           114.40                5.72
22  NaiveD      2        100.20       4.36           90.40                 4.52
23  SelectiveC  2        125.40       5.45           116.40                5.82
24  LC-R        2        123.00       5.35           114.40                5.72

Table A.11: Agents' average and adjusted average scores for Group Composition 6

Table A.12: Agents' match scores for Group Composition 6. Rows and columns both run over agents 1–24 (strategies NC, ND, SC and LC-R repeating down the IDs, with abilities as in Table A.11). The table is reproduced one column at a time below; each column lists its 23 entries in row order 1–24, with the diagonal entry omitted.

Column 1 (NC): 22.80 20.40 20.40 18.60 20.60 18.60 18.60 16.80 18.40 16.80 16.80 15.00 16.20 15.00 15.00 13.20 14.00 13.20 13.20 11.40 11.80 11.40 11.40
Column 2 (ND): 10.80 13.20 10.80 9.00 11.00 11.00 9.00 7.20 8.80 8.80 7.20 5.40 6.60 6.60 5.40 3.60 4.40 4.40 3.60 1.80 2.20 2.20 1.80
Column 3 (SC): 20.40 13.20 20.40 18.60 11.00 18.60 18.60 16.80 8.80 16.80 16.80 15.00 6.60 15.00 15.00 13.20 4.40 13.20 13.20 11.40 2.20 11.40 11.40
Column 4 (LC-R): 20.40 22.80 20.40 18.60 20.60 18.60 18.60 16.80 18.40 16.80 16.80 15.00 16.20 15.00 15.00 13.20 14.00 13.20 13.20 1.80 2.20 1.80 1.80
Column 5 (NC): 18.80 21.20 18.80 18.80 19.00 17.00 17.00 15.20 16.80 15.20 15.20 13.40 14.60 13.40 13.40 11.60 12.40 11.60 11.60 9.80 10.20 9.80 9.80
Column 6 (ND): 10.80 13.20 13.20 10.80 9.00 11.00 9.00 7.20 8.80 8.80 7.20 5.40 6.60 6.60 5.40 3.60 4.40 4.40 3.60 1.80 2.20 2.20 1.80
Column 7 (SC): 18.80 13.20 18.80 18.80 17.00 11.00 17.00 15.20 8.80 15.20 15.20 13.40 6.60 13.40 13.40 11.60 4.40 11.60 11.60 9.80 2.20 9.80 9.80
Column 8 (LC-R): 18.80 21.20 18.80 18.80 17.00 19.00 17.00 15.20 16.80 15.20 15.20 13.40 14.60 13.40 13.40 11.60 12.40 11.60 11.60 1.80 2.20 1.80 1.80
Column 9 (NC): 17.20 19.60 17.20 17.20 15.40 17.40 15.40 15.40 15.20 13.60 13.60 11.80 13.00 11.80 11.80 10.00 10.80 10.00 10.00 8.20 8.60 8.20 8.20
Column 10 (ND): 10.80 13.20 13.20 10.80 9.00 11.00 11.00 9.00 7.20 8.80 7.20 5.40 6.60 6.60 5.40 3.60 4.40 4.40 3.60 1.80 2.20 2.20 1.80
Column 11 (SC): 17.20 13.20 17.20 17.20 15.40 11.00 15.40 15.40 13.60 8.80 13.60 11.80 6.60 11.80 11.80 10.00 4.40 10.00 10.00 8.20 2.20 8.20 8.20
Column 12 (LC-R): 17.20 19.60 17.20 17.20 15.40 17.40 15.40 15.40 13.60 15.20 13.60 11.80 13.00 11.80 11.80 10.00 10.80 10.00 10.00 8.20 8.60 8.20 8.20
Column 13 (NC): 15.60 18.00 15.60 15.60 13.80 15.80 13.80 13.80 12.00 13.60 12.00 12.00 11.40 10.20 10.20 8.40 9.20 8.40 8.40 6.60 7.00 6.60 6.60
Column 14 (ND): 10.80 13.20 13.20 10.80 9.00 11.00 11.00 9.00 7.20 8.80 8.80 7.20 5.40 6.60 5.40 3.60 4.40 4.40 3.60 1.80 2.20 2.20 1.80
Column 15 (SC): 15.60 13.20 15.60 15.60 13.80 11.00 13.80 13.80 12.00 8.80 12.00 12.00 10.20 6.60 10.20 8.40 4.40 8.40 8.40 6.60 2.20 6.60 6.60
Column 16 (LC-R): 15.60 18.00 15.60 15.60 13.80 15.80 13.80 13.80 12.00 13.60 12.00 12.00 10.20 11.40 10.20 8.40 9.20 8.40 8.40 6.60 7.00 6.60 6.60
Column 17 (NC): 14.00 16.40 14.00 14.00 12.20 14.20 12.20 12.20 10.40 12.00 10.40 10.40 8.60 9.80 8.60 8.60 7.60 6.80 6.80 5.00 5.40 5.00 5.00
Column 18 (ND): 10.80 13.20 13.20 10.80 9.00 11.00 11.00 9.00 7.20 8.80 8.80 7.20 5.40 6.60 6.60 5.40 3.60 4.40 3.60 1.80 2.20 2.20 1.80
Column 19 (SC): 14.00 13.20 14.00 14.00 12.20 11.00 12.20 12.20 10.40 8.80 10.40 10.40 8.60 6.60 8.60 8.60 6.80 4.40 6.80 5.00 2.20 5.00 5.00
Column 20 (LC-R): 14.00 16.40 14.00 14.00 12.20 14.20 12.20 12.20 10.40 12.00 10.40 10.40 8.60 9.80 8.60 8.60 6.80 7.60 6.80 5.00 5.40 5.00 5.00
Column 21 (NC): 12.40 14.80 12.40 14.80 10.60 12.60 10.60 12.60 8.80 10.40 8.80 8.80 7.00 8.20 7.00 7.00 5.20 6.00 5.20 5.20 3.80 3.40 3.40
Column 22 (ND): 10.80 13.20 13.20 13.20 9.00 11.00 11.00 11.00 7.20 8.80 8.80 7.20 5.40 6.60 6.60 5.40 3.60 4.40 4.40 3.60 1.80 2.20 1.80
Column 23 (SC): 12.40 13.20 12.40 14.80 10.60 11.00 10.60 12.60 8.80 8.80 8.80 8.80 7.00 6.60 7.00 7.00 5.20 4.40 5.20 5.20 3.40 2.20 3.40
Column 24 (LC-R): 12.40 14.80 12.40 14.80 10.60 12.60 10.60 12.60 8.80 10.40 8.80 8.80 7.00 8.20 7.00 7.00 5.20 6.00 5.20 5.20 3.40 3.80 3.40