AN AGENT-BASED MODEL FOR EVOLUTION OF COOPERATION WITH PROACTIVE INFORMATION GATHERING

by

Nahid Mohammad Taheri

B.Sc., Applied Mathematics, University of Isfahan, Iran, 2011

THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

April 2018

© Nahid Mohammad Taheri, 2018

Abstract

This thesis proposes a new model to investigate the impact of proactive information gathering upon the evolution of cooperation among self-interested agents in a multiagent system. It builds upon an existing game-theoretical model of a spatially distributed mobile agent population with an energy-based individual life cycle, in which individuals keep playing one-shot Prisoner's Dilemma games in neighbourhood encounters. Using proactively gathered information about the past behaviour of others, advanced agents in the new model can dynamically adjust their strategies towards different types of opponents. Simulation experiments establish some patterns of how this ability impacts the evolution of cooperation in the presence of varying levels of environmental adversity. The adequacy of the model is demonstrated through a specific design involving two types of advanced agents, one oriented towards cooperation, the other towards defection. The results show that cooperation prevails in a substantially larger area of parameter space than in the basic model without information gathering.

Contents

Abstract
List of Figures
List of Tables
Acknowledgements
Dedication
1 Introduction
2 Background and Related Work
  2.1 Multiagent Systems (MAS)
  2.2 Game Theory
  2.3 Evolutionary Game Theory
  2.4 Evolution of Cooperation in MAS
  2.5 Evolution of Cooperation in Spatial Structure
  2.6 Evolution of Cooperation with Information Exchange
  2.7 Agent-Based Modelling
3 Problem Statement
  3.1 The Motivation and Research Objectives
  3.2 The IG-Model
    3.2.1 Advanced Agent
    3.2.2 Strategy Adjustment
  3.3 Cluster Studies and Neighbourhood Profile
4 Simulator Architecture
  4.1 The Simulator Requirements
    4.1.1 Functional Requirements
      4.1.1.1 Set and Run Experiment
      4.1.1.2 Display Experiment
    4.1.2 Non-functional Requirements
    4.1.3 Domain-specific Requirements
  4.2 The Simulator Structure
    4.2.1 Controller
    4.2.2 MAS-layout
    4.2.3 Front End
  4.3 The System Behaviour
  4.4 Experimentation
5 Evaluation
  5.1 Parameter Space
  5.2 The Basic Model
    5.2.1 Clusters and Neighbourhood Profiles
  5.3 The IG-Models
    5.3.1 Initial Agent Population
    5.3.2 IG-AC
      5.3.2.1 Observations and Analysis
      5.3.2.2 Clusters and Neighbourhood Profiles
    5.3.3 IG-AD
      5.3.3.1 Observations and Analysis
    5.3.4 IG-ACD
  5.4 Performance Comparisons
    5.4.1 Results Validation
6 Conclusions and Future Work
Bibliography

List of Figures

2.1 Agent's life cycle as defined in [Smaldino et al., 2013]
4.1 The use case diagram of the simulator
4.2 The high-level structure of the simulator
4.3 The controller
4.4 The advanced agent architecture
4.5 The front end
4.6 The control panel
4.7 The mechanics of the experiment
4.8 The mechanics of the run
4.9 The mechanics of the round
4.10 The agent's activity diagram
4.11 The agent's state-machine representation
4.12 The GUI window
4.13 The GUI window when only two types of agents are in the population
4.14 The mouseover tool
4.15 The report viewer
4.16 The cluster and its information
5.1 The number of cooperators for different values of S in the basic model
5.2 The number of defectors for different values of S in the basic model
5.3 The number of defectors and cooperators in the long run when S is -1 in the basic model
5.4 The number of cooperators for different values of cost of living in the basic model
5.5 The number of defectors for different values of cost of living in the basic model
5.6 An example of a dying cluster in the basic model
5.7 An example of a rising cluster in the basic model
5.8 An example of the number of agents and cooperators' neighbourhood profile in a dying cluster in the basic model
5.9 An example of the number of agents and cooperators' neighbourhood profile in a rising cluster in the basic model
5.10 The number of cooperators for different values of S in the IG-AC model
5.11 The number of cooperators for different values of cost of living in the IG-AC model
5.12 The number of neighbours in different types of agent's neighbourhood in the IG-AC model
5.13 Clusters' development in the IG-AC model
5.14 The number of cooperators for different values of S in the IG-AD0.3 model
5.15 The number of cooperators for different values of cost of living in the IG-AD0.3 model
5.16 The number of cooperators for different values of S in the IG-AD0.5 model
5.17 The number of cooperators for different values of cost of living in the IG-AD0.5 model
5.18 The number of cooperators for different values of S in the IG-AD0.7 model
5.19 The number of cooperators for different values of cost of living in the IG-AD0.7 model
5.20 The number of cooperators in the basic and the IG-AD models
5.21 The number of defectors in the basic and the IG-AD models
5.22 The number of advanced defectors and naive defectors in the IG-AD models with a low punishing value of S
5.23 The number of advanced defectors and naive defectors in the IG-AD models with a high punishing value of S
5.24 The number of cooperators in defectors' neighbourhood in the IG-AD0.3 model
5.25 The number of cooperators in defectors' neighbourhood in the IG-AD0.5 model
5.26 The number of cooperators in defectors' neighbourhood in the IG-AD0.7 model
5.27 The number of agents in a rising cluster in the IG-AD0.3 model
5.28 The number of agents in a dying cluster in the IG-AD0.3 model
5.29 An example of a rising cluster in the IG-AD0.3 model
5.30 An example of a dying cluster in the IG-AD0.3 model
5.31 An example of cooperators' neighbourhood profile in a dying cluster in the IG-AD0.3 model
5.32 An example of cooperators' neighbourhood profile in a rising cluster in the IG-AD0.3 model
5.33 The number of cooperators for different values of S in the IG-ACD0.3 model
5.34 The number of defectors for different values of S in the IG-ACD0.3 model
5.35 The number of cooperators for different values of cost of living in the IG-ACD0.3 model
5.36 The number of cooperators for different values of S in the IG-ACD0.7 model
5.37 The number of cooperators for different values of cost of living in the IG-ACD0.7 model
5.38 The number of advanced defectors and naive defectors in the IG-ACD models
5.39 The number of advanced defectors and naive defectors in the IG-ACD models
5.40 The outcomes of the evolution in the basic model
5.41 The outcomes of the evolution in the IG-ACD0.3 model
5.42 The outcomes of the evolution in the IG-AD0.3 model
5.43 The outcomes of the evolution in the IG-AC model
5.44 An example of compared results with error bars

List of Tables

2.1 The payoff matrix for the PD. The PD game is defined when this condition holds: T > R > P ≥ S. In addition, when the agents play repeatedly, 2R > T + S is a requirement.
2.2 The payoff matrix for a symmetric game
2.3 The payoff matrix for the Prisoner's Dilemma game, where g, l > 0, g < l + 1 [Heller and Mohlin, 2016]
2.4 The payoff matrix for the Defensive Prisoner's Dilemma game, where 1 = g < l = 3 [Heller and Mohlin, 2016]
2.5 The payoff matrix for the Offensive Prisoner's Dilemma game, where 2.3 = g > l = 1.7 [Heller and Mohlin, 2016]
2.6 The payoff matrix for the Mild Prisoner's Dilemma game, where g = l = 0.2 < (l + 1)/2 = 0.6 [Heller and Mohlin, 2016]

Acknowledgements

I would like to sincerely express my gratitude to my supervisor, Dr. Jernej Polajnar, for his support, advice, and patience throughout all stages of this thesis.

This thesis is dedicated to my husband Mani Samani for his endless love and support.

Chapter 1

Introduction

Cooperation is often considered one of the key concepts of multiagent systems (MAS) [Doran et al., 1997] [Buccafurri et al., 2004] [Salazar et al., 2011]. An agent is an entity situated in an environment, able to perform intelligent and autonomous action in order to achieve its delegated goals [Wooldridge, 2009]. A multiagent system consists of multiple interacting agents, which are assigned individual and team-level tasks. Self-interested agents in a multiagent system, whether designed by a single team or independently, may need to cooperate with each other to perform their tasks and advance the overall objectives of the system [Kraus, 1997] [Jennings, 1995].

The designers of agents typically have two different ways to design a multiagent system. The first option is to design agents with fixed cooperation patterns. The alternative is to design each agent with an ability to develop cooperative behaviour through interactions with other agents. The latter way may prove more productive, especially in a dynamic environment where agents need to adjust their behaviour to changes that are difficult to predict. In addition, the second approach applies more naturally to societies composed of independently designed agents.
To design agents with the ability to develop cooperation proactively, it is helpful to understand how cooperation can evolve in agent societies under different circumstances. The groundwork for studies of the evolution of cooperation among self-interested individuals has been provided by Hamilton [1963], Axelrod [1984], Smith [1982], and Nowak [2006]. Numerous studies of the evolution of cooperation among self-interested agents are based on evolutionary game theory [Weibull, 1997] [Gintis, 2000] [Colman, 2013].

Studies of the evolution of cooperation that are rooted in evolutionary game theory are typically built on the Prisoner's Dilemma (PD) game [Shoham and Leyton-Brown, 2009]. The PD game is representative of the wider approach to the study of cooperation between self-interested agents, contrasting two extreme types of behaviour, characterized as 'selfish' (defection) and 'altruistic' (cooperation). A cooperator helps other individuals at its own cost. A defector does not pay any cost and takes advantage of cooperators. Among agents playing the PD game, individuals engage in pairwise interactions, and each of them can choose to cooperate or to defect. If both players cooperate (mutual cooperation), they receive a higher payoff than when they both defect (mutual defection). However, if one cooperates and the other defects, the defector gets the highest payoff, and the cooperator gets the lowest. In the one-shot PD game, regardless of what the other player does, defection is always the best strategy because it yields the higher payoff. As a result, even though mutual cooperation is more profitable than mutual defection, self-interested agents always defect in the one-shot PD game [Elkind and Markakis, 2013]. However, several studies show that consistent defection is not always the best strategy. Different mechanisms have been developed to promote the evolution of cooperation in agent societies [Doebeli and Hauert, 2005].

In some of the models studied, an agent that needs to decide its strategy for the next encounter can benefit from having some information about the profiles of other agents. In most of these models, cooperation can evolve if self-interested individuals can obtain some information about other agents, such as the history of their interactions and their past opponents [Bachi et al., 2010] [Takahashi, 2010] [Awaya, 2014] [Heller and Mohlin, 2016]. An agent can then adjust its strategy based on the obtained information.

This thesis focuses on the impact of dynamically adjustable agent strategies, based on the proactive gathering of information about the behaviour of other agents, upon the evolution of cooperation within an existing model of agent societies. The model that we adopt as the basis of our research is formulated by Smaldino et al. [2013]; henceforth, we refer to it as the basic model. The basic model explores the impact of environmental adversity, or a harsh environment, characterized by the cost of living and the unreciprocated cooperation cost, upon the evolution of cooperation in a population of agents. In that model, the agents are situated on a toroidal grid. The agents have an energy-based life cycle, and the population evolves through births and deaths. Each individual agent can interact with a randomly chosen partner from its spatial neighbourhood by playing the PD game. If no neighbour is available as a partner, the agent can move to an empty adjacent cell.
Each agent is given a starting energy value, which is subsequently modified by the game payoffs. In addition, the energy of each agent is reduced in each round by a fixed amount representing the cost of living. Smaldino et al. [2013] show that in moderately harsh environmental conditions, cooperation tends to eventually prevail over selfish behaviour. They also identify the circumstances under which cooperation evolves in the basic model. This observation has raised the research question of how that development is affected by allowing agents to dynamically adjust their strategies based on proactively collected knowledge of the past behaviour of other agents. In this thesis, we approach that question by adopting and extending the premise of the basic model.

In the basic model, the population consists of two types of agents: cooperators, which always cooperate, and defectors, which always defect. We retain those agent types in our model and call them naive cooperators and naive defectors, respectively. In addition, we introduce advanced cooperators, which favour cooperative behaviour, and advanced defectors, which favour selfish behaviour; both are capable of pursuing more complex, dynamically adjustable strategies. Advanced agents collect information about their opponents when they play with them. The collected information is stored in their personal game memory. They can also inquire about the past behaviour of other agents in the population. Advanced agents' strategies are adjusted dynamically based on their personal game memory and the proactively gathered knowledge of the past behaviour of other agents. In the presence of some knowledge about the past behaviour of the opponent, an advanced agent may either cooperate or defect, as dictated by its advanced strategy, motivated by its tactical approach to perceived cooperators or defectors. However, in the absence of any such information, advanced cooperators always cooperate, and advanced defectors always defect.

In this context, we investigate the information gathering (IG) techniques and advanced strategies that facilitate the evolution of cooperation in the society in spite of the presence of defectors. In particular, we explore whether proactive information gathering can lower the harshness threshold of environmental adversity at which cooperation evolves, compared to the basic model. One should note that the hypothesis that proactive information gathering may facilitate the success of advanced cooperators requires careful study, as advanced defectors can also use the gathered information to promote their own success.¹

¹ The impact of strategy adjustment based on information gathering by advanced agents upon the development of cooperation is being studied by our research group in the context of different models of agent society. An ongoing study parallel to this thesis uses as its basis a substantially different game-theoretic model of an evolving agent society [Nyantakyi, 2018].

Within this context, we design IG-models with specific strategies of advanced agents, provide a rationale for that particular design, perform a series of experiments across a space of parameters representing varying levels of environmental adversity, and compare the outcomes, with respect to the evolution of cooperation, with the outcomes of the basic model over the same parameter space. The results confirm that, compared to the basic model, the presence of advanced agents with IG capabilities significantly extends the region of parameter space in which cooperators prevail over defectors.

Once the information is obtained by an advanced agent, it can be used in different ways.
For example, advanced cooperators can act conditionally based on their opponents' previous actions, as in [Nowak and Sigmund, 1998], where each individual only cooperates with those that cooperated in their previous interactions. The agents' strategic reasoning is an open research dimension, which can be explored further in future work.

For modelling and simulation, we develop a simulator that supports the concurrent simulation of multiple interacting agents in an environment. The simulator provides an interactive framework for comparing the numbers of different types of agents and the population development across distinct parameter settings. It includes facilities for the study of localized evolutionary phenomena, such as neighbourhood profiles and cluster development. The IG-models are investigated through simulation experiments, and the impact of proactive information gathering upon the evolution of cooperation is determined in comparison with the basic model.

The main contributions of this thesis are as follows. First, a concrete model of agent society has been constructed in which the designs and strategies of information-gathering advanced agents can be examined with respect to their impact upon the development of cooperation under moderately harsh environmental conditions, extending the modelling studies of the evolution of cooperation in [Smaldino et al., 2013]. Second, concrete designs of advanced cooperators and advanced defectors have been introduced, implemented, and studied through simulation experiments. The results show that the presence of advanced agents significantly extends, in comparison to the basic model, the region of the parameter space representing the harshness of living conditions (as represented by the cost of living and the payoff for unreciprocated cooperation) in which cooperators eventually prevail over defectors. Third, our experimental studies of the specific model with advanced cooperators and defectors, as well as of auxiliary models with advanced agents of a single type, in our view demonstrate the adequacy of the general model as a vehicle for further study of specific IG-models and their impact upon the development of cooperation. Such study may include comparisons with results and insights from parallel developments of IG-models built upon basic models of different structure.

The rest of this thesis is organized as follows. Chapter 2 covers the background and related work. Chapter 3 describes the motivations and research objectives and formulates the main objectives of the IG-models. Chapter 4 lays out the simulator architecture, including the simulator requirements, structure, and behaviour. Chapter 5 elucidates the evaluation methods used to analyze the results of simulation experiments, and Chapter 6 discusses the conclusions and future work.
Chapter 2

Background and Related Work

In this chapter, we review several groundwork studies and combine them with related recent research in multiagent systems, evolutionary game theory, evolution of cooperation in multiagent systems, evolution of cooperation in spatial structure, evolution of cooperation with information exchange, and agent-based modelling, with a focus relevant to this thesis.

2.1 Multiagent Systems (MAS)

Wooldridge [2009] defines an intelligent agent as "a computer system that is situated in some environment, and that is capable of autonomous action in this environment to meet its delegated objectives." Shoham and Leyton-Brown [2009] describe a multiagent system (MAS) as "a system that includes multiple autonomous entities with either diverging information or diverging interests or both."

Wooldridge and Jennings [1995] suggest some capabilities for an agent to be considered intelligent. An intelligent agent is expected to be reactive, i.e., able to perceive the environment and respond to its changes; proactive, i.e., able to exhibit goal-directed behaviour; and social, i.e., able to interact with other agents and possibly humans.

The environment in which an agent is situated is generally classified according to four criteria [Russell and Norvig, 1995], explained in the following:

Accessible versus inaccessible: In an accessible environment, the agent can obtain complete information about the environment; however, a real-world environment, such as the physical world, is mostly inaccessible.

Deterministic versus non-deterministic: In a deterministic environment, each agent's action has a single guaranteed effect on the resulting environment state.

Static versus dynamic: In a static environment, the state of the environment can only change through an agent's action; in a dynamic environment, other processes beyond the agent's control also influence the environment.

Discrete versus continuous: In a discrete environment, there is a fixed and finite number of actions and percepts. Otherwise, the environment is called continuous.

In the modelling of agents interacting with their environment, the environment is initially in some state. While an agent is interacting with the environment, it perceives the current state of the environment and performs an action. The environment can transition to another state in response to the performed actions. The states of the environment can be associated with real values, known as utilities. Such a value indicates how good each state is and allows agents to decide what to do.

Practical reasoning [Bratman, 1987] is a model of decision making directed towards actions, whereas theoretical reasoning is directed towards beliefs. In human practical reasoning, there are two different processes for decision making, called deliberation and means-ends reasoning. Deliberation answers the question of what state we want to achieve, and means-ends reasoning decides how to achieve the desired state. Practical reasoning agents are designed with mental states similar to those in the human mind. These mental states are used in the agents' decision-making processes. The output of deliberation is an intention, and the output of means-ends reasoning is a plan. The best-known practical reasoning framework is the Belief-Desire-Intention (BDI) framework, proposed by Bratman [1987] and formulated by Shoham [1993].
In the context of the BDI framework, the agents are considered rational, with mental states of belief, desire, and intention. Beliefs are the information the agent has about the environment; desires are all the possible states of affairs that the agent might like to accomplish; and intentions are the desired states that the agent has decided to work towards. Russell and Norvig [2009] define a rational agent as "one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome." In BDI, a rational agent is committed to its intention as long as the intention is valid. The conceptual framework of the BDI model is described in [Bratman et al., 1988].

2.2 Game Theory

Generally, there are two kinds of MAS: cooperative and non-cooperative. In cooperative systems, the agents share a common goal; in non-cooperative systems, the agents are self-interested. Although self-interested entities have their own desires, they still interact with other agents in the system. The principal framework for studying and analyzing non-cooperative MAS is the main branch of game theory, called non-cooperative game theory. Non-cooperative game theory is mainly used to model the behaviour of self-interested entities and their associated decision making in different situations [Elkind and Markakis, 2013]. The other branch of game theory is called cooperative game theory, where the main modelling entity is a group of self-interested individuals [Shoham and Leyton-Brown, 2009]. Since in this study we are mostly interested in observing the behaviour of each individual, we cover only non-cooperative game theory in this background.

The theory of games was first formalized by von Neumann and Morgenstern [1944] to analyze human economic behaviour. In game theory, a game is defined as an interaction among multiple agents with different preferences over the states of the environment, in which each agent behaves rationally. Utility theory is the main way to capture an agent's preferences. As Shoham and Leyton-Brown [2009] put it, "[utility theory] aims to quantify an agent's degree of preference across a set of available alternatives. The theory also aims to understand how these preferences change when an agent faces uncertainty about which alternative he will receive." Under the utility-theoretic assumptions, the utility function represents an agent's preferences by assigning a numeric value to each state of the environment. A rational agent always wants to maximize its own utility. In an environment that consists of multiple agents, game theory is used to describe the strategic interactions among self-interested agents, where each individual agent seeks the maximum utility.

One of the most common representations of strategic interactions between agents in game theory is the normal-form game [Shoham and Leyton-Brown, 2009]. A normal-form game represents each agent's (player's) utility for different states of the environment, where the states of the environment depend only on the players' combined actions [Shoham and Leyton-Brown, 2009].

Definition 2.1.1 (Normal-form game): A (finite, n-person) normal-form game is a tuple (N, A, u) where:

• N is a finite set of n players, indexed by i.
• A = A1 × ... × An, where Ai is a finite set of actions available to player i.
• u = (u1, ..., un), where ui is a real-valued utility function for player i.
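As a worked instance of Definition 2.1.1, the one-shot PD discussed below can be written as a two-player normal-form game. The payoff symbols T, R, P, and S anticipate Table 2.1 and are not part of the definition itself:

```latex
\begin{align*}
N &= \{1, 2\}, \qquad A_1 = A_2 = \{C, D\},\\
u_1(C, C) &= u_2(C, C) = R, \qquad u_1(D, D) = u_2(D, D) = P,\\
u_1(C, D) &= u_2(D, C) = S, \qquad u_1(D, C) = u_2(C, D) = T,
\end{align*}
```

with T > R > P ≥ S.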
The utility function in Definition 2.1.1 is also called the payoff function. The payoff function assigns a real number to each combined-action profile for each agent. The payoff values are shown in a matrix, called the payoff matrix. In a payoff matrix, each row represents a possible action for player 1, each column represents a possible action for player 2, and each cell represents a possible outcome. The payoff values for the corresponding outcome are shown as a pair (X1, X2), where X1 is the payoff to player 1, and X2 is the payoff to player 2. The Prisoner's Dilemma (PD) game is an example of a normal-form game.

Definition 2.1.2 (Prisoner's Dilemma): The name of this game derives from the interpretation of two suspects who are accused of committing a crime together. They reside in different cells and have no way of communicating. Each has the option of either confessing to the offense, i.e., defection (with respect to the other player), or denying the charge, i.e., cooperation (with respect to the other player). If both cooperate (mutual cooperation), they will go to jail for one year. If both defect (mutual defection), each of them will go to prison for three years. If one cooperates and the other defects, the defector will be freed, and the cooperator will be jailed for four years [Elkind and Markakis, 2013].

The values in the PD payoff matrix are defined as R for mutual cooperation and P for mutual defection. When one player cooperates and the other defects, the defector gets the highest payoff, T, and the cooperator gets the lowest, S.

                            Player 2
                     Cooperation    Defection
Player 1  Cooperation   (R, R)        (S, T)
          Defection     (T, S)        (P, P)

Table 2.1: The payoff matrix for the PD. The PD game is defined when this condition holds: T > R > P ≥ S. In addition, when the agents play repeatedly, 2R > T + S is a requirement.

A normal-form game can be either symmetric or asymmetric. In a symmetric game, the payoff for the players does not depend on their identities; it depends only on the interaction, and the players do not have different roles in the game. The PD game is an example of a symmetric game.

Definition 2.1.3 (Symmetric 2×2 Game): A two-player normal-form game is called a symmetric game if it has the following form:

                      A           B
Strategy  A        (x, x)      (u, v)
          B        (v, u)      (y, y)

Table 2.2: The payoff matrix for a symmetric game

According to Definition 2.1.1, each agent has a set of available actions. The way an agent decides which action to perform is called its strategy. One way to play a game is to always choose a single action; such a strategy is called a pure strategy. In a single PD game, defection is always the best strategy because it yields a higher payoff regardless of the other player's action. An agent can also use a probability distribution to choose randomly over its set of available actions. This kind of strategy is called a mixed strategy [Shoham and Leyton-Brown, 2009]:

Definition 2.1.4 (Mixed Strategy): Let (N, A, u) be a normal-form game, and for any set X let Π(X) be the set of all probability distributions over X. Then the set of mixed strategies for player i is Si = Π(Ai).
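To make these definitions concrete, the following minimal Java sketch encodes the PD payoff function of Table 2.1 with the canonical values T = 5, R = 3, P = 1, S = 0 (which satisfy T > R > P ≥ S and 2R > T + S), together with sampling from a mixed strategy as in Definition 2.1.4. The class and method names are illustrative only and are not part of the simulator described in Chapter 4.

```java
import java.util.Random;

enum Action { COOPERATE, DEFECT }

final class PrisonersDilemma {
    // Canonical payoff values satisfying T > R > P >= S and 2R > T + S.
    static final double T = 5, R = 3, P = 1, S = 0;

    // Payoff to player 1 for a combined-action profile (Table 2.1);
    // by symmetry, player 2's payoff is payoff(a2, a1).
    static double payoff(Action a1, Action a2) {
        if (a1 == Action.COOPERATE) {
            return (a2 == Action.COOPERATE) ? R : S;
        }
        return (a2 == Action.COOPERATE) ? T : P;
    }

    // A mixed strategy over {COOPERATE, DEFECT}: cooperate with probability p.
    static Action sample(double p, Random rng) {
        return (rng.nextDouble() < p) ? Action.COOPERATE : Action.DEFECT;
    }
}
```

Pure cooperation and pure defection are the special cases p = 1 and p = 0 of the mixed strategy.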
One of the most important solution concepts in game theory is the equilibrium in general, and the Nash equilibrium in particular. From an individual agent's perspective, if the agent knows the other agents' strategies, it just needs to select an action that maximizes its utility. That means an agent needs to determine its best response to the strategy profile s−i = (s1, ..., si−1, si+1, ..., sn) [Shoham and Leyton-Brown, 2009]. The strategy profile s−i includes the strategies of all agents except agent i.

Definition 2.1.5 (Best Response): Player i's best response to the strategy profile s−i is a mixed strategy s*i ∈ Si such that ui(s*i, s−i) ≥ ui(si, s−i) for all strategies si ∈ Si.

In general, an agent will not have any information on what strategies the other agents would play. However, the idea of best response can be used to define the Nash equilibrium [Shoham and Leyton-Brown, 2009].

Definition 2.1.6 (Nash equilibrium): A strategy profile s = (s1, ..., sn) is a Nash equilibrium if si is a best response to s−i for all agents i ∈ N, the finite set of n players.

A Nash equilibrium is a stable strategy profile: none of the agents would prefer to change its strategy if it knew the strategies selected by the other players. In the one-shot PD, for example, mutual defection is the unique Nash equilibrium.

2.3 Evolutionary Game Theory

Evolutionary game theory (EGT) was developed by Smith and Price [1973] as an extension of game theory for modelling the evolution of life forms. The focus of EGT is on a population of agents playing a game against random opponents, all of whom are competing. The competition among individuals selects for agents with higher utility; therefore, agents tend to adopt the most successful strategy in terms of payoff. EGT is used to aid the analysis of evolutionary selection in an interactive environment [Weibull, 1997] [Tuyls and Tumer, 2013].

The main assumption of EGT is that in an infinite population of players playing a game, all agents initially play the same strategy, while a small number of agents may begin to play the game using a new strategy. The evolutionary success of a strategy is measured by the core equilibrium concept of EGT, the evolutionarily stable strategy (ESS). If the payoff obtained by the new strategy is smaller than the payoff obtained by the old strategy, the new strategy will disappear after some generations, and the original strategy is evolutionarily stable.

Definition 2.3.2 (Evolutionarily Stable Strategy): Given a symmetric two-player normal-form game (G = {1, 2}, A, u) and a mixed strategy S, we say that S is an ESS if and only if, for some ε > 0 and for all other strategies S′:

u(S, (1 − ε)S + εS′) > u(S′, (1 − ε)S + εS′),

where u refers to the utility function.

EGT is used in different fields, such as biology [Sigmund and Nowak, 1999] and sociobiology [Wilson and Wilson, 2007], and also to study the evolution of cooperation among self-interested agents [Axelrod and Hamilton, 1981].

2.4 Evolution of Cooperation in MAS

Designing and building systems of intelligent, self-interested agents that can work together is a complex task in computer science [Kraus, 1997]. Agents must perform individual and team-level tasks and need to cooperate with each other to satisfy joint goals and increase the system's efficiency. Designers can build agents with fixed cooperative patterns.
However, the agents may need to develop cooperation at run time, especially when they are designed independently and are located in an unknown, dynamic environment. Therefore, agent designers may benefit from understanding how cooperation can evolve among self-interested agents.

To address that subject, many game-theoretical models have been developed [Pendharkar, 2012]. Several studies have formulated the evolution of cooperation among agents in graphs and social networks [Szabó and Fath, 2007] [Floría et al., 2009]. Many researchers have tried to identify circumstances which may or may not favour cooperation, such as a harsh environment [Smaldino et al., 2013], spatial structure [Majeski et al., 1997], and varying group size [Barcelo and Capraro, 2015]. Furthermore, different strategies have been developed to allow cooperators to interact more with each other, such as partner selection mechanisms and conditional movement [Aktipis, 2011], the walk-away strategy [Aktipis, 2004], and different movement patterns [Smaldino and Schank, 2012].

2.5 Evolution of Cooperation in Spatial Structure

Spatial structure is often considered an effective factor in social evolution [Epstein et al., 1998]. In evolutionary games with spatial structure, agents' interactions are limited to defined neighbours in the game space. That limitation can favour the evolution of cooperative behaviour [Brauchli et al., 1999] [Nowak et al., 2010].

We adopt the basic model for this thesis from [Smaldino et al., 2013], which studies the evolution of cooperation among self-interested mobile agents in a dynamic socio-spatial structure. The basic model investigates the impact of two factors on the evolution of cooperation, namely the cost of unreciprocated cooperation and the environmental cost of living. The unreciprocated cooperation cost refers to the payoff S received by a cooperator in the PD game when its opponent defects.

Figure 2.1 shows an agent's life cycle in the basic model. In the basic model, the agents are given an energy value. The agents' energy level is affected by the cost of living and game payoffs. They move to find co-players and gain or lose resources, i.e., energy, through game interactions. The basic model is explained in the following paragraphs.

Figure 2.1: Agent's life cycle as defined in [Smaldino et al., 2013].

• The Population
The population consists of agents ai, i ∈ I, where I is a set of positive integers. The initial population is determined by the initial value of the set I, and it is divided into equal numbers of cooperators and defectors. The population evolves in discrete steps, where the set of agents can change through births and deaths. The population size is capped at a constant number, the maximum carrying capacity, beyond which the population cannot grow.

• The Environment
The mobile agents are located randomly on a q × q discrete square grid with periodic (toroidal) boundaries, which resembles a torus [Oxford University Press, 2017]. Only one agent can occupy a cell. Each agent has its own neighbourhood, which includes the eight closest cells. Two agents are called neighbours if they are located in the same neighbourhood.

• The Interactions and Movement
Agents interact with each other by playing a two-player game. Each agent can only play with its neighbours. The agent must find a co-player that is not engaged in another interaction.
Each agent is given a certain unchangeable strategy (action) to play the game. If there is no opponent, the agent attempts to move to another cell in its neighbourhood. The movement is successful if there is an empty cell in the agent's neighbourhood.

• Experiments and Rounds
The agents' interactions and movement occur in discrete rounds. The basic model is built with MASON [Luke et al., 2005], a discrete-event multiagent simulation toolkit, which allows agents to be scheduled to run every time step, or round.

• The Energy and Population Dynamics
Each agent in the initial population has a starting energy value. That value is assigned randomly from the uniform distribution U[a, b], where a is the minimum and b is the maximum. Agents must accumulate energy to remain alive and to reproduce. The cost of living and game payoffs affect the energy of agents; however, an agent's energy level is always capped at a constant amount, the maximum energy. Furthermore, an agent is removed from the game space if its energy falls below a certain value, called the death threshold. An agent can attempt to reproduce into a neighbouring cell if its energy reaches a certain value, called the birth threshold. The birth is not successful if the population is at its maximum capacity or if there is no empty cell in the agent's neighbourhood. A newborn agent receives a fixed starting energy, which is deducted from its parent's energy level.

• The Game
The two-player game used in the basic model is the PD game.

• Results
Under certain assumptions of the basic model, Smaldino et al. [2013] reach the following conclusions:

1. When the cost of living has a fixed value and the cost of unreciprocated cooperation varies, a higher cost of unreciprocated cooperation results in losses for cooperators at the beginning; however, cooperative agents can survive in the long run. As the cost of unreciprocated cooperation increases, it becomes more difficult for cooperators to keep growing cooperative clusters.

2. When the cost of unreciprocated cooperation is constant and the environmental cost of living takes different values, the higher cost helps cooperators in the long run, unless the cost of living is too high for any individual to survive.

2.6 Evolution of Cooperation with Information Exchange

Among the different frameworks for studying the evolution of cooperation in MAS, some models emphasize the impact of having information about other individuals on the evolution of cooperation, through the use of reciprocity, social information, and trust and reputation [Mohtashemi and Mui, 2003] [Sabater and Sierra, 2002] [Sekaran and Sen, 1995].

The first well-known work studying the evolution of cooperation with knowledge of the opponent's past behaviour is Axelrod's tournament, where agents engage in a long sequence of repeated PD interactions with the same opponent, and they can decide whether to cooperate or defect based on the previous action of their opponent. The winning strategy of Axelrod's tournament was tit-for-tat (TFT), in which an agent cooperates the first time and then repeats the action of its opponent from the previous round. If there are other cooperators in the tournament with which to interact, an agent with the TFT strategy can increase its total score [Axelrod, 1984].
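A minimal sketch of TFT against a single opponent, reusing the Action type from the sketch in Section 2.2; this illustrates the rule and is not Axelrod's original tournament code:

```java
// Tit-for-tat: cooperate on the first encounter with the opponent,
// then mirror that opponent's most recent action.
final class TitForTat {
    private Action lastOpponentAction; // null until the first game

    Action nextAction() {
        return (lastOpponentAction == null) ? Action.COOPERATE
                                            : lastOpponentAction;
    }

    void observe(Action opponentAction) {
        lastOpponentAction = opponentAction;
    }
}
```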
Sekaran and Sen [1995] investigate the concept of reciprocity and its impact on developing cooperative behaviour in a MAS with the use of probabilistic decision-making algorithms. Each agent has a number of tasks to perform. An agent can perform its tasks individually, or it can request help. Agent a will cooperate with agent b if it expects future benefits from agent b. Thus, agent a uses the information about past interactions with agent b to decide whether to help. The results of that study show that the cooperative agents perform better than others.

The evolutionary biologists Nowak and Sigmund [1998] develop a model of indirect reciprocity by image scoring. In that model, each player has an image score. A player's image score can be modified locally by the other player engaged in the interaction and by a few random observers. The image score of a cooperator increases because of its altruistic acts. Therefore, an informed player only cooperates with those that are likely to help others. The result suggests that when information is localized, cooperation can evolve based on the players' reputation in the system.

The fact that an agent may cooperate in order to receive help from other agents in the future has led to models based on social information and on trust and reputation. Mohtashemi and Mui [2003] claim that, rather than through observation and direct interaction with other agents, information can be gathered from trusted agents in a social network. In their model, agents can communicate and acquire information selectively from their social network. The results suggest that cooperation can evolve if the collected benefit from reliable agents in the network outweighs the cost of helping untrustworthy individuals. Sabater and Sierra [2002] use a trust-net to consider an agent's social interaction with others. In that model, the agents calculate reputation values by analyzing other agents' social networks. As Sabater and Sierra [2002] argue, social network analysis can be used in different experimentations in real e-commerce environments.

Heller and Mohlin [2016] propose a model in which the agents play a one-shot PD game with a random opponent and can observe a random sample of the previous behaviours of their opponents and decide whether to defect or cooperate. There are two main assumptions in their study. First, there is a particular group of committed agents, and they use a certain type of strategy. Second, the individuals in the population do not all start to interact at the same time. That study presents three main results, based on different variations of the PD game, with respect to equilibrium. First, in the offensive PD game, Table 2.5, defection is the unique equilibrium. Second, in the defensive PD game, Table 2.4, cooperation can remain stable in the presence of a combination of strategies, where all the agents defect if at least two defections occur in the opponent's history and cooperate when there is no defection; there are also a few agents that defect in the case of a single defection in the opponent's history. Third, in the mild PD game, Table 2.6, cooperation can be sustained if an agent can also acquire information about the behaviour of past opponents against the current co-player.

                            Player 2
                     Cooperation      Defection
Player 1  Cooperation   (1, 1)          (−l, 1 + g)
          Defection     (1 + g, −l)     (0, 0)

Table 2.3: The payoff matrix for the Prisoner's Dilemma game, where g, l > 0, g < l + 1 [Heller and Mohlin, 2016].
                            Player 2
                     Cooperation     Defection
Player 1  Cooperation   (1, 1)         (−3, 2)
          Defection     (2, −3)        (0, 0)

Table 2.4: The payoff matrix for the Defensive Prisoner's Dilemma game, where 1 = g < l = 3 [Heller and Mohlin, 2016].

                            Player 2
                     Cooperation       Defection
Player 1  Cooperation   (1, 1)           (−1.7, 3.3)
          Defection     (3.3, −1.7)      (0, 0)

Table 2.5: The payoff matrix for the Offensive Prisoner's Dilemma game, where 2.3 = g > l = 1.7 [Heller and Mohlin, 2016].

                            Player 2
                     Cooperation       Defection
Player 1  Cooperation   (1, 1)           (−0.2, 1.2)
          Defection     (1.2, −0.2)      (0, 0)

Table 2.6: The payoff matrix for the Mild Prisoner's Dilemma game, where g = l = 0.2 < (l + 1)/2 = 0.6 [Heller and Mohlin, 2016].

Takahashi [2010] shows theoretically that cooperation can be sustained, as an equilibrium, in a large population of players playing the repeated PD game. Each player is aware of its own past actions. In addition, each player can observe its current opponent's past actions without any cost. Players then choose a mixed strategy based on the available information. Awaya [2014] constructs an equilibrium which remains robust when players can choose to observe their current opponent's past actions at a very low cost. Bachi et al. [2010] investigate the role of communication in the emergence of cooperation in the one-shot PD game. In that model, players talk to each other before playing the game, and they may learn the other player's strategy with some probability. The probability of receiving the information is the same for both players. The results show the levels of cooperation that develop under different conditions.

2.7 Agent-Based Modelling

Agent-based modelling (ABM) is a powerful tool that has been used in recent years to simulate real-world problems and human behaviours [Helbing, 2012] [Bonabeau, 2002]. Several studies have used this technique to explore the evolution of cooperation among self-interested agents [Smaldino and Schank, 2012] [Wilson and O'Brien, 2009] [Aktipis, 2004] [Epstein et al., 1998].

Chapter 3

Problem Statement

This chapter delineates the motivation for designing a new model for the evolution of cooperation among self-interested agents with proactive information gathering. This model is designed to investigate the combined effect of two principles, namely repeated interaction with knowledge of opponents' past behaviour and environmental adversity controllable by the experimenter, upon the evolution of cooperation. Section 3.1 describes the research motivations, and Section 3.2 presents the new model with proactive information gathering.

3.1 The Motivation and Research Objectives

Building multiagent systems (MAS) in which self-interested agents are able to perform a common task jointly can be challenging, since agents may need to cooperate to achieve their individual goals and improve the system performance. In fact, cooperation is considered one of the key concepts of agent societies [Doran et al., 1997] [Buccafurri et al., 2004] [Salazar et al., 2011]. There are two possible ways to enable agents to develop cooperation patterns. If the multiagent system is created by a single team and the environment is known, then cooperative behaviour can be built into agents at design time. However, if agents are designed independently and located in an unknown environment, it might be beneficial to understand how cooperation can evolve through interactions among self-interested agents.
To formalize the evolution of cooperation among self-interested individuals, many studies have developed game-theoretical models based on repeated pairwise encounters in which agents engage in a noncooperative two-player game [Gintis, 2000] [Colman, 2013]. In each encounter, agents play a game which allows both cooperative and selfish strategies, typically the Prisoner's Dilemma (PD) game [Shoham and Leyton-Brown, 2009], with the agents' cumulative payoff from the series of encounters representing individual success. Cooperation and defection are the only actions available to an agent playing the PD game. As described in Section 2.2, in the one-shot PD game, no matter what the other player does, defection is always the best strategy because it yields a higher payoff. However, consistent defection is not always the best strategy among n agents (n > 2) playing the PD game repeatedly, as seen in Axelrod's tournament [Axelrod, 1984]. In addition, some research studies show that the evolution of cooperation is affected by several factors other than individual strategy choices, such as population structure and the environment [Axelrod, 1997] [Majeski et al., 1997] [Barcelo and Capraro, 2015].

Our specific investigation is situated in the context of a game-theoretic model of an agent society with an energy-based life cycle within a spatially structured population and a controllable level of environmental adversity, introduced by Smaldino et al. [2013] and discussed in Section 2.5. Smaldino et al. [2013] show how the impact of a harsh environment upon the evolution of cooperation among self-interested agents can be demonstrated in the context of game-theoretic agent-based population modelling. The environmental adversity is represented by the environmental cost of living and the cost of unreciprocated cooperation (corresponding to the value of S in the payoff matrix). A moderately high level of adversity favours cooperation in the long run, and cooperative behaviour will develop. When the environment is too harsh, neither cooperators nor defectors can survive. Throughout this thesis we refer to that model as the basic model.

Agents in the basic model play with randomly selected opponents from their spatial neighbourhood, and they always play the strategy of pure cooperation or pure defection, without having any information about other agents in the population. However, observations of animal species and human societies indicate that individuals use information produced by the behaviour of other individuals to facilitate their decision making, which can affect cultural and biological evolution [Danchin et al., 2004] [Frith and Frith, 2012]. In addition, a variety of research studies have shown that cooperation can evolve if self-interested agents have some partial knowledge about the past behaviour of their opponents [Bachi et al., 2010] [Awaya, 2014] [Heller and Mohlin, 2016]. Allowing agents to have some information about the behaviour of their opponent, while they are engaged in repeated interactions, was first studied in Axelrod's tournament [Axelrod, 1984]. In that tournament, an agent plays a series of PD games against the same opponent, and it only has information about that particular opponent.
Observation of Axelrod's tournament and the basic model has given rise to the research question of whether the combination of a harsh environment and repeated interaction with partial knowledge about the past behaviour of opponents can facilitate the evolution of cooperation in agent societies. By enhancing the basic model, we explore how the evolution of cooperation is affected by agents that are enabled to base their strategies on proactively gathered information about the past cooperative or selfish behaviours of other agents. In particular, we investigate whether proactive information gathering can change the level of environmental harshness under which cooperation develops, compared to the basic model.

In this study, we investigate the impact of having partial knowledge about the behaviour of other agents on the evolution of cooperation in MAS. The new model is intended to empower the cooperators' success by enriching the basic model with a new element of information acquisition. The development of cooperative behaviours among self-interested agents in this research can be applied to more complex systems, where agents need to cooperate effectively by using partial knowledge about each other. The impact of proactive information gathering on the evolution of cooperation is explored through simulation experiments, and the results are explained in Chapter 5.

The design of this new model is expected to allow the agents to recognize cooperative and non-cooperative behaviours and adjust their behaviour accordingly. Therefore, deciding how advanced agents will use the gathered information is an important question to answer. More particularly, we are interested in investigating the impact of agents' different strategies on the evolution of cooperative behaviours.

3.2 The IG-Model

To answer the proposed research question, we need to compare the agents' performance with and without the information gathering (IG) technique. For that purpose, we adopt the premise of the basic model. In the basic model, agents always play their native strategy, which is either cooperation or defection; therefore we call them naive agents. In the new model, named the IG-model, we add the desired features by introducing advanced agents. That modification results in a combined population of four types of agents:

1. Naive cooperators do not collect information and always play cooperation as their native strategy.

2. Naive defectors do not collect information and always play defection as their native strategy.

3. Advanced cooperators can selectively gather information about other agents, and their goal is developing cooperative behaviours.

4. Advanced defectors can selectively gather information about other agents. They try to play with cooperators to maximize their utility, which leads to reproduction.

3.2.1 Advanced Agent

Advanced agents' information gathering involves two different parts. First, an advanced agent can retain a personal game memory: when it plays with another agent, it keeps a record of the identity of that agent and its actions. Second, it can acquire information about the past behaviour of its opponent, which is saved as the history. The available information in each agent's history includes its actions in previous games, the actions of the players with whom it played, and the identities of those players. After gathering the information, advanced agents adjust their strategies based on the collected information, as explained in the following section.
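The following sketch indicates one plausible shape for the personal game memory just described, reusing the Action type from the sketch in Section 2.2. All type and method names here are hypothetical; the actual data structures of the simulator are presented in Chapter 4.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// One recorded encounter: the opponent's identity and both actions.
record GameRecord(int opponentId, Action ownAction, Action opponentAction) {}

// Bounded personal game memory of an advanced agent; the capacity
// corresponds to the configurable memory size (see Section 4.1.1.1).
final class GameMemory {
    private final Deque<GameRecord> records = new ArrayDeque<>();
    private final int capacity;

    GameMemory(int capacity) { this.capacity = capacity; }

    void remember(GameRecord r) {
        if (records.size() == capacity) records.removeFirst(); // drop oldest
        records.addLast(r);
    }

    // Most recent recorded encounter with a given opponent, or null if none.
    GameRecord lastEncounterWith(int opponentId) {
        var it = records.descendingIterator();
        while (it.hasNext()) {
            GameRecord r = it.next();
            if (r.opponentId() == opponentId) return r;
        }
        return null;
    }
}
```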
3.2.2 Strategy Adjustment

Each advanced agent has a short-term and a long-term agenda, and its strategy choices must move it towards those agendas. An advanced cooperator avoids interacting with defectors and tries to develop cooperation, even if that reduces its short-term benefit. In particular, an advanced cooperator can defect when it plays with a naive cooperator, knowing that a naive cooperator always cooperates and that defection has the highest payoff. However, the long-term agenda of an advanced cooperator is to help other cooperators build sustainable clusters, which eventually leads to the evolution of cooperation. An advanced defector, on the other hand, tries to guarantee its survival and reproduction, and for that it needs to interact with cooperators. However, the advanced defector does not need too many cooperators for its survival and reproduction. So, while an advanced defector prefers to maximize its payoff in the short run, it can have a long-term agenda of being part of a small cluster of cooperators and not allowing them to grow in size.

Each agent plays its native strategy in its very first game by default. So, if an advanced agent gathers information about its opponent's first game, it can detect the opponent's identity as a cooperator or a defector. The advanced agent can then select its strategy based on the gathered information. An advanced cooperator will play cooperation when its opponent cooperated in the first game, and it will play defection when its opponent defected in the first game. In the absence of such information, an advanced cooperator will cooperate.

An advanced defector should engage in some measured cooperation with cooperators in order to keep enough of them alive for itself to survive. For that purpose, an advanced defector should be aware of the frequency of its encounters with cooperators (as determined by checking their first game). Therefore, to avoid losing too many cooperators, an advanced defector begins cooperating with them when the frequency of its encounters with cooperators falls below a certain threshold. When the frequency of encounters with cooperators rises above a predefined threshold, an advanced defector resumes pure defection. The two thresholds are determined by frequency threshold ± deviation. The values of the frequency threshold and the deviation are calibrated experimentally, as explained in Chapter 5. In the absence of any information about the opponent's first game, an advanced defector will defect. Also, an advanced defector always plays defection against another defector.

As discussed in Section 2.6, different models have been introduced to formalize the evolution of cooperation among agents through direct or indirect reciprocity with the use of trust, reputation, and social information [Mohtashemi and Mui, 2003] [Sabater and Sierra, 2002] [Sekaran and Sen, 1995]. In those models, information gathering mostly happens through observation, direct interaction, interviewing others, or following social networks [Sabater and Sierra, 2002]. In our model, an agent does not necessarily need to establish a direct interaction to collect information, and it only gathers factual historical information. An agent can then reason about its strategy and adjust it based on the information it receives about the behaviour of other agents. There are also some theoretical models in which cooperation can be sustained as an equilibrium.
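The strategy-adjustment rules described in this section can be summarized in a short sketch (illustrative Python, not the simulator's code; all names here are hypothetical). Here first_game stands for the opponent's first recorded action, looked up in the gathered history, and the advanced defector's measured cooperation switches on and off at the two thresholds f - d and f + d:

from typing import Optional

def advanced_cooperator_action(first_game: Optional[str]) -> str:
    if first_game is None:                      # no information: cooperate
        return "C"
    return "C" if first_game == "C" else "D"    # mirror the revealed identity

class AdvancedDefectorPolicy:
    """Measured cooperation with hysteresis at f - d and f + d."""
    def __init__(self, f: float = 0.3, d: float = 0.1):
        self.f, self.d = f, d        # Chapter 5 calibrates f = 0.3, d = 0.1
        self.cooperating = False     # currently in measured-cooperation mode?

    def action(self, first_game: Optional[str], coop_frequency: float) -> str:
        if first_game is None or first_game == "D":
            return "D"               # unknown opponent or fellow defector
        if coop_frequency < self.f - self.d:
            self.cooperating = True  # too few cooperators left: keep them alive
        elif coop_frequency > self.f + self.d:
            self.cooperating = False # enough cooperators: resume pure defection
        return "C" if self.cooperating else "D"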
Our model differs from the related models discussed above in a few key aspects. Unlike the model described in [Takahashi, 2010], in our model the agents only gather information about the opponent's first game. Also, in the models of [Awaya, 2014] and [Bachi et al., 2010], players can send a message to each other prior to the game, whereas there is no communication between the agents in our model.

3.3 Cluster studies and neighbourhood profile

To better understand if and how cooperation evolves, one can investigate the change in the number of agents in the spatial neighbourhoods and in the individual clusters. Smaldino et al. [2013] refer to a group of cooperators that manages to stay alive through repeated mutual cooperation as a cluster of cooperators. The clusters are usually surrounded by defectors on the periphery. Those defectors that are not close to a cluster of cooperators will die, since they cannot interact with cooperative partners. Measuring the composition of individual clusters helps to understand the properties of clusters that shrink in size or that prosper and grow fast. To identify all the agents in a cluster explicitly, Tarjan's depth-first search (DFS) algorithm [Tarjan, 1972] has been used. In addition, evaluating the spatial environment of an agent of a given type in its spatial neighbourhood helps us to identify if and how the number of agents of one specific type is changing in another agent type's neighbourhood.

Chapter 4

Simulator Architecture

This chapter presents the simulator architecture. It begins by identifying the simulator requirements in Section 4.1. Section 4.2 describes the simulator structure. Section 4.3 elaborates upon the behavioural view of the simulator's different components. Section 4.4 demonstrates our approach to experimentation.

4.1 The Simulator Requirements

The list of requirements for the simulator is divided into three categories: functional requirements, non-functional requirements, and domain-specific requirements [Sommerville, 2011]. In the rest of this section, the components of each category are described.

4.1.1 Functional Requirements

The functional requirements of the system are captured in the use case diagram presented in Figure 4.1. The use case diagram specifies the interactions between the simulator and the experimenters. The experimenter runs the simulator to initialize the experiments and receives the results. Each use case is described in more detail in the following sections.

4.1.1.1 Set and Run Experiment

The experimenter can set an experiment and run it. This use case includes other use cases as follows:

1. Configure Experiment: The experimenter can configure a new experiment with respect to the desired models and parameters.
(a) Set Parameters: The experimenter should initialize the environment, agent, game, and simulator parameters.
i. Environment Parameters: The properties of the environment, including the grid size and the environmental cost of living.
ii. Agent Parameters: The experimenter can set the parameters for the agents. These include the agents' initial and maximum allowed level of energy. In addition, the experimenter should indicate whether agents receive a fixed or a random initial energy value. The experimenter can also configure the size of the personal game memory and the discipline for checking the opponent's history for advanced agents.

Figure 4.1: The use case diagram of the simulator
iii. Game Parameters: The experimenter can set the payoff values, which must satisfy the constraints defined in Section 2.2.
iv. Simulator Parameters: The experimenter can configure different parameters of the simulator, including the initial number of agents and the maximum carrying capacity.
(b) Set the Model: The experimenter can run different models, particularly the basic model and the IG-model.
2. Save Experiment: The experimenter can save the configuration of an experiment.
3. Load Experiment: The experimenter can load an experiment that has been saved previously.
4. Run Experiment: The experimenter can control the execution of an experiment. The experimenter can start and stop an experiment. Also, the experimenter can run an experiment step-by-step to monitor its course. It is worth noting that an experiment stops if the defined maximum number of rounds is reached or if all agents die.

4.1.1.2 Display Experiment

1. Display Experiment Course: The experimenter can invoke the simulator to display a graphical representation of an experiment; this includes displaying the environment as a square grid, and the agents with their movement as well as their reproduction or death. In addition, the experimenter can select to observe the behaviour of individual clusters.
2. Display Results: Once the experimenter enables the graphical representation of the simulator, the simulator shows a numerical and a graphical representation of the final results. The final results indicate the total number of agents of each type that remain alive at the end of an experiment.
3. Debug Results: The experimenter can select to view the logs generated by the simulator. The information in the log window describes the course of an experiment in greater detail and is used to study and analyze the results.

4.1.2 Non-functional Requirements

1. The simulator needs to support the processing and analysis of simulation results. For that purpose, the simulator interoperates with the MySQL database [MySQL, 2017] to store the results collected from the experiments. The collected data can be viewed using MySQL Workbench [Inan and Juita, 2011], the official integrated environment for MySQL for managing database design and modeling.
2. The simulator should provide a graphical user interface (GUI) to support interactive experimentation. The GUI should represent the functional requirements of the simulator.

4.1.3 Domain-specific Requirements

The domain-specific requirements specify the details of the simulator structure, with respect to the models it simulates, and reflect the fundamental concepts of the multiagent system (MAS) and agent interaction. The simulator has been developed to study the evolution of cooperation among self-interested agents with and without proactive information gathering. The models of multiagent systems studied using the simulator are the basic model, introduced by Smaldino et al. [2013], and the IG-model (described in Section 3.2). The domain-specific requirements for the above models are as follows:

1. The simulator should simulate a MAS, which contains a number of agents situated in an environment.
2. The simulator should represent an energy-based life cycle for the agents.
3. The simulator should support the presence of different types of agents, namely the naive agents and the advanced agents.
4. Individual agents need to perform various actions, which include interacting with each other and moving in the environment.
Also, individual agents need to be able to reproduce new agents of their own type.
5. The agents' interactions should follow the Prisoner's Dilemma (PD) game structure.
6. Advanced agents should be able to adjust their strategy based on the gathered information, in order to study the impact of information gathering upon the evolution of cooperation.
7. The environment should present a source of adversity, called harsh environmental conditions.
8. The environment should be toroidal, i.e., it resembles a torus [Oxford University Press, 2017].

It should be noted that the environment can present a source of uncertainty, in order to represent limitations on information acquisition.

4.2 The Simulator Structure

To support the described requirements, our simulator contains three main components: the controller, the MAS-layout, and the front end. The high-level structure of the simulator is shown in Figure 4.2.

Figure 4.2: The high level structure of the simulator

The controller is responsible for running and controlling the experiments. The MAS-layout represents the model for the simulation. Each model includes agents, an environment in which agents are located, and the interactions between the agents. The front end is a user interface which allows the experimenter to set up and control the course of the experiment and presents the results. Each component is in turn composed of further components. In the rest of this section, each component is explained in greater detail.

4.2.1 Controller

The controller is the main component of the simulator. The functional components of the controller coordinate with each other to initialize and execute the experiments and collect the results. Figure 4.3 shows the internal structure of the controller.

Figure 4.3: The controller

Each experiment involves a number of agents, divided into different types, acting on the same environment. Agents can attempt to play the PD game, move in the game space, and reproduce new agents. The design of the controller's components supports the agents' actions and allows an experiment to proceed in discrete rounds. Executing an experiment begins with parameter initialization. The parameters manager is designed to manage the parameter configuration for the simulator. The experimenter sets the initial parameters through the front end. The parameters manager receives the parameters, validates the values, and passes them to the experiment handler. The experiment handler is responsible for executing an experiment. The runs of an experiment are different instances of the same experiment, with the same parameter settings and independent randomization of random parameters under the same distributions. The experiment handler interacts with the parameters manager to receive the initial parameter settings and activate the runs. The run handler executes each run of an experiment and presents the final results when a run has ended. The final results indicate the total number of agents of each type that have survived at the end of an experiment. Each run of an experiment consists of a specified number of rounds, and it stops when it reaches the maximum number of rounds. The round handler manages the execution of each round in a run of an experiment. In each round, an agent can interact with other agents by playing the PD game, or it can move in the environment. The game handler executes the PD game between two agents and modifies each agent's energy level based on the game payoffs.
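As an illustration of this step, a minimal sketch of the game handler's core might look as follows (illustrative Python with hypothetical names, not the simulator's code; the payoff values are those used in Chapter 5, with S = -1 as an example, and energy is capped as described in Section 4.3):

from dataclasses import dataclass

MAX_ENERGY = 150  # the agent-max-energy value used in Chapter 5

# (row action, column action) -> (row payoff, column payoff); T=5, R=3, P=0, S=-1
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (-1, 5),
          ("D", "C"): (5, -1), ("D", "D"): (0, 0)}

@dataclass
class Agent:
    energy: float
    native_strategy: str  # "C" or "D"; naive agents always play this

def play_game(a: Agent, b: Agent) -> None:
    pay_a, pay_b = PAYOFF[(a.native_strategy, b.native_strategy)]
    a.energy = min(a.energy + pay_a, MAX_ENERGY)  # energy is always capped
    b.energy = min(b.energy + pay_b, MAX_ENERGY)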
For the advanced agents, the game also involves information acquisition and strategic decision making, as explained in Section 4.2.2. Finally, there is a service component called the logger, which works with the run handler to collect all logs generated in the simulator. The logs show the progress of an experiment in different rounds for different agents and can be used to debug the experiment's results.

4.2.2 MAS-layout

The MAS-layout represents a multiagent system. This component represents an evolving population of agents with an energy-based life cycle, situated in the environment. The agents interact with each other, move in the game space, and reproduce if applicable. The MAS-layout's components are described in the following:

• Information Repository: The MAS-layout contains the information repository, which keeps records of all the agents' interactions. The identities of the agents that played with each other and their actions in each game are stored in the information repository. The advanced agents can interact with the information repository to acquire information about their opponents.

• Agent: The simulator provides a simple architecture for agents to support different models of population and agent interaction. All agents are capable of playing the game, moving, and reproducing. In addition, advanced agents are able to acquire information from the information repository. The advanced agent architecture consists of a personal game memory, a memory updater, and a strategic reasoning engine (Figure 4.4). There is a corresponding record in an advanced agent's personal game memory for each game it plays. Each record in an advanced agent's memory includes the identity of one particular opponent and its action in each game. The capacity of the memory can be limited to a specific number, or it can be infinite. Each advanced agent acquires information about its opponents by checking their history through the information repository. Once an advanced agent has some new information about other agents in the population, it updates its personal game memory through the memory updater. It should be mentioned that the simulator is designed with the ability to impose limitations on the information acquisition process; that feature can be used for future work. The advanced agent then uses the information it has about its opponent to decide its strategy. The strategic reasoning engine allows the advanced agents to reason about their opponent, based on the information they have, and to make decisions as discussed in Section 3.2.2.

• Environment: The environment provides the game space for agents to play, move, and reproduce. All the agents' interactions, movement, and reproduction happen on a toroidal square grid adopted from the basic model. The main component of the environment is the board. The board is a square grid with periodic boundaries. The agents are situated in unique cells on the board.

Figure 4.4: The advanced agent architecture

4.2.3 Front End

The front end provides the experimenter's access to the simulator. It allows the experimenter to observe and control the course of the experiment. The front end consists of a control panel and a graphical user interface (GUI). The internal structure of the front end and its connection to the other components of the simulator are shown in Figure 4.5. The experimenter can access the simulator and control an experiment through the control panel.
The experimenter can initialize, run, save, and load an experiment by using the control panel. The experimenter can observe the course of the simulation through the GUI. By observing the experiment, the experimenter can follow the change in the population in general, or in a specific round by stepping through an experiment. The GUI also shows the final results, that is, the number of agents of each type that survived at the end of an experiment.

Figure 4.5: The front end

4.3 The System Behaviour

This section explains the behavioural aspects of the simulator's components. The experimenter adjusts the parameters to generate an experiment through the control panel of the front end. The control panel window is shown in Figure 4.6.

Figure 4.6: The control panel

The experiment handler receives the initial parameters from the parameters manager at the beginning of each experiment, and it activates the run handler to execute different runs of the same experimental settings (Figure 4.7). The number of runs must be sufficiently large to assure the statistical significance of the results.

Figure 4.7: The mechanics of the experiment

The run handler first initializes the environment and the population of agents for a new experiment run. Based on the specified model, the population can include four different types of agents: Advanced-Cooperator, Advanced-Defector, Naive-Cooperator, and Naive-Defector. Agents are randomly located in separate cells for each run of an experiment. Also, agents are assigned an initial energy value. This value is either drawn from a uniform distribution (U[a, b], where a is the minimum and b is the maximum) or set to a constant. Each run of an experiment consists of a number of rounds, specified by the experimenter. The run handler activates the round handler to execute each round (Figure 4.8).

Figure 4.8: The mechanics of the run

In each round, every agent that has not already played the game in that round searches its neighbourhood for an opponent that has also not yet played in that round. Note that each agent's neighbourhood consists of the eight surrounding cells. If the agent finds an opponent, they play the PD game and receive payoffs. The received game payoffs change the energy level of each agent. However, an agent's energy level is always capped at a constant amount; therefore, an agent cannot accumulate energy without bound. If there is no opponent, the agent searches its neighbourhood for a random empty cell to move to. If an agent moves to another cell, it must wait for the next round to look for an opponent again; thus, an agent cannot move and play in the same round. If there is no empty cell in an agent's neighbourhood, the agent stays in its current cell and waits for the next round (Figure 4.9).

Figure 4.9: The mechanics of the round

The game handler controls the course of playing the game between two agents. Naive agents play their native strategy in any instance of a game. The native strategies of naive cooperators and naive defectors are cooperation and defection, respectively. Advanced agents are expected to reason about their strategy based on the information they have obtained about the behaviour of other agents. The advanced agents are designed to play different strategies in order to achieve their short-term and long-term goals, as explained in Section 3.2.2. At the end of each round, the cost of living is deducted from each agent's energy level, and the round handler checks each agent's energy level.
If an agent's energy reaches the birth threshold, that agent is ready to reproduce a new agent. The reproduction is successful if there is an empty cell in the agent's neighbourhood and the population's maximum carrying capacity has not been reached. Upon reproduction, the parent yields a predefined energy value to its child. A newly born agent is of the same type as its parent. On the other hand, if an agent's energy falls below the death threshold, that agent is removed from the game space. Figure 4.10 shows the activity diagram for agents during one round.

Figure 4.10: The agent's activity diagram

During a round, agents pass through different states. The following are the most important states, which determine the agents' behaviour based on their interaction and movement. Figure 4.11 illustrates the state-machine representation of an agent during a round.

• Prisoner: the agent has found an opponent. In this state, other agents can no longer select the prisoner agent as an opponent.
• Moved: the agent has not found an opponent and has moved to another cell. Since an agent that moves cannot play the game in the same round, other agents can no longer select a moved agent as an opponent.
• Ready-to-Reproduce: the agent's energy level has reached the birth threshold, and it is ready to reproduce a new agent. If the ready-to-reproduce agent does not give birth to a new agent, its state changes to Ready for the next round.
• Ready: the agent is ready to find an opponent for playing the game or to move to another cell. Ready is a final state in a round, which means that a ready agent waits for the next round to begin.
• Dead: the agent's energy level has fallen below the death threshold, and it is removed from the game space.

Figure 4.11: Double-line boundaries indicate final states for an agent during one round. The final states are shown more than once for the purpose of better presentation.

When a round is over, the round handler modifies the population size based on the number of births and deaths. It saves the corresponding information in a database and passes that information to the run handler. The run handler checks if the run is over: if all agents die or the maximum number of rounds is reached, the run is considered finished. If the experimenter invokes the GUI, the experiment handler presents the final results through the GUI at the end of each run. By invoking the GUI, the experimenter can also select to watch the course of the simulation at one specific round. Figures 4.12 and 4.13 show an example of the course of a simulation in the GUI window. In addition, the simulator has a mouseover tool: the user can obtain more information about an agent by moving the mouse over that particular agent, upon which the agent's ID, its strategy, and its energy appear in a small box. Figure 4.14 presents an example of the mouseover tool. Also, one of the fundamental goals of designing the GUI is to visualize the data collected in the simulator. For that purpose, we designed the report viewer, a visualization tool that provides support for drawing charts. The accumulated data is displayed in various graphs using the report viewer. Figure 4.15 shows an example of a chart created by the report viewer. To study the individual clusters, one more tool has been added to the simulator, named the cluster viewer. The cluster viewer is used to identify the clusters in the course of a simulation.
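Cluster identification follows the depth-first search approach mentioned in Section 3.3. A minimal sketch of finding clusters as connected components of occupied cells on the toroidal grid, in the spirit of that approach, might look like this (illustrative Python, not the simulator's code):

from typing import List, Set, Tuple

Cell = Tuple[int, int]

def find_clusters(occupied: Set[Cell], size: int) -> List[Set[Cell]]:
    clusters, seen = [], set()
    for start in occupied:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search from the start cell
            x, y = cell = stack.pop()
            if cell in seen:
                continue
            seen.add(cell)
            component.add(cell)
            # visit the eight surrounding cells, with toroidal wrap-around
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue
                    neighbour = ((x + dx) % size, (y + dy) % size)
                    if neighbour in occupied and neighbour not in seen:
                        stack.append(neighbour)
        clusters.append(component)
    return clusters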
The experimenter can use the saved experimental results to run the cluster viewer. The cluster viewer also provides additional features, such as moving back and forth between different rounds of the game, filtering clusters based on their size, and filtering agents based on their energy level. Figure 4.16 shows an example of a cluster in the cluster viewer.

Figure 4.12: The blue and green dots are cooperators, and the red and purple dots are defectors
Figure 4.13: The GUI window, when there are only two types of agents in the population
Figure 4.14: The mouseover tool
Figure 4.15: The report viewer
Figure 4.16: The cluster and its information

4.4 Experimentation

In the different series of experiments, we explore if and how the total number of agents of different types can be influenced by proactive information gathering. The simulation models are based on the same parameter settings that were used in the basic model. However, we may need to change some parameter values to find the optimal situations in which cooperation evolves. In a preliminary round of experiments, we enable only one type of advanced agent, either cooperators or defectors, to figure out how they behave under different conditions. The increase or decrease in the number of agents depends on different factors, such as the cost of living, the birth threshold, and the population's maximum capacity. Running experiments with a single type of advanced agent gives us a better understanding of their behaviour under different conditions. Once we have optimized the values of the different parameters, in the next round of our simulation experiments we explore the impact of proactive information gathering upon the behaviour of agents in the presence of both advanced cooperators and advanced defectors.

Chapter 5

Evaluation

This chapter presents the simulation results for different models that employ the proactive information gathering technique, in comparison with the basic model. We first examine our simulator by running the basic model. Then, we compare the enhanced model through a series of experiments for different values of the parameters representing environmental adversity and the agents' energy level.

5.1 Parameter Space

• board-size: the environment is a 100 × 100 discrete square grid with periodic (toroidal) boundaries.
• agent-initial-energy: each agent receives a fixed starting energy of 50. In the basic model of [Smaldino et al., 2013], agents are initialized with a random energy value drawn from a uniform distribution. To eliminate the effect of random variables, we initialized all the agents with a fixed starting energy value. It should be mentioned that we repeated all the experiments from the basic model with the fixed starting energy value for the sake of comparison.
• agent-max-energy: the maximum allowed energy level for an agent is 150.
• birth-threshold: the energy threshold required for reproduction is 100.
• reproduction-energy: each time an agent reproduces, it yields 50 units of its energy to its child.
• max-population: the maximum allowed population size is 5000 unless noted otherwise.
• payoff: the payoffs are set as follows: T = 5, R = 3, and P = 0.
• environmental adversity: the payoff value of unreciprocated cooperation, S, and the value of the cost of living vary in different experiments.
• frequency threshold: the value of the frequency threshold varies in different experiments.
• deviation: the value of the deviation is set to 0.1.
• number of runs: in order to get statistically significant results, each experimental setting has been examined over 50 runs, and the number of agents is averaged over those 50 simulation runs. It should be mentioned that each run of an experiment may produce different results due to random factors, including the agents' initial composition, their movement, and partner selection.
• number of rounds: each run of an experiment includes 10000 rounds.

5.2 The Basic Model

Smaldino et al. [2013] examine the impact of two factors related to environmental harshness. Under certain assumptions of the basic model, Smaldino et al. [2013] reach the following conclusions:

1. When the cost of living has a fixed value and the cost of unreciprocated cooperation (corresponding to the payoff S) varies, a more punishing value of the cost of unreciprocated cooperation results in losses for cooperators at the beginning; however, cooperative agents can survive in the long run. As the cost of unreciprocated cooperation decreases, it becomes more difficult for cooperators to keep expanding cooperative clusters. Figure 5.1 shows the number of cooperators for the different values of S in our simulator; the results closely resemble those of [Smaldino et al., 2013]. Figure 5.2 presents the number of defectors with respect to the different values of S.

Figure 5.1: The number of cooperators for different values of S when cost of living is fixed at 0.5. The graph presents the averaged results over 50 simulation runs.
Figure 5.2: The number of defectors for different values of S when cost of living is fixed at 0.5. The graph presents the averaged results over 50 simulation runs.

Observing the population development in the basic model, when the cost of living is constant and S varies, shows that there are some oscillations in the number of agents, as we can see in Figures 5.1 and 5.2. One should note that increasing the population capacity, or running the experiments for more than 10000 rounds, can result in a more stable situation for both cooperators and defectors, as mentioned in [Smaldino et al., 2013]. Figure 5.3 presents an example of population development with respect to the number of cooperators and defectors when the population capacity has a higher value and experimental results are gathered beyond 10000 rounds.

Figure 5.3: The number of defectors and cooperators in the basic model when S is -1, cost of living is 0.5, population capacity is 9000, and the number of rounds is 30000. The graph presents the averaged results over 50 simulation runs.

2. When the cost of unreciprocated cooperation is constant and the environmental cost of living takes different values, the higher cost helps cooperators in the long run, unless the cost of living is too high for any individual to survive. Figure 5.4 shows the number of cooperators for the different values of the cost of living, and Figure 5.5 presents the corresponding number of defectors. It should be mentioned that the cost of living is deducted from all agents' energy and affects both cooperators and defectors in the same way. When S is fixed at 0, cooperators are not punished for interacting with defectors; therefore, the population reaches its maximum capacity and stabilizes sooner than when S takes more punishing values.

Figure 5.4: The number of cooperators for different values of cost of living when S is fixed at 0. The graph presents the averaged results over 50 simulation runs.
Figure 5.5: The number of defectors for different values of cost of living when S is fixed at 0. The graph presents the averaged results over 50 simulation runs.

5.2.1 Clusters and Neighbourhood Profiles

As we can see in Figures 5.1 and 5.4, there is a drop in the number of cooperators at the beginning. The decrease in the number of cooperators eventually leads to the death of defectors, since defectors need a sufficient amount of interaction with cooperators to survive. However, if some cooperators can find each other and stay alive through repeated mutual cooperation, they can develop a cooperative cluster. Those defectors that are close to a cooperative cluster can survive. To gain a better understanding of the described phenomena, we have developed two extra tools to study the behaviour of agents more precisely. Using those tools, we study the neighbourhood profile of each agent type and the individual clusters. In the study of individual clusters, we have identified two different cluster types. One is the rising cluster, in which the number of agents increases and the cluster grows. The other is the dying cluster, in which the number of agents decreases and all the agents in the cluster die. Figures 5.6a, 5.6b, and 5.6c present an example of a dying cluster in the basic model.

Figure 5.6: An example of a dying cluster in the basic model when cost of living is 1 and S is 0, at rounds 200, 250, and 300. Cooperators are gray and defectors are yellow.

Figures 5.7a, 5.7b, and 5.7c present an example of a rising cluster in the basic model. Although the number of cooperators is smaller than the number of defectors in that cluster, it forms a pattern in which a small cluster of cooperators is surrounded by defectors. That pattern stays relatively stable over different rounds.

Figure 5.7: An example of a rising cluster in the basic model when cost of living is 1 and S is 0, at rounds 350, 400, and 450. Cooperators are gray and defectors are yellow.

The observation of local neighbourhood profiles in individual clusters shows that:

1. In a dying cluster, the number of defectors in a cooperator's neighbourhood is higher than the number of cooperators (Figures 5.8a and 5.8b).
2. In a rising cluster, the number of cooperators in a cooperator's neighbourhood is higher than the number of defectors (Figures 5.9a and 5.9b).

Figure 5.8: An example of the number of agents and the cooperators' neighbourhood profile in a dying cluster, captured in Figure 5.6.
Figure 5.9: An example of the number of agents and the cooperators' neighbourhood profile in a rising cluster, captured in Figure 5.7.

5.3 The IG-Models

This section presents the results from the IG-models. There are three different sets of experiments in which we investigate the impact of information gathering (IG) upon the evolution of cooperation. In the first series of experiments, we enable only the cooperators to gather information (the IG-AC model). In the second series of experiments, only the defectors are advanced (the IG-AD model). In the last series of experiments, we enable both defectors and cooperators to gather information (the IG-ACD model).

5.3.1 Initial Agent Population

The initial agent population can contain all the agent types or only some of them, depending on the model.
The initial number of agents in all the experiments is 1600, divided equally between cooperators and defectors. The actual population composition depends on the model:

– In the basic model, the initial population includes 800 naive cooperators and 800 naive defectors.
– In the IG-AC model, the initial population includes 400 naive cooperators, 400 advanced cooperators, and 800 naive defectors.
– In the IG-AD model, the initial population includes 800 naive cooperators, 400 advanced defectors, and 400 naive defectors.
– In the IG-ACD model, the initial population includes 400 naive cooperators, 400 advanced cooperators, 400 naive defectors, and 400 advanced defectors.

5.3.2 IG-AC

To start with a simple model whose results are easier to interpret, we set the advanced agents' strategies as follows:

• The advanced cooperator A (after its first game) checks the first game that its next opponent B has played. If there is none, A cooperates. Otherwise, A now knows whether B is a cooperator or a defector (because it knows B's native strategy) and plays cooperation or defection accordingly.
• All defectors are naive.

5.3.2.1 Observations and Analysis

1. Comparing the number of cooperators in the basic and IG-AC models for the different values of S (Figures 5.1 and 5.10) shows that cooperators perform better in the IG-AC model than in the basic model.
2. Comparing the basic and IG-AC models for the different values of both S (Figures 5.1 and 5.10) and the cost of living (Figures 5.4 and 5.11) shows that the number of cooperators that die at the beginning of an experiment is higher in the basic model than in the IG-AC model. So, it is more difficult for cooperators to survive and reproduce without the information gathering technique.
3. When the population reaches its maximum allowed capacity, we can see some stability and fewer changes in the number of agents. The point of stability is reached sooner in the IG-AC model than in the basic model.

Figure 5.10: The number of cooperators (naive and advanced) for different values of S when cost of living is fixed at 0.5 in the IG-AC model. The graph presents the averaged results over 50 simulation runs.
Figure 5.11: The number of cooperators (naive and advanced) for different values of cost of living when S is fixed at 0 in the IG-AC model. The graph presents the averaged results over 50 simulation runs.

5.3.2.2 Clusters and Neighbourhood Profiles

Studying the neighbourhood profiles of agents in the IG-AC model shows that, on the one hand, the number of cooperators in both cooperators' and defectors' neighbourhoods increases; on the other hand, the number of defectors in both cooperators' and defectors' neighbourhoods decreases. Figure 5.12 shows an example of how the number of neighbours in each agent type's neighbourhood changes over different rounds.

Figure 5.12: The number of neighbours in different types of agents' neighbourhoods when cost of living is 0.5 and S is -2.5 in the IG-AC model. The graph presents the averaged results over 50 simulation runs.

Like in the basic model, there is a drop in the number of (naive) cooperators at the beginning in the IG-AC model as well. However, the number of advanced cooperators increases, which results in a higher number of cooperators in total.
Due to the difference between the numbers of the two types of cooperators, we see some scattered clusters when an experiment starts. However, advanced cooperators grow fast in number. As a result, clusters in which advanced cooperators are dominant form very quickly (Figures 5.13a, 5.13b, and 5.13c).

Figure 5.13: Clusters' development in the IG-AC model when cost of living is 0.5 and S is -2.25, at rounds 50, 100, and 150. Naive cooperators are gray and naive defectors are yellow. Advanced cooperators are blue and advanced defectors are red.

5.3.3 IG-AD

Next, we keep the cooperators naive and introduce advanced defectors. As mentioned before, defectors need to interact with cooperators to gain enough energy for survival and reproduction. For this purpose, an advanced defector should be aware of the frequency of its encounters with cooperators (as determined by checking their first game). As described in Section 3.2.2, to avoid losing too many cooperators, an advanced defector begins cooperating with them when the frequency of its encounters with cooperators falls below a certain threshold, and it resumes pure defection when that frequency rises above a predefined threshold. In the absence of any information about the opponent's first game, an advanced defector will defect.

The values of the frequency threshold and the deviation are calculated from experimental results. In the basic model, defectors are most successful when S is 0 and the cost of living is 0.5. Under this condition, the frequency of a (naive) defector's interactions with (naive) cooperators is 0.3, so we used 0.3 as the best value for the frequency threshold, f. Running more experiments in the basic model showed that defectors are still successful when the initial number of cooperators is 0.2 of the initial number of defectors, but not when it is 0.15 of the initial number of defectors. Therefore, to ensure stability, we use 0.1 as the best value for the deviation, d. It should be noted that with a higher value of the frequency threshold, defectors would cooperate with cooperators most of the time; that behaviour is not expected from a defector. Nevertheless, we examined other values of the frequency threshold, such as 0.5 and 0.7, to see how they impact the general population development.

5.3.3.1 Observations and Analysis

• Observations

O.1 When the cost of living is fixed at 0.5, the number of cooperators decreases with decreasing values of S (Figures 5.14, 5.16, and 5.18), which does not happen in the basic model.
O.2 Increasing the cost of living has the same impact as we observe in the basic model and leads to the evolution of cooperation (Figures 5.15, 5.17, and 5.19).

Figure 5.14: The number of cooperators for different values of S in the IG-AD model when cost of living is fixed at 0.5 and frequency threshold is 0.3. The graph presents the averaged results over 50 simulation runs.
Figure 5.15: The number of cooperators for different values of cost of living in the IG-AD model when S is fixed at 0 and frequency threshold is 0.3. The graph presents the averaged results over 50 simulation runs.
Figure 5.16: The number of cooperators for different values of S in the IG-AD model when cost of living is fixed at 0.5 and frequency threshold is 0.5. The graph presents the averaged results over 50 simulation runs.
Figure 5.17: The number of cooperators for different values of cost of living in the IG-AD model when S is fixed at 0 and frequency threshold is 0.5. The graph presents the averaged results over 50 simulation runs.
Figure 5.18: The number of cooperators for different values of S in the IG-AD model when cost of living is fixed at 0.5 and frequency threshold is 0.7. The graph presents the averaged results over 50 simulation runs.
Figure 5.19: The number of cooperators for different values of cost of living in the IG-AD model when S is fixed at 0 and frequency threshold is 0.7. The graph presents the averaged results over 50 simulation runs.

O.3 Comparing the basic model with the IG-AD model, both defectors and cooperators do better in the IG-AD model than in the basic model when the frequency threshold has a higher value. When the frequency threshold has a lower value, both defectors and cooperators do better in the IG-AD model than in the basic model, or the same as in the basic model. Figures 5.20 and 5.21 show some examples of a comparison between the average number of agents in the basic model and in the IG-AD model.

Figure 5.20: The number of cooperators in the basic and the IG-AD models when S is 0, cost of living is 0.5, and frequency threshold varies. The graph presents the averaged results over 50 simulation runs.
Figure 5.21: The number of defectors in the basic and the IG-AD models when S is -2.5, cost of living is 1, and frequency threshold varies. The graph presents the averaged results over 50 simulation runs.

O.4 In the case of a low cost of living, when the value of unreciprocated cooperation is zero, the number of naive defectors is higher than the number of advanced defectors. This difference increases when the frequency threshold has a higher value (Figures 5.22a, 5.22b, and 5.22c).
O.5 In the case of a low cost of living, when unreciprocated cooperation has the more punishing value of -2.5, the number of naive defectors is lower than the number of advanced defectors. When the frequency threshold is low (0.3), in some experiments the actual number of naive defectors is even higher than that of advanced defectors. However, the difference between the numbers of advanced and naive defectors increases as the frequency threshold increases (Figures 5.23a, 5.23b, and 5.23c).

Figure 5.22: The number of advanced defectors and naive defectors in the IG-AD model when S is 0, cost of living is 0.5, and the frequency threshold takes the values 0.3, 0.5, and 0.7. The graphs present the averaged results over 50 simulation runs.
Figure 5.23: The number of advanced defectors and naive defectors in the IG-AD model when S is -2.5, cost of living is 0.5, and the frequency threshold takes the values 0.3, 0.5, and 0.7. The graphs present the averaged results over 50 simulation runs.

• Analysis

A.1 Cost of Unreciprocated Cooperation: Under the environmental harshness conditions represented by the cost of unreciprocated cooperation, decreasing the value of S slows down the evolution of cooperation, unlike in the basic model (observation O.1). It should be noted that decreasing the value of S has a more direct impact on cooperators.
A.2 Cost of Living: Under the environmental harshness conditions represented by the cost of living, increasing the cost of living results in the evolution of cooperation (observation O.2).
However, when the cost of living has a lower value, defectors in general do better than the cooperators. It is worth mentioning that the cost of living affects both cooperators and defectors in the same way.
A.3 Environmental Harshness: Based on observation O.3, under the lower environmental harshness conditions represented by both unreciprocated cooperation and the environmental cost of living, cooperators perform better in the IG-AD model than in the basic model. Also, under the higher environmental harshness conditions represented by both unreciprocated cooperation and the environmental cost of living, which favour cooperators, the number of defectors in the IG-AD model is higher than in the basic model. Therefore, enabling the advanced defectors to base their strategy on the gathered information: (a) helps defectors grow more in number when the environment is too harsh, which does not happen in the basic model; and (b) allows cooperators to sustain cooperation under a lower threshold of environmental harshness.
A.4 Number of Defectors: As mentioned in observation O.5, when the cost of living is low and unreciprocated cooperation has the more punishing value of -2.5, the number of naive defectors is lower than the number of advanced defectors, and the difference between them increases as the frequency threshold increases. In this situation, cooperators start to disappear because of the punishing value of unreciprocated cooperation; as a result, the number of defectors also decreases. Advanced defectors therefore begin playing based on the frequency in order to keep cooperators alive. Since advanced defectors receive the cooperation reward, they are able to maintain their energy and grow in number, while naive defectors cannot grow as much because the cooperators around them are dying. To better support this explanation, we have analyzed two additional factors about the agents in the population:

(a) Neighbourhood Profile: In the study of neighbourhood profiles for each agent type, we observe that advanced defectors have, on average, more cooperators in their neighbourhood when the frequency threshold has a higher value (Figures 5.24, 5.25, and 5.26).

Figure 5.24: The number of cooperators in defectors' neighbourhoods in the IG-AD model when the frequency threshold is 0.3, cost of living is 0.5, and S is -2.5. The graph presents the averaged results over 50 simulation runs.
Figure 5.25: The number of cooperators in defectors' neighbourhoods in the IG-AD model when the frequency threshold is 0.5, cost of living is 0.5, and S is -2.5. The graph presents the averaged results over 50 simulation runs.
Figure 5.26: The number of cooperators in defectors' neighbourhoods in the IG-AD model when the frequency threshold is 0.7, cost of living is 0.5, and S is -2.5. The graph presents the averaged results over 50 simulation runs.

(b) Clusters: Studying clusters in the IG-AD model shows that a cluster with more advanced defectors on average is a rising cluster, whereas a cluster with more naive defectors on average is a dying cluster or has a slower growth rate (Figures 5.27 and 5.28). This observation confirms our result from O.5 and its explanation in analysis A.4. Figures 5.29a, 5.29b, and 5.29c show an example of a rising cluster in the IG-AD model, and Figures 5.30a, 5.30b, and 5.30c show an example of a dying cluster in the IG-AD model.
Figure 5.27: The number of agents in a rising cluster in the IG-AD model when the frequency threshold is 0.3, cost of living is 0.5, and S is -2.5
Figure 5.28: The number of agents in a dying cluster in the IG-AD model when the frequency threshold is 0.3, cost of living is 0.5, and S is -2.5
Figure 5.29: An example of a rising cluster in the IG-AD model when the cost of living is 0.5, S is -2.5, and frequency threshold is 0.3, at rounds 100, 150, and 200. Cooperators are gray and defectors are yellow. Advanced cooperators are blue and advanced defectors are red.
Figure 5.30: An example of a dying cluster in the IG-AD model when the cost of living is 0.5, S is -2.5, and frequency threshold is 0.3, at rounds 100, 150, and 200. Cooperators are gray and defectors are yellow. Advanced cooperators are blue and advanced defectors are red.

In addition, studying the local neighbourhoods inside individual clusters shows that:

i. In a dying cluster, the number of naive defectors in a cooperator's neighbourhood is higher than the number of advanced defectors (Figure 5.31). Therefore, when cooperators are surrounded by naive defectors, there is a higher probability that all the agents in that cluster die.
ii. In a rising cluster, the number of advanced defectors in a cooperator's neighbourhood is higher than the number of naive defectors (Figure 5.32). Therefore, when cooperators are surrounded by advanced defectors, there is a higher probability that all the agents in that cluster survive and the cluster grows.

Figure 5.31: An example of cooperators' neighbourhood profile in a dying cluster in the IG-AD model where cost of living is 0.5, S is -2.5, and frequency threshold is 0.3.
Figure 5.32: An example of cooperators' neighbourhood profile in a rising cluster in the IG-AD model where cost of living is 0.5, S is -2.5, and frequency threshold is 0.3.

5.3.4 IG-ACD

In the IG-ACD model, advanced defectors play based on the frequency and advanced cooperators play based on the opponent's first game. The number of cooperators in the IG-ACD model is always higher than in the basic model with respect to the different values of S and the cost of living (Figures 5.33, 5.35, 5.36, and 5.37). In that case, the number of naive defectors is always higher than the number of advanced defectors (Figures 5.38a, 5.38b, 5.38c, 5.39a, 5.39b, and 5.39c).

Figure 5.33: The number of cooperators (naive and advanced) for different values of S in the IG-ACD model when cost of living is fixed at 0.5 and frequency threshold is 0.3. The graph presents the averaged results over 50 simulation runs.
Figure 5.34: The number of defectors (naive and advanced) for different values of S in the IG-ACD model when cost of living is fixed at 0.5 and frequency threshold is 0.3. The graph presents the averaged results over 50 simulation runs.
Figure 5.35: The number of cooperators (naive and advanced) for different values of cost of living in the IG-ACD model when S is fixed at 0 and frequency threshold is 0.3. The graph presents the averaged results over 50 simulation runs.
Figure 5.36: The number of cooperators (naive and advanced) for different values of S in the IG-ACD model when cost of living is fixed at 0.5 and frequency threshold is 0.7. The graph presents the averaged results over 50 simulation runs.
Figure 5.37: The number of cooperators (naive and advanced) for different values of cost of living in the IG-ACD model when S is fixed at 0 and frequency threshold is 0.7.
The graph presents the averaged results over 50 simulation runs.

Figure 5.38: The number of advanced defectors and naive defectors in the IG-ACD model when S is 0, cost of living is 0.5, and the frequency threshold takes the values 0.3, 0.5, and 0.7. The graphs present the averaged results over 50 simulation runs.
Figure 5.39: The number of advanced defectors and naive defectors in the IG-ACD model when S is 0, cost of living is 1, and the frequency threshold takes the values 0.3, 0.5, and 0.7. The graphs present the averaged results over 50 simulation runs.

Comparing Figures 5.10, 5.33, and 5.36 for different values of S, and Figures 5.11, 5.35, and 5.37 for different values of the cost of living, shows that there is no visible difference between the average numbers of cooperators in the IG-AC and IG-ACD models. As long as a portion of the cooperators are able to tactically base their strategy on the collected information, cooperation evolves with or without the presence of advanced defectors. In addition, studying the neighbourhood profiles and individual clusters for the IG-ACD model shows the process of the evolution of cooperation proceeding in the same way as observed in the IG-AC model.

5.4 Performance Comparisons

Observing the results from the different models, we can identify different regions with respect to the outcomes of evolution. Those regions are separated from each other based on the values of the cost of living and the cost of unreciprocated cooperation, namely:

1. A region where the number of cooperators is higher than the number of defectors (C > D).
2. A region where the number of cooperators reached the population capacity and all the defectors died (All C = Cap).
3. A region where only cooperators survived, but they did not reach the population capacity (All C < Cap).
4. A region where the number of cooperators is lower than the number of defectors (D > C).
5. A region where the number of defectors reached the population capacity and all the cooperators died (All D = Cap).

Comparing the results from the basic model and the IG-ACD model with a frequency threshold of 0.3, we can conclude that the IG-ACD model outperforms the basic model. Figures 5.40 and 5.41 present the outcomes of evolution in the described regions. The regions have been captured at round 5000, where the population is stable (after this point, there are only a few changes in the number of agents, which do not affect the formation of the regions). Comparing the results (in the represented parameter space) shows that:

1. There is no 'All D' region in the IG-ACD model.
2. The 'D > C' region in the IG-ACD model is smaller than in the basic model.
3. The 'C > D' region in the IG-ACD model is larger than in the basic model.
4. The 'All C' region in the IG-ACD model is larger than in the basic model.
5. In the case of a high cost of living, only cooperators survive, and that region is the same in both models. However, in the examined experiments with 10000 rounds, cooperators are not able to reach the maximum population capacity.

It should be noted that with a higher value of the frequency threshold, such as 0.5 or 0.7, a defector acts more like a cooperator, and that can affect the total number of cooperators in the population. To eliminate that effect in the comparison, we adopt 0.3 as the value of the frequency threshold.

Figure 5.40: The outcomes of the evolution in the basic model. The results have been averaged over 50 simulation runs at round 5000.
Figure 5.41: The outcomes of the evolution in the IG-ACD model when the frequency threshold is 0.3. The results have been averaged over 50 simulation runs at round 5000.

Considering the main research question of how enabling agents to dynamically adjust their strategies based on proactively collected knowledge of the past behaviour of other agents can impact the evolution of cooperation, we can conclude that:

1. cooperation can evolve under a lower threshold of environmental harshness represented by the cost of living in the IG-ACD model compared to the basic model.
2. cooperation can evolve under a lower threshold of harshness represented by the cost of unreciprocated cooperation in the IG-ACD model compared to the basic model.
3. under a higher threshold of harshness represented by the cost of unreciprocated cooperation, cooperators prevail over the defectors in the IG-ACD model compared to the basic model.
4. under a higher threshold of environmental adversity represented by the cost of living, when the cost of living is 2.5, cooperation develops in the same way in both models.

Figures 5.42 and 5.43 present the outcomes of evolution in the described regions for the IG-AD model with a frequency threshold of 0.3 and for the IG-AC model, respectively. We can see that only in one region does the IG-AC model outperform the IG-ACD model. As mentioned before, the presence of advanced cooperators in the population improves the conditions for the development of cooperation, compared to the basic model, regardless of the presence of advanced defectors. In addition, one can see that there is no visible difference between the basic model and the IG-AD model in some regions. However, as mentioned before, having only advanced defectors in the population increases the total number of defectors when the environment is moderately harsh, compared to the basic model. Also, the number of cooperators is influenced by the presence of advanced defectors in the IG-AD model, compared to the basic model.

Figure 5.42: The outcomes of the evolution in the IG-AD model when the frequency threshold is 0.3. The results have been averaged over 50 simulation runs at round 5000.
Figure 5.43: The outcomes of the evolution in the IG-AC model. The results have been averaged over 50 simulation runs at round 5000.

5.4.1 Results Validation

Student's t-test is a statistical test widely used to compare the means of two groups of samples; it evaluates whether the means of two sets of data are statistically significantly different from each other [Welch, 1947]. The t-test statistic is calculated as shown in Equation 5.1, where:

1. $a$ and $b$ are two independent samples.
2. $m_a$ and $m_b$ represent the means of samples $a$ and $b$, respectively.
3. $n_a$ and $n_b$ represent the sizes of samples $a$ and $b$, respectively.
4. $S^2$ is an estimator of the common variance of the two samples.

$t = \frac{m_a - m_b}{\sqrt{\frac{S^2}{n_a} + \frac{S^2}{n_b}}}$    (5.1)

The common variance of the two samples is calculated as shown in Equation 5.2:

$S^2 = \frac{\sum_{i=1}^{n_a} (x_i - m_a)^2 + \sum_{j=1}^{n_b} (x_j - m_b)^2}{n_a + n_b - 2}$    (5.2)

We can also use the Welch t-test when the variances of the two groups being compared are different. The formula for the Welch t-test is provided in Equation 5.3, where:

1. $a$ and $b$ are two independent samples.
2. $m_a$ and $m_b$ represent the means of samples $a$ and $b$, respectively.
3. $n_a$ and $n_b$ represent the sizes of samples $a$ and $b$, respectively.
4. $S_a^2$ and $S_b^2$ are the variances of samples $a$ and $b$, respectively.
Running both the t-test and the Welch t-test shows that the results from the IG-models, with sample size 20, are significantly different from the results obtained from the basic model at the 95% confidence level. In particular, the difference is statistically significant in the regions where the IG-models are superior to the basic model with respect to the number of agents. It should be noted that we have 50 runs of each experimental setting, which further supports the conclusion that the results from the advanced models outperform the results from the basic model.

To better understand the significant difference between the results of the different models, we can draw error bars, as presented in Figure 5.44. The error bars in that example indicate the standard deviation. Although one must perform a statistical test to draw a firm conclusion, the fact that the standard-deviation error bars do not overlap suggests that the results are significantly different.

Figure 5.44: An example of comparing results from the basic and IG-ACD0.3 models when the cost of living is 0.5 and S is 0, by interpreting the error bars.
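For readers who wish to reproduce a comparison plot in the style of Figure 5.44, a minimal matplotlib sketch follows. All numbers are illustrative placeholders; the real curves would come from the per-round means and standard deviations over the 50 runs of each model.

```python
import numpy as np
import matplotlib.pyplot as plt

rounds = np.arange(0, 5001, 500)
# Illustrative means and standard deviations of the cooperator counts,
# as if averaged over 50 runs per model (made-up numbers, not our data).
basic_mean = np.linspace(200, 380, len(rounds))
basic_sd   = np.full(len(rounds), 25.0)
igacd_mean = np.linspace(200, 520, len(rounds))
igacd_sd   = np.full(len(rounds), 20.0)

plt.errorbar(rounds, basic_mean, yerr=basic_sd, capsize=3, label="basic")
plt.errorbar(rounds, igacd_mean, yerr=igacd_sd, capsize=3,
             label="IG-ACD, Ft = 0.3")
plt.xlabel("round")
plt.ylabel("number of cooperators")
plt.legend()
plt.show()  # non-overlapping SD bars hint at a significant difference
```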
Chapter 6

Conclusions and Future Work

This thesis presents a new model for evaluating the impact of proactive information gathering (IG) upon the evolution of cooperation among self-interested agents. The IG-model inherits its basis from the model introduced by Smaldino et al. [2013], referred to as the basic model. The basic model demonstrates the effect of environmental adversity on the evolution of cooperation in a spatially distributed population of agents with an energy-based life cycle, playing the Prisoner's Dilemma (PD) game in neighbourhood encounters. The novel features of the IG-model allow agents to collect information about the past behaviour of other agents and to adjust their strategy based on the gathered information and their own memories of past encounters. The impact of proactive information gathering on the evolution of cooperation has been investigated in different variations of the IG-model through simulation experiments with the same parameter space as used in the basic model.

Building on the basic model, we have analyzed different approaches to evaluating the impact of information gathering on the evolution of cooperation in a population. We retain the naive cooperators and naive defectors from the basic model. The naive agents always play their native strategy, cooperation or defection, when they are engaged in an interaction with other agents. In addition, we introduce advanced cooperators and advanced defectors. The advanced agents are capable of adjusting their strategy based on their personal memory and gathered information about the cooperation profiles of other agents. Our model differs substantially from similarly motivated published research, partly in the specifics of information gathering and use, and partly in the properties of the agent population, its environment, and the environmental harshness concepts inherited from the basic model.

Advanced agents use distinct advanced strategies motivated by their short-term and long-term agendas. While the design and investigation of advanced strategies remains an open research dimension, we demonstrate the adequacy of the model by introducing specific strategies for advanced cooperators and defectors and studying, through simulation experiments, how their addition to the basic model affects the evolution of cooperation in moderately harsh conditions, as represented by the cost of living and the payoff for unreciprocated cooperation.

Advanced agents are introduced gradually, through several auxiliary models. The first step is IG-AC, in which advanced cooperators are present while all defectors are naive. The second step is IG-AD, in which advanced defectors are present while all cooperators are naive. Early experimentation with those two model variations provided useful feedback for the adjustment of the strategy designs, leading to IG-ACD, in which both advanced cooperators and advanced defectors exist alongside naive agents. The experimental results show that in the parameter space defined by the 'harshness parameters' (the cost of living and the payoff for unreciprocated cooperation), the region in which the cooperators eventually prevail over defectors is substantially larger in IG-ACD than in the basic model. Given the particular choice of IG-based advanced strategies, the results show that the presence of advanced agents, and thus of information gathering, has a strong positive impact upon the development of cooperation and its eventual dominance. In a broader sense, the study demonstrates that the model introduced in this thesis can serve as a suitable framework for the study of the evolution of cooperation under moderately harsh environmental conditions in the presence of IG agents with adjustable strategies.

The agent's strategic reasoning, based on the proactively gathered information, is an open research dimension. For future work, we can investigate the influence of information gathering on the evolution of cooperation where advanced agents are capable of playing more sophisticated strategies. We can further specify more complex inquiries for gathering the required information, which would help an advanced agent to more completely characterize the true identity of its opponent. In this study, we have simplified our modelling in order to investigate the impact of information gathering upon the evolution of cooperation, without the impact of external factors. Returning to our original objective of designing practical intelligent agents capable of proactively developing cooperation in interaction with other agents, one may need to analyze the design requirements for specific real-world environments.

Bibliography

C. Athena Aktipis. Know when to walk away: contingent movement and the evolution of cooperation. Journal of Theoretical Biology, 231(2):249–260, 2004.

C. Athena Aktipis. Is cooperation viable in mobile organisms? Simple walk away rule favors the evolution of cooperation in groups. Evolution and Human Behavior, 32:263–276, 2011.

Yu Awaya. Community enforcement with observation costs. Journal of Economic Theory, 154:173–186, 2014.

Robert Axelrod. The Complexity of Cooperation: Agent-based Models of Competition and Collaboration. Princeton University Press, first edition, 1997.

Robert Axelrod and William D. Hamilton. The evolution of cooperation. Science, 211:1390–1396, 1981.

Robert M. Axelrod. The Evolution of Cooperation. Basic Books, 1984.

Benjamin Bachi, Sambuddha Ghosh, and Zvika Neeman. Prisoner's dilemma with talk. Technical report, Discussion Paper, Tel Aviv, 2010.

Hélène Barcelo and Valerio Capraro. Group size effect on cooperation in one-shot social dilemmas. Scientific Reports, 5, 2015.

Eric Bonabeau. Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences, 99(suppl 3):7280–7287, 2002.

Michael Bratman. Intention, Plans, and Practical Reason. Harvard University Press, 1987.

Michael E. Bratman, David J. Israel, and Martha E. Pollack.
Plans and resource-bounded practical reasoning. Computational Intelligence, 4:349–355, 1988.

Kurt Brauchli, Timothy Killingback, and Michael Doebeli. Evolution of cooperation in spatially structured populations. Journal of Theoretical Biology, 200(4):405–417, 1999.

Francesco Buccafurri, Domenico Rosaci, Giuseppe M. L. Sarnè, and Luigi Palopoli. Modeling cooperation in multi-agent communities. Cognitive Systems Research, 5(3):171–190, 2004.

Andrew M. Colman. Game Theory and its Applications: In the Social and Biological Sciences. Psychology Press, 2013.

Étienne Danchin, Luc-Alain Giraldeau, Thomas J. Valone, and Richard H. Wagner. Public information: from nosy neighbors to cultural evolution. Science, 305(5683):487–491, 2004.

Michael Doebeli and Christoph Hauert. Models of cooperation based on the prisoner's dilemma and the snowdrift game. Ecology Letters, 8(7):748–766, 2005.

Jim E. Doran, S. R. J. N. Franklin, Nicholas R. Jennings, and Timothy J. Norman. On cooperation in multi-agent systems. The Knowledge Engineering Review, 12(03):309–314, 1997.

Edith Elkind and Evangelos Markakis. Game theoretic foundations of multiagent systems. In Gerhard Weiss, editor, Multiagent Systems, chapter 17, pages 811–847. MIT Press, 2013.

Joshua M. Epstein et al. Zones of cooperation in demographic prisoner's dilemma. Complexity, 4(2):36–48, 1998.

L. M. Floría, C. Gracia-Lázaro, J. Gómez-Gardeñes, and Y. Moreno. Social network reciprocity as a phase transition in evolutionary cooperation. Physical Review E, 79(2):026106, 2009.

Chris D. Frith and Uta Frith. Mechanisms of social cognition. Annual Review of Psychology, 63:287–313, 2012.

Herbert Gintis. Game Theory Evolving: A Problem-Centered Introduction to Modeling Strategic Behavior. Princeton University Press, 2000.

William D. Hamilton. The evolution of altruistic behavior. The American Naturalist, 97(896):354–356, 1963.

Dirk Helbing. Agent-based modeling. In Social Self-Organization, pages 25–70. Springer, 2012.

Yuval Heller and Erik Mohlin. Observations on cooperation. Available at SSRN, 2016.

Dedi Iskandar Inan and Ratna Juita. Analysis and design complex and large data base using MySQL Workbench. International Journal of Computer Science & Information Technology, 3(5):173, 2011.

Nicholas R. Jennings. Controlling cooperative problem solving in industrial multiagent systems using joint intentions. Artificial Intelligence, 75(2):195–240, 1995.

Sarit Kraus. Negotiation and cooperation in multi-agent environments. Artificial Intelligence, 94(1-2):79–97, 1997.

Sean Luke, Claudio Cioffi-Revilla, Liviu Panait, Keith Sullivan, and Gabriel Balan. MASON: A multiagent simulation environment. Simulation, 81(7):517–527, 2005.

Stephen Majeski, Greg Linden, Corina Linden, and Aaron Spitzer. A spatial iterated prisoners dilemma game simulation with movement. In Simulating Social Phenomena, pages 161–167. Springer, 1997.

Mojdeh Mohtashemi and Lik Mui. Evolution of indirect reciprocity by social information: the role of trust and reputation in evolution of altruism. Journal of Theoretical Biology, 223(4):523–531, 2003.

MySQL AB. MySQL 5.7 reference manual. https://dev.mysql.com/doc/refman/5.7/en/, 2017.

Martin A. Nowak. Five rules for the evolution of cooperation. Science, 314:1560–1563, 2006.

Martin A. Nowak and Karl Sigmund. Evolution of indirect reciprocity by image scoring. Nature, 393(6685):573–577, 1998.

Martin A. Nowak, Corina E. Tarnita, and Tibor Antal. Evolutionary dynamics in structured populations.
Philosophical Transactions of the Royal Society of London B: Biological Sciences, 365(1537):19–30, 2010.

Jonathan Asante Nyantakyi. MSc thesis [in progress]. UNBC, 2018.

Oxford University Press. Oxford Dictionaries online. https://en.oxforddictionaries.com/definition/toroidal, 2017.

Parag C. Pendharkar. Game theoretical applications for multi-agent systems. Expert Systems with Applications, 39(1):273–279, 2012.

Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.

Stuart Jonathan Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd edition), 2009.

Jordi Sabater and Carles Sierra. Reputation and social network analysis in multi-agent systems. In Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, pages 475–482. ACM, 2002.

Norman Salazar, Juan A. Rodriguez-Aguilar, Josep Ll. Arcos, Ana Peleteiro, and Juan C. Burguillo-Rial. Emerging cooperation on complex networks. In The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, pages 669–676. International Foundation for Autonomous Agents and Multiagent Systems, 2011.

Mahendra Sekaran and Sandip Sen. To help or not to help. In Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society, pages 736–741, 1995.

Yoav Shoham. Agent-oriented programming. Artificial Intelligence, 60(1):51–92, 1993.

Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009.

Karl Sigmund and Martin A. Nowak. Evolutionary game theory. Current Biology, 9(14):R503–R505, 1999.

Paul E. Smaldino and Jeffrey C. Schank. Movement patterns, social dynamics, and the evolution of cooperation. Theoretical Population Biology, 82:48–58, 2012.

Paul E. Smaldino, Jeffrey C. Schank, and Richard McElreath. Increased costs of cooperation help cooperators in the long run. The American Naturalist, 181:451–463, 2013.

John Maynard Smith. Evolution and the Theory of Games. Cambridge University Press, 1982.

John Maynard Smith and G. R. Price. The logic of animal conflict. Nature, 246:15–18, 1973.

Ian Sommerville. Software Engineering. Pearson, 2011.

György Szabó and Gábor Fáth. Evolutionary games on graphs. Physics Reports, 446(4):97–216, 2007.

Satoru Takahashi. Community enforcement when players observe partners' past play. Journal of Economic Theory, 145(1), 2010.

Robert E. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146–160, 1972.

Karl Tuyls and Kagan Tumer. Multiagent learning. In Gerhard Weiss, editor, Multiagent Systems, chapter 10, pages 423–475. MIT Press, 2013.

John von Neumann and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 1944.

Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, first edition, 1997.

Bernard L. Welch. The generalization of 'Student's' problem when several different population variances are involved. Biometrika, 34:28–35, 1947.

David Sloan Wilson and Daniel Tumminelli O'Brien. Evolutionary theory and cooperation in everyday life. In Simon A. Levin, editor, Games, Groups, and the Global Good, pages 155–168. Springer, 2009.

David Sloan Wilson and Edward O. Wilson. Rethinking the theoretical foundation of sociobiology. The Quarterly Review of Biology, 82:327–348, 2007.

Michael Wooldridge. An Introduction to MultiAgent Systems. Wiley, second edition, 2009.

Michael Wooldridge and Nicholas R. Jennings.
Intelligent agents: Theory and practice. Knowledge Engineering Review, 10(2):115–152, 1995.