AN INTERACTION PROTOCOL FOR BIDIRECTIONAL DELIBERATION ON DIRECT HELP IN AGENT TEAMWORK

by

Mojtaba Malek Akhlagh

B.Sc., Physics, University of Guilan, Iran, 2007
Master of Information Technology, Multimedia University, Malaysia, 2010

THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

April 2015

© Mojtaba Malek Akhlagh, 2015

Abstract

This thesis proposes a new interaction protocol for direct help in agent teamwork. It addresses design questions that may arise in practical systems development, and achieves higher teamwork performance impact than previous versions of the Mutual Assistance Protocol (MAP). Direct help, such as performing an action on a teammate's behalf, is deliberated by team members as need arises, rather than imposed by team organization or centralized mechanisms. The deliberation can start with a request for help, or with an offer of help; the two design principles have been embodied in two distinct versions of MAP. Based on their observed complementarity, we refine and combine them into a single protocol that leverages their individual advantages. Its novel features let an agent initiate help deliberation with a request or an offer, and also simultaneously provide and receive help. Simulation experiments demonstrate its team performance gains while varying the environment dynamism, agent resources, and communication costs.

Contents

Abstract  i
List of Figures  v
Previously Published Work  vii
Acknowledgements  viii
Dedication  ix

1 Introduction  1

2 Background and Related Work  6
2.1 Multiagent Systems  6
2.2 Agent Teamwork  9
2.3 Helpful Behavior in Agent Teams  11
2.4 Agent Interaction Protocols  13
2.5 The Mutual Assistance Protocol (MAP)  15
2.5.1 MAP Characterization and Principles  16
2.5.2 Metrics for Help Deliberation  18
2.6 The Requester-Initiated Action MAP  20
2.7 The Helper-Initiated Action MAP  22
2.8 The Agent Interaction Modeling Simulator  24

3 Refinements of Action MAP  25
3.1 The Motivation and Research Objectives  25
3.2 The Refined Agent Team Model  28
3.3 The Refined Requester-Initiated Action MAP  29
3.4 The Refined Helper-Initiated Action MAP  31
3.5 The State-Machines of the Refined Protocols  34

4 The Bidirectionally Initiated Action MAP  39
4.1 Requester vs. Helper-Initiated Behaviors  40
4.2 Combining the One-Sided Protocols  42
4.3 The Bidirectionally Initiated Action MAP  43
4.4 The State-Machine Representation of BIAMAP  49

5 Evaluation  53
5.1 The Simulation Models  54
5.1.1 The Microworld  55
5.1.2 The Simulation Models of Action MAP  57
5.2 Performance Comparisons  59
5.2.1 The Parameter Settings  59
5.2.2 Optimizing the Watermarks  60
5.2.3 The Team Performance Impact of RIAMAP* and HIAMAP*  64
5.2.4 Complementary Profiles of RIAMAP* and HIAMAP*  68
5.2.5 The Team Performance Impact of BIAMAP  72

6 Conclusions and Future Work

Bibliography

List of Figures

2.1 The RIAMAP bidding sequence  20
2.2 The HIAMAP bidding sequence  23
3.1 The RIAMAP* bidding sequence  30
3.2 The HIAMAP* bidding sequence  32
3.3 The RIAMAP* state-machine representation  36
3.4 The HIAMAP* state-machine representation  37
3.5 A RIAMAP* sample state specification  38
3.6 A HIAMAP* sample state specification  38
4.1 BIAMAP case 1: A_i has not sent an offer or a request  45
4.2 BIAMAP case 2: A_i has sent a request and has not sent an offer  46
4.3 BIAMAP case 3: A_i has sent an offer and has not sent a request  47
4.4 BIAMAP case 4: A_i has sent an offer and a request  48
4.5 The BIAMAP state-machine representation  51
4.6 A BIAMAP sample state specification  52
5.1 The board game microworld  56
5.2 Team scores vs. WLL  61
5.3 Team scores vs. WHH  62
5.4 Team scores vs. WLL and WHH  63
5.5 Team scores vs. Disturbance  64
5.6 Team scores vs. Communication Cost  66
5.7 Team scores vs. Initial Resources  67
5.8 Team scores vs. Disturbance and Initial Resources  69
5.9 Team scores vs. Disturbance and Communication Cost  70
5.10 Team scores vs. Communication Cost and Initial Resources  71
5.11 Team scores vs. Disturbance and Initial Resources  73
5.12 Team scores vs. Disturbance and Communication Cost  74
5.13 Team scores vs. Communication Cost and Initial Resources  75

Previously Published Work

Portions of this thesis appear in the following paper:

Mojtaba Malek Akhlagh and Jernej Polajnar. Distributed deliberation on direct help in agent teamwork. In Proceedings of the 12th European Conference on Multi-Agent Systems (EUMAS 2014), Prague, Czech Republic, December 2014. To appear as LNAI 8953, Springer, 2015.

Acknowledgements

I would like to sincerely express my gratitude to my supervisor, Dr. Jernej Polajnar, for his invaluable support throughout all stages of this thesis. This study would not have happened without his experience, advice, encouragement, and patience. I am grateful to my committee member, Dr. Desanka Polajnar, for her support, advice, and encouragement. I am thankful to Dr. Iliya Bluskov for his support and interest in serving on my supervisory committee. I would like to thank my colleague, Denish Mumbaiwala, for his help and input on the simulation experiments. I am also thankful to Narek Nalbandyan and Omid Alemi for sharing their experience and insight on the subject.

This thesis is dedicated to my family, Nahid Mohseni, Esmaeil Malek Akhlagh, and Moein Malek Akhlagh, for their endless love, support, and encouragement.

Chapter 1

Introduction

Teamwork in multiagent systems is a rapidly evolving field of study whose research motivation has been rising over the last two decades. Agents are autonomous intelligent entities, situated in an environment, performing tasks that are delegated to them. They can react to changes in the environment, formulate and proactively pursue their own goals, and interact socially with other agents and humans [Wooldridge, 2009]. In particular, intelligent agents can exhibit collaborative behavior in a team in order to achieve a joint objective. The groundwork for studies of agent teamwork has been provided in the fundamental papers by Levesque et al. [1990], Cohen and Levesque [1991], Wooldridge and Jennings [1994], Rao and Georgeff [1995], Grosz and Kraus [1996], and Dunin-Keplicz and Verbrugge [2010].
Early work also included development of multiagent software platforms [Georgeff and Lansky, 1987] and applications of agent teamwork [Ljungberg and Lucas, 1992]. More recently, technological advances in intelligent distributed systems increasingly motivate research on agent teamwork from an engineering perspective.

The collaboration in an agent team can incorporate helpful behavior among the teammates, inspired by the context of effective human teamwork. The potential benefits of helpful behavior to human team performance are recognized in management practice [LePine et al., 2000]. Helpful behavior in agent teamwork refers to a collaborative act performed by one agent in order to assist its teammate. Different mechanisms for assistance between teammates by executing actions or providing information have been proposed in the last two decades [Itoh, 1991, Miceli et al., 1994, Yen et al., 2004, Cao et al., 2005, Fan et al., 2005, Kamar et al., 2009, Polajnar et al., 2012]. Helpful behavior among agents can be enforced by their global team organization or a centralized mechanism, or it can be specified as direct help, which is initiated by individual agents as need arises and bilaterally agreed upon by the agents involved in the help act. Direct help is specified as a possible component of team strategies for decentralized and reactive adjustment of team behavior to an unpredictable changing environment [Polajnar et al., 2014]. The growing practical importance of multiagent systems motivates the investigation of potential benefits in team performance from the incorporation of direct help mechanisms.

In order to investigate the practical impact of direct help upon team performance from an engineering perspective, one needs to develop interaction mechanisms. Agent interaction protocols specify the interactive behavior of individual agents dealing with each other. Nalbandyan [2011] and Polajnar et al. [2012] introduce the Mutual Assistance Protocol (MAP), with its subsequent elaboration in [Nalbandyan et al., 2013], which specifies a direct help mechanism for an agent team with heterogeneous skill profiles, situated in a dynamically changing environment. The mechanism is based on a bilateral distributed agreement in which both requester and helper agents jointly decide on performing a help act, in contrast with unilateral approaches as in [Kamar et al., 2009], in which only one side of the help transaction makes the decision. MAP has two different generic types: Action MAP, which enables teammates to help each other by directly performing actions, and Resource MAP, which enables teammates to help each other by providing the needed resources. Simulation experiments in [Polajnar et al., 2012] suggest behavioral advantages of employing Action MAP over unilateral help protocols when communication cost is relatively low compared to agents' action costs.

Deliberation on possible direct help can be initiated in two different ways. In the requester-initiated approach, the opening message of the help transaction is a request for help; in the helper-initiated approach, the opening message is an offer of help. Simulation experiments in [Nalbandyan et al., 2013] show that the Requester-Initiated and Helper-Initiated Action MAP (RIAMAP and HIAMAP) have complementary impacts on team performance with respect to varying levels of environment dynamism, agent resources, and communication cost.
Where one dominates, the other is often weak, but jointly they maintain superiority over all other Action MAP versions across the parameter space. This observation has raised the research question of whether a possible combination of the two different approaches might exhibit superior performance. In this thesis, we approach this research question by designing new protocols, developing simulation models for them, exploring their impact on team performance through simulation experiments, and adjusting our model parameters for optimum performance.

We study agent interaction protocols for direct help in the context of a board game microworld, inspired by the Colored Trails game [Gal et al., 2010] and used in previous studies of MAP [Polajnar et al., 2012, Nalbandyan et al., 2013]. For modeling and simulation, we use the Agent Interaction Modeling Simulator (AIMS) framework introduced by Alemi et al. [2014], which allows concurrent simulation of multiple teams in identical dynamic environments. It facilitates the representation of agent interaction protocols and the modeling of environment dynamism, and provides an interactive simulation mode to gain direct insights into behavior comparisons between teams that employ different help protocols but are otherwise identical and operate in identical dynamic environments.

In particular, we investigate an interaction protocol design that combines the requester-initiated and helper-initiated approaches to deliberation on direct help. We start with the observation that in the existing requester-initiated and helper-initiated protocols, RIAMAP and HIAMAP, an agent is only allowed to either provide or receive help at any given time. We refine the existing protocols to enable agents to provide and receive help simultaneously. The refined protocols, RIAMAP* and HIAMAP*, are then composed into a single new protocol that allows help deliberation to be initiated by requesters as well as by potential helpers. The new protocol, called the Bidirectionally Initiated Action MAP (BIAMAP), allows an agent to initiate help deliberation with either a request or an offer, and to provide and receive help simultaneously. The design objective of BIAMAP is to specify an interactive behavior model for direct help among agents that incorporates both proactive requesting and proactive offering behaviors and optimizes the balance between them, in order to leverage the advantages of each across a space of parameters representing environment dynamism, agent resources, and communication cost.

We represent the new interaction protocols using a particular state-machine formalism, translate them to executable code in the AIMS framework, and investigate them through simulation experiments across the parameter space. We first evaluate the team performance impact of enabling individual agents to simultaneously provide and receive help, and then the team performance impact of bidirectional deliberation on direct help. The simulation results show that RIAMAP* and HIAMAP* outperform RIAMAP and HIAMAP, respectively, in most situations. They also show the superiority of BIAMAP over both RIAMAP* and HIAMAP* in most of the parameter space.

In summary, Nalbandyan [2011] and Polajnar et al. [2012] introduce MAP as a better-performing interaction protocol in comparison to the unilateral protocols in situations where the communication costs are relatively low compared to the action costs.
RIAMAP and HIAMAP are introduced in [Nalbandyan et al., 2013] as two advanced variations of MAP, which outperform the basic MAP version and demonstrate complementary behaviors over the parameter space. In this thesis, we first introduce RIAMAP* and HIAMAP*, their refined versions demonstrating performance gains, and finally introduce BIAMAP, a composition of RIAMAP* and HIAMAP*, which is the best-performing variation of the MAP family under the same parameter settings.

The rest of this thesis is organized as follows. Chapter 2 covers the background and related work, Chapter 3 describes the research objectives and presents refinements of Action MAP, Chapter 4 presents the Bidirectionally Initiated Action MAP (BIAMAP), Chapter 5 presents an evaluation of the new protocols in simulation experiments, and Chapter 6 presents the conclusions and future work.

Chapter 2

Background and Related Work

In this chapter, we review some groundwork together with related recent studies in multiagent systems, agent teamwork, helpful behavior in agent teams, agent interaction protocols, and the Mutual Assistance Protocol (MAP) family, with a focus on the aspects relevant to this thesis.

2.1 Multiagent Systems

According to [Wooldridge, 2009], an agent is a computer system situated in some environment, holding some level of autonomy in order to satisfy its design objectives; and a multiagent system is a system composed of multiple interacting agents within an environment. Shoham and Leyton-Brown [2009] describe multiagent systems as "systems that combine multiple autonomous entities, each having diverging interests or different information or both".

Wooldridge and Jennings [1995] suggest that in order to consider an agent to be intelligent, the agent is expected to hold the following capabilities. An agent needs to be reactive, i.e., able to perceive the environment and respond to occurring changes in a timely manner; proactive, i.e., able to take initiatives and perform goal-directed behavior; and social, i.e., able to interact with agents and possibly humans.

The environments that agents might be situated in have some general properties classified by Russell and Norvig [1995]. For instance, an environment can be deterministic or non-deterministic. In a deterministic environment, each agent's action has a single guaranteed effect, and there is no uncertainty regarding the environment state resulting from the action. Also, an environment can be static or dynamic. In the static case, the environment state changes only through the agent's actions, while in the dynamic case there are other processes influencing the environment beyond the agent's control. In a basic model of agents interacting with their environments, the environment is initially in some state; the agent perceives the current state and performs an action in order to influence the environment and achieve a desirable state; the environment may respond to the action by transiting to another state, which may be unknown to the agent in advance, due to the environment being non-deterministic or dynamic. Hence, the agent may fail and try to perform another action in order to achieve its goal, and so on. The states of the environment can be associated with real values through a utility function, specifying how good each state is, which lets the agent deliberate on what to do.
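To illustrate this basic model, the following is a minimal sketch, not taken from any of the cited works, of an agent that deliberates by ranking candidate actions with a utility function over predicted environment states; the names choose_action, transition, and utility are illustrative assumptions.

# Minimal sketch (illustrative only): an agent ranks candidate actions by the
# utility of the environment state each action is predicted to produce.

def choose_action(state, actions, transition, utility):
    """Pick the action whose predicted next state has the highest utility.

    state      -- current environment state
    actions    -- iterable of available actions
    transition -- transition(state, action) -> predicted next state
    utility    -- utility(state) -> float, how good a state is for the agent
    """
    return max(actions, key=lambda a: utility(transition(state, a)))

# Toy example: states are integers, the goal is state 5, and utility decreases
# with distance from the goal, so the agent moves toward it.
if __name__ == "__main__":
    goal = 5
    actions = [-1, +1]                      # move left or right
    transition = lambda s, a: s + a         # deterministic effect of an action
    utility = lambda s: -abs(goal - s)      # closer to the goal is better
    print(choose_action(0, actions, transition, utility))  # prints 1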
Practical reasoning [Bratman, 1987] is a particular model of decision making, in contrast with the traditional approach of theoretical reasoning, also known as symbolic artificial intelligence. While theoretical reasoning is directed towards beliefs, practical reasoning is directed towards actions. In theoretical reasoning, logical theorems are used as models of decision making and agents act as theorem provers. In practical reasoning, the decision-making process consists of two distinct activities. The first one, called deliberation, answers the question of what state of affairs to achieve, and the second one, called means-ends reasoning, discovers how to achieve those states of affairs. The output of deliberation is an intention and the output of means-ends reasoning is a plan. Practical reasoning agents are endowed with mental states similar to the ones in the human mind, which are used in their decision-making processes.

The most popular practical reasoning architecture has been the Belief-Desire-Intention (BDI) formalism. In this framework, the agent system is viewed as a rational agent with the mental attitudes of belief, desire, and intention, representing the informative state, the motivational state, and the deliberative state of the rational agent, respectively. The behavior of the system is then determined by these mental attitudes, and they are critical for desired performance when deliberation is subject to resource bounds [Bratman et al., 1988]. Modal logic, in which statements such as "necessarily true" or "possibly true" can be expressed [Hughes and Cresswell, 1996, Blackburn et al., 2001], has been applied for modelling and axiomatization of BDI agents [Rao and Georgeff, 1991, 1993, Georgeff and Rao, 1995]. The Procedural Reasoning System (PRS), developed by Georgeff and Lansky [1987], was one of the first and best-known agent architectures to explicitly embody the BDI paradigm. It has been applied in several practical multiagent applications that are successfully used in the real world, such as the OASIS air traffic management system [Ljungberg and Lucas, 1992] and the SPOC business process management system [Georgeff and Rao, 1996]. Rao and Georgeff [1995] address two main criticisms against the formalization of BDI agents: the necessity or adequacy of the three mental attitudes, and the utility of applying modal logics that do not have complete axiomatizations and cannot be computed efficiently. They argue the necessity of all three mental attitudes in application domains where real-time performance is required. They also show that, with certain simplifying assumptions, it is possible to build practical BDI systems.

2.2 Agent Teamwork

Agent teamwork refers to the collaboration of individual agents who are committed to accomplishing a particular task. Research on agent teamwork is an active field of study, and there have been different approaches to formalizing the mental states and semantics needed for agents' collaborative behavior. Agent teamwork research started within the BDI framework [Cohen and Levesque, 1990, Levesque et al., 1990]. Cohen and Levesque [1991] formalize the notions of joint action, joint commitment, and joint intention. They argue that a joint action involves more than just the union of simultaneous individual actions, even when the actions are coordinated, and that it is done by individuals sharing certain specific mental properties.
In their work, they investigate joint commitment and joint intention specifications based on individual cases, such that a team of agents acts as an aggregate agent. The individual agents are situated in a dynamic environment in which there is potential divergence of mental state. This potential divergence makes the design specifications of teamwork complicated, since there is tension in acting as an aggregate agent. For an agent team with a common goal, in order to hold the team together while letting agents have their own private beliefs, each agent treats the common goal as a weak achievement goal; i.e., the agent considers the possibility that any agent in the team may have privately come to believe that the goal is either achieved, unachievable, or irrelevant, and is committed to making that belief mutually known to all other agents. Therefore, if any agent comes to believe that the goal is accomplished, not achievable, or irrelevant, it drops the common goal, but adopts the goal that the new status be mutually known by all the team members.

The Distributed Constraint Optimization Problem (DCOP) [Mailler and Lesser, 2004, Modi et al., 2005] is another framework for modeling teamwork in multiagent systems. In this framework, the agents cooperate as a team and share a common reward function. This framework is useful for modelling distributed reasoning in multiagent domains, because it can perform optimization over a set of distributed constraints while keeping agents' information private. Formally, for each agent in the team there is a variable from a set V = {x_1, x_2, ..., x_n} over whose value the agent has control. Variable x_i can take on any value from a discrete finite domain D_i. The goal of optimization is to change the variable values so that the sum over a set of binary constraints and associated payoff or reward functions is maximized. DCOPs have been applied to meeting scheduling problems, allocating tasks, and coordinating teams of agents [Taylor et al., 2011].

The Partially Observable Markov Decision Process (POMDP) [Kaelbling et al., 1998] provides a mathematical framework to model sequential decision making under uncertainty. POMDPs consider uncertainty in both the agent's observations and its actions. An agent in this model keeps a probability distribution over a set of possible states. POMDPs have been applied to real-world domains including robot navigation and machine maintenance [Taylor et al., 2011]. Decentralized POMDPs (DEC-POMDPs) are frameworks to model this behavior in multiagent systems. In general, finding an optimal joint policy for general DEC-POMDPs is NEXP-complete [Bernstein et al., 2002]. There are two general approaches to this problem. The first one tries to find approximate solutions using efficient algorithms [Nair et al., 2003, Bernstein et al., 2005], which give no guarantee on the solution quality. The second category tries to find the globally optimal solution by applying useful subclasses of DEC-POMDPs [Becker et al., 2004, Nair et al., 2005], which lack expressiveness [Taylor et al., 2011]. While the BDI framework is focused on execution-time reasoning, the DCOP and POMDP frameworks focus more on planning-time reasoning, and they often fail to scale up to large numbers of agents because of the high computational cost of a high-quality joint policy and of dealing with model uncertainty. A recent work by Taylor et al.
[2011] argues that execution-time reasoning is a critical component of multiagent systems and shows that robustly combining execution-time and planning-time reasoning in the DCOP and POMDP frameworks results in better performance in terms of achieved reward, runtime, and scalability. It presents new algorithms that integrate execution-time and planning-time reasoning and suggests that the multiagent community should focus on incorporating more execution-time reasoning.

2.3 Helpful Behavior in Agent Teams

In the last two decades, there has been research interest in helpful behavior in agent teamwork, which specifies mechanisms for assistance between teammates by executing actions or providing information [Itoh, 1991, Miceli et al., 1994, Yen et al., 2004, Cao et al., 2005, Fan et al., 2005, Kamar et al., 2009, Polajnar et al., 2012].

Kamar et al. [2009] provide a decision-theoretic mechanism for incorporating helpful behavior in agent teams working on a collaborative activity. The authors argue that helpful actions incur some costs to the interacting agents, which may be due to communication costs, missed opportunities while helping, adaptations needed for receiving the provided help, or interruption costs. They formally integrate decision-theoretic and BDI models by incorporating costs and uncertainty into deliberation, reasoning based on local probabilistic beliefs of individual agents, and formalizing helpful behavior based on the commitments and intentions of the participating agents. They formulate two distinct types of helpful behavior, performing actions and providing information, which are based on a unilateral decision-making mechanism; i.e., the decision on performing helpful behavior is made by only one participant in the collaborative activity. In another work, Kamar and Grosz [2007] investigate effective interruption management in collaborative human-computer activities. They formalize interruptions as multi-agent decision making in a human-computer setting.

Polajnar et al. [2011] argue that when an agent team is situated in a realistic dynamic environment which is subject to unpredictable changes, helpful behavior among teammates can have a positive impact on the team performance. The authors point out that for an agent acting in such an environment, there might be situations caused by unexpected events which require specialized or collective behavior. As it might be impractical to model agents with all potentially required capabilities at design time, it is beneficial to incorporate mechanisms for helpful behavior at execution time. On the other hand, there is an advantage in modeling agents with certain standard capabilities for their reusability in similar domains. In that case, an agent may have abilities beyond its immediate role in a particular application, which can potentially be incorporated in a help mechanism in order to advance the team performance.

Nalbandyan [2011] and Polajnar et al. [2012] present an interaction protocol for helpful behavior, called the Mutual Assistance Protocol (MAP), with its subsequent elaboration in [Nalbandyan et al., 2013]. MAP is based on a distributed decision-making mechanism, called bilateral distributed agreement (BDA), in which the decision on performing a help act is made jointly by both requester and helper agents, in contrast with the unilateral approaches presented in [Kamar et al., 2009].
In a different approach, Dalvandi [2012] investigates the need for empathy among artificial agents and provides simulation results suggesting the behavioral advantages of incorporating empathy-driven deliberation on helpful behavior in certain dynamic environments where rational deliberation is costly. This approach is inspired by the way living systems deliberate on helpful behavior in certain situations based on the notion of empathy.

In another work, Polajnar et al. [2014] formulate adaptation strategies for agent teamwork organization situated in a dynamic environment with unpredictable changes. The authors formulate a decentralized reactive approach based on flexible, dynamic combinations of complementary adjustment strategies. The specific strategies include individual replanning, direct help among teammates, and subtask swapping among the teammates. The simulation results in a microworld setting suggest that helpful behavior is generally the most beneficial contributor to the combined strategies; however, the most inclusive combinations tend to dominate.

2.4 Agent Interaction Protocols

Agent protocols are generally of two types: communication protocols and interaction protocols. Agent communication protocols operate at a low level, regulating the means of sending and receiving information, typically by message passing, among the individual agents. They provide speech act classification and semantics, and the domain ontology the agents use to understand each other [Wooldridge, 2009]. KQML/KIF [Mayfield et al., 1996] and FIPA-ACL [Foundation of Intelligent Physical Agents, 1997] are the two best-known standard agent communication protocols. Agent interaction protocols are high-level protocols and provide sets of rules specifying the interactive behavior of individual agents. They model agents' social activities such as coordination, negotiation, and collaboration. One of the most widely used agent interaction protocols has been the Contract Net Protocol (CNP), originally designed by Smith [1980] for cooperative problem solving between nodes in distributed systems, inspired by the way companies put contracts out to tender [Wooldridge, 2009]. Paurobally et al. [2004] discuss CNP interactions for self-interested agents, based on competitive bidding between contracting agents. One may consider designing a universal agent interaction protocol which could be employed in all possible circumstances. Dunn-Davies et al. [2005] point out that such a global protocol is not realistic, as agents' social behavior may be realized in a variety of approaches. Hence, each specific agent interactive behavior in a particular domain requires designing a specific interaction protocol.

Wajid and Mehandjiev [2013] classify agent interaction approaches into two general groups: protocol-based approaches, which specify agents' behavior using interaction protocols, and approaches without protocols, which constrain agents' behavior without using protocols. They point out that protocol-based approaches ensure some level of predictive behavior in agent interactions, but enforce less flexibility in agents' behavior. On the other hand, approaches without protocols increase the flexibility of agents' behavior, but decrease the level of predictive outcome. The authors further classify protocol-based approaches by considering whether the protocols are created at design time or at run time.
Approaches which create their protocols at run time are more flexible, but less predictable; those which use protocols created at design time have less flexibility, but more predictability. Finally, approaches which create their protocols at design time are divided into two subclasses based on the time of loading the protocols: approaches which load the protocols at run time and those which load them at design time; again, the latter are less flexible, but more predictable.

2.5 The Mutual Assistance Protocol (MAP)

The Mutual Assistance Protocol (MAP) [Nalbandyan, 2011, Polajnar et al., 2012] has been specifically designed for helpful behavior in agent teamwork, distinguished by its distributed deliberation approach. MAP provides a mechanism for helpful behavior by enabling agents in a team to directly assist each other. Its bidding sequence is similar to the one in the Contract Net Protocol [Smith, 1980]. The help deliberation process is jointly determined by two agents through a bilateral distributed agreement. In this section, we first describe its team model, principles, and variations, and then review some design features, introduced in [Nalbandyan et al., 2013], which formulate criteria for deliberation on whether to engage in helpful behavior.

2.5.1 MAP Characterization and Principles

1. The Agent Team Model

A team A consists of agents A_i, i ∈ I = {1, ..., n}, n > 1, situated in an environment E. The agents are able to perform actions from a domain Act = {a_1, ..., a_m}, m > 1. Each agent A_i has its individual skill profile, which defines A_i's efficiency for the domain Act in the context of the environment state. A_i knows its own skill profile precisely, but its knowledge of teammates' profiles may be limited to a varying degree. The team is assigned a task T, with each agent A_i addressing a subtask T_i that has a resource budget R_i. The subtask assignment process can be random or optimal [Polajnar et al., 2014]. The environment E can change dynamically through events other than the actions of any agent in the team. Agent A_i acts rationally with the objective of advancing the team performance.

2. Local Planning Autonomy (LPA)

Agents in a team are equipped with a property called local planning autonomy (LPA), which enables each agent to generate its own local plan, π_i, for its assigned subtask, T_i. An agent A_i rationally selects the best plan among its candidate plans using its own beliefs, B_i, based on its individual view of the team interest rather than self-interest, represented by A_i's team utility function

u_i : Plans × BeliefSets → R+

where R+ is the set of positive real numbers. Thus, each individual agent can autonomously assess how to best contribute to the team. Agents with LPA can engage in helpful behavior when they jointly determine, through a bilateral distributed agreement, that the help act is in the interest of the team.

3. Bilateral Distributed Agreement (BDA)

The fundamental design concept underlying the MAP protocol is that the deliberation on whether to perform a helpful behavior is based on a bilateral distributed approach, in contrast to unilateral approaches as in [Kamar et al., 2009]. In a bilateral distributed agreement (BDA), the two parties deliberating about a help transaction jointly decide on performing the helpful behavior based on their individual assessments of team benefit and team loss.
The agent A_i that considers receiving help calculates the team benefit,

Δ_i^+ = u_i(π_i', B_i) − u_i(π_i, B_i),

where π_i is its original plan and π_i' its new plan affected by the possible received help. Similarly, the agent A_j that considers providing help calculates the team loss,

Δ_j^− = u_j(π_j, B_j) − u_j(π_j'', B_j),

where π_j is its original plan and π_j'' its new plan that includes the help act. The help transaction may occur only if the difference Δ_ij = Δ_i^+ − Δ_j^−, called the net team impact (NTI), is positive. Note that agent A_i calculates Δ_i^+ using its individual belief set B_i, while agent A_j calculates Δ_j^− using its own beliefs B_j. (In particular, note that an agent's individual beliefs include its cost vector.) The team benefit and team loss calculations are done from the receiver's and provider's perspectives, respectively, and therefore the functions u_i and u_j must be properly mutually scaled to allow meaningful comparisons. The agreement is reached through a bidding sequence similar to the one in the Contract Net Protocol [Smith, 1980]. The behavioral advantages of the bilateral distributed approach are presented in [Polajnar et al., 2012] by modeling and simulation of the MAP protocol and two unilateral protocols.

4. Variations of the MAP

There are two generic versions of the MAP protocol: Action MAP and Resource MAP. In Action MAP, an agent can perform an action on behalf of a teammate. In Resource MAP, an agent can provide a part of the resources needed for an action to be performed by a teammate. Action help is performed by a single helper agent, while multiple helper teammates can cooperate in Resource MAP. Furthermore, helpful behavior can be initiated either by a requester agent who asks for help or by a helper agent who offers help. The corresponding interaction protocols, called the Requester-Initiated Action MAP (RIAMAP) and the Helper-Initiated Action MAP (HIAMAP), are introduced in [Nalbandyan et al., 2013].

2.5.2 Metrics for Help Deliberation

1. Estimating the Cost of a Plan

Each agent A_i is able to generate candidate plans to accomplish its assigned subtask T_i. The agent selects the least-cost plan among the generated plans and remains committed to it. However, the environment can change dynamically and the initial plan cost can change as the agent proceeds. If the agent knows the model of environment dynamism, it can estimate the expected cost of its initial plan, or of its remainder [Nalbandyan et al., 2013].

2. Agent's Individual Wellbeing

The individual wellbeing metric, introduced in [Nalbandyan et al., 2013], expresses how well an agent A_i is accomplishing its subtask T_i. It is defined as:

W_i = (R_i − Ecost_i(P_i)) / ((ℓ + 1) · c̄_i)    (2.1)

where Ecost_i(P_i) is the estimated cost of the remaining plan P_i for agent A_i, ℓ is the number of actions in P_i, R_i is the remaining resources, and c̄_i is the average expected cost of an action for A_i (i.e., c̄_i = (1/m) Σ_{k=1}^{m} c_ik). The wellbeing value changes as A_i performs actions, gets involved in a help act, or as the environment state changes. An agent with positive wellbeing expects to accomplish its subtask with its own resource budget and have some excess of resources, while a negative wellbeing indicates a shortage of resources and that help may be needed. Agents apply wellbeing thresholds called watermarks in order to deliberate on helpful behavior.
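To make these deliberation metrics concrete, the following is a minimal sketch, not taken from the thesis or the AIMS code, of how the BDA quantities of Section 2.5.1 and the wellbeing of equation (2.1) could be computed; the function names and the use of negative plan cost as a utility are illustrative assumptions.

# Minimal sketch (illustrative only) of the BDA metrics and the wellbeing value.

def team_benefit(u_i, plan_old, plan_new, beliefs_i):
    # Delta_i^+ = u_i(pi_i', B_i) - u_i(pi_i, B_i), computed by the receiver.
    return u_i(plan_new, beliefs_i) - u_i(plan_old, beliefs_i)

def team_loss(u_j, plan_old, plan_with_help, beliefs_j):
    # Delta_j^- = u_j(pi_j, B_j) - u_j(pi_j'', B_j), computed by the provider.
    return u_j(plan_old, beliefs_j) - u_j(plan_with_help, beliefs_j)

def net_team_impact(benefit, loss):
    # NTI = Delta_i^+ - Delta_j^-; the help act may occur only if NTI > 0.
    return benefit - loss

def wellbeing(remaining_resources, est_plan_cost, plan_length, avg_action_cost):
    # W_i = (R_i - Ecost_i(P_i)) / ((l + 1) * c_bar_i), as in equation (2.1).
    return (remaining_resources - est_plan_cost) / ((plan_length + 1) * avg_action_cost)

# Toy example: utilities approximated by negative total plan cost. The receiver's
# plan becomes 12 units cheaper and the provider's plan 5 units more expensive,
# so NTI = 7 > 0 and the help act may occur.
u = lambda plan, beliefs: -sum(plan)
nti = net_team_impact(team_benefit(u, [20, 8], [8, 8], {}),
                      team_loss(u, [6, 4], [6, 4, 5], {}))
assert nti == 7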
3. Proximity to Accomplishing a Subtask

Agents incorporate the notion of proximity to accomplishment into their deliberation on a help act in order to favor the teammates who are close to accomplishing their subtasks and are more likely to score for the team [Nalbandyan et al., 2013]. An agent A_i applies a proximity bias function, pbias, which maps its remaining plan length, ℓ, into a positive priority value, to its assessments of the team benefit and team loss.

2.6 The Requester-Initiated Action MAP

In the Requester-Initiated Action MAP (RIAMAP), agents can proactively request action help, but they do not offer help [Nalbandyan et al., 2013]. Agents can bid to the requests, but if an agent has sent a request, it does not bid to other requests. The protocol comprises three interaction phases, as follows. The corresponding sequence diagram is given in Fig. 2.1.

Figure 2.1: The RIAMAP bidding sequence (request broadcast, bids, selection of the best bid, confirmation).

1) Help request generation: At the start of every round, agent A_i, with next action a_k at the cost cost_ik and the current calculated wellbeing W_i, broadcasts a help request message containing a_k and the calculated team benefit, Δ_i^+, if any of the following three conditions holds:

(i) R_i < cost_ik;
(ii) W_i < WLL and cost_ik > LowCostThreshold;
(iii) cost_ik > RequestThreshold;

where LowCostThreshold is the upper limit of the "cheap" action range, RequestThreshold is the lower limit of the "expensive" action range, and WLL is a fixed low watermark value for wellbeing. Condition (i) means that the agent's remaining resources R_i are insufficient for A_i to perform a_k; only help can unblock the agent. Condition (ii) indicates that the agent's wellbeing is low and that a_k is not cheap in its skill profile, so performing it will further reduce the wellbeing; help will improve the agent's chances of reaching the goal. Condition (iii) indicates that a_k is expensive for agent A_i; help may benefit the team regardless of A_i's wellbeing.

2) Bidding to a request: Each agent receives the request and deliberates on bidding to it, except an agent who has itself sent a request. Agent A_j calculates the team loss, Δ_j^−, for performing the requested action a_k, and the NTI using the received team benefit, Δ_i^+. If the NTI is positive, A_j sends a bid containing a_k and the associated NTI value to the requester A_i. In case of having multiple qualified requests, A_j bids to the one with the highest NTI.

3) Confirming the chosen bid: Agent A_i receives the bids, selects the one with the highest NTI, and sends a confirmation to the bidder agent A_j.
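The request-generation conditions, the bidding rule, and the bid selection described above can be summarized in a short sketch. It is a simplified illustration under assumed data structures, not the AIMS implementation; names such as should_request, bid_for_request, and select_best_bid are hypothetical.

# Simplified sketch of RIAMAP deliberation (illustrative, not the AIMS code).
# WLL, LowCostThreshold, and RequestThreshold are protocol parameters.

def should_request(remaining_resources, next_action_cost, wellbeing,
                   wll, low_cost_threshold, request_threshold):
    # Conditions (i)-(iii) for broadcasting a help request.
    return (remaining_resources < next_action_cost                         # (i) blocked
            or (wellbeing < wll and next_action_cost > low_cost_threshold) # (ii) low wellbeing
            or next_action_cost > request_threshold)                       # (iii) expensive action

def bid_for_request(requests, team_loss_of):
    # A potential helper bids only to the request with the highest positive NTI.
    # Each request is a (requester, action, team_benefit) triple.
    best, best_nti = None, 0
    for requester, action, benefit in requests:
        nti = benefit - team_loss_of(action)
        if nti > best_nti:
            best, best_nti = (requester, action, nti), nti
    return best  # None if no request has positive NTI

def select_best_bid(bids):
    # The requester confirms the bid with the highest NTI, if any bids arrived.
    return max(bids, key=lambda bid: bid["nti"], default=None)

# Example: an agent with 5 resource units whose next action costs 9 must request
# help by condition (i), regardless of its wellbeing.
assert should_request(5, 9, wellbeing=0.4, wll=0.0,
                      low_cost_threshold=3, request_threshold=12)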
An Improved Version of RIAMAP

Polajnar et al. [2014] introduce a "frugal" version of RIAMAP, called RIAMAP2, in which the help transaction may also occur when the NTI is equal to zero. The original criterion, NTI > 0, allows a help act only if it is expected to increase the team score. However, the calculation of the expected team score does not take into account the potential impact of help. RIAMAP2 recognizes the fact that if help saves total resources at the team level, the savings can later be used for help acts that may eventually increase the team score. When agent A_i requests help from agent A_j for an action a_k, and NTI = 0, savings occur if A_j performs a_k at a lower cost than A_i. The cost comparison accounts for the fact that agent A_j, when performing a_k on behalf of a teammate (rather than within its own local plan), incurs an additional overhead cost, represented here by a constant help overhead, h. In RIAMAP2, the help act is allowed to occur if either NTI > 0, or else NTI = 0 and cost_ik − cost_jk > h.

2.7 The Helper-Initiated Action MAP

In the Helper-Initiated Action MAP (HIAMAP), agents can proactively offer action help, but they do not otherwise request help [Nalbandyan et al., 2013]. Agents can bid to the offers, but if an agent has sent an offer, it does not bid to other offers. The protocol comprises three interaction phases, as follows. The corresponding sequence diagram is given in Fig. 2.2.

Figure 2.2: The HIAMAP bidding sequence (offer broadcast, team benefit calculation, bids when NTI > 0, bid selection).

1) Help offer generation: At the start of every round, agent A_i calculates its individual wellbeing W_i. If W_i is above a fixed high watermark WHH, it broadcasts an offer message containing a pair (a_k, Δ_i^{k,−}) for each action a_k for which A_i's cost is below the OfferThreshold parameter, where Δ_i^{k,−} denotes the team loss associated with a_k.

2) Bidding to an offer: Each agent receives the offer and deliberates on bidding to it, except an agent who has itself sent an offer. If an agent A_j finds that its next action a_k is included in the offer, it calculates the team benefit for not performing the offered action, Δ_j^{k,+}, and uses the team loss, Δ_i^{k,−}, received in the offer, to calculate the value of the NTI. If the NTI is positive, A_j sends a bid containing a_k and the associated NTI value to A_i. In case of having multiple qualified offers, A_j bids to the one with the highest NTI.

3) Confirming the chosen bid: Agent A_i receives the bids, selects the one with the highest NTI, and sends a confirmation to the bidder agent A_j.

2.8 The Agent Interaction Modeling Simulator

The Agent Interaction Modeling Simulator (AIMS) framework introduced in [Alemi et al., 2014] is developed for design-oriented simulation studies of interaction models used in agent teamwork. It allows concurrent simulation of multiple teams in identical dynamic environments. It facilitates the representation of agent interaction protocols and the modeling of environment dynamism.

The AIMS framework has been specifically developed to support incremental improvement of interaction protocol design by performance optimization of the protocol through iterative adjustment of its parameter values in a series of simulation experiments. Its interactive simulation mode provides direct insights into behavior comparisons between teams that employ different help protocols, which can be used in each iteration of the protocol design.

In the AIMS framework, protocols are represented as state machines of a particular type, with synchronous communication and alternating states for sending and receiving messages. Such representations are easily translatable to executable code in AIMS.

Chapter 3

Refinements of Action MAP

This chapter presents the motivation for designing a new combined protocol together with its design objectives. It includes the refined versions of the requester-initiated and helper-initiated protocols, which are required for the composition of the new combined protocol. Section 3.1 describes the research motivation, Section 3.2 formalizes a refined team model which explicitly represents the impact of the environment state, Sections 3.3 and 3.4 present refined models of the two one-sided protocols which allow simultaneous providing and receiving of help, and Section 3.5 provides the state-machine representations of the refined protocols.
3.1 The Motivation and Research Objectives

The motivation for incorporating helpful behavior in agent teamwork emanates from its potential benefits in practical applications. In this study, we are interested in investigating the practical impact of direct help upon team performance from an engineering perspective. Our research objectives involve developing a comprehensive mechanism for the deliberation process in help interactions in order to advance the team performance.

The Mutual Assistance Protocol (MAP) family provides a direct help mechanism by enabling agents in a team to directly assist each other. Distributed deliberation on direct help is the underlying principle of MAP. In a decentralized manner, the members of an agent team use their own individual beliefs to jointly deliberate on performing a help act. Each direct help act results from a bilateral distributed agreement that it is in the interest of the team.

The deliberation on getting involved in helpful behavior, whether to provide or receive direct help, can be initiated from two different sides. One possibility is for an agent to make a request for help to its potential helper teammates. Alternatively, it is possible for an agent to make an offer of help to its potential requester teammates. The former approach is realized in the Requester-Initiated Action MAP (RIAMAP), and the latter in the Helper-Initiated Action MAP (HIAMAP). The practical impact of the two different approaches on team performance has been investigated through simulation studies. It was noted in [Nalbandyan et al., 2013] that the performance profiles of RIAMAP and HIAMAP are complementary. In certain situations, where one is performing weakly, the other is often dominating. While neither of them generally outperforms the other, together they maintain superiority across the parameter space representing environment dynamism, agent resources, and communication cost. This has given rise to the research question of whether a possible combination of the two one-sided protocols might show an even better performance profile. This combination can be realized by letting agents initiate help deliberation by both requests and offers in a single combined protocol.

In this study, we compose the requester-initiated and helper-initiated protocols in order to specify a single interaction protocol that integrates both proactive requesting and proactive offering of direct help. The new protocol specification is intended to leverage the advantages of the two one-sided protocols. It is expected to exhibit better overall performance by being robust and superior in most of the parameter space. The impact of the new combined protocol on team performance is explored through simulation experiments in Chapter 5.

While investigating possible combinations of the two one-sided protocols, we have identified a limitation in the existing MAP protocols: an agent is not allowed to both provide and receive help at the same time. In our new combined protocol, simultaneous providing and receiving of help can potentially happen, as both offers and requests are made at the same time and an agent can potentially be involved in two different roles. Moreover, the practical impact of simultaneously providing and receiving direct help is of interest from an engineering perspective, as it enables agents with specialized roles to more flexibly share their knowledge and expertise.
An agent may be expert in a particular situation, for which it can help one of its teammates who has difficulty performing, but at the same time it is also beneficial for the team if the agent dealing with an unexpected situation can receive help from another teammate who is expert in that situation. Hence, as a prerequisite step, we first refine the existing requester-initiated and helper-initiated protocols in order to enable the agents to provide and receive help simultaneously. The immediate impacts of the refined protocols on team performance are examined through simulation experiments in Chapter 5.

3.2 The Refined Agent Team Model

In the previous studies of Action MAP (such as [Polajnar et al., 2012, Nalbandyan et al., 2013]), the cost of an action performed by an agent depends only on the particular type of action and the agent's skill profile, but not on the nature of the environment in which the agent is situated. Malek Akhlagh and Polajnar [2014] formalize the following refined agent team model, in which the cost of performing an action also depends on a component of the environment state that impacts the execution of the action. This refinement explicitly represents the impact of the environment state in the agent's action cost model. Using this refined model, the criteria for help deliberation, such as estimating the cost of a plan or an agent's individual wellbeing, are not restricted to a particular microworld and can potentially be applied to different practical application domains.

A team consists of agents A_1, A_2, ..., A_n, n > 1, that operate in an environment E by performing actions from a domain Act. The team is assigned a task T, and each A_i is given an individual subtask T_i with a budget R_i. Each instance of action a ∈ Act performed towards T_i has a cost that is charged to R_i. Let Act_E = {α_1, ..., α_m}, m > 1, be the set of all augmented actions of the form ⟨a, e⟩, where e is the component of the environment state that impacts the cost of performing a. Then the agent A_i performs α_k at a cost represented as a positive integer constant cost_ik. The vector cost_i represents A_i's skill profile with respect to the augmented actions, and the n × m matrix cost represents the individual abilities of all agents. The environment is dynamic in the sense that its state can be changed by events other than the agents' actions. Each agent maintains its own belief base through perception and communication, and acts rationally in the interest of the team. To model agent interaction protocols without explicit synchronization details, it is assumed that agents perform actions in synchronous rounds and communicate only at the start of each round, in a sequence of synchronous phases.

3.3 The Refined Requester-Initiated Action MAP

In the requester-initiated approach, agents can proactively request, but they do not offer, help. In the previous model, RIAMAP [Nalbandyan et al., 2013], agents that consider providing help can bid to requests, but an agent that has sent a request does not concurrently bid to other requests in the same round, so in any one round it can only provide or receive help. In order to allow simultaneous providing and receiving of help, we present the refined Requester-Initiated Action MAP (RIAMAP*), which removes this restriction. In the refined model, an agent is allowed to concurrently request help in one protocol session and bid to a request in another session.
In other words, the agent is able to take the roles of bidder and requester at the same time. The potential advantage of the refined model becomes manifest when an agent has sent a request to have an expensive action performed by a teammate, and simultaneously bids to a request from another agent to perform a cheaper action in its skill profile. A protocol session comprises three interaction phases, as follows. Its interaction sequence is illustrated in Fig. 3.1.

Figure 3.1: The RIAMAP* bidding sequence (concurrent sessions in which each requester broadcasts a request with its calculated team benefit, potential helpers calculate their team loss and bid if NTI > 0, and each requester selects the best bid and confirms it).

1) Help request generation: This phase of RIAMAP* is identical to the request generation of RIAMAP (Section 2.6); we restate the formal conditions here to make the description of RIAMAP* self-contained. At the start of every round, agent A_i, with next augmented action a_k at the cost cost_ik and the current calculated wellbeing W_i, broadcasts a help request message containing a_k and the calculated team benefit, Δ_i^+, if any of the following three conditions holds:

(i) R_i < cost_ik;
(ii) W_i < WLL and cost_ik > LowCostThreshold;
(iii) cost_ik > RequestThreshold;

where LowCostThreshold is the upper limit of the "cheap" action range, RequestThreshold is the lower limit of the "expensive" action range, and WLL is a fixed low watermark value for wellbeing.

2) Bidding to a request: All agents, including the ones who have sent requests, receive the request from A_i and deliberate on bidding to it. The rest of this phase is identical to its counterpart in RIAMAP (Section 2.6), restated here for completeness. An agent A_j calculates the team loss Δ_j^− for performing the requested action a_k, and the NTI using the received team benefit, Δ_i^+. If the NTI is positive, A_j sends a bid containing a_k and the associated NTI value to the requester A_i. In case of having multiple qualified requests, A_j bids to the one with the highest NTI. The potential advantage of the refined model becomes manifest when an agent has sent its request to have an expensive action performed, and also bids to requests from other agents for cheaper actions in its skill profile.

3) Confirming the chosen bid: Agent A_i receives the bids, selects the one with the highest NTI, and sends a confirmation to the selected bidder agent A_j.

3.4 The Refined Helper-Initiated Action MAP

In the helper-initiated approach, agents can proactively offer, but they do not otherwise request, help. In the previous model, HIAMAP [Nalbandyan et al., 2013], agents that consider receiving help can bid to offers, but if an agent has sent an offer, it does not concurrently bid to other offers in the same round, precluding the simultaneous providing and receiving of help. In order to remove this restriction, we present the refined Helper-Initiated Action MAP (HIAMAP*), which allows an agent who has sent a help offer to concurrently bid to offers from other agents in the same round, taking the roles of offerer and bidder at the same time; as a result, it may provide and receive help simultaneously. The potential advantage of the new model becomes manifest when the agent has offered help for some low-cost actions in its skill profile, and simultaneously bids to an offer from another agent to receive help for its next expensive action.
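Before detailing the phases, the following is a minimal sketch, not taken from the thesis or AIMS, of the per-round bookkeeping that the refined protocols imply: an agent may initiate a session as a requester or offerer and, in the same round, act as a bidder in other sessions. The RoundState name and its fields are illustrative assumptions.

# Minimal sketch (illustrative only) of per-round state allowing dual roles.

from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class RoundState:
    outgoing_request: Optional[dict] = None   # request this agent broadcast, if any
    outgoing_offer: Optional[dict] = None     # offer this agent broadcast, if any
    bids_sent: List[dict] = field(default_factory=list)      # bids to other agents' sessions
    bids_received: List[dict] = field(default_factory=list)  # bids to this agent's session

    def may_provide_and_receive(self) -> bool:
        # True when the agent is engaged on both sides in the same round,
        # which RIAMAP and HIAMAP forbade but RIAMAP* and HIAMAP* allow.
        initiated = self.outgoing_request is not None or self.outgoing_offer is not None
        return initiated and len(self.bids_sent) > 0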
3.4 The Refined Helper-Initiated Action MAP

In the helper-initiated approach, agents can proactively offer help, but they do not otherwise request it. In the previous model, HIAMAP [Nalbandyan et al., 2013], agents that consider receiving help can bid to offers, but an agent that has sent an offer does not concurrently bid to other offers in the same round, precluding the simultaneous providing and receiving of help. In order to remove this restriction, we present the refined Helper-Initiated Action MAP (HIAMAP*), which allows an agent that has sent a help offer to concurrently bid to offers from other agents in the same round, taking the roles of offerer and bidder at the same time, and possibly providing and receiving help as a result. The potential advantage of the new model becomes manifest when an agent has offered help for some low-cost actions in its skill profile, and simultaneously bids to an offer from another agent to receive help for its next expensive action.

A protocol session comprises three interaction phases, as follows. Its interaction sequence is illustrated in Fig. 3.2.

Figure 3.2: The HIAMAP* bidding sequence.

    if (number of received requests > 0)
        for (each received request)
            calculate NTI;
        if (highest NTI > 0 && has enough resources to send a bid)
            send a bid to the requester with the highest NTI;
            go to state R-Await;
        else if (has enough resources to perform next action)
            go to state R-OwnAct;
        else go to state R-Blocked;
    else if (has enough resources to perform next action)
        go to state R-OwnAct;
    else go to state R-Blocked;

Figure 3.5: A RIAMAP* sample state specification. It shows an instance of the deliberation on transitions from a given state; such state specifications are translated into executable code in the AIMS framework.

    State R-Bid2: {
        read received bids;
        if (number of received bids > 0)
            bestBid := select the bid with highest NTI;
            go to S-Confirm2;
        else go to S-Await3;
    }

Figure 3.6: A HIAMAP* sample state specification.

Chapter 4
The Bidirectionally Initiated Action MAP

This chapter presents the Bidirectionally Initiated Action MAP (BIAMAP), which combines the refined requester- and helper-initiated protocols, RIAMAP* and HIAMAP*. Section 4.1 provides an analysis of the requester- and helper-initiated behaviors over the parameter space, which highlights the motivation for combining the two approaches; Section 4.2 presents a discussion of possible combinations of the two one-sided protocols; Section 4.3 presents the design and specification of BIAMAP; and Section 4.4 provides its state-machine representation.

4.1 Requester vs. Helper-Initiated Behaviors

The design of the new combined protocol is expected to balance the requester- and helper-initiated behaviors in such a way as to maximize the advantages of the two one-sided approaches. In order to do that, we need an analysis of their behavior over the parameter space. In the sequel, starting from the observations by Nalbandyan et al. [2013], we analyze their relative performance in some critical situations with respect to environment dynamism, initial resources, and communication cost, in order to determine how their performance profiles complement each other.

Environment Dynamism

Generally, a high level of environment dynamism hampers the helper-initiated protocol significantly more than the requester-initiated protocol. When the environment state changes at a low rate, the helper-initiated protocol dominates with high initial resources. This happens because, in this situation, a typical agent follows its initially selected plan without dramatic changes of costs, and is therefore able to make offers to other agents whenever its wellbeing is higher than the offering threshold. But when the environment change rate is high, a typical agent's wellbeing is below the offer threshold, so the agents make fewer offers, resulting in fewer help acts. The requester-initiated protocol then dominates, because it can adjust its teamwork to dramatic changes by broadcasting requests, particularly when the communication cost is low.

Initial Resources

Generally, decreasing the initial resources hampers the helper-initiated protocol significantly more than the requester-initiated protocol.
When the initial resources are low, for the helper-initiated protocol, a typical agent's wellbeing is below the offering threshold and the agents cannot make offers. On the other hand, the requester-initiated protocol dominates, especially with low communication cost. This happens because, with low initial resources, typical agent wellbeing is below the requesting threshold, so agents can make requests in case of resource shortage, while the resulting increase in broadcasting activity is only mildly penalized when communication is cheap. But when initial resources are high, the helper-initiated protocol dominates, since typical agent wellbeing is above the offering threshold and agents are able to make more offers that enhance the team performance.

Communication Cost

An increasing communication cost hampers the requester-initiated protocol significantly more than the helper-initiated protocol. Hence, the helper-initiated protocol dominates with high communication cost. This is mainly because of the way in which the two protocols generate help requests and help offers: agents with declining wellbeing generate more requests in the requester-initiated protocol, but fewer offers in the helper-initiated protocol. Hence, in the requester-initiated protocol, an increased cost of broadcasting depletes the remaining resources. However, when the communication cost is low, helpful behavior can be arranged at a lower cost and the agent team can benefit from increased broadcasting activity with little penalty. Hence, the requester-initiated protocol dominates when communication cost is low.

The design of the new combined protocol seeks to optimize the balance between the requester- and helper-initiated behaviors, in order to leverage the advantages of each across the parameter space and integrate their strengths into a single protocol with superior performance. The design process involves the composition of the two individual state machines into one, and performance optimization of the new protocol through iterative adjustment of its parameter values in a series of simulation experiments. The AIMS framework has been specifically developed to support this type of incremental improvement of interaction protocol design.

4.2 Combining the One-Sided Protocols

One might envision different possible combinations of the requester- and helper-initiated protocols. One possible combined mechanism is to allow both one-sided protocols to initiate the deliberation on getting involved in helpful behavior, whether to provide or receive direct help, by broadcasting a help request and/or an offer, possibly in a combined message, in the same interaction phase. This approach might increase the quality of matches between requesters and potential helpers, because of the variety of requests and offers available to all agents; however, it can increase the complexity of deliberation by incurring the extra computation cost of finding team benefit or team loss, and it can increase the communication cost by broadcasting requests and offers in the same interaction phase without an informed pre-estimation of the need for sending a message.

An alternative approach is to impose a fixed order on the request generation and offer generation phases in the combined mechanism, with one of the two phases always preceding the other. This specified order of interaction phases can be incorporated in the agents' deliberation on whether to generate a help request or an offer.
The design in which the request generation phase precedes the offer generation phase allows agents to broadcast help requests that may be rendered unnecessary by subsequent offers. When resources are low, such requests become frequent, and the unnecessary communication expenditures become prohibitive. The rationale for the alternative design, in which the help offer generation comes first, is to let the agents first consider what they are able to offer, and then deliberate on making requests based on what offers they have received. This approach avoids unnecessary help request generation; we adopt it as the basis for our design of the combined protocol.

4.3 The Bidirectionally Initiated Action MAP

The Bidirectionally Initiated Action MAP (BIAMAP) incorporates the proactive requesting and proactive offering of direct help in a single protocol, by recognizing that the deliberation on help can potentially be initiated from both the helper and requester sides in a combined mechanism design. In BIAMAP, an agent is allowed to make an offer and/or a request in their respective interaction phases. In other words, the agent can potentially take the roles of offerer and requester in the same protocol session. It can then bid to an offer and/or a request, based on whether it has made an offer and/or a request.

A protocol session comprises four interaction phases, one more than in the requester-initiated and helper-initiated protocols. For instance, an agent needing help can either bid to received help offers or broadcast a help request; in the current design, the agent should do the latter only if suitable offers are not available. The interaction phases are described as follows.

1) Help offer generation: At the start of every round, agent Ai deliberates on offering helpful behavior to its teammates. In case it decides to provide action help, it broadcasts its offer.

2) Help request generation: Having received offers from teammates, Ai deliberates on whether it needs help for its next action. In case it determines its need, it processes the offers by calculating the NTI value for those offers that match its next action. If its next action matches any offer for which the NTI value is positive, it does not send a help request. Otherwise, in case there is no suitable offer, it broadcasts a help request.

3) Bidding to requests and/or offers: Once Ai has received all offers and requests from its teammates, four different situations arise, depending on whether Ai has sent a help offer and/or a help request, as described in the following.

Case 1. Ai has not sent any help offer or request. In this case, it considers bidding to both the received offers and requests. It deliberates and decides whether to bid to an offer, a request, or both. The protocol interaction sequence for this situation is illustrated in Fig. 4.1.

Figure 4.4: BIAMAP case 4: Ai has sent an offer and a request.

4) Confirming the chosen bids: In this phase, the agent Ai, who has sent a help offer or a request, receives the possible bids to its offer or request.
In each case, it selects the bid with the highest NTI and sends a confirmation to the selected bidder agent. In the case that Ai has sent both an offer and a request, it may receive bids for both, and hence it may send confirmations to two selected bidders.

4.4 The State-Machine Representation of BIAMAP

We represent BIAMAP using the state-machine formalism illustrated in Fig. 4.5, supported by full specifications of states together with the respective transition rules. At the end of every protocol session, each agent Ai is in one of the final states introduced in Section 3.5, which determines the agent's behavior in the current round: Blocked, OwnAct, GetHelp, HelpAct, or Help&GetHelp.

As an illustration, consider the behavior of agent Ai in BIAMAP case 3 (Fig. 4.3). Initially, the agent is in state S-Offer. If it decides to broadcast an offer, it transits to state R-Offer2, in which it receives offers from other agents. Then it transits to S-Request2, where it deliberates on sending a request based on the offers it may have received. If it decides not to make a request, it transits to R-Await1, in which it ignores requests from other agents. Next, it transits to S-Bid3, where it only deliberates on bidding to the offers. If the deliberation results in bidding to an offer, the agent transits to R-Bid4, in which it receives bids from other agents for its own offer. After that, it transits to S-Confirm4 and, if it has received any bids, sends a confirmation to the selected bidder. In the case that a confirmation was sent, it transits to R-Confirm5, where it may receive a confirmation for its own bid to another agent's offer. If it does receive the confirmation, it transits to S-Act4. Finally, it transits to R-Help&GetHelp, which results in providing and receiving help simultaneously.

A sample state specification is presented in Fig. 4.6. The state specifications are translated into executable code in the AIMS framework.

        ...
        go to state R-Request1;
        else if (has enough resources to broadcast)
            broadcast request;
            go to state R-Request2;
        else go to state R-Blocked;
        else go to state R-Request1;

Figure 4.6: A BIAMAP sample state specification.
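The distinctive step of BIAMAP is the offer-first deliberation of Section 4.3: an agent that needs help first checks the offers it has received, and broadcasts a request only if none of them is suitable. A possible shape of that decision is sketched below; the message structure and function names are illustrative assumptions rather than the AIMS implementation.

    from collections import namedtuple

    # Hypothetical offer message carrying the sender and the action it covers.
    Offer = namedtuple("Offer", ["sender", "action"])

    def deliberate_request(needs_help, next_action, received_offers, nti):
        """Phase 2 of BIAMAP: broadcast a help request only if help is needed and
        no received offer matches the next action with a positive net team impact."""
        if not needs_help:
            return False
        suitable = [o for o in received_offers
                    if o.action == next_action and nti(o) > 0]
        return not suitable

    # A matching offer with positive NTI suppresses the request; otherwise the agent requests.
    offers = [Offer("A2", "move-blue")]
    assert deliberate_request(True, "move-blue", offers, nti=lambda o: 120) is False
    assert deliberate_request(True, "move-red", offers, nti=lambda o: 120) is True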
Chapter 5
Evaluation

This chapter presents the simulation models and performance results for agent teams that employ our new interaction protocols, in comparison with teams that employ previously existing versions of MAP. Using the AIMS framework, we compare the performance of teams through their concurrent simulations in identical dynamic environments, for different values of the parameters representing environment dynamism, agent resources, and communication costs. The simulation models are based on the same board game microworld (Subsection 5.1.1), the same model of Action MAP (Subsection 5.1.2), and the same parameter settings (Subsection 5.2.1) that were used in previous studies of MAP [Nalbandyan et al., 2013]. In a preliminary round of experiments (Subsection 5.2.2), we optimize the values of the wellbeing thresholds (watermarks) that are subsequently used in the deliberation on making a help request or offer. In the next round of simulation experiments (Subsection 5.2.3), we explore the team performance impact of the refinements in the requester- and helper-initiated protocols, i.e., the impact of allowing agents to simultaneously provide and receive help. We examine whether the refined one-sided protocols, RIAMAP* and HIAMAP*, outperform their previously existing counterparts, RIAMAP and HIAMAP, respectively. The next round of experiments (Subsection 5.2.4) demonstrates the complementary behavior of RIAMAP* and HIAMAP*, analogous to the previously observed complementarity of RIAMAP and HIAMAP. In the final round of experiments (Subsection 5.2.5), we explore the team performance impact of combining the requester- and helper-initiated approaches into a single protocol. Using the same parameter settings as in the previous round, we examine our new combined protocol, BIAMAP, in comparison to RIAMAP* and HIAMAP*. The observed behavior and performance confirm the design expectations of exploiting the complementarity and leveraging the strengths of the two constituent protocols.

5.1 The Simulation Models

We study agent interaction protocols for direct help in the context of a board game microworld, inspired by the Colored Trails game [Gal et al., 2010] and introduced in [Polajnar et al., 2012]. Its demonstrated suitability for the analysis of the impact of help protocols on team performance in several publications on MAP makes it fully adequate for our comparative studies of new and existing MAP variations in this research. We use the AIMS framework, introduced in Alemi et al. [2014], which allows concurrent simulation of multiple teams in identical dynamic environments. It facilitates the representation of agent interaction protocols and the modeling of environment dynamism. We use it to compare the performance of teams that employ different help protocols, but are otherwise identically designed and pursue identical tasks in identical dynamic environments.

5.1.1 The Microworld

The players in the game are software agents. The board is a rectangle divided into squares of different colors from a color set S = {S1, ..., Sm}, as shown in Fig. 5.1. The color of a square represents the component of the environment state that impacts the cost of performing the agent's action of moving to that square. At the start of the game, the board color setting is uniformly random, and each agent Ai is randomly assigned an initial location Li and a goal Gi on the board. Agents are able to observe the entire board. Ai moves toward Gi, which represents Ai working on its assigned subtask Ti. The resource budget Ri allocated for Ti is defined as Ri = li · a', where li is the shortest distance (i.e., the number of squares) from Li to Gi, and a' is a positive integer constant called the initial resources coefficient.

The game proceeds in synchronous rounds. In every round, Ai can move to a neighboring square, which represents performing an action. Agents are allowed to be on the same square at the same time. The cost of making a move on the board in any direction depends only on the color of the square to which the agent moves and on the agent that performs the move, but not on the direction. This makes the color alone correspond to the augmented action of Section 3.2. Ai's individual skill profile is represented as a vector costi with positive integer values: when Ai moves to a square with color Sk, it pays costik from its resource budget Ri. The cost vectors of all agents are included in an n x m positive integer matrix cost. Ai may be blocked by a shortage of the resources required for its next move. The game terminates when no one can make a move, because of either being blocked or having reached the goal.
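As a small illustration of these mechanics, the sketch below allocates the initial budget Ri = li · a' and charges a move by the color of the destination square; the function names and the blocking check are illustrative only.

    def initial_budget(shortest_distance: int, alpha_prime: int) -> int:
        """Ri = li * a': budget proportional to the length of the shortest path."""
        return shortest_distance * alpha_prime

    def charge_move(budget: int, color: str, cost_vector: dict) -> int:
        """Pay the color-dependent cost of moving onto a square; an agent that
        cannot afford its next move is blocked."""
        cost = cost_vector[color]
        if budget < cost:
            raise RuntimeError("agent is blocked: insufficient resources")
        return budget - cost

    # Example with a' = 160 and a shortest path of 10 squares.
    budget = initial_budget(10, 160)                                  # 1600
    budget = charge_move(budget, "blue", {"blue": 350, "red": 10})    # 1250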
The team performance is represented by the team score, which is the sum of all individual scores calculated at the end of the game as follows. If Ai has reached its goal, its individual score is the goal achievement reward; otherwise, its score reflects how far Ai has progressed toward the goal, weighted by a positive proximity bias coefficient.

Figure 5.1: The board game microworld. Each agent Ai has its individual vector costi, indexed by the color set. At the start of the game, Ai plans its path by estimating the least-cost path among the shortest paths from its initial location Li to its individual goal Gi. The board colors change dynamically, affecting the costs of the selected paths. [Adapted from [Nalbandyan et al., 2013].]

The difference between team benefit and team loss is called the net team impact (NTI). The help transaction occurs if the NTI, Δ = Δ+ - Δ-, is positive.

Our final comment regarding this simulation model concerns its treatment of observability. Our agents observe the board for two purposes. The first one is planning, and it only requires that the agent see the rectangle determined by its initial position and its goal position. The second purpose is the agent's perception of environment dynamism, currently represented by the disturbance D. In this thesis, we assume that each agent can see the entire board and can thus assess the disturbance with maximum accuracy. In general, MAP and its variations could be studied in conjunction with various observability models.

5.2 Performance Comparisons

5.2.1 The Parameter Settings

The game board has the size 10 x 10, with six possible colors. Each agent's cost vector includes three entries randomly selected from an 'expensive' range, {250, 300, 350, 500}, and three entries from a 'cheap' range, {10, 40, 100, 150}. Hence, each agent's skill profile is specialized for certain colors. Each team includes eight agents. The initial subtask assignment process is random. The goal achievement reward is 2000 points. The cell reward is 100 points. The help overhead, h, is 20 points. The RequestThreshold and LowCostThreshold, used in the request generation process, are 351 and 50, respectively. The OfferThreshold, used in the offer generation process, is 299. In our experiments, we vary the level of disturbance in the dynamic environment, D; the initial resources for each step of the path, a'; and the communication cost of sending a unicast message, U. The final team scores are averaged over 10,000 simulation runs, using random initial board settings.

Our simulation experiments use the same parameter space as previous studies of MAP [Nalbandyan, 2011, Polajnar et al., 2012, Nalbandyan et al., 2013]. The ranges of the parameters representing environment dynamism, agent resources, and communication cost are limited to a certain degree, as follows. The disturbance level, D, does not exceed 0.5, since with a greater level of unpredictability in the environment, rational planning and deliberation are no longer meaningful. The initial resources coefficient, a', varies in a range that provides the agents with moderate amounts of resources, as too severe a shortage discourages providing help and overabundance eliminates the need for receiving help. The unicast cost, U, is relatively low compared to action costs, as with very high communication costs, unilateral approaches have advantages over the bilateral approach in MAP [Nalbandyan, 2011, Polajnar et al., 2012].
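The settings above can be collected into a single configuration record; the following sketch is only illustrative, with field names that are not taken from the AIMS framework.

    # Illustrative configuration mirroring Subsection 5.2.1.
    SIMULATION_CONFIG = {
        "board_size": (10, 10),
        "num_colors": 6,
        "team_size": 8,
        "expensive_costs": [250, 300, 350, 500],   # three entries drawn per agent
        "cheap_costs": [10, 40, 100, 150],         # three entries drawn per agent
        "goal_reward": 2000,
        "cell_reward": 100,
        "help_overhead": 20,
        "request_threshold": 351,
        "low_cost_threshold": 50,
        "offer_threshold": 299,
        "num_runs": 10000,
        "disturbance_range": (0.0, 0.5),           # D is varied within this range
        # The initial resources coefficient a' and the unicast cost U are also varied.
    }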
5.2.2 Optimizing the Watermarks

An agent's individual wellbeing is incorporated in its deliberation on whether it can offer direct help or needs to receive help, and hence whether to broadcast an offer or a request to its teammates. The wellbeing threshold values, or watermarks, used in both the requester- and helper-initiated approaches, need to be optimized in order to improve the agent's interactive behavior. In this section, we present simulation results for individually optimizing WLL and WHH for the refined requester-initiated protocol (RIAMAP*) and the refined helper-initiated protocol (HIAMAP*), respectively, and then collectively for the bidirectionally initiated protocol (BIAMAP). In the following experiments, the other parameters are kept constant at selected mid-range values in the parameter space: the level of disturbance, D, is 0.2; the initial resources for each step, a', is 160; and the communication cost of sending a message, U, is 9.

Figure 5.2: Team scores vs. WLL.

Fig. 5.2 shows the team performance of employing RIAMAP* while varying the low watermark value for proactive requesting of help, WLL. As can be seen, the team performance is optimal when WLL is close to -0.2. The general shape of this graph and the location of its maximum do not change significantly when we vary the values of disturbance, initial resources, and communication cost. Hence, we adopt the location of the maximum in this graph as the experimentally optimized value of WLL for RIAMAP* in the rest of our simulation experiments.

Figure 5.3: Team scores vs. WHH.

In a similar experiment, we observe the team performance of employing HIAMAP* while varying the high watermark value for proactive offering of help, WHH. Fig. 5.3 shows that the experimentally optimal value of WHH is close to 0.4. Again, the location of this optimum is not very sensitive to variations in the disturbance, initial resources, and communication cost values. Hence, we adopt WHH = 0.4 for HIAMAP* in the rest of our simulation experiments.

Figure 5.4: Team scores vs. WLL and WHH.

Fig. 5.4 presents the performance of a team employing BIAMAP, which incorporates both requester- and helper-sided deliberation and uses both the low and high watermarks. Hence, we observe its team performance while varying both the WLL and WHH parameters. It can be seen that the BIAMAP team score depends strongly on these wellbeing thresholds, and one needs to find their optimal values. In this particular situation, the maximum team score of 14547 is achieved when WLL and WHH are -0.8 and 0.6, respectively. We henceforth adopt the optimized parameter values WLL = -0.8 and WHH = 0.6, along with the settings of the other parameters used in the optimization process, namely D = 0.2, a' = 160, and U = 9, as the standard set of values for BIAMAP experiments. In every simulation experiment for evaluating BIAMAP, the parameters that are kept constant have these standard values.
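The watermark optimization just described amounts to a grid search over the two thresholds. A possible shape of such a sweep is sketched below; run_simulation is a hypothetical stand-in for a batch of AIMS runs that returns an average team score.

    import itertools

    def optimize_watermarks(run_simulation, w_ll_values, w_hh_values):
        """Grid search for the (WLL, WHH) pair that maximizes the average team score."""
        best_pair, best_score = None, float("-inf")
        for w_ll, w_hh in itertools.product(w_ll_values, w_hh_values):
            score = run_simulation(w_ll=w_ll, w_hh=w_hh)   # e.g., average of 10,000 runs
            if score > best_score:
                best_pair, best_score = (w_ll, w_hh), score
        return best_pair, best_score

    # Illustrative ranges; the optima reported above are WLL = -0.8 and WHH = 0.6 for BIAMAP.
    # best, score = optimize_watermarks(run_simulation,
    #                                   w_ll_values=[-1.0, -0.8, -0.6, -0.4, -0.2, 0.0],
    #                                   w_hh_values=[0.0, 0.2, 0.4, 0.6, 0.8])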
5.2.3 The Team Performance Impact of RIAMAP* and HIAMAP*

In the following simulation experiments, we observe the team performance impact of the refined one-sided protocols that enable agents to provide and receive direct help simultaneously, in both the requester- and helper-initiated approaches. We compare four teams that employ different interaction protocols: RIAMAP, RIAMAP*, HIAMAP, and HIAMAP*. The teams are otherwise identical and operate in identical environments. We explore the relative team performance in three experiments, in each of which we vary one of the parameters: the level of disturbance, D, the initial resources coefficient, a', or the unicast cost, U.

Figure 5.5: Team scores vs. Disturbance.

Fig. 5.5 shows the comparative team scores for varying levels of disturbance, D. The other parameter values are set as a' = 200 and U = 21. The relatively high values of initial resources and communication cost favor the helper-initiated approach, HIAMAP*, when the disturbance is low. However, with high disturbance, HIAMAP* degrades significantly and falls below RIAMAP*. One can note that each new protocol (RIAMAP* or HIAMAP*) outperforms the previous version (RIAMAP or HIAMAP, respectively) in situations where the respective one-sided behavior is more effective, as discussed in Section 4.1. For example, it can be observed that RI and RI* achieve the same team score when there is no disturbance; but RI degrades more as disturbance increases, and RI* prevails at high disturbance. On the other hand, HI* outperforms HI when disturbance is low, but achieves almost the same result when disturbance is high. As discussed before, this occurs because high disturbance hampers the helper-initiated protocols significantly more than the requester-initiated ones. The picture also illustrates the complementary performance profiles of the requester- and helper-initiated approaches at the two ends of the disturbance range.

In the second simulation experiment, we observe the comparative team scores of RIAMAP, RIAMAP*, HIAMAP, and HIAMAP* for varying unicast cost, U. The other parameter values are set as D = 0.2 and a' = 160. We extend the communication cost up to 30, in order to explore its impact on the different protocols. Fig. 5.6 shows the simulation results.

Figure 5.6: Team scores vs. Communication Cost.

As can be seen, the requester-initiated protocols, RIAMAP and RIAMAP*, outperform the helper-initiated protocols, HIAMAP and HIAMAP*, at low U values, and then degrade significantly more at high U values. The team performance impact of simultaneous providing and receiving of help becomes manifest only when the requester- or helper-initiated protocols are more effective, i.e., when they have an advantage over the other one-sided protocols in particular situations. In this case, when communication cost is low, the requester-initiated approach is more effective, and hence there is a team performance gain from employing RIAMAP* over RIAMAP. On the other side, when communication cost is high, the helper-initiated approach has an advantage over the requester-initiated one, and thus we observe some team performance gain from employing HIAMAP* over HIAMAP. Again, the complementary behaviors of the two approaches are visible in the results.

Fig. 5.7 presents the simulation results of the third experiment, in which we observe the behaviors of RIAMAP, RIAMAP*, HIAMAP, and HIAMAP* by varying the initial resources coefficient, a'. We set D to a relatively high value of 0.3, and U to a relatively low value of 3. In this situation, the requester-initiated approach has an advantage over the helper-initiated one, as high disturbance hampers it less significantly and it is more effective when communication cost is low.

Figure 5.7: Team scores vs. Initial Resources.
Accordingly, it can be seen that there is a significant team performance gain from employing RIAMAP* compared to RIAMAP, i.e., from allowing simultaneous providing and receiving of help in the requester-initiated approach. However, as the initial resources rise, the impact of this feature decreases, since the requester-initiated approach becomes less effective: there is less need for making a request, and a typical agent may decide to perform its action on its own rather than receive help from a teammate. Furthermore, with more initial resources, the helper-initiated approach achieves better results, and all the protocols achieve almost the same team score when a' = 200. Also, the team performance impact of employing HIAMAP* over HIAMAP is not manifest, because of the high level of disturbance and low communication cost, which make the helper-initiated approach less effective.

5.2.4 Complementary Profiles of RIAMAP* and HIAMAP*

In this section, we explore the complementary performance profiles of the refined requester- and helper-initiated protocols (RIAMAP* and HIAMAP*) across the parameter space. We present three simulation experiments in order to compare the relative performance of two agent teams employing different help protocols, RIAMAP* and HIAMAP*, but otherwise identical and operating in identical environments. In each experiment, we vary two of the following three parameters: the level of disturbance, D, the initial resources coefficient, a', and the unicast cost, U; the third parameter is kept constant at the standard value selected in Section 5.2.2.

The experiments are designed to examine the requester- and helper-initiated behavior hypothesis discussed in Section 4.1. The behavior analysis investigates the efficiency of each one-sided approach in some critical situations with respect to environment dynamism, initial resources, and communication cost. The reasoning is centered on the agent's individual wellbeing, which impacts its deliberation on direct help. Generally, in a situation which results in a low value of individual wellbeing, the requester-initiated approach is more effective, while with a high value of individual wellbeing, the helper-initiated approach is more effective.

In the first experiment, we vary the level of disturbance, D, and the initial resources coefficient, a', while the unicast cost, U, is set to 9. Fig. 5.8 presents the comparative team scores achieved by RIAMAP* and HIAMAP*.

Figure 5.8: Team scores vs. Disturbance and Initial Resources.

It can be observed that the requester-initiated protocol, RIAMAP*, dominates when disturbance is high. This happens because the individual wellbeing of agents declines as disturbance grows, which hampers the proactive offering of help but encourages the proactive requesting of help (Section 4.1). For lower disturbance, RIAMAP* still dominates when initial resources are low to moderate, because HIAMAP* is less effective in this situation. However, HIAMAP* dominates when the disturbance level is low and initial resources are highly available, i.e., the situation in which the helper-initiated approach is highly effective.

Fig. 5.9 presents the simulation results of the second experiment, in which we vary the level of disturbance, D, and the unicast cost, U, while the initial resources coefficient, a', is set to 160.

Figure 5.9: Team scores vs. Disturbance and Communication Cost.
The comparative team scores show that the requester-initiated protocol, RIAMAP*, dominates when communication cost is low and also when disturbance is high. This is because the requester-initiated approach is more effective with low communication cost, and also because a high disturbance level has a more negative impact on the helper-initiated approach. On the other side, the helper-initiated protocol, HIAMAP*, dominates when disturbance is low and communication cost is high, as high communication cost has a more negative impact on the requester-initiated approach.

Figure 5.10: Team scores vs. Communication Cost and Initial Resources.

In the third experiment, we vary the unicast cost, U, and the initial resources coefficient, a', while the disturbance level, D, is set to 0.2. Fig. 5.10 presents the simulation results and again confirms the complementary performance profiles of the requester- and helper-initiated approaches. As can be seen, the requester-initiated protocol, RIAMAP*, dominates when communication cost is low and initial resources are low. On the other hand, the helper-initiated protocol, HIAMAP*, dominates when communication cost is high and initial resources are high. This is again in agreement with our analysis of the requester- and helper-initiated behavior in critical situations.

The simulation results in this section confirm our previous analysis of the relative strengths and weaknesses of each one-sided approach, and identify the critical situations where one of them dominates significantly. This analysis of their complementary performance profiles is incorporated in the BIAMAP design strategy, in which we leverage the strengths of the two different approaches in a combined protocol.

5.2.5 The Team Performance Impact of BIAMAP

In the following, we present three simulation experiments in order to examine the new combined protocol, the Bidirectionally Initiated Action MAP (BIAMAP), which is a composition of RIAMAP* and HIAMAP*. We compare three agent teams employing RIAMAP*, HIAMAP*, and BIAMAP, but otherwise identical and operating in identical environments. The simulation results show the impact of combining the requester- and helper-initiated approaches on the team performance. In each experiment, we vary two of the following three parameters: the level of disturbance, D, the initial resources coefficient, a', and the unicast cost, U; the third parameter is kept constant at the standard value selected in Section 5.2.2.

In the first simulation experiment, we explore the relative team performance for varying disturbance level, D, and initial resources coefficient, a', while the unicast cost, U, is set to 9. Fig. 5.11, which is equivalent to Fig. 5.8 with the addition of BIAMAP, shows the comparative team scores for the three teams.

Figure 5.11: Team scores vs. Disturbance and Initial Resources.

The immediate observation is that the team which employs BIAMAP outperforms the other two teams in most of the parameter space. This suggests the superiority of the combined protocol, BIAMAP, that allows initiative from both the helper and requester sides. However, in the corner which corresponds to low disturbance and high initial resources, HIAMAP* still outperforms BIAMAP.
This is because the helper-initiated approach is very effective in this situation, while the requester-initiated component of BIAMAP is not, and HIAMAP* prevails by being a simple one-sided protocol with no cost of unproductive requester-initiated behavior.

Figure 5.12: Team scores vs. Disturbance and Communication Cost.

Fig. 5.12, which is equivalent to Fig. 5.9 with the addition of BIAMAP, presents the simulation results of the second experiment, in which we vary the level of disturbance, D, together with the unicast cost, U, while the initial resources coefficient, a', is set to 160. The comparative team scores show that BIAMAP outperforms the two one-sided protocols in most of the parameter space. The exception arises when the communication cost is low, especially in the corner which corresponds to a high disturbance level. This is because the requester-initiated approach is very effective in this situation, while the helper-initiated component of BIAMAP is not, and RIAMAP* prevails by being a simple one-sided protocol with no cost of helper-initiated behavior.

Figure 5.13: Team scores vs. Communication Cost and Initial Resources.

In the third experiment, we vary the unicast cost, U, and the initial resources coefficient, a', while the disturbance level, D, is set to 0.2. Fig. 5.13, which is equivalent to Fig. 5.10 with the addition of BIAMAP, presents the simulation results and again shows the superiority of the combined protocol, BIAMAP, that allows initiative from both the helper and requester sides. BIAMAP outperforms the two one-sided protocols in most of the parameter space, except when the communication cost is low, especially in the corner which corresponds to low initial resources. This is again because the requester-initiated approach is very effective in this situation, while the helper-initiated component of BIAMAP is not, but still incurs the extra cost of helper-initiated behavior.

The simulation experiments presented in this section confirm that the overall performance of BIAMAP is superior to that of RIAMAP* and HIAMAP*. It outperforms both one-sided protocols in most of the parameter space, except in the critical situations where one of the one-sided approaches becomes unproductive, but BIAMAP still incurs the overhead costs of incorporating that approach. By optimizing the watermark parameters in Section 5.2.2, we have better tuned the balance between the proactive requesting and proactive offering behaviors, and have better leveraged the advantages of the two approaches in different situations. The experimental results in this chapter are slightly better than the ones in [Malek Akhlagh and Polajnar, 2014], because of the watermark optimizations.

Chapter 6
Conclusions and Future Work

This thesis presents a new agent interaction protocol, called the Bidirectionally Initiated Action MAP (BIAMAP), for incorporating direct help into agent teamwork. The new protocol is a new version of the Mutual Assistance Protocol (MAP), and inherits its principles and design features, such as its agent team model, the local planning autonomy (LPA), and the bilateral distributed agreement (BDA). The novel features of BIAMAP allow an agent to initiate deliberation on direct help by either a request or an offer, and also to simultaneously provide and receive help.
The impact of BIAMAP on team performance is investigated through simulation experiments with the same simulation models, microworld, and parameter space used in the previous studies of MAP. The experimental results suggest the superiority of BIAMAP compared to existing MAP variations.

Building on previous research on help protocols, we have analyzed two different approaches to initiating deliberation on direct help, including their deliberation patterns and state-machine representations. In the first approach, an agent can proactively request help from its teammates; in the second, it can proactively offer help. We have first refined the bidding patterns of the Requester-Initiated Action MAP (RIAMAP) and the Helper-Initiated Action MAP (HIAMAP) by recognizing the possibility that the same agent can both provide and receive direct help simultaneously, which has resulted in their refined versions, RIAMAP* and HIAMAP*. Then, through an analysis of the relative strengths and weaknesses of the requester- and helper-initiated behaviors, and by recognizing the possibility that a single protocol can incorporate both proactive requesting and proactive offering of help, we have designed the new combined protocol, BIAMAP, which is a composition of RIAMAP* and HIAMAP* and leverages the advantages of the two one-sided approaches in different situations.

In our simulation models, we have represented the new interaction protocols in the particular state-machine formalism used in the Agent Interaction Modeling Simulator (AIMS) framework. It includes full specifications of states together with the respective transition rules, in a form that is easily translated into executable code in AIMS. We have investigated the impacts of employing RIAMAP*, HIAMAP*, and BIAMAP on team performance through simulation experiments in AIMS, by varying the levels of the parameters representing environment dynamism, agent resources, and communication costs. The simulation results demonstrate the team performance gains for agent teams that employ the new protocols. RIAMAP* and HIAMAP* outperform the original one-sided protocols, RIAMAP and HIAMAP, respectively. BIAMAP outperforms all other MAP versions in most of the studied parameter space. This indicates that direct help in agent teamwork is most effective when it can be both offered and requested within the same protocol, and also simultaneously provided and received by the same agent.

In the current version of BIAMAP, in the deliberation on whether to perform the help transaction, an agent bids to an offer or a request only when the net team impact (NTI) is positive, i.e., when the team benefit within the receiver's subtask exceeds the team loss within the provider's subtask. Further studies could incorporate the complementary saving feature introduced in the "frugal" version of RIAMAP, in which the help transaction may also occur when the NTI is equal to zero but the help act saves resources at the team level.

The BIAMAP performance is superior in most of the parameter space, except in the critical situations when one of the one-sided approaches becomes unproductive while BIAMAP still incurs the overhead costs of incorporating that approach. This motivates us to introduce additional metrics into BIAMAP that may allow finer balancing of the one-sided behaviors in such critical situations.
The possible design strategy requires an analysis of the potential benefits and costs of maintaining such metrics, and verification of whether the benefits outweigh the costs.

For future work, we can also investigate the team performance impact of incorporating the Resource MAP, in which one or multiple potential helpers provide part or all of the total resources required for performing a teammate's action, and compare it with the impact of Action MAP. We can further specify a combined approach which incorporates both the Action and Resource MAP variations, and investigate its potential impact on team performance in different scenarios.

We have investigated our help protocols in the context of a board game microworld, supported in the AIMS framework. In this study, we have simplified our modeling of the agent team, the subtasks, and the environment, in order to investigate the behavior of direct help in relatively pure form, without the impact of extraneous factors that may vary greatly across practical systems. When adapting our protocols to real-world engineering applications, one may need to analyze the design requirements with respect to the observability of the environment, asynchronicity of communications and actions, scalability issues of increasing the number of agents or the size of the environment, time and deadlines for completing subtasks, etc. However, we expect to investigate these problems mainly in the context of specific classes of concrete applications.

Bibliography

Omid Alemi, Desanka Polajnar, Jernej Polajnar, and Denish Mumbaiwala. A simulation framework for design-oriented studies of interaction models in agent teamwork. In Proceedings of the 2014 Agent-Directed Simulation Symposium (ADS'14), pages 28-35. Society for Computer Simulation International, April 2014.

Raphen Becker, Shlomo Zilberstein, Victor Lesser, and Claudia V. Goldman. Solving transition independent decentralized Markov decision processes. Journal of Artificial Intelligence Research, 22:423-455, 2004.

Daniel S. Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27:819-840, 2002.

Daniel S. Bernstein, Eric A. Hansen, and Shlomo Zilberstein. Bounded policy iteration for decentralized POMDPs. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI'05), pages 1287-1292, 2005.

Patrick Blackburn, Maarten de Rijke, and Yde Venema. Modal Logic. Cambridge University Press, New York, NY, USA, 2001.

Michael Bratman. Intention, Plans, and Practical Reason. Harvard University Press, 1987.

Michael E. Bratman, David J. Israel, and Martha E. Pollack. Plans and resource-bounded practical reasoning. Computational Intelligence, 4:349-355, 1988.

Sen Cao, Richard A. Volz, Thomas R. Ioerger, and Michael S. Miller. On proactive helping behaviors in teamwork. In Proceedings of the International Conference on Artificial Intelligence, 2005.

Philip R. Cohen and Hector J. Levesque. Intention is choice with commitment. Artificial Intelligence, pages 213-261, 1990.

Philip R. Cohen and Hector J. Levesque. Teamwork. Special Issue on Cognitive Science and Artificial Intelligence, pages 487-512, 1991.

Behrooz Dalvandi. A model of empathy for artificial agent teamwork. Master's thesis, University of Northern British Columbia, Canada, 2012.

Barbara Maria Dunin-Keplicz and Rineke Verbrugge. Teamwork in Multi-Agent Systems: A Formal Approach. Wiley, 2010.
Hywel Dunn-Davies, Jim Cunningham, and Shamimabi Paurobally. Propositional statecharts for agent interaction protocols. Electronic Notes in Theoretical Computer Science, 134:55-75, 2005.

Xiaocong Fan, John Yen, and Richard A. Volz. A theoretical framework on proactive information exchange in agent teamwork. Artificial Intelligence, 169:23-97, 2005.

Foundation for Intelligent Physical Agents. FIPA specification part 2 - agent communication language. Available at: www.fipa.org, 1997.

Ya'akov Gal, Barbara Grosz, Sarit Kraus, Avi Pfeffer, and Stuart Shieber. Agent decision-making in open mixed networks. Artificial Intelligence, 174(18):1460-1480, 2010.

Michael Georgeff and Amy Lansky. Reactive reasoning and planning. In Proceedings of the 9th National Conference on Artificial Intelligence, pages 677-682, 1987.

Michael P. Georgeff and Anand S. Rao. The semantics of intention maintenance for rational agents. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI'95), Montreal, Canada, August 1995.

Michael P. Georgeff and Anand S. Rao. A profile of the Australian Artificial Intelligence Institute. IEEE Expert: Intelligent Systems and Their Applications, 11(6):89-92, December 1996.

Barbara J. Grosz and Sarit Kraus. Collaborative plans for complex group action. Artificial Intelligence, 86:269-357, 1996.

George Hughes and Maxwell John Cresswell. A New Introduction to Modal Logic. Routledge, 1996.

Hideshi Itoh. Incentives to help in multi-agent situations. Econometrica, 59(3), 1991.

Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99-134, 1998.

Ece Kamar and Barbara J. Grosz. Applying MDP approaches for estimating outcome of interaction in collaborative human-computer settings. In Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM) Workshop, Honolulu, Hawaii, May 2007.

Ece Kamar, Ya'akov Gal, and Barbara J. Grosz. Incorporating helpful behavior into collaborative planning. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'09), volume 2, pages 875-882, 2009.

Jeffrey A. LePine, Mary Ann Hanson, Walter C. Borman, and Stephan J. Motowidlo. Contextual performance and teamwork: Implications for staffing. Research in Personnel and Human Resources Management, 19:53-90, 2000.

Hector Levesque, Philip Cohen, and Jose Nunes. On acting together. In Proceedings of the 8th National Conference on Artificial Intelligence, pages 94-99, 1990.

Magnus Ljungberg and Andrew Lucas. The OASIS air-traffic management system. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Seoul, Korea, 1992.

Roger Mailler and Victor Lesser. Solving distributed constraint optimization problems using cooperative mediation. In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'04), volume 1, pages 438-445, 2004.

Mojtaba Malek Akhlagh and Jernej Polajnar. Distributed deliberation on direct help in agent teamwork. In Proceedings of the 12th European Conference on Multi-Agent Systems (EUMAS 2014), Prague, Czech Republic, December 2014.

James Mayfield, Yannis K. Labrou, and Tim Finin. Evaluation of KQML as an agent communication language. In Intelligent Agents II, volume 1037, pages 347-360. Springer-Verlag, 1996.

Maria Miceli, Amedeo Cesta, and Paola Rizzo. Autonomous help in distributed work environments.
In Proceedings of the 7th European Conference on Cognitive Ergonomics, pages 367-377, 1994.

Pragnesh Jay Modi, Wei-Min Shen, Milind Tambe, and Makoto Yokoo. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161:149-180, 2005.

Ranjit Nair, David Pynadath, Makoto Yokoo, Milind Tambe, and Stacy Marsella. Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI'03), pages 705-711, 2003.

Ranjit Nair, Pradeep Varakantham, Milind Tambe, and Makoto Yokoo. Networked distributed POMDPs: A synthesis of distributed constraint optimization and POMDPs. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI'05), volume 1, pages 133-139, Pittsburgh, PA, July 2005.

Narek Nalbandyan. A mutual assistance protocol for agent teamwork. Master's thesis, University of Northern British Columbia, Canada, 2011.

Narek Nalbandyan, Jernej Polajnar, Denish Mumbaiwala, Desanka Polajnar, and Omid Alemi. Requester vs. helper-initiated protocols for mutual assistance in agent teamwork. In Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics (SMC'13), pages 2741-2746, Manchester, UK, October 2013.

Shamimabi Paurobally, Jim Cunningham, and Nicholas R. Jennings. Verifying the contract net protocol: A case study in interaction protocol and agent communication semantics. In Proceedings of the 2nd International Workshop on Logic and Communication in Multi-Agent Systems, pages 98-117, 2004.

Jernej Polajnar, Behrooz Dalvandi, and Desanka Polajnar. Does empathy between artificial agents improve agent teamwork? In Proceedings of the 10th IEEE International Conference on Cognitive Informatics and Cognitive Computing, Banff, Alberta, Canada, 2011.

Jernej Polajnar, Narek Nalbandyan, Omid Alemi, and Desanka Polajnar. An interaction protocol for mutual assistance in agent teamwork. In Proceedings of the 6th International Conference on Complex, Interactive, and Software-Intensive Systems (CISIS 2012), pages 6-11, Palermo, Italy, July 2012.

Jernej Polajnar, Mojtaba Malek Akhlagh, Narek Nalbandyan, Denish Mumbaiwala, and Desanka Polajnar. Decentralized reactive adjustment of agent teamwork organization to changing environment. In Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC'14), pages 1457-1462, San Diego, California, October 2014.

Anand S. Rao and Michael P. Georgeff. Modeling rational agents within a BDI-architecture. In Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning, pages 473-484. Morgan Kaufmann Publishers Inc., San Mateo, CA, USA, 1991.

Anand S. Rao and Michael P. Georgeff. A model-theoretic approach to the verification of situated reasoning systems. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, volume 1, pages 318-324. Morgan Kaufmann Publishers Inc., 1993.

Anand S. Rao and Michael P. Georgeff. BDI-agents: from theory to practice. In Proceedings of the 1st International Conference on Multiagent Systems, 1995.

Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.

Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009.

Reid G. Smith.
The contract net protocol: High-level communication and control in a distributed problem solver. IEEE Transactions on Computers, 29:1104-1113, 1980.

Matthew E. Taylor, Manish Jain, Christopher Kiekintveld, Jun-young Kwak, Rong Yang, Zhengyu Yin, and Milind Tambe. Two decades of multiagent teamwork research: Past, present, and future. Collaborative Agents - Research and Development, 6066:137-151, 2011.

Usman Wajid and Nikolay Mehandjiev. A survey of flexible agent interaction approaches. Multiagent and Grid Systems, 9:247-278, 2013.

Michael Wooldridge. An Introduction to MultiAgent Systems. Wiley, second edition, 2009.

Michael Wooldridge and Nicholas Jennings. Formalizing the cooperative problem solving process. In Proceedings of the 13th International Workshop on Distributed Artificial Intelligence, pages 403-417, 1994.

Michael Wooldridge and Nicholas R. Jennings. Intelligent agents: Theory and practice. Knowledge Engineering Review, 10(2):115-152, 1995.

John Yen, Xiaocong Fan, and Richard A. Volz. Information needs in agent teamwork. Web Intelligence and Agent Systems, 2:231-247, 2004.