Ergodic Flocking of Swarm Robots

by Conan Veitch
B.Sc., University of Northern British Columbia, 2016

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

July 2019

© Conan Veitch, 2019

Abstract

The design of efficient control strategies is a well-studied problem. Due to recent technological advancements and applications in the field of robotics, exploring novel ways to design optimal control for multi-robot systems has gained interest. In this respect, the concept of ergodicity has successfully been applied as an effective control technique for tracking and area coverage. The generation of flocking behaviour is a problem that involves both tracking and coverage, and as such is also suited to the use of ergodicity. The main contribution of this thesis is the application of ergodicity to emulate flocking behaviour. This approach is appealing because control and communication are assumed to be local and self-organized, and the approach does not require separate algorithms to generate different behaviours. Simulation results show that the proposed approach is effective, and a prototype provides evidence that flocking behaviour is possible using ergodicity in a real-life setting.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Acronyms
Acknowledgements
1 Introduction
2 Swarm Behaviours
  2.1 Natural Swarms
  2.2 Swarm Advantages
    2.2.1 Swarm Behaviours
  2.3 Flocking
3 Ergodic Flocking
  3.1 Terminology
  3.2 Discrete Dynamical System Example
  3.3 Ergodic Systems - Swarm Robots as Example
    3.3.1 Ergodicity of a Rectangular Domain in R2
    3.3.2 Time-Elapsed or Time-Varying Distributions
  3.4 Ergodic Coverage and Ergodic Trajectories
    3.4.1 Trajectory Generation
    3.4.2 Multi-Trajectory Generation
  3.5 Ergodic Flocking
    3.5.1 Localized Control and Communication
    3.5.2 Local Coefficient Generation
    3.5.3 Algorithm
  3.6 Simulation
    3.6.1 Simulator
    3.6.2 Time-Elapsed Aggregation
    3.6.3 Flocking by Distribution Tracking
      3.6.3.1 Linear Flocking
      3.6.3.2 Non-Linear Flocking
4 Prototype Implementation
  4.1 Commercially Available Platforms
  4.2 Prototype
    4.2.1 Hardware
    4.2.2 Calibration
    4.2.3 Navigation
      4.2.3.1 Received Signal Strength Indication (RSSI)
      4.2.3.2 Data Noise
      4.2.3.3 Filtering
    4.2.4 Coordination
    4.2.5 Coordinate Finding
    4.2.6 Control
  4.3 Operating System
  4.4 Ergodic Flocking Implementation
    4.4.1 Implementation Parameters
    4.4.2 Results
      4.4.2.1 Trial One
      4.4.2.2 Trial Two
      4.4.2.3 Trial Three
5 Conclusions and Future Work
Bibliography

LIST OF TABLES

4.1 Cost Comparison of Commercial Swarm Platforms
4.2 RPERT State Information by LED Colour
4.3 Implementation Parameters

LIST OF FIGURES

2.1 Weaver Ants Forming a Bridge [1]
2.2 A Murmuration of Starlings. Source: STARFLAG Project, ISC-CNR
3.1 State Transition in Conway's Game of Life
3.2 Time-Averaged Behaviour vs Spatial Average Behaviour
3.3 Left: System with perfect ergodicity, Right: System with imperfect ergodicity
3.4 Non-ergodic Coverage vs Ergodic Coverage
3.5 Centralized vs distributed ergodicity with 3 agents and step 0.5
3.6 Aggregation step, 7 agents, T=100, each time step is 0.1, agent speed is 0.1
3.7 Time-Elapsed Aggregation step, 7 agents, T=100, each time step is 0.1, agent speed is 0.1, distribution speed is 0.11
3.8 Local ergodic trajectories with respect to a time-evolving uniform distribution.
    7 agents, T=100, each time step is 0.15, agent speed is 0.1, distribution speed is 0.11
4.1 Robotic Prototype for Ergodic RSSI-informed Trajectories (RPERT)
4.2 CAD Drawings of RPERT Chassis
4.3 Relationship Between RSSI and Distance
4.4 Actual Distance Heat Map vs Raw RSSI Heat Map
4.5 Equivalent Distance of Robots on Unit Circle
4.6 Differential Drive Dynamics
4.7 Experimental Domain for Trial One and Two
4.8 Trial One: Aggregation Simulation vs Application
4.9 Trial Two: Flocking Simulation vs Application

LIST OF ACRONYMS

API Application Program Interface
ASF Advanced Software Framework
EEDI Ergodic Exploration of Distributed Information
EPFL École Polytechnique Fédérale de Lausanne
NDOF Nine Degrees of Freedom
IMU Inertial Measurement Unit
IRS Infrared Sensor
LSB Least Significant Bits
LTC Local Trajectory Generation
PDF Probability Density Function
MAC Media Access Control address
RA RSSI Anchor
RPERT Robot Prototype for Ergodic RSSI-Informed Trajectories
RSSI Received Signal Strength Indicator
SE(2) Special Euclidean Group with translational and rotational symmetries
SMC Spectral Multiscale Coverage

Acknowledgements

It's been a long haul for a guy that didn't finish high school. Thank you all for your incredible patience and support. Dr. Alex Aravind, for your insight, your rigour, and your relentless optimism – you've been far more than a supervisor to me, and I can never repay your kindness. Rafael Roman Otero, mi amigo, my spiritual guide to hardware and robotics. Darshik Shirodaria, a scholar and a gentleman – but mostly a scholar. Brett Kelly, Brian Duffels, and James McCurdy, the coffee chats have kept me sane.
Dr. Edward Dobrowolski, Dr. Mark Dale, and Dr. George Jones, for their inspiration and mentorship. Rene Francis, Damen DeLeenheer, Desmond Ng, Al Green, and Erin Beveridge, for their friendship. Kyle Pollard: Sweet-Jams made me both better... and worse. Louis Dressel, for his graciousness. To my mom, dad, grandma, and grandpa, for their unshakeable faith in me. And finally my partner in crime, Cecilia Castro, without whom I would be lost. This work would not have been possible without her. Thank you, Cecilia, I can't wait to meet our Dean.

Chapter 1

Introduction

Multi-robot systems have gained prominence in recent years as a consequence of rapid advances in embedded computing technologies. A task that would otherwise be difficult for a single agent can instead be accomplished collaboratively by a distributed system of robots. These systems are flexible, adaptable, and more robust than a system consisting of a single complex robot. In the context of autonomous mobile multi-robot systems, it is essential that the implementation of tasks be robust, reliable, and above all, scalable. This requires coordination between robot agents with some notion of individual and group agent position that acknowledges task objectives and domain boundaries.

Swarm robotics is one solution to the problem of coordinating many robots that is based on the success of biological systems – animals like weaver ants, bees, and many migratory birds and fish. In this thesis, a swarm is defined as a large number of agents capable of jointly accomplishing tasks without external or centralized control. Similarly, a flock (natural or otherwise) is defined by Reynolds in [2] as: "a group of objects that exhibits a general class of polarized, non-colliding, aggregate motion." The behaviours of a swarm, also known as emergent behaviours, are the aggregate motions of the agents that make up the swarm.
A typical swarm behaviour is the ability to explore a given domain; this is known as area coverage. In the swarm robotic context, area coverage is a well-explored problem [3–7]. The ability of a swarm of robots to search an area in order to map it or to find an item is beneficial. In addition to coverage, the ability of a swarm to track a target is another fundamental task for robot swarms. Target tracking is useful for applications such as search and rescue [8], mine sweeping [9], and even educational tasks [5].

The generic behaviour that emerges from the combination of tracking and coverage is known as flocking. Flocking can be described as the aggregation and group movement of a large collection of agents, generally accomplished without those agents having global system information [10]. Without this knowledge, each member of the system instead relies on local interactions with neighbouring agents to collectively produce global behaviours such as flocking. The direction and target of flocking behaviour can be emergent, selected by the swarm itself through some internal method, or predetermined [11].

Several approaches have been studied for area coverage behaviours [10–13]. Among these approaches is the innovative technique of ergodic coverage introduced by Mathew and Mezić in [14]. In this technique a robot's trajectory is given constant feedback to ensure it is covering the area as intended. The work presented in this thesis builds on Mathew and Mezić's methods to calculate ergodicity and generate ergodic coverage in order to

1. Emulate two common swarm behaviours, aggregation and flocking, without requiring two separate behavioural algorithms; and to,

2. Refine ergodicity calculation so that robots only require local knowledge, allowing for completely decentralized behaviour after initialization.

Chapter 2 of this thesis provides background on swarm behaviours, both natural and artificial.
A brief discussion with concrete examples of social animals is presented, followed by the advantages that such organization provides to artificial swarms. Terminology for common swarm behaviours is provided, with careful consideration given to flocking.

In Chapter 3 we present ergodic flocking, a behavioural method for robotic swarms. Ergodic systems and ergodicity are discussed in depth prior to showing the construction of the ergodic flocking algorithm. Simulation results of the algorithm in a swarm setting show that it is effective at generating the desired behaviour.

An experimental prototype is shown in Chapter 4 that implements the ergodic flocking algorithm. The development of the prototype is catalogued for hardware and calibration, and simple robot navigation is explored. The coordination and control of the prototype as a multi-robot system is provided and followed by physical experimentation in which the prototype system is given the same parameters as the Chapter 3 simulation. The results of these experiments show that the ergodic flocking algorithm is capable of emulating swarm flocking behaviours in physical systems.

Chapter 2

Swarm Behaviours

It is common in robotics for engineers to take inspiration from biological systems [4]. The dynamics of an industrial welding robot, for example, are informed by the structure of human limbs. It is not enough to design artificial joints, bones, and ligaments for a robotic arm, however, as the range of motion and predictable behaviours must be captured in the robot's design as well. Stable, predictable behaviour is important in single-robot systems, but is difficult to achieve in distributed multi-robot systems [11]. The coordination of large numbers of robots is a non-trivial task [6].

2.1 Natural Swarms

Some birds, fish, and insects are capable of complex group movements and behaviours, most of which are entirely unchoreographed [15].
This lack of choreography stems from the decentralized nature of natural swarms of animals; each member of a swarm is an individual creature with a separate "agenda". There appears to be no centralized command structure wherein each swarm member is informed of its next move or step within the swarm.

Honey bees use highly organized group movements to form parallel lines of honeycomb cells. A circular wax deposit is secreted by the bees, which is then heated by a living chain of honey bees [16]. When a bee recognises there is enough wax in an area, it moves to an unoccupied position around the circle, ensuring it is as close to its neighbours as possible. The heat generated by their combined thoraxes slowly softens and then eventually melts the wax, allowing it to form the cell. The chain of bees forms an array of closely packed cylinders through which the now-heated wax can flow. The entire cell is heated evenly in this way, which allows the circular deposit of wax to take on the familiar hexagonal honeycomb shape. This behaviour is entirely undirected, and is generated by the initial deposit of wax [11].

Figure 2.1: Weaver Ants Forming a Bridge [1].

Weaver ants form living chains from their own bodies, as shown in Figure 2.1. The chains are intended to create bridges between branches, and paths through otherwise hazardous terrain for the rest of the swarm to move across [11]. Chain formation is initiated by ants walking over the hazardous terrain. Each ant will then move as close to the end of the chain as it can, without moving too far away from all of its closest neighbours. These two simple rules allow sturdy chain formation, as each ant will always attempt to have as many other ants as possible nearby, while still trying to be as far forward as it can [1].

Starlings are songbirds that collectively flock in enormous numbers, creating elaborate patterns.
Biologists currently believe the reason that starlings flock together in such complex movements is to watch for predators like the peregrine falcon [17]. Starling flocking behaviour, known as murmuration, is a complex group movement initiated by simple local interactions of each starling. A starling only considers its seven closest neighbours at all times, regardless of the density of the flock around it. This is because very high starling densities lead to sensing uncertainty, making it difficult for the flock to agree on a consensus heading.

Natural swarms have been studied for years by artificial intelligence and robotics researchers [18–21]. We will discuss the inherent benefits of having a swarm of robots accomplish a task, as opposed to a more centralized multi-robot approach where multiple robots are controlled by a central system.

2.2 Swarm Advantages

Centralized systems carry inherent flaws in each task they approach; it is difficult to find one that is robust, flexible, and scalable. In order to be desirable in real-world applications, it is necessary for a system to have all of these qualities.

Robustness is the insensitivity of a robotic system to modifications to the system itself, or to the environment in which it operates. This does not necessarily imply robustness against run-time error or erroneous input, but rather the ability of a system to function correctly under changing environmental factors. Typically, robot failure negatively impacts centralized robot systems [7], whereas a swarm of decentralized robots is typically very resilient to individual robot failure. No single member of a swarm has a key role within the swarm, and as such the failure of some agents is not considered mission critical.

A flexible system is capable of accomplishing tasks within diverse environments without changing the behaviours of the members of that system [4].
For example, if robots tasked with searching a house for an object are moved to a different house or an apartment complex, they should be equally capable of accomplishing their task. A robot swarm is flexible due to a concept called self-organization. This refers to the ability of a swarm to create global patterns or behaviours strictly from many local interactions between component swarm members [5]. If a swarm is searching a home, it is not because each member of the swarm has been told exactly where to search. Each member of a swarm typically only knows its immediate surroundings and neighbours. The interactions with these neighbours and the environment inform the decisions of each individual robot. This is in contrast with centralized robot systems, in which it is typical for the robot(s) involved to have access to more complex environmental information. In such a system, this information would need to be updated with each new task [22].

Scalability is the insensitivity of a group of robots to modifications to the number of robots in the system [23]. Since each member of the swarm communicates only locally and does not rely on global or centralized information, the addition of any number of robots does not complicate communication within the swarm.

A common term for the organization and problem-solving of natural and artificial swarms is swarm intelligence. The generally accepted [10, 11] definition of swarm intelligence is: "the emergent collective intelligence of groups of simple agents". This emergent collective intelligence is desirable, and even though swarm robot systems are robust, flexible, and scalable, they are only useful if they are capable of generating behaviours that lead to predictable, reliable actions.

2.2.1 Swarm Behaviours

In the field of swarm robotics, a wide variety of algorithmic solutions have been proposed and investigated in order to perform numerous swarm-related tasks.
Often these solutions are designed around the use of specific hardware or technologies, without concern for scalability with regards to the number of robots, or for the limitations and costs of the hardware required in order to implement them [23]. There is a distinct lack of "top-down" design methods for swarm tasks where the methodology is derived from and regulated by a set of mathematical principles [12]. Design methods often show limited application and extensibility to tasks other than the specific ones they were proposed for [4]. Hence there is a lack of formal mathematical analysis with regards to swarm tasks.

Any proposed swarm algorithm must produce consistent, but flexible, behaviours. Most popular approaches to swarm tasks involve random elements or the implementation of a probabilistic finite state machine [6, 23], which can lead to varying, but still predictable, performance between similar groups of robots on collective tasks. The following are common coordination sub-tasks in swarm robotics:

• Aggregation: the ability of a swarm to form clusters of swarm members;

• Self-assembly: the physical connectivity of swarm members, allowing complex problems to be handled in a distributed fashion;

• Coordinated movement: the collective, purposeful movements of swarm members in restricted environments – that is, environments that are bounded in some way;

• Flocking: a specific variation of coordinated movement and restricted self-assembly in which swarm members move with one another, allowing the swarm to act as a whole.

This thesis focuses on the problem of swarm aggregation and flocking, while non-flocking forms of coordinated movement and self-assembly are left as future work.

2.3 Flocking

Murmurations of starlings, as shown in Fig. 2.2, are an example of natural swarm flocking.

Figure 2.2: A Murmuration of Starlings.
Source: STARFLAG Project, ISC-CNR

Schools of anchovies, locust swarms, and other similarly collective animal group movements show flocking behaviours that are vital to their survival. In natural swarms, flocking often increases the safety of each member of the swarm, since the number of eyes checking for predators and food sources is increased [4]. Depending on the animal, flocking can have other purposes as well. Flocking birds, for example, expend less energy, since the updraft created by each neighbour creates wing lift [11].

Flocking can be thought of as an emergent behaviour that occurs as a result of swarm intelligence. This is because each member of the flock interacts only locally with its neighbours and its environment. Each member of a flock follows a set of simple rules that may be biologically hard-coded in order to produce patterns as complex as starling murmurations [17]. Reynolds provides a set of simple rules that allows an artificial swarm to emulate natural flocking behaviour [2]. These rules rely only on local interactions and sensing, and do not require any global information. They are described as:

• Collision avoidance: each member of the swarm must avoid collision with fellow swarm members;

• Velocity matching: each member of the swarm should match the average velocity of its neighbouring flock members; and,

• Flock centring: a flock member should attempt to move towards the centre of mass of all neighbouring swarm members.

Collision avoidance is fundamental, as it ensures that the swarm does not trip over itself and cause harm to the individuals creating the flock. Velocity matching is necessary to ensure the swarm is not broken into separate pieces during transit. As the swarm accelerates and decelerates, the individuals attempt to match their neighbours' current velocity, which forces the swarm to move as a collective.
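Reynolds' three rules can be expressed as local steering updates. The following is a minimal sketch assuming simple point agents in the plane; the sensing radii, rule weights, and time step are illustrative tuning parameters, not values from Reynolds or from this thesis:

```python
import numpy as np

def boids_step(pos, vel, r_neigh=1.0, r_sep=0.3, dt=0.1,
               w_sep=1.5, w_align=1.0, w_coh=1.0):
    """One synchronous update of Reynolds' three flocking rules.

    pos, vel: (N, 2) arrays of agent positions and velocities.
    All radii and weights are illustrative, not from the thesis.
    """
    n = len(pos)
    new_vel = vel.copy()
    for i in range(n):
        offsets = pos - pos[i]
        dists = np.linalg.norm(offsets, axis=1)
        neigh = (dists < r_neigh) & (dists > 0)  # local sensing only
        if not neigh.any():
            continue
        # 1. Collision avoidance: steer away from very close neighbours.
        close = neigh & (dists < r_sep)
        sep = -offsets[close].sum(axis=0) if close.any() else 0.0
        # 2. Velocity matching: steer toward the neighbours' mean velocity.
        align = vel[neigh].mean(axis=0) - vel[i]
        # 3. Flock centring: steer toward the neighbours' centre of mass.
        coh = pos[neigh].mean(axis=0) - pos[i]
        new_vel[i] = vel[i] + dt * (w_sep * sep + w_align * align + w_coh * coh)
    return pos + dt * new_vel, new_vel
```

Note that each agent reads only the positions and velocities of neighbours inside its sensing radius, so the update requires no global information, mirroring the locality property emphasized above.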
In order to retain swarm cohesion, flock centring is used to ensure that individuals within the flock do not travel out on their own if they are on the edge of the flock. Flock centring is also capable of splitting the swarm to avoid obstacles. The swarm bifurcates easily, since individuals only care about staying near their neighbours rather than the rest of the flock.

Artificial flocking can be accomplished using Reynolds' three rules, but it is not the only method that can create this behaviour. In the next chapter, we present a method for swarm robot flocking that does not require individual robots to understand the concept of flocking. Instead, we provide a method for area coverage to each robot that emulates flocking behaviour.

Chapter 3

Ergodic Flocking

This chapter describes an ergodic flocking algorithm for swarm-robot systems which relies only on local information. We show that this algorithm can emulate the flocking behaviours found in many biological and artificial swarm systems. Flocking is achieved in this algorithm by taking advantage of the time-averaged behavioural properties of ergodic coverage. Due to its reliance on time-averaged behaviour, ergodic coverage does not necessitate precision movements on the part of the robot, making it suited to swarms of simple robots.

We begin with definitions of common terminologies, followed by a concrete example of a dynamical system. In reality, most dynamical systems are continuous in time. However, in the computational domain, such systems are suitably approximated by discrete systems. The chapter then outlines what discrete dynamical systems are, and by extension ergodic dynamical systems. A metric for ergodicity is presented, which is used in the ergodic flocking algorithm. A simulator is then used to show supporting evidence that the ergodic flocking algorithm allows a multi-robot system to achieve desired flocking behaviours.
3.1 Terminology

The following are common terminologies used throughout this chapter.

• System: A collection of one or more interrelated components organized to accomplish a common task;

• State: The value of a component defined on its domain;

• System state: The union of all component states, or the collection of all state variables;

• Control law: The method by which a system moves between states, also commonly known as a transition function;

• Discrete dynamical system: A system that evolves between states with time;

• Phase space: The set of all possible states of a discrete dynamical system;

• Trajectory: The set of states reached by a dynamical system's control law given an initial state;

• Ergodic dynamical system: A dynamical system that is capable of reaching all possible states in its phase space given enough time;

• Ergodicity: A metric for how ergodic a given trajectory is. In this thesis, all references to ergodicity imply the metric provided in [14];

• Distribution over a domain: A probability density function defined over a set of points in a domain;

• Time-elapsed distribution over a domain: A distribution that modifies its own behaviour over a set of points in a domain as a function of time;

• Basis function: One of a set of building-block functions which spans a function space. Each continuous function in the function space can be constructed using a linear combination of basis functions;

• Fourier basis function: A basis function of a T-periodic function space (a periodic function in time with period T is called T-periodic). The Fourier basis functions are formed of sines and cosines.

3.2 Discrete Dynamical System Example

John Conway's Game of Life is a cellular automaton: a collection of cells on an n×n grid that evolves through time in discrete steps. Each cell on the grid is assigned a state of alive or dead. A transition function called an evolution assigns a new state to each cell based on the current system state.
The transition function is:

• Each cell on the grid is either alive or dead – for this chapter we depict alive cells as shaded blue, and dead cells as white;

• Any alive cell with fewer than two alive neighbours dies from solitude;

• Any alive cell with more than three alive neighbours dies from starvation;

• Any alive cell with two or three alive neighbours continues to be alive;

• Any dead cell with exactly three alive neighbours becomes alive.

Conway's Game of Life is an example of a discrete dynamical system. The set of all possible states that the grid can assume is the dynamical system's phase space [24]. Correspondingly, the evolution steps are collectively called the system's transition function (control law), and the discrete series of new system states is the system's trajectory.

(a) State A (b) State B
Figure 3.1: State Transition in Conway's Game of Life

Though Conway's Game of Life cannot reach every possible state in its phase space from a given initial state, it is possible for it to reach every possible state in some local area [25].

3.3 Ergodic Systems - Swarm Robots as Example

There are many applications for ergodic theory in the multi-robot systems field, since it deals with the time-averaged behaviour of dynamical systems. For example, a collection of robots covers an area ergodically if that area can be closely approximated by the system's trajectory formed by those robots over time. Such a system is called ergodic if the time-average of the agents' trajectory equals the average of all system states in the system's phase space. In other words, the fraction of time a trajectory spends in a subset of a domain is equal to the measure of that subset [26]. In the case of systems such as multi-robot and swarm robotic systems, we can expect to approximate the phase space, rather than equal exact time averages.

A swarm trajectory consists of agents and a bounded domain. As the agents move (in discrete steps) through a domain, a system trajectory is generated.
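This construction (agents moving in discrete steps through a bounded domain, with the collection of all agent coordinates at each step forming a system state) can be sketched directly. The unit-square domain, the random-heading control law, and the speed below are illustrative assumptions, not the control law developed in this thesis:

```python
import math
import random

def step(pos, speed=0.1):
    """Move one agent a fixed distance in a random heading, clamped to
    the unit square (a bounded domain). The control law is illustrative."""
    heading = random.uniform(0.0, 2.0 * math.pi)
    x = min(1.0, max(0.0, pos[0] + speed * math.cos(heading)))
    y = min(1.0, max(0.0, pos[1] + speed * math.sin(heading)))
    return (x, y)

def system_trajectory(initial_state, n_steps):
    """The trajectory is the series of system states s_0 .. s_n, where each
    state is the collection of all agent coordinates at one time step."""
    states = [list(initial_state)]
    for _ in range(n_steps):
        states.append([step(p) for p in states[-1]])
    return states

random.seed(0)
traj = system_trajectory([(0.5, 0.5), (0.2, 0.8)], n_steps=100)
# traj[0] is the initial state; the trajectory holds 101 system states
```

The spatial statistics of a trajectory generated this way are what the ergodic metric of the next section compares against a target distribution.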
In order to quantify how ergodic a trajectory is, one approach is to formulate a metric that defines ergodicity [14]. The metric provides a comparison of how well a trajectory is sampling a given probability density function (PDF). The metric is then used to inform the control of each agent in the system, forcing each agent to move in such a way that the metric is minimized. Over a duration of time t, the goal of agent control is to ensure the metric ε vanishes, so that ε → 0 as t → ∞. In this way, a multi-robot system can cover any area based upon some PDF over some bounded domain. Each agent's control inherently takes a provided distribution into account, without the need for any secondary coverage algorithms. A collection of robots covers an area ergodically if that area can be closely approximated by the states of the system's trajectory. The work in this thesis assumes that each new system state is generated in identical time intervals.

The Mathew and Mezić metric is used to correct a multi-agent system's trajectory if it is not searching ergodically. In doing so, the ergodic metric provides a feedback law which informs a multi-robot system to cover an area ergodically. Consider Figure 3.2, where the rectangular domain has a unitless area of 1 and a probability measure φ(x) has been provided (the darker the region, the higher the density of the PDF), along with two regions D1 and D2, a trajectory over a period T, and an initial state x(0). In the time period T, the trajectory is within the region D1 twice, in the time intervals [t3, t4] and [t7, t8]. In the same period, the trajectory is within the region D2 twice, in the time intervals [t1, t2] and [t5, t6]. Here, $\int_D \phi(x)\,dx$ is the measure of the region D.
The system shown in Figure 3.2 is ergodic iff

$$\int_{D_1} \phi(x)\,dx = \frac{(t_4 - t_3) + (t_8 - t_7)}{T} \qquad (3.1)$$

and

$$\int_{D_2} \phi(x)\,dx = \frac{(t_2 - t_1) + (t_6 - t_5)}{T} \qquad (3.2)$$

Figure 3.2: Time-Averaged Behaviour vs Spatial Average Behaviour

Ergodic theory has been applied to the problem of autonomous domain exploration, allowing an agent to explore an arbitrary space while retaining explicit knowledge of boundary restrictions, objectives, and agent orientations. Since each state in an ergodic system will eventually occur, we can use a measure of how ergodic an agent's trajectory is in order to uniformly cover an entire domain. One can then give a higher weight to areas in a domain that require exploration, so that the agent's trajectory spends most of its time in these areas, as shown in Figure 3.2.

Consider a swarm robotic system's trajectory that is perfectly ergodic, represented by the left-hand system in Figure 3.3. In the figure, each white circle represents the spatial average of the system, and each blue circle represents the time-average (up to that moment) of the system's trajectory. If a sample of the system's state is taken at seven time instances between 11:00 and 14:00, and ergodicity is zero, we can say that the system is perfectly ergodic, and that there is no difference between the spatial and time-average of the system as shown.

Figure 3.3: Left: System with perfect ergodicity, Right: System with imperfect ergodicity

In implementation, the right-hand system in Figure 3.3 is more likely. This system has non-zero ergodicity, and at each time step will attempt to correct its control in order to ensure that it is acting as ergodically as possible. With each new state, the deviation from an ergodic trajectory is computed. The system can feed this information into the next decision cycle via its control law, creating a feedback loop.
In this way the system can continuously correct its trajectory to ensure, regardless of physical barriers, that it is creating as ergodic a trajectory as possible for each robot. Ideally, the measure of each region must match the time the trajectory has spent within that region. However, in reality, this level of accuracy in behaviour (trajectory) is not easy to achieve, as it would require consistent effort at every time step to correct the swarm trajectory to be closer to ergodic. Specifically, such correction requires knowledge, at every time instant, of how far the trajectory is from being ergodic.

3.3.1 Ergodicity of a Rectangular Domain in R2

Suppose robot agents are situated within a rectangular domain d defined as d = L1 × L2, where L1 = [0, L1], L2 = [0, L2], and L1, L2 ∈ R+ are positive real values. The agents are not allowed to leave the domain; in other words, their movement is assumed to be continuous, connected, and bounded. A distribution φ(x) over d is assumed to be given. This distribution is generally application specific; the distribution will change from task to task. φ can be thought of as an information density over the domain, and it is assumed that φ is zero at all points outside d. A robot in this system has position coordinates x ∈ d situated within R2, and the set of all agent coordinates in the system at a given time step i is the system's state si, where i = 0, 1, 2, ..., n. A system trajectory x is the set of all si up to the current time step n.

The metric provided by Mathew and Mezić compares the distribution φ with the spatial statistics of x. This metric decomposes both φ and x into Fourier coefficients. It is important to note that this is one of many methods to achieve ergodic trajectories, and as such is not strictly necessary in order to compute ergodicity. This method significantly reduces computational complexity, however [14, 27].
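The objects just defined, the domain d, a state si, and a trajectory as the set of states so far, can be sketched as simple data structures. This is an illustrative sketch with names of our own choosing, not an implementation from [14]:

```python
from dataclasses import dataclass, field

# Illustrative sketch: a rectangular domain d = [0, L1] x [0, L2], and a
# trajectory holding the states s_0 .. s_n (each state is the list of all
# agent positions at one time step).
@dataclass
class Domain:
    L1: float
    L2: float

    def contains(self, x):
        # True iff position x = (x1, x2) lies inside d
        return 0.0 <= x[0] <= self.L1 and 0.0 <= x[1] <= self.L2

@dataclass
class Trajectory:
    states: list = field(default_factory=list)

    def append(self, state):
        self.states.append(state)

d = Domain(1.0, 1.0)
traj = Trajectory()
traj.append([(0.2, 0.2), (0.8, 0.4)])  # state s_0 for two agents
assert d.contains((0.2, 0.2))
```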
In order to numerically approximate values for physical application, the number of basis functions must be reasonable. A higher number of basis functions will provide a more accurate representation of the supplied PDF. Similarly, a cut-off must be chosen for the number of Fourier coefficients. With few coefficients, the gaps between agent trajectories will be larger than with a larger number of coefficients.

To implement this metric, φ and x are decomposed into Fourier coefficients. The decomposition of φ into $\prod_{i}(K_i + 1)$ coefficients φk, where the Ki are the cut-off numbers along each coordinate, is

$$\phi_k = \int_d \phi(x) F_k(x)\,dx, \quad (3.3)$$

where the multi-index k = [k1, k2] [27], and where Fk is the Fourier basis function for vector k. As d is in R2, our orthogonal basis functions are taken as [14]

$$F_k(x) = \frac{1}{h_k}\cos(k_1 x_1)\cos(k_2 x_2) \quad (3.4)$$

where

$$k_1 = \frac{K_1 \pi}{L_1}, \qquad k_2 = \frac{K_2 \pi}{L_2} \quad (3.5)$$

and

$$h_k = \sqrt{\int_0^{L_1}\!\!\int_0^{L_2} \cos^2(k_1 x_1)\cos^2(k_2 x_2)\,dx_1\,dx_2}. \quad (3.6)$$

hk ensures that each Fk is normalized, so each entry will have unit length. In [14] Neumann boundary conditions are needed (the derivative at the boundary of d is 0), and as such a cosine series rather than a full Fourier series is used. After computing our spatial distribution's Fourier coefficients, we must do the same for the system trajectory. To do so, we create a probability density function c(x) from trajectory x:

$$c(x) = \frac{1}{T}\int_0^T \delta(x - x(t))\,dt, \quad (3.7)$$

where x(t) is the system state at time t, δ is the Dirac delta function as described by Mathew and Mezić [14], and T ∈ Z+ is a finite-time horizon. The Fourier coefficients of c(x) are given in [27] as

$$c_k(x) = \frac{1}{T}\int_0^T F_k(x(t))\,dt. \quad (3.8)$$

After decomposing our distributions into their respective coefficients, we can then compare them in a metric of ergodicity ε:

$$\varepsilon(x) = \sum_k \Lambda_k \,|c_k(x) - \phi_k|^2, \quad (3.9)$$

where Λk is a scaling factor chosen so that the infinite sum converges,

$$\Lambda_k = \frac{1}{(1 + \|k\|^2)^{3/2}}, \quad (3.10)$$

provided that

$$|c_k(x) - \phi_k|^2 \sim |k|^{2-\delta}, \quad \delta > 0. \quad (3.11)$$

Here Mathew and Mezić omit φk from the notation for ε, which initially caused us confusion. To avoid such confusion, we include φk in the computation of ε as follows:

$$\varepsilon(c_k, \phi_k) = \sum_k \Lambda_k \,|c_k - \phi_k|^2 \quad (3.12)$$

Now it is clear that we are comparing distribution Fourier coefficients with trajectory Fourier coefficients. In this way, ε quantifies the difference between the actual trajectory and the ideal ergodic trajectory that could be generated by the same system given the same initial state. In order for the spatial average of the domain to approximate the time average of the trajectory as t → T, the control of each robot must minimize ε. The above metric, however, only takes into account a single agent's trajectory. We need a way to measure multi-robot trajectories, introduced in Section 3.4.2.

R2 is not the only domain for ergodicity, and work has been done on a variety of different domains. Of special note, Miller and Murphey [28] have successfully used the domain SE(2), the special Euclidean group with translational and rotational symmetries. This allows their work to take into account the facing of robotic sensors while still implementing ergodic coverage. Fourier basis functions must be found for the chosen domain, but otherwise ergodicity is independent of domain. For this work, R2 is utilized rather than SE(2) in order to reduce computational complexity. The algorithm proposed later in this chapter allows for the ergodic flocking of a multi-robot swarm. With that context in mind, we assume the domain to be rectangular in R2.

3.3.2 Time-Elapsed or Time-Varying Distributions

To guide trajectory calculations for applications like tracking, the concept of a time-elapsed distribution was introduced in [8]. Time-elapsed distributions are spatiotemporal distributions φ(x, t) [29] that provide a spatial density φ(x) at a given time t. This is accomplished by including a time parameter in the basis function Fk. In R2, the basis function then becomes

$$F_k(x, t) = \frac{1}{h_k}\cos(k_1 x_1)\cos(k_2 x_2)\cos\!\left(\frac{k_3 \pi t}{T}\right). \quad (3.13)$$

The distribution decomposition similarly needs to be updated as per [29].

3.4 Ergodic Coverage and Ergodic Trajectories

In the context of multi-robot systems, the coverage problem is well studied, and most algorithms accomplish their task either deterministically or using greedy algorithms. Greedy algorithms maximize information such that each new state generated by the agent will be closer to a point of highest probability density within the PDF over the search domain. Due to the nature of a greedy algorithm's control input, it will only ever attempt to visit the region of highest information density [30]. This may cause problems for domains with multiple regions of interest, or with probability distributions that a greedy algorithm cannot plan for [14].

Figure 3.4: Non-ergodic Coverage vs Ergodic Coverage. (a) Greedy Algorithm, (b) Ergodic Trajectory

Figure 3.4 shows how a greedy algorithm can easily find a region of high density. The shaded regions are regions of higher probability density. The algorithm then becomes caught in the region, since any movement outside would cause the control input to not be maximized; the greedy algorithm can never explore the second region.

Ergodicity was first applied to multi-agent systems by Mathew and Mezić to solve the coverage problem, and from this application they developed an algorithm for multi-agent coverage in 2010 [14]. This algorithm uses prior knowledge of a domain by considering the agents searching the domain as an ergodic system. They developed a metric for ergodicity: a function to measure how far a multi-agent system's trajectory is from being optimally ergodic. The Mathew and Mezić metric is widely used by researchers in single- and multi-robot research due to being relatively computationally inexpensive [29, 31–34]. [14] also provides an algorithm for creating trajectories for multi-agent systems called Spectral Multiscale Coverage (SMC).
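Before turning to trajectory generation, the Fourier decomposition and metric of equations (3.3)–(3.12) can be made concrete with a minimal sketch of our own (not code from [14] or [27]). The grid resolution, the cutoffs K1 = K2 = 4, and the sample trajectory below are arbitrary illustrative choices:

```python
import math

# Sketch of the Mathew-Mezic decomposition on the unit square [0,1]x[0,1].
L1 = L2 = 1.0
K = 4    # cutoff along each coordinate
G = 50   # grid resolution for numerically integrating phi

def h(k1, k2):
    # eq. (3.6): sqrt of the integral of cos^2(k1 x1) cos^2(k2 x2) over d;
    # for k = K*pi/L this integral is exactly L/2 (or L when K = 0)
    return math.sqrt((L1 / 2 if k1 else L1) * (L2 / 2 if k2 else L2))

def F(k1, k2, x):
    # eq. (3.4): normalized cosine basis function
    return math.cos(k1 * x[0]) * math.cos(k2 * x[1]) / h(k1, k2)

def phi_coeffs(phi):
    # eq. (3.3): phi_k as a midpoint Riemann sum over the domain
    coeffs, dA = {}, (L1 / G) * (L2 / G)
    for K1 in range(K + 1):
        for K2 in range(K + 1):
            k1, k2 = K1 * math.pi / L1, K2 * math.pi / L2
            s = 0.0
            for i in range(G):
                for j in range(G):
                    x = ((i + 0.5) * L1 / G, (j + 0.5) * L2 / G)
                    s += phi(x) * F(k1, k2, x) * dA
            coeffs[(K1, K2)] = s
    return coeffs

def traj_coeffs(xs):
    # eq. (3.8): c_k as the time-average of F_k along the trajectory
    coeffs = {}
    for K1 in range(K + 1):
        for K2 in range(K + 1):
            k1, k2 = K1 * math.pi / L1, K2 * math.pi / L2
            coeffs[(K1, K2)] = sum(F(k1, k2, x) for x in xs) / len(xs)
    return coeffs

def ergodic_metric(ck, pk):
    # eqs. (3.10) and (3.12), with the integer multi-index used in Lambda_k
    return sum((1.0 + K1 * K1 + K2 * K2) ** -1.5 * (c - pk[(K1, K2)]) ** 2
               for (K1, K2), c in ck.items())

pk = phi_coeffs(lambda x: 1.0)   # uniform PDF on the unit square
xs = [(0.1 * i % 1.0, 0.37 * i % 1.0) for i in range(200)]  # sample trajectory
eps = ergodic_metric(traj_coeffs(xs), pk)
assert eps >= 0.0
```

For the uniform density, only the constant mode φ(0,0) is non-zero, so ε measures how far the trajectory's time-averaged statistics are from uniform.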
An ergodic trajectory formed with SMC will coarsely explore both regions of interest before covering them more finely as t → T. In this way, ergodic trajectories prove to be more robust and flexible than their greedy counterparts.

The intent of ergodic coverage is to provide a method of search that overcomes the deficiencies of a lawnmower strategy. In a lawnmower algorithm, an agent (or agents) will begin at an initial fixed position and each be given an equally sized rectangular slice of the domain to search. Each agent then searches its partition in parallel lines until the entire partition is finished [35]. Lawnmowing is a common, accurate search strategy, but suffers the following three disadvantages in real-world application [36].

1. Uncertainty: If the domain is irregularly shaped, it is not obvious how to slice it into partitions for each agent.

2. Fault Tolerance: If an agent fails, it is not obvious how the other agents should adapt. The agents must either begin the process of repartitioning, or adapt in some other way.

3. Myopia: Lawnmowing assumes that all of the domain must be searched equally. This is not the case in many applications, such as search and rescue.

Mobile multi-robot systems are increasingly common in exploration tasks as the ubiquity of embedded distributed systems rises. Robot sensors are often unidirectionally fixed to the chassis, which requires that robot orientation be adjusted in order to increase sensor efficacy. The search for and decommissioning of mines by autonomous agents requires recognition of both the position and bearing of detection sensors in order to plan movements accordingly [9]. Similarly, accurate surveillance by autonomous unmanned aerial vehicles often requires a camera to be oriented in such a way as to minimize disturbance of the photogrammetric survey [22]. It is clear that in many related systems, position without bearing is not enough to accurately and efficiently plan a movement trajectory.
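For reference, the lawnmower strategy described above can be sketched in a few lines. This is an illustrative sketch of our own (strip partitioning on the unit square; the number of sweep rows is an arbitrary choice):

```python
# Lawnmower sketch: slice the unit square into Na equal vertical strips
# and sweep each strip in parallel lines (a boustrophedon path).
def lawnmower_path(strip, Na, rows=5):
    x0, x1 = strip / Na, (strip + 1) / Na        # strip boundaries
    path = []
    for r in range(rows):
        y = (r + 0.5) / rows                      # centre line of row r
        xs = [x0, x1] if r % 2 == 0 else [x1, x0] # alternate sweep direction
        path += [(x, y) for x in xs]
    return path

paths = [lawnmower_path(i, 4) for i in range(4)]  # 4 agents, 4 strips
assert len(paths) == 4 and paths[0][0] == (0.0, 0.1)
```

The sketch makes the first two disadvantages visible: the partitioning assumes a rectangular domain, and an agent's strip is covered by nobody else if that agent fails.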
Lauren Miller and Todd Murphey have considered the sensor orientation problem, and provided the basis functions necessary to use the Euclidean motion group SE(2) as the domain of search [32]. SE(2) can be considered as the group generated by rotational and translational symmetries on R2. Miller and Murphey go on to describe a method for generating trajectories called Ergodic Exploration of Distributed Information (EEDI) [28]. Their contribution is an optimization on continuous trajectories, rather than discrete ones. This method of trajectory generation will ergodically cover a domain very well in comparison to many other methods, but is computationally expensive.

There are many other methods for generating trajectories using the Mathew and Mezić metric. Projection-based gradient descent is one of the most popular [27, 28, 37, 38]. Gradient descent is an optimization technique that iteratively finds the minimum of a function; here it minimizes an objective function constructed in terms of ergodicity and control costs. There are many extensions to this algorithm, but all surveyed were computationally expensive [27].

3.4.1 Trajectory Generation

A great deal of work has been done on trajectory generation [14, 27, 28, 32, 34, 37], but most of these methods are computationally expensive, and as such are not suitable for the low-power microcontrollers common to multi-robot systems. A simple algorithm for trajectory generation, called Spectral Multiscale Coverage (SMC), is provided in [14]. This algorithm generates trajectory steps quickly in real time, though at a trade-off in precision. SMC covers a distribution very coarsely initially, and then covers it more finely as time elapses due to its multiscale nature; important features are covered very quickly initially, and as time progresses these features are explored exhaustively.
The coarse initial coverage by SMC aids in the aggregation step, because the agents will immediately attempt to move directly towards the uniform distribution before hovering in or near it. It is also leveraged in the time-elapsed distribution flocking step for similar reasons. The trade-off of precision versus computational speed is acceptable in multi-robot flocking applications, where the number of agents will make up for any initial loss in coverage.

SMC will only work for robot systems with single or double integrator dynamics; we describe the single integrator case below. Recall that feedback laws are intended to drive the metric ε to zero as quickly as possible by maximizing the rate of decay of the metric at each time step. For single integrators [14], the feedback law for robot j at time t with maximum velocity umax is

$$u_j(t) = -u_{max}\frac{B_j(t)}{\|B_j(t)\|_2} \quad (3.14)$$

where ||x||2 is the Euclidean norm, and

$$B_j(t) = t N_a \sum_k \Lambda_k \big(c_k(t) - \phi_k\big)\nabla F_k(x_j(t)) \quad (3.15)$$

where ∇Fk(x) is the gradient vector field of the basis functions Fk of our domain. The sum in (3.15) without the gradient is the difference between the two PDFs ck and φk. To correct this difference, we move in the opposite direction of the gradient at maximum velocity, as in (3.14). It is important to note again that the choice of cosine basis functions makes the gradient of Fk equal 0 on the boundary of the domain, so that the robots cannot move outside the boundary. uj(t) can be thought of as the next state the robot will move to. Since our domain is R2,

$$\nabla F_k(x) = \frac{1}{h_k}\begin{bmatrix} -k_1 \sin(k_1 x_1)\cos(k_2 x_2) \\ -k_2 \cos(k_1 x_1)\sin(k_2 x_2) \end{bmatrix}. \quad (3.16)$$

The restrictions on agent dynamics are acceptable in this work, as the robots described in Chapter 4: Prototype Implementation are single integrators (or are modelled as single integrators).
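The single-integrator feedback law (3.14)–(3.16) can be sketched as follows. This is our own minimal illustration on the unit square, assuming coefficient dictionaries keyed by the integer multi-index (K1, K2); the constant prefactor in (3.15) is omitted in the code since it cancels in the normalization of (3.14):

```python
import math

# Sketch of the single-integrator SMC feedback law on [0,1]x[0,1].
L1 = L2 = 1.0
K = 4

def h(k1, k2):
    # normalization constant, eq. (3.6) evaluated in closed form
    return math.sqrt((L1 / 2 if k1 else L1) * (L2 / 2 if k2 else L2))

def gradF(K1, K2, x):
    # eq. (3.16): gradient of the cosine basis function
    k1, k2 = K1 * math.pi / L1, K2 * math.pi / L2
    hk = h(k1, k2)
    return (-k1 * math.sin(k1 * x[0]) * math.cos(k2 * x[1]) / hk,
            -k2 * math.cos(k1 * x[0]) * math.sin(k2 * x[1]) / hk)

def smc_velocity(x, ck, phik, umax=0.1):
    # B_j(t), eq. (3.15): weighted sum of basis gradients (prefactor dropped,
    # as it cancels when normalized in eq. (3.14))
    bx = by = 0.0
    for (K1, K2), c in ck.items():
        lam = (1.0 + K1 * K1 + K2 * K2) ** -1.5
        gx, gy = gradF(K1, K2, x)
        bx += lam * (c - phik[(K1, K2)]) * gx
        by += lam * (c - phik[(K1, K2)]) * gy
    norm = math.hypot(bx, by)
    if norm == 0.0:
        return (0.0, 0.0)
    return (-umax * bx / norm, -umax * by / norm)  # eq. (3.14)

# Example: a uniform target whose coverage over-weights mode (1, 0)
phik = {(a, b): (1.0 if (a, b) == (0, 0) else 0.0)
        for a in range(K + 1) for b in range(K + 1)}
ck = dict(phik)
ck[(1, 0)] = 0.3
vx, vy = smc_velocity((0.25, 0.5), ck, phik)
assert abs(math.hypot(vx, vy) - 0.1) < 1e-9   # always moves at umax
```

Note how the returned velocity always has magnitude umax: SMC chooses only the direction of motion, moving against the gradient of the coefficient mismatch.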
For more complicated swarm robots, a method of trajectory generation like gradient descent [27] would be ideal, though typically swarm robots have at best double integrator dynamics and can still utilize SMC [39]. Before describing an algorithm for generating ergodic trajectories, it is necessary to discuss the benefits of local coefficient generation using neighbourhood communication.

3.4.2 Multi-Trajectory Generation

Ergodicity is extended to multi-robot systems by averaging the coefficients of trajectories. For Na robots, each agent j has an individual trajectory, and x̃jn is the corresponding state of agent j at time n. We can then average the trajectory coefficients as [27]

$$c_k(\tilde{x}) = \frac{1}{N_a(1 + N)}\sum_{j=1}^{N_a}\sum_{n=0}^{N} F_k(\tilde{x}_n^j). \quad (3.17)$$

When these coefficients are considered in the metric, the corresponding dynamics of each individual robot are inconsequential [14]. Each robot can have different dynamics if necessary, though each robot will attempt to explore the closest area of high information density within the distribution as long as its control law is informed by the metric. This applies to any trajectory generation method using this metric, though trajectory generation itself may put restrictions on the agents' dynamics [8, 14, 33]. The system will attempt to match the spatial average of the domain's distribution with the time average of all of the robots.

Concern lies with having Na agents attempting to compute coefficients globally, which is necessary for this style of multi-robot ergodicity. Each agent must communicate its trajectory coefficients to all other robots. This allows the robots to average every other robot's coefficients in order to generate a unified trajectory coefficient for the system as a whole.
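The averaging in equation (3.17) can be sketched directly. This is an illustrative implementation of our own on the unit square, with two short sample trajectories:

```python
import math

# Sketch of eq. (3.17): average the basis evaluations over all Na agents
# and all N+1 states to obtain system-wide trajectory coefficients.
L = 1.0

def F(K1, K2, x):
    hk = math.sqrt((L / 2 if K1 else L) * (L / 2 if K2 else L))
    return (math.cos(K1 * math.pi * x[0] / L)
            * math.cos(K2 * math.pi * x[1] / L) / hk)

def system_ck(trajectories, K=4):
    # trajectories[j][n] is agent j's position at step n (all equal length)
    Na = len(trajectories)
    N = len(trajectories[0]) - 1
    return {(K1, K2): sum(F(K1, K2, x) for traj in trajectories for x in traj)
                      / (Na * (1 + N))
            for K1 in range(K + 1) for K2 in range(K + 1)}

trajs = [[(0.1, 0.1), (0.2, 0.2)],   # agent 1
         [(0.9, 0.9), (0.8, 0.8)]]   # agent 2
ck = system_ck(trajs)
assert abs(ck[(0, 0)] - 1.0) < 1e-9  # constant mode: F_00 = 1 everywhere
```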
In order to find the average coefficients of the system, each robot is required to broadcast its own trajectory coefficients, to receive the coefficients of all other robots, and then to compute the system coefficients c̃k while ensuring that the computed value is the same as every other agent's. Having each agent receive all other agents' trajectory coefficients before generating the average locally can have unintended consequences. Coefficient loss due to obstacles, multipath and weather fade, shadow-induced fading, and connectivity between potentially wireless nodes [13] are all significant challenges to a fully distributed multi-robot model implementing this style of ergodicity.

3.5 Ergodic Flocking

We consider strictly local communication between agent neighbours as a basic characteristic of flocking. This characteristic also makes flocking behaviour scalable. Therefore, to emulate scalable flocking using ergodicity, we need: (i) an aggregation of robots to one location; and then (ii) tracking of a given distribution φ as a group using only local communication. We achieve the two steps required for scalable flocking with three rules: (i) aggregate robots to one location using ergodic coverage; (ii) use a time-elapsed distribution and ergodic tracking to simulate flocking; and (iii) use only local coefficient generation, informed by neighbourhood communication, to compute ergodicity. This is described in Section 3.5.2: Local Coefficient Generation.

Using these ideas, flocking behaviour has been implemented in two stages: (i) apply ergodic coverage by positioning a spatially "small" circular uniform distribution φ at given coordinates (the size of the distribution should be large enough to include all agents, but small enough to aggregate them together). Each robot will cover the domain ergodically, clustering in and around φ.
Then, (ii) define and deploy a time-elapsed distribution to specify a flocking trajectory, and let the multi-robot system exhibit ergodic tracking behaviour that follows the time-elapsed distribution. Two key aspects in achieving scalable ergodic flocking are effective trajectory generation, and effective control and communication.

3.5.1 Localized Control and Communication

Multi-robot ergodicity allows the sharing of global trajectory distribution information to guide each robot's next control step in minimizing the metric. In a fully distributed multi-robot ergodic system, each robot does not have access to global coefficients, and as such will be less efficient with its time in exploring a domain than a robot that is part of a centralized system. This could be due to robots being outside the communication range of other robots within the system, wireless multipath fade, or obstacles. This loss of efficiency appears to be negligible over time in standard area-coverage problems using SMC, as shown in Figure 3.5.

Figure 3.5: Centralized vs distributed ergodicity with 3 agents and step 0.5. (a) Centralized Ergodicity, (b) Distributed Ergodicity

Note that the coverage in a centralized ergodic trajectory is typically more composite than that of a fully distributed trajectory. This is due to the nature of a centralized ergodic system: as each agent contributes its trajectory coefficients to the combined system, each agent will predictably cover areas as different as possible from those the rest of the system is covering. This is why, in Figure 3.5, the blue agent remains predominantly in the south of the uniform distribution, the purple around the equator, and the green agent in the north-east. In fully distributed ergodicity, each agent is its own system until another agent approaches within its local neighbourhood threshold distance.
When two agents come into threshold range, they form a single system until they leave this threshold again, allowing them to share trajectory information and inform each other's next state. The end result is a less composite, but still very thorough, coverage of the distribution.

3.5.2 Local Coefficient Generation

In order to create a robust system that can leverage ergodicity in a scalable, intuitive way, this work modifies the standard averaging of trajectory coefficients. Each robot within a group communicates with only a local subset of all group members. This is both a useful and a functional limitation placed on each member, as each robot is typically very restricted in sensor capacity and size, and, as an indirect result, power. Constant communication with all members of a group would require non-direct transfer of information: an expensive prospect.

This new averaging of robot trajectory coefficients (hereafter Local Trajectory Coefficients (LTC)) follows simple, emergent rules: rather than taking an average over every member of a global collection, each robot receives the coefficients of all members within its local neighbourhood threshold. This neighbourhood can be defined in any way natural to the system in question, whether by communication hops, physical distance, or some other measure. Each robot is then free from being forced to synchronize with all other members of the group at each time step, and does not need to communicate with a potentially unmanageable number of other robots. This is avoided in decentralized trajectory generation since the threshold distance can be set to a value within which a manageable number of robots can physically fit. Each robot essentially coordinates only with its neighbours.

3.5.3 Algorithm

The algorithm that each robot has to execute for flocking behaviour in a subregion D of R2 is given as Algorithm 1: Ergodic Flocking on R2 with LTC.
Here: φ(x) is a distribution with density over D ⊂ R2; φk is the Fourier decomposition of the spatial distribution φ; K1, K2 are the coefficient indices along each coordinate; B is the set of all robots β in the swarm; c(x) is the distribution derived from the trajectory x; ck is the Fourier decomposition of the trajectory distribution c; Λ is the scaling factor; ε(x) is the sum of squared differences between ck(x) and φk(x); ckβ are the trajectory coefficients received from robot β; ψ ∈ R+ is a threshold value quantifying the distance of the local neighbourhood; δβ is the distance of robot β from the current robot α; and εa is the average ergodicity value for all robots in robot α's local system. Note: for a time-elapsed distribution, the algorithm must be modified to take time into account, and so an additional loop, for K3 do, is included at each iteration of calculating φk and ck.

Algorithm 1: Ergodic Flocking on R2 with LTC

 1  for K1 do
 2      for K2 do
 3          Compute φk
 4      end
 5  end
 6  while true do
 7      for K1 do
 8          for K2 do
 9              Compute ck
10          end
11      end
12      if Num Agents > 1 then
13          for β in B do
14              if δβ ≤ ψ then
15                  Receive ckβ
16              end
17          end
18          Compute Λ
19          Compute εa
20      else
21          Compute Λ
22          Compute ε
23      end
24      Get next state using SMC given agent dynamics and ε (or εa)
25  end

3.6 Simulation

It is possible to approximate flocking behaviour with multi-robot ergodic trajectories. Typically a collection of algorithms or built-in behaviours would be required to accomplish flocking behaviour in a multi-robot system. Beneficially, ergodicity provides a stable framework within which collective group navigation can be performed without having to introduce additional algorithms. Rather than having a large library of algorithms available to each robot, it is preferable for each member of a multi-robot system to take advantage of a navigation framework that can adapt depending on the situation or required task.
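Before turning to the simulation results, the per-robot cycle of Algorithm 1 can be sketched at a high level. This is a runnable illustration of our own, with stand-in data: the real φk, ck, and SMC computations are the ones described earlier in the chapter.

```python
import math

# High-level sketch of one iteration of Algorithm 1: gather coefficients
# from neighbours within threshold psi, average them (LTC), and compute
# the local ergodicity value fed to SMC.
class Robot:
    def __init__(self, pos, ck):
        self.pos, self.ck = pos, ck

    def distance_to(self, other):
        return math.hypot(self.pos[0] - other.pos[0],
                          self.pos[1] - other.pos[1])

def step(robot, swarm, phik, psi):
    # neighbour-gathering: receive ck from robots with distance <= psi
    local = [robot.ck] + [o.ck for o in swarm
                          if o is not robot and robot.distance_to(o) <= psi]
    avg = {k: sum(c[k] for c in local) / len(local) for k in robot.ck}
    # average ergodicity for the local system (Lambda-weighted differences)
    eps_a = sum((1.0 + k[0] ** 2 + k[1] ** 2) ** -1.5
                * (avg[k] - phik[k]) ** 2 for k in avg)
    return eps_a  # would be fed to SMC to choose the next state

phik = {(0, 0): 1.0, (1, 0): 0.0}
swarm = [Robot((0.1, 0.1), {(0, 0): 1.0, (1, 0): 0.2}),
         Robot((0.2, 0.1), {(0, 0): 1.0, (1, 0): 0.4}),
         Robot((0.9, 0.9), {(0, 0): 1.0, (1, 0): 0.0})]
eps = step(swarm[0], swarm, phik, psi=0.3)   # third robot is out of range
assert eps > 0.0
```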
It is important to note that the flocking behaviours provided by a system of robots generating local ergodic trajectories are not accomplished in the same manner as a more traditional multi-robot task algorithm. The end results approximate the desired behaviours well, however, as shown in Figure 3.8.

3.6.1 Simulator

For our work on ergodic flocking, we extend the ergodic trajectory simulator written by Louis Dressel in the Julia language, named ErgodicControl.jl [31]. Specifically, the ability to create local ergodic trajectories using both stationary and time-elapsed spatial distributions has been added, as well as some new functionality allowing the automation of image and GIF generation. The extended simulator can be found at: https://github.com/Cveitch/ErgodicControl.jl

For all task simulations, seven agents were used. Different time-elapsed distributions were used for each simulation, and a new simulation was loaded for each trial. The same domain d = [0, 1] × [0, 1] was used for all tasks to ensure comparability of results. The total time for each trial was T = 100, but the time step between states can differ between trials. This was done strictly to avoid outpacing the time-elapsed distribution, whose time steps need to keep pace with the agents. It should be noted that the generation of distributions for flocking is a non-trivial task, and will be explored in future work. The distribution generation depends on the domain, coverage region, obstacles, number of agents, and intended subtask (sensing, moving, etc.). We simulate flocking in two steps: (i) aggregation to a fixed location, shown in Section 3.6.2; and (ii) moving as a flock from there following a trajectory, shown in Section 3.6.3.2.
3.6.2 Time-Elapsed Aggregation

Figure 3.6: Aggregation step, 7 agents, T=100, each time step is 0.1, agent speed is 0.1. (a) Time Step t = 0, (b) Time Step t = 20, (c) Time Step t = 50, (d) Time Step t = 70

Aggregation is the ability of a multi-robot system to collect itself into a common region. As per Algorithm 1 (Ergodic Flocking), all agents in the system must aggregate before moving towards a common target. A uniform circular distribution with radius 0.15, centred at the point (0.2, 0.2), was provided to the system. As shown in Figure 3.6, the robots immediately move to cluster within the distribution without exception. Once all robots have aggregated, the uniform distribution is removed and a new time-evolving distribution is created to guide flocking behaviour.

Area coverage of ergodically informed trajectories can be halted at any time by providing the group with an empty distribution over the domain. In application this could be accomplished by giving the distribution and an alert to one robot, which would then broadcast the alert and the new distribution to its neighbours. This trickle-style network would eventually transmit the necessary information to all robots within the distribution. If the desired effect is to aggregate and then stop moving, providing a clustering uniform distribution followed by an empty distribution would be satisfactory.

3.6.3 Flocking by Distribution Tracking

First, flocking in a line (linear flocking) is simulated, and then flocking in a curve (non-linear flocking).

3.6.3.1 Linear Flocking

Figure 3.7 shows a new, time-elapsed distribution overlaid on the domain. Another uniform circular distribution with radius 0.15 was centred at the point (0.2, 0.2). This distribution then evolved with time, linearly increasing its x and y coordinates by 0.01 at each time step. The aggregation and time-elapsed aggregation steps are shown to be possible in Figures 3.6 and 3.7, but these trials only show linear flocking progression.
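The time-elapsed distributions driving these trials can be sketched as a circular uniform density whose centre is moved at each time step. This is an illustrative sketch with parameter names of our own; the radius, start point, and step size match the linear-flocking trial described above, and the same construction generalizes to curved centre paths:

```python
import math

# A circular uniform density of given radius, centred at `center`.
def circular_uniform(center, radius):
    area = math.pi * radius ** 2
    def phi(x):
        inside = math.hypot(x[0] - center[0], x[1] - center[1]) <= radius
        return 1.0 / area if inside else 0.0
    return phi

# Linear flocking: centre starts at (0.2, 0.2) and gains 0.01 in x and y
# at each time step, as in the trial above.
def linear_center(t, start=(0.2, 0.2), v=0.01):
    return (start[0] + v * t, start[1] + v * t)

phi_t = circular_uniform(linear_center(10), 0.15)  # distribution at t = 10
assert phi_t((0.3, 0.3)) > 0.0 and phi_t((0.9, 0.9)) == 0.0
```

At each simulation step, the agents' ergodic tracking is re-informed by the distribution for the current t, which is what pulls the flock along the moving centre.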
In order to be useful for swarm tasks, it needs to be shown that non-linear movement is also possible with this method.

Figure 3.7: Time-Elapsed Aggregation step, 7 agents, T=100, each time step is 0.1, agent speed is 0.1, distribution speed is 0.11. (a) Time Step t = 0, (b) Time Step t = 15, (c) Time Step t = 30, (d) Time Step t = 50

3.6.3.2 Non-Linear Flocking

Figure 3.8 shows how non-linear flocking behaviour provided by local ergodic trajectories can be influenced with a time-evolving distribution. Agents were first aggregated to a static uniform distribution with mean (0.4, 0.4) and a radius of 0.18. A time-evolving distribution with the same radius and initial mean was then provided to the agents, which then moved counter-clockwise around the domain. Each robot manages to remain in approximate alignment with all other agents.

Figure 3.8: Local ergodic trajectories with respect to a time-evolving uniform distribution. 7 agents, T=100, each time step is 0.15, agent speed is 0.1, distribution speed is 0.11. (a) Time Step t = 15, (b) Time Step t = 35, (c) Time Step t = 65, (d) Time Step t = 85

Ergodic Flocking can be accomplished in the ErgodicControl.jl simulator with agents taking advantage of LTC. Figure 3.8 clearly shows that non-linear flocking is possible, and a physical implementation of this algorithm is presented in the next chapter.

Chapter 4

Prototype Implementation

This chapter provides evidence, via a prototype, that ergodic flocking is possible in a physical setting. To the best of our knowledge, distributed swarm ergodicity has previously been shown only in simulated environments. In order to provide supporting evidence for real-world ergodic flocking, a prototype was designed, and four robots were used in an experiment intended to closely approximate the simulation results. A description of the hardware and dynamics of the robots involved is presented.
We begin with an investigation of the advantages and disadvantages of commercially available robotic platforms, followed by a discussion of why we ultimately built the prototype in-house. The hardware components of the prototype are discussed next, with explanations of why each was selected and how they are calibrated. The method of distance-only navigation used by the prototype is described in the Navigation section, followed by an explanation of the received signal strength indication of the wireless packets used to extract distance. An algorithm for how multiple robots implementing distance-only navigation coordinate with one another is presented in the Coordination section, as are the control dynamics of the prototype. The Ergodic Flocking Implementation section describes how the prototype system is used in a series of ergodic flocking experiments with parameters similar to the simulations in the Ergodic Flocking chapter.

4.1 Commercially Available Platforms

Commercially available robots fall into one of three categories [40]: industrial, academic, or entertainment. There are no suitable industrial platforms available, since this work focuses on the flocking of robotic swarms and there are no commercially available industrial swarm robots yet. A collection of academic and entertainment-focused robots was investigated, but none was found suitable for ergodic flocking, as each platform ultimately failed to meet all of the necessary criteria. The criteria for a platform to be usable in the following ergodic flocking implementation are:

• Affordability: since multiple robots were needed in the experiment, each robot had to cost less than $150.00 CAD;

• Communication: each robot must transmit and receive 80-byte packets (trajectory coefficients consist of a 4 × 4 × 4 matrix of bytes).
This 80-byte restriction is to avoid network complexity;

• Mobility: robots must be mobile and capable of navigating an environment;

• Minimally sensored: equipped with as few sensors as necessary to achieve a specific task. This is required, as the intent of this work is to be extended to a swarm of simple, low-cost, low-power robots; and,

• Reprogrammable: the robot must be easily reprogrammed. The amount of debugging required in computer hardware work meant that the robots should take as little time as possible to reprogram.

Table 4.1: Cost Comparison of Commercial Swarm Platforms

Platform  | Sensors                                            | Cost ($ CAD)
AERobot   | Distance, Light                                    | 20
Alice     | Camera, Distance                                   | 120
AMiR      | Bump, Infrared (IR) Communication                  | 50
Cellulo   | Pattern-sensing camera, capacitive touch           | NA
Colias-4  | Bump, IR Communication, IR Proximity               | 360
E-puck    | Distance, Camera, Inertial Measurement Unit (IMU)  | 1070
Kilobot   | Distance, Light                                    | 110
R-One     | Bearing, Bump, IMU, Light                          | 250
S-Bot     | Humidity, IMU, IR, Light, Temperature              | NA

Table 4.1 lists the robotic platforms investigated for this work. Of the swarm robots surveyed, these nine platforms come closest to meeting the system requirements. All prices are given in Canadian dollars, converted at the time of platform choice (Winter 2017).

Though inexpensive, the Harvard Kilobot and AERobot platforms [21, 41] are incapable of transmitting the 80-byte packets required to communicate trajectory coefficients. Their IR sensors are strictly for detecting ambient light and distance to other swarm robots and environmental obstacles. In comparison to the Kilobot and AERobot, the S-Bot by École Polytechnique Fédérale de Lausanne (EPFL) [42] meets all implementation criteria, but all attempts to price the robot failed. Given the complexity and engineering of the S-Bot, it is very likely outside the price restrictions of this work.
EPFL's Alice platform [43] met all criteria (its cost was over budget, but within acceptable limits), but it is no longer in production, and at the time only two could be found via secondary resale. Similarly, the Cellulo [44] meets almost all requirements but was not commercially available at the time; since Cellulo is not open source, it also fails the reprogrammability requirement.

Fully open-sourced and reprogrammable, the Colias-4 [45] by the University of Lincoln is above the affordability limit if bought commercially. The Colias-4 could still have been used in this work, as the PCB design is provided on the developers' website to allow researchers to cut their own board and build their own robot; however, the time and cost involved in outsourcing the PCB manufacturing proved prohibitive. The AMiR, by Islamic Azad University, is capable of IR communication and is open source, with free PCB designs available [46], but its documentation is unclear on whether the IR transmitters could send 80-byte bursts without significant modification, and attempts to contact the researchers were unsuccessful. The R-One by Rice University [47] and the E-Puck, also developed by EPFL [40], are cost prohibitive for this thesis and are not minimally sensored.

Given the lack of suitable commercially available robots, the decision was made to build a new platform.

4.2 Prototype

The Robotic Prototype for Ergodic RSSI-informed Trajectories (RPERT), shown in Fig. 4.1 and described in this section, is a low-cost, low-power, minimally sensored prototype designed and developed by the Distributed and Embedded Systems Lab at UNBC. The RPERT is used to show that it is possible for a minimally sensored collection of robots to perform ergodic flocking behaviour in a real-world setting.
The minimally sensored restriction is important for two reasons:

• Fewer sensor peripherals allow the power requirement for each RPERT to be low. This extends battery life, and allows the size of the robots to be reduced, since the power supply can be smaller than would normally be required for a platform with a full sensor suite [23]; and,

• Each minimally sensored robot has fewer points of failure. In a swarm of dozens or hundreds of robots, fewer sensors overall implies less downtime for each robot. Additionally, many linked peripherals may stop working if one sensor goes down [15].

Figure 4.1: Robotic Prototype for Ergodic RSSI-informed Trajectories (RPERT): (a) external view; (b) internal view

These points informed all design decisions on the RPERT. If ergodic flocking can be achieved with a minimally sensored robot swarm, then the same problem can be overcome by a collection of robots with a more complex sensor suite. Swarms are predominantly very simple at the member level, and showing that a minimally sensored homogeneous collection of robots can ergodically flock is evidence that any sufficiently complex collection of robots can [6]. The aim of this work is to provide an algorithm for the general case.

4.2.1 Hardware

Hardware selection for the RPERT is intended to keep each robot minimally sensored. As such, only the following components are necessary:

• A microcontroller to control robot behaviour and all peripheral hardware;

• Motors for mobility;

• A mechanism to detect other robots to avoid collision;

• An IMU, to allow the RPERT to know how it is situated inside its environment; and,

• A mechanism to communicate trajectory coefficients to neighbouring RPERTs.

The Atmel SAM4S Xplained Pro development board is the computer running each RPERT. The development board contains the SAM4SD32C, a microcontroller built around the ARM Cortex-M4 core, a 32-bit RISC ARM processor [48].
Though more powerful than necessary, this microcontroller was chosen due to our familiarity with it. This choice was made in order to speed prototype construction, as programming the RPERT required extensive assembly code. The choice of microcontroller does not change the scope of this project, since any code written for the prototype can be run on a smaller Atmel board with only minor cosmetic code modification. This is because Atmel Studio, the IDE supplied by Microchip, provides a collection of embedded libraries for use with all Microchip/Atmel controllers. These libraries are collectively called the Advanced Software Framework (ASF). The ASF offers a hardware abstraction and a series of simple drivers for pin access, in an attempt to reduce the amount of assembly code required.

Using the pin drivers supplied by the ASF, two DG01D 4.5 V DC motor gearboxes provide mobility to the robot. Each motor runs at 140 RPM, and both are connected to an RV8833 DC motor driver. This allows the DC motors to be controlled by the microcontroller pins, and to move bi-directionally. The RV8833 was chosen for its low voltage capacity: since each motor is 4.5 V, it can be powered by the SAM4S vcc pin. Because the motors are powered directly from the board, care is taken to mitigate motor current draw that could potentially damage the SAM4S. The RV8833 has built-in current limiting, but resistors were added to the circuit as an additional safeguard. As the motors and all other peripherals are not powered by a secondary power source, low-power sensors are necessary to ensure the SAM4S does not draw too much current and reset.

The Bosch BNO055 IMU Xplained Pro daughter board [49] runs on 2.4–3.6 V and has a low-power mode. In this mode, if the robot has not moved along an axis for a default of 5 seconds, the sensor is put to sleep; this value has been modified to 1 second to further conserve power.
The BNO055 is inserted into EXT1 of the SAM4S Xplained Pro board. It contains a 16-bit gyroscope, a 14-bit accelerometer, and a magnetometer, which are collectively used to determine the heading of a RPERT.

Each RPERT communicates ergodic trajectory coefficients with a wireless module: the Digi XBee Pro 802.15.4 [50]. This module is a low-cost, low-power device running on the IEEE 802.15.4 technical standard, though this protocol specifies only the physical and MAC layers. These layers are sufficient, as each RPERT communicates coefficients by broadcasting to all other modules in range. The module operates in the 2.4 GHz frequency band, which WiFi can also operate on. Given this shared frequency, channels have been selected to mitigate collision: channels 0x13 (2.415 GHz) and 0x19 (2.475 GHz) are used for this reason. Each RPERT is given a unique Media Access Control (MAC) address identifier defined at startup. This address can be altered later through the operating system if required. This wireless module was chosen for its very low power requirements and the relative simplicity of driver-writing. This simplicity stems from the Application Program Interface (API) mode that the XBee Pro can be configured for: API commands can be sent by the drivers directly to the XBee controller, allowing the specific implementation details of the XBee to be abstracted.

With the minimal sensors covered, two sensors were added to the RPERT to make experimentation and debugging easier. First, each RPERT is equipped with a single infrared optical sensor, used strictly to detect physical collisions during experimentation. Without the sensor, RPERTs would require human intervention if they collided with a physical obstacle or another RPERT; this intervention adds a degree of uncertainty and would be difficult to analyze appropriately.
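The 80-byte coefficient packets described above can be illustrated with a short sketch. The thesis fixes only the total packet size and the 4 × 4 × 4 byte coefficient matrix; the field order, the float encoding of the appended (x, y) coordinates, and the zero padding below are illustrative assumptions, not the prototype's actual layout:

```python
import struct

def pack_payload(coeffs, x, y):
    """Flatten a 4x4x4 byte coefficient matrix and append (x, y) coordinates.

    Hypothetical layout: 64 coefficient bytes, two little-endian 4-byte
    floats for position, zero-padded to the 80-byte packet size.
    """
    flat = bytes(c for plane in coeffs for row in plane for c in row)
    assert len(flat) == 64          # 4 * 4 * 4 coefficient bytes
    payload = flat + struct.pack("<ff", x, y)
    return payload.ljust(80, b"\x00")

def unpack_payload(payload):
    """Recover the coefficient matrix and coordinates from a payload."""
    coeffs = [[[payload[16 * i + 4 * j + k] for k in range(4)]
               for j in range(4)] for i in range(4)]
    x, y = struct.unpack("<ff", payload[64:72])
    return coeffs, x, y
```

In the actual prototype, a payload like this would be handed to the XBee Pro via its API-mode driver for broadcast; the framing added by the module itself is outside this sketch.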
The second addition is a red-green-blue LED bulb on either side of the top of the robot. These are for user communication only, supplying debugging and error information as well as robot-state information. Table 4.2 lists all error and status codes.

COLOUR    Left LED (Error)       Right LED (Status)
WHITE     Initializing           Initializing
YELLOW    Cannot read SD Card    Transmitting
RED       Kernel Panic           Orientation: Q1
CYAN      No instructions        Orientation: Q2
PURPLE    Battery Low            Orientation: Q3
BLUE      Null                   Orientation: Q4
GREEN     Null                   Achieved Goal

Table 4.2: RPERT State Information by LED Colour

• Each LED is white on power-on, as the robot initializes;

• Q1–Q4 directly correspond to quadrants one through four of the unit circle, as measured by the IMU; and,

• When the SD card program is inserted into the SAM4S Xplained Pro, the program must toggle the "has instructions" flag in the OS. If this flag is not toggled, the left-hand LED will be cyan. If no SD card is inserted when the robot is powered on, the LED will be yellow.

Figure 4.2: CAD Drawings of RPERT Chassis: (a) bottom plate; (b) top plate

Each RPERT has a 3D-printed chassis that was designed in OpenSCAD, as shown in Fig. 4.2. OpenSCAD is open-source 3D rendering software that allows for precise manipulation and creation of 3D objects. Each robot was printed on a Tinkerine Pro 3D printer, and each chassis is 17 cm in diameter by 12 cm tall. One 2200 mAh cellphone charger is inserted into a canal in the bottom of each RPERT as a power supply, allowing the robot to be charged by micro USB. A total of four RPERTs were constructed for this work.

4.2.2 Calibration

Due to the nature of the sensors that make up IMUs, they require an environmental calibration in order to measure accurate values [49]. To calibrate the accelerometer, the Bosch BNO055 requires that it be placed in six different stable positions for two seconds apiece, with slow movement between each position.
During this movement the robot must be perpendicular to the X and Y axes at least once each. Similarly, to calibrate the magnetometer the robot must move in a figure-eight pattern until the CALIB STAT register registers calibration [49]. Experimentally, the time it takes to calibrate the magnetometer is 10 seconds on average at the RPERT's lowest speed (5 cm/s). To calibrate the gyroscope, the IMU must be held in a single stable position for 2 seconds.

When powered on, each RPERT goes through initialization and calibration steps to ensure all sensors are working properly. The steps are:

1. Plug the USB into the power supply;

2. The SAM4S, LEDs, motors, and XBee Pro are initialized, in that order;

3. The BNO055 is set to CONFIG MODE by default; each sensor in the IMU is configured separately;

4. Calibrate magnetometer: the RPERT moves in an hourglass pattern at medium speed, polling bits 0/1 of the CALIB STAT register on the IMU. If 3 is returned, the magnetometer is calibrated. If 3 is not returned, repeat;

5. Calibrate gyroscope: the RPERT remains still and polls bits 2/3 of the CALIB STAT register on the IMU. If 3 is returned, the gyroscope is calibrated. If 3 is not returned, repeat;

6. Calibrate accelerometer: the RPERT moves in a square pattern, pausing for 2 seconds at each vertex. At the end of the square, the RPERT turns 90 degrees, moves forward for 1 second with a 2-second pause, then moves backward for 1 second with a 2-second pause. Poll bits 2 and 3 of the CALIB STAT register on the IMU; if 3 is returned, the accelerometer is calibrated. If 3 is not returned, repeat; and,

7. The BNO055 is set to NDOF MODE, setting the IMU to nine-degrees-of-freedom mode. This gives access to absolute orientation data from the accelerometer, gyroscope, and magnetometer.

4.2.3 Navigation

Since the RPERT is minimally sensored, it is difficult to have it accurately navigate its environment [51].
In order to adequately search a domain without a lawnmower algorithm, each robot in the swarm must be capable of navigating its domain. To navigate accurately, the following minimum information is required [51–53]: a notion of proximity to neighbouring robots, where proximity can be absolute or relative to neighbours, and a bearing.

Each RPERT makes a call to its IMU to retrieve the robot's heading relative to magnetic north. The IMU is set to Nine Degrees of Freedom (NDOF) mode in the configuration step, which handles much of the computation onboard. Values are returned in Euler angles, which describe a 3D angle as a three-tuple (φ, θ, ω), where φ is the rotation around the x axis, θ is the rotation around the y axis, and ω is the rotation around the z axis. As each RPERT is not oriented perfectly parallel to the xy plane, a tilt compensation is made by the onboard microcontroller prior to returning the heading. A conversion is then used to turn these values into conventional compass radians for navigational purposes. These angles are represented internally as 1 radian = 900 LSB (least significant bits). The 16-bit unsigned integer value returned is in the range 0–5759, so to obtain a radian value we divide the returned value by 900 (for example, a raw reading of 1800 LSB corresponds to 1800/900 = 2 radians).

4.2.3.1 Received Signal Strength Indication (RSSI)

Each RPERT has a heading, but this alone is not enough to know its position in the domain. If each robot always started in the same location, the robots could navigate a room as long as they were aware of its dimensions: they would move forward knowing exactly how far their wheels are capable of taking them in a single quantum. Navigation in this way relies on hard-coded motor timings, which is inherently inaccurate since it does not consider the different surfaces the wheels move along, wear on mechanical parts, motor slippage, etc. [51].
If a robot navigating in this way ends its movement in a position that is not exactly where it believes itself to be, it cannot correct its course. The error thus propagates through the system and becomes worse with each successive quantum. It is possible to mitigate this if the robot is capable of measuring itself against a known landmark. As each RPERT has only a wireless device and an IMU, this is difficult, but possible. The IMU cannot reliably be used in the long term to correct propagating error, but the onboard wireless module can be leveraged to extract an approximation of the distance to another external wireless module [52, 54].

Each time a wireless module broadcasts a packet of information, there is an inherent signal strength, or power level, upon being received by an accepting wireless module. This RSSI ranges from -100 dBm to 0 dBm, where -100 is a very weak signal and 0 a very strong one. The RSSI values received by an XBee Pro are reported in hexadecimal, and so range from -0x64 to 0. Figure 4.3 shows the data from an experiment we conducted at UNBC whose results suggest the relationship between raw RSSI values and distance. Peaks and valleys in RSSI value typically correspond to some form of signal agitation, collision, bounce, or loss.

Figure 4.3: Relationship Between RSSI and Distance

RSSI is used as an indication of link quality between two nodes but, beneficially to this work, it shares a nonlinear relationship with physical distance [55]. This relationship (known as the log-distance path-loss model) can be modelled as

RSSI_m = RSSI_d0 − 10ν log10(d/d0),    (4.1)

where RSSI_m is the measured RSSI value, d is the distance in cm between the transmitter and the receiver, and RSSI_d0 is a calibration signal strength measured when a transmitter and receiver are positioned d0 apart. A common choice for d0 is 100 cm [52].
Here ν is a propagation constant, described by

ν = (RSSI_dc − RSSI_d0) / (10 log10(d0/dc)),    (4.2)

where RSSI_dc is an RSSI value measured dc cm away from a calibration target. This value is typically chosen as 50 cm, and is the default for this work. ν is called a path-loss exponent, and accounts for the medium the signal is moving through. Signal strength is directly affected by the medium, and different domains may give wildly different values if calibration does not take place. Once the calibration values are measured and recorded, distance can be found by rearranging Eq. 4.1 so that

d = d0 · 10^((RSSI_d0 − RSSI_m) / (10ν)).    (4.3)

4.2.3.2 Data Noise

This distance approximation, even when calibrated, is often very inaccurate when taken with raw RSSI data. RSSI measurements are imprecise and prone to data noise; Figures 4.3 and 4.4 give an example of this instability. Raw RSSI data was fed into the distance approximation equation to create an RSSI-to-distance heat map of room 10-2088 (MACE) at UNBC. To generate these maps, distance was measured from the furthest corner of each box to the wireless module at the centre of the room. The experiments were done in late evening during the summer, to mitigate collision with student WiFi traffic. The results clearly show that converting raw RSSI to distance is very inaccurate.

Figure 4.4: Actual Distance Heat Map (a) vs Raw RSSI Heat Map (b)

In order to mitigate this data noise for the RPERTs, all RSSI values are processed first through a low-pass filter, then through a Kalman filter as in [54].

4.2.3.3 Filtering

Initially, each piece of data is run through a low-pass software filter designed to impede high-frequency signals. This removes spikes from incoming data, making it easier for a Kalman filter to predict values afterwards. A Kalman filter is an iterative filter that, based upon prior knowledge and current values, estimates what the next value will be.
As RSSI is the only variable of interest, a scalar (linear) Kalman filter is used. It takes an initial sensor value RSSI_I, and at each step processes a measured sensor value RSSI_M. An initial variance VAR is required; it does not have to be accurate, since it only expresses an initial confidence in how far RSSI_I may lie from the true RSSI value. We often choose an artificially high initial variance, as the Kalman filter will modify it at each step. Also required is a sensor variance SENSVAR: the inherent measurement variance of the XBee Pro itself. Finally, the filter requires the process variance RSSIVAR, the variance inherent to the RSSI signal itself, which was found experimentally.

Once the initial values are found, the Kalman filter begins a cycle of prediction and update. In the prediction step the filter makes a new estimate; in the update step, the prior estimate and a new measured value are scaled by weights that depend on their variances, then combined to create a new estimate. Each iteration, the Kalman filter will:

1. Receive a new RSSI measurement RSSI_M;

2. Begin the prediction step:

(a) Set RSSI_I = RSSI_I + u, where u is the expected movement in the measurement (often 0);

(b) Set VAR = VAR + RSSIVAR;

3. Begin the update step:

(a) Set RSSI_I = (VAR · RSSI_M + SENSVAR · RSSI_I) / (VAR + SENSVAR);

(b) Set VAR = 1 / (1/VAR + 1/SENSVAR).

For this work, RSSI_I was set to the calibration value RSSI_dc. VAR was initially set to 25^2 = 625, SENSVAR was found to be approximately 5 [50], and RSSIVAR was experimentally found to be approximately 8.

A Kalman filter is a linear filter, and RSSI is not a linear system. The extended Kalman filter used in this work linearizes the prediction and the measured value about their means; other than this initial preparation of the data, the extended filter performs as explained above.
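The conversion and filtering pipeline of Sections 4.2.3.1–4.2.3.3 can be sketched in a few lines. The calibration RSSI values and the low-pass smoothing constant below are illustrative assumptions; the variances (VAR = 625, SENSVAR ≈ 5, RSSIVAR ≈ 8) are the values reported above, and the filter follows the scalar predict/update cycle rather than the extended linearization:

```python
import math

# Illustrative calibration values; real values come from the calibration step.
RSSI_D0, D0 = -55.0, 100.0   # RSSI measured at d0 = 100 cm
RSSI_DC, DC = -49.0, 50.0    # RSSI measured at dc = 50 cm

# Path-loss exponent, Eq. (4.2)
NU = (RSSI_DC - RSSI_D0) / (10 * math.log10(D0 / DC))

def rssi_to_distance(rssi):
    """Invert the log-distance model, Eq. (4.3); returns distance in cm."""
    return D0 * 10 ** ((RSSI_D0 - rssi) / (10 * NU))

def low_pass(prev, new, alpha=0.25):
    """Simple exponential smoothing; alpha is an assumed tuning value."""
    return prev + alpha * (new - prev)

class ScalarKalman:
    """The predict/update cycle described in Section 4.2.3.3."""
    def __init__(self, rssi_i, var=625.0, sens_var=5.0, rssi_var=8.0):
        self.est, self.var = rssi_i, var
        self.sens_var, self.rssi_var = sens_var, rssi_var

    def step(self, measured, u=0.0):
        # Prediction: drift by expected movement u, inflate variance.
        self.est += u
        self.var += self.rssi_var
        # Update: variance-weighted combination of estimate and measurement.
        self.est = ((self.var * measured + self.sens_var * self.est)
                    / (self.var + self.sens_var))
        self.var = 1.0 / (1.0 / self.var + 1.0 / self.sens_var)
        return self.est
```

With these calibration values, `rssi_to_distance(RSSI_D0)` returns 100 cm and `rssi_to_distance(RSSI_DC)` returns 50 cm, as the model requires.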
The benefit of the extended Kalman filter lies not only in its ability to more accurately predict the nonlinear RSSI values, but also in being able to better adapt to noise generated by sensor movement [56].

4.2.4 Coordination

With the low-pass and Kalman filters smoothing RSSI values, the distance values the RPERTs receive are usable. The values do not perfectly correspond to a robot's true distance from a transmitter, but they are close enough to allow navigation: the RPERTs have a notion of distance from one another. This allows each RPERT to avoid collisions, but it does not fully solve the problem of navigating a domain; formation control using only inter-agent distance is not trivial [57]. Beneficially, as each RPERT is already communicating its trajectory coefficients wirelessly, each can append its current position coordinates to the end of each packet. To this end, the RPERTs elect one from amongst themselves to act as an RSSI Anchor (RA): a robot that does not move, and broadcasts its position coordinates (but not trajectory coefficients) to all neighbouring RPERTs.

An election is held in order to choose an RA. The algorithm to elect an RA from a group of RPERTs follows:

• Broadcast personal MAC address;

• Compare each incoming MAC address;

• If an incoming MAC address is lower than the personal MAC address, set RA_anchor = false;

• If an incoming MAC address is higher than the personal MAC address, idle; and,

• Stop broadcasting after n seconds.

RA election happens immediately after calibration. If an RA dies due to power loss or some other system failure, it would be prudent to immediately elect another RA; this functionality is not currently written, and is intended for future work. In addition, the problem of RPERTs being outside of one another's local neighbourhood is not covered in this work. In that case, leader election becomes more difficult, and work has been done on this problem.
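The election procedure above can be sketched as follows. It follows the bulleted rules (a robot withdraws as soon as it hears a MAC address lower than its own, so the lowest address heard during the broadcast window wins); the broadcast and timing mechanics are abstracted away, and the function name is ours:

```python
def elect_anchor(my_mac, heard_macs):
    """Return True if this RPERT should act as the RSSI Anchor (RA).

    my_mac:      this robot's MAC address (comparable integer)
    heard_macs:  MAC addresses received during the broadcast window
    """
    for mac in heard_macs:
        if mac < my_mac:
            return False   # a lower address exists; that robot takes the role
    return True            # no lower address heard: act as the anchor
```

Run on every robot over the same broadcast window, exactly one robot (the one with the lowest MAC address) returns True, becomes stationary, and begins broadcasting its coordinates.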
Multiple leader interactions are covered in [58, 59].

4.2.5 Coordinate Finding

Each robot receives a positional coordinate packet from the RA, from which it can also extract an RSSI value. This information lets the RPERT know both the coordinates of the RA and an estimate of its distance from it. This introduces a new problem, illustrated in Fig. 4.5: distance to a point does not inform a robot of its location on the circle centred on the RA. A robot measuring distance d can be located at any coordinates along the circle of radius d centred on the RA.

Figure 4.5: Equivalent Distance of Robots on Unit Circle

Each RPERT knows its absolute initial fixed coordinates. It receives its heading from its IMU, and is assumed to travel at constant speed. This would be enough to navigate naively by dead reckoning. This method of navigation involves motor timings: for example, move both motors for one second, then turn the right motor for one and a half seconds, to get forward movement followed by a turn measurable in radians according to the robot dynamics. Dead reckoning introduces an error that quickly propagates, as discussed above. To offset propagated error, the Euclidean distance Ed is computed between the RPERT's current coordinates and the RA's.
A second linear Kalman filter is used, with the Euclidean distance from the RPERT's current (x, y) coordinates to the RA as the initial sensor value, and the distance Rd generated from RSSI to the RA as the measured value. As before, this provides an updated value Δd that takes both Ed and Rd into account. We then add or subtract Δd to each coordinate as shown in Algorithm 2.

Algorithm 2: RPERT location-finding and navigation algorithm

    Broadcast M
    ANCHOR = true
    for n in N do
        receive Mn
        if Mn < M then
            ANCHOR = false
        end
    end
    Compute (x0, y0)
    if ANCHOR = false then
        while true do
            Compute (x1, y1)    // ergodic flocking algorithm
            Move to (x1, y1)
            Receive (xa, ya)
            Compute RSSIa
            Low-pass filter RSSIa
            Extended Kalman filter RSSIa
            Compute Rd
            Compute Ed
            Compute Δd
            if Ed > Rd then
                (x1, y1) = (x1 − Δd, y1 − Δd)
            else
                (x1, y1) = (x1 + Δd, y1 + Δd)
            end
        end
    else
        Broadcast (x0, y0)
    end

In Algorithm 2: M is the MAC address of the RPERT; Mn is the MAC address of another RPERT within ergodicity threshold range; ANCHOR is a flag toggled when a RPERT is elected as the RSSI anchor; (x0, y0) are the initial coordinates; (x1, y1) is the next state, computed using the ergodic flocking algorithm in Chapter 3; (xa, ya) is the packet sent by the anchor; RSSIa is the RSSI value extracted from that packet; Rd is the distance value converted from RSSIa; Ed is the Euclidean distance between (xa, ya) and (x1, y1); and Δd is the Kalman-filtered difference between Ed and Rd.

This method is a general approximation: it trades precision for simplicity and ease of calculation. This is necessary, as swarm robots do not tend to have powerful onboard microcontrollers. Precision is not required for the ergodic flocking implemented in this work, so this trade is acceptable.

4.2.6 Control

Each RPERT has two wheels that are driven independently. Each wheel can be driven forward or backward, allowing the RPERT to pivot in place.
Each wheel has an orientation of π/2 with respect to its axle. The mathematical models used to control the robot's navigation are purposely uncomplicated, and are described below [60].

Figure 4.6: Differential Drive Dynamics

S = (sr + sl) / 2.    (4.4)

This is the average speed of the robot, where sr is the speed of the right wheel and sl the speed of the left. The robot's intended angle of rotation is described as

θ = (sr − sl) / b + θ0,    (4.5)

where b = 14 cm is the line segment connecting the centres of the wheels (the axle), and θ0 is the initial heading of the robot. The coordinates of a robot can then be described as

x = S cos(θ) + x0,    (4.6)

y = S sin(θ) + y0,    (4.7)

where (x, y) are the intended x and y coordinates of the robot, and x0 and y0 are the robot's initial x and y coordinates.

Each RPERT weighs approximately 270 g, and as such its inertia is small enough that changes in speed are treated as instantaneous in this work. It is for this reason that the RPERTs are treated as having first-order dynamics, rather than second-order: acceleration is not taken into account when generating an ergodic trajectory.

4.3 Operating System

MINIOS [61] is the operating system installed on each RPERT. MINIOS is an operating system designed for academic use with the SAM4S Xplained Pro. It allows a program to be loaded onto a micro-SD card rather than having all RPERT behaviours hard-coded and sitting in kernel space. For this work, MINIOS loads the rpert logic program into memory on startup. This program gives the robot its domain specifications, navigational and dynamics logic, and the distributions to be followed for ergodicity.
MINIOS has been extended for this work; the following modifications were made to the kernel:

• Drivers and corresponding system calls: motor, IMU, RGB LED, IR optical sensor, XBee Pro;

• Low-pass filtering, Kalman filtering, IMU, RGB LED, IR optical sensor, XBee Pro, RSSI, and ergodicity functionality; and,

• Minor scheduler modifications regarding sleeping processes.

4.4 Ergodic Flocking Implementation

The RPERT platform was used in a series of three trials intended to replicate the ergodic flocking simulation results of Chapter 3. The three trials were:

1. The aggregation of RPERTs using ergodicity;

2. Linear ergodic flocking of RPERTs from one corner of a rectangular domain to the opposite corner; and,

3. Non-linear ergodic flocking of RPERTs in a square pattern around the domain, per the third simulation in Chapter 3.

The first two trials were successful, and approximate the simulation results. The third trial proved to be impossible, though the limiting factor was the RPERT hardware rather than the ergodic flocking algorithm.

4.4.1 Implementation Parameters

For all three trials, the same domain and ergodic parameters were used, with the exception of the initial RPERT starting coordinates and the distribution over the domain. The parameter values are shown in Table 4.3. The domain shown in Figure 4.7 was 160 cm × 160 cm.
Figure 4.7: Experimental Domain for Trials One and Two

            T-1                              T-2                           T-3
d           R(160)                           R(160)                        R(160)
φ           C(15, 15)                        Ci(15, 15)                    Ci,j(15, 15)
Ip(x, y)    (65, 30), (50, 135), (115, 95)   (5, 35), (25, 25), (30, 5)    NA
ψ           60                               60                            60
k           5                                5                             3

Table 4.3: Implementation Parameters

The parameters and variables for the trials are defined as follows:

• T-1, T-2, T-3 correspond to the trials;

• d is the domain boundary of the implementation;

• R(n) is an n × n rectangle (in this implementation, measured in cm);

• φ is the distribution over d;

• C(x, y) is the uniform circular distribution centred at (x, y);

• Ci(x, y) is a set of uniform circular distributions centred at (x + i, y + i), where i = 0, 1, 2, ...;

• Ci,j(x, y) is a set of uniform circular distributions centred at (x + i, y + j), where i, j = 0, 1, 2, ...;

• Ip(x, y) is the initial coordinates of RPERT p on d;

• ψ is the threshold for local coefficient generation: the maximum distance at which neighbours are considered for coefficient sharing (in this implementation, measured in cm); and,

• k is a multi-index along the x, y coordinates correlating to the number of Fourier coefficients; the higher k is, the better the approximation.

Ergodicity and trajectory generation in Chapter 3 rely on continuity of the domain and of trajectories over the domain. The domain and its distribution φ were discretized into 5 cm × 5 cm squares in order to facilitate quicker, more efficient computation of trajectories [27].

The collision-detecting IR sensors on the RPERTs were turned off during all three trials. The combined current draw of the motors, wireless device, IR sensors, and IMU was too much for the single battery powering each RPERT, which caused the RPERTs to reset intermittently. This had unintended consequences for Trial Three, discussed below.

4.4.2 Results

Trials one and two were successful in emulating ergodic aggregation and flocking behaviours. For each trial, the ideal trajectories generated by the simulator are supplied in contrast to the implemented robot trials.
These comparisons show that, though the robots do manage to achieve the required behaviours, the trajectories formed are not ideal in most cases. The trajectory discrepancies are due to the imprecise nature of dead-reckoning navigation and RSSI correction: ergodic trajectories at each time step are informed by the RPERT's current (often very approximate) coordinates.

4.4.2.1 Trial One

In trial one, the RPERTs aggregated into the lower left corner of the domain, as shown in Fig. 4.8f. Rather than have the RPERTs continue to cover the distribution once they had arrived, they were instructed to stop once they were aware that they were fully within the distribution. The green and yellow RPERTs (identifiers Marianna and Jean-Baptiste, respectively) turn and quickly find the distribution as intended. The red RPERT (identifier Horatio) appears to have received wildly inaccurate RSSI correction data, pulling it first towards the Y axis, then in the opposite direction. Eventually the Kalman filter manages to correct its coordinates, and the RPERT is able to find the distribution, as shown in Figure 4.8. A video of trial one can be found at: https://youtu.be/tyY35RD3ZxQ.

4.4.2.2 Trial Two

In trial two, the three RPERTs track a time-elapsed uniform circle distribution. Marianna and Horatio both physically end up outside of the domain due to the propagated error from dead-reckoning navigation. This is a limitation of the hardware. Even having moved outside the domain, both RPERTs still end up flocking with Jean-Baptiste from the lower left corner of the domain to the top right, as required. A video of trial two can be found at: https://youtu.be/YaW7KD5PBws.

4.4.2.3 Trial Three

In trial three, the RPERTs were to flock in a rectangular formation around the domain. This kind of nonlinear movement caused collision scenarios between robots, as they had to ensure they were covering the time-elapsed distribution as it changed directions.
Ergodic flocking does not replicate Reynolds' three rules of flocking [2], but rather emulates the desired behaviour: due to ergodicity-informed trajectory generation, the swarm moves toward a shared heading without dispersing. As long as the distance traveled by a robot in a single time interval is strictly less than the local neighbour threshold distance used in local coefficient generation, the Spectral Multiscale Coverage [14] control law ensures that no two robots generate the same coordinates at any one time step in their trajectories. In the same vein, each time interval is equal to the amount of time it takes a robot to move fully out of the location it occupied at the end of the last time interval. In this way, ergodic flocking avoids collision.

Though the ergodic flocking algorithm does not allow the agents involved to collide, this assumes that the agents are precisely where they believe themselves to be. To compensate, the IR sensor on each RPERT was intended as a safety backup, so that if agents did end up colliding due to faulty coordinates, they would still recognise that they had collided. Since the IR collision detector was disabled due to the board constantly resetting, rectangular flocking predictably caused the robots to collide with one another. This removed any hope of dead reckoning working as a navigation scheme: even with RSSI correction, in each flocking attempt the RPERTs' internal domain representation was too inaccurate after a collision to be usable. Ultimately the trial was abandoned.
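The no-collision precondition stated above, that per-interval travel must be strictly less than the local-neighbour threshold used in coefficient generation, can be expressed as a simple check. The function name and the sample values below are illustrative:

```python
def step_is_collision_safe(speed_cm_s, dt_s, psi_cm):
    """Check the ergodic-flocking no-collision precondition: the distance
    covered in one time interval must be strictly less than the
    local-neighbour threshold psi used in local coefficient generation."""
    return speed_cm_s * dt_s < psi_cm
```

For example, at the RPERT's lowest speed of 5 cm/s with the trial threshold ψ = 60 cm, any time interval shorter than 12 s satisfies the condition.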
Figure 4.8: Trial One: Aggregation Simulation vs Application. (Panels: simulation at time steps t = 25 and t = 45; implementation graphs at t = 25 and t = 70; implementation at t ≈ 25 and t ≈ 70.)

Figure 4.9: Trial Two: Flocking Simulation vs Application. (Panels: simulation at time steps t = 25 and t = 45; implementation at t = 10, t = 60, t ≈ 25, and t ≈ 70.)

Chapter 5

Conclusions and Future Work

Flocking tasks involve large groups of individuals aggregating and moving to a common target. This behaviour occurs as a consequence of local interactions between agents; schooling fish swim towards the centre of mass of their neighbours and towards the average heading of all neighbouring fish, while avoiding collisions with one another. These local interactions, when induced in a robotic swarm, typically require dedicated algorithmic behaviours.

We have shown that it is possible to emulate some swarm behaviours, specifically aggregation and flocking, with local ergodic trajectories informed by time-evolving uniform distributions. Robots can be aggregated into subsets of a bounded domain by having each robot create an ergodic trajectory that is informed by a static uniform distribution. This thesis provides a working simulation of swarm trajectory generation using only local coefficients, allowing for completely decentralized trajectory generation. In addition, a physical prototype was developed in order to show the viability of ergodic flocking in a real-life application.

In the future, we will explore inducing more swarm behaviours with ergodicity. We believe it is possible to emulate self-assembly and other behaviours with careful choice of distribution.
This would allow any robot swarm implementing local-coefficient ergodicity to emulate arbitrary swarm behaviours simply by changing the domain distribution. A different robot platform will be explored in the future: dead reckoning with RSSI correction was not suitable for non-linear flocking movements, so while we were able to simulate ergodic flocking behaviour, we could not verify it experimentally.

Bibliography

[1] A. Lioni, C. Sauwens, G. Theraulaz, and J.-L. Deneubourg, “Chain formation in Œcophylla longinoda,” Journal of Insect Behavior, vol. 14, no. 5, pp. 679–696, 2001.
[2] C. Reynolds, “Flocks, herds and schools: A distributed behavioral model,” SIGGRAPH: Proc. of the 14th Annual Conference on Computer Graphics and Interactive Techniques, vol. 21, no. 4, pp. 25–34, 1987.
[3] P. Walker, S. A. Amraii, M. Lewis, N. Chakraborty, and K. Sycara, “Control of swarms with multiple leader agents,” 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3567–3572, 2014.
[4] L. Bayindir, “A review of swarm robotics tasks,” Neurocomputing, vol. 172, no. 8, pp. 292–321, 2016.
[5] J. McLurkin, J. Smith, J. Frankel, D. Sotkowitz, D. Blau, and B. Schmidt, “Speaking swarmish: Human-robot interface design for large swarms of autonomous robots,” 2006 AAAI Spring Symposium, pp. 72–75, 2006.
[6] M. Brambilla, E. Ferrante, M. Birattari, and M. Dorigo, “Swarm robotics: A review from the swarm engineering perspective,” Swarm Intelligence, vol. 7, no. 3, pp. 1–41, 2013.
[7] S. Miah and J. Knoll, “Area coverage optimization using heterogeneous robots: Algorithm and implementation,” IEEE Transactions on Instrumentation and Measurement, vol. 67, no. 6, pp. 1380–1388, 2018.
[8] L. M. Miller, “Optimal ergodic control for active search and information acquisition,” PhD dissertation, Northwestern University, 2014.
[9] E. Gelenbe and Y. Cao, “Autonomous search for mines,” European Journal of Operational Research, vol. 108, no. 2, pp. 319–333, 1998.
[10] J. Kennedy and R. C. Eberhart, Swarm Intelligence. Morgan Kaufmann, 2001.
[11] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, 1999.
[12] E. Ferrante, A. E. Turgut, C. Huepe, A. Stranieri, C. Pinciroli, and M. Dorigo, “Self-organized flocking with a mobile robot swarm: a novel motion control method,” Adaptive Behavior, vol. 20, no. 6, pp. 460–477, 2012.
[13] L. A. V. Reyes and H. G. Tanner, “Flocking, formation control, and path following for a group of mobile robots,” IEEE Transactions on Control Systems Technology, vol. 23, no. 4, pp. 1268–1282, 2015.
[14] G. Mathew and I. Mezić, “Metrics for ergodicity and design of ergodic dynamics for multi-agent systems,” Physica D: Nonlinear Phenomena, vol. 240, no. 5, pp. 432–442, 2011.
[15] A. Winfield and J. Nembrini, “Safety in numbers: Fault tolerance in robot swarms,” International Journal of Modelling, Identification and Control, vol. 1, no. 1, pp. 30–37, 2006.
[16] B. A. Karihaloo, K. Zhang, and J. Wang, “Honeybee combs: how the circular cells transform into rounded hexagons,” Journal of the Royal Society Interface, vol. 10, no. 86, pp. 201–302, 2013.
[17] G. F. Young, L. Scardovi, A. Cavagna, I. Giardina, and N. E. Leonard, “Starling flock networks manage uncertainty in consensus at low cost,” PLOS Computational Biology, vol. 9, no. 1, pp. 1–7, 2013.
[18] D. Graff, D. Röhrig, R. Jasper, H. Parzyjegla, G. Mühl, and J. Rabaey, “Operating system support for mobile robot swarms,” Proceedings of the Second International Workshop on the Swarm at the Edge of the Cloud (SWEC), pp. 7–12, 2015.
[19] A. Becker, C. Onyuksel, T. Bretl, and J. McLurkin, “Controlling many differential drive robots with uniform control inputs,” The International Journal of Robotics Research, vol. 33, no. 13, pp. 1626–1644, 2014.
[20] D. Maftuleac, S. K. Lee, S. P. Fekete, A. K. Akash, A. López-Ortiz, and J. McLurkin, “Local policies for efficiently patrolling a triangulated region by a robot swarm,” 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1809–1815, 2015.
[21] M. Rubenstein, A. Cornejo, and R. Nagpal, “Programmable self-assembly in a thousand-robot swarm,” Science, vol. 345, no. 8, pp. 795–799, 2015.
[22] S. Nuske, S. Choudhury, S. Jain, A. Chambers, L. Yoder, S. Scherer, L. Chamberlain, H. Cover, and S. Singh, “Autonomous exploration and motion planning for an unmanned aerial vehicle navigating rivers,” Journal of Field Robotics, vol. 32, no. 8, pp. 1141–1162, 2015.
[23] L. E. Parker, “Reliability and fault tolerance in collective robot systems,” ch. 6, pp. 612–644, Stanford, USA: Jenny Stanford Publishing, 2011.
[24] L. S. Schulman and P. E. Seiden, “Statistical mechanics of a dynamical system based on Conway’s game of life,” Journal of Statistical Physics, vol. 19, no. 3, pp. 293–314, 1978.
[25] A. Ilachinski, Cellular Automata: A Discrete Universe. World Scientific, 2001.
[26] K. E. Petersen, Ergodic Theory. Cambridge University Press, 1990.
[27] L. Dressel and M. J. Kochenderfer, “Tutorial on the generation of ergodic trajectories with projection-based gradient descent,” IET Cyber-Physical Systems: Theory & Applications, 2018.
[28] L. M. Miller, Y. Silverman, M. A. MacIver, and T. D. Murphey, “Ergodic exploration of distributed information,” IEEE Transactions on Robotics, vol. 32, no. 1, pp. 36–52, 2016.
[29] L. M. Miller and T. D. Murphey, “Trajectory optimization for continuous ergodic exploration,” 2013 American Control Conference, pp. 4196–4201, 2013.
[30] M. Vecchio and R. López-Valcarce, “Improving area coverage of wireless sensor networks via controllable mobile nodes: A greedy approach,” Journal of Network and Computer Applications, vol. 48, no. 1, pp. 1–13, 2015.
[31] L. Dressel, “ErgodicControl.jl,” https://github.com/dressel/ErgodicControl.jl, last accessed: 12.03.2019.
[32] L. M. Miller and T. D. Murphey, “Trajectory optimization for continuous ergodic exploration on the motion group SE(2),” IEEE Int. Conf. on Decision and Control, pp. 4517–4522, 2013.
[33] I. Abraham, A. Prabhakar, M. J. Z. Hartmann, and T. D. Murphey, “Ergodic exploration using binary sensing for non-parametric shape estimation,” IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 827–834, 2017.
[34] H. Salman, E. Ayvali, and H. Choset, “Multi-agent ergodic coverage with obstacle avoidance,” Twenty-Seventh International Conference on Automated Planning and Scheduling, pp. 242–249, 2017.
[35] G. Antonello, S. Chiaverini, and A. Marino, “A coordination strategy for multi-robot sampling of dynamic fields,” 2012 IEEE International Conference on Robotics and Automation, pp. 1113–1118, 2012.
[36] A. Hubenko, V. A. Fonoberov, G. Mathew, and I. Mezić, “Multiscale adaptive search,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 41, no. 4, pp. 1076–1087, 2011.
[37] E. Ayvali, H. Salman, and H. Choset, “Ergodic coverage in constrained environments using stochastic trajectory optimization,” IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5204–5210, 2017.
[38] G. De La Torre, K. Flaßkamp, A. Prabhakar, and T. D. Murphey, “Ergodic exploration with stochastic sensor dynamics,” 2016 American Control Conference (ACC), pp. 2971–2976, 2016.
[39] V. Gazi, B. Fidan, L. Marques, and R. Ordonez, “Robot swarms: Dynamics and control,” in Mobile Robots for Dynamic Environments, ch. 9, pp. 1–32, New York, USA: ASME Publishing, 2015.
[40] P. J. S. Goncalves, P. J. D. Torres, C. M. O. Alves, and F. Mondada, “The e-puck, a robot designed for education in engineering,” Proceedings of the 9th Conference on Autonomous Robot Systems and Competitions, vol. 1, no. 1, pp. 59–65, 2009.
[41] M. Rubenstein, B. Cimino, R. Nagpal, and J. Werfel, “AERobot: An affordable one-robot-per-student system for early robotics education,” 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 6107–6113, 2015.
[42] F. Mondada, A. Guignard, M. Bonani, M. Lauria, and D. Floreano, “Swarm-bot: from concept to implementation,” Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), pp. 1626–1631, 2003.
[43] G. Caprari, P. Palmer, R. Piguet, and R. Siegwart, “The autonomous micro robot “Alice”: a platform for scientific and commercial applications,” MHA’98. Proceedings of the 1998 International Symposium on Micromechatronics and Human Science, pp. 231–235, 1998.
[44] A. Özgür, S. Lemaignan, W. Johal, M. Beltran, M. Briod, L. Pereyre, F. Mondada, and P. Dillenbourg, “Cellulo: Versatile handheld robots for education,” HRI ’17 Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, pp. 119–127, 2017.
[45] F. Arvin, J. Murray, C. Zhang, and S. Yue, “Colias: An autonomous micro robot for swarm robotic applications,” International Journal of Advanced Robotic Systems, vol. 11, no. 7, pp. 635–645, 2014.
[46] F. Arvin, K. Samsudin, and A. R. Ramli, “Development of a miniature robot for swarm robotic application,” International Journal of Computer and Electrical Engineering, vol. 1, no. 4, pp. 1793–8163, 2009.
[47] J. McLurkin, A. J. Lynch, S. Rixner, T. W. Barr, A. Chou, K. Foster, and S. Bilstein, “A low-cost multi-robot system for research, teaching, and outreach,” in Distributed Autonomous Robotic Systems, ch. 4, pp. 597–609, Berlin, Heidelberg: Springer, 2013.
[48] ARM, Cortex-M4 - Revision r0p0: Technical Reference Manual, 2010. Rev. 2.
[49] Bosch Sensortec, BNO055 Intelligent 9-axis Absolute Orientation Sensor Datasheet, 2016. Rev. 1.4.
[50] Digi International Inc, XBee®/XBee-PRO® RF Modules: Product Manual, 2009. Rev. v1.xEx - 802.15.4.
[51] H. Teimoori and A. V. Savkin, “Distance-only based navigation of wheeled mobile robots with obstacle avoidance,” 2008 IEEE International Conference on Robotics and Biomimetics, pp. 1956–1961, 2008.
[52] J. Svečko, M. Malajner, and D. Gleich, “Distance estimation using RSSI and particle filter,” ISA Transactions, vol. 55, no. 3, pp. 275–285, 2015.
[53] A. Cornejo and R. Nagpal, “Distributed range-based relative localization of robot swarms,” in Algorithmic Foundations of Robotics XI. Springer Tracts in Advanced Robotics, pp. 91–107, Heidelberg, New York, London: Springer, 2015.
[54] L. Oliveira, H. Li, L. Almeida, and T. E. Abrudan, “RSSI-based relative localisation for mobile robots,” Ad Hoc Networks, vol. 13, no. 2, pp. 321–335, 2014.
[55] O. S. Oguejiofor, V. N. Okorogu, A. Adewale, and B. O. Osuesu, “Outdoor localization system using RSSI measurement of wireless sensor network,” International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 2, no. 2, pp. 1–6, 2013.
[56] M. B. Kilani, A. J. Raymond, F. Gagnon, and G. Gagnon, “RSSI-based indoor tracking using the extended Kalman filter and circularly polarized antennas,” 2014 11th Workshop on Positioning, Navigation and Communication (WPNC), pp. 1–6, 2014.
[57] M. Cao, C. Yu, and B. D. Anderson, “Formation control using range-only measurements,” Automatica, vol. 47, no. 4, pp. 776–781, 2011.
[58] W. Ni and D. Cheng, “Leader-following consensus of multi-agent systems under fixed and switching topologies,” Systems & Control Letters, vol. 59, no. 4, pp. 209–217, 2010.
[59] L. Ding, Q.-L. Han, and G. Guo, “Network-based leader-following consensus for distributed multi-agent systems,” Automatica, vol. 49, no. 7, pp. 2281–2286, 2013.
[60] M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control. Wiley, 2005.
[61] R. R. Otero and A. A. Aravind, “MiniOS: An instructional platform for teaching operating systems projects,” SIGCSE ’15 Proceedings of the 46th ACM Technical Symposium on Computer Science Education, pp. 430–435, 2015.
[62] D. Krupke, M. Hemmer, J. McLurkin, Y. Zhou, and S. P. Fekete, “A parallel distributed strategy for arraying a scattered robot swarm,” 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2795–2802, 2015.
[63] J. McLurkin et al., “A robot system design for low-cost multi-robot manipulation,” 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 912–918, 2014.
[64] M. W. Lo, “The coverage of elliptical orbits using ergodic theory,” 2004 IEEE Aerospace Conference Proceedings, pp. 541–546, 2004.
[65] S. Ivić, B. Crnković, and I. Mezić, “Ergodicity-based cooperative multiagent area coverage via a potential field,” IEEE Transactions on Cybernetics, vol. 47, no. 8, pp. 1983–1993, 2017.
[66] M. Gerla, T. J. Kwon, and G. Pei, “On-demand routing in large ad hoc wireless networks with passive clustering,” 2000 IEEE Wireless Communications and Networking Conference, pp. 100–105, 2000.
[67] D. A. Shell, C. V. Jones, and M. J. Matarić, “Ergodic dynamics by design: A route to predictable multi-robot systems,” Multi-Robot Systems: From Swarms to Intelligent Automata, vol. 3, pp. 291–297, 2005.