A FLEXIBLE SIMULATION FRAMEWORK FOR THE STUDY OF DEADLOCK RESOLUTION ALGORITHMS IN MULTICORE SYSTEMS

by

Dhruv Desai

B.E., Gujarat Technological University, 2012

THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

February 2016

© Dhruv Desai, 2016

Abstract

Deadlock is a common phenomenon in software applications, yet it is ignored by most operating systems. Although deadlocks do not occur frequently, in some cases the effects are drastic when a deadlock does occur. The ongoing trend in processor technology indicates that future systems will have hundreds or thousands of cores. Due to this imminent trend in hardware development, the problem of deadlock has gained renewed attention in research. Deadlock handling techniques developed for earlier processors and for distributed systems might not work well with multicore systems, due to their architectural differences. Hence, to maximize the utility of multicore systems, new programs have to be carefully designed and tested before they can be adopted for practical use. Many approaches have been developed to handle deadlock in multicore systems, but very little attention has been paid to comparing the performance of those approaches with respect to different performance parameters. To address these shortfalls, we need a flexible simulation testbed to study deadlock handling algorithms and to observe their performance differences in multicore systems. The development of such a framework is the main goal of this thesis. In the framework, we implemented three scenarios: a general scenario, a scenario for the Dining Philosopher's problem, and a scenario for the Banker's algorithm.
In addition to these scenarios, we demonstrate the flexibility, soundness, and use of the proposed framework by simulating two different deadlock handling strategies: deadlock avoidance (the Banker's algorithm) and deadlock detection (Dreadlocks). Deadlock detection is followed by deadlock recovery to resolve the detected deadlock. We also present an analysis of the results of the different sets of experiments performed on the implemented strategies. The proposed simulation testbed to study deadlocks in multicore systems is developed using Java.

Contents

Abstract
Contents
List of Figures
List of Tables
Acknowledgements
1 Introduction
1.1 Overview
1.2 Trends in Hardware Development
1.2.1 Distributed Systems vs Multicore Systems
1.2.2 Deadlock: Distributed Systems vs Multicore Systems
1.3 Motivation
1.4 Contributions
1.5 Thesis Organization
2 Background and Related Work
2.1 Deadlock Handling Approaches in Multicore Systems
2.1.1 Language-level Approaches
2.1.2 Static Analysis
2.1.3 Dynamic Analysis
2.1.4 Model Checking
2.1.5 Type and Annotation Based Techniques
2.2 Performance Evaluation Studies of Deadlock Handling Algorithms
2.3 Summary
3 Simulation Framework for Deadlock in Multicore Systems
3.1 Simulation
3.2 Architecture of Multicore Scheduler System Framework
3.2.1 Workload Generator
3.2.2 Machine
3.2.3 Scheduler
3.2.4 Execution Trace and I/O Trace
3.2.5 Performance Calculation Engine
3.3 Simulation of Deadlock in Multicore Systems
3.4 Architecture of Simulation Framework for Deadlock in Multicore Systems
3.4.1 Workload Generator
3.4.2 Scheduler and Machine
3.4.3 Resource Manager
3.5 Algorithms Implemented
3.5.1 Deadlock Avoidance: Banker's Algorithm
3.5.2 Deadlock Detection: Dreadlocks
3.5.3 Deadlock Recovery: Process Termination
3.6 User Interface
3.6.1 Performance Parameter Setting Window
3.6.2 Performance Observation Window
3.7 Summary
4 Execution Trace and Performance Evaluation
4.1 Traces
4.2 Performance Evaluation
4.3 List of Performance Metrics
4.3.1 Performance Metrics (for Process i)
4.3.2 Performance Metrics (for System)
4.4 Summary
5 Simulation Study
5.1 Experimental Setup
5.2 Analysis
5.2.1 Observations from Experiment 1
5.2.2 Observations from Experiment 2
5.2.3 Observations from Experiment 3
5.3 Summary
6 Conclusions and Future Work
6.1 Conclusion
6.2 Future Directions
Bibliography

List of Figures

1.1 Deadlock Situation
3.1 Architecture of MSS Framework
3.2 Core Architecture of MSS Framework
3.3 Basic System Requirements for Deadlock
3.4 Architecture of Simulation Framework for Deadlock in Multicore Systems
3.5 Dining Philosopher's Scenario
3.6 Example of Dreadlocks
3.7 Main Window of the Simulator
3.8 Workload Parameter Setting Window
3.9 Configuration Parameter Setting Window
3.10 Simulation Run Window
3.11 Performance Observation Window
4.1 Sample DL Trace
5.1 Throughput: Minimum Runtime and Maximum Runtime (by varying the number of cores)
5.2 System ARR: Minimum Runtime and Maximum Runtime (by varying the number of cores)
5.3 Deadlock Wait Time: Minimum Runtime and Maximum Runtime (by varying the number of cores)
5.4 Overhead Ratio: Minimum Runtime and Maximum Runtime (by varying the number of cores)
5.5 Throughput: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)
5.6 System ARR: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)
5.7 Deadlock Wait Time: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)
5.8 Overhead Ratio: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)
5.9 Turnaround Time: Banker's Algorithm (by varying the number of cores)
5.10 Throughput: Banker's Algorithm (by varying the number of cores)
5.11 Process Wait Time: Banker's Algorithm (by varying the number of cores)
5.12 Overhead Ratio: Banker's Algorithm (by varying the number of cores)

List of Tables

3.1 Entities, States and Events
3.2 System in a Safe State
3.3 System in an Unsafe State
5.1 Simulation Parameters for Experiment 1
5.2 Simulation Parameters for Experiment 2
5.3 Simulation Parameters for Experiment 3

Acknowledgements

Although the title page of this thesis shows my name, this thesis would not have been possible without the contributions of many others. I cannot possibly name all of them here, but I would like to extend my special thanks to a few. My deepest gratitude is to my supervisor, Dr. Alex Aravind, for his continuous guidance, support, constant encouragement and endless patience. Alex has been a mentor not only academically, but throughout my graduate experience at UNBC. I would like to thank my supervisory committee members, Dr. Waqar Haque and Dr. George Jones, for their time and the valuable input that helped shape this thesis. I am also thankful to Dr. Ajit Dayanandan for his suggestions during the initial stage. My thanks are also due to the external examiner Dr.
Balbinder Deo and the chair of my defense for reading this thesis. I extend a special thanks to Dr. Andreas Hirt for giving me opportunities to assist him in his courses. A special word of gratitude goes to all my colleagues over the past few years who have helped me to become a better researcher and a better person. The list would definitely include Dhawal, Denish, Giridhar, Darshik, Rafael, Rahim, Shanthini, Suresh, Behnish, Raman and Mehul. I also thank Brooke for proofreading the draft of my thesis. I especially thank our systems administrator, Chris, for always being there for technical support. A special thanks to Ms. Mahi Aravind for her support and great food on various occasions. Most importantly, none of this would have been possible without the love and patience of my family. I will be forever thankful to my parents and my brother for their unwavering support and encouragement.

Chapter 1

Introduction

Deadlock was introduced and studied in the mid-1960s by Dijkstra [11], who originally referred to it as a "Deadly Embrace". Additional work was done in the late 1960s and early 1970s by Coffman [9], Habermann [16], Holt [20] and others. In this chapter, we start by introducing the problem of deadlock, the conditions necessary for deadlock to happen, and techniques to deal with deadlock. A discussion of changing trends in hardware, the rationale behind this thesis, and the contribution it makes to the field follows.

1.1 Overview

Deadlock is a situation in a resource allocation system in which two or more processes are in a simultaneous wait state, each one waiting for one of the others to release a resource before it can proceed [4]. In this definition, processes are the active entities in a system that share resources such as data items in a database, a data structure, a file, etc. Figure 1.1 shows a system in a deadlocked state.
Figure 1.1: Deadlock Situation

In the figure above, ellipses represent processes while rectangles represent resources. An arrow from a process towards a resource shows that the process has requested the resource, while an arrow from a resource towards a process shows that the resource is held by the process. Moreover, a resource may have several instances, in which case a process can request one or more instances of that resource. The presence of four conditions is necessary for deadlock to occur in any system. These conditions are known as Coffman's conditions, as they come from a 1971 publication by Coffman et al. [9]. Coffman's conditions are mutual exclusion, hold and wait, no preemption and circular wait.

• Mutual exclusion means that only one process may use a resource at a time in a given system.

• Hold and wait describes that a process may hold allocated resources while waiting for the requested resources to be allocated.

• No preemption indicates that no resources can be forcefully taken back from a process holding them; a process releases its held resources only upon completion. If the first three conditions hold, a system may, as a consequence, end up in the fourth condition of circular wait.

• Circular wait is defined as a closed chain of processes, such that each process holds at least one resource needed by the next process in the chain [31].

In Figure 1.1, process P1 has exclusive access to resource R2 and process P2 has exclusive access to resource R1 (mutual exclusion). To complete their executions, process P1 needs resource R1 and process P2 needs resource R2 (hold and wait). Neither process can release the resources it holds before finishing its execution (no preemption). Thus both processes wait for resources that are held by the other process (circular wait), and hence they stay in the waiting state forever. In this situation, the system is deadlocked.
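The circular wait in Figure 1.1 can be detected mechanically by building a wait-for graph and searching it for a cycle. The following Java sketch (the class and method names are illustrative, not part of the thesis framework) models the figure by collapsing each resource onto the process that holds it, so that P1 waits for P2 and P2 waits for P1:

```java
import java.util.*;

// Wait-for graph for the situation in Figure 1.1: an edge u -> v means that
// process u is waiting for a resource held by process v. A cycle in this
// graph means the processes involved are deadlocked.
class WaitForGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addWaitEdge(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new ArrayList<>()).add(holder);
    }

    // Depth-first search: reaching a node that is still on the DFS stack
    // closes a cycle, i.e. a circular wait.
    boolean hasDeadlock() {
        Set<String> visited = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String node : edges.keySet()) {
            if (dfs(node, visited, onStack)) return true;
        }
        return false;
    }

    private boolean dfs(String node, Set<String> visited, Set<String> onStack) {
        if (onStack.contains(node)) return true;  // back edge: circular wait
        if (!visited.add(node)) return false;     // already fully explored
        onStack.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptyList())) {
            if (dfs(next, visited, onStack)) return true;
        }
        onStack.remove(node);
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        // Figure 1.1, with each resource collapsed onto its holder:
        // P1 holds R2 and waits for R1 (held by P2); P2 holds R1 and waits for R2.
        g.addWaitEdge("P1", "P2");
        g.addWaitEdge("P2", "P1");
        System.out.println(g.hasDeadlock() ? "deadlocked" : "no deadlock");
        // prints "deadlocked"
    }
}
```

Removing either edge breaks the cycle, and `hasDeadlock()` returns false; this is the same idea that resource-graph based detection algorithms apply continuously at runtime.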
Knapp [22] and Singhal [30] discuss two types of deadlock: resource deadlocks and communication deadlocks. Resource deadlocks involve reusable resources, while communication deadlocks involve consumable resources. A reusable resource is one that can be safely used by only one process at a time and is not depleted by that use, such as a CPU, printer, disk, etc., whereas a consumable resource is one that can be created and consumed, such as the messages or signals used between processes to communicate [31]. This thesis focuses only on resource deadlock, referred to simply as deadlock from here on. There are three general strategies to deal with deadlock: (i) deadlock prevention, (ii) deadlock avoidance and (iii) deadlock detection and recovery.

• Deadlock prevention: A deadlock prevention technique breaks at least one of the four Coffman's conditions [9]. Breaking one of the first three conditions is hard because it is difficult to determine which of the conditions may lead the system to deadlock. Moreover, satisfying these conditions is a fundamental requirement for most systems. Hence, most deadlock prevention techniques work by preventing circular waiting of processes.

• Deadlock avoidance: If some specific information about the processes in a system, such as all their required resources, is known in advance, deadlock can be avoided by always keeping the system in a safe state. A system is considered to be in a safe state if all the processes can complete their execution without forming a deadlock [17]. All systems are in a safe state at the beginning, because the processes can always be executed sequentially, and deadlock can never occur in a sequential execution as there is no competition for resources. A safe state never ends in a deadlock. However, not all unsafe states result in a deadlock; safe and unsafe states only determine the probability that a system might enter a deadlock.
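The safe-state test underlying deadlock avoidance can be sketched as follows. This is a minimal Java version of the safety check used by the Banker's algorithm (presented fully in Chapter 3); the class name and the example matrices are illustrative, not the framework's own data structures:

```java
import java.util.Arrays;

// Safe-state test: repeatedly look for a process whose remaining need can be
// met from the currently available resources; pretend it runs to completion
// and releases its allocation, then repeat. If every process can finish this
// way, the state is safe; otherwise it is unsafe.
class SafetyCheck {
    static boolean isSafe(int[] available, int[][] allocation, int[][] need) {
        int n = allocation.length;   // number of processes
        int m = available.length;    // number of resource types
        int[] work = Arrays.copyOf(available, m);
        boolean[] finished = new boolean[n];
        int done = 0;
        boolean progress = true;
        while (progress && done < n) {
            progress = false;
            for (int i = 0; i < n; i++) {
                if (finished[i]) continue;
                boolean canRun = true;
                for (int j = 0; j < m; j++) {
                    if (need[i][j] > work[j]) { canRun = false; break; }
                }
                if (canRun) {
                    // Process i can finish: release its allocation back to work.
                    for (int j = 0; j < m; j++) work[j] += allocation[i][j];
                    finished[i] = true;
                    done++;
                    progress = true;
                }
            }
        }
        return done == n;
    }

    public static void main(String[] args) {
        // Three processes, three resource types; illustrative numbers only.
        int[] available = {3, 3, 2};
        int[][] allocation = {{0, 1, 0}, {2, 0, 0}, {2, 1, 1}};
        int[][] need = {{1, 2, 2}, {0, 1, 1}, {3, 0, 2}};
        System.out.println(isSafe(available, allocation, need) ? "safe" : "unsafe");
        // prints "safe": the processes can finish in the order P0, P1, P2
    }
}
```

An avoidance algorithm grants a request only if the state that would result still passes this test; otherwise the requesting process is made to wait.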
One of the limitations of this strategy is that knowing all the required resources in advance is unrealistic. Determining that information in advance is not possible for modern systems due to their dynamic behaviour. The Banker's algorithm is a popular method for deadlock avoidance; it is explained in Chapter 3.

• Deadlock detection and recovery: Due to the limitations of prevention and avoidance algorithms discussed above, detection and recovery techniques are more popular in practical systems. They allow processes to acquire resources whenever possible, and a check can be done periodically to detect whether deadlock has occurred in the system. Upon detection of deadlock, a recovery strategy is applied. Typically, a recovery strategy involves aborting all or some of the deadlocked processes. These techniques are suitable in systems where deadlocks are not frequent and the recovery cost is not very expensive. We have implemented a deadlock detection technique known as Dreadlocks [23] in our system. This technique is explained in Chapter 3.

Once a deadlock has formed and been detected in the system, the next step is deadlock recovery. In deadlock recovery, the system takes corrective action by selecting some processes to forcefully terminate, making their acquired resources available. The terminated processes can be re-initiated later. If the selection of victim processes is not done carefully, aborting can lead to livelocks. "Livelock is a situation in which two or more processes continuously change their states in response to changes in the other process(es) without doing any useful work" [31]. The selection criteria used to choose one or more processes to terminate are discussed in Chapter 3. Most modern operating systems, including Windows and the UNIX family, ignore deadlock [32]. In these systems, if the user observes that the system is slowing down or freezes due to some problem, probably deadlock, it is simply restarted.
There are some reasons to ignore a deadlock: (i) the occurrence of deadlock is very rare, and the prevention, avoidance, or detection and recovery overhead is very high; (ii) restarting jobs does not incur significantly high costs; and (iii) other deadlock handling techniques make the system slow, or are restrictive and complex, so ignoring deadlock is preferred. Ignoring deadlock is chosen on the assumption that most users would prefer the occasional occurrence of deadlock to living with the limitations of deadlock handling strategies.

1.2 Trends in Hardware Development

There has been a significant change in computer systems hardware in the past couple of decades. Initially, most systems were standalone, using single cores. Later, distributed systems were developed, followed recently by multicore systems, which have become popular for both personal and commercial purposes. Hardware developments in turn demand changes in programming for effective use of multicore systems. Thus, multithreaded programs designed to frequently share resources are expected to become the norm for program development in the future.

1.2.1 Distributed Systems vs Multicore Systems

Distributed systems consist of geographically distant, separate, autonomous computers, connected through a network, communicating with each other by passing messages. They often coordinate their activities to solve a common problem or to share the computing resources and storage devices of the systems. System or application processes executing on any network node can use both local and remote shared resources, simultaneously and exclusively. A multicore processor is an integrated circuit (IC) to which two or more processors have been attached for enhanced performance, reduced power consumption, and more efficient simultaneous processing of multiple tasks. Distributed and multicore systems share two similarities: concurrency and resource sharing.
To achieve these, nodes in distributed systems and cores in multicore systems have to coordinate with each other. For the purposes of communication and coordination, message passing is used in distributed systems, whereas reading and writing through shared memory is used in multicore systems. Each node in a distributed system is an autonomous system, while in multicore systems a core is not. Concurrency and resource sharing can lead to deadlock in both kinds of systems.

1.2.2 Deadlock: Distributed Systems vs Multicore Systems

Two types of deadlock can occur in distributed systems: resource deadlock and communication deadlock. Resource deadlock happens mainly because of circular dependencies of processes on each other for resources. Communication deadlock primarily occurs due to lost or delayed messages. Because of the architectural differences discussed above, resource deadlocks are more common than communication deadlocks in multicore systems. In distributed systems, deadlocks are dealt with either in a centralized way or a distributed way. In centralized deadlock handling algorithms, a single node keeps a list of all other nodes and has complete authority to allocate and preempt resources from all other nodes. In distributed deadlock handling algorithms, all the nodes keep lists of other nodes and update the lists frequently. In multicore systems, by contrast, deadlocks are mostly dealt with in a centralized way, because cores are not autonomous and are typically managed by a single operating system. Distributed deadlock handling algorithms are developed with communication overhead in mind. Multicore systems do not have to deal with the overhead of keeping node address lists and of maintaining different topologies among the nodes. Moreover, to the best of our knowledge, no performance evaluation studies have been performed for multicore systems. Multicore systems use locks to avoid data races in programs. Improper use of these locks can create deadlock in the system.
Improper lock acquisition can happen for the following reasons: (i) It becomes difficult to follow the lock order discipline that could avoid deadlock, as software systems are most of the time written by multiple programmers. (ii) As stated before, adding new locks to fix race conditions can introduce deadlock. (iii) Sometimes third-party software, such as plugins, is incorporated into software systems; such third-party software may not follow the locking order discipline of the host software system, and this can result in deadlock. Due to the uncertainty of deadlock and its occurrence in modern systems, researchers have come up with different approaches to deal with deadlock in multicore systems. The approaches are classified as follows: language-level [5, 18], static analysis [12, 14, 27], dynamic analysis [1, 21, 29], model checking [15] and type and annotation based techniques [7]. Language-level approaches strive to make concurrency control easier by providing higher level constructs that do not allow uses that can cause errors. Even though such languages alleviate some of the complexity of concurrency, the problem is not completely eliminated. Static program analysis techniques do not require any execution of the application, as they work directly with the source code. They examine all the possible deadlocks and often give no false negatives, but they report many false positives. Dynamic program analysis techniques find potential deadlocks during program execution. As they operate at runtime, they visit only feasible paths and have accurate views of the values of variables. Model checking takes simplified descriptions of the code and uses techniques to explore vast state spaces efficiently as it systematically tests the code on all inputs. Because it works on simplified descriptions, this approach does not need to explore massive state spaces in order to find deadlocks.
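The lock order discipline mentioned in reason (i) above can be made concrete: if every thread acquires locks in one fixed global order, a circular wait can never form. A minimal Java sketch of this idea (the class and helper names are illustrative, not part of the thesis framework):

```java
import java.util.concurrent.locks.ReentrantLock;

// Lock-order discipline: every lock is given a rank in one global order, and
// any thread that needs two locks always takes the lower-ranked one first.
// A circular wait would require some thread to take a higher-ranked lock
// before a lower-ranked one, which this helper never does.
class OrderedLocking {
    static final class RankedLock {
        final int rank;  // position in the global lock order
        final ReentrantLock lock = new ReentrantLock();
        RankedLock(int rank) { this.rank = rank; }
    }

    // Acquire both locks in rank order, regardless of argument order.
    static void lockBoth(RankedLock a, RankedLock b) {
        RankedLock first = (a.rank <= b.rank) ? a : b;
        RankedLock second = (a.rank <= b.rank) ? b : a;
        first.lock.lock();
        second.lock.lock();
    }

    static void unlockBoth(RankedLock a, RankedLock b) {
        a.lock.unlock();  // ReentrantLock does not require a release order
        b.lock.unlock();
    }
}
```

With this helper, two threads calling `lockBoth(r1, r2)` and `lockBoth(r2, r1)` acquire the locks in the same underlying order, so the hold-and-wait pattern of Figure 1.1 cannot arise; the type and annotation based techniques discussed below enforce essentially this discipline statically.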
Type and annotation based techniques help to avoid deadlocks during coding by allowing programmers to specify a partial order among locks. The type checker then statically ensures that the partial order among locks is maintained, and well-typed programs are deadlock-free. We discuss these approaches in detail in Chapter 2. Parallel systems share more similarities with multicore systems than distributed systems do. But due to their architectural and functional differences, deadlock handling techniques for parallel systems cannot be directly applied to multicore systems. The differences between parallel and multicore systems are as follows:

• In parallel systems, large tasks are divided into several similar, smaller subtasks to carry out computations in parallel. Each of the smaller subtasks is then solved independently and at the same time, and the results of the smaller subtasks are combined upon completion. In multicore systems, the tasks are often not related to each other. Also, they often do not run at the same time; rather, multicore systems use the concept of interleaving.

• Parallel systems are designed primarily for speed, and hence have well-defined bases such as context switching. They also share fewer controllers and less cache memory. In multicore systems, context switching depends on the type of scheduler, and multicore systems share more controllers and cache than parallel systems do.

Another type of hardware system that is similar to multicore systems is the multiprocessor system. There are multiple ICs in the single processor of a multicore system, whereas multiprocessor systems consist of more than one processor, each of which may contain one or more ICs. In terms of efficiency, multicore systems perform better on a single program, because the cores can execute multiple instructions at the same time but not multiple programs.
When running multiple programs, multiprocessor systems perform better than multicore systems. Due to the differences in their architectures, the multicore system is the more favourable system for ordinary users: no extra support or configuration is required for multicore systems, and they cost less too. Multiprocessor systems are more favourable for special-purpose uses; they often require extra support or configuration based on the purpose, and they are expensive too.

1.3 Motivation

Deadlock is a very common phenomenon in software applications. Lu et al. showed that out of 105 randomly selected real-world concurrency bugs collected from 4 large open-source applications (MySQL, Apache, Mozilla and OpenOffice, representing both server and client applications), 31 are deadlock bugs [26]. Also, a report produced from Oracle's bug database shows that roughly 6,500 out of 198,000 reports, or 3%, contain the keyword 'deadlock' [27]. Although the occurrence of deadlocks in systems is not frequent, in some cases the effects are drastic when deadlock occurs. During the testing phase, deadlocks can be easily created and detected statically, but detecting deadlocks dynamically is difficult because concurrent applications are non-deterministic. Deadlocks happen under certain intricate sequences of low-probability events, and reproducing such a subtle sequence of low-probability events is not guaranteed during execution of the application. Stress testing or random testing may not always show the existence of deadlocks in an application, but the probability of their occurrence increases greatly when an application is released to thousands of users. Therefore, deadlock detection tools are required to analyze and find real and potential deadlocks in applications. From the discussion in the previous section, we observe that a lot of research has been conducted to develop techniques to handle deadlocks.
But very little attention has been paid to comparing the performance of those approaches to dealing with deadlock. To the best of our knowledge, no research has been pursued to compare the performance of different algorithms in multicore systems. All of the performance comparisons that we could find were done on distributed systems, distributed database systems, flexible manufacturing systems and others. Moreover, only a few of the performance studies use simulation testbeds to analyze performance. The ongoing trend in processor technology indicates that future systems will have hundreds or thousands of cores. To maximize their utility, new programs have to be carefully designed and tested, using a large number of cores and a proper set of experiments, before they can be adopted for practical use. Thus the significant presence of deadlocks in large-scale, real-world applications, the uncertainty around their occurrence, their crucial effects, and the lack of research on performance studies in multicore systems are the main motivating factors for this research. The awareness that the field of multicore systems is quite new and emerging compared to other well-established computer systems also provides rationale for this study.

1.4 Contributions

As discussed before, deadlock handling techniques that were developed for earlier processors and distributed systems will not work well with multicore systems, mainly because of the difference in the architecture of these systems. Moreover, communication overhead is one of the most influential factors for deadlock handling techniques used in distributed systems. As multicore systems use shared memory, the effect of communication overhead is rare. Hence, there is a need to explore more deadlock handling techniques that are efficient in multicore systems.
However, there is also a lack of available tools to study deadlock handling techniques with respect to different performance parameters in multicore systems. This thesis helps to fill the above mentioned shortfalls in the field of computer science research. Studying the performance of the implemented deadlock handling techniques and their effects in multicore systems is very important, and a flexible simulation testbed is very useful for studying deadlock handling algorithms and for observing different performance parameters for those algorithms in multicore systems. The development of such a flexible framework is one of the primary contributions of this thesis. The primary contributions of this thesis are listed as follows:

• A flexible simulation framework to incorporate different types of deadlock handling algorithms

• Three different scenarios implemented to simulate different types of deadlock handling techniques: a general scenario, a scenario for the Dining Philosopher's problem, and a scenario for the Banker's algorithm

• Two different deadlock handling strategies simulated in the framework with the above mentioned scenarios: deadlock avoidance (the Banker's algorithm) and deadlock detection (Dreadlocks), with detection followed by deadlock recovery (termination of one of the victim processes to end the deadlock)

• Result analysis of experiments performed on the implemented algorithms with respect to various performance metrics

1.5 Thesis Organization

An overview of the problem of deadlock, the conditions necessary for the formation of deadlock, and the different approaches to handling deadlock were briefly explained in this chapter. In Chapter 2, a detailed literature review of current methods to deal with the problem of deadlock in multicore systems is presented. The review and the methods are categorized by type.
At the end of that chapter, we present a comprehensive survey of the performance evaluation studies done on deadlock handling techniques in different systems. In Chapter 3, we present the details of the design and implementation of the simulation framework for deadlock in multicore systems; the primary contributions of this thesis are explained in that chapter. Later in the chapter, we explain the different deadlock handling strategies that are implemented in our framework. We discuss the various traces generated from a simulation run and the calculation of performance metrics from these traces in Chapter 4. Experiments and results are described in Chapter 5. Lastly, we conclude in Chapter 6 with a discussion of possible future work that could extend the research carried out in this thesis.

Chapter 2

Background and Related Work

Research for this thesis began with a paper that my supervisor gave me about a deadlock detection algorithm, "Dreadlocks: Efficient Deadlock Detection" [23], developed by Eric Koskinen and Maurice Herlihy. This algorithm for detecting deadlocks in multithreaded programs gained our attention, and we began to look for other approaches that have been developed to deal with deadlocks in multicore systems. We explain Dreadlocks in detail in Chapter 3. Here, we discuss the other interesting approaches in the literature.

2.1 Deadlock Handling Approaches in Multicore Systems

As briefly discussed in Chapter 1, many different approaches have been developed to handle deadlocks in multicore systems. The approaches are classified into five categories: (i) language-level approaches, (ii) static analysis, (iii) dynamic analysis, (iv) model checking and (v) type and annotation based techniques. They are explained in detail as follows.

2.1.1 Language-level Approaches

Language-level approaches make concurrency control easier by providing higher level constructs that do not allow error-prone uses.
This can be achieved by statically binding shared variables to the locks that protect them. Two different language-level approaches are discussed here. First, we discuss a framework implemented in Java, presented by Bensalem et al. [5]. This framework confirms deadlock potentials detected by runtime analysis of a single run of a multithreaded program. The multithreaded program under examination is instrumented to produce lock and unlock events. A trace is generated when the instrumented program is executed; the trace consists of the lock and unlock operations performed during that specific run. An observer is constructed for each cycle, which can detect the occurrence of the corresponding real deadlock. Here, an observer is a data structure that characterizes all possible interleavings that lead to a deadlock state [5]. It also checks the possibility of the deadlock reoccurring during subsequent test runs. In this framework, a controller determines the optimal scheduling strategy that, when composed with the program, will maximize the probability of the corresponding real deadlock occurring. The methodology of the approach involves four phases: (i) deadlock potentials in the multithreaded program and the system under test (SUT) are automatically detected by examining a single execution trace, (ii) an observer is automatically generated for each cycle in the resulting lock graph, (iii) the system under test is instrumented in order to confirm deadlock potentials, and (iv) multiple "controlled" runs are performed, searching for "incorrect execution traces" [5]. Elaborating on these phases: the first step involves detecting deadlock potentials in the SUT. Then, the instrumentation module automatically instruments the bytecode class files of the multithreaded program under test. This is done by adding new instructions that, when executed, generate the execution trace consisting of lock and unlock events.
The observer module reads the event stream and performs the deadlock analysis. The instrumented program under observation is executed while the lock and unlock events are observed. Then a graph is constructed with edges between locks suggesting lock orders; any cycle in the graph signals a deadlock. Once the lock graph is generated, an observer for each cycle in the lock graph is generated automatically. The observer observes the SUT. In the last step, the execution of the SUT is controlled. This execution contains a deadlock if it is "accepted" by the observer [5]. The second approach works for parallel programs that use message passing for communication. One of the common problems in such programs is the detection of deadlocks. Haque presented a deadlock detector, the Message Passing Interface Deadlock Detector (MPIDD), to dynamically detect deadlocks in parallel programs written using C++ and the Message Passing Interface (MPI) [18]. There are two main parts of the MPIDD system: the detector and the MPIDD wrappers. The detector receives input from the client program about the MPI commands being used. Using this information, the detector constructs a state and determines whether there is a deadlock in the client program. The MPI function call wrappers wrap the client program's original calls and provide the information to the detector. MPI's profiling layer requires no significant modification of the user's code and incurs very little overhead when invoked. This property of MPI's profiling layer is used by the detector. A wrapper containing the information needed by the detector is created using MPI's profiling layer; the actual function call is embedded inside this wrapper. The new wrapped routines relay to the deadlock detector program the information about which commands are being issued.
The key advantages of this tool are its portability, very low overhead, and its use of MPI's profiling layer, which allows the tool to be used without modification of the source code. By including the file 'MPIDD.h' in the original source code, the client can obtain all the functionality transparently [18]. Since MPI is a widely used library, this is a useful approach. The main limitation of this tool is that it is MPI-specific, although it could be adapted to other libraries. Even though language-level approaches alleviate some of the complexity of concurrency, the problem is not completely eliminated. Also, to get any benefit from this approach, all the code has to be written in a particular language. These limitations prevent programmers from using other languages that might better suit their requirements.

2.1.2 Static Analysis

Static program analysis techniques do not require any execution of the application, as they work directly with source code. Here, we discuss three different static program analysis techniques as follows: The first technique is called RacerX [12]. It is a static tool that uses flow-sensitive, interprocedural analysis to detect deadlocks. This technique checks information such as which locks protect which operations, which shared accesses are dangerous, and which code contexts are multithreaded. At a higher level, checking a system with RacerX involves five phases [12]: (i) retargeting RacerX to system-specific locking functions, (ii) obtaining a control flow graph from the checked system, (iii) running the deadlock and race checkers over the obtained flow graph, (iv) post-processing and ranking the results, and (v) examining the results. Of these phases, the first and the last are done by the user; RacerX does the middle three. The second technique is an effective static deadlock detection algorithm for Java, presented by Naik et al. [27].
The key idea behind this is to express the complex property of deadlock freedom for a pair of threads/locks in terms of six conditions: reachable, aliasing, escaping, parallel, non-reentrant and non-guarded [27]. Effectively approximating these conditions requires precise call-graph and points-to information. This algorithm uses a combined call-graph and may-alias analysis for effective approximation; this combination of analyses is called k-object-sensitive analysis. It reliably approximates the first four conditions using well-known static analyses: a call-graph analysis, a may-alias analysis, a thread-escape analysis, and a may-happen analysis, respectively. In order to soundly approximate the last two conditions, it requires a must-alias analysis, which is much harder than a may-alias analysis. This technique addresses this difficulty with an unsound solution: using a may-alias analysis to pose as a must-alias analysis. Hence, this algorithm can fail to report some real deadlocks [27]. The deadlock detection tools discussed so far help to find deadlocks in concurrent programs, but the problem of detecting deadlock in libraries has not been investigated much. The third approach, developed by Williams et al., helps with that [34]. This problem is vital, as library writers may wish to ensure their library is deadlock-free for any calling pattern. This method checks whether it is possible to deadlock the library by randomly calling some set of its public methods. On detecting the possibility of deadlock, it provides the names of the methods and variables involved. The deadlock detector in this method utilizes an interprocedural dataflow analysis. By using this method, it is possible to track possible sequences of lock acquisitions inside a Java library. The analysis is flow-sensitive and context-sensitive. At each program point, the analysis computes a symbolic state modeling the execution state of the library. At the end of a method, this symbolic state serves as a method summary.
The analysis is executed repeatedly over all methods until a fixed point is reached [34]. One limitation of static analysis approaches is that, while they examine all possible deadlocks and often give no false negatives, they report many false positives. For example, the static deadlock detector developed by Williams et al. reports 100,000 deadlocks in Sun's JDK 1.4, while only 7 are real deadlocks [34].

2.1.3 Dynamic Analysis

Dynamic program analysis techniques use an execution of the program to find potential deadlocks. They visit only feasible paths and have precise views of the values of variables because they operate at runtime. Here, we discuss three different dynamic analysis techniques as follows: Agarwal and Stoller describe a runtime notion of potential deadlock in programs written in any language that uses synchronization mechanisms such as locks, semaphores and condition variables [1]. They test the potential for deadlock by checking whether any feasible permutation of the execution results in deadlock. The viability of such permutations is determined by ordering constraints among the events in the execution. Havelund proposed the GoodLock algorithm, which detects potential deadlocks involving two threads [19]. This was later generalized to any number of threads by Bensalem et al. [6] and Agarwal et al. [2]. The algorithm presented by Agarwal et al. [1] is extended in [2] to handle non-block-structured locking as well. The second dynamic analysis technique we discuss is DEADLOCKFUZZER, developed by Joshi, Park and Naik [21]. This technique has two stages. In the first stage, a multithreaded program is executed and observed to find potential deadlocks that could happen in some executions of the program. This stage uses an informative and simple variant of the GoodLock algorithm, called informative GoodLock or iGoodlock [19]. iGoodlock identifies potential deadlocks even if the observed execution does not end in a deadlock state.
It provides debugging information suitable for identifying the cause of the deadlock. The second stage uses the debugging information provided by the first stage to create a real deadlock with high probability: the scheduler is biased to generate an execution that creates, with high probability, a real deadlock reported in the previous stage. A limitation of iGoodlock is that it can give false positives, because it does not take the happens-before relation between lock acquisitions and releases into account. As a result, the user has to manually go through such potential deadlocks. This burden is removed from users by the second stage of DEADLOCKFUZZER [21]. Qi et al. presented an efficient dynamic deadlock detection tool called MulticoreSDK [29]. The algorithm works in two phases. In the first phase, a reduced lock graph is constructed to identify program locations where lock operations may cause deadlocks. This phase examines the execution trace and constructs a lock graph based on the program locations of lock events. Once the lock graph is built, the algorithm finds cycles in it to compute a set of program locations involved in deadlock cycles. This set signifies the program locations that may be associated with deadlocks and is used in the second phase to filter out deadlock-free lock events. In the second phase, the analysis report from the first phase is used to determine deadlocks in the lock graph with the filtered locks. This phase examines the execution trace again and constructs another lock graph, this time based on the lock ids of lock events. In particular, only lock events from the set derived in the first phase are recorded in the second lock graph. Finally, cycles in this lock graph are found to determine potential deadlocks.
By using two passes over the program trace and filtering out the locks that are not involved in deadlock, MulticoreSDK efficiently removes the lock nodes and edges which do not take part in deadlock formation [29]. Thus, it is more scalable for large real-world applications and more efficient in terms of memory utilization and time. The downsides of dynamic deadlock detection techniques are that they suffer from scalability and performance problems, since the lock graphs of large industrial-strength applications grow too large to handle. Moreover, they can only find errors on a small number of execution paths, as the number of feasible paths grows exponentially with the size of the code. Another limitation of this approach is that dynamic deadlock detection has a very high computational price, which causes runtime overhead. Hence, it is time consuming to run test cases, which makes this approach impractical for programs with strict timing requirements. In theory, dynamic deadlock detection can compute arbitrarily accurate information, but in practice it depends heavily on the availability of computing resources.

2.1.4 Model Checking

Model checking is another way to find deadlocks in an application. Model checking takes a simplified description of the code and uses techniques to explore vast state spaces efficiently as it systematically tests the code on all inputs; such techniques were originally used to explore the massive state spaces of hardware circuits. It analyses the correctness of concurrent reactive systems, performing verification by state-space exploration. An extension of model checking, VeriSoft, is presented by Godefroid [15]. VeriSoft directly deals with implementations of communication protocols written in programming languages such as C or C++.
Thus it extends state space exploration from modeling languages to programming languages. It is a tool to systematically explore the state space of systems composed of several concurrent processes executing arbitrary code written in full-fledged programming languages such as C or C++. The algorithm can be used for detecting deadlocks and assertion violations without incurring the risk of any incompleteness in the verification results for finite acyclic state spaces. In addition, VeriSoft also checks for divergences and livelocks. A divergence occurs when a process does not attempt to execute any visible operation for more than a specified amount of time. A livelock occurs when a process has no enabled transition during a sequence of more than a specified number of successive global states. In practice, the algorithm can be used for systematically and efficiently testing the correctness of any concurrent system, with or without an acyclic state space [15]. Unfortunately, model checking fails to scale for large systems, as it requires significant effort both to specify the system and to scale it down enough to execute in the model checking environment. Due to the huge size of current operating systems, model checking an entire system of that size is still not possible. Hence, model checking techniques are restricted to the verification of properties of models.

2.1.5 Type and Annotation Based Techniques

Type and annotation based techniques help to avoid deadlocks during coding by allowing programmers to specify a partial order among locks. The type checker statically ensures that the partial order among locks is maintained and that well-typed programs are deadlock-free. Boyapati, Lee and Rinard presented an ownership type system for multithreaded programs that allows programmers to specify a partial order among locks [7].
It allows them to partition the locks into a fixed number of equivalence classes and to specify a partial order among the equivalence classes. The type checker then statically ensures that whenever a thread holds more than one lock, the thread acquires the locks in descending order. Well-typed programs in this system are guaranteed to be deadlock-free. The type system also allows programmers to use recursive tree-based data structures to describe the partial order, and it allows changes to the partial order at runtime through mutation of the data structure. The basic idea of this system is that programmers keep a locking discipline in mind while writing multithreaded programs; it allows them to declare this locking discipline in the form of type declarations in their programs, and it statically verifies the consistency of a program with its type declarations. A prototype of this system is implemented in Java, including threads, arrays, constructors, static fields, dynamic class loading, runtime downcasts, exceptions and interfaces. Some limitations of this approach are that it imposes the burden of annotation on programmers, and that scaling it to larger systems is not feasible.

2.2 Performance Evaluation Studies of Deadlock Handling Algorithms

Deadlock is one of the key issues faced in many different fields, such as routing, databases, operating systems, manufacturing systems and others. Most of the research carried out in this field focuses on solving the problem of deadlock. Some of the proposed solutions have also undergone comparative performance evaluations, but there has not been any research on comprehensive performance comparisons of those solutions. Studies [8, 24, 25, 13] and [35] entail comparative performance analyses.
Among these, [25] and [35] address the detection of deadlock in distributed systems, [8] and [24] provide a performance analysis of deadlock detection techniques in distributed database systems, and [13] details a comparative performance analysis of a deadlock avoidance control algorithm for Flexible Manufacturing Systems. A probabilistic performance analysis of a deadlock detection algorithm in distributed systems is presented by Lee and Kim [25]. The distributed deadlock detection algorithm declares deadlock upon finding back edges in a distributed search tree constructed by the propagation of probes. The tree is built as follows: the initiator of the algorithm becomes the root of the tree. It then sends out probes, called ASK, to all of its successors at once. If a node receives the probe for the first time, it becomes the child of the sender of the probe. The probe is then further propagated until it reaches an executing node or a tree node that has already received a probe. Each tree node is assigned a unique path string to represent the level of the node in the tree and to distinguish one branch from another. With the help of path strings, not only back edges but also other types of edges, such as cross and forward edges, are identified. The ASK probe carries the identifier of the candidate victim, which has the lowest priority among the nodes visited. If the candidate victim is inside the detected deadlock cycle, it will receive an abort message [25]. The performance of this algorithm is evaluated on measures such as deadlock duration, the number of algorithm initiations, and the mean waiting time of a blocked process. Deadlock duration is the time elapsed between the formation of a deadlock and its detection. In distributed systems, a detection algorithm can be started and executed by several sites simultaneously, which may degrade the performance of the system by generating too many messages.
So the expected number of algorithm initiations throughout the system is also an important measure. Bukhres compared two distributed deadlock detection algorithms for distributed database systems [8]. The algorithms are from two different categories: one is centralized and the other is distributed. Centralized algorithms maintain the wait-for graph (WFG) at a control site, which is designated to perform the deadlock detection activity by gathering the relevant information from all the other sites. At this site the graph is updated and the search for a cycle involving deadlock is carried out. In distributed algorithms, the WFG is maintained at different sites in the system; a cycle may be composed of different transactions at different sites, and several sites participate in deadlock detection. Centralized algorithms have poor reliability compared to distributed algorithms, because the whole system fails if the control site fails. However, distributed algorithms are difficult to design and implement, as the detection of deadlock can be initiated from any site. Simulation studies presented by Bukhres show that the centralized algorithms perform a little better than the distributed algorithms under a lightly-loaded condition, whereas under a heavily-loaded condition the distributed algorithms perform better than the centralized algorithms [8]. The measures taken into consideration to compare performance are throughput, restart rate, deadlock lifetime, and the effect of the request pattern on throughput and the number of messages. The conclusion of this study is that the choice of the best deadlock detection algorithm depends on the operating region of the system. The problem of evaluating and comparing the performance of deadlock avoidance control policies applied to Flexible Manufacturing Systems (FMS) is addressed by Ferrarini [13]. The problem is discussed with both timed and untimed models.
For both models the problem is considered with and without deadlock avoidance control policies. Some of the key metrics measured are deadlock percentage, frequency of occurrence, deadlock rejection percentage, unreachable stage percentage, product survival rate and blocking capacity [13]. From the above discussion, we observe that there is no simulation tool available to study and compare the performance of different deadlock handling approaches in multicore systems. Moreover, the studies mentioned above are quite old and their implementations are not available to extend further. Hence, we believe that our simulation testbed will be very useful for studying and comparing the performance of different deadlock handling approaches in multicore systems.

2.3 Summary

In this chapter, various approaches that have been developed to handle deadlock in multicore systems were discussed. Then, we presented a brief survey of different performance evaluation studies done on deadlock in distributed systems, distributed database systems, and flexible manufacturing systems. In the next chapter, we discuss the implementation details of the simulation testbed to study deadlock handling approaches in multicore systems.

Chapter 3
Simulation Framework for Deadlock in Multicore Systems

This chapter describes the current implementation of the simulation framework for deadlock handling algorithms in multicore systems. The implementation is done using Discrete Event Simulation (DES). Before describing the actual framework, DES is briefly explained.

3.1 Simulation

A computer simulation is a technique that attempts to model and observe the behavior of an abstract module of a particular system. A simulator is a computer program that transforms the states of a system at discrete time points over a specific period of time. Simulation is a very effective tool to study the dynamic behavior of complex systems.
Computer simulations can be classified into two types, discrete or continuous, based on how the system state is modeled and simulated. In a continuous simulation, the state variables of the system change continuously over time, while in a discrete simulation the state variables change only at discrete times. Based on the advancement of simulation time and the updates of the system state, discrete simulation can be further divided into two types: time-stepped and event-driven. As the names suggest, in a time-stepped simulation the system state is updated at every time step, while in an event-driven simulation the states of the system are updated at the occurrence of events. Ozgun and Barlas presented a comparison between the approaches of discrete event simulation and continuous simulation on a simple queuing system [28]. Discrete event simulation comprises an event list, a simulation clock and an event scheduler. The simulator maintains a queue of events sorted according to their simulation time of occurrence. The simulation clock advances the simulation time to the time of occurrence of the next event in the event list. The system states are changed by each execution of an event, which is done by the event scheduler. For example, in simulating the behavior of university students, the number of students enrolled and the number of students graduated are state variables, and they are updated on the occurrence of the corresponding events in the system: the number of students enrolled is updated at the start of each semester, and the number of students graduated is updated when the students graduate at the end of the academic year. Simulation ends when the event list is empty or the simulation time is up. The advancement of simulation time can be the same as, faster than, or slower than real time, as it is not important to execute the simulation in real time.
For example, in the university example, the simulation does not have to wait for an entire year and can run faster than real time. A driving-test simulation can run at real-time speed. On the other hand, protein synthesis in a cell can be simulated slower than real time. We mainly use discrete event simulation to implement the different components of our framework. Next, we move on to a discussion of the building process of our framework. We expanded the multicore scheduling framework, developed by Manickam under the guidance of Aravind, to simulate and study deadlocks, so we first explain the architecture of the Multicore Scheduler System framework.

3.2 Architecture of Multicore Scheduler System Framework

To observe deadlock in multicore systems, we first need a framework that simulates the behavior of multicore systems. For that purpose, we are using an architecture developed by Aravind and Manickam [3]. We first briefly explain the architecture of the multicore scheduling simulation (MSS) framework. Later, we explain the expansion of this framework, that is, what components were added and how they were inserted into MSS. Although the MSS framework provides different scheduling algorithms, we are using only Linux Completely Fair Scheduling in our system.

[Figure 3.1: Architecture of MSS Framework. Source: A Flexible Simulation Framework for Multicore Schedulers [3]]

The higher level architecture of MSS is shown in Figure 3.1. There are five key components in it: the workload generator, machine, scheduler, execution trace and performance calculation engine. We explain them next.

3.2.1 Workload Generator

The workload generator is implemented to generate threads.
We have to give the number of threads, the threads' mean arrival rate, and the threads' mean service rate as input to generate a workload. In the output, each thread will have a unique id, arrival time, execution time, I/O points, priority and application class. The times when a thread will go for I/O are determined by its I/O points. For our system, we added three scenarios to the workload generator: (i) a general scenario, (ii) a scenario for the Dining Philosophers problem, and (iii) a scenario for the Banker's algorithm. We also added numerical inputs for the number of resources and the number of instances per resource.

3.2.2 Machine

All the computations are done in the machine, which is also called a multicore machine. The machine has a hierarchical structure, with the machine at the highest level, followed by chips and cores. One or more chips can fit on a machine and two or more cores can be in each chip. In the simulation context, each core is capable of executing one thread at a time. Cache memory is also an important part of the machine, and it is also structured in a hierarchical manner. Current multicore systems have three levels of cache memories: L1, L2 and L3. The L1 cache is a core-level local cache, the L2 cache is shared among the cores in a chip, and all the cores in the machine share the L3 cache.

3.2.3 Scheduler

The implemented framework for a scheduler has two tasks: balancing the load among the cores and multitasking threads in each core. According to their functions, the first task is referred to as load balancing and the second as load scheduling; hence the components in the implementation are called the load balancer and the load scheduler. The load balancer is responsible for dispatching new jobs to the appropriate local scheduler, and for migrating jobs from one local scheduler to another. The task of the local scheduler involves scheduling jobs to the cores for execution. The simulation gives the local scheduler centralized control.
From the simulation point of view, the tasks of the local scheduler are to choose a job for execution, to determine the amount of execution time, to calculate the progress rate and to produce the trace. The progress rate is a crucial design factor, as it can affect the accuracy of the execution. The progress rate of a job depends on factors such as the execution speed of the core and the contention for shared resources. The implementation uses a simple cache contention model, which can easily be replaced with the implementation of a more refined model.

3.2.4 Execution Trace and I/O Trace

During the simulation, two types of traces are recorded: the execution trace and the I/O trace. In order to generate an activity profile and to calculate performance metrics, the execution trace is collected at every context switch. The format of the execution trace is a quintuplet. The status can have values between 0-3, where the status is 0 if the thread is preempted by quanta expiration, 1 if it is preempted by a thread with a higher priority, 2 if the thread is going for I/O, and 3 if it is completed. The format of the I/O trace is a triple. The performance calculation engine calculates the values of criteria such as response time, fairness and utilization of resources for a given set of data, and the result can be passed to the performance observation window. The users can then study the results from the performance observation window.

3.2.5 Performance Calculation Engine

The performance study mainly focuses on determining how well the algorithm attempts to fulfil criteria such as response time, fairness and utilization of resources. Users can study algorithms based on these criteria, calculated and passed to the performance observation windows by the performance calculation engine. The performance criteria involve a wide range of metrics such as interactive response time, CPU utilization, throughput, turnaround time, waiting time and response time.
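To illustrate how such metrics can be derived from trace data, here is a hedged sketch in Java (the method names, and the assumption that arrival, completion and service times are available per thread, are ours; the framework's actual trace format is described in Section 3.2.4): turnaround time is completion time minus arrival time, and waiting time is turnaround time minus service time.

```java
// Hypothetical helper for per-thread scheduling metrics (our naming):
// turnaround = completion - arrival; waiting = turnaround - service.
public class MetricsSketch {
    static long turnaround(long arrival, long completion) {
        return completion - arrival;
    }

    static long waiting(long arrival, long completion, long service) {
        return turnaround(arrival, completion) - service;
    }

    public static void main(String[] args) {
        // A thread arriving at t=2, finishing at t=10, with 5 units of CPU service:
        System.out.println(turnaround(2, 10)); // 8
        System.out.println(waiting(2, 10, 5)); // 3
    }
}
```

Averaging these per-thread values over the whole trace gives the mean turnaround and waiting times reported by the performance calculation engine.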
Next, we will discuss the actual simulation and architecture aspects of deadlock in multicore systems.

3.3 Simulation of Deadlock in Multicore Systems

The basic deadlock environment involves processes and resources. Resources can have one or more instances, and processes can request up to the maximum number of instances of a resource available. If the requested resources are available, they can be granted to the requesting process; otherwise, the process waits until the resources become available, meanwhile holding on to its current resources. The simulation of deadlock in multicore systems has three main components: entities, events, and states, in addition to the components of the Multicore Scheduler Simulation framework (an event list, a simulation clock and an event scheduler) described in the previous sections. The event scheduler manages all of the events from the event list according to the simulation clock. Each event from the event list occurs at a particular instant of time and changes the state of an entity. Here we consider the system as an entity as well. For our system, the entities, states, and events are shown in Table 3.1. We define these terminologies first, and then we discuss the simulation aspects of the system.

Entity    | States
Process   | Waiting, Running, Blocked, Completed
Resource  | Acquired, Free
System    | Deadlocked, Deadlock Free
Events: Request, Acquire, Release, Going for I/O

Table 3.1: Entities, States and Events

• Processes: Processes are the main entities in the system. A process requires resources while executing in its critical section. A critical section is a part of the executing process which uses shared resources exclusively. A process can be in different states, such as running, waiting, blocked or completed. Due to their concurrent access to finite resources, deadlock can occur in the system.

• Resources: Resources can be virtual entities such as memory, data structures and CPU time, or physical entities such as printers, scanners and other I/O devices.
A resource can have one or more units of the same type, which are called instances of the resource. Different instances of a resource can be shared among different processes, but an instance of a resource can be held by only one process at a time. A resource can be either free or held by a process.

• System: A system consists of all the processes, resources and their instances, and the techniques to handle deadlock.

• Deadlock [4]: "Deadlock is a situation in a resource allocation system in which two or more processes are in a simultaneous wait state, each one waiting for one of the others to release a resource before it can proceed."

The main events are the request, acquire and release of resources by processes. These events change the state of the system to either deadlocked or deadlock free. For processes, the state changes affiliated with these events are waiting, running, blocked or completed. If a process acquires all the requested resources, then it is in the running state; if the process cannot acquire all the requested resources due to their unavailability, then the process is in a waiting state, holding its current resources. Though I/O is a physical type of resource that can be accessed by processes, we consider I/O a special case of resources; hence the events associated with processes accessing I/O are going for I/O and coming back from I/O to a waiting state. When a process goes for I/O its state is changed to blocked, and when all the requests of a process are satisfied, its state is changed to completed. The simulation of the system basically consists of updating the state of the entities at every simulation point as well as advancing the simulation clock. It involves various interactions between these entities and the occurrence of deadlocks due to them. With the help of these components, various scenarios and solutions of deadlock can be simulated using DES.
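The event-driven updating described above can be sketched as a minimal discrete-event loop in Java (a simplification with hypothetical names; the actual framework handles many more event types and entity states): events sit in a priority queue ordered by simulation time, and executing each one advances the clock.

```java
import java.util.*;

// Minimal discrete-event loop in the spirit of Section 3.1 (names are ours):
// events are ordered by simulation time; executing one advances the clock
// and would, in the full framework, update the affected entity's state.
public class DesSketch {
    enum Op { REQUEST, ACQUIRE, RELEASE }

    record Event(long time, int process, Op op) {}

    public static void main(String[] args) {
        PriorityQueue<Event> eventList =
            new PriorityQueue<>(Comparator.comparingLong(Event::time));
        eventList.add(new Event(5, 1, Op.ACQUIRE));
        eventList.add(new Event(2, 1, Op.REQUEST));
        eventList.add(new Event(9, 1, Op.RELEASE));

        long clock = 0;
        while (!eventList.isEmpty()) {       // simulation ends when the list is empty
            Event e = eventList.poll();      // next event in time order
            clock = e.time();                // advance the simulation clock
            System.out.println(clock + ": P" + e.process() + " " + e.op());
        }
    }
}
```

Note that the events are processed in time order (2, 5, 9) regardless of insertion order, which is exactly the event-list behaviour DES requires.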
The simulation records the executions as a simulation trace; a record is written at every occurrence of an event. With this background, we now introduce the architecture of the simulation framework for deadlock in multicore systems.

3.4 Architecture of Simulation Framework for Deadlock in Multicore Systems

To simulate a basic deadlock-prone environment, we need resources, processes that can request these resources, and a system (Resource Manager) to manage the resources for the requests made by processes. From the point of view of the system, processes in a multicore system communicate with each other through shared memory using read and write operations. During execution, processes require resources. A process requests resources, acquires them if granted, accesses them, and then releases them after use. As explained above, the MSS framework has threads, a machine and a scheduler. The functions of processes and threads are the same here, so hereafter we use the term "process" in general. In order to simulate deadlock on top of the MSS framework, we added extra components such as resources, a Resource Manager, and the ability for processes to request resources as well as to communicate with each other. Alongside these additional components, architectural changes are also required to some existing components to fulfil the criteria for simulating deadlock. Later in this chapter, we explain in detail the components related to the implementation of deadlock scenarios and viable handling techniques. The higher-level architecture of the Multicore Scheduler Simulation (MSS) framework has five main logical components, as shown in Figure 3.1: the workload generator, multicore machine, multicore scheduler, execution trace and performance calculation engine. Of those, the core simulation engine involves three basic components, as shown in Figure 3.2: the workload generator, multicore scheduler and multicore machine.
Figure 3.2: Core Architecture of MSS Framework

As discussed above, we need resources that can be accessed by processes in order to simulate a system susceptible to deadlock. To manage these resources, we need a module, which we refer to as the Resource Manager (RM). The resulting system is portrayed in Figure 3.3. A detailed explanation of the RM and its implementation details follows.

Resource Manager: For the smooth and flawless execution of the system, multiple requests by processes to access different resources have to be managed well. Improper management of resources may lead the system to an unwanted situation called deadlock. To prevent or avoid deadlock in the system, the granting of resources to multiple processes, according to their requests and the availability of resources, is managed by a Resource Manager (RM).

Figure 3.3: Basic System Requirements for Deadlock

The notation of Figure 3.3 is the same as that of Figure 1.1, with the additional notation of small, dark rectangles inside resources that represent instances of that resource. The basic function of the RM is to manage resources, that is, to grant resources to processes based on their availability and according to the requests made, in such a way that the system does not go into a deadlock state. If the RM functions properly, deadlock can be avoided, but improper functioning of the RM can lead the system towards a deadlock situation. The implementation of the RM depends on what strategy is to be implemented to handle deadlock.
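The Resource Manager's basic bookkeeping can be sketched as follows. This is a minimal illustration with assumed names, not the thesis code; a real RM would layer a deadlock handling strategy on top of this grant/release logic:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative Resource Manager core: tracks free instances per resource and
// grants a request only when enough instances are available.
public class ResourceManagerSketch {
    private final Map<Integer, Integer> freeInstances = new HashMap<>(); // rid -> free count

    public void addResource(int rid, int instances) {
        freeInstances.put(rid, instances);
    }

    // Grant the request if possible; otherwise the caller must wait,
    // holding on to its currently acquired resources.
    public boolean request(int rid, int count) {
        int free = freeInstances.getOrDefault(rid, 0);
        if (count > free) return false;
        freeInstances.put(rid, free - count);
        return true;
    }

    public void release(int rid, int count) {
        freeInstances.merge(rid, count, Integer::sum);
    }

    public static void main(String[] args) {
        ResourceManagerSketch rm = new ResourceManagerSketch();
        rm.addResource(1, 2);                  // resource R1 with 2 instances
        System.out.println(rm.request(1, 2));  // true: both instances granted
        System.out.println(rm.request(1, 1));  // false: nothing free, requester waits
        rm.release(1, 2);
        System.out.println(rm.request(1, 1));  // true after the release
    }
}
```

A prevention or avoidance strategy would add its check inside `request`, while detection and resolution would run separately over the wait relationships this bookkeeping induces.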
For deadlock prevention and deadlock avoidance, the logic to handle deadlock can be incorporated into the RM. On the other hand, deadlock detection and resolution may require an implementation separate from the RM to handle deadlock.

Figure 3.4: Architecture of Simulation Framework for Deadlock in Multicore Systems

The resulting architecture after incorporating resources and the Resource Manager into the core MSS framework of Figure 3.2 is shown in Figure 3.4, which has components from both Figure 3.2 and Figure 3.3. We now discuss the additional and modified components of the architecture in detail.

3.4.1 Workload Generator

A workload generator is mainly used to generate processes and resources while setting the parameters that are necessary for simulation. We can generate three types of simulation scenarios using the workload generator: (i) a general scenario; (ii) a scenario for the Dining Philosophers problem; and (iii) a scenario for the Banker's algorithm. Based on the selected scenario, criteria such as limiting the resource count to equal the process count, the instance count per resource, and the runtime are set for the given workload. The request pattern of processes also depends on the scenario type. These scenarios are simulated on the Linux Completely Fair Scheduler.

• General Scenario: To generate a general-scenario workload, the required input parameters are the number of processes, the number of resources, the instances per resource type, the mean arrival rate and the task period range. There are no restrictions on the input values of any parameter. The generated workload will have a set of processes and resources. Each process will have a unique id, arrival time, execution time, I/O points, priority, application class, request counts and other parameters required for simulation.
Likewise, each resource will have a unique id, an instance count and various other parameters. All the parameters that do not require manual input are generated randomly, with an upper bound set on them through the workload generator.

• Scenario for the Dining Philosophers problem: This scenario implements the Dining Philosophers problem, which can be considered a special case of the general scenario. The Dining Philosophers problem was introduced by Dijkstra [10]. Five philosophers [P1, ..., P5] sit around a table, as shown in Figure 3.5. The table has a large serving bowl of spaghetti, one small bowl for each philosopher and five forks. A philosopher wishing to eat can take and eat some spaghetti only if he or she holds the two forks on either side of the bowl. The problem is, given the above scenario, to devise a solution that allows the philosophers to eat, satisfying mutual exclusion while avoiding deadlock and starvation.

Figure 3.5: Dining Philosophers Scenario

Here, we limit ourselves to simulating the scenario and not the solution. Any solution implemented for the general scenario can also be used to solve the Dining Philosophers problem, as it is a special case of the general scenario. To simulate this, we put a limit on the number of resources: it has to be the same as the number of processes. Also, we limit the instances per resource type to one. We also have to input the runtime beforehand, to specify how long the simulation should run. Further requirements for simulating the Dining Philosophers scenario include limiting each process to request only its neighboring resources, and limiting the maximum number of requests per process to two.
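The neighbor-request constraint of the Dining Philosophers scenario (as many single-instance resources as processes, each process requesting exactly its two neighboring forks) can be sketched as follows; the method name is an illustrative assumption, not the workload generator's actual API:

```java
// Generates, for each of n philosophers, the ids of the two neighboring
// forks (resources) it is allowed to request: R(i) and R((i+1) mod n).
public class DiningWorkloadSketch {
    static int[][] neighborRequests(int n) {
        int[][] requests = new int[n][2];
        for (int i = 0; i < n; i++) {
            requests[i][0] = i;               // left fork
            requests[i][1] = (i + 1) % n;     // right fork (wraps around the table)
        }
        return requests;
    }

    public static void main(String[] args) {
        int[][] r = neighborRequests(5);
        // P4 (index 4) requests forks R4 and R0, the wrap-around neighbor.
        System.out.println(r[4][0] + " " + r[4][1]); // prints "4 0"
    }
}
```

The modulo wrap-around is what closes the circular wait chain that makes this scenario deadlock-prone when every philosopher picks up one fork.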
• Scenario for the Banker's Algorithm: This scenario requires the same input parameters as the general scenario, but parameters such as the maximum need for each resource are generated differently, to satisfy the requirements of the Banker's algorithm. The Banker's algorithm requires advance information about processes, such as the maximum number of instances of each resource claimed by each process. This information is generated with the workload. The generated workload will have a set of processes and resources. Each process will have the usual parameters, such as a unique id, arrival time, execution time, I/O points, priority, application class, request counts and other parameters required for simulation, with additional parameters such as maximum claim and need. The resources have the same parameters as in the general scenario.

There is one more notable difference between the scenarios in execution. The general scenario and the scenario for the Banker's algorithm run until all the processes finish their executions, that is, until all of the requests are satisfied for all of the processes. For the Dining Philosophers problem, execution runs until the runtime is reached.

3.4.2 Scheduler and Machine

The implementation of a scheduler is not a primary goal of our system. Hence we have used the Linux Completely Fair Scheduler, implemented by Manickam V. [3]. The simulation of a machine simply reflects the simulation of the multicore system in it, and therefore we have used that component with our system.

3.4.3 Resource Manager

As explained earlier, the management of resources is done in this part of the system, so we call it the Resource Manager (as shown in Figure 3.4). Solution strategies for deadlocks can also be implemented here. There are three types of solutions to the deadlock problem: deadlock prevention, deadlock avoidance, and deadlock detection and resolution.
Deadlock prevention works by preventing one of the four Coffman conditions [9] from occurring in the system, as discussed in Chapter 1. Deadlock avoidance makes sure that, for every resource request, the system does not enter an unsafe state, granting only the requests that lead to safe states. An unsafe state is a state that could result in deadlock. Deadlock detection and recovery allows deadlocks to occur; on the occurrence of a deadlock, the state of the system is examined and the deadlock is resolved, either by terminating one or more processes or by preempting resources. We have implemented a deadlock avoidance technique, a deadlock detection technique, and deadlock recovery techniques in our system. The avoidance technique is the Banker's algorithm, and the detection technique is Dreadlocks [23]; for recovery, we use different criteria to select and terminate one or more processes involved in a deadlock.

3.5 Algorithms Implemented

In this section, we explain the Banker's algorithm and Dreadlocks in detail. The Banker's algorithm is a deadlock avoidance technique, and Dreadlocks is used for deadlock detection.

3.5.1 Deadlock Avoidance: Banker's Algorithm

The Banker's deadlock avoidance algorithm was developed by Dijkstra in the mid-1960s [11]. This algorithm requires all processes to declare their maximum requirements for all resources at the start of execution. A process need not request all of its required resources at the beginning, as the requests are made sequentially. Once a process acquires its maximum declared resources, it holds them for a finite time and then releases them. These released resources then become available for allocation to other processes. At every new request, the algorithm checks whether the system stays in a safe state. A safe state of a system is one in which all the processes can complete their execution without forming a deadlock [17].
Safety is determined by simulating the allocation of the maximum possible instances of all resources to a process, and then by finding a sequence of such processes that can finish their execution without forming a deadlock. A request is granted if, on allowing the request, the system stays in a safe state; otherwise the request is denied. Further, the system also determines a dynamic order in which processes execute in order to avoid deadlock.

Now we explain the Banker's algorithm with an example. Consider a resource allocation system with five processes P0, P1, P2, P3 and P4, and three resources R1, R2 and R3. The maximum need for resource j by process i is represented by Max[i,j], Alloc[i,j] represents the current allocation of resource j to process i, and Avail[j] represents the currently available instances of resource j. The current safe state of the system is shown in Table 3.2 for (a) Max[i,j], (b) Alloc[i,j] and (c) Avail[j].

Table 3.2: System in a Safe State

(a) Max[i,j]        (b) Alloc[i,j]
    R1 R2 R3            R1 R2 R3
P0   7  5  3        P0   0  1  0
P1   3  2  2        P1   3  0  2
P2   9  0  2        P2   3  0  2
P3   2  2  2        P3   2  1  1
P4   4  3  3        P4   0  0  2

(c) Avail[j]
R1 R2 R3
 2  3  0

The system is in a safe state because all the processes can finish their execution in the order P1, P3, P4, P0, P2. Once completed, P1 releases its acquired resources. These resources are then added to the available resources, which in turn can be used by process P3. In the same way, processes P4, P0 and P2 can finish their executions.

Now, suppose the system grants the request (0, 2, 0) of process P0. The resulting state of the system is shown in Table 3.3.

Table 3.3: System in an Unsafe State

(a) Max[i,j]        (b) Alloc[i,j]
    R1 R2 R3            R1 R2 R3
P0   7  5  3        P0   0  3  0
P1   3  2  2        P1   3  0  2
P2   9  0  2        P2   3  0  2
P3   2  2  2        P3   2  1  1
P4   4  3  3        P4   0  0  2

(c) Avail[j]
R1 R2 R3
 2  1  0

This is an unsafe state, as there is no sequence of processes that can finish their executions.
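The safety check just illustrated can be sketched as follows. This is an illustrative implementation of the standard check, not the thesis code, run on the values of Tables 3.2 and 3.3:

```java
import java.util.Arrays;

// Banker's safety check: repeatedly find a process whose remaining need
// (Max - Alloc) fits within the available vector, pretend it finishes and
// returns its allocation, and declare the state safe iff all processes finish.
public class BankersSafetySketch {
    static boolean isSafe(int[][] max, int[][] alloc, int[] avail) {
        int n = max.length, m = avail.length;
        int[] work = Arrays.copyOf(avail, m);
        boolean[] finished = new boolean[n];
        for (int done = 0; done < n; ) {
            boolean progress = false;
            for (int i = 0; i < n; i++) {
                if (finished[i]) continue;
                boolean fits = true;
                for (int j = 0; j < m; j++)
                    if (max[i][j] - alloc[i][j] > work[j]) { fits = false; break; }
                if (fits) {                               // process i can run to completion
                    for (int j = 0; j < m; j++) work[j] += alloc[i][j];
                    finished[i] = true; done++; progress = true;
                }
            }
            if (!progress) return false;                  // no process can finish: unsafe
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] max    = {{7,5,3},{3,2,2},{9,0,2},{2,2,2},{4,3,3}};
        int[][] alloc  = {{0,1,0},{3,0,2},{3,0,2},{2,1,1},{0,0,2}};
        int[][] alloc2 = {{0,3,0},{3,0,2},{3,0,2},{2,1,1},{0,0,2}};
        System.out.println(isSafe(max, alloc,  new int[]{2,3,0})); // Table 3.2: true
        System.out.println(isSafe(max, alloc2, new int[]{2,1,0})); // Table 3.3: false
    }
}
```

Running the check on Table 3.2 finds the safe sequence P1, P3, P4, P0, P2, while on Table 3.3 no process's remaining need fits within Avail, so the state is reported unsafe.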
In this case, the request (0, 2, 0) of P0 is denied by the Banker's algorithm. The downside of the Banker's algorithm is that it is quite expensive, as it may take O(n²m) time for each execution, where n is the number of processes and m is the number of resources [17]. It also needs some information in advance, such as the maximum number of required resources; in most current systems, this information is not available.

3.5.2 Deadlock Detection: Dreadlocks

Dreadlocks is a deadlock detection technique for shared-memory multiprocessors [23]. In Dreadlocks, each process maintains a digest of the waits-for graph. The digest of a process P, denoted D_P, is the set of other processes upon which P is waiting, directly or indirectly. If a process is not trying to acquire any resources, then its digest contains only itself. If a process is trying to acquire a resource, then its digest includes itself as well as the digests of the processes that currently hold the requested resources. Changes to these digests of the waits-for graph are propagated as processes acquire and release resources, and a process detects deadlock when it finds its own identifier in a digest it is about to adopt. Due to this approach, a probing mechanism and the maintenance of explicit waits-for graphs can be avoided. To explain this with an example, consider Figure 3.6.

Figure 3.6: Example of Dreadlocks

The notation of Figure 3.6 is the same as in Figure 3.3 and Figure 1.1, with the addition of a dashed arrow, which indicates that a process is trying to acquire a resource. Process P1 is requesting resource R1, held by process P3, which is requesting resource R3, held by process P4. Also, process P2 is requesting resource R2, held by process P3, which is requesting resource R3, held by process P4.
In this case, the digests for these processes are as follows:

D_P1 = {P1, P3, P4}
D_P2 = {P2, P3, P4}
D_P3 = {P3, P4}
D_P4 = {P4}

In the next state, the dotted line from process P4 to resource R2 indicates that P4 is trying to acquire R2. According to the algorithm, P4 finds itself in the digest of P3, which currently holds R2, so it detects a deadlock and aborts. Once a deadlock is detected, the next step is to resolve it. We use process termination techniques to resolve the deadlock situation. This procedure is explained next.

3.5.3 Deadlock Recovery: Process Termination

Deadlock recovery can be achieved by aborting one or more processes involved in the deadlock. To select these processes, we use the criteria described below.

Process Termination Criteria

Once the existence of a deadlock is detected by Dreadlocks, victim selection is required to choose one or more processes to terminate. We select one of two processes, either the process that is requesting resources or the process whose digest contains the requesting process, to be terminated in order to resolve the deadlock. The criteria to select which process to terminate are as follows:

i. Random: Any process may be selected to be aborted, at random.
ii. First In First Out: The process that entered the system first is selected to be terminated.
iii. Least Recently Used: The process that used the system least recently is selected to be terminated.
iv. Minimum Runtime: The process that has spent the least time in the system is selected to be aborted.
v. Maximum Runtime: The process that has spent the most time in the system is selected to be aborted.
vi. Smallest Digest: The process with the smallest digest is selected for termination.
vii. Largest Digest: The process with the largest digest is selected for termination.

Users can enable or disable the option for the algorithm to detect deadlock.
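The digest computation of Section 3.5.2 can be sketched for the Figure 3.6 example as follows. The names are illustrative; the real algorithm propagates digest changes incrementally as processes spin on resources, rather than recomputing sets:

```java
import java.util.HashSet;
import java.util.Set;

// Dreadlocks digests for the Figure 3.6 example: a process's digest is
// itself plus the digests of the holders of the resources it waits for,
// and a process detects deadlock when its own id appears in such a digest.
public class DreadlocksSketch {
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> u = new HashSet<>(a);
        u.addAll(b);
        return u;
    }

    public static void main(String[] args) {
        Set<Integer> dP4 = Set.of(4);               // P4 waits for nothing: D_P4 = {P4}
        Set<Integer> dP3 = union(Set.of(3), dP4);   // P3 waits for R3 held by P4
        Set<Integer> dP1 = union(Set.of(1), dP3);   // P1 waits for R1 held by P3
        Set<Integer> dP2 = union(Set.of(2), dP3);   // P2 waits for R2 held by P3
        System.out.println(dP1); // D_P1 = {P1, P3, P4} (in some order)
        System.out.println(dP2); // D_P2 = {P2, P3, P4} (in some order)

        // Now P4 tries to acquire R2, held by P3; P4 finds itself in D_P3:
        boolean deadlockDetected = dP3.contains(4);
        System.out.println(deadlockDetected); // true: P4 detects the deadlock and aborts
    }
}
```

The self-membership test replaces explicit cycle detection on a waits-for graph, which is what makes the technique cheap enough to run on every spin iteration.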
The process termination option is available only if deadlock detection is enabled for the particular simulation run. If users select the scenario for the Banker's algorithm, then the detection and termination options are not made visible. Next, we show how to operate the system by presenting and explaining its user interface.

3.6 User Interface

Our simulator has two main windows: a Performance Parameter Setting Window and a Performance Observation Window, as shown in Figure 3.7.

Figure 3.7: Main Window of the Simulator

A user can click on either the 'Performance Parameter Setting Window' or the 'Performance Observation Window'. Each of the windows has different panels in which to set different parameters for the simulation. We now explain these windows.

Figure 3.8: Workload Parameter Setting Window

3.6.1 Performance Parameter Setting Window

The Performance Parameter Setting Window has three panels. One panel is used to input the parameters required to generate workloads. The second panel is used to set the configuration of the multicore machine and the parameters of the deadlock detection algorithm. The third panel is used to start a simulation after all the required parameters are set. The workload generator panel, shown in Figure 3.8, takes the scenario type, the number of processes, the number of resources, instances per resource type, arrival type, arrival rate and the task period range as input. Users can also create more than one workload at the same time.
The input parameters depend on the selected scenario type. For example, in a general deadlock scenario and a scenario involving the Banker's algorithm, users have to enter a value for instances per resource, and there is no restriction on the number of processes or resources. On the other hand, for a scenario involving the Dining Philosophers problem, users have to enter the runtime instead of instances per resource; also, the number of processes and resources have to be the same, to satisfy the scenario criteria.

Figure 3.9: Configuration Parameter Setting Window

After entering the above-mentioned parameters for each workload, users click the 'Add to Load' button. The added workload can be seen on the right side of the panel. If more than one workload is added, the additional workload is appended to the existing list of workloads. After all the workloads are added, clicking the 'Next' button at the bottom of the panel takes users to the configuration panel. The configuration panel of the Performance Parameter Setting Window is shown in Figure 3.9. This panel is used to configure the multicore machine and the deadlock algorithm to create simulation runs. Several simulation runs can be configured using this panel. The parameters required to configure the machine include setting the number of cores for each workload.

Figure 3.10: Simulation Run Window
To set the parameters for the deadlock algorithm, users can enable or disable Dreadlocks deadlock detection for different workloads. Furthermore, if Dreadlocks is enabled, users can set a criterion for selecting processes for termination from those involved in the deadlock. These options are not visible if users have selected the Banker's algorithm scenario in the Workload Generator window. After entering all the parameters, clicking the 'Add to simulation run' button adds a simulation run using the selected workload. If the 'Select All Workloads' option is enabled, multiple simulation runs are added using all the workloads. Once all the simulation runs are added, pressing the 'Next' button at the bottom of the panel brings up the next panel to start the simulation. The third panel is shown in Figure 3.10. By clicking the 'Start Simulation' button, users can start all the simulation runs that were added in the previous panel.

3.6.2 Performance Observation Window

Using the Performance Observation Window, shown in Figure 3.11, various performance metrics can be presented as charts. To select a performance metric, the Chart Type drop-down menu can be used.

Figure 3.11: Performance Observation Window

The results can also be analyzed with respect to the arrival rate or the number of cores, depending on the parameters selected while creating workloads or configuring simulation runs. After selecting the appropriate choices, pressing the 'Show' button displays a results chart in the chart display area.

3.7 Summary

First, we explained the core components of the framework for the simulation of multicore scheduling algorithms, developed by Manickam and Aravind. We then presented an expanded MSS framework for the simulation of deadlock handling algorithms in multicore systems.
The framework is flexible and able to incorporate different deadlock handling algorithms for multicore systems. In the next chapter, we discuss the traces generated during execution and the computation of performance metrics from those traces.

Chapter 4

Execution Trace and Performance Evaluation

The previous chapter explained the entire procedure, from operating the framework for deadlock in multicore systems to starting the simulation. During the simulation, we record different types of traces for the purposes of performance evaluation. Here, we discuss the traces and performance evaluation in detail.

4.1 Traces

There are four types of traces in our system: the execution trace, the I/O trace, the request trace and the deadlock (DL) trace. The execution trace has the format of a quintuplet. Its status field can have a value between 0 and 3: the value 0 means the thread was preempted by quantum expiration; 1 means the thread was preempted by another, higher-priority thread; 2 indicates that the thread is going for I/O; and 3 means the thread has completed. The I/O trace has the format of a triplet. The execution trace and I/O trace are retained as described in the MSS framework. This was done in order to preserve multicore properties and to calculate performance metrics that are related to multicore systems but are affected by deadlock algorithms, such as turnaround time and throughput.

Request Trace: The format of the request trace is <rid, req-wait, req-grant, req-release>. The request id is denoted by rid, the time a request started to wait is denoted by req-wait, the time a request is granted is denoted by req-grant, and the time a request ends, that is, when a process releases the occupied resources, is denoted by req-release. The request trace helps in observing the performance metrics of the whole system, which include the effects of the additional components, such as the Resource Manager and resources, as well as of the implemented deadlock algorithm.
The request trace particularly helps in computing performance metrics pertaining to the implemented deadlock algorithms. From this trace type we can calculate metrics such as request wait times, which are eventually used in computing the total wait time for a process or for the system.

Deadlock Trace: The deadlock (DL) trace has a quintuplet format, <pId, dId, DLStart, DLEnd, Status>, as shown in Figure 4.1. This includes the process id (pId), deadlock id (dId), deadlock start time (DLStart), deadlock end time (DLEnd) and status (Status). The process id refers to the id of the process involved in the deadlock. The deadlock id is set to distinguish the different deadlocks that occur in the system. The deadlock start time and deadlock end time correspond to the times a deadlock starts and ends, respectively. The deadlock status can have the value 0 or 1: if the involved process is selected to abort, this value is set to 1; all other processes involved in the deadlock have this value set to 0.

<27, 2, 86, 109, 0>
<34, 4, 258, 475, 1>
<100, 9, 2486, 6971, 0>

Figure 4.1: Sample DL Trace

The DL trace helps us in calculating very important performance metrics directly related to the implemented deadlock algorithm, such as the deadlock count, the duration of a deadlock, etc. In the next section, we explain the calculation of these performance metrics in detail.

4.2 Performance Evaluation

The performance metrics related to deadlocks are mainly aimed at measuring the cost (computation and communication) of deadlocks and of the techniques to handle them. A deadlock handling technique can be avoidance, prevention, or detection followed by recovery. The cost of deadlock detection and recovery depends on the kind of strategy used. We compute performance metrics mainly from two perspectives: (i) from the perspective of an individual process; and (ii) from the perspective of the system.
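As an illustration, a DL trace line such as those in Figure 4.1 can be parsed and turned into per-deadlock figures as sketched below; the helper names are assumptions, not the thesis code:

```java
// Parses DL trace lines of the form <pId, dId, DLStart, DLEnd, Status> and
// derives the per-process deadlock duration and abort status from the fields.
public class DlTraceSketch {
    record DlRecord(int pid, int did, long start, long end, int status) {
        long duration() { return end - start; }   // time this process spent deadlocked
        boolean aborted() { return status == 1; } // 1 means this process was the victim
    }

    static DlRecord parse(String line) {
        String[] f = line.replaceAll("[<>\\s]", "").split(",");
        return new DlRecord(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                Long.parseLong(f[2]), Long.parseLong(f[3]), Integer.parseInt(f[4]));
    }

    public static void main(String[] args) {
        // Second sample line of Figure 4.1.
        DlRecord r = parse("<34, 4, 258, 475, 1>");
        System.out.println(r.duration()); // 217
        System.out.println(r.aborted());  // true
    }
}
```

Aggregating such records by deadlock id (dId) is what yields the system-level metrics discussed next, such as deadlock count and duration of deadlock.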
First, we review the metrics used in the distributed systems context, where processes communicate through messages [8, 13, 24, 25, 33, 35]. Prevention of deadlocks is the most desirable option. However, due to the complexity involved, distributed deadlock prevention methods are rarely considered and studied for distributed systems. Maintaining up-to-date information is almost impossible, due to the absence of a global clock and to unpredictable message delays and/or losses. Even with centralized approaches to maintaining the global state with reasonable consistency, the message cost would be prohibitively high, and on most systems such message traffic could easily choke the underlying communication networks. Therefore, prevention is not a popular technique in distributed systems. In any case, the most interesting performance metrics for prevention are the number of times deadlock is predicted and avoided, and the overhead of the algorithm. Next, we look at the metrics suitable for deadlock detection and resolution techniques in distributed systems. From a process perspective, the key metrics are: wait time during deadlock, turnaround time (also sometimes referred to as response time), number of times involved in deadlocks, and number of restarts due to deadlocks. System-level performance metrics vary depending on the type of technique employed, whether centralized or distributed. For centralized techniques, the metrics are: the number of deadlocks that occurred in a specified period of time¹, the number of processes involved in a deadlock, the number of processes that needed to be aborted to resolve a deadlock, the duration of a deadlock, the number of processes completed in a specified period of time, the overhead of deadlock processing, and the number of messages used to handle a deadlock. In a distributed setting, more than one process could initiate deadlock detection.
Therefore, in addition to the above metrics, the number of deadlock detection invocations could be a performance metric when a distributed deadlock handling technique is used. Based on these metrics, several secondary metrics, such as the abort/restart ratio, the deadlock overhead (time to handle deadlock), the percentage of processes involved in deadlocks, etc., are also used. Also, for several metrics, the minimum, maximum, average, and variance would be desirable.

¹ Here the specified period is normally the simulation time.

In distributed systems, due to message delay and/or loss and the absence of global knowledge, several kinds of deadlocks, such as phantom deadlocks, transient deadlocks, and true deadlocks, are possible. These complications are not present in a shared memory system. Multicore systems are typically shared memory systems, and therefore deadlock handling in a multicore system does not involve messages. Counting the number of messages involved in handling a deadlock is therefore not applicable here. Also, we do not deal with the possibility of phantom or transient deadlocks. A deadlock handling technique in multicore systems could be centralized or distributed. In a centralized implementation, one dedicated process could be assigned to do the job, or the responsibility could be rotated among several processes, but one process is responsible at any particular time. As indicated earlier, in distributed deadlock handling, several processes could initiate deadlock detection independently. If more than one process initiates a deadlock check, then for simplicity and efficiency, one must be elected to do the job. So, in the distributed case, the cost of an election could additionally be a performance metric. When to initiate a deadlock check also influences the performance.

4.3 List of Performance Metrics

We now summarize the key performance metrics of deadlock handling techniques for multicore systems.
We assume that the deadlock handling technique maintains process states (executing, waiting for a resource), resource status (free, used by a process), and timestamps of key events (arrival, start of waits, exit, deadlock resolved (prevented), etc.). We refer to the timestamp function as ts.

4.3.1 Performance Metrics (for Process i)

1. Wait time (W_Time_i(R)): The duration between the time i started waiting for resource(s) R and the time the resource(s) R is/are granted to i. That is, W_Time_i(R) = ts(R granted to i) − ts(i's request for R).

2. Deadlock wait time (DW_Time_i(dk)): The duration between the time i started waiting due to its involvement in the deadlock dk and the time dk is resolved. That is, DW_Time_i(dk) = ts(dk resolved) − ts(i joined dk).

3. Turnaround time/response time (TR_Time_i): The duration between the times i entered and exited the system. That is, TR_Time_i = ts(i's exit) − ts(i's entry).

4. Deadlock count (D_Count_i): The number of times process i was involved in a deadlock.

5. Abort/restart count (AR_Count_i): The number of times i was aborted to resolve a deadlock.

4.3.2 Performance Metrics (for System)

1. Deadlock count (D_Count): The number of deadlocks that occurred in the system in a specified period of time.

2. System wait time (SW_Time): The time processes spend waiting for requests to be granted and for the deadlock dk to be resolved. That is, SW_Time = Σ_i (W_Time_i(R) + DW_Time_i(dk)), over all i.

3. Deadlock encounter count (DP_Count): The number of times deadlocks are predicted in the system in a specified period of time.

4. Deadlock size (D_Size(dk)): The number of processes involved in the deadlock dk.

5. Deadlock strength (D_Strength(dk)): The number of processes that need to be aborted to resolve the deadlock dk.

6. Degree of deadlock (DD(dk)): The degree of deadlock represents the complexity of the deadlock, and we define it as the product of its size and strength.
That is, DD(dk) = D_Size(dk) * D_Strength(dk).

7. Duration of deadlock (D_Duration(dk)): The duration between the times the deadlock dk is formed and resolved. That is, D_Duration(dk) = ts(dk resolved) - ts(dk formed). The value of ts(dk resolved) is easy to obtain, and ts(dk formed) could be computed from the ts(request) of all the processes involved in the deadlock dk.

8. System throughput (ST): The number of processes completed in a specified period of time. The number of processes completed per unit time may be computed by dividing it by the entire duration.

9. Algorithm overhead (D_O(dk)): The read/write count required to process the deadlock dk (prevention, avoidance, or detection and recovery).

10. Total overhead (SO): The total read/write count of the system, including the algorithm overhead to process the deadlock dk.

11. Overhead ratio (OHR): The ratio of the algorithm overhead to process the deadlock dk to the total overhead, in terms of the read/write count of the system. That is,

    OHR = (deadlock algorithm overhead) / (total overhead).

12. Degree of concurrent invocations (DCCI(dk)): The number of invocations of deadlock detection by processes involved in the deadlock dk.

13. Abort/restart ratio (ARR): The ratio of the number of processes aborted due to deadlock to the total number of processes in the system. That is,

    ARR = (number of processes aborted) / (number of processes in the system).

14. Deadlock ratio (DR): The ratio of the number of processes involved in deadlocks to the total number of processes. That is,

    DR = (number of processes involved in deadlocks) / (number of processes in the system).

Due to the solution technique, the current system has W_Time_i(R), DW_Time_i(dk), TR_Time_i, D_Count_i, and AR_Count_i implemented for processes, and D_Count, SW_Time, D_Size(dk), ST, ARR, SO, DR, D_O(dk), and OHR implemented for the system.
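The system-level ratios defined above reduce to simple arithmetic over counters collected during a run. The following is a minimal sketch of how they could be computed; the class and method names are illustrative only and are not the simulator's actual code.

```java
// Hypothetical helper for the secondary system metrics defined above.
// Inputs are raw counters a simulation run would accumulate.
public class SystemMetrics {

    // Overhead Ratio (OHR): deadlock-algorithm read/write count
    // divided by the system's total read/write count.
    public static double overheadRatio(long algorithmReadWrites, long totalReadWrites) {
        return (double) algorithmReadWrites / totalReadWrites;
    }

    // Abort/Restart Ratio (ARR): processes aborted due to deadlock
    // divided by the total number of processes in the system.
    public static double abortRestartRatio(int aborted, int totalProcesses) {
        return (double) aborted / totalProcesses;
    }

    // Degree of Deadlock (DD): product of deadlock size and strength.
    public static int degreeOfDeadlock(int size, int strength) {
        return size * strength;
    }

    // System Throughput (ST): processes completed per unit of
    // simulation time over the whole duration.
    public static double throughput(int completed, long duration) {
        return (double) completed / duration;
    }

    public static void main(String[] args) {
        System.out.println("OHR = " + overheadRatio(40, 100));      // 0.4
        System.out.println("ARR = " + abortRestartRatio(35, 700));  // 0.05
        System.out.println("DD  = " + degreeOfDeadlock(4, 2));      // 8
        System.out.println("ST  = " + throughput(700, 14000));      // 0.05
    }
}
```

A run would call these once at the end of the simulation, after the trace counters are final.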
The average and variance are computed for the metrics related to processes.

4.4 Summary

We discussed the different types of traces generated and collected during execution. We also presented different performance metrics and the computation of those metrics from the traces. In the next chapter, we present the analysis of the results from executions for different values of the input parameters. They are analysed and compared based on the performance metrics.

Chapter 5

Simulation Study

We were interested in studying the performance of the Banker's algorithm for deadlock avoidance and of Dreadlocks for deadlock detection with process termination as deadlock recovery, with respect to the metrics discussed in the previous chapter. We present two sets of experiments for Dreadlocks and one set of experiments for the Banker's algorithm, in order to study them and to demonstrate the operation of the proposed simulator.

5.1 Experimental Setup

The scenarios used in the experiments are the general scenario and the scenario for the Banker's algorithm. The general scenario entails a normal setup of processes, requests, and resources. Request generation is randomized with the upper bound provided. There is no other restriction on the generation of requests. In the scenario for the Banker's algorithm, the processes are assigned a maximum claim for each resource in advance while generating workloads. These scenarios are discussed in detail in Chapter 3.

The first two sets of experiments pertain to Dreadlocks, the deadlock detection algorithm. We compare the results of the termination types with respect to variation in either the number of cores or the mean arrival rate.

Experiment 1: In this experiment we set the workloads with a fixed mean arrival rate and vary the number of cores. Table 5.1 shows the simulation parameters used for this experiment.
Table 5.1: Simulation Parameters for Experiment 1

  Parameter                          Value
  Scenario Type                      General
  Number of Processes                700
  Number of Resources                70
  Number of Instances per Resource   30
  Mean Arrival Rate                  2.5
  Number of Cores                    25, 50, 75, 100, 125

Experiment 2: We set a fixed number of cores and vary the mean arrival rate of the workloads in this experiment. Table 5.2 shows the simulation parameters used for this experiment.

Table 5.2: Simulation Parameters for Experiment 2

  Parameter                          Value
  Scenario Type                      General
  Number of Processes                700
  Number of Resources                70
  Number of Instances per Resource   30
  Mean Arrival Rate                  2.5, 3.5, 5, 6.5, 7.5
  Number of Cores                    75

For both experiments we enabled the option of detecting deadlock using Dreadlocks. In Chapter 3, we discussed the selection criteria for processes to abort. We showed that once a deadlock is detected, one or more processes can be selected to abort based on different criteria: Random, Least Recently Used (LRU), First In First Out (FIFO), Minimum Runtime (min. runtime), Maximum Runtime (max. runtime), Minimum Digest (min. digest), and Maximum Digest (max. digest). We believe terminating processes with higher runtimes would be more expensive, as restarting such processes costs more. Based on this proposition, we made the following hypothesis.

Hypothesis: Terminating processes with the minimum runtime will perform better than terminating processes with the maximum runtime. (Here we consider the overall performance of the system, that is, a termination type that shows better results for most of the metrics. Due to the randomness of the input and the uncertainty of deadlocks, the selected termination type might not always perform well with respect to all of the performance metrics. Hence, if a termination type performs better for the majority of the performance metrics, it is considered better than the other termination types.)

Experiment 3: In this experiment, we show the performance of the Banker's
deadlock avoidance algorithm. The simulation parameters set for this experiment are shown in Table 5.3. We set the workload with a fixed mean arrival rate and varied the number of cores. The scenario type chosen was the scenario for the Banker's algorithm.

Table 5.3: Simulation Parameters for Experiment 3

  Parameter                          Value
  Scenario Type                      Scenario for the Banker's algorithm
  Number of Processes                500
  Number of Resources                100
  Number of Instances per Resource   10
  Mean Arrival Rate                  10
  Number of Cores                    25, 50, 75, 100, 125

Next, we explain our observations and analysis for deadlock detection (Dreadlocks), deadlock recovery (termination with the minimum and the maximum runtime), and deadlock avoidance (the Banker's algorithm). The observations are illustrated and analysed with respect to the key performance metrics for the system.

5.2 Analysis

For the first two experiments, simulation results are mainly shown for four performance metrics in each case: (i) Throughput, (ii) System Abort/Restart Ratio, (iii) Deadlock Wait Time, and (iv) Overhead Ratio. For the third experiment, the performance metrics shown are: (i) Turnaround Time, (ii) Throughput, (iii) Process Wait Time, and (iv) Overhead Ratio. Some of the performance metrics from the initial experiments are not shown for the third experiment because of limitations of the deadlock handling technique. These metrics are discussed in Chapter 4. Now, we show our observations on the above mentioned performance metrics for each experiment.

We conducted several sets of experiments. In each experiment, we observed a common behavior for the termination types with respect to each other. Every time, the performances of the different termination types relative to each other followed a similar pattern, that is, one type of termination always performed better than the other type of termination.
Also, on comparing two different executions, we observed changes in the patterns of the graphs for some performance metrics: when the performances were compared with respect to the change in the number of cores, the graphs had increasing/decreasing lines, and when the performances were compared by changing the mean arrival rate, the graphs had lines with a zig-zag pattern. The reason behind this is that the implemented deadlock handling techniques were not developed with changes in the mean arrival rate in mind. Hence, their performances with respect to the mean arrival rate are arbitrary, and they showed a zig-zag pattern in the results.

5.2.1 Observations from Experiment 1

Observation on Throughput: While conducting the experiment with various inputs, we observe that termination of processes based on the minimum runtime has better throughput than termination of processes based on the maximum runtime. With the increase in the number of cores, the throughput also increases in both cases. There was always a significant difference between the throughputs for the minimum runtime and the maximum runtime. The result for the given input parameters is shown in Figure 5.1.

[Figure 5.1: Throughput: Minimum Runtime and Maximum Runtime (by varying the number of cores)]

The throughput increases with the increase in the number of cores because, with the availability of more cores, more processes can execute per unit time. Hence, more processes can complete their executions per unit time.

Observation on System Abort/Restart Ratio (ARR): The system ARR increases with the increase in the number of cores in both cases. Termination with the maximum runtime always shows better behavior than termination with the minimum runtime.
The system ARR chart for both termination types is shown in Figure 5.2.

[Figure 5.2: System ARR: Minimum Runtime and Maximum Runtime (by varying the number of cores)]

The maximum runtime has a lower system ARR because fewer processes are terminated when the termination criterion is the maximum runtime. In the same way, with the minimum runtime, more processes are terminated. Hence, termination with the minimum runtime has a higher system ARR.

Observation on Deadlock Wait Time: The graph of the deadlock wait time for both termination types is shown in Figure 5.3. The deadlock wait time for termination with the minimum runtime is significantly lower than for termination with the maximum runtime. The deadlock wait time increases with an increase in the number of cores for termination with the maximum runtime, while it remains nearly consistent for termination with the minimum runtime.

[Figure 5.3: Deadlock Wait Time: Minimum Runtime and Maximum Runtime (by varying the number of cores)]

With the increase in the number of cores, more processes can execute per unit time. This results in more processes holding resources per unit time while waiting for other requested resources. Because of that, more time is required to resolve the deadlock. Hence, the deadlock wait time increases (for the maximum runtime) or remains nearly consistent (for the minimum runtime) with the increase in the number of cores.
Observation on Overhead Ratio: The graph of the overhead ratio for both termination types is shown in Figure 5.4. The overhead ratio decreases with the increase in the number of cores. Termination with the minimum runtime always showed a lower overhead ratio than termination with the maximum runtime. For most of the experiments, the overhead ratio varied between 0.02 and 0.4 for Dreadlocks for all termination types.

[Figure 5.4: Overhead Ratio: Minimum Runtime and Maximum Runtime (by varying the number of cores)]

With the increase in the number of cores, fewer processes have to wait for their required resources. Because of that, fewer reads/writes are required to detect deadlock at every request. Hence, the overhead ratio decreases with the increase in the number of cores.

5.2.2 Observations from Experiment 2

Observation on Throughput: We observed that termination based on the minimum runtime always performs better than termination based on the maximum runtime. The graph of the throughput for both termination types is shown in Figure 5.5. The graph demonstrates a zig-zag pattern, but the values remain within a specific range because the number of cores is the same for all the experiments and the implemented algorithms do not react to the change in the mean arrival rate.
[Figure 5.5: Throughput: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)]

Observation on System Abort/Restart Ratio (ARR): For all values of the mean arrival rate, termination with the maximum runtime shows a lower value than the minimum runtime. The lines are nearly consistent for all values of the mean arrival rate. The system ARR chart for both termination types is shown in Figure 5.6.

[Figure 5.6: System ARR: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)]

Observation on Deadlock Wait Time: The deadlock wait time for termination with the minimum runtime is always lower than for termination with the maximum runtime. The graph demonstrating the deadlock wait times for both termination types is shown in Figure 5.7. For most of the experiments of this set, we observed a significant difference between the two termination types. Termination with the maximum runtime shows variation, while termination with the minimum runtime has almost consistent performance for all values of the mean arrival rate.
[Figure 5.7: Deadlock Wait Time: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)]

Observation on Overhead Ratio: The graph of the overhead ratio for both termination types is shown in Figure 5.8. The overhead ratio for termination with the minimum runtime is always lower than for termination with the maximum runtime. For most of the experiments in this set, the overhead ratio varied between 0.1 and 0.4 for both termination types.

[Figure 5.8: Overhead Ratio: Min. Runtime and Max. Runtime (by varying the mean arrival rate)]

5.2.3 Observations from Experiment 3

Observation on Turnaround Time: We observe that the turnaround time decreases with the increase in the number of cores for the Banker's algorithm. The rate of decrease of the turnaround time is nearly constant across all the numbers of cores. The graph of the turnaround time for the Banker's algorithm is shown in Figure 5.9.

[Figure 5.9: Turnaround Time: Banker's Algorithm (by varying the number of cores)]

With the increase in the number of cores, the processes have more cores available per unit time, which results in less execution time for each process.
Hence, the turnaround time decreases with the increase in the number of cores.

Observation on Throughput: We observe that the throughput of the system increases with the increase in the number of cores for the Banker's algorithm most of the time. The graph of the throughput for the Banker's algorithm is shown in Figure 5.10.

[Figure 5.10: Throughput: Banker's Algorithm (by varying the number of cores)]

With the increase in the number of cores, more processes can execute per unit time. Hence, the throughput increases with the increase in the number of cores. Though the throughput slightly decreases from 100 to 125 cores, its value remains higher than for the other numbers of cores. We cannot determine the exact reason behind this strange behavior; the randomness of the input can be one of the reasons for it.

Observation on Process Wait Time: The process wait time decreases with the increase in the number of cores. We observed consistent results for the average and variance for all of the different experiments of this set. The graph of the process wait time for the Banker's algorithm is shown in Figure 5.11.

[Figure 5.11: Process Wait Time: Banker's Algorithm (by varying the number of cores)]

With the increase in the number of cores, more processes can execute per unit time.
As a result, resources are granted and released more quickly, and processes have to wait less for their requested resources. Hence, the process wait time decreases with the increase in the number of cores.

Observation on Overhead Ratio: The graph of the overhead ratio for the Banker's algorithm is shown in Figure 5.12. The overhead ratio increases with the increase in the number of cores. The overhead ratio is higher than 0.80 for the given input parameters for all numbers of cores. We observed that, for all experiments of this set, the overhead ratio for the Banker's algorithm has always been higher than 0.75.

[Figure 5.12: Overhead Ratio: Banker's Algorithm (by varying the number of cores)]

For the Banker's algorithm, the safe state is checked for each request. With the increase in the number of cores, more processes execute per unit time. Hence, more processes can request resources per unit time. The algorithm determines a safe sequence while checking the safe state, and it assumes that the processes will finish their execution sequentially according to that safe sequence. This assumption is not helped by increasing the number of cores; in some cases, the increase in the number of cores makes the performance worse. Hence, the overhead ratio increases with the increase in the number of cores for the Banker's algorithm.

5.3 Summary

In this chapter, we presented the simulation experiments for termination with the minimum runtime and with the maximum runtime.
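The per-request safe-state check of the Banker's algorithm, discussed in the overhead observation above, can be sketched as follows. This is a minimal textbook-style illustration, not the simulator's actual code; the array names (available, allocation, need) are assumptions following the standard formulation.

```java
import java.util.Arrays;

// Sketch of the Banker's algorithm safe-state check: returns true if
// some order exists in which every process can acquire its remaining
// need and run to completion with the currently available instances.
public class BankerSafety {

    public static boolean isSafe(int[] available, int[][] allocation, int[][] need) {
        int n = allocation.length;                // number of processes
        int m = available.length;                 // number of resource types
        int[] work = Arrays.copyOf(available, m); // instances free right now
        boolean[] finished = new boolean[n];
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < n; i++) {
                if (!finished[i] && canFinish(need[i], work)) {
                    // Process i can run to completion; reclaim its allocation.
                    for (int j = 0; j < m; j++) work[j] += allocation[i][j];
                    finished[i] = true;
                    progress = true;
                }
            }
        }
        for (boolean f : finished) if (!f) return false;
        return true;
    }

    // True if process i's remaining need fits within the work vector.
    private static boolean canFinish(int[] need, int[] work) {
        for (int j = 0; j < work.length; j++)
            if (need[j] > work[j]) return false;
        return true;
    }
}
```

Because a check like this scans the allocation and need matrices on every request, its read/write cost grows with the request rate, which is consistent with the high overhead ratios observed for the Banker's algorithm above.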
As hypothesized, termination with the minimum runtime outperforms termination with the maximum runtime in most of the cases. We also presented simulation experiments for the Banker's deadlock avoidance algorithm. These experiments showed valid results for deadlock avoidance as well as for deadlock detection and recovery. Hence, we are confident that our simulation implementation of deadlock in multicore systems is fairly sound.

Chapter 6

Conclusions and Future Work

Deadlock is a very common bug in software applications, yet it is ignored by most operating systems [32]. With the advent of multicore processors, the problem of deadlock has gained renewed attention in research. Many approaches that have been developed to handle deadlock in multicore systems were discussed in Chapter 2. In this chapter, we conclude the thesis and outline possible future work that can be conducted to extend the research carried out in this thesis.

6.1 Conclusion

The primary contributions of this thesis are: (i) the design and implementation of a flexible framework to simulate deadlock in multicore systems; (ii) three scenarios implemented to incorporate different deadlock handling techniques: the general scenario, the scenario for the Dining Philosopher's problem, and the scenario for the Banker's algorithm. Two deadlock handling strategies, deadlock avoidance and deadlock detection & recovery, are simulated. The deadlock avoidance strategy is the Banker's algorithm and the deadlock detection strategy is Dreadlocks. Deadlock recovery follows deadlock detection, and is done by terminating one of the processes involved in the deadlock. The experience gained by developing this simulator involves software design, implementation, and performance analysis. (iii) Demonstration of the soundness and validity of the proposed framework by presenting the result analysis of the experiments.
The proposed simulator has the basic components required to simulate a deadlock handling algorithm in multicore systems. More deadlock handling techniques can easily be added to this simulator for study, which can furnish initial insights into the behavior of the algorithms in practical multicore systems. These insights can be helpful for studying the performance of the algorithms as well as for identifying their shortcomings. Based on the shortcomings, necessary guidelines can be offered for improvements to the algorithms.

6.2 Future Directions

The proposed framework to incorporate deadlock handling algorithms in multicore systems is just the beginning of research into deadlock handling algorithms in multicore systems. There are many directions in which the work presented in this thesis can be expanded. We outline some of them next.

• The simulation can be made more sophisticated by refining and improving components such as user interrupts. The current system simulates only I/O interrupts; more real system interrupts can be introduced to the proposed framework.

• Cache memory can also be simulated to make the framework's behaviour closer to that of real systems.

• More scheduling algorithms can be added to observe the performance of deadlock handling algorithms with them.

• More deadlock handling algorithms can be incorporated.

• The modelling of resources and the Resource Manager can be refined and improved by adding different types of resources with different properties. Also, the function of the Resource Manager can be improved to deal differently with the various resources.

• The current performance study has exposed some irregularities in the implemented algorithms as well as some of their limitations. For example, for Dreadlocks, digest propagation is not accurate and it considers all processes to have equal priority. These limitations can be overcome to make the algorithms more efficient.
For example, to overcome the limitation of digest propagation, a centralized scheme can be introduced to update the digests of all the processes at regular intervals.

I would like to continue working on some of these directions in the future.

Bibliography

[1] Rahul Agarwal and Scott D. Stoller. Run-time detection of potential deadlocks for programs with locks, semaphores, and condition variables. In Proceedings of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging, PADTAD '06, pages 51-60, New York, NY, USA, 2006. ACM.

[2] Rahul Agarwal, Liqiang Wang, and Scott D. Stoller. Detecting potential deadlocks with static analysis and run-time monitoring. In Proceedings of the First Haifa International Conference on Hardware and Software Verification and Testing, HVC '05, pages 191-207, Berlin, Heidelberg, 2006. Springer-Verlag.

[3] Alex A. Aravind and Viswanathan Manickam. A flexible simulation framework for multicore schedulers: Work in progress paper. In Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM PADS '13, pages 355-360, New York, NY, USA, 2013. ACM.

[4] Ferenc Belik. An efficient deadlock avoidance technique. IEEE Trans. Comput., 39(7):882-888, July 1990.

[5] Saddek Bensalem, Jean-Claude Fernandez, Klaus Havelund, and Laurent Mounier. Confirmation of deadlock potentials detected by runtime analysis. In Proceedings of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging, PADTAD '06, pages 41-50, New York, NY, USA, 2006. ACM.

[6] Saddek Bensalem and Klaus Havelund. Scalable dynamic deadlock analysis of multithreaded programs. In Parallel and Distributed Systems: Testing and Debugging (PADTAD-3), IBM Verification Conference, 2005.

[7] Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for safe programming: Preventing data races and deadlocks. SIGPLAN Not., 37(11):211-230, November 2002.

[8] Omran A. Bukhres.
Performance comparisons of distributed deadlock detection algorithms. In Proceedings of the Eighth International Conference on Data Engineering, pages 210-217, Washington, DC, USA, 1992. IEEE Computer Society.

[9] E. G. Coffman, M. Elphick, and A. Shoshani. System deadlocks. ACM Comput. Surv., 3(2):67-78, June 1971.

[10] Edsger W. Dijkstra. The origin of concurrent programming. Chapter: Hierarchical Ordering of Sequential Processes, pages 198-227. Springer-Verlag New York, Inc., New York, NY, USA, 2002.

[11] Edsger Wybe Dijkstra. Cooperating sequential processes. Technical Report EWD123, 1965.

[12] Dawson Engler and Ken Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. SIGOPS Oper. Syst. Rev., 37(5):237-252, October 2003.

[13] Luca Ferrarini, Luigi Piroddi, and Stefano Allegri. A comparative performance analysis of deadlock avoidance control algorithms for FMS. Journal of Intelligent Manufacturing, 10(6):569-585, 1999.

[14] Cormac Flanagan, K. Rustan M. Leino, Mark Lillibridge, Greg Nelson, James B. Saxe, and Raymie Stata. Extended static checking for Java. SIGPLAN Not., 37(5):234-245, May 2002.

[15] Patrice Godefroid. Model checking for programming languages using VeriSoft. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '97, pages 174-186, New York, NY, USA, 1997. ACM.

[16] A. N. Habermann. Prevention of system deadlocks. Commun. ACM, 12(7):373-ff., July 1969.

[17] Sibsankar Haldar and Alex Aravind. Operating Systems. Pearson Education, 2010.

[18] W. Haque. Concurrent deadlock detection in parallel programs. Int. J. Comput. Appl., 28(1):19-25, January 2006.

[19] Klaus Havelund. Using runtime analysis to guide model checking of Java programs. In Proceedings of the 7th International SPIN Workshop on SPIN Model Checking and Software Verification, pages 245-264, London, UK, 2000. Springer-Verlag.

[20] Richard C. Holt.
Some deadlock properties of computer systems. In Proceedings of the Third ACM Symposium on Operating Systems Principles, SOSP '71, pages 64-71, New York, NY, USA, 1971. ACM.

[21] Pallavi Joshi, Chang-Seo Park, Koushik Sen, and Mayur Naik. A randomized dynamic program analysis technique for detecting real deadlocks. SIGPLAN Not., 44(6):110-120, June 2009.

[22] Edgar Knapp. Deadlock detection in distributed databases. ACM Comput. Surv., 19(4):303-328, December 1987.

[23] Eric Koskinen and Maurice Herlihy. Dreadlocks: Efficient deadlock detection. In Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures, SPAA '08, pages 297-303, New York, NY, USA, 2008. ACM.

[24] Natalija Krivokapic, Alfons Kemper, and Ehud Gudes. Deadlock detection in distributed database systems: A new algorithm and a comparative performance analysis. The VLDB Journal, 8(2):79-100, October 1999.

[25] Soojung Lee and Junguk L. Kim. Performance analysis of distributed deadlock detection algorithms. IEEE Trans. on Knowl. and Data Eng., 13(4):623-636, July 2001.

[26] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: A comprehensive study on real world concurrency bug characteristics. SIGOPS Oper. Syst. Rev., 42(2):329-339, March 2008.

[27] Mayur Naik, Chang-Seo Park, Koushik Sen, and David Gay. Effective static deadlock detection. In Proceedings of the 31st International Conference on Software Engineering, ICSE '09, pages 386-396, Washington, DC, USA, 2009. IEEE Computer Society.

[28] Onur Ozgun and Yaman Barlas. Discrete vs. continuous simulation: When does it matter? In Proceedings of the 27th International Conference of the System Dynamics Society, volume 6, pages 1-22, 2009.

[29] Yao Qi, Raja Das, Zhi Da Luo, and Martin Trotter. MulticoreSDK: A practical and efficient data race detector for real-world applications.
In Proceedings of the 7th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, PADTAD '09, pages 5:1-5:11, New York, NY, USA, 2009. ACM.

[30] M. Singhal. Deadlock detection in distributed systems. Computer, 22(11):37-48, November 1989.

[31] William Stallings. Operating Systems: Internals and Design Principles. Prentice Hall Press, Upper Saddle River, NJ, USA, 6th edition, 2008.

[32] Andrew S. Tanenbaum. Modern Operating Systems. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2007.

[33] I. Terekhov and T. Camp. Time efficient deadlock resolution algorithms. Inf. Process. Lett., 69(3):149-154, February 1999.

[34] Amy Williams, William Thies, and Michael D. Ernst. Static deadlock detection for Java libraries. In Proceedings of the 19th European Conference on Object-Oriented Programming, ECOOP '05, pages 602-629, Berlin, Heidelberg, 2005. Springer-Verlag.

[35] Chim-fu Yeung, Sheung-lun Hung, and Kam-yiu Lam. Performance evaluation of a new distributed deadlock detection algorithm. SIGMOD Rec., 23(3):21-26, September 1994.