A FLEXIBLE SIMULATION FRAMEWORK FOR THE STUDY OF DEADLOCK RESOLUTION ALGORITHMS IN MULTICORE SYSTEMS

by

Dhruv Desai

B.E., Gujarat Technological University, 2012

THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

February 2016

© Dhruv Desai, 2016

Abstract

Deadlock is a common phenomenon in software applications, yet it is ignored by most operating systems. Although deadlocks do not occur frequently, in some cases the effects are drastic when a deadlock does occur. The ongoing trend in processor technology indicates that future systems will have hundreds or thousands of cores. Due to this imminent trend in hardware development, the problem of deadlock has gained renewed attention in research. Deadlock handling techniques developed for earlier processors and for distributed systems might not work well with multicore systems, due to their architectural differences. Hence, to maximize the utility of multicore systems, new programs have to be carefully designed and tested before they can be adopted for practical use. Many approaches have been developed to handle deadlock in multicore systems, but very little attention has been paid to comparing the performance of those approaches with respect to different performance parameters. To address these shortfalls, we need a flexible simulation testbed to study deadlock handling algorithms and to observe their performance differences in multicore systems. The development of such a framework is the main goal of this thesis. In the framework, we implemented three scenarios: a general scenario, a scenario for the Dining Philosopher's problem, and a scenario for the Banker's algorithm.
In addition to these scenarios, we demonstrate the flexibility, soundness, and use of the proposed framework by simulating two different deadlock handling strategies: deadlock avoidance (the Banker's algorithm) and deadlock detection (Dreadlocks). Deadlock detection is followed by deadlock recovery to resolve the detected deadlock. We also present an analysis of the results of the different sets of experiments performed on the implemented strategies. The proposed simulation testbed to study deadlocks in multicore systems is developed using Java.

Contents

Abstract
Contents
List of Figures
List of Tables
Acknowledgements
1 Introduction
1.1 Overview
1.2 Trends in Hardware Development
1.2.1 Distributed Systems vs Multicore Systems
1.2.2 Deadlock: Distributed Systems vs Multicore Systems
1.3 Motivation
1.4 Contributions
1.5 Thesis Organization
2 Background and Related Work
2.1 Deadlock Handling Approaches in Multicore Systems
2.1.1 Language-level Approaches
2.1.2 Static Analysis
2.1.3 Dynamic Analysis
2.1.4 Model Checking
2.1.5 Type and Annotation Based Techniques
2.2 Performance Evaluation Studies of Deadlock Handling Algorithms
2.3 Summary
3 Simulation Framework for Deadlock in Multicore Systems
3.1 Simulation
3.2 Architecture of Multicore Scheduler System Framework
3.2.1 Workload Generator
3.2.2 Machine
3.2.3 Scheduler
3.2.4 Execution Trace and I/O Trace
3.2.5 Performance Calculation Engine
3.3 Simulation of Deadlock in Multicore Systems
3.4 Architecture of Simulation Framework for Deadlock in Multicore Systems
3.4.1 Workload Generator
3.4.2 Scheduler and Machine
3.4.3 Resource Manager
3.5 Algorithms Implemented
3.5.1 Deadlock Avoidance: Banker's Algorithm
3.5.2 Deadlock Detection: Dreadlocks
3.5.3 Deadlock Recovery: Process Termination
3.6 User Interface
3.6.1 Performance Parameter Setting Window
3.6.2 Performance Observation Window
3.7 Summary
4 Execution Trace and Performance Evaluation
4.1 Traces
4.2 Performance Evaluation
4.3 List of Performance Metrics
4.3.1 Performance Metrics (for Process i)
4.3.2 Performance Metrics (for System)
4.4 Summary
5 Simulation Study
5.1 Experimental Setup
5.2 Analysis
5.2.1 Observations from Experiment 1
5.2.2 Observations from Experiment 2
5.2.3 Observations from Experiment 3
5.3 Summary
6 Conclusions and Future Work
6.1 Conclusion
6.2 Future Directions
Bibliography

List of Figures

1.1 Deadlock Situation
3.1 Architecture of MSS Framework
3.2 Core Architecture of MSS Framework
3.3 Basic System Requirements for Deadlock
3.4 Architecture of Simulation Framework for Deadlock in Multicore Systems
3.5 Dining Philosopher's Scenario
3.6 Example of Dreadlocks
3.7 Main Window of the Simulator
3.8 Workload Parameter Setting Window
3.9 Configuration Parameter Setting Window
3.10 Simulation Run Window
3.11 Performance Observation Window
4.1 Sample DL Trace
5.1 Throughput: Minimum Runtime and Maximum Runtime (by varying the number of cores)
5.2 System ARR: Minimum Runtime and Maximum Runtime (by varying the number of cores)
5.3 Deadlock Wait Time: Minimum Runtime and Maximum Runtime (by varying the number of cores)
5.4 Overhead Ratio: Minimum Runtime and Maximum Runtime (by varying the number of cores)
5.5 Throughput: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)
5.6 System ARR: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)
5.7 Deadlock Wait Time: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)
5.8 Overhead Ratio: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)
5.9 Turnaround Time: Banker's Algorithm (by varying the number of cores)
5.10 Throughput: Banker's Algorithm (by varying the number of cores)
5.11 Process Wait Time: Banker's Algorithm (by varying the number of cores)
5.12 Overhead Ratio: Banker's Algorithm (by varying the number of cores)

List of Tables

3.1 Entities, States and Events
3.2 System in a Safe State
3.3 System in an Unsafe State
5.1 Simulation Parameters for Experiment 1
5.2 Simulation Parameters for Experiment 2
5.3 Simulation Parameters for Experiment 3

Acknowledgements

Although the title page of this thesis shows my name, this thesis would not have been possible without the contributions of many others. I cannot possibly name all of them here, but I would like to extend my special thanks to a few. My deepest gratitude is to my supervisor, Dr. Alex Aravind, for his continuous guidance, support, constant encouragement and endless patience. Alex has been a mentor not only academically, but throughout my graduate experience at UNBC. I would like to thank my supervisory committee members, Dr. Waqar Haque and Dr. George Jones, for their time and the valuable input that helped shape this thesis. I am also thankful to Dr. Ajit Dayanandan for his suggestions during the initial stage. My thanks are also due to the external examiner Dr.
Balbinder Deo and the chair of my defense for reading this thesis. I extend a special thanks to Dr. Andreas Hirt for giving me opportunities to assist him in his courses. A special word of gratitude goes to all my colleagues over the past few years who have helped me to become a better researcher and a better person. The list would definitely include Dhawal, Denish, Giridhar, Darshik, Rafael, Rahim, Shanthini, Suresh, Behnish, Raman and Mehul. I also thank Brooke for proofreading the draft of my thesis. I especially thank our systems administrator, Chris, for always being there for technical support. A special thanks to Ms. Mahi Aravind for her support and great food on various occasions. Most importantly, none of this would have been possible without the love and patience of my family. I will be forever thankful to my parents and my brother for their unwavering support and encouragement.

Chapter 1

Introduction

Deadlock was introduced and studied in the mid-1960s by Dijkstra [11], who originally referred to it as a "Deadly Embrace". Additional work was done in the late 1960s and early 1970s by Coffman [9], Habermann [16], Holt [20] and others. In this chapter, we start by introducing the problem of deadlock, the conditions necessary for deadlock to happen, and techniques to deal with deadlock. A discussion of changing trends in hardware, the rationale behind this thesis, and the contribution it makes to the field follows.

1.1 Overview

Deadlock is a situation in a resource allocation system in which two or more processes are in a simultaneous wait state, each one waiting for one of the others to release a resource before it can proceed [4]. In this definition, processes are the active entities in a system that share resources such as data items in a database, a data structure, a file, etc. Figure 1.1 shows a system in a deadlocked state.
Figure 1.1: Deadlock Situation

In the figure above, ellipses represent processes while rectangles represent resources. An arrow from a process towards a resource shows that the process has requested the resource, while an arrow from a resource towards a process shows that the resource is held by the process. Moreover, a resource may have several instances, in which case a process can request one or more instances of that resource. The presence of four conditions is necessary for deadlock to occur in any system. These conditions are known as Coffman's conditions, as they come from a 1971 publication by Coffman et al. [9]. Coffman's conditions are mutual exclusion, hold and wait, no preemption and circular wait.

• Mutual exclusion means that only one process may use a resource at a time in a given system.

• Hold and wait describes that a process may hold allocated resources while waiting for the requested resources to be allocated.

• No preemption indicates that no resources can be forcefully taken back from a process holding them; a process releases its held resources only upon completion. If the first three conditions hold, a system may, as a consequence, end up in the fourth condition of circular wait.

• Circular wait is defined as a closed chain of processes, such that each process holds at least one resource needed by the next process in the chain [31].

In Figure 1.1, process P1 has exclusive access to resource R2 and process P2 has exclusive access to resource R1 (mutual exclusion). To complete their executions, process P1 needs resource R1 and process P2 needs resource R2 (hold and wait). Neither process can release the resources it holds before finishing its execution (no preemption). Thus both processes wait for resources that are held by the other process (circular wait), and hence they stay in the waiting state forever. In this situation, the system is deadlocked.
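The circular wait in Figure 1.1 can be detected mechanically by building a wait-for graph and searching it for a cycle. The following Java sketch (the class and method names are illustrative, not part of the thesis framework) models the figure by collapsing each resource onto the process that holds it, so that P1 waits for P2 and P2 waits for P1:

```java
import java.util.*;

// Wait-for graph for the situation in Figure 1.1: an edge u -> v means that
// process u is waiting for a resource held by process v. A cycle in this
// graph means the processes involved are deadlocked.
class WaitForGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addWaitEdge(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new ArrayList<>()).add(holder);
    }

    // Depth-first search: reaching a node that is still on the DFS stack
    // closes a cycle, i.e. a circular wait.
    boolean hasDeadlock() {
        Set<String> visited = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String node : edges.keySet()) {
            if (dfs(node, visited, onStack)) return true;
        }
        return false;
    }

    private boolean dfs(String node, Set<String> visited, Set<String> onStack) {
        if (onStack.contains(node)) return true;  // back edge: circular wait
        if (!visited.add(node)) return false;     // already fully explored
        onStack.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptyList())) {
            if (dfs(next, visited, onStack)) return true;
        }
        onStack.remove(node);
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        // Figure 1.1, with each resource collapsed onto its holder:
        // P1 holds R2 and waits for R1 (held by P2); P2 holds R1 and waits for R2.
        g.addWaitEdge("P1", "P2");
        g.addWaitEdge("P2", "P1");
        System.out.println(g.hasDeadlock() ? "deadlocked" : "no deadlock");
        // prints "deadlocked"
    }
}
```

Removing either edge breaks the cycle, and `hasDeadlock()` returns false; this is the same idea that resource-graph based detection algorithms apply continuously at runtime.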
Knapp [22] and Singhal [30] discuss two types of deadlock: resource deadlocks and communication deadlocks. Resource deadlocks involve reusable resources, while communication deadlocks involve consumable resources. A reusable resource is one that can be safely used by only one process at a time and is not depleted by that use, such as a CPU, printer, disk, etc., whereas a consumable resource is one that can be created and consumed, such as the messages or signals used between processes to communicate [31]. This thesis focuses only on resource deadlock, referred to simply as deadlock from here on. There are three general strategies to deal with deadlock: (i) deadlock prevention, (ii) deadlock avoidance and (iii) deadlock detection and recovery.

• Deadlock prevention: A deadlock prevention technique breaks at least one of the four Coffman's conditions [9]. Breaking one of the first three conditions is hard because it is difficult to determine which of the conditions may lead the system to deadlock. Moreover, satisfying these conditions is a fundamental requirement for most systems. Hence, most deadlock prevention techniques work by preventing circular waiting of processes.

• Deadlock avoidance: If some specific information about the processes in a system, such as all their required resources, is known in advance, deadlock can be avoided by always keeping the system in a safe state. A system is considered to be in a safe state if all the processes can complete their execution without forming a deadlock [17]. All systems are in a safe state at the beginning, because the processes can always be executed sequentially, and deadlock can never occur in a sequential execution as there is no competition for resources. A safe state never ends in a deadlock. However, not all unsafe states result in a deadlock; safe and unsafe states only determine the probability that a system might enter a deadlock.
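The safe-state test underlying deadlock avoidance can be sketched as follows. This is a minimal Java version of the safety check used by the Banker's algorithm (presented fully in Chapter 3); the class name and the example matrices are illustrative, not the framework's own data structures:

```java
import java.util.Arrays;

// Safe-state test: repeatedly look for a process whose remaining need can be
// met from the currently available resources; pretend it runs to completion
// and releases its allocation, then repeat. If every process can finish this
// way, the state is safe; otherwise it is unsafe.
class SafetyCheck {
    static boolean isSafe(int[] available, int[][] allocation, int[][] need) {
        int n = allocation.length;   // number of processes
        int m = available.length;    // number of resource types
        int[] work = Arrays.copyOf(available, m);
        boolean[] finished = new boolean[n];
        int done = 0;
        boolean progress = true;
        while (progress && done < n) {
            progress = false;
            for (int i = 0; i < n; i++) {
                if (finished[i]) continue;
                boolean canRun = true;
                for (int j = 0; j < m; j++) {
                    if (need[i][j] > work[j]) { canRun = false; break; }
                }
                if (canRun) {
                    // Process i can finish: release its allocation back to work.
                    for (int j = 0; j < m; j++) work[j] += allocation[i][j];
                    finished[i] = true;
                    done++;
                    progress = true;
                }
            }
        }
        return done == n;
    }

    public static void main(String[] args) {
        // Three processes, three resource types; illustrative numbers only.
        int[] available = {3, 3, 2};
        int[][] allocation = {{0, 1, 0}, {2, 0, 0}, {2, 1, 1}};
        int[][] need = {{1, 2, 2}, {0, 1, 1}, {3, 0, 2}};
        System.out.println(isSafe(available, allocation, need) ? "safe" : "unsafe");
        // prints "safe": the processes can finish in the order P0, P1, P2
    }
}
```

An avoidance algorithm grants a request only if the state that would result still passes this test; otherwise the requesting process is made to wait.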
One of the limitations of this strategy is that knowing all the required resources in advance is unrealistic. Determining that information in advance is not possible for modern systems due to their dynamic behaviour. The Banker's algorithm is a popular method for deadlock avoidance; it is explained in Chapter 3.

• Deadlock detection and recovery: Due to the limitations of prevention and avoidance algorithms discussed above, detection and recovery techniques are more popular in practical systems. They allow processes to acquire resources whenever possible, and a check can be done periodically to detect whether deadlock has occurred in the system. Upon detection of deadlock, a recovery strategy is applied. Typically, a recovery strategy involves aborting all or some of the deadlocked processes. These techniques are suitable in systems where deadlocks are not frequent and the recovery cost is not very expensive. We have implemented a deadlock detection technique known as Dreadlocks [23] in our system. This technique is explained in Chapter 3.

Once a deadlock has formed and been detected in the system, the next step is deadlock recovery. In deadlock recovery, the system takes corrective action by selecting some processes to forcefully terminate, making their acquired resources available. The terminated processes can be re-initiated later. If the selection of victim processes is not done carefully, aborting can lead to livelocks. "Livelock is a situation in which two or more processes continuously change their states in response to changes in the other process(es) without doing any useful work" [31]. The selection criteria used to choose one or more processes to terminate are discussed in Chapter 3. Most modern operating systems, including Windows and the UNIX family, ignore deadlock [32]. In these systems, if the user observes that the system is slowing down or freezes due to some problem, probably deadlock, it is simply restarted.
There are some reasons to ignore a deadlock: (i) the occurrence of deadlock is very rare, and the prevention, avoidance, or detection and recovery overhead is very high; (ii) restarting jobs does not incur significantly high costs; and (iii) other deadlock handling techniques make the system slow, or are restrictive and complex, so ignoring deadlock is preferred. Ignoring deadlock is chosen on the assumption that most users would prefer the occasional occurrence of deadlock to living with the limitations of deadlock handling strategies.

1.2 Trends in Hardware Development

There has been a significant change in computer systems hardware in the past couple of decades. Initially, most systems were standalone, using single cores. Later, distributed systems were developed, followed recently by multicore systems, which have become popular for both personal and commercial purposes. Hardware developments in turn demand changes in programming for effective use of multicore systems. Thus, multithreaded programs designed to frequently share resources are expected to become the norm for program development in the future.

1.2.1 Distributed Systems vs Multicore Systems

Distributed systems consist of geographically distant, separate, autonomous computers, connected through a network, communicating with each other by passing messages. They often coordinate their activities to solve a common problem or to share the computing resources and storage devices of the systems. System or application processes executing on any network node can use both local and remote shared resources, simultaneously and exclusively. A multicore processor is an integrated circuit (IC) to which two or more processors have been attached for enhanced performance, reduced power consumption, and more efficient simultaneous processing of multiple tasks. Distributed and multicore systems share two similarities: concurrency and resource sharing.
To achieve these, nodes in distributed systems and cores in multicore systems have to coordinate with each other. For the purposes of communication and coordination, message passing is used in distributed systems, whereas reading and writing through shared memory is used in multicore systems. Each node in a distributed system is an autonomous system, while in multicore systems a core is not. Concurrency and resource sharing can lead to deadlock in both kinds of systems.

1.2.2 Deadlock: Distributed Systems vs Multicore Systems

Two types of deadlock can occur in distributed systems: resource deadlock and communication deadlock. Resource deadlock happens mainly because of circular dependencies of processes on each other for resources. Communication deadlock primarily occurs due to lost or delayed messages. Because of the architectural differences discussed above, resource deadlocks are more common than communication deadlocks in multicore systems. In distributed systems, deadlocks are dealt with either in a centralized way or a distributed way. In centralized deadlock handling algorithms, a single node keeps a list of all other nodes and has complete authority to allocate and preempt resources from all other nodes. In distributed deadlock handling algorithms, all the nodes keep lists of other nodes and update the lists frequently. In multicore systems, by contrast, deadlocks are mostly dealt with in a centralized way, because cores are not autonomous and are typically managed by a single operating system. Distributed deadlock handling algorithms are developed with communication overhead in mind. Multicore systems do not have to deal with the overhead of keeping node address lists and of maintaining different topologies among the nodes. Moreover, to the best of our knowledge, no performance evaluation studies have been performed for multicore systems. Multicore systems use locks to avoid data races in programs. Improper use of these locks can create deadlock in the system.
Improper lock acquisition can happen for the following reasons: (i) It becomes difficult to follow the lock order discipline that could avoid deadlock, as software systems are most of the time written by multiple programmers. (ii) As stated before, adding new locks to fix race conditions can introduce deadlock. (iii) Sometimes third-party software, such as plugins, is incorporated into software systems; such third-party software may not follow the locking order discipline of the host software system, and this can result in deadlock. Due to the uncertainty of deadlock and its occurrence in modern systems, researchers have come up with different approaches to deal with deadlock in multicore systems. The approaches are classified as follows: language-level [5, 18], static analysis [12, 14, 27], dynamic analysis [1, 21, 29], model checking [15] and type and annotation based techniques [7]. Language-level approaches strive to make concurrency control easier by providing higher level constructs that do not allow uses that can cause errors. Even though such languages alleviate some of the complexity of concurrency, the problem is not completely eliminated. Static program analysis techniques do not require any execution of the application, as they work directly with the source code. They examine all the possible deadlocks and often give no false negatives, but they report many false positives. Dynamic program analysis techniques find potential deadlocks during program execution. As they operate at runtime, they visit only feasible paths and have accurate views of the values of variables. Model checking takes simplified descriptions of the code and uses techniques to explore vast state spaces efficiently as it systematically tests the code on all inputs. Because it works on simplified descriptions, this approach does not need to explore massive state spaces in order to find deadlocks.
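The lock order discipline mentioned in reason (i) above can be made concrete: if every thread acquires locks in one fixed global order, a circular wait can never form. A minimal Java sketch of this idea (the class and helper names are illustrative, not part of the thesis framework):

```java
import java.util.concurrent.locks.ReentrantLock;

// Lock-order discipline: every lock is given a rank in one global order, and
// any thread that needs two locks always takes the lower-ranked one first.
// A circular wait would require some thread to take a higher-ranked lock
// before a lower-ranked one, which this helper never does.
class OrderedLocking {
    static final class RankedLock {
        final int rank;  // position in the global lock order
        final ReentrantLock lock = new ReentrantLock();
        RankedLock(int rank) { this.rank = rank; }
    }

    // Acquire both locks in rank order, regardless of argument order.
    static void lockBoth(RankedLock a, RankedLock b) {
        RankedLock first = (a.rank <= b.rank) ? a : b;
        RankedLock second = (a.rank <= b.rank) ? b : a;
        first.lock.lock();
        second.lock.lock();
    }

    static void unlockBoth(RankedLock a, RankedLock b) {
        a.lock.unlock();  // ReentrantLock does not require a release order
        b.lock.unlock();
    }
}
```

With this helper, two threads calling `lockBoth(r1, r2)` and `lockBoth(r2, r1)` acquire the locks in the same underlying order, so the hold-and-wait pattern of Figure 1.1 cannot arise; the type and annotation based techniques discussed below enforce essentially this discipline statically.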
Type and annotation based techniques help to avoid deadlocks during coding by allowing programmers to specify a partial order among locks. The type checker then statically ensures that the partial order among locks is maintained, and well-typed programs are deadlock-free. We discuss these approaches in detail in Chapter 2. Parallel systems share more similarities with multicore systems than distributed systems do. But due to their architectural and functional differences, deadlock handling techniques for parallel systems cannot be directly applied to multicore systems. The differences between parallel and multicore systems are as follows:

• In parallel systems, large tasks are divided into several similar, smaller subtasks to carry out computations in parallel. Each of the smaller subtasks is then solved independently and at the same time, and the results of the smaller subtasks are combined upon completion. In multicore systems, the tasks are often not related to each other. Also, they often do not run at the same time; rather, multicore systems use the concept of interleaving.

• Parallel systems are designed primarily for speed, and hence have well-defined bases such as context switching. They also share fewer controllers and less cache memory. In multicore systems, context switching depends on the type of scheduler, and multicore systems share more controllers and cache than parallel systems do.

Another type of hardware system that is similar to multicore systems is the multiprocessor system. There are multiple ICs in the single processor of a multicore system, whereas multiprocessor systems consist of more than one processor, each of which may contain one or more ICs. In terms of efficiency, multicore systems perform better on a single program, because the cores can execute multiple instructions at the same time but not multiple programs.
When running multiple programs, multiprocessor systems perform better than multicore systems. Due to the differences in their architectures, the multicore system is the more favourable system for ordinary users: no extra support or configuration is required for multicore systems, and they cost less too. Multiprocessor systems are more favourable for special-purpose uses; they often require extra support or configuration based on the purpose, and they are expensive too.

1.3 Motivation

Deadlock is a very common phenomenon in software applications. Lu et al. showed that out of 105 randomly selected real-world concurrency bugs collected from 4 large open-source applications (MySQL, Apache, Mozilla and OpenOffice, representing both server and client applications), 31 are deadlock bugs [26]. Also, a report produced from Oracle's bug database shows that roughly 6,500 out of 198,000 reports, or 3%, contain the keyword 'deadlock' [27]. Although the occurrence of deadlocks in systems is not frequent, in some cases the effects are drastic when deadlock occurs. During the testing phase, deadlocks can be easily created and detected statically, but detecting deadlocks dynamically is difficult because concurrent applications are non-deterministic. Deadlocks happen under certain intricate sequences of low-probability events, and reproducing such a subtle sequence of low-probability events is not guaranteed during execution of the application. Stress testing or random testing may not always show the existence of deadlocks in an application, but the probability of their occurrence increases greatly when an application is released to thousands of users. Therefore, deadlock detection tools are required to analyze and find real and potential deadlocks in applications. From the discussion in the previous section, we observe that a lot of research has been conducted to develop techniques to handle deadlocks.
But very little attention has been paid to comparing the performance of those approaches to dealing with deadlock. To the best of our knowledge, no research has been pursued to compare the performance of different algorithms in multicore systems. All of the performance comparisons that we could find were done on distributed systems, distributed database systems, flexible manufacturing systems and others. Moreover, only a few of the performance studies use simulation testbeds to analyze performance. The ongoing trend in processor technology indicates that future systems will have hundreds or thousands of cores. To maximize their utility, new programs have to be carefully designed and tested, using a large number of cores and a proper set of experiments, before they can be adopted for practical use. Thus the significant presence of deadlocks in large-scale, real-world applications, the uncertainty around their occurrence, their crucial effects, and the lack of research on performance studies in multicore systems are the main motivating factors for this research. The awareness that the field of multicore systems is quite new and emerging compared to other well-established computer systems also provides rationale for this study.

1.4 Contributions

As discussed before, deadlock handling techniques that were developed for earlier processors and distributed systems will not work well with multicore systems, mainly because of the difference in the architecture of these systems. Moreover, communication overhead is one of the most influential factors for deadlock handling techniques used in distributed systems. As multicore systems use shared memory, the effect of communication overhead is rare. Hence, there is a need to explore more deadlock handling techniques that are efficient in multicore systems.
However, there is also a lack of available tools to study deadlock handling techniques with respect to different performance parameters in multicore systems. This thesis helps to fill the above mentioned shortfalls in the field of computer science research. Studying the performance of the implemented deadlock handling techniques and their effects in multicore systems is very important, and a flexible simulation testbed is very useful for studying deadlock handling algorithms and for observing different performance parameters for those algorithms in multicore systems. The development of such a flexible framework is one of the primary contributions of this thesis. The primary contributions of this thesis are listed as follows:

• A flexible simulation framework to incorporate different types of deadlock handling algorithms

• Three different scenarios implemented to simulate different types of deadlock handling techniques: a general scenario, a scenario for the Dining Philosopher's problem, and a scenario for the Banker's algorithm

• Two different deadlock handling strategies simulated in the framework with the above mentioned scenarios: deadlock avoidance (the Banker's algorithm) and deadlock detection (Dreadlocks), with detection followed by deadlock recovery (termination of one of the victim processes to end the deadlock)

• Result analysis of experiments performed on the implemented algorithms with respect to various performance metrics

1.5 Thesis Organization

An overview of the problem of deadlock, the conditions necessary for the formation of deadlock, and the different approaches to handling deadlock were briefly explained in this chapter. In Chapter 2, a detailed literature review of current methods to deal with the problem of deadlock in multicore systems is presented. The review and the methods are categorized by type.
At the end of that chapter, we present a comprehensive survey of the performance evaluation studies done on deadlock handling techniques in different systems. In Chapter 3, we present the details of the design and implementation of the simulation framework for deadlock in multicore systems; the primary contributions of this thesis are explained in that chapter. Later in the chapter, we explain the different deadlock handling strategies that are implemented in our framework. We discuss the various traces generated from a simulation run and the calculation of performance metrics from these traces in Chapter 4. Experiments and results are described in Chapter 5. Lastly, we conclude in Chapter 6 with a discussion of possible future work that could extend the research carried out in this thesis.

Chapter 2

Background and Related Work

Research for this thesis began with a paper that my supervisor gave me about a deadlock detection algorithm, "Dreadlocks: Efficient Deadlock Detection" [23], developed by Eric Koskinen and Maurice Herlihy. This algorithm for detecting deadlocks in multithreaded programs gained our attention, and we began to look for other approaches that have been developed to deal with deadlocks in multicore systems. We explain Dreadlocks in detail in Chapter 3. Here, we discuss the other interesting approaches in the literature.

2.1 Deadlock Handling Approaches in Multicore Systems

As briefly discussed in Chapter 1, many different approaches have been developed to handle deadlocks in multicore systems. The approaches are classified into five categories: (i) language-level approaches, (ii) static analysis, (iii) dynamic analysis, (iv) model checking and (v) type and annotation based techniques. They are explained in detail as follows.

2.1.1 Language-level Approaches

Language-level approaches make concurrency control easier by providing higher level constructs that do not allow error-prone uses.
This can be achieved by statically binding shared variables to the locks that protect them. Two different language-level approaches are discussed here. First, we discuss a framework implemented in Java, presented by Bensalem et al. [5]. This framework confirms deadlock potentials detected by runtime analysis of a single run of a multithreaded program. The multithreaded program under examination is instrumented to produce lock and unlock events. A trace is generated when the instrumented program is executed; the trace consists of the lock and unlock operations performed during that specific run. An observer is constructed for each cycle, which can detect the occurrence of the corresponding real deadlock. Here, an observer is a data structure that characterizes all possible interleavings that lead to a deadlock state [5]. It also checks the possibility of the deadlock reoccurring during subsequent test runs. In this framework, a controller determines the optimal scheduling strategy that, when composed with the program, will maximize the probability of the corresponding real deadlock occurring. The methodology of the approach involves four phases: (i) deadlock potentials in the multithreaded program and the system under test (SUT) are automatically detected by examining a single execution trace, (ii) an observer is automatically generated for each cycle in the resulting lock graph, (iii) the system under test is instrumented in order to confirm deadlock potentials, and (iv) multiple "controlled" runs are performed, searching for "incorrect execution traces" [5]. Elaborating on these phases: the first step involves detecting deadlock potentials in the SUT. Then, the instrumentation module automatically instruments the bytecode class files of the multithreaded program under test. This is done by adding new instructions that, when executed, generate the execution trace consisting of lock and unlock events.
The observer module reads the event stream and performs the deadlock analysis. The instrumented program under observation is executed while the lock and unlock events are observed. Then a graph is constructed with edges between locks suggesting lock orders; any cycle in the graph signals a deadlock. Once the lock graph is generated, an observer for each cycle in the lock graph is generated automatically. The observer observes the SUT. In the last step, the execution of the SUT is controlled. This execution contains a deadlock if it is "accepted" by the observer [5]. The second approach works for parallel programs that use message passing for communication. One of the common problems in such programs is the detection of deadlocks. Haque presented a deadlock detector, the Message Passing Interface Deadlock Detector (MPIDD), to dynamically detect deadlocks in parallel programs written using C++ and the Message Passing Interface (MPI) [18]. There are two main parts of the MPIDD system: the detector and the MPIDD wrappers. The detector receives input from the client program about the MPI commands being used. Using this information, the detector constructs a state and determines whether there is a deadlock in the client program. The MPI function call wrappers wrap the client program's original calls and provide the information to the detector. MPI's profiling layer requires no significant modification of the user's code and incurs very little overhead when invoked. This property of MPI's profiling layer is used by the detector. A wrapper containing the information needed by the detector is created using MPI's profiling layer; the actual function call is embedded inside this wrapper. The new wrapped routines relay to the deadlock detector program the information about which commands are being issued.
The key advantages of this tool are its portability, very low overhead, and its use of MPI's profiling layer, which allows the tool to be used without modification of the source code. By including the file 'MPIDD.h' in the original source code, the client can obtain all the functionality transparently [18]. Since MPI is a widely used library, this is a useful approach. The main limitation of this tool is that it is MPI-specific, although it could be adapted to other libraries. Even though language-level approaches alleviate some of the complexity of concurrency, the problem is not completely eliminated. Also, to get any benefit from this approach, all the code has to be written in a particular language. These limitations prevent programmers from using other languages that might better suit their requirements.

2.1.2 Static Analysis

Static program analysis techniques do not require any execution of the application, as they work directly with source code. Here, we discuss three different static program analysis techniques as follows: The first technique is called RacerX [12]. It is a static tool that uses flow-sensitive, interprocedural analysis to detect deadlocks. This technique checks information such as which locks protect which operations, which shared accesses are dangerous, and which code contexts are multithreaded. At a higher level, checking a system with RacerX involves five phases [12]: (i) retargeting RacerX to system-specific locking functions, (ii) obtaining a control flow graph from the checked system, (iii) running the deadlock and race checkers over the obtained flow graph, (iv) post-processing and ranking the results, and (v) examining the results. Of these phases, the first and the last are done by the user; RacerX does the middle three. The second technique is an effective static deadlock detection algorithm for Java, presented by Naik et al. [27].
The key idea behind this is to express the complex property of deadlock freedom for a pair of threads/locks in terms of six conditions: reachable, aliasing, escaping, parallel, non-reentrant and non-guarded [27]. Effectively approximating these conditions requires precise call-graph and points-to information. This algorithm uses a combined call-graph and may-alias analysis for effective approximation; this combination of analyses is called k-object-sensitive analysis. It reliably approximates the first four conditions using well-known static analyses: a call-graph analysis, a may-alias analysis, a thread-escape analysis, and a may-happen analysis, respectively. In order to soundly approximate the last two conditions, it requires a must-alias analysis, which is much harder than a may-alias analysis. This technique addresses this difficulty with an unsound solution: using a may-alias analysis to pose as a must-alias analysis. Hence, this algorithm can fail to report some real deadlocks [27]. The deadlock detection tools discussed so far help to find deadlocks in concurrent programs, but the problem of detecting deadlock in libraries has not been investigated much. The third approach, developed by Williams et al., helps with that [34]. This problem is vital, as library writers may wish to ensure their library is deadlock-free for any calling pattern. This method checks whether it is possible to deadlock the library by randomly calling some set of its public methods. On detecting the possibility of deadlock, it provides the names of the methods and variables involved. The deadlock detector in this method utilizes an interprocedural dataflow analysis. By using this method, it is possible to track possible sequences of lock acquisitions inside a Java library. The analysis is flow-sensitive and context-sensitive. At each program point, the analysis computes a symbolic state modeling the execution state of the library. At the end of a method, this symbolic state serves as a method summary.
The analysis is executed repeatedly over all methods until a fixed point is reached [34]. One limitation of static analysis approaches is that, while they examine all possible deadlocks and often give no false negatives, they report many false positives. For example, the static deadlock detector developed by Williams et al. reports 100,000 deadlocks in Sun's JDK 1.4, while only 7 are real deadlocks [34].

2.1.3 Dynamic Analysis

Dynamic program analysis techniques use an execution of the program to find potential deadlocks. They visit only feasible paths and have precise views of the values of variables because they operate at runtime. Here, we discuss three different dynamic analysis techniques as follows: Agarwal and Stoller describe a runtime notion of potential deadlock in programs written in any language that uses synchronization mechanisms such as locks, semaphores and condition variables [1]. They test the potential for deadlock by checking whether any feasible permutation of the execution results in deadlock. The viability of such permutations is determined by ordering constraints among the events in the execution. Havelund proposed the GoodLock algorithm, which detects potential deadlocks involving two threads [19]. This was later generalized to any number of threads by Bensalem et al. [6] and Agarwal et al. [2]. The algorithm presented by Agarwal et al. [1] is extended in [2] to handle non-block-structured locking as well. The second dynamic analysis technique we discuss is DEADLOCKFUZZER, developed by Joshi, Park and Naik [21]. This technique has two stages. In the first stage, a multithreaded program is executed and observed to find potential deadlocks that could happen in some executions of the program. This stage uses an informative and simple variant of the GoodLock algorithm, called informative GoodLock or iGoodlock [19]. iGoodlock identifies potential deadlocks even if the observed execution does not end in a deadlock state.
It provides debugging information suitable for identifying the cause of the deadlock. The second stage uses the debugging information provided by the first stage to create a real deadlock with high probability: the scheduler is biased to generate an execution that creates, with high probability, a real deadlock reported in the previous stage. A limitation of iGoodlock is that it can give false positives, because it does not take the happens-before relation between lock acquisitions and releases into account. As a result, the user has to manually go through such potential deadlocks. This burden is removed from users by the second stage of DEADLOCKFUZZER [21]. Qi et al. presented an efficient dynamic deadlock detection tool called MulticoreSDK [29]. The algorithm works in two phases. In the first phase, a reduced lock graph is constructed to identify program locations where lock operations may cause deadlocks. This phase examines the execution trace and constructs a lock graph based on the program locations of lock events. Once the lock graph is built, the algorithm finds cycles in it to compute a set of program locations involved in deadlock cycles. This set signifies the program locations that may be associated with deadlocks and is used in the second phase to filter out deadlock-free lock events. In the second phase, the analysis report from the first phase is used to determine deadlocks in the lock graph with the filtered locks. This phase examines the execution trace again and constructs another lock graph, this time based on the lock ids of lock events. In particular, only lock events from the set derived in the first phase are recorded in the second lock graph. Finally, cycles in this lock graph are found to determine potential deadlocks.
By using two passes over the program trace and filtering out the locks that are not involved in deadlock, MulticoreSDK efficiently removes the lock nodes and edges which do not take part in deadlock formation [29]. Thus, it is more scalable for large real-world applications and more efficient in terms of memory utilization and time. The downsides of dynamic deadlock detection techniques are that they suffer from scalability and performance problems, since the lock graphs of large industrial-strength applications grow too large to handle. Moreover, they can only find errors on a small number of execution paths, as the number of feasible paths grows exponentially with the size of the code. Another limitation of this approach is that dynamic deadlock detection has a very high computational price, which causes runtime overhead. Hence, it is time consuming to run test cases, which makes this approach impractical for programs with strict timing requirements. In theory, dynamic deadlock detection can compute arbitrarily accurate information, but in practice it depends heavily on the availability of computing resources.

2.1.4 Model Checking

Model checking is another way to find deadlocks in an application. Model checking takes a simplified description of the code and uses techniques to explore vast state spaces efficiently as it systematically tests the code on all inputs; such techniques were originally used to explore the massive state spaces of hardware circuits. It analyses the correctness of concurrent reactive systems, performing verification by state-space exploration. An extension of model checking, VeriSoft, is presented by Godefroid [15]. VeriSoft directly deals with implementations of communication protocols written in programming languages such as C or C++.
Thus it extends state space exploration from modeling languages to programming languages. It is a tool to systematically explore the state space of systems composed of several concurrent processes executing arbitrary code written in full-fledged programming languages such as C or C++. The algorithm can be used for detecting deadlocks and assertion violations without incurring the risk of any incompleteness in the verification results for finite acyclic state spaces. In addition, VeriSoft also checks for divergences and livelocks. A divergence occurs when a process does not attempt to execute any visible operation for more than a specified amount of time. A livelock occurs when a process has no enabled transition during a sequence of more than a specified number of successive global states. In practice, the algorithm can be used for systematically and efficiently testing the correctness of any concurrent system, with or without an acyclic state space [15]. Unfortunately, model checking fails to scale for large systems, as it requires significant effort both to specify the system and to scale it down enough to execute in the model checking environment. Due to the huge size of current operating systems, model checking an entire system of that size is still not possible. Hence, model checking techniques are restricted to the verification of properties of models.

2.1.5 Type and Annotation Based Techniques

Type and annotation based techniques help to avoid deadlocks during coding by allowing programmers to specify a partial order among locks. The type checker statically ensures that the partial order among locks is maintained and that well-typed programs are deadlock-free. Boyapati, Lee and Rinard presented an ownership type system for multithreaded programs that allows programmers to specify a partial order among locks [7].
It allows them to partition the locks into a fixed number of equivalence classes and to specify a partial order among the equivalence classes. The type checker then statically ensures that whenever a thread holds more than one lock, the thread acquires the locks in descending order. Well-typed programs in this system are guaranteed to be deadlock-free. The type system also allows programmers to use recursive tree-based data structures to describe the partial order, and it allows changes to the partial order at runtime through mutation of the data structure. The basic idea of this system is that programmers keep a locking discipline in mind while writing multithreaded programs; it allows them to declare this locking discipline in the form of type declarations in their programs, and it statically verifies the consistency of a program with its type declarations. A prototype of this system is implemented in Java, including threads, arrays, constructors, static fields, dynamic class loading, runtime downcasts, exceptions and interfaces. Some limitations of this approach are that it imposes the burden of annotation on programmers, and that scaling it to larger systems is not feasible.

2.2 Performance Evaluation Studies of Deadlock Handling Algorithms

Deadlock is one of the key issues faced in many different fields, such as routing, databases, operating systems, manufacturing systems and others. Most of the research carried out in this field focuses on solving the problem of deadlock. Some of the proposed solutions have also undergone comparative performance evaluations, but there has not been any research on comprehensive performance comparisons of those solutions. Studies [8, 24, 25, 13] and [35] entail comparative performance analyses.
Among these, [25] and [35] address the detection of deadlock in distributed systems, [8] and [24] provide a performance analysis of deadlock detection techniques in distributed database systems, and [13] details a comparative performance analysis of a deadlock avoidance control algorithm for Flexible Manufacturing Systems. A probabilistic performance analysis of a deadlock detection algorithm in distributed systems is presented by Lee and Kim [25]. The distributed deadlock detection algorithm declares deadlock upon finding back edges in a distributed search tree constructed by the propagation of probes. The tree is built as follows: the initiator of the algorithm becomes the root of the tree. It then sends out probes, called ASK, to all of its successors at once. If a node receives the probe for the first time, it becomes the child of the sender of the probe. The probe is then further propagated until it reaches an executing node or a tree node that has already received a probe. Each tree node is assigned a unique path string to represent the level of the node in the tree and to distinguish one branch from another. With the help of path strings, not only back edges but also other types of edges, such as cross and forward edges, are identified. The ASK probe carries the identifier of the candidate victim, which has the lowest priority among the nodes visited. If the candidate victim is inside the detected deadlock cycle, it will receive an abort message [25]. The performance of this algorithm is evaluated on measures such as deadlock duration, the number of algorithm initiations, and the mean waiting time of a blocked process. Deadlock duration is the time elapsed between the formation of a deadlock and its detection. In distributed systems, a detection algorithm can be started and executed by several sites simultaneously, which may degrade the performance of the system by generating too many messages.
So the expected number of algorithm initiations throughout the system is also an important measure. Bukhres compared two distributed deadlock detection algorithms for distributed database systems [8]. The algorithms are from two different categories: one is centralized and the other is distributed. Centralized algorithms maintain the wait-for graph (WFG) at a control site, which is designated to perform the deadlock detection activity by gathering the relevant information from all the other sites. At this site the graph is updated and the search for a cycle involving deadlock is carried out. In distributed algorithms, the WFG is maintained at different sites in the system; a cycle may be composed of different transactions at different sites, and several sites participate in deadlock detection. Centralized algorithms have poor reliability compared to distributed algorithms, because the whole system fails if the control site fails. However, distributed algorithms are difficult to design and implement, as the detection of deadlock can be initiated from any site. Simulation studies presented by Bukhres show that the centralized algorithms perform a little better than the distributed algorithms under a lightly-loaded condition, whereas under a heavily-loaded condition the distributed algorithms perform better than the centralized algorithms [8]. The measures taken into consideration to compare performance are throughput, restart rate, deadlock lifetime, and the effect of the request pattern on throughput and the number of messages. The conclusion of this study is that the choice of the best deadlock detection algorithm depends on the operating region of the system. The problem of evaluating and comparing the performance of deadlock avoidance control policies applied to Flexible Manufacturing Systems (FMS) is addressed by Ferrarini [13]. The problem is discussed with both timed and untimed models.
For both models the problem is considered with and without deadlock avoidance control policies. Some of the key metrics measured are deadlock percentage, frequency of occurrence, deadlock rejection percentage, unreachable stage percentage, product survival rate and blocking capacity [13]. From the above discussion, we observe that there is no simulation tool available to study and compare the performance of different deadlock handling approaches in multicore systems. Moreover, the studies mentioned above are quite old and their implementations are not available to extend further. Hence, we believe that our simulation testbed will be very useful for studying and comparing the performance of different deadlock handling approaches in multicore systems.

2.3 Summary

In this chapter, various approaches that have been developed to handle deadlock in multicore systems were discussed. Then, we presented a brief survey of different performance evaluation studies done on deadlock in distributed systems, distributed database systems, and flexible manufacturing systems. In the next chapter, we discuss the implementation details of the simulation testbed to study deadlock handling approaches in multicore systems.

Chapter 3
Simulation Framework for Deadlock in Multicore Systems

This chapter describes the current implementation of the simulation framework for deadlock handling algorithms in multicore systems. The implementation is done using Discrete Event Simulation (DES). Before describing the actual framework, DES is briefly explained.

3.1 Simulation

A computer simulation is a technique that attempts to model and observe the behavior of an abstract module of a particular system. A simulator is a computer program that transforms the states of a system at discrete time points over a specific period of time. Simulation is a very effective tool to study the dynamic behavior of complex systems.
Computer simulations can be classified into two types, discrete or continuous, based on how the system state is modeled and simulated. In a continuous simulation, the state variables of the system change continuously over time, while in a discrete simulation the state variables change only at discrete times. Based on the advancement of simulation time and the updates of the system state, discrete simulation can be further divided into two types: time-stepped and event-driven. As the names suggest, in a time-stepped simulation the system state is updated at every time step, while in an event-driven simulation the states of the system are updated at the occurrence of events. Ozgun and Barlas presented a comparison between the approaches of discrete event simulation and continuous simulation on a simple queuing system [28]. Discrete event simulation comprises an event list, a simulation clock and an event scheduler. The simulator maintains a queue of events sorted according to their simulation time of occurrence. The simulation clock advances the simulation time to the time of occurrence of the next event in the event list. The system states are changed by each execution of an event, which is done by the event scheduler. For example, in simulating the behavior of university students, the number of students enrolled and the number of students graduated are state variables, and they are updated on the occurrence of the corresponding events in the system: the number of students enrolled is updated at the start of each semester, and the number of students graduated is updated when the students graduate at the end of the academic year. Simulation ends when the event list is empty or the simulation time is up. The advancement of simulation time can be the same as, faster than, or slower than real time, as it is not important to execute the simulation in real time.
For example, in the university example, the simulation does not have to wait for an entire year and can run faster than real time. A driving-test simulation can run at real-time speed. On the other hand, protein synthesis in a cell can be simulated slower than real time. We mainly use discrete event simulation to implement the different components of our framework. Next, we move on to a discussion of the building process of our framework. We expanded the multicore scheduling framework, developed by Manickam under the guidance of Aravind, to simulate and study deadlocks, so we first explain the architecture of the Multicore Scheduler System framework.

3.2 Architecture of Multicore Scheduler System Framework

To observe deadlock in multicore systems, we first need a framework that simulates the behavior of multicore systems. For that purpose, we are using an architecture developed by Aravind and Manickam [3]. We first briefly explain the architecture of the multicore scheduling simulation (MSS) framework. Later, we explain the expansion of this framework, that is, what components were added and how they were inserted into MSS. Although the MSS framework provides different scheduling algorithms, we are using only Linux Completely Fair Scheduling in our system.

[Figure 3.1: Architecture of MSS Framework. Source: A Flexible Simulation Framework for Multicore Schedulers [3]]

The higher level architecture of MSS is shown in Figure 3.1. There are five key components in it: the workload generator, machine, scheduler, execution trace and performance calculation engine. We explain them next.

3.2.1 Workload Generator

The workload generator is implemented to generate threads.
We have to give the number of threads, the threads' mean arrival rate, and the threads' mean service rate as input to generate a workload. In the output, each thread will have a unique id, arrival time, execution time, I/O points, priority and application class. The times when a thread will go for I/O are determined by its I/O points. For our system, we added three scenarios to the workload generator: (i) a general scenario, (ii) a scenario for the Dining Philosophers problem, and (iii) a scenario for the Banker's algorithm. We also added numerical inputs for the number of resources and the number of instances per resource.

3.2.2 Machine

All the computations are done in the machine, which is also called a multicore machine. The machine has a hierarchical structure, with the machine at the highest level, followed by chips and cores. One or more chips can fit on a machine and two or more cores can be in each chip. In the simulation context, each core is capable of executing one thread at a time. Cache memory is also an important part of the machine, and it is also structured in a hierarchical manner. Current multicore systems have three levels of cache memories: L1, L2 and L3. The L1 cache is a core-level local cache, the L2 cache is shared among the cores in a chip, and all the cores in the machine share the L3 cache.

3.2.3 Scheduler

The implemented framework for a scheduler has two tasks: balancing the load among the cores and multitasking threads in each core. According to their functions, the first task is referred to as load balancing and the second as load scheduling; hence the components in the implementation are called the load balancer and the load scheduler. The load balancer is responsible for dispatching new jobs to the appropriate local scheduler, and for migrating jobs from one local scheduler to another. The task of the local scheduler involves scheduling jobs to the cores for execution. The simulation gives the local scheduler centralized control.
From the simulation point of view, the tasks of the local scheduler are to choose a job for execution, to determine the amount of execution time, to calculate the progress rate and to produce the trace. The progress rate is a crucial design factor, as it can affect the accuracy of the execution. The progress rate of a job depends on factors such as the execution speed of the core and the contention for shared resources. The implementation uses a simple cache contention model, which can easily be replaced with the implementation of a more refined model.

3.2.4 Execution Trace and I/O Trace

During the simulation, two types of traces are recorded: the execution trace and the I/O trace. In order to generate an activity profile and to calculate performance metrics, the execution trace is collected at every context switch. The format of the execution trace is a quintuplet. The status can have values between 0-3, where the status is 0 if the thread is preempted by quanta expiration, 1 if it is preempted by a thread with a higher priority, 2 if the thread is going for I/O, and 3 if it is completed. The format of the I/O trace is a triple. The performance calculation engine calculates the values of criteria such as response time, fairness and utilization of resources for a given set of data, and the result can be passed to the performance observation window. The users can then study the results from the performance observation window.

3.2.5 Performance Calculation Engine

The performance study mainly focuses on determining how well the algorithm attempts to fulfil criteria such as response time, fairness and utilization of resources. Users can study algorithms based on these criteria, calculated and passed to the performance observation windows by the performance calculation engine. The performance criteria involve a wide range of metrics such as interactive response time, CPU utilization, throughput, turnaround time, waiting time and response time.
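To illustrate how such metrics can be derived from trace data, here is a hedged sketch in Java (the method names, and the assumption that arrival, completion and service times are available per thread, are ours; the framework's actual trace format is described in Section 3.2.4): turnaround time is completion time minus arrival time, and waiting time is turnaround time minus service time.

```java
// Hypothetical helper for per-thread scheduling metrics (our naming):
// turnaround = completion - arrival; waiting = turnaround - service.
public class MetricsSketch {
    static long turnaround(long arrival, long completion) {
        return completion - arrival;
    }

    static long waiting(long arrival, long completion, long service) {
        return turnaround(arrival, completion) - service;
    }

    public static void main(String[] args) {
        // A thread arriving at t=2, finishing at t=10, with 5 units of CPU service:
        System.out.println(turnaround(2, 10)); // 8
        System.out.println(waiting(2, 10, 5)); // 3
    }
}
```

Averaging these per-thread values over the whole trace gives the mean turnaround and waiting times reported by the performance calculation engine.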
Next, we will discuss the actual simulation and architecture aspects of deadlock in multicore systems.

3.3 Simulation of Deadlock in Multicore Systems

The basic deadlock environment involves processes and resources. Resources can have one or more instances, and processes can request up to the maximum number of instances of a resource available. If the requested resources are available, they can be granted to the requesting process; otherwise, the process waits until the resources become available, meanwhile holding on to its current resources. The simulation of deadlock in multicore systems has three main components: entities, events, and states, in addition to the components of the Multicore Scheduler Simulation framework (an event list, a simulation clock and an event scheduler) described in the previous sections. The event scheduler manages all of the events from the event list according to the simulation clock. Each event from the event list occurs at a particular instant of time and changes the state of an entity. Here we consider the system as an entity as well. For our system, the entities, states, and events are shown in Table 3.1. We define these terminologies first, and then we discuss the simulation aspects of the system.

Entity    | States
Process   | Waiting, Running, Blocked, Completed
Resource  | Acquired, Free
System    | Deadlocked, Deadlock Free
Events: Request, Acquire, Release, Going for I/O

Table 3.1: Entities, States and Events

• Processes: Processes are the main entities in the system. A process requires resources while executing in its critical section. A critical section is a part of the executing process which uses shared resources exclusively. A process can be in different states, such as running, waiting, blocked or completed. Due to their concurrent access to finite resources, deadlock can occur in the system.

• Resources: Resources can be virtual entities such as memory, data structures and CPU time, or physical entities such as printers, scanners and other I/O devices.
A resource can have one or more units of the same type, which are called instances of the resource. Different instances of a resource can be shared among different processes, but an instance of a resource can be held by only one process at a time. A resource can be either free or held by a process.

• System: A system consists of all the processes, resources and their instances, and the techniques to handle deadlock.

• Deadlock [4]: "Deadlock is a situation in a resource allocation system in which two or more processes are in a simultaneous wait state, each one waiting for one of the others to release a resource before it can proceed."

The main events are the request, acquire and release of resources by processes. These events change the state of the system to either deadlocked or deadlock free. For processes, the state changes affiliated with these events are waiting, running, blocked or completed. If a process acquires all the requested resources, then it is in the running state; if the process cannot acquire all the requested resources due to their unavailability, then the process is in a waiting state, holding its current resources. Though I/O is a physical type of resource that can be accessed by processes, we consider I/O a special case of resources; hence the events associated with processes accessing I/O are going for I/O and coming back from I/O to a waiting state. When a process goes for I/O its state is changed to blocked, and when all the requests of a process are satisfied, its state is changed to completed. The simulation of the system basically consists of updating the state of the entities at every simulation point as well as advancing the simulation clock. It involves various interactions between these entities and the occurrence of deadlocks due to them. With the help of these components, various scenarios and solutions of deadlock can be simulated using DES.
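The event-driven updating described above can be sketched as a minimal discrete-event loop in Java (a simplification with hypothetical names; the actual framework handles many more event types and entity states): events sit in a priority queue ordered by simulation time, and executing each one advances the clock.

```java
import java.util.*;

// Minimal discrete-event loop in the spirit of Section 3.1 (names are ours):
// events are ordered by simulation time; executing one advances the clock
// and would, in the full framework, update the affected entity's state.
public class DesSketch {
    enum Op { REQUEST, ACQUIRE, RELEASE }

    record Event(long time, int process, Op op) {}

    public static void main(String[] args) {
        PriorityQueue<Event> eventList =
            new PriorityQueue<>(Comparator.comparingLong(Event::time));
        eventList.add(new Event(5, 1, Op.ACQUIRE));
        eventList.add(new Event(2, 1, Op.REQUEST));
        eventList.add(new Event(9, 1, Op.RELEASE));

        long clock = 0;
        while (!eventList.isEmpty()) {       // simulation ends when the list is empty
            Event e = eventList.poll();      // next event in time order
            clock = e.time();                // advance the simulation clock
            System.out.println(clock + ": P" + e.process() + " " + e.op());
        }
    }
}
```

Note that the events are processed in time order (2, 5, 9) regardless of insertion order, which is exactly the event-list behaviour DES requires.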
The simulation records the executions as a simulation trace; a record is written at every occurrence of an event. With this background, we now introduce the architecture of the simulation framework for deadlock in multicore systems.

3.4 Architecture of Simulation Framework for Deadlock in Multicore Systems

To simulate a basic deadlock-prone environment, we need resources, processes that can request these resources, and a system (Resource Manager) to manage the resources for the requests made by processes. From the point of view of the system, processes in a multicore system communicate with each other through shared memory using read and write operations. During execution, processes require resources. A process requests resources, acquires them if granted, accesses them, and then releases them after use. As explained above, the MSS framework has threads, a machine and a scheduler. The functions of processes and threads are the same here, so hereafter we use the term "process" in general. In order to simulate deadlock on top of the MSS framework, we added extra components such as resources, a Resource Manager, and the ability for processes to request resources as well as to communicate with each other. Alongside these additional components, architectural changes are also required to some existing components to fulfil the criteria for simulating deadlock. Later in this chapter, we explain in detail the components related to the implementation of deadlock scenarios and viable handling techniques. The higher-level architecture of the Multicore Scheduler Simulation (MSS) framework has five main logical components, as shown in Figure 3.1: the workload generator, multicore machine, multicore scheduler, execution trace and performance calculation engine. Of those, the core simulation engine involves three basic components, as shown in Figure 3.2: the workload generator, multicore scheduler and multicore machine.
Figure 3.2: Core Architecture of MSS Framework

As discussed above, we need resources that can be accessed by processes in order to simulate a system susceptible to deadlock. To manage these resources, we need a module, which we refer to as the Resource Manager (RM). The resulting system is portrayed in Figure 3.3. A detailed explanation of the RM and its implementation details follows.

Resource Manager: For the smooth and flawless execution of the system, multiple requests by processes to access different resources have to be managed well. Improper management of resources may lead the system to an unwanted situation called deadlock. To prevent or avoid deadlock in the system, the granting of resources to multiple processes, according to their requests and the availability of resources, is managed by a Resource Manager (RM).

Figure 3.3: Basic System Requirements for Deadlock

The notation of Figure 3.3 is the same as that of Figure 1.1, with the additional notation of small, dark rectangles inside resources that represent instances of that resource. The basic function of the RM is to manage resources, that is, to grant resources to processes based on their availability and according to the requests made, in such a way that the system does not go into a deadlock state. If the RM functions properly, deadlock can be avoided, but improper functioning of the RM can lead the system towards a deadlock situation. The implementation of the RM depends on what strategy is to be implemented to handle deadlock.
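The Resource Manager's basic bookkeeping can be sketched as follows. This is a minimal illustration with assumed names, not the thesis code; a real RM would layer a deadlock handling strategy on top of this grant/release logic:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative Resource Manager core: tracks free instances per resource and
// grants a request only when enough instances are available.
public class ResourceManagerSketch {
    private final Map<Integer, Integer> freeInstances = new HashMap<>(); // rid -> free count

    public void addResource(int rid, int instances) {
        freeInstances.put(rid, instances);
    }

    // Grant the request if possible; otherwise the caller must wait,
    // holding on to its currently acquired resources.
    public boolean request(int rid, int count) {
        int free = freeInstances.getOrDefault(rid, 0);
        if (count > free) return false;
        freeInstances.put(rid, free - count);
        return true;
    }

    public void release(int rid, int count) {
        freeInstances.merge(rid, count, Integer::sum);
    }

    public static void main(String[] args) {
        ResourceManagerSketch rm = new ResourceManagerSketch();
        rm.addResource(1, 2);                  // resource R1 with 2 instances
        System.out.println(rm.request(1, 2));  // true: both instances granted
        System.out.println(rm.request(1, 1));  // false: nothing free, requester waits
        rm.release(1, 2);
        System.out.println(rm.request(1, 1));  // true after the release
    }
}
```

A prevention or avoidance strategy would add its check inside `request`, while detection and resolution would run separately over the wait relationships this bookkeeping induces.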
For deadlock prevention and deadlock avoidance, the logic to handle deadlock can be incorporated into the RM. On the other hand, deadlock detection and resolution may require an implementation separate from the RM to handle deadlock.

Figure 3.4: Architecture of Simulation Framework for Deadlock in Multicore Systems

The resulting architecture after incorporating resources and the Resource Manager into the core MSS framework of Figure 3.2 is shown in Figure 3.4, which has components from both Figure 3.2 and Figure 3.3. We now discuss the additional and modified components of the architecture in detail.

3.4.1 Workload Generator

A workload generator is mainly used to generate processes and resources while setting the parameters that are necessary for simulation. We can generate three types of simulation scenarios using the workload generator: (i) a general scenario; (ii) a scenario for the Dining Philosophers problem; and (iii) a scenario for the Banker's algorithm. Based on the selected scenario, criteria such as limiting the resource count to equal the process count, the instance count per resource, and the runtime are set for the given workload. The request pattern of processes also depends on the scenario type. These scenarios are simulated on the Linux Completely Fair Scheduler.

• General Scenario: To generate a general-scenario workload, the required input parameters are the number of processes, the number of resources, the instances per resource type, the mean arrival rate and the task period range. There are no restrictions on the input values of any parameter. The generated workload will have a set of processes and resources. Each process will have a unique id, arrival time, execution time, I/O points, priority, application class, request counts and other parameters required for simulation.
Likewise, each resource will have a unique id, an instance count and various other parameters. All the parameters that do not require manual input are generated randomly, with an upper bound set on them through the workload generator.

• Scenario for the Dining Philosophers problem: This scenario implements the Dining Philosophers problem, which can be considered a special case of the general scenario. The Dining Philosophers problem was introduced by Dijkstra [10]. Five philosophers [P1, ..., P5] sit around a table, as shown in Figure 3.5. The table has a large serving bowl of spaghetti, one small bowl for each philosopher and five forks. A philosopher wishing to eat can take and eat some spaghetti only if he or she holds the two forks on either side of the bowl. The problem is, given the above scenario, to devise a solution that allows the philosophers to eat, satisfying mutual exclusion while avoiding deadlock and starvation.

Figure 3.5: Dining Philosophers Scenario

Here, we limit ourselves to simulating the scenario and not the solution. Any solution implemented for the general scenario can also be used to solve the Dining Philosophers problem, as it is a special case of the general scenario. To simulate this, we put a limit on the number of resources: it has to be the same as the number of processes. Also, we limit the instances per resource type to one. We also have to input the runtime beforehand, to specify how long the simulation should run. Further requirements for simulating the Dining Philosophers scenario include limiting each process to request only its neighboring resources, and limiting the maximum number of requests per process to two.
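The neighbor-request constraint of the Dining Philosophers scenario (as many single-instance resources as processes, each process requesting exactly its two neighboring forks) can be sketched as follows; the method name is an illustrative assumption, not the workload generator's actual API:

```java
// Generates, for each of n philosophers, the ids of the two neighboring
// forks (resources) it is allowed to request: R(i) and R((i+1) mod n).
public class DiningWorkloadSketch {
    static int[][] neighborRequests(int n) {
        int[][] requests = new int[n][2];
        for (int i = 0; i < n; i++) {
            requests[i][0] = i;               // left fork
            requests[i][1] = (i + 1) % n;     // right fork (wraps around the table)
        }
        return requests;
    }

    public static void main(String[] args) {
        int[][] r = neighborRequests(5);
        // P4 (index 4) requests forks R4 and R0, the wrap-around neighbor.
        System.out.println(r[4][0] + " " + r[4][1]); // prints "4 0"
    }
}
```

The modulo wrap-around is what closes the circular wait chain that makes this scenario deadlock-prone when every philosopher picks up one fork.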
• Scenario for the Banker's Algorithm: This scenario requires the same input parameters as the general scenario, but parameters such as the maximum need for each resource are generated differently, to satisfy the requirements of the Banker's algorithm. The Banker's algorithm requires advance information about processes, such as the maximum number of instances of each resource claimed by each process. This information is generated with the workload. The generated workload will have a set of processes and resources. Each process will have the usual parameters, such as a unique id, arrival time, execution time, I/O points, priority, application class, request counts and other parameters required for simulation, with additional parameters such as maximum claim and need. The resources have the same parameters as in the general scenario.

There is one more notable difference between the scenarios in execution. The general scenario and the scenario for the Banker's algorithm run until all the processes finish their executions, that is, until all of the requests are satisfied for all of the processes. For the Dining Philosophers problem, execution runs until the runtime is reached.

3.4.2 Scheduler and Machine

The implementation of a scheduler is not a primary goal of our system. Hence we have used the Linux Completely Fair Scheduler, implemented by Manickam V. [3]. The simulation of a machine simply reflects the simulation of the multicore system in it, and therefore we have used that component with our system.

3.4.3 Resource Manager

As explained earlier, the management of resources is done in this part of the system, so we call it the Resource Manager (as shown in Figure 3.4). Solution strategies for deadlocks can also be implemented here. There are three types of solutions to the deadlock problem: deadlock prevention, deadlock avoidance, and deadlock detection and resolution.
Deadlock prevention works by preventing one of the four Coffman conditions [9] from occurring in the system, as discussed in Chapter 1. Deadlock avoidance makes sure that, for every resource request, the system does not enter an unsafe state, granting only the requests that lead to safe states. An unsafe state is a state that could result in deadlock. Deadlock detection and recovery allows deadlocks to occur; on the occurrence of a deadlock, the state of the system is examined and the deadlock is resolved, either by terminating one or more processes or by preempting resources. We have implemented a deadlock avoidance technique, a deadlock detection technique, and deadlock recovery techniques in our system. The avoidance technique is the Banker's algorithm, and the detection technique is Dreadlocks [23]; for recovery, we use different criteria to select and terminate one or more processes involved in a deadlock.

3.5 Algorithms Implemented

In this section, we explain the Banker's algorithm and Dreadlocks in detail. The Banker's algorithm is a deadlock avoidance technique, and Dreadlocks is used for deadlock detection.

3.5.1 Deadlock Avoidance: Banker's Algorithm

The Banker's deadlock avoidance algorithm was developed by Dijkstra in the mid-1960s [11]. This algorithm requires all processes to declare their maximum requirements for all resources at the start of execution. A process need not request all of its required resources at the beginning, as the requests are made sequentially. Once a process acquires its maximum declared resources, it holds them for a finite time and then releases them. These released resources then become available for allocation to other processes. At every new request, the algorithm checks whether the system stays in a safe state. A safe state of a system is one in which all the processes can complete their execution without forming a deadlock [17].
Safety is determined by simulating the allocation of the maximum possible instances of all resources to a process, and then by finding a sequence of such processes that can finish their execution without forming a deadlock. A request is granted if, on allowing the request, the system stays in a safe state; otherwise the request is denied. Further, the system also determines a dynamic order in which processes execute in order to avoid deadlock.

Now we explain the Banker's algorithm with an example. Consider a resource allocation system with five processes P0, P1, P2, P3 and P4, and three resources R1, R2 and R3. The maximum need for resource j by process i is represented by Max[i,j], Alloc[i,j] represents the current allocation of resource j to process i, and Avail[j] represents the currently available instances of resource j. The current safe state of the system is shown in Table 3.2 for (a) Max[i,j], (b) Alloc[i,j] and (c) Avail[j].

Table 3.2: System in a Safe State

(a) Max[i,j]        (b) Alloc[i,j]
    R1 R2 R3            R1 R2 R3
P0   7  5  3        P0   0  1  0
P1   3  2  2        P1   3  0  2
P2   9  0  2        P2   3  0  2
P3   2  2  2        P3   2  1  1
P4   4  3  3        P4   0  0  2

(c) Avail[j]
R1 R2 R3
 2  3  0

The system is in a safe state because all the processes can finish their execution in the order P1, P3, P4, P0, P2. Once completed, P1 releases its acquired resources. These resources are then added to the available resources, which in turn can be used by process P3. In the same way, processes P4, P0 and P2 can finish their executions.

Now, suppose the system grants the request (0, 2, 0) of process P0. The resulting state of the system is shown in Table 3.3.

Table 3.3: System in an Unsafe State

(a) Max[i,j]        (b) Alloc[i,j]
    R1 R2 R3            R1 R2 R3
P0   7  5  3        P0   0  3  0
P1   3  2  2        P1   3  0  2
P2   9  0  2        P2   3  0  2
P3   2  2  2        P3   2  1  1
P4   4  3  3        P4   0  0  2

(c) Avail[j]
R1 R2 R3
 2  1  0

This is an unsafe state, as there is no sequence of processes that can finish their executions.
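The safety check just illustrated can be sketched as follows. This is an illustrative implementation of the standard check, not the thesis code, run on the values of Tables 3.2 and 3.3:

```java
import java.util.Arrays;

// Banker's safety check: repeatedly find a process whose remaining need
// (Max - Alloc) fits within the available vector, pretend it finishes and
// returns its allocation, and declare the state safe iff all processes finish.
public class BankersSafetySketch {
    static boolean isSafe(int[][] max, int[][] alloc, int[] avail) {
        int n = max.length, m = avail.length;
        int[] work = Arrays.copyOf(avail, m);
        boolean[] finished = new boolean[n];
        for (int done = 0; done < n; ) {
            boolean progress = false;
            for (int i = 0; i < n; i++) {
                if (finished[i]) continue;
                boolean fits = true;
                for (int j = 0; j < m; j++)
                    if (max[i][j] - alloc[i][j] > work[j]) { fits = false; break; }
                if (fits) {                               // process i can run to completion
                    for (int j = 0; j < m; j++) work[j] += alloc[i][j];
                    finished[i] = true; done++; progress = true;
                }
            }
            if (!progress) return false;                  // no process can finish: unsafe
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] max    = {{7,5,3},{3,2,2},{9,0,2},{2,2,2},{4,3,3}};
        int[][] alloc  = {{0,1,0},{3,0,2},{3,0,2},{2,1,1},{0,0,2}};
        int[][] alloc2 = {{0,3,0},{3,0,2},{3,0,2},{2,1,1},{0,0,2}};
        System.out.println(isSafe(max, alloc,  new int[]{2,3,0})); // Table 3.2: true
        System.out.println(isSafe(max, alloc2, new int[]{2,1,0})); // Table 3.3: false
    }
}
```

Running the check on Table 3.2 finds the safe sequence P1, P3, P4, P0, P2, while on Table 3.3 no process's remaining need fits within Avail, so the state is reported unsafe.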
In this case, the request (0, 2, 0) of P0 is denied by the Banker's algorithm. The downside of the Banker's algorithm is that it is quite expensive, as it may take O(n²m) time for each execution, where n is the number of processes and m is the number of resources [17]. It also needs some information in advance, such as the maximum number of required resources; in most current systems, this information is not available.

3.5.2 Deadlock Detection: Dreadlocks

Dreadlocks is a deadlock detection technique for shared-memory multiprocessors [23]. In Dreadlocks, each process maintains a digest of the waits-for graph. The digest of a process P, denoted D_P, is the set of other processes upon which P is waiting, directly or indirectly. If a process is not trying to acquire any resources, then its digest contains only itself. If a process is trying to acquire a resource, then its digest includes itself as well as the digests of the processes that currently hold the requested resources. Changes to these digests of the waits-for graph are propagated as processes acquire and release resources, and a process detects deadlock when it finds its own identifier in a digest it is about to adopt. Due to this approach, a probing mechanism and the maintenance of explicit waits-for graphs can be avoided. To explain this with an example, consider Figure 3.6.

Figure 3.6: Example of Dreadlocks

The notation of Figure 3.6 is the same as in Figure 3.3 and Figure 1.1, with the addition of a dashed arrow, which indicates that a process is trying to acquire a resource. Process P1 is requesting resource R1, held by process P3, which is requesting resource R3, held by process P4. Also, process P2 is requesting resource R2, held by process P3, which is requesting resource R3, held by process P4.
In this case, the digests for these processes are as follows:

D_P1 = {P1, P3, P4}
D_P2 = {P2, P3, P4}
D_P3 = {P3, P4}
D_P4 = {P4}

In the next state, the dotted line from process P4 to resource R2 indicates that P4 is trying to acquire R2. According to the algorithm, P4 finds itself in the digest of P3, which currently holds R2, so it detects a deadlock and aborts. Once a deadlock is detected, the next step is to resolve it. We use process termination techniques to resolve the deadlock situation. This procedure is explained next.

3.5.3 Deadlock Recovery: Process Termination

Deadlock recovery can be achieved by aborting one or more processes involved in the deadlock. To select these processes, we use the criteria described below.

Process Termination Criteria

Once the existence of a deadlock is detected by Dreadlocks, victim selection is required to choose one or more processes to terminate. We select one of two processes, either the process that is requesting resources or the process whose digest contains the requesting process, to be terminated in order to resolve the deadlock. The criteria to select which process to terminate are as follows:

i. Random: Any process may be selected to be aborted, at random.
ii. First In First Out: The process that entered the system first is selected to be terminated.
iii. Least Recently Used: The process that used the system least recently is selected to be terminated.
iv. Minimum Runtime: The process that has spent the least time in the system is selected to be aborted.
v. Maximum Runtime: The process that has spent the most time in the system is selected to be aborted.
vi. Smallest Digest: The process with the smallest digest is selected for termination.
vii. Largest Digest: The process with the largest digest is selected for termination.

Users can enable or disable the option for the algorithm to detect deadlock.
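The digest computation of Section 3.5.2 can be sketched for the Figure 3.6 example as follows. The names are illustrative; the real algorithm propagates digest changes incrementally as processes spin on resources, rather than recomputing sets:

```java
import java.util.HashSet;
import java.util.Set;

// Dreadlocks digests for the Figure 3.6 example: a process's digest is
// itself plus the digests of the holders of the resources it waits for,
// and a process detects deadlock when its own id appears in such a digest.
public class DreadlocksSketch {
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> u = new HashSet<>(a);
        u.addAll(b);
        return u;
    }

    public static void main(String[] args) {
        Set<Integer> dP4 = Set.of(4);               // P4 waits for nothing: D_P4 = {P4}
        Set<Integer> dP3 = union(Set.of(3), dP4);   // P3 waits for R3 held by P4
        Set<Integer> dP1 = union(Set.of(1), dP3);   // P1 waits for R1 held by P3
        Set<Integer> dP2 = union(Set.of(2), dP3);   // P2 waits for R2 held by P3
        System.out.println(dP1); // D_P1 = {P1, P3, P4} (in some order)
        System.out.println(dP2); // D_P2 = {P2, P3, P4} (in some order)

        // Now P4 tries to acquire R2, held by P3; P4 finds itself in D_P3:
        boolean deadlockDetected = dP3.contains(4);
        System.out.println(deadlockDetected); // true: P4 detects the deadlock and aborts
    }
}
```

The self-membership test replaces explicit cycle detection on a waits-for graph, which is what makes the technique cheap enough to run on every spin iteration.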
The process termination option is available only if deadlock detection is enabled for the particular simulation run. If users select the scenario for the Banker's algorithm, then the detection and termination options are not made visible. Next, we show how to operate the system by presenting and explaining its user interface.

3.6 User Interface

Our simulator has two main windows: a Performance Parameter Setting Window and a Performance Observation Window, as shown in Figure 3.7.

Figure 3.7: Main Window of the Simulator

A user can click on either the 'Performance Parameter Setting Window' or the 'Performance Observation Window'. Each of the windows has different panels in which to set different parameters for the simulation. We now explain these windows.

Figure 3.8: Workload Parameter Setting Window

3.6.1 Performance Parameter Setting Window

The Performance Parameter Setting Window has three panels. One panel is used to input the parameters required to generate workloads. The second panel is used to set the configuration of the multicore machine and the parameters of the deadlock detection algorithm. The third panel is used to start a simulation after all the required parameters are set. The workload generator panel, shown in Figure 3.8, takes the scenario type, the number of processes, the number of resources, instances per resource type, arrival type, arrival rate and the task period range as input. Users can also create more than one workload at the same time.
The input parameters depend on the selected scenario type. For example, in a general deadlock scenario and a scenario involving the Banker's algorithm, users have to enter a value for instances per resource, and there is no restriction on the number of processes or resources. On the other hand, for a scenario involving the Dining Philosophers problem, users have to enter the runtime instead of instances per resource; also, the number of processes and resources have to be the same, to satisfy the scenario criteria.

Figure 3.9: Configuration Parameter Setting Window

After entering the above-mentioned parameters for each workload, users click the 'Add to Load' button. The added workload can be seen on the right side of the panel. If more than one workload is added, the additional workload is appended to the existing list of workloads. After all the workloads are added, clicking the 'Next' button at the bottom of the panel takes users to the configuration panel. The configuration panel of the Performance Parameter Setting Window is shown in Figure 3.9. This panel is used to configure the multicore machine and the deadlock algorithm to create simulation runs. Several simulation runs can be configured using this panel. The parameters required to configure the machine include setting the number of cores for each workload.

Figure 3.10: Simulation Run Window
To set the parameters for the deadlock algorithm, users can enable or disable Dreadlocks deadlock detection for different workloads. Furthermore, if Dreadlocks is enabled, users can set a criterion for selecting processes for termination from those involved in the deadlock. These options are not visible if users have selected the Banker's algorithm scenario in the Workload Generator window. After entering all the parameters, clicking the 'Add to simulation run' button adds a simulation run using the selected workload. If the 'Select All Workloads' option is enabled, multiple simulation runs are added using all the workloads. Once all the simulation runs are added, pressing the 'Next' button at the bottom of the panel brings up the next panel to start the simulation. The third panel is shown in Figure 3.10. By clicking the 'Start Simulation' button, users can start all the simulation runs that were added in the previous panel.

3.6.2 Performance Observation Window

Using the Performance Observation Window, shown in Figure 3.11, various performance metrics can be presented as charts. To select a performance metric, the Chart Type drop-down menu can be used.

Figure 3.11: Performance Observation Window

The results can also be analyzed with respect to the arrival rate or the number of cores, depending on the parameters selected while creating workloads or configuring simulation runs. After selecting the appropriate choices, pressing the 'Show' button displays a results chart in the chart display area.

3.7 Summary

First, we explained the core components of the framework for the simulation of multicore scheduling algorithms, developed by Manickam and Aravind. We then presented an expanded MSS framework for the simulation of deadlock handling algorithms in multicore systems.
The framework is flexible and able to incorporate different deadlock handling algorithms for multicore systems. In the next chapter, we discuss the traces generated during execution and the computation of performance metrics from those traces.

Chapter 4

Execution Trace and Performance Evaluation

The previous chapter explained the entire procedure, from operating the framework for deadlock in multicore systems to starting the simulation. During the simulation, we record different types of traces for the purposes of performance evaluation. Here, we discuss the traces and performance evaluation in detail.

4.1 Traces

There are four types of traces in our system: the execution trace, the I/O trace, the request trace and the deadlock (DL) trace. The execution trace has the format of a quintuplet. Its status field can have a value between 0 and 3: the value 0 means the thread was preempted by quantum expiration; 1 means the thread was preempted by another, higher-priority thread; 2 indicates that the thread is going for I/O; and 3 means the thread has completed. The I/O trace has the format of a triplet. The execution trace and I/O trace are retained as described in the MSS framework. This was done in order to preserve multicore properties and to calculate performance metrics that are related to multicore systems but are affected by deadlock algorithms, such as turnaround time and throughput.

Request Trace: The format of the request trace is <rid, req-wait, req-grant, req-release>. The request id is denoted by rid, the time a request started to wait is denoted by req-wait, the time a request is granted is denoted by req-grant, and the time a request ends, that is, when a process releases the occupied resources, is denoted by req-release. The request trace helps in observing the performance metrics of the whole system, which include the effects of the additional components, such as the Resource Manager and resources, as well as of the implemented deadlock algorithm.
The request trace particularly helps in computing performance metrics pertaining to the implemented deadlock algorithms. From this trace type we can calculate metrics such as request wait times, which are eventually used in computing the total wait time for a process or for the system.

Deadlock Trace: The deadlock (DL) trace has a quintuplet format, <pId, dId, DLStart, DLEnd, Status>, as shown in Figure 4.1. This includes the process id (pId), deadlock id (dId), deadlock start time (DLStart), deadlock end time (DLEnd) and status (Status). The process id refers to the id of the process involved in the deadlock. The deadlock id is set to distinguish the different deadlocks that occur in the system. The deadlock start time and deadlock end time correspond to the times a deadlock starts and ends, respectively. The deadlock status can have the value 0 or 1: if the involved process is selected to abort, this value is set to 1; all other processes involved in the deadlock have this value set to 0.

<27, 2, 86, 109, 0>
<34, 4, 258, 475, 1>
<100, 9, 2486, 6971, 0>

Figure 4.1: Sample DL Trace

The DL trace helps us in calculating very important performance metrics directly related to the implemented deadlock algorithm, such as the deadlock count, the duration of a deadlock, etc. In the next section, we explain the calculation of these performance metrics in detail.

4.2 Performance Evaluation

The performance metrics related to deadlocks are mainly aimed at measuring the cost (computation and communication) of deadlocks and of the techniques to handle them. A deadlock handling technique can be avoidance, prevention, or detection followed by recovery. The cost of deadlock detection and recovery depends on the kind of strategy used. We compute performance metrics mainly from two perspectives: (i) from the perspective of an individual process; and (ii) from the perspective of the system.
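As an illustration, a DL trace line such as those in Figure 4.1 can be parsed and turned into per-deadlock figures as sketched below; the helper names are assumptions, not the thesis code:

```java
// Parses DL trace lines of the form <pId, dId, DLStart, DLEnd, Status> and
// derives the per-process deadlock duration and abort status from the fields.
public class DlTraceSketch {
    record DlRecord(int pid, int did, long start, long end, int status) {
        long duration() { return end - start; }   // time this process spent deadlocked
        boolean aborted() { return status == 1; } // 1 means this process was the victim
    }

    static DlRecord parse(String line) {
        String[] f = line.replaceAll("[<>\\s]", "").split(",");
        return new DlRecord(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                Long.parseLong(f[2]), Long.parseLong(f[3]), Integer.parseInt(f[4]));
    }

    public static void main(String[] args) {
        // Second sample line of Figure 4.1.
        DlRecord r = parse("<34, 4, 258, 475, 1>");
        System.out.println(r.duration()); // 217
        System.out.println(r.aborted());  // true
    }
}
```

Aggregating such records by deadlock id (dId) is what yields the system-level metrics discussed next, such as deadlock count and duration of deadlock.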
First, we review the metrics used in the distributed systems context, where processes communicate through messages [8, 13, 24, 25, 33, 35]. Prevention of deadlocks is the most desirable option. However, due to the complexity involved, distributed deadlock prevention methods are rarely considered and studied for distributed systems. Maintaining up-to-date information is almost impossible, due to the absence of a global clock and to unpredictable message delays and/or losses. Even with centralized approaches to maintaining the global state with reasonable consistency, the message cost would be prohibitively high, and on most systems such message traffic could easily choke the underlying communication networks. Therefore, prevention is not a popular technique in distributed systems. In any case, the most interesting performance metrics for prevention are the number of times deadlock is predicted and avoided, and the overhead of the algorithm. Next, we look at the metrics suitable for deadlock detection and resolution techniques in distributed systems. From a process perspective, the key metrics are: wait time during deadlock, turnaround time (also sometimes referred to as response time), number of times involved in deadlocks, and number of restarts due to deadlocks. System-level performance metrics vary depending on the type of technique employed, whether centralized or distributed. For centralized techniques, the metrics are: the number of deadlocks that occurred in a specified period of time¹, the number of processes involved in a deadlock, the number of processes that needed to be aborted to resolve a deadlock, the duration of a deadlock, the number of processes completed in a specified period of time, the overhead of deadlock processing, and the number of messages used to handle a deadlock. In a distributed setting, more than one process could initiate deadlock detection.
Therefore, in addition to the above metrics, the number of deadlock detection invocations could be a performance metric when a distributed deadlock handling technique is used. Based on these metrics, several secondary metrics, such as the abort/restart ratio, the deadlock overhead (time to handle deadlock), the percentage of processes involved in deadlocks, etc., are also used. Also, for several metrics, the minimum, maximum, average, and variance would be desirable.

¹ Here the specified period is normally the simulation time.

In distributed systems, due to message delay and/or loss and the absence of global knowledge, several kinds of deadlocks, such as phantom deadlocks, transient deadlocks, and true deadlocks, are possible. These complications are not present in a shared memory system. Multicore systems are typically shared memory systems, and therefore deadlock handling in a multicore system does not involve messages. Counting the number of messages involved in handling a deadlock is therefore not applicable here. Also, we do not deal with the possibility of phantom or transient deadlocks. A deadlock handling technique in multicore systems could be centralized or distributed. In a centralized implementation, one dedicated process could be assigned to do the job, or the responsibility could be rotated among several processes, but one process is responsible at any particular time. As indicated earlier, in distributed deadlock handling, several processes could initiate deadlock detection independently. If more than one process initiates a deadlock check, then for simplicity and efficiency, one must be elected to do the job. So, in the distributed case, the cost of an election could additionally be a performance metric. When to initiate a deadlock check also influences the performance.

4.3 List of Performance Metrics

We now summarize the key performance metrics of deadlock handling techniques for multicore systems.
We assume that the deadlock handling technique maintains process states (executing, waiting for a resource), resource status (free, used by a process), and timestamps of key events (arrival, start of waits, exit, deadlock resolved (prevented), etc.). We refer to the timestamp function as ts.

4.3.1 Performance Metrics (for Process i)

1. Wait time (W_Time_i(R)): The duration between the time i started waiting for resource(s) R and the time the resource(s) R is/are granted to i. That is, W_Time_i(R) = ts(R granted to i) − ts(i's request for R).

2. Deadlock wait time (DW_Time_i(dk)): The duration between the time i started waiting due to its involvement in the deadlock dk and the time dk is resolved. That is, DW_Time_i(dk) = ts(dk resolved) − ts(i joined dk).

3. Turnaround time/response time (TR_Time_i): The duration between the times i entered and exited the system. That is, TR_Time_i = ts(i's exit) − ts(i's entry).

4. Deadlock count (D_Count_i): The number of times process i was involved in a deadlock.

5. Abort/restart count (AR_Count_i): The number of times i was aborted to resolve a deadlock.

4.3.2 Performance Metrics (for System)

1. Deadlock count (D_Count): The number of deadlocks that occurred in the system in a specified period of time.

2. System wait time (SW_Time): The time processes spend waiting for requests to be granted and for the deadlock dk to be resolved. That is, SW_Time = Σ_i (W_Time_i(R) + DW_Time_i(dk)), over all i.

3. Deadlock encounter count (DP_Count): The number of times deadlocks are predicted in the system in a specified period of time.

4. Deadlock size (D_Size(dk)): The number of processes involved in the deadlock dk.

5. Deadlock strength (D_Strength(dk)): The number of processes that need to be aborted to resolve the deadlock dk.

6. Degree of deadlock (DD(dk)): The degree of deadlock represents the complexity of the deadlock, and we define it as the product of its size and strength.
That is, DD(dk) = D_Size(dk) * D_Strength(dk).

7. Duration of deadlock (D_Duration(dk)): The duration between the times the deadlock dk is formed and resolved. That is, D_Duration(dk) = ts(dk resolved) - ts(dk formed). The value of ts(dk resolved) is easy to obtain, and ts(dk formed) could be computed from the ts(request) of all the processes involved in the deadlock dk.

8. System throughput (ST): The number of processes completed in a specified period of time. The number of processes completed per unit time may be computed by dividing it by the entire duration.

9. Algorithm overhead (D_O(dk)): The read/write count required to process the deadlock dk (prevention, avoidance, or detection and recovery).

10. Total overhead (SO): The total read/write count of the system, including the algorithm overhead to process the deadlock dk.

11. Overhead ratio (OHR): The ratio of the algorithm overhead to process the deadlock dk to the total overhead, in terms of the read/write count of the system. That is,

    OHR = (deadlock algorithm overhead) / (total overhead).

12. Degree of concurrent invocations (DCCI(dk)): The number of invocations of deadlock detection by processes involved in the deadlock dk.

13. Abort/restart ratio (ARR): The ratio of the number of processes aborted due to deadlock to the total number of processes in the system. That is,

    ARR = (number of processes aborted) / (number of processes in the system).

14. Deadlock ratio (DR): The ratio of the number of processes involved in deadlocks to the total number of processes. That is,

    DR = (number of processes involved in deadlocks) / (number of processes in the system).

Due to the solution technique, the current system has W_Time_i(R), DW_Time_i(dk), TR_Time_i, D_Count_i, and AR_Count_i implemented for processes, and D_Count, SW_Time, D_Size(dk), ST, ARR, SO, DR, D_O(dk), and OHR implemented for the system.
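The system-level ratios defined above reduce to simple arithmetic over counters collected during a run. The following is a minimal sketch of how they could be computed; the class and method names are illustrative only and are not the simulator's actual code.

```java
// Hypothetical helper for the secondary system metrics defined above.
// Inputs are raw counters a simulation run would accumulate.
public class SystemMetrics {

    // Overhead Ratio (OHR): deadlock-algorithm read/write count
    // divided by the system's total read/write count.
    public static double overheadRatio(long algorithmReadWrites, long totalReadWrites) {
        return (double) algorithmReadWrites / totalReadWrites;
    }

    // Abort/Restart Ratio (ARR): processes aborted due to deadlock
    // divided by the total number of processes in the system.
    public static double abortRestartRatio(int aborted, int totalProcesses) {
        return (double) aborted / totalProcesses;
    }

    // Degree of Deadlock (DD): product of deadlock size and strength.
    public static int degreeOfDeadlock(int size, int strength) {
        return size * strength;
    }

    // System Throughput (ST): processes completed per unit of
    // simulation time over the whole duration.
    public static double throughput(int completed, long duration) {
        return (double) completed / duration;
    }

    public static void main(String[] args) {
        System.out.println("OHR = " + overheadRatio(40, 100));      // 0.4
        System.out.println("ARR = " + abortRestartRatio(35, 700));  // 0.05
        System.out.println("DD  = " + degreeOfDeadlock(4, 2));      // 8
        System.out.println("ST  = " + throughput(700, 14000));      // 0.05
    }
}
```

A run would call these once at the end of the simulation, after the trace counters are final.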
The average and variance are computed for the metrics related to processes.

4.4 Summary

We discussed the different types of traces generated and collected during execution. We also presented different performance metrics and the computation of those metrics from the traces. In the next chapter, we present the analysis of the results from executions for different values of the input parameters. They are analysed and compared based on the performance metrics.

Chapter 5

Simulation Study

We were interested in studying the performance of the Banker's algorithm for deadlock avoidance and of Dreadlocks for deadlock detection with process termination as deadlock recovery, with respect to the metrics discussed in the previous chapter. We present two sets of experiments for Dreadlocks and one set of experiments for the Banker's algorithm, in order to study them and to demonstrate the operation of the proposed simulator.

5.1 Experimental Setup

The scenarios used in the experiments are the general scenario and the scenario for the Banker's algorithm. The general scenario entails a normal setup of processes, requests, and resources. Request generation is randomized with the upper bound provided. There is no other restriction on the generation of requests. In the scenario for the Banker's algorithm, the processes are assigned a maximum claim for each resource in advance while generating workloads. These scenarios are discussed in detail in Chapter 3.

The first two sets of experiments pertain to Dreadlocks, the deadlock detection algorithm. We compare the results of the termination types with respect to variation in either the number of cores or the mean arrival rate.

Experiment 1: In this experiment we set the workloads with a fixed mean arrival rate and vary the number of cores. Table 5.1 shows the simulation parameters used for this experiment.
Table 5.1: Simulation Parameters for Experiment 1

  Parameter                          Value
  Scenario Type                      General
  Number of Processes                700
  Number of Resources                70
  Number of Instances per Resource   30
  Mean Arrival Rate                  2.5
  Number of Cores                    25, 50, 75, 100, 125

Experiment 2: We set a fixed number of cores and vary the mean arrival rate of the workloads in this experiment. Table 5.2 shows the simulation parameters used for this experiment.

Table 5.2: Simulation Parameters for Experiment 2

  Parameter                          Value
  Scenario Type                      General
  Number of Processes                700
  Number of Resources                70
  Number of Instances per Resource   30
  Mean Arrival Rate                  2.5, 3.5, 5, 6.5, 7.5
  Number of Cores                    75

For both experiments we enabled the option of detecting deadlock using Dreadlocks. In Chapter 3, we discussed the selection criteria for processes to abort. We showed that once a deadlock is detected, one or more processes can be selected to abort based on different criteria: Random, Least Recently Used (LRU), First In First Out (FIFO), Minimum Runtime (min. runtime), Maximum Runtime (max. runtime), Minimum Digest (min. digest), and Maximum Digest (max. digest). We believe terminating processes with higher runtimes would be more expensive, as restarting such processes costs more. Based on this proposition, we made the following hypothesis.

Hypothesis: Terminating processes with the minimum runtime will perform better than terminating processes with the maximum runtime. (Here we consider the overall performance of the system, that is, a termination type that shows better results for most of the metrics. Due to the randomness of the input and the uncertainty of deadlocks, the selected termination type might not always perform well with respect to all of the performance metrics. Hence, if a termination type performs better for the majority of the performance metrics, it is considered better than the other termination types.)

Experiment 3: In this experiment, we show the performance of the Banker's
deadlock avoidance algorithm. The simulation parameters set for this experiment are shown in Table 5.3. We set the workload with a fixed mean arrival rate and varied the number of cores. The scenario type chosen was the scenario for the Banker's algorithm.

Table 5.3: Simulation Parameters for Experiment 3

  Parameter                          Value
  Scenario Type                      Scenario for the Banker's algorithm
  Number of Processes                500
  Number of Resources                100
  Number of Instances per Resource   10
  Mean Arrival Rate                  10
  Number of Cores                    25, 50, 75, 100, 125

Next, we explain our observations and analysis for deadlock detection (Dreadlocks), deadlock recovery (termination with the minimum and the maximum runtime), and deadlock avoidance (the Banker's algorithm). The observations are illustrated and analysed with respect to the key performance metrics for the system.

5.2 Analysis

For the first two experiments, simulation results are mainly shown for four performance metrics in each case: (i) Throughput, (ii) System Abort/Restart Ratio, (iii) Deadlock Wait Time, and (iv) Overhead Ratio. For the third experiment, the performance metrics shown are: (i) Turnaround Time, (ii) Throughput, (iii) Process Wait Time, and (iv) Overhead Ratio. Some of the performance metrics from the initial experiments are not shown for the third experiment because of limitations of the deadlock handling technique. These metrics are discussed in Chapter 4. Now, we show our observations on the above mentioned performance metrics for each experiment.

We conducted several sets of experiments. In each experiment, we observed a common behavior for the termination types with respect to each other. Every time, the performances of the different termination types relative to each other followed a similar pattern, that is, one type of termination always performed better than the other type of termination.
Also, on comparing two different executions, we observed changes in the patterns of the graphs for some performance metrics: when the performances were compared with respect to the change in the number of cores, the graphs had increasing/decreasing lines, and when the performances were compared by changing the mean arrival rate, the graphs had lines with a zig-zag pattern. The reason behind this is that the implemented deadlock handling techniques were not developed with changes in the mean arrival rate in mind. Hence, their performances with respect to the mean arrival rate are arbitrary, and they showed a zig-zag pattern in the results.

5.2.1 Observations from Experiment 1

Observation on Throughput: While conducting the experiment with various inputs, we observe that termination of processes based on the minimum runtime has better throughput than termination of processes based on the maximum runtime. With the increase in the number of cores, the throughput also increases in both cases. There was always a significant difference between the throughputs for the minimum runtime and the maximum runtime. The result for the given input parameters is shown in Figure 5.1.

[Figure 5.1: Throughput: Minimum Runtime and Maximum Runtime (by varying the number of cores)]

The throughput increases with the increase in the number of cores because, with the availability of more cores, more processes can execute per unit time. Hence, more processes can complete their executions per unit time.

Observation on System Abort/Restart Ratio (ARR): The system ARR increases with the increase in the number of cores in both cases. Termination with the maximum runtime always shows better behavior than termination with the minimum runtime.
The system ARR chart for both termination types is shown in Figure 5.2.

[Figure 5.2: System ARR: Minimum Runtime and Maximum Runtime (by varying the number of cores)]

The maximum runtime has a lower system ARR because fewer processes are terminated when the termination criterion is the maximum runtime. In the same way, with the minimum runtime, more processes are terminated. Hence, termination with the minimum runtime has a higher system ARR.

Observation on Deadlock Wait Time: The graph of the deadlock wait time for both termination types is shown in Figure 5.3. The deadlock wait time for termination with the minimum runtime is significantly lower than for termination with the maximum runtime. The deadlock wait time increases with an increase in the number of cores for termination with the maximum runtime, while it remains nearly consistent for termination with the minimum runtime.

[Figure 5.3: Deadlock Wait Time: Minimum Runtime and Maximum Runtime (by varying the number of cores)]

With the increase in the number of cores, more processes can execute per unit time. This results in more processes holding resources per unit time while waiting for other requested resources. Because of that, more time is required to resolve the deadlock. Hence, the deadlock wait time increases (for the maximum runtime) or remains nearly consistent (for the minimum runtime) with the increase in the number of cores.
Observation on Overhead Ratio: The graph of the overhead ratio for both termination types is shown in Figure 5.4. The overhead ratio decreases with the increase in the number of cores. Termination with the minimum runtime always showed a lower overhead ratio than termination with the maximum runtime. For most of the experiments, the overhead ratio varied between 0.02 and 0.4 for Dreadlocks for all termination types.

[Figure 5.4: Overhead Ratio: Minimum Runtime and Maximum Runtime (by varying the number of cores)]

With the increase in the number of cores, fewer processes have to wait for their required resources. Because of that, fewer reads/writes are required to detect deadlock at every request. Hence, the overhead ratio decreases with the increase in the number of cores.

5.2.2 Observations from Experiment 2

Observation on Throughput: We observed that termination based on the minimum runtime always performs better than termination based on the maximum runtime. The graph of the throughput for both termination types is shown in Figure 5.5. The graph demonstrates a zig-zag pattern, but the values remain within a specific range because the number of cores is the same for all the experiments and the implemented algorithms do not react to the change in the mean arrival rate.
[Figure 5.5: Throughput: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)]

Observation on System Abort/Restart Ratio (ARR): For all values of the mean arrival rate, termination with the maximum runtime shows a lower value than the minimum runtime. The lines are nearly consistent for all values of the mean arrival rate. The system ARR chart for both termination types is shown in Figure 5.6.

[Figure 5.6: System ARR: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)]

Observation on Deadlock Wait Time: The deadlock wait time for termination with the minimum runtime is always lower than for termination with the maximum runtime. The graph demonstrating the deadlock wait times for both termination types is shown in Figure 5.7. For most of the experiments of this set, we observed a significant difference between the two termination types. Termination with the maximum runtime shows variation, while termination with the minimum runtime has almost consistent performance for all values of the mean arrival rate.
[Figure 5.7: Deadlock Wait Time: Minimum Runtime and Maximum Runtime (by varying the mean arrival rate)]

Observation on Overhead Ratio: The graph of the overhead ratio for both termination types is shown in Figure 5.8. The overhead ratio for termination with the minimum runtime is always lower than for termination with the maximum runtime. For most of the experiments in this set, the overhead ratio varied between 0.1 and 0.4 for both termination types.

[Figure 5.8: Overhead Ratio: Min. Runtime and Max. Runtime (by varying the mean arrival rate)]

5.2.3 Observations from Experiment 3

Observation on Turnaround Time: We observe that the turnaround time decreases with the increase in the number of cores for the Banker's algorithm. The rate of decrease of the turnaround time is nearly constant across all the numbers of cores. The graph of the turnaround time for the Banker's algorithm is shown in Figure 5.9.

[Figure 5.9: Turnaround Time: Banker's Algorithm (by varying the number of cores)]

With the increase in the number of cores, the processes have more cores available per unit time, which results in less execution time for each process.
Hence, the turnaround time decreases with the increase in the number of cores.

Observation on Throughput: We observe that the throughput of the system increases with the increase in the number of cores for the Banker's algorithm most of the time. The graph of the throughput for the Banker's algorithm is shown in Figure 5.10.

[Figure 5.10: Throughput: Banker's Algorithm (by varying the number of cores)]

With the increase in the number of cores, more processes can execute per unit time. Hence, the throughput increases with the increase in the number of cores. Though the throughput slightly decreases from 100 to 125 cores, its value remains higher than for the other numbers of cores. We cannot determine the exact reason behind this strange behavior; the randomness of the input can be one of the reasons for it.

Observation on Process Wait Time: The process wait time decreases with the increase in the number of cores. We observed consistent results for the average and variance for all of the different experiments of this set. The graph of the process wait time for the Banker's algorithm is shown in Figure 5.11.

[Figure 5.11: Process Wait Time: Banker's Algorithm (by varying the number of cores)]

With the increase in the number of cores, more processes can execute per unit time.
As a result, resources are granted and released more quickly, and processes have to wait less for their requested resources. Hence, the process wait time decreases with the increase in the number of cores.

Observation on Overhead Ratio: The graph of the overhead ratio for the Banker's algorithm is shown in Figure 5.12. The overhead ratio increases with the increase in the number of cores. The overhead ratio is higher than 0.80 for the given input parameters for all numbers of cores. We observed that, for all experiments of this set, the overhead ratio for the Banker's algorithm has always been higher than 0.75.

[Figure 5.12: Overhead Ratio: Banker's Algorithm (by varying the number of cores)]

For the Banker's algorithm, the safe state is checked for each request. With the increase in the number of cores, more processes execute per unit time. Hence, more processes can request resources per unit time. The algorithm determines a safe sequence while checking the safe state, and it assumes that the processes will finish their execution sequentially according to that safe sequence. This assumption is not helped by increasing the number of cores; in some cases, the increase in the number of cores makes the performance worse. Hence, the overhead ratio increases with the increase in the number of cores for the Banker's algorithm.

5.3 Summary

In this chapter, we presented the simulation experiments for termination with the minimum runtime and with the maximum runtime.
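The per-request safe-state check of the Banker's algorithm, discussed in the overhead observation above, can be sketched as follows. This is a minimal textbook-style illustration, not the simulator's actual code; the array names (available, allocation, need) are assumptions following the standard formulation.

```java
import java.util.Arrays;

// Sketch of the Banker's algorithm safe-state check: returns true if
// some order exists in which every process can acquire its remaining
// need and run to completion with the currently available instances.
public class BankerSafety {

    public static boolean isSafe(int[] available, int[][] allocation, int[][] need) {
        int n = allocation.length;                // number of processes
        int m = available.length;                 // number of resource types
        int[] work = Arrays.copyOf(available, m); // instances free right now
        boolean[] finished = new boolean[n];
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < n; i++) {
                if (!finished[i] && canFinish(need[i], work)) {
                    // Process i can run to completion; reclaim its allocation.
                    for (int j = 0; j < m; j++) work[j] += allocation[i][j];
                    finished[i] = true;
                    progress = true;
                }
            }
        }
        for (boolean f : finished) if (!f) return false;
        return true;
    }

    // True if process i's remaining need fits within the work vector.
    private static boolean canFinish(int[] need, int[] work) {
        for (int j = 0; j < work.length; j++)
            if (need[j] > work[j]) return false;
        return true;
    }
}
```

Because a check like this scans the allocation and need matrices on every request, its read/write cost grows with the request rate, which is consistent with the high overhead ratios observed for the Banker's algorithm above.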
As hypothesized, termination with the minimum runtime outperforms termination with the maximum runtime in most of the cases. We also presented simulation experiments for the Banker's deadlock avoidance algorithm. These experiments showed valid results for deadlock avoidance as well as for deadlock detection and recovery. Hence, we are confident that our simulation implementation of deadlock in multicore systems is fairly sound.

Chapter 6

Conclusions and Future Work

Deadlock is a very common bug in software applications, yet it is ignored by most operating systems [32]. With the advent of multicore processors, the problem of deadlock has gained renewed attention in research. Many approaches that have been developed to handle deadlock in multicore systems were discussed in Chapter 2. In this chapter, we conclude the thesis and outline possible future work that can be conducted to extend the research carried out in this thesis.

6.1 Conclusion

The primary contributions of this thesis are: (i) the design and implementation of a flexible framework to simulate deadlock in multicore systems; (ii) three scenarios implemented to incorporate different deadlock handling techniques: the general scenario, the scenario for the Dining Philosopher's problem, and the scenario for the Banker's algorithm. Two deadlock handling strategies, deadlock avoidance and deadlock detection & recovery, are simulated. The deadlock avoidance strategy is the Banker's algorithm and the deadlock detection strategy is Dreadlocks. Deadlock recovery follows deadlock detection, and is done by terminating one of the processes involved in the deadlock. The experience gained by developing this simulator involves software design, implementation, and performance analysis. (iii) Demonstration of the soundness and validity of the proposed framework by presenting the result analysis of the experiments.
The proposed simulator has the basic components required to simulate a deadlock handling algorithm in multicore systems. More deadlock handling techniques can easily be added to this simulator for study, which can furnish initial insights into the behavior of the algorithms in practical multicore systems. These insights can be helpful for studying the performance of the algorithms as well as for identifying their shortcomings. Based on the shortcomings, necessary guidelines can be offered for improvements to the algorithms.

6.2 Future Directions

The proposed framework to incorporate deadlock handling algorithms in multicore systems is just the beginning of research into deadlock handling algorithms in multicore systems. There are many directions in which the work presented in this thesis can be expanded. We outline some of them next.

• The simulation can be made more sophisticated by refining and improving components such as user interrupts. The current system simulates only I/O interrupts; more real system interrupts can be introduced to the proposed framework.

• Cache memory can also be simulated to make the framework's behaviour closer to that of real systems.

• More scheduling algorithms can be added to observe the performance of deadlock handling algorithms with them.

• More deadlock handling algorithms can be incorporated.

• The modelling of resources and the Resource Manager can be refined and improved by adding different types of resources with different properties. Also, the function of the Resource Manager can be improved to deal differently with the various resources.

• The current performance study has exposed some irregularities in the implemented algorithms as well as some of their limitations. For example, for Dreadlocks, digest propagation is not accurate and it considers all processes to have equal priority. These limitations can be overcome to make the algorithms more efficient.
For example, to overcome the limitation of digest propagation, a centralized scheme can be introduced to update the digests of all the processes at regular intervals.

I would like to continue working on some of these directions in the future.

Bibliography

[1] Rahul Agarwal and Scott D. Stoller. Run-time detection of potential deadlocks for programs with locks, semaphores, and condition variables. In Proceedings of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging, PADTAD '06, pages 51-60, New York, NY, USA, 2006. ACM.

[2] Rahul Agarwal, Liqiang Wang, and Scott D. Stoller. Detecting potential deadlocks with static analysis and run-time monitoring. In Proceedings of the First Haifa International Conference on Hardware and Software Verification and Testing, HVC '05, pages 191-207, Berlin, Heidelberg, 2006. Springer-Verlag.

[3] Alex A. Aravind and Viswanathan Manickam. A flexible simulation framework for multicore schedulers: Work in progress paper. In Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, SIGSIM PADS '13, pages 355-360, New York, NY, USA, 2013. ACM.

[4] Ferenc Belik. An efficient deadlock avoidance technique. IEEE Trans. Comput., 39(7):882-888, July 1990.

[5] Saddek Bensalem, Jean-Claude Fernandez, Klaus Havelund, and Laurent Mounier. Confirmation of deadlock potentials detected by runtime analysis. In Proceedings of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging, PADTAD '06, pages 41-50, New York, NY, USA, 2006. ACM.

[6] Saddek Bensalem and Klaus Havelund. Scalable dynamic deadlock analysis of multithreaded programs. In Parallel and Distributed Systems: Testing and Debugging (PADTAD-3), IBM Verification Conference, 2005.

[7] Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for safe programming: Preventing data races and deadlocks. SIGPLAN Not., 37(11):211-230, November 2002.

[8] Omran A. Bukhres.
Performance comparisons of distributed deadlock detection algorithms. In Proceedings of the Eighth International Conference on Data Engineering, pages 210-217, Washington, DC, USA, 1992. IEEE Computer Society.

[9] E. G. Coffman, M. Elphick, and A. Shoshani. System deadlocks. ACM Comput. Surv., 3(2):67-78, June 1971.

[10] Edsger W. Dijkstra. The origin of concurrent programming. Chapter: Hierarchical Ordering of Sequential Processes, pages 198-227. Springer-Verlag New York, Inc., New York, NY, USA, 2002.

[11] Edsger Wybe Dijkstra. Cooperating sequential processes. Technical Report EWD123, 1965.

[12] Dawson Engler and Ken Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. SIGOPS Oper. Syst. Rev., 37(5):237-252, October 2003.

[13] Luca Ferrarini, Luigi Piroddi, and Stefano Allegri. A comparative performance analysis of deadlock avoidance control algorithms for FMS. Journal of Intelligent Manufacturing, 10(6):569-585, 1999.

[14] Cormac Flanagan, K. Rustan M. Leino, Mark Lillibridge, Greg Nelson, James B. Saxe, and Raymie Stata. Extended static checking for Java. SIGPLAN Not., 37(5):234-245, May 2002.

[15] Patrice Godefroid. Model checking for programming languages using VeriSoft. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '97, pages 174-186, New York, NY, USA, 1997. ACM.

[16] A. N. Habermann. Prevention of system deadlocks. Commun. ACM, 12(7):373-ff., July 1969.

[17] Sibsankar Haldar and Alex Aravind. Operating Systems. Pearson Education, 2010.

[18] W. Haque. Concurrent deadlock detection in parallel programs. Int. J. Comput. Appl., 28(1):19-25, January 2006.

[19] Klaus Havelund. Using runtime analysis to guide model checking of Java programs. In Proceedings of the 7th International SPIN Workshop on SPIN Model Checking and Software Verification, pages 245-264, London, UK, 2000. Springer-Verlag.

[20] Richard C. Holt.
Some deadlock properties of computer systems. In Proceedings of the Third ACM Symposium on Operating Systems Principles, SOSP '71, pages 64-71, New York, NY, USA, 1971. ACM.

[21] Pallavi Joshi, Chang-Seo Park, Koushik Sen, and Mayur Naik. A randomized dynamic program analysis technique for detecting real deadlocks. SIGPLAN Not., 44(6):110-120, June 2009.

[22] Edgar Knapp. Deadlock detection in distributed databases. ACM Comput. Surv., 19(4):303-328, December 1987.

[23] Eric Koskinen and Maurice Herlihy. Dreadlocks: Efficient deadlock detection. In Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures, SPAA '08, pages 297-303, New York, NY, USA, 2008. ACM.

[24] Natalija Krivokapic, Alfons Kemper, and Ehud Gudes. Deadlock detection in distributed database systems: A new algorithm and a comparative performance analysis. The VLDB Journal, 8(2):79-100, October 1999.

[25] Soojung Lee and Junguk L. Kim. Performance analysis of distributed deadlock detection algorithms. IEEE Trans. on Knowl. and Data Eng., 13(4):623-636, July 2001.

[26] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: A comprehensive study on real world concurrency bug characteristics. SIGOPS Oper. Syst. Rev., 42(2):329-339, March 2008.

[27] Mayur Naik, Chang-Seo Park, Koushik Sen, and David Gay. Effective static deadlock detection. In Proceedings of the 31st International Conference on Software Engineering, ICSE '09, pages 386-396, Washington, DC, USA, 2009. IEEE Computer Society.

[28] Onur Ozgun and Yaman Barlas. Discrete vs. continuous simulation: When does it matter? In Proceedings of the 27th International Conference of the System Dynamics Society, volume 6, pages 1-22, 2009.

[29] Yao Qi, Raja Das, Zhi Da Luo, and Martin Trotter. MulticoreSDK: A practical and efficient data race detector for real-world applications.
In Proceedings of the 7th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, PADTAD '09, pages 5:1-5:11, New York, NY, USA, 2009. ACM.

[30] M. Singhal. Deadlock detection in distributed systems. Computer, 22(11):37-48, November 1989.

[31] William Stallings. Operating Systems: Internals and Design Principles. Prentice Hall Press, Upper Saddle River, NJ, USA, 6th edition, 2008.

[32] Andrew S. Tanenbaum. Modern Operating Systems. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2007.

[33] I. Terekhov and T. Camp. Time efficient deadlock resolution algorithms. Inf. Process. Lett., 69(3):149-154, February 1999.

[34] Amy Williams, William Thies, and Michael D. Ernst. Static deadlock detection for Java libraries. In Proceedings of the 19th European Conference on Object-Oriented Programming, ECOOP '05, pages 602-629, Berlin, Heidelberg, 2005. Springer-Verlag.

[35] Chim-fu Yeung, Sheung-lun Hung, and Kam-yiu Lam. Performance evaluation of a new distributed deadlock detection algorithm. SIGMOD Rec., 23(3):21-26, September 1994.