AN ADAPTIVE LOAD SENSING PRIORITY ASSIGNMENT PROTOCOL FOR DISTRIBUTED REAL-TIME DATABASE SYSTEMS

by

Shah Nahid Mahmud

B.Sc., Jahangirnagar University, Bangladesh, 2003

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES (COMPUTER SCIENCE)

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

August 2012

© Shah Nahid Mahmud, 2012

Abstract

Transaction processing in a distributed real-time database system (DRTDBS) is coordinated by a concurrency control protocol (CCP). The performance of a CCP is affected by the load condition of a transaction processing system. For example, the performance of the Adaptive Speculative Locking (ASL) protocol degrades under high load conditions. Priority protocols help a CCP by prioritizing transactions. The performance of priority protocols is also affected by system load conditions, but it can be optimized by dynamically switching between priority protocols at run time when the system load changes. The objective of this research is to develop a protocol, the Adaptive Priority Assignment Protocol (APAP), which changes the priority protocol at run time to improve the performance of a CCP in a DRTDBS. APAP is implemented in a DRTDBS, where ASL is used as the underlying CCP to validate APAP. The performance of APAP was tested under varying system load conditions with various combinations of the database system parameters.
Under the scenarios tested, APAP performed better than other priority protocols and demonstrated that dynamic selection of priority protocols during run time is an effective way to improve the performance of a CCP in a DRTDBS.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement
Chapter 1
1.1 Transaction Processing
1.2 Data Distribution
1.3 Deadlines
1.4 Deadlocks
1.5 Priority Assignment
1.6 Preemption
1.7 Locking Protocols
1.8 Commit Protocols
1.9 Concurrency Control Protocols (CCPs)
1.10 Contribution
Chapter 2
2.1 Evaluation of the ASL protocol
2.1.1 Permits Reading of Modified Prepared-Data for Timeliness
2.1.2 Prompt-Early Prepare
2.1.3 Adaptive Exclusive Primary
2.1.4 Speculative Locking
2.1.5 Adaptive Speculative Locking
2.1.6 Synchronous Speculative Locking Protocol for Read-Only Transactions
2.2 Switching Between Priority Protocols
2.2.1 Adaptive Earliest Deadline
2.2.2 Sectional Scheduling
2.2.3 Maximum Miss First
2.2.4 Group-EDF
2.3 Summary
Chapter 3
3.1 Distributed Real-Time Transaction Processing Simulator
3.1.1 Node Architecture
3.2 Graphical User Interface
3.3 Software Design
3.3.1 Concurrency Control Protocols
Chapter 4
4.1 Adaptive Priority Assignment Protocol
4.1.1 Implementation
4.2 Experiments and Results
4.2.1 Baseline Experiment
4.2.2 Performance of APAP varying the transaction load
4.2.3 Performance of APAP with reduced page update rate
4.2.4 Performance of APAP with larger cache size
4.2.5 Performance of APAP with increased slack time
4.2.6 Effects of system disk space on the performance of APAP
4.2.7 Effect of system distribution on the performance of APAP
4.2.8 Effect of increasing the number of processors
4.2.9 Effect of network topologies on the performance of APAP
4.3 Summary
Chapter 5
5.1 Future Work
Bibliography

List of Tables

Table 2.1: Lock Compatibility Matrix (a) 2PL and (b) SL [7]
Table 2.2: Lock compatibility matrix for SSLR [42]
Table 4.1: Load ranges for priority protocols
Table 4.2: Parameter Settings

List of Figures

Figure 1.1: Partitioned and replicated data distribution processes
Figure 1.2: Transaction Deadline Model [13]
Figure 1.3: Two-Phase Locking - Growing and Shrinking Phase [9]
Figure 1.4: Commit Protocol - Commit and Abort Paths
Figure 2.1: A Transaction Processing [7]
Figure 2.2: 2PL Processing [7]
Figure 2.3: SL Processing [7]
Figure 2.4: SSLR Processing [42]
Figure 2.5: ASLR Processing [43]
Figure 3.1: Network configuration [49]
Figure 3.2: Simulator Setup Tool
Figure 3.3: Simulator Report Tool
Figure 3.4: Simulation class structure [46]
Figure 3.5: A running simulator
Figure 3.6: Speculative locking protocol dependencies [46]
Figure 4.1: Sequence of transaction selection process without APAP
Figure 4.2: Sequence of transaction selection process with APAP
Figure 4.3: PTCT for the Baseline Experiment
Figure 4.4: POU for the Baseline Experiment
Figure 4.5: PTCT for 200 transactions
Figure 4.6: POU for 200 transactions
Figure 4.7: PTCT for the zero page update rate
Figure 4.8: POU for the zero page update rate
Figure 4.9: PTCT for the 50 percent page update rate
Figure 4.10: POU for the 50 percent page update rate
Figure 4.11: PTCT for 50 pages cache size
Figure 4.12: POU for 50 pages cache size
Figure 4.13: PTCT for the 720-3600 ticks slack time
Figure 4.14: POU for the 720-3600 ticks slack time
Figure 4.15: PTCT for 4 disks
Figure 4.16: POU for 4 disks
Figure 4.17: PTCT for 11 nodes
Figure 4.18: POU for 11 nodes
Figure 4.19: PTCT for 15 nodes
Figure 4.20: POU for 15 nodes
Figure 4.21: PTCT for 2 processors
Figure 4.22: POU for 2 processors
Figure 4.23: PTCT for the 2D-torus network topology
Figure 4.24: POU for the 2D-torus network topology

Acknowledgement

Foremost, I would like to express my sincere gratitude to my supervisor, Dr. Waqar Haque, for his guidance, patience, and immense knowledge in my research. Throughout this research, his professionalism, attention to detail, and high standards tremendously helped me reach a better academic standing. I could not have achieved my goal without his constant support and generosity. I would also like to thank my committee members, Dr. Alex Aravind and Dr. Balbinder Deo, for their support, encouragement, and insightful comments. I spent a long time in Dr. Alex Aravind's research lab. He gave me valuable suggestions about theses based on his experience with his other students, which helped me a lot. A special word of gratitude to my friend, Christina Tennant, for her continuous help and enthusiasm in improving my thesis writing. For the last couple of months I have been continuously bugging her to proofread my thesis. She patiently checked every line of my writing and made suggestions to fix any issues. I would also like to thank my friend, Barbara Willmer, who helped check my writing many times. Finally, I would like to thank my family: my parents and my siblings, for supporting me throughout my life. I am grateful that they kept faith in me in all situations of my life.

Chapter 1
Introduction

A database system provides a systematic and secure way to store information and answer queries in an organized manner. In today's world, almost every business uses a database system. Access to a database system is controlled by transactions, which are combinations of read operations that read data from the database system and write operations that update data in the database system [1]. According to Bernstein and Newcomer [2], a transaction must follow the ACID properties: atomicity, consistency, isolation, and durability. Atomicity ensures that partial completion of a transaction is not accepted. Consistency means that database changes made by a transaction should not violate the consistency of the database. When a number of transactions are running in parallel, isolation ensures that each transaction runs as if it were independent of the other transactions. Durability ensures that the results of a committed transaction are permanent, even in the event of a failure [2].

A real-time database system (RTDBS) is a repository for data, like a conventional database, which supports data retrieval and manipulation. In addition, it ensures "some degree of confidence in meeting the system's timing requirements" [3] [4]. An RTDBS is evaluated by how many transactions complete their tasks before their deadlines expire. The performance of an RTDBS depends on the number of transactions missing their deadlines, the effects of transactions missing their deadlines, the average 'lateness' or 'tardiness' of late transactions, the present status of the data, and the time interval in which the data in the database was collected from the external world [3].
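Two of these measures, the fraction of transactions that miss their deadlines and the average tardiness of the late transactions, are straightforward to compute once a workload has finished. The following Python sketch is illustrative only; the transaction fields, time unit, and sample values are assumptions made for this example and are not taken from the thesis.

from dataclasses import dataclass

@dataclass
class CompletedTxn:
    deadline: float   # absolute deadline (ticks), assumed for the example
    finish: float     # absolute completion time (ticks)

def miss_ratio(txns):
    # fraction of transactions that completed after their deadlines
    missed = sum(1 for t in txns if t.finish > t.deadline)
    return missed / len(txns) if txns else 0.0

def mean_tardiness(txns):
    # average lateness of the late transactions only
    late = [t.finish - t.deadline for t in txns if t.finish > t.deadline]
    return sum(late) / len(late) if late else 0.0

workload = [CompletedTxn(100, 90), CompletedTxn(100, 130), CompletedTxn(200, 210)]
print(miss_ratio(workload))      # 0.666... (two of three transactions were late)
print(mean_tardiness(workload))  # 20.0 (average of 30 and 10 ticks of lateness)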
A distributed database system (DDBS) is a collection of data sites, which contain one or more databases connected by a communication network. For example, in the stock market, information is stored in a geographically distributed database, since stocks are bought and sold from different places. A DDBS supports sharing of data and programs, and load balancing among all sites. It can also be incrementally expanded to any number of sites [5]. In a distributed real-time database system (DRTDBS), transactions at each site have explicit timing constraints, which become more challenging to meet because the transactions are distributed and database consistency needs to be maintained through controlled data access.

Concurrency control protocols (CCPs) coordinate concurrent access to data and are considered the core component of database systems. There are several concurrency control approaches used to maintain the consistency of the database while transactions are concurrently accessing data. These approaches can be categorized into two types: aggressive or optimistic, where operations are scheduled immediately, and conservative or pessimistic, where operations may be delayed [6]. Two-phase locking (2PL) is the most popular concurrency control protocol in commercial products. In 2PL, all data items have locks associated with them. When a transaction accesses a data item, it holds the lock on that data item [6]. If the lock is an exclusive lock, the data item becomes unavailable to other transactions until the transaction holding the lock completes its execution. Therefore, 2PL increases transaction execution time. The speculative locking (SL) protocol is an approach where conflicting transactions are allowed to access a data item that is held by another transaction, to minimize the transaction execution time; conflicts are resolved later [7]. The adaptive speculative locking (ASL) protocol extends the basic function of the SL protocol in a DRTDBS and outperforms it under most conditions by exploiting a variety of techniques: efficient memory management, hyper-threading, and transaction queue management (discussed in Chapter 2) [8] [9]. However, the ASL protocol uses the fixed priority assignment approach, where a given protocol is selected when a transaction is initiated and used for the entire duration until it completes. The fixed approach of choosing priority protocols may not always produce optimum results in a real-time system, especially where system conditions change frequently, since there may not be enough time for transactions to complete before their deadlines. Our hypothesis is that by dynamically switching between various priority protocols in a DRTDBS, the number of transactions that meet their deadlines can be maximized. The goal is thus to develop an adaptive priority protocol approach for the ASL protocol that will allow automatic switching between priority protocols as the system load changes, thus improving overall performance.

The remaining part of this chapter explains the necessary components and features of a DRTDBS and provides background knowledge pertaining to our work. This is followed by an outline of the contribution of this research.
1.1 Transaction Processing

In a database system, the basic unit of processing is a transaction, which is a set of read/write operations that can be either local or global [1]. Local transactions deal with data locally at a single site, while global transactions deal with data at multiple sites and may have a number of sub-transactions. In a DRTDBS, most transactions are global, and transaction execution involves running sub-transactions at remote sites. According to the distributed transaction model [10], transactions are controlled by processes, which work at different sites to coordinate between a transaction and its sub-transactions. The process that executes at the site where the transaction originates is called the master. Other processes that execute on behalf of the master are called cohorts, which need to maintain communication with the master for a successful global transaction execution.

There are two types of distributed transaction execution models: sequential and parallel [11]. In a sequential execution model, the operations of a single cohort are executed sequentially. The cohort can only commit after successful completion of all of its operations. During the execution of operations, a site may have at most one cohort. In a parallel execution model, all the cohorts are initiated together and execute in parallel without interfering with each other. Therefore, transactions complete earlier than in the sequential execution model.

The lifetime of a transaction can be divided into two phases: a work phase and a commit phase [9]. In its work phase, a transaction reads or manipulates data. The master process informs the other participating cohorts about the work to be done at each site. The cohorts then complete the work and confirm with the master that the work is done. In its commit phase, a transaction completes when the master gets confirmation from all the cohorts and executes a commit protocol, which makes the changes permanent, or an abort protocol, which reverts any changes. Concurrent access to a data item causes inconsistency in the database. Serialization guarantees the serial execution of transactions when they execute concurrently on the same data [6]. To protect data and ensure serialization, locking and commit protocols are used in the work and commit phases, respectively.

1.2 Data Distribution

Data in a DRTDBS are distributed throughout the sites of a system. The nature of the data distribution with respect to the execution of transactions can severely affect the performance. There are two ways data can be distributed (Figure 1.1): partitioned and replicated [6].

1. In partitioned distribution, there is no intersection of data when data are distributed to different nodes. Partitioned distribution minimizes maintenance cost, but if a single site fails then its data is lost and cannot be recovered. Therefore, the whole system fails.

Figure 1.1: Partitioned and replicated data distribution processes

2. In replicated distribution, multiple copies of the same data items are distributed to different sites [6]. This increases the availability of the data to transactions when they need it for their operations. In such an environment, the system does not need to stop operation even when some sites fail, because the required data may be available at other sites. However, updating data at one site requires updating all copies at other sites to prevent inconsistencies [12]. CCPs are used to ensure that the "database is a one copy equivalent" [6].
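The trade-off between the two placements can be illustrated with a minimal sketch. The page names, site identifiers, and placements below are invented for this example; the point is only that a replicated write must reach every copy to preserve one-copy equivalence, while a partitioned write touches a single site.

PARTITIONED = {"X": {1}, "Y": {2}, "Z": {3}}                     # each page stored at one site
REPLICATED = {"X": {1, 2, 3}, "Y": {1, 2, 3}, "Z": {1, 2, 3}}    # full replication (assumed)

def sites_to_read(placement, page):
    # a read needs any single copy of the page
    return {next(iter(placement[page]))}

def sites_to_write(placement, page):
    # a write must update every copy to keep the database one-copy equivalent
    return set(placement[page])

print(sites_to_write(PARTITIONED, "X"))   # {1}: cheap update, but no fault tolerance
print(sites_to_write(REPLICATED, "X"))    # {1, 2, 3}: every copy must be updated
print(sites_to_read(REPLICATED, "X"))     # any one copy is enough for a read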
1.3 Deadlines

In a DRTDBS, deadlines represent timing constraints that a transaction must meet to successfully commit. A global transaction requires processing of all associated sub-transactions before it commits, so it requires more time than a local transaction. Deadlines can be categorized as hard, soft, and firm (Figure 1.2) [13].

Figure 1.2: Transaction Deadline Model [13]

1. A hard deadline imposes a strict timing constraint on a transaction. If a transaction misses this timing constraint, its value becomes negative, which severely affects the system.

2. A soft deadline provides an extra amount of time for a transaction to finish its work after the deadline. When the deadline expires, the value of the transaction degrades. If the transaction exceeds the extra time, its value becomes zero.

3. A firm deadline is similar to a soft deadline, but it does not provide extra time after the deadline. When a transaction misses the deadline, its value becomes zero and the transaction is discarded instantly [14].

1.4 Deadlocks

A deadlock occurs when no transaction can complete due to a circular wait on data requests. For example, a transaction T1 requests a data lock which is held by another transaction T2, while T2 may be waiting (either directly or indirectly) for data items which are held by T1. This circular wait causes a deadlock, where neither transaction can complete [14]. Deadlocks can be handled in three different ways [5]: deadlock prevention, deadlock avoidance, or deadlock detection.

1. "Deadlock prevention algorithms ensure deadlock free condition through guaranteeing that at least one of the conditions that cause deadlocks fails to hold" [5]. These algorithms suffer from a high number of transaction restarts.

2. Deadlock avoidance algorithms use prior information about the use of resources to analyze every incoming request, which helps them to predict deadlocks beforehand. These algorithms create lower system overhead than deadlock prevention algorithms. Without enough information, these algorithms can fail [5].

3. Deadlock detection algorithms detect a deadlock when it occurs and abort one of the transactions involved in causing it, to resolve the deadlock [5]. In a DRTDBS, deadlock detection requires good coordination among sites. These algorithms also need to consider a transaction's timing constraints, since a transaction needs enough time to complete if it has to restart.

Deadlock detection in a DRTDBS can be categorized as centralized, distributed, or hierarchical [15]. In the centralized approach, one site is used as a central coordinator to maintain the resource utilization graph of the entire system. Only the coordinator updates the resource utilization graph and searches it for circular waits. This approach is easy to implement, but it fails if the central coordinator site fails.
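The coordinator's search for a circular wait amounts to looking for a cycle in a wait-for graph. The following Python sketch shows one way such a check could look, assuming the coordinator already holds the whole graph; the representation and names are illustrative, not those of the thesis simulator.

def find_deadlock(wait_for):
    # wait_for maps each transaction to the set of transactions it waits on;
    # returns one cycle (list of transactions) if a circular wait exists, else None
    def dfs(txn, path, on_path, visited):
        visited.add(txn)
        on_path.add(txn)
        path.append(txn)
        for nxt in wait_for.get(txn, ()):
            if nxt in on_path:                          # circular wait found
                return path[path.index(nxt):] + [nxt]
            if nxt not in visited:
                cycle = dfs(nxt, path, on_path, visited)
                if cycle:
                    return cycle
        on_path.discard(txn)
        path.pop()
        return None

    visited = set()
    for txn in wait_for:
        if txn not in visited:
            cycle = dfs(txn, [], set(), visited)
            if cycle:
                return cycle
    return None

# T1 waits on T2, T2 on T3, and T3 on T1: a deadlock involving all three
print(find_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}))  # ['T1', 'T2', 'T3', 'T1']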
In the distributed approach, the resource utilization graph is distributed among many sites, and detecting deadlocks requires coordination among the sites. Without good coordination between the sites, it is not possible to have exact information about the entire system, making it a complex process. In the hierarchical approach, sites are arranged in a hierarchical order so that deadlock detection involves only some sites, making it simpler than the distributed approach. However, a site can only detect deadlocks in its descendant sites [15].

1.5 Priority Assignment

Priority assignment protocols determine the order of execution of transactions. These protocols also determine which transactions should be blocked or restarted during deadlocks. Therefore, transactions need to be prioritized to avoid unnecessary blockages or delays. There are three categories of priority assignment techniques: static, dynamic, and hybrid. When the priority of a transaction is "assigned once and for all", the protocol is called a static priority protocol [1]. In static priority protocols, priorities of transactions are set before the system executes the transactions, and these priorities are not changed at run time. Static priority protocols require complete information about the transactions' characteristics and are mostly suitable for small systems [16]. In dynamic priority protocols, the priority of a transaction "changes from request to request", and decisions about scheduling are made at run time [1]. Certain important characteristics of a transaction (i.e., deadlines, slack time) change when the transaction restarts. Therefore, at each request, the characteristics of a transaction are checked to determine its priority. In hybrid priority protocols, priorities are fixed for some transactions and varied for others. One use of hybrid priority protocols is making some critical transactions non-preemptive with a static priority protocol during dynamic priority assignment [17]. Some of the popular priority assignment protocols are described below:

1. First Come First Serve (FCFS): In FCFS, the transaction with the earliest arrival time is assigned the highest priority. Therefore, deadline information is not considered during priority assignment. In other words, a new transaction with a close deadline will get a lower priority than an old transaction which may not have a close deadline. This is not desirable in a real-time system [18].

2. Shortest Job First (SJF): In SJF, the transaction with the smallest run time is executed next [19]. This protocol is suitable when a system has prior knowledge about the run time of the transactions. It produces the best result when the load is high, because it minimizes the average waiting time for a given set of transactions.

3. Earliest Deadline First (EDF): In EDF, a transaction with an early deadline gets higher priority. A drawback of this protocol is that it allocates higher priority to a transaction which is close to its deadline, but might miss it, over a transaction that still has a chance to meet its deadline [20].

4. Minimum Slack First (MSF): In MSF, a transaction with a shorter slack time gets higher priority. Slack time is the maximum amount of time that a transaction can be idle and still complete before its deadline [1]. Therefore, MSF depends on both the execution time and the deadline of a transaction.
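The four policies differ only in the attribute by which they rank the ready transactions. The following Python sketch shows the selection each policy would make from the same ready queue; the transaction fields, the tick values, and the helper names are assumptions made for this illustration.

from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    arrival: float    # arrival time (ticks)
    run_time: float   # estimated remaining execution time (ticks)
    deadline: float   # absolute deadline (ticks)

def slack(t, now):
    # time the transaction can stay idle and still meet its deadline
    return t.deadline - now - t.run_time

POLICIES = {
    "FCFS": lambda t, now: t.arrival,      # earliest arrival first
    "SJF":  lambda t, now: t.run_time,     # smallest run time first
    "EDF":  lambda t, now: t.deadline,     # earliest deadline first
    "MSF":  lambda t, now: slack(t, now),  # minimum slack first
}

def next_transaction(ready, policy, now=0.0):
    # highest priority = smallest key under the chosen policy
    return min(ready, key=lambda t: POLICIES[policy](t, now))

ready = [Txn("T1", 0, 40, 100), Txn("T2", 5, 10, 30), Txn("T3", 8, 25, 35)]
for policy in POLICIES:
    print(policy, next_transaction(ready, policy, now=10).name)
# FCFS -> T1, SJF -> T2, EDF -> T2, MSF -> T3 (slack values: 50, 10, 0)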
1.6 Preemption

In a DRTDBS, transactions should be preemptible to avoid blocking high priority transactions [21]. If a lower priority transaction has a lock on a data item and a higher priority transaction issues a request for that lock, then the higher priority transaction has to wait until the lower priority transaction completes. This situation is called priority inversion [22]. Due to priority inversion, high priority transactions might miss their deadlines. There are two popular methods to solve the priority inversion problem: priority inheritance and priority ceiling.

In priority inheritance, if a lower priority transaction TL holds a data lock and a higher priority transaction TH also requests that data lock, then TL temporarily inherits the priority of TH until it completes its critical section [22]. The critical section is the time when a transaction accesses shared data and is not allowed to be preempted. After the critical section, TL releases the lock and returns to its initial priority. Priority inheritance reduces the blocking time of TH from the entire execution time of TL to the execution time of TL's critical section. However, this approach might suffer from deadlocks, and the blocking duration can be significant if there is a chain of blocking [22].

In priority ceiling, a transaction Ti can preempt a blocking transaction Tj if Ti has a higher priority than the other preempted transactions. Otherwise, the transaction Ti is suspended, and the transaction Tj inherits Ti's priority. Priority ceiling not only minimizes the blocking time but also prevents deadlocks, because a transaction with an exclusive lock (discussed in the next section) will never be blocked by a lower priority transaction [22]. However, in priority ceiling, a low priority transaction may be unnecessarily blocked by a high priority transaction even when an application is idle while reading data from or writing data into the database [23].

1.7 Locking Protocols

A locking protocol guarantees serialization of transactions within the system by utilizing locks on data items [24]. If a transaction or a sub-transaction wants access to a shared data item, it needs to request a lock on that data item. There are two types of locks [25]: shared and exclusive. A shared lock is required when a transaction only needs to read a data item. An exclusive lock is needed when a transaction needs to modify a data item. When a scheduler gets a request for a data item from a transaction, it checks the state of the lock. If the data item is not currently locked or has a shared lock, then the scheduler permits the transaction to hold the lock. Otherwise, if the data item is exclusively locked, the transaction needs to wait until the current lock has been released. This ensures that only one transaction gets write access to a data item at a given time. However, this blocking behaviour of a locking protocol greatly degrades the performance of a DRTDBS because of the timing constraints [11]. In a DRTDBS, during a read operation, a single copy of the data item is locked (shared) by the scheduler. During a write operation, all copies of the data item are locked (exclusive) by the scheduler until the data modification completes [9].

One of the most common locking protocols is the two-phase locking protocol (2PL) [11], which includes a growing phase and a shrinking phase (Figure 1.3). In the growing phase, a transaction only acquires locks on the required data. In the shrinking phase, a transaction frees all the acquired locks. During the growing phase, no lock is released, and during the shrinking phase no new lock is acquired. Every transaction has to go through these phases in order to guarantee the consistency of the data [6].

Figure 1.3: Two-Phase Locking - Growing and Shrinking Phase [9]
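A minimal Python sketch of the shared/exclusive compatibility rules and the two-phase constraint described in this section follows; the classes and data structures are illustrative assumptions, not the simulator's implementation.

class TwoPhaseLockError(Exception):
    pass

class TwoPLTransaction:
    def __init__(self, name, lock_table):
        self.name, self.lock_table = name, lock_table
        self.held, self.shrinking = set(), False

    def lock(self, item, exclusive=False):
        if self.shrinking:
            raise TwoPhaseLockError("no lock may be acquired in the shrinking phase")
        holders = self.lock_table.setdefault(item, {})   # txn -> "S" or "X"
        others = {t: m for t, m in holders.items() if t is not self}
        # shared locks are mutually compatible; any exclusive lock conflicts
        if any(m == "X" for m in others.values()) or (exclusive and others):
            return False                                  # caller must block or restart
        holders[self] = "X" if exclusive else "S"
        self.held.add(item)
        return True

    def unlock_all(self):
        self.shrinking = True                             # growing phase ends here
        for item in self.held:
            self.lock_table[item].pop(self, None)
        self.held.clear()

locks = {}
t1, t2 = TwoPLTransaction("T1", locks), TwoPLTransaction("T2", locks)
print(t1.lock("X", exclusive=True))   # True: T1 holds an exclusive lock on X
print(t2.lock("X"))                   # False: T2 must wait until T1 releases X
t1.unlock_all()
print(t2.lock("X"))                   # True: the lock is now available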
The 2PL protocol can be static or dynamic [1]. Dynamic two-phase locking (D2PL) and static two-phase locking (S2PL) work similarly, but set their locks differently. D2PL sets locks on the data items required by a transaction as they are accessed and keeps the data locked until the transaction completes. S2PL sets locks on the data items beforehand, using prior knowledge of the transactions that will access them. In a DRTDBS, especially for hard real-time transactions, this prior knowledge is easily accessible [1]. Distributed S2PL decreases the number of messages transmitted between sites in comparison to D2PL, because all lock requests of a transaction are transmitted as one message. This also reduces the time delays for setting remote locks. Another advantage of S2PL is that a blocked transaction cannot hold locks, meaning deadlocks do not occur. However, D2PL, with its shorter average lock holding time, is preferable over S2PL for conventional non-real-time database systems [1].

1.8 Commit Protocols

Commit protocols ensure that the modifications of data made by transactions will be permanent after the transactions successfully complete [9]. One important property of transactions is atomicity, which is secured by commit protocols. To ensure the atomicity of a transaction, commit protocols prevent locks on data from being released until the modification of data becomes permanent [26]. In a distributed environment, atomicity is violated if a transaction commits at some sites and aborts at other sites [1]. Therefore, all the participating sites need to agree on committing or aborting. Moreover, to maintain atomicity, once a cohort is ready to commit "it has to retain all its data locks until it receives the global decision from the master", which might cause priority inversion [1]. The two-phase commit protocol (2PC) is the most commonly used distributed commit protocol. The fundamental workflow of the 2PC protocol, as explained by Gupta et al. [27], is described below.

The 2PC protocol has two phases: the prepare phase and the commit phase (Figure 1.4). A commit protocol starts execution when the master receives a WORKDONE message from all the cohorts. In the prepare phase, the master sends PREPARE messages to all cohorts in parallel. After getting the PREPARE messages, the cohorts vote for committing or aborting the execution. If a cohort finds a suitable environment for committing, it sends a YES vote to the master and writes a prepare log record to its local storage. This is called the "prepared state" for the cohort. However, the cohort cannot commit until it gets the final decision from the master. On the other hand, if a cohort cannot complete the execution, it sends a NO vote to the master. The cohort writes an abort log record to its local storage and aborts immediately, as a NO vote is considered a veto [27]. This is the end of the prepare phase.
Figure 1.4: Commit Protocol - Commit and Abort Paths

The commit phase starts when the master receives the votes from all the cohorts. If there is not a single NO vote, the master writes a commit log record and sends the global decision, a COMMIT message, to all the cohorts. This is called the "committing state" for the master. When the global decision reaches the cohorts, they write a commit log and enter the "committing state". The cohorts commit and send an ACKNOWLEDGEMENT message to the master. On the other hand, if a single NO vote is received, the master writes an abort log and sends the global decision as an ABORT message to all the cohorts. This is called the "aborting state". After receiving the global decision to abort, all the cohorts write an abort log and abort the transaction, sending an ACKNOWLEDGEMENT message to the master. Upon receipt of this message from all cohorts, the master writes an end log record and discards the transaction.

There are several variants of 2PC [1]: presumed abort/commit protocols, one-phase commit protocols, and three-phase commit protocols. The presumed abort/commit protocols reduce the message and logging overheads by making an explicit presumption about the committing or aborting of transactions. When a cohort recovers from a failure state, it communicates with the master for available information about the transaction. If the master does not have information available about the transaction, then the cohort can assume that it has aborted. On the other hand, if the cohort gets the commit decision from the master, it commits. The cohort then does not need to send an acknowledgment for the ABORT or COMMIT message and also does not need to write an abort/commit record to the log. The master also does not write the abort/commit record or the end record [27] [28].

One-phase commit protocols (1PC) combine the prepare and commit phases into one phase by removing the cohort voting phase. The cohorts enter the "prepared state" at the time of sending the WORKDONE message. Thus 1PC eliminates one entire phase, which reduces commit processing overhead and delay [29] [30]. However, due to the long prepared state, 1PC suffers from priority inversion, because data locks cannot be preempted in the prepared state.

In 2PC and 1PC, even if a single site fails, all participating cohorts "remain blocked until the failed site recovers" [27]. Three-phase commit protocols remedy this problem by using an extra phase, called the "precommit phase". This phase occurs between the two phases of 2PC and makes a preliminary decision about committing or aborting transactions. The preliminary decision then helps all the participating sites to reach a global decision even if the master fails. However, three-phase commit protocols increase the communication overhead by adding an extra message exchange between the cohorts and the master. Moreover, they force the cohorts and the master to write a record to the logs in the "precommit phase" [27] [31].
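The basic 2PC decision logic described earlier in this section can be sketched compactly in Python. This sketch ignores failures and logging detail, and the function and parameter names are invented for illustration; cohort vote functions stand in for the real prepare-phase work.

def two_phase_commit(master_log, cohorts):
    # cohorts: list of (name, vote_fn) pairs; vote_fn() returns True for a YES vote
    # prepare phase: the master solicits votes after all WORKDONE messages arrived
    votes = {}
    for name, vote_fn in cohorts:
        votes[name] = vote_fn()                 # YES moves the cohort to the prepared state
    # commit phase: a single NO vote is a veto
    if all(votes.values()):
        master_log.append("commit")
        decision = "COMMIT"
    else:
        master_log.append("abort")
        decision = "ABORT"
    acks = [f"{name}:{decision}" for name in votes]   # cohorts apply the decision and acknowledge
    master_log.append("end")
    return decision, acks

log = []
print(two_phase_commit(log, [("c1", lambda: True), ("c2", lambda: False)]))
# ('ABORT', ['c1:ABORT', 'c2:ABORT'])  -> one NO vote aborts the global transaction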
1.9 Concurrency Control Protocols (CCPs)

CCPs make sure that multiple users can access data concurrently in a database management system. A CCP must protect the data updates of one user from the accesses and updates of other users until the first update becomes permanent [32]. CCPs maintain the serialization of transaction operations and guarantee that transactions maintain atomicity. Therefore, the main goal of a CCP is to maximize concurrency while maintaining the consistency of the database. CCPs in a DRTDBS, on the other hand, must also ensure that transactions meet their deadlines, in addition to maintaining the consistency constraints of the database [33]. In a DRTDBS, maximizing concurrency is not enough. The transactions need to be prioritized to maximize schedulability, which helps transactions meet their timing constraints. It is also important that transactions are preemptible, to reduce the blocking time of transactions. Thus CCPs in a DRTDBS minimize the duration of blocking by utilizing efficient priority assignment and preemption protocols. As stated earlier, CCPs are classified into two types: optimistic (aggressive) and pessimistic (conservative) [6].

1. Optimistic protocols do not block transactions; rather, they optimistically schedule them instantaneously. This immediate scheduling can violate the serialization order of operations if the scheduler later receives an operation which should have been scheduled before an already executed operation. In this situation, optimistic protocols abort transactions to maintain serialization. The optimistic approach is faster, but it might result in a higher number of transaction rejections.

2. Pessimistic protocols block an operation of a transaction immediately if there is any data conflict and continue blocking until the possibility of data conflicts disappears. Delaying operations by blocking decreases the possibility of data conflicts and abortions during transactions, but excessive delays can cause transactions to exceed their deadlines. Pessimistic protocols are suitable for transactions that rarely conflict.

1.10 Contribution

Few studies have been done on concurrency control protocols in a DRTDBS, as it is difficult to manage distributed data and deadlocks, and to coordinate transactions and their sub-transactions executing at different sites. A priority protocol plays an important role in a real-time system, as it determines whether a transaction will be completed on time or not [16]. EDF is an optimal priority protocol, because if EDF cannot schedule a transaction, then it is not possible for any other priority protocol to schedule that transaction [34]. However, assigning higher priority to transactions with the earliest deadlines is not suitable in high load conditions, because transactions might miss their deadlines due to lack of time [16]. An important CCP in DRTDBSs is ASL, which controls a transaction's access to data based on a fixed priority protocol. We investigated the performance of the ASL protocol using several common priority protocols under different system configurations. Results of the experiments indicated that a priority protocol that performs well under a certain configuration may perform poorly or moderately under other configurations. This allowed us to determine a set of load ranges in which different priority protocols perform best. To maximize the performance under all system conditions, we concluded that an adaptive approach to selecting priority protocols is needed.
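Conceptually, such an adaptive approach reduces to a lookup from the current load to a priority protocol. The following Python sketch only illustrates the idea; the load thresholds and protocol assignments are invented for this example and are not the load ranges reported in Table 4.1 of this thesis.

# illustrative load ranges only; the real ranges are determined experimentally
LOAD_RANGES = [
    (0.0, 0.4, "EDF"),   # light load: deadline-driven scheduling works well
    (0.4, 0.8, "MSF"),   # moderate load: slack captures both deadline and run time
    (0.8, 1.1, "SJF"),   # heavy load: short jobs first limits missed deadlines
]

def select_priority_protocol(load):
    # pick the priority protocol for the current (normalized) system load
    for low, high, protocol in LOAD_RANGES:
        if low <= load < high:
            return protocol
    return "EDF"   # fallback outside the configured ranges

for load in (0.2, 0.6, 0.95):
    print(load, select_priority_protocol(load))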
Researchers have been trying to develop adaptive approaches that exploit the performance variations of different priority protocols under varying system conditions [16] [35] [36]. In all of these techniques, a common practice is to switch between the priority protocols of a system based on the load, to improve the overall performance of the system. However, no research has been done so far to improve the performance of the ASL protocol by dynamically switching between priority assignment protocols. We use the load ranges determined as explained earlier to create an adaptive protocol, called the Adaptive Priority Assignment Protocol (APAP). APAP uses the load condition at run time to decide which priority protocol should be used next, while keeping ASL as the underlying CCP. Using this approach we observe a significant improvement in the overall system performance. The protocol and results are presented in Chapter 4.

Chapter 2
Related Work

We use Adaptive Speculative Locking (ASL) as the underlying concurrency control protocol (CCP) in this thesis; therefore, we needed to understand ASL and its techniques. We evaluated ASL and reviewed current research on the speculative locking approach, which is the underlying structure of ASL. We also studied transaction scheduling techniques that react to changes in the system environment during run time. This chapter is divided into two sections. In the first section, we review the ASL protocol along with other locking and commit protocols from which ASL inherited properties such as lending uncommitted data and the adaptive approach. We also describe how techniques were inherited among those protocols. In the second section, we discuss some scheduling techniques that adaptively switch between priority protocols depending on the system environment and that are most relevant to our research.

2.1 Evaluation of the ASL protocol

ASL inherits the concept of speculative locking of data from the SL protocol [7]. However, the concept of lending uncommitted data has also been used in the past in other CCPs, such as PROMPT, PEP, and SL. ASL also borrowed from AEP the concept of monitoring the system performance and making adaptive decisions to change system behaviour [8]. In this section we discuss those CCPs briefly.

2.1.1 Permits Reading of Modified Prepared-Data for Timeliness

Permits Reading Of Modified Prepared-data for Timeliness (PROMPT) is a firm-deadline commit protocol designed for DRTDBSs [37]. It extends the concept of centralized 2PL high priority (2PL-HP) to distributed real-time environments. In the 2PL-HP protocol, if a higher priority transaction is holding a lock on a data item, then all requests for that lock will be blocked until the lock is released. On the other hand, if the requesting transaction has higher priority, then the lock-holding transaction is aborted immediately to release the lock. PROMPT extends this concept further by adding three more rules: 1) in the prepared state, read locks are released by the cohorts just after they receive the PREPARE message from the master, while update locks are still held by the cohorts until the global decision about committing or aborting is available; 2) it is not possible to abort a cohort if it is in the prepared state; and 3) transactions can lend uncommitted data optimistically when the lending transaction is in the commit phase.
When a borrowing transaction has access to the uncommitted data, there are three scenarios that describe the interaction between the lender and the borrower:

1. The global decision of committing or aborting is available for the lender, but the borrower's local execution is still incomplete. In this case, the lender commits or aborts depending on the global decision, and the borrower follows the lender to commit or abort.

2. The borrower completes its local execution, but the global decision for the lender is still not available. In this case, the borrower has to wait and is not allowed to take any initiative related to committing until the global decision for the lender is available or the borrower misses its deadline. This situation is called "put on the shelf". If the lender receives the global decision to commit, then the lender commits and the borrower initiates commit-related processing, which is called "taken off the shelf". If the lender aborts, then the borrower's data becomes useless and the borrower aborts.

3. If the borrower aborts during data processing, then the lending is cancelled and the borrower's updates are rolled back.

PROMPT has three additional features to make the data lending process faster and avoid wasting system resources: active abort, silent kill, and healthy lending.

1. In active abort, if a participating cohort is about to abort locally, it sends this information immediately to the master, rather than waiting for the commit phase. Active abort gives a transaction more time to complete and also facilitates proper usage of both logical and physical system resources.

2. In silent kill, if a transaction is aborted before the commit phase of the master, then the abort is recognizable by the cohorts without communication with the master. Therefore, the master does not need to invoke the abort protocol, because the abortion happens silently. The silent kill process saves system resources by eliminating message passing between the master and the cohorts.

3. In healthy lending, if a transaction is about to miss its deadline, then the transaction is not allowed to lend data, to avoid abortion of the borrowing transaction.

PROMPT permits borrowing locked data, but it does not have a cascading abort problem, for two reasons. First, the lending transaction is always expected to commit, because it is in the prepared state, so local data conflicts cannot abort the lender. Second, the sibling cohorts are going to commit, because all prior data conflicts have been handled. Moreover, PROMPT has a controlled lending policy which does not permit a borrower to be a lender simultaneously, so PROMPT affects only the immediate borrower [37].

PROMPT's performance was studied by Haritsa et al. [37] against 2PC, presumed commit, presumed abort, and 3PC for sequential and parallel transactions, under high levels of both data and resource contention, high data contention only, slow and fast network speeds, and high and low degrees of data distribution. In all experiments, PROMPT performed better than the other protocols, especially in low load conditions. PROMPT also showed a higher borrowing rate (the average number of data items borrowed per transaction) during low to medium load. PROMPT's success ratio (the fraction of times that a borrowing was successful) was 1 during low load, but decreased when the system load increased. Therefore, PROMPT performs poorly in high load conditions. Moreover, the success of the borrower depends on the success of the lender, because the borrower has to abort if the lender aborts. On the other hand, if the borrower completes before the lender, the borrower cannot commit until the lender completes, resulting in an increase in transaction execution time.
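The three lender/borrower scenarios described above reduce to a small decision rule. The Python sketch below is illustrative only; the function, parameter names, and outcome strings are invented for this example and are not taken from the PROMPT paper.

def resolve_borrowing(lender_decision, borrower_done, borrower_aborted_locally):
    # lender_decision: 'commit', 'abort', or None when the global decision is still pending
    if borrower_aborted_locally:
        # case 3: the borrower aborted, so the lending is cancelled and updates rolled back
        return "lending cancelled; borrower's updates rolled back"
    if lender_decision is not None:
        # case 1: the lender's global decision arrived first; the borrower simply follows it
        return f"lender {lender_decision}s, borrower {lender_decision}s too"
    if borrower_done:
        # case 2: borrower finished first; it is "put on the shelf" until the lender's
        # global decision arrives or the borrower's own deadline expires
        return "borrower put on the shelf"
    return "both still executing"

print(resolve_borrowing("commit", borrower_done=True, borrower_aborted_locally=False))
print(resolve_borrowing(None, borrower_done=True, borrower_aborted_locally=False))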
2.1.2 Prompt-Early Prepare

Prompt-Early Prepare (PEP) is a one-phase (1PC) real-time commit protocol for RTDBSs. PEP integrates the early prepare (EP) protocol with the lending property of the PROMPT protocol [29]. The standard 2PC protocol has high master/cohort communication overhead. To reduce the communication overhead, PEP overlaps the prepare and commit phases into one phase using the EP protocol, which is a 1PC protocol. EP reduces the transaction execution time by removing the voting phase of the 2PC protocol. In PEP, the prepared state of a cohort starts at the time of sending the WORKDONE message to the master [38]. PEP is also optimized by a presumed commit mechanism, where the master sends the commit decision but cohorts do not need to send ACKNOWLEDGEMENT messages to the master. The master also does not write an end log record; rather, it writes a membership log record to identify all the cohorts involved in the execution. However, being a 1PC protocol, EP suffers from priority inversion because of the long duration of the prepared state. The situation deteriorates if the participating transactions are sequential, where cohorts execute one after another rather than in parallel. PEP deals with this problem by incorporating the concept of lending prepared data. PEP also incorporates the active abort policy of PROMPT to reduce the response time.

Haritsa and Ramamritham [29] compared the performance of PEP with PROMPT, EP, and CENT (a centralized system) in both parallel and sequential transaction environments. In both environments there were four experiments: 1) data and resource contention, 2) pure data contention, 3) a fast network interface, and 4) highly distributed transactions. In the first two experiments, both EP and PEP outperformed PROMPT because of the message passing overhead of PROMPT, and PEP outperformed EP. In the case of the fast network interface, PROMPT performed better, but was still outperformed by PEP. PEP was only outperformed by PROMPT in the sequential, highly distributed transaction environment, where the priority inversion period of PEP is much longer than that of PROMPT.

PEP exploits the concept of optimistically lending uncommitted data from PROMPT. Moreover, PEP reduces message and logging overheads through the use of a 1PC protocol. However, there are a few issues with PEP. PEP suffers from a high number of transaction aborts because it enters the prepared state when data processing is still unfinished. The extension of the prepared state duration increases priority inversion. Also, deadlocks can occur, because a lender transaction, which has already lent data items, can still access new data items.

2.1.3 Adaptive Exclusive Primary

Adaptive Exclusive Primary (AEP) is an adaptive concurrency control protocol from which ASL borrowed the concept of dynamically changing behaviour during run time [9]. AEP is designed for distributed database systems and dynamically switches between an optimistic and a pessimistic CCP to mitigate data and resource contention issues [39]. The optimistic and pessimistic CCPs are the exclusive writer with locking option (EWL) protocol and the priority site locking (PSL) protocol, respectively. EWL has a controlling site called the exclusive writer and primary site (EW/PS). In EWL, a transaction updates data in the database optimistically without considering any data conflict. After the transaction completes, it sends a request for the update to the EW/PS. The request is approved and the data update becomes permanent if there is no data conflict; otherwise, the transaction needs to wait in a queue.

PSL controls the access to each file in the database system using a primary site (PS), which also supports transaction execution as an ordinary site. An update transaction sends a lock request for a file in the database to the PS. The PS checks the file status and approves the lock request if the file is available; otherwise, the PS rejects the lock request, which forces the transaction to wait in a queue.

AEP dynamically switches between the optimistic and pessimistic protocols when potential data conflicts are found. A transaction begins by executing the EWL protocol. When a data conflict occurs, AEP switches to the PSL protocol. AEP also uses a PS to control access to a distributed file system, like PSL and EWL. AEP maintains a registry to keep track of the local active transactions which are incomplete. The registry helps a new transaction in determining any potential data conflicts. Based on this information, AEP performs PSL if there is a potential conflict; otherwise, it performs EWL from then on. Tai et al. [39] state that AEP has three assumptions: 1) "with EWL and AEP, an access conflict is detected using a method based on sequence numbers"; 2) "all transactions are read/write transactions"; and 3) "there is only one file in the database system in question and the file is replicated at and shared among all the sites". The operational procedures of PSL, EWL, and AEP are similar and can be divided into two steps. First, the primary site gives permission for a transaction to update a file. Second, the transaction checks whether there are any conflicts, and then decides whether or not to update the file.

The performance of AEP is discussed by Tai et al. [39] in comparison to PSL and EWL by varying the transaction inter-arrival times at two different transaction execution rates.
The optimistic and pessimistic CCPs are the exclusive w riter with locking option (EW L) protocol and the priority site locking (PSL) protocol, respectively. EW L has a controlling site called exclusive writer and primary site (EW /PS). In EWL, a transaction updates data in the database optim istically without considering any data conflict. After the transaction completes, it sends a request for the update to the EW /PS. The request would be approved and the data update w ould be permanent i f there is no data conflict; otherwise, the transaction needs to w ait in a queue. 24 PSL controls the access to each file in the database system using a prim ary site (PS), which also supports transaction execution as an ordinary site. A n update transaction sends a lock request for a file in the database to PS. PS checks the file status and approves the lock request if the file is available; otherwise, PS disapproves the lock request, w hich forces the transaction to wait in a queue. AEP dynamically switches betw een the optim istic and pessimistic protocols if potential data conflicts are found. A transaction begins with executing the EW L protocol. W hen data conflict occurs, AEP switches to the PSL protocol. AEP also uses PS to control access to a distributed file system like PSL and EWL. AEP maintains a registry to keep track o f the local active transactions which are incomplete. The registry helps a new transaction in determining any potential data conflicts. A ccording to this information, A EP perform s PSL if there is a potential conflict; otherwise, it perform s EW L, from then on. Tai et al. [39] state that AEP has three assumptions: 1) “with EW L and AEP, an access conflict is detected using a method based on sequence numbers” ; 2 ) “all transactions are read/write transactions”; and 3) “there is only one file in the database system in question and the file is replicated at and shared among all the sites” . The operational procedures o f PSL, EWL, and AEP are similar and can be divided into tw o steps. First, the prim ary site gives permission for a transaction to update a file. Second, the transaction checks if there are any conflicts, and then it decides w hether or not to update the file. The performance o f AEP is discussed by Tai et al. [39] in com parison to PSL and EW L by varying the transaction inter-arrival tim es in two different transaction execution rates. In high transaction execution rates, EW L always outperformed PSL, but in low 25 transaction execution rates, EW L showed little im provem ent over PSL. H owever, A EP always outperformed PSL and EWL. The adaptive concept o f AEP inspired ASL; however, A EP is not properly implemented in a real tim e distributed database environment. Transaction deadlines and tim e constraints were discussed, but the im plem entation o f those variables is not obvious. Moreover, AEP cannot detect any conflict at a non-local database site. 2.1.4 Speculative Locking Speculative locking (SL) protocols extend standard 2PL protocols to allow parallelism among conflicting transactions [40] [7]. The 2PL protocol does not perm it an uncommitted data item to be shared. SL allows any transaction to borrow uncom m itted data from the conflicting transactions. The borrow ing transaction can have access to tw o versions o f data: a before image, w hich is the data before the conflicting transaction updates it, and an after image, which is the updated data produced by the conflicting transaction. 
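As a rough illustration of the two-version idea (the names below are invented, and this is not the protocol's actual data structure), a data item can expose its committed before image together with any uncommitted after images, so that a borrower can run one speculative execution per visible version:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a data item keeping a before image and speculative after images.
final class SpeculativeItem {
    private final String beforeImage;                         // value before the conflicting update
    private final List<String> afterImages = new ArrayList<>(); // uncommitted tentative updates

    SpeculativeItem(String committedValue) { this.beforeImage = committedValue; }

    /** The lending (uncommitted) transaction publishes its tentative update. */
    void addAfterImage(String value) { afterImages.add(value); }

    /** A borrower speculates on every visible version of the item. */
    List<String> versionsForSpeculation() {
        List<String> versions = new ArrayList<>();
        versions.add(beforeImage);
        versions.addAll(afterImages);
        return versions;
    }

    /** On the lender's termination, exactly one speculative branch survives. */
    String resolve(boolean lenderCommitted, int afterIndex) {
        return lenderCommitted ? afterImages.get(afterIndex) : beforeImage;
    }
}
```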
The borrowing transaction then perform s speculative operations on both the before and after images o f the data. If the conflicting transaction commits, the borrowing transaction retains the after image o f the data, otherw ise it keeps the before image o f the data. Therefore, transaction blocking time is low, m aking transaction processing faster. Transaction processing in a database system is show n in (Figure 2.1 [7]). The notation S; indicates the start o f execution, E; is the com pletion o f execution, Cj is the completion o f commit processing and Aj is the abortion o f a transaction, Ti. Figure 2.2 illustrates the transaction processing for 2PL, where two transactions, Ti and T 2, need access to the pages X, Y and X, Z, respectively. The transaction T 2 needs to wait until transaction Ti commits and releases the lock on page X. 26 Execution Commit j Cj/Ai EEi ----------------------- ► 5i Time Figure 2.1: A Transaction Processing [7] Speculative locking processing is shown in Figure 2.3 taken from [7], Transaction Ti has locks on pages X and Y. Ti releases the locks ju st after it completes processing and creates before and after images o f the data X (X i) and Y (Yi). T 2 requests for locks on X and gains locks on both X and Xi. T 2 starts speculative executions T 21 and T 22 right aw ay and creates after images o f both X and X] which are X 2 and X 3 . I f T 1 com m its then X 3 w ill remain, otherwise X 2 will remain. Ti ri[X] w ^ ] n[Y] w ^ , ] r2[X] w 2[X!] r2[Z] w ^ ] T2 Tim e Figure 2.2: 2PL Processing [7] T 1: n [X ] w ^ ] S, T 2i: r-i[Y] w , ^ ] I Ct r2[X] w 2[X2] r2[Z] w 2[Zi] T 2: T22: r2[Xi] w 2[X2] r2[Z] w 2[Z2] e2 c T im e Figure 2.3: SL Processing [7] 27 Lock Requested B yT , Lock Held by Tj Lock H eld by Tj R W Lock Requested By T; R yes no R yes no spyes W no no EW sp-yes no sp y es (a) R EW SPW (b) Table 2.1: Lock Compatibility Matrix (a) 2PL and (b) SL [7] Lock com patibility m atrixes for both 2PL and SL are shown in Table 2.1 [7], A typical 2PL protocol has two types o f locks: read (R) and write (W). A transaction requests a read lock to read a data item and a write lock to update a data item. O n the other hand, to perform speculative operations, SL has two forms o f write locks: execution-write (EW ) and speculative-write (SPW). An update transaction requests EW locks on the data. W hen the operation creates an after image, SL converts the EW lock into SPW lock, and the data becomes available to other transactions. In SL, the number o f parallel transaction processing increases exponentially as the level o f lending increases. For example, for n num ber o f transactions, there can b e 2n number o f speculative executions. Therefore, if n transactions conflict, there are 2 n num ber o f possibilities for term ination o f the transactions. This version is known as the naive version and is indicated by SL(n). D ue to the high num ber o f speculative executions, SL suffers from a higher number o f transaction abortions. To solve this issue, SL introduces som e variants to restrict the number o f speculative executions. These variants are described below: 28 > SL(O): This variant is very optimistic. It assumes that a transaction w ill abort and terminate if any prior transactions abort. > SL(1): In this version, if m ore than one prior transaction aborts then current transaction will abort. > SL(2): In this version, if m ore than tw o prior transactions abort then current transaction will abort. 
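The SL(k) restriction in the list above reduces to a single comparison: a transaction aborts once the number of aborted predecessors it borrowed from exceeds k. The sketch below uses invented names and is only meant to illustrate that rule, not to reproduce the SL implementation.

```java
// Illustrative sketch (invented names): the SL(k) variants restrict speculation by
// aborting a transaction once "too many" of the transactions it borrowed from abort.
final class SlVariantPolicy {
    private final int k; // 0, 1, or 2 for SL(0), SL(1), SL(2); effectively unbounded for SL(n)

    SlVariantPolicy(int k) { this.k = k; }

    /** @param abortedPredecessors number of prior conflicting transactions that aborted */
    boolean shouldAbort(int abortedPredecessors) {
        return abortedPredecessors > k;
    }

    public static void main(String[] args) {
        SlVariantPolicy sl0 = new SlVariantPolicy(0);
        SlVariantPolicy sl2 = new SlVariantPolicy(2);
        System.out.println(sl0.shouldAbort(1)); // true: SL(0) aborts on any predecessor abort
        System.out.println(sl2.shouldAbort(2)); // false: SL(2) tolerates up to two aborts
    }
}
```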
SL is a faster process than 2PL, because transactions can start executing sooner. SL opens a new door in concurrency control protocol research. However, a large num ber o f speculative executions can occur w ith SL, causing data contention that degrades the performance o f the system. All the SL variants assum e that memory is unlim ited, w hich makes them inappropriate for m any systems. 2.1.5 A d ap tiv e Speculative L ocking Adaptive Speculative Locking (ASL) is based on the SL protocol and follows the adaptive nature described in the A EP protocol [9] [ 8 ]. A SL uses the same underlying architecture o f SL for transaction processing. However, SL assumes infinite system m em ory, which is unrealistic. To remedy this, A SL m axim izes the size o f the local buffer and uses a page-based virtual memory mechanism. This mechanism is a m em ory m anagem ent algorithm that controls allocation and de-allocation o f the m em ory space. Since SL suffers from high data contention when the num ber o f speculative executions explodes, three variants were introduced, SL(0), SL(1), and SL(2). These variants restrict the num ber o f speculative executions depending on the num ber o f previous transaction aborts. A SL does not depend on the number o f previously aborted transactions and introduced the following 29 techniques: hyper-threading, m emory m anagem ent including virtual memory, and transaction queue m anagement [9] [8 ] for improving performance. Hyper-threading (HT) is a Simultaneous M ulti-Threading (SMT) technology. H T utilizes instruction level and thread level parallelism to achieve perform ance gains. Therefore, a single physical processor can run concurrent executions o f m ultiple separate instructions. In HT, one physical processor has two architectural states. O ne architectural state represents a logical processor and can execute an instruction stream. Therefore, one physical processor can act as two logical processors and can process tw o concurrent processes or threads simultaneously. H T shows a 65% performance increase over previous generation processors. However, HT is application and hardw are dependent. A SL exploits HT by creating a thread for each speculative execution or concurrent process [41]. ASL protocol has a very effective m em ory m anagem ent system. It uses tw o types o f memory concepts: system cache and virtual m em ory [9] [ 8 ]. The cache is a volatile and easily accessible storage area, managed by a cache manager, w hich is used for storing short term data. On the other hand, virtual m em ory is a technique that implements an operating system paging concept to improve lim ited m em ory issues, w here data are m oved to the physical disk w hen the cache is full, as if part o f the cache. A new transaction requires enough m emory space to be reserved either in the cache or in the swap disk before requesting data from the database. W hen the transaction has enough space reserved, it reads data from the disk into the cache, which are then locked by the transaction manager; otherw ise, it waits until enough space is available. The transaction then processes the data and releases the locks. In memory m anagem ent systems, a transaction cannot block other transactions until it locks all required pages in the m em ory A SL considers all versions o f a page as a unique page 30 when they are in the memory or m oving back and forth between the m em ory and the disk. [9] [ 8 ], A SL protocol shows its adaptive nature by using the transaction queue m anagem ent (TQM). 
Page swapping is an effective w ay to solve limited memory issues, but it creates high communication overhead. TQM balances betw een cache utilization and page swapping. To improve the cache utilization, TQ M holds or releases transactions from the queue depending on the available space in the system cache. TQ M has two param eters: hold level (HL) and enter level (EL). H L and EL help to determ ine the available system cache (A SC) which helps the transaction m anager to com pute the am ount o f total system cache utilization (TSCU). If TSCU is greater than HL, then the transaction manager does not deliver any transactions or subsequent transactions to the scheduler, and keeps checking the A SC value. The transaction manager releases transactions from queue w hen the ASC value becom es less than EL. By using the correct configuration o f HL and EL values, TQM helps to m inim ize data contention and maximize the cache use [9] [ 8 ]. The performance o f ASL was tested against all variants o f SL by varying cache sizes, number o f transactions, inter-arrival times, disk sizes, percentage of read/w rite operations, and HL and EL values. ASL outperform ed SL in all experiments. H ow ever, A S L ’s performance degraded in high load condition [9] [ 8 ]. In those experiments, A SL used a static priority protocol approach which is not effective under all load conditions. Therefore, performance o f ASL can be improved by switching betw een priority protocols at run time. In this thesis, we propose an adaptive priority protocol approach for ASL. 31 2.1.6 Synchronous Speculative Locking Protocol for Read-Only Transactions Synchronous speculative locking protocol for read-only transactions (SSLR) is based on the SL protocol [42], Transactions can be divided into two types: read-only transactions (ROT), which only read data, and update transactions (UT) w hich modify data. In SSLR, a ROT can access data items, which are held by a UT and perform speculative executions. O n the other hand, a UT- is not allowed to access data items that are held by a U T and has to wait until the data items are released. The lock compatibility matrix for SSLR is shown in Table 2.2 [42], For w rite locks in SSLR, if one transaction holds a SPW lock then unlike SL no other transactions can get the EW lock. There are two types o f read locks in SSLR: a read lock for U T s (RU) and a read lock for ROTs (RR). Lock Requested B yT , RR RU EW SPW RR yes yes no sp_yes RU yes yes no no EW no no no no Lock Held by Tj Table 2.2: Lock compatibility matrix for SSLR [42] In SSLR, committing o f a RO T does not depend on preceding conflicting transaction like it does in SL. If the preceding transaction is still uncom m itted then the R O T com m its with the before images o f the data items. However, if the preceding transaction com m its before the ROT commits, then the ROT com m its with the after images o f the data items. For example, T 2 is a ROT and T 3 is a UT (Figure 2.4). T] is a running UT w hich produces an 32 after image x t o f data xo. Ti accesses both X] and xo and perform s two transactions T2i and T 22 - As T 2 completes before T i, T 21 will remain. 
W hereas, as T 3 is a UT, it w ill w ait until Ti completes [42], T,: X i [ X q ] W, [x , ] q Ta S [y0] W j [y, ] i, [p0] w, [p, ] i 2[Xo]r2[Zo] c T22 T3: time c Figure 2.4: SSLR Processing [42] SSLR not only outperform s other read-only transaction based protocols, but also improves issues with the correctness o f transactions (serialization) and the data currency which represents how recently a requested data item was changed. However, w hen a RO T completes execution before the preceding UT, and com mits before the U T com mits, the order o f transaction executions changes, w hich m ight make the system unstable w hen the system load changes frequently. The SSLR has two variants. The first is A synchronous Speculative Locking Protocol for ROTs (ASLR) [43] where a RO T can execute asynchronously, reducing the w aiting tim e o f the speculative transactions. Rather than waiting for the conflicting U T to produce after images, the ROT is allowed to access available data item versions to carry out speculative executions. The transaction can start other speculative executions independently based on the available after images. For example, in Figure 2.5, T 2 is a RO T which is conflicting w ith a UT Ti. The speculative transaction T 21 o f T 2 can access xo and start speculative executions. 33 W hen transaction T] produces Xi, the speculative transaction T 22 is started. T 2 can com m it when any one o f the speculative transactions complete. r 1[Xq] Wi M r i[Pol W ,[P|] r 1[q0] w ,[q,] s3 c3 time : > Figure 2.5: A SLR Processing [43] The second variant is Synchronous Speculative Locking Protocol for R O Ts exploiting Semantics (SSLR-S) [44] [45], Parallelism can be improved by using a property o f ROTs, called “compensatability” that reduces waiting time significantly. In “compensatability”, when a ROT is in conflict with a UT, a list is created and recorded w ith identification numbers o f the UT and the data item m odified by UT. In the com m it process, the ROT reads the update value o f the data item from the transaction log by using identification numbers. The SSLR-S classifies ROTs into tw o types: com pensatable RO Ts (CROTs) and non-compensatable ROTs (NCROTs). A CROT is processed w ithout blocking. When a CROT conflicts with an UT, it shows “compensatability” . H ow ever, N C R O T follows synchronous speculation like SSLR. For example, Ti is a CROT and T 2 is an UT. Here T | conflicts with T 2 on data item xo.but it perform s a parallel execution w ith T 2 w ithout any blocking. W hen Ti commits, it reads the update value o f xo from the transaction log and performs a compensation operation. 34 2.2 Switching Between Priority Protocols This section describes some scheduling algorithm s m ost relevant to our thesis, w hich switch between priority protocols during run time. ED F is the m ost used priority protocol in real time transaction processing. However, ED F does not perform well during high load conditions. The following methods use EDF under low load conditions, but sw itch to a different priority protocol w hen the system load becom es high. 2.2.1 A daptive E arliest D eadline Adaptive Earliest D eadline (AED) is an adaptive scheduling algorithm based on a RTDBS [20]. Under low or m oderate resources and data contention, ED F results in the fewest missed deadlines, but when the load increases gradually, perform ance o f EDF degrades abmptly. 
To improve the perform ance o f EDF in an overloaded environm ent, A ED incorporates an adaptive approach using a feedback control mechanism. A ED divides the transactions into two groups nam ed HIT and MISS. Transactions w hich have higher probability o f meeting deadlines fall into the HIT group. O n the other hand, transactions which are less likely to meet their deadline are categorized as the MISS group. It is always expected that the transactions that can m eet their deadlines should be in the HIT group. Transactions in the HIT group follow EDF scheduling and transactions in the M ISS group follow random priority (RP) mapping. To uniquely identify transactions, a random ly generated key is assigned to a new transaction, which is then used to order a list o f transactions. The position in the list is very important as it is used to determine the relevant group for the transaction. To determ ine a group, the position o f the transaction w ithin the list is com pared with a dynam ic control variable, called HITcapacity. The transaction goes to the HIT group if the value o f position is 35 less than or equal to the HITcapacity value, otherwise, it goes to the M ISS group. A feedback process is used to set the value o f the HITcapacity. Initially, a scheduler called the priority mapper initializes the value o f the HITcapacity. Tw o parameters, H ITbatch and ALLbatch, are used for com puting two ratios called hit ratios, H itR atio(H IT) and HitRatio(ALL), that make sure only the transactions which can complete are in the H IT group. HitRatio(HIT) represents the fraction o f transactions in the HIT group that m eet their deadlines. HitRatio(ALL) represents the same m easurem ent in terms of all transactions in the system. In an ideal case, the HIT group has a HitRatio(ALL) o f 1.0 and the M ISS group has a HitRatio(ALL) o f 0.0. The hit ratios are continuously checked and fed back to the priority mapper. The hit ratios values help the priority m apper to re-evaluate the H ITcapacity value, which is an iterative process [2 0 ]. Haritsa et al. [20] com pared the perform ance o f AED with EDF, random priority (RP), no priority (NP), and latest deadline (LD). AED perform ed like ED F during low load conditions when EDF outperform ed all other priority protocols, and perform ed like RP during high load conditions when RP outperform ed all other priority protocols. Therefore, EDF or RP performs well in a particular load condition while AED performs w ell under all load conditions. They also show that the H itRatio(A LL) for H IT group varied from 1.00 to 0.98 when the system moved from low to high load, whereas, the HitRatio(ALL) for M ISS group varied from 0.0 to 0.1. However, depending on a random number key for determ ining the transaction groups might not always be accurate. M oreover, calculating H ITcapacity is a complicated process, which requires prior know ledge o f the transaction characteristics. A ED does not target distributed systems, so its perform ance in a distributed system is not tested. 36 2.2.2 Sectional S cheduling Sectional scheduling (SS) is an adaptive approach for transaction scheduling based on hard real-tim e systems, which changes behaviour according to the system environm ent [16]. The system environment does not rem ain constant. It changes as the system evolves, w hich makes the real time transactions vulnerable to fail their executions before deadlines. 
EDF is the most stable scheduling algorithm in real time systems, but it performs unpredictably when system characteristics change, especially when the system load changes. SS measures the current load of the system and adapts to changes in the system environment accordingly. According to SS, the system load can be partitioned into three cases: normal load, overload, and serious overload. The load is denoted by p. Under normal or low load (p <= 1), SS uses EDF for transaction scheduling, since EDF achieves 100% processor utilization in this range. In overload conditions (1 < p <= 3), SS uses the Deadline/Value First protocol, which prioritizes transactions according to their values. A value represents the importance of a transaction in relation to other transactions [34]. SS has three principles. The first principle is that a transaction has higher priority if it has an earlier deadline or a larger importance value than another transaction. The second principle is that if two transactions have the same deadline, the transaction with the larger importance value has the higher priority, and if two transactions have the same importance value, the transaction with the earlier arrival time has the higher priority. The third principle is that if two transactions have the same importance value and deadline, the transaction with the earlier arrival time has the higher priority. In serious overload conditions (p > 3), SS follows Highest Value Density First (HVDF), where the transaction with the highest value density receives the highest priority. HVDF is based on the Highest Value First (HVF) algorithm, where the transaction with the highest importance value receives the highest priority. The assumption is that a transaction has higher priority if it has a higher value density and a lower slack time [16].

SS has an improved version, called robust sectional scheduling (RSS), which is more predictable than SS under overload. In this condition, RSS rejects the transaction with the least value that is affecting the system load. RSS has a recovery mechanism in which the rejected transactions are stored in a queue, and when the system is idle, RSS recovers those transactions [16]. Ding and Guo [16] compared SS with EDF, HVF, and HVDF by varying the load of the system. SS outperformed all other priority protocols under all load conditions. In comparison with SS, RSS had a lower number of missed jobs under high load conditions. However, SS is not implemented in a DRTDBS, where the transaction operation type (read or write) might affect the system load. It is also unclear how the changes in system load were implemented.

2.2.3 Maximum Miss First

Maximum Miss First (MMF) is a non-preemptive scheduling algorithm for soft real time systems [35]. MMF schedules a transaction by calculating the miss ratio of the transaction. The miss ratio is the number of missed jobs at a certain time divided by the number of jobs released up to that time. The miss ratio of a transaction T_i is defined as [35]:

MR_i(t) = \frac{N_i^{miss}(t)}{N_i^{job}(t)}

where N_i^{miss}(t) is the number of missed jobs of transaction T_i at time t and N_i^{job}(t) is the number of released jobs of transaction T_i at time t. The values of N_i^{miss}(t) and N_i^{job}(t) are zero for a new transaction T_i, but they increase as T_i releases jobs. MMF assigns priority to the transaction which has the highest miss ratio, as sketched below.
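A minimal sketch of that selection rule follows; the record and class names are invented for illustration, and the deadline comparison only mirrors MMF's fallback to EDF for equal miss ratios.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative sketch (invented names): MMF picks the transaction with the highest
// miss ratio; equal miss ratios fall back to an EDF-style earliest-deadline order.
record TransactionStats(String id, int missedJobs, int releasedJobs, long deadline) {
    double missRatio() {
        // A transaction that has not released any jobs yet has a miss ratio of zero.
        return releasedJobs == 0 ? 0.0 : (double) missedJobs / releasedJobs;
    }
}

final class MmfScheduler {
    static TransactionStats select(List<TransactionStats> ready) {
        return ready.stream()
                .max(Comparator.comparingDouble(TransactionStats::missRatio)
                        // tie-break: earlier deadline wins, as in EDF
                        .thenComparing(Comparator.comparingLong(TransactionStats::deadline).reversed()))
                .orElseThrow();
    }
}
```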
If tw o transactions have the same miss ratio, then M M F uses the EDF priority protocol for scheduling transactions. W hen the system load is low, the miss ratio is zero; therefore, M M F acts like EDF, which is preferable under low load conditions. When the load increases gradually, the miss ratio becomes the key factor for scheduling [35]. Asiaban et al. [35] com pared M M F w ith EDF, gEDF and other sim ilar scheduling algorithms. MMF showed the same miss ratio for different transactions w ith different periods while the other algorithms did not. Therefore, M M F works w ell for all transactions. M M F produced a lower num ber o f consecutive missed jobs in comparison w ith other algorithm s. MMF also showed better jitter (the m axim um tim e variation between the finishing tim es o f any two consecutive jobs that completed successfully). M M F demonstrated the sam e system utilization as EDF and gEDF. MMF does not depend on the execution time o f the transactions w hich m akes it m ore stable. However, if the num ber o f consecutive m issed jobs is high, then M M F can block the transactions for a long time because o f the high num ber o f m issed attempts. This can cause transactions to miss their deadlines, w hich degrades the performance of the system. 2.2.4 G ro u p -E D F Group-EDF (gEDF) is a scheduling algorithm designed for non-preem ptive soft real time systems. It uses both Earliest Deadline First (EDF) and Shortest Job First (SJF) to schedule a transaction in the system [36]. It creates groups o f transactions based on their deadlines, where deadlines o f the transactions are close to each other. A group in the gEDF algorithm is created based on a group range param eter, Gr, represented by a percentage value. If the deadline value o f a transaction falls under the G r percent of the deadline value o f 39 the current transaction, the transaction is considered to be in the same group as the current transaction. Groups are scheduled based on the EDF protocol; when the deadlines o f all transactions in a group are earlier than the deadlines o f all transactions in other groups, then the first group has higher priority. H owever, gEDF follows SJF to schedule individual transactions within a group. Li et al. [36] studied the perform ance o f gEDF by varying load tolerance (w hat extent a transaction can miss the deadline), the deadline value o f the transactions, the execution time o f the transactions, etc. gEDF perform s like EDF during low load, but perform s better than EDF during high load. The gEDF protocol also outperforms best-effort algorithm s which switch priority protocols according to the system load. According to Li et al. [36], though gEDF outperform s EDF, it does not guarantee fairness because it has tendency to favour only small transactions. H owever, if w e concentrate in increasing the num ber o f com pleted transactions before the deadlines, then this is a good strategy. Nevertheless, gED F does not consider distributed real tim e system s where a transaction m ight have a num ber o f sub-transactions, so grouping o f transactions at different sites and coordinating betw een them can be difficult. 2.3 Summary In a RTDBS, a transaction not only needs to m aintain the serialization, but also needs to complete before the deadline. Traditional CCPs guarantee serialization in transaction execution, but cannot guarantee m eeting deadlines. 
Again, a RTDBS in a distributed environment (DRTDBS) needs to consider data availability, system resources availability, and communication overhead between transactions and sub-transactions. B ecause o f these reasons, extending traditional CCPs by using an optim istically lending data technique or 40 adaptively changing the behaviour o f the system according the system state becam e very popular in designing a CCP. The ASL protocol adapted both features and proved itself as an effective CCP in a DRTDBS. However, the perform ance o f A SL is affected by the system attributes (i.e. network latency, network topology, priority protocols). Therefore, the performance o f a system can be improved by finding an optim al solution for these attributes. Dynamically switching between priority protocols according to the system load is an optimal solution to improve the system performance. From various solutions presented in this chapter, we found that if we can quantize the system load, then it is easy to sw itch betw een priority protocols considering the value o f the load. H owever, none of the solutions targeted a DRTDBS. Therefore, finding a solution for switching betw een priority protocols in a DRTDBS is the focus, and m ain contribution, o f this thesis. 41 Chapter 3 The Simulator To analyze the performance o f the A SL protocol in a DRTDBS, a distributed real­ time database model implemented in a distributed real tim e transaction processing sim ulator (DRTTPS) is used. DRTTPS was developed in the parallel and distributed com puting research lab at UNBC. The key features o f DRTTPS are provided in this chapter. A detailed description o f the simulator can be found in [46] [9]. 3.1 Distributed Real-Time Transaction Processing Simulator A model is not a real system, but describes the real system w ith all necessary information in a simpler manner. A sim ulation o f a model describes the w orkflow o f the model [47], The discrete event sim ulation model is a type o f simulation m odel, w here a representable system only changes its states at discrete points o f time [48]. The continuous event simulation model is another type o f sim ulation model, which changes states o f the system continuously over time. The discrete event sim ulation model is preferable over the continuous event simulation model because o f its sim plicity [47]. A sim ulation m odel requires development o f a software application to implement the workflow o f a real system. The software application consists o f entities that represent physical elem ents o f the real world. The entities interact with each other to perform actions, w hich represents the behaviour o f the real system [9]. The software application is called the simulator. Distributed Real-Time Transaction Processing Sim ulator (DRTTPS) is a discrete event simulator that simulates a DRTDBS [46] [9]. DRTTPS is flexible to incorporate new concurrency control protocols (CCPs) and can be a test bed for analyzing their perform ance. DRTTPS also allows other com ponents to be added such as a new priority protocol or new network architecture. In DRTTPS, events are executed sequentially. A n event can be an action that a component o f a sim ulator perform s during execution. To m aintain the sequence o f events, they are inserted into a list in the order they need to be executed based on the event’s execution time. The events are then executed by rem oving them one by one from the beginning o f the list to the end by incrementing time. 
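A minimal sketch of such an event loop (invented names; not DRTTPS source code) keeps pending events ordered by their scheduled tick and repeatedly executes the earliest one, advancing the clock as it goes:

```java
import java.util.PriorityQueue;

// Illustrative discrete-event loop: events are ordered by scheduled time and the
// simulation clock jumps from tick to tick as each event is executed.
final class DiscreteEventLoop {
    record Event(long tick, Runnable action) {}

    private final PriorityQueue<Event> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.tick(), b.tick()));
    private long clock; // current simulation time, in ticks

    void schedule(long tick, Runnable action) { queue.add(new Event(tick, action)); }

    void run() {
        while (!queue.isEmpty()) {
            Event next = queue.poll();   // always the earliest pending event
            clock = next.tick();         // advance the clock to the event's time
            next.action().run();         // execute the event (it may schedule new ones)
        }
    }

    public static void main(String[] args) {
        DiscreteEventLoop sim = new DiscreteEventLoop();
        sim.schedule(5, () -> System.out.println("transaction arrives at tick 5"));
        sim.schedule(2, () -> System.out.println("disk read completes at tick 2"));
        sim.run(); // prints the tick-2 event before the tick-5 event
    }
}
```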
A tick is the unit to m easure a discrete amount o f time in the simulator. The network configuration in a distributed system is shown in Figure 3.1 [49]. It consists o f one or more sites w here each site has one or more nodes, and each node has one or more real-time databases. It maintains a local area network between all nodes in a site and a wide area network between all sites. The system provides virtual routers to each site to maintain the connection between nodes inside the site or w ith other sites. It also provides routing tables to each node, w here each routing table contains routes to all conceivable destination nodes. The destination nodes may be inside the container site or in other sites. The network connection is very reliable. A failure message is always re-sent to confirm the arrival o f the message at the destination and the system also updates the routing table during 43 any connection failure or recovery to have an efficient routing path. A netw ork connection has the following properties: 1. Source: a network connection originating node. 2. Destination: a network connection ending node. 3. Bandwidth: m aximum capacity o f a netw ork connection. 4. Latency: the time taken by a m essage to travel from source to destination. 5. External usage: the percentage o f the bandw idth occupied by external users, if any. The bandw idth excluding the external usage is considered as the effective bandwidth o f the system. site C site A network communication via network site B Figure 3.1: N etw ork configuration [49] 3.1.1 Node Architecture A node is the core com ponent o f DRTTPS and has the following characteristics: 1. Concurrency Control Protocol: It indicates w hich CCP is followed by the transactions in the node. 2. Preemption Protocol: A preem ption protocol controls how the transactions are preem pted in the system to avoid priority inversion. The options are described below: > High Priority: A transaction which is holding a lock on a data item can be preem pted only if a transaction which requests a lock on the same data item has higher priority than the lock holding transaction. > Priority Inheritance: The low er priority lock holding transactions inherits the priority o f the higher priority w aiting transaction to complete rather than being aborted. > N o preemption: No preem ptions will be attempted. 3. Deadlock Resolution Protocol: This protocol defines which transaction w ill be aborted if a deadlock occurs. The options are shown below: > First Deadlock Resolution: A list is generated w ith the transactions w hich are involved in a deadlock. First deadlock resolution aborts the transaction at the top o f the list to resolve the deadlock. > Priority D eadlock Resolution: Priority deadlock resolution requires a list of the transactions sorted according to their priority. It aborts the lowest priority transaction from the list to resolve the deadlock. 4. Priority Protocol: A priority protocol defines the order o f execution o f transactions, for exam ple EDF, SJF, M SF, and FCFS, as described in C hapter 1. 45 APAP selects one o f the existing priority protocols and switches betw een them depending on the load conditions. 5. M ax active transactions: It sets the m axim um limit on the num ber o f transactions that can concurrently run in a node. Excess transactions need to w ait in a queue. 6. Timeout: It represents the m axim um time limit a transaction can be idle during execution before it is aborted. 
This tim e limit helps to resolve distributed deadlocks by predicting that the transaction is trapped in a deadlock if it exceeds the time limit. A node consists o f several hardw are and software components: processor manager, disk manager, buffer, swap disk, and optionally workload generator [46], The processor manager contains and arranges processors in the node. A processor can use a hyper­ threading technique which allows processing m ore than one page at a time [46], A processor has two attributes: 1. Process Time: It indicates the processing tim e o f a page measured by ticks. 2. Hyper-threading: It represents if the hyper-threading technique is enabled or disabled in the node. System perform ance can be analyzed with or w ithout hyper­ threading. The disk manager arranges disks in a node w hich store pages. D isks are non-volatile storage w here data can be partitioned or replicated. A disk has tw o attributes: 1. Access time: It represents the num ber o f ticks the system takes w hen it reads or writes a page from or to the disk. 46 2. Page Range: It represents the num ber o f pages in that disk. D ata partition or replication can be controlled by page ranges. However, CCPs in D RTTPS are not designed to handle replicated data. A b u ffer is a volatile storage where transactions perform read/write operations on the pages. A transaction needs to have all requested pages to be loaded to the buffer from the disk if the buffer has enough space. The buffer has a param eter w hich represents the maximum number o f pages a buffer can support during read and write operations. If the buffer does not have enough space, pages m ust be swapped out to a sw ap disk. The sw ap disk component indicates the physical disk and it contains a parameter, called access time, which represents the num ber o f ticks the system requires to swap a page . A w orkload g e n e ra to r is an optional com ponent o f a node, w hich is used for generating transactions in the node. It has the following parameters to control the num ber and nature o f the transactions: 1. Size: It represents how m any transactions are generated. 2. Arrival: It represents the inter-arrival time o f transactions in the system generated by the workload generator. At a low inter-arrival time, many transactions enter the system w ithin a short tim e period w hich causes more transactions to run concurrently, resulting in high system load. 3. Slack time: It represents the deadline o f a transaction, as w ell as the extra tim e after the deadline. The range for the baseline experiment has been selected to represent a deadline w hich includes a slack o f 2 - 6 tim es the execution time. 4. W ork size: It represents the num ber o f pages a transaction m ight access in its lifetime. 5. Pages: It defines the total num ber o f pages in the system. 6. Update: It represents a percentage o f update operations in transaction’s execution. 3.2 Graphical User Interface SetupTool is the core o f DRTTPS that allows users to set up and run sim ulations by creating site structure, node structure, and netw ork architecture (Figure 3.2). The SetupTool contains a number o f site com ponents and corresponding parameters, w here param eters control the characteristics o f all the site components. A sim ulation is run using com binations of specified param eter values. The SetupTool also has a com ponent called variation, w hich defines the number o f simulations that can be run at the same time. The SetupTool has tw o graphical panels. 
The left panel shows the site components and the right panel shows the parameter settings of the selected site component. The SetupTool allows users to save simulations, where all site components and parameters are encapsulated in a binary image file. The binary image file is used later to load the simulation. After running a simulation, another component of DRTTPS, called ReportTool (Figure 3.3), provides a user interface to create and save charts showing the results.

Figure 3.2: Simulator SetupTool

Figure 3.3: Simulator ReportTool

3.3 Software Design

DRTTPS provides a common pluggable framework for all components. The components are organized in a hierarchical structure (Figure 3.4 [46]) where all functioning components are children of the node module. All children of the same base class have the same interface for interacting with other modules. There is a base-line CCP at the top of the CCP inheritance tree, which contains basic functionality. CCP Level II inherits from the base-line protocol and CCP Level III inherits from CCP Level II. All protocols at the same level as CCP Level II or CCP Level III have the same physical structure (discussed in detail in the following subsection).

Figure 3.4: Simulation class structure [46]

The simulator architecture is flexible enough to incorporate new protocols without major modification of the code. An example could be priority protocols: if a new priority protocol follows the common architecture given by DRTTPS, then the protocol is easily recognized by DRTTPS. The discrete event model is implemented by creating a global event queue which stores all events and exists at the top of the hierarchical class structure. DRTTPS uses Java reflection [50] to create the event queue, which does not require message passing between classes. The Java language was chosen for the development of the simulator for its versatility, efficiency, and platform portability. For a new event task, a new event object is created and added to the event queue. The events are executed one by one from the global event queue.
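To give a flavour of the pluggable design, the sketch below shows how a priority protocol implementation could, in principle, be located and instantiated by name using Java reflection. The interface, loader, and class names are invented for illustration and do not describe DRTTPS's actual plug-in mechanism.

```java
// Illustrative only: loading a pluggable protocol class by name with Java reflection.
interface PriorityProtocol {
    /** Returns the scheduling value of a transaction (e.g. deadline or execution time). */
    long valueOf(long deadline, long executionTime);
}

final class ProtocolLoader {
    static PriorityProtocol load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);                     // locate the plug-in class
        return (PriorityProtocol) cls.getDeclaredConstructor().newInstance();
    }
}

// Example plug-in: an EDF-style protocol whose value is simply the deadline.
final class EarliestDeadlineFirst implements PriorityProtocol {
    public long valueOf(long deadline, long executionTime) { return deadline; }
}
```

With this kind of structure, a call such as ProtocolLoader.load("EarliestDeadlineFirst") would instantiate the plug-in without the caller naming the class at compile time, which is the style of extensibility the simulator aims for.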
The performance analysis of the protocols is executed by a separate class which keeps track of any output values in the system. The values are then displayed in graphs in real time as the simulation progresses (Figure 3.5). Finally, the ReportTool displays the statistics associated with the graph. DRTTPS can save or load a simulation for future use by encapsulating the user interface, statistics, parameter settings, graphs, etc. in a single object, utilizing the serialization technique of the Java language [50].

Figure 3.5: A running simulator

Figure 3.6: Speculative locking protocol dependencies [46]

3.3.1 Concurrency Control Protocols

Figure 3.6 shows the speculative locking protocol dependencies in DRTTPS. The implementation of a CCP follows a hierarchical order. Concurrency Control is at the top of all CCPs' dependency hierarchies and provides default methods that are common to descendant CCPs. Abstract Speculative Locking inherits from Concurrency Control for all default methods of a CCP and adds other methods with specific speculative locking functionality. Abstract Speculative Locking provides a generic structure for all speculative locking protocols. The structure includes common speculative functionality as well as some distinct functionality (such as restricting the number of speculative executions).
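The inheritance just described might be skeletonized as follows; the class names are shortened and the bodies are placeholders, so this is an illustrative outline rather than the simulator's source.

```java
// Illustrative outline of the CCP hierarchy (names shortened, bodies omitted).
abstract class ConcurrencyControl {
    /** Default behaviour shared by every CCP, e.g. granting plain read/write locks. */
    abstract boolean requestLock(int transactionId, int pageId, boolean write);
}

abstract class AbstractSpeculativeLocking extends ConcurrencyControl {
    /** Adds the speculative machinery shared by all speculative locking protocols. */
    abstract void publishAfterImage(int transactionId, int pageId);
}

final class SpeculativeLocking2 extends AbstractSpeculativeLocking {
    // SL(2): restricts speculation once more than two predecessors have aborted.
    boolean requestLock(int transactionId, int pageId, boolean write) { return true; }
    void publishAfterImage(int transactionId, int pageId) { /* ... */ }
}
```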
Abstract Speculative Locking also provides a version tree structure to keep track of all versions of a page. The speculative locking protocols are implemented by inheriting all methods from Concurrency Control and Abstract Speculative Locking. In addition, each protocol adds extra features that are not available in the other speculative protocols; for example, ASL adds hyper-threading, memory management including virtual memory, and transaction queue management techniques.

For a new transaction, the executing CCP checks the required pages in the version tree in order to lock the versions of the pages. Not all versions of a page will be locked; this depends on the executing CCP's page restriction criteria. The CCP sets speculative read or write locks on the pages and creates a group of all the locks to keep track of all locks of a transaction. In DRTTPS, a transaction is aborted if it is preempted by other transactions or if it becomes deadlocked. The aborted transaction releases all locks that were obtained or requested. All versions of the pages on the speculative tree created by the transaction are removed and protected for future use. The transaction is then positioned in a queue for future execution.

In the next chapter, we present our proposed protocol and demonstrate its superior performance as compared with other priority protocols.

Chapter 4
The Proposed Protocol, Experiments and Results

This chapter describes our proposed protocol, the Adaptive Priority Assignment Protocol (APAP), in detail. This is followed by the presentation of the experiments and results.

4.1 Adaptive Priority Assignment Protocol

The Adaptive Priority Assignment Protocol (APAP) is an adaptive protocol for assigning priority to transactions in order to improve the performance of a DRTDBS under varying system loads. A priority assignment protocol is an integral part of transaction processing. The effect of this protocol on the overall performance of the system is greatly impacted by data contention (the number of user requests to a database system at any time) and resource contention (conflict over access to shared resources). The factors that cause data and resource contention include the inter-arrival times of transactions, disk size, cache size, number of transactions, network topology, number of physical nodes, and the page update rate. The performance of a priority assignment protocol varies with different system environments, especially with different loads [16]. The system load "fluctuates drastically from day to day, hour to hour, minute to minute, even second to second" [51]. The load in a database system can be defined as the demand on the database system when a transaction performs queries and analysis through the DBMS. Moreover, any batch jobs or system commands can also create demand [51]. The load can be quantified by the utilization of the system, denoted by the following formula taken from [36] [52]:

U = \sum_{i=1}^{n} \frac{e_i}{p_i}    (3.1)

where e_i and p_i denote the elapsed execution time and the total assigned processing time of transaction T_i, respectively. EDF achieves 100% processor utilization during low load [16]. Conversely, during high load, EDF exhibits significantly poor performance, thus prompting selection of a better priority protocol [53]. The load calculation takes place whenever a new transaction arrives in the system.
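Equation 3.1 amounts to a short loop over the active transactions. The following is a minimal sketch with invented record and class names:

```java
import java.util.List;

// Minimal sketch of the load measure in Equation 3.1: the sum, over the active
// transactions, of elapsed execution time divided by total assigned processing time.
record ActiveTransaction(long elapsedExecutionTime, long assignedProcessingTime) {}

final class LoadSensor {
    static double systemLoad(List<ActiveTransaction> active) {
        double load = 0.0;
        for (ActiveTransaction t : active) {
            load += (double) t.elapsedExecutionTime() / t.assignedProcessingTime();
        }
        return load;
    }
}
```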
We ran the system with different load configurations until the number of completed transactions was maximized, and then recorded the load. After analyzing the performance under different loads, a load range was determined for each priority protocol (Table 4.1).

Priority Protocol    Load Range
EDF                  0 to 1.25
FCFS                 2.2 to 2.3
SJF                  Rest of the time

Table 4.1: Load ranges for priority protocols

The performance of the MSF protocol is comparatively better under high load than under low load. However, under high load, SJF has a higher chance of completing a transaction than MSF, because SJF minimizes the idle time of the system during transaction execution. Moreover, in our experiments, SJF performed better than MSF during high load. Therefore, we did not use MSF as one of the options for APAP. FCFS performed the best over a small load range from 2.2 to 2.3, beyond which the performance of SJF was superior.

APAP uses the load ranges to decide which priority protocol to execute. When the system selects a new task, it sends a load value to APAP. After receiving the load value, APAP matches it against the load ranges recorded in it. If the load falls within the range of the currently executing priority protocol, then no change happens. If the load falls within a different range, then a switch operation is performed to change the executing priority assignment protocol to the newly determined priority protocol.

4.1.1 Implementation

We use DRTTPS as a test bed for APAP and use ASL as the underlying concurrency control protocol. As previously discussed, DRTTPS provides a common architecture to select the next transaction from a list of transactions through a priority protocol engine. When a new transaction arrives in the system, it is placed in a list where it waits until scheduled. The transaction selection process proceeds as follows: the priority protocol engine receives a request from the system to select a transaction from the waiting list. The priority protocol engine calculates and sends all the transactions' values in the list to the backend comparator, which selects the highest priority transaction (Figure 4.1). The value is the deadline when the priority protocol is EDF, or the execution time when the priority protocol is SJF. In the case of EDF, the highest priority transaction will have the earliest deadline, whereas in the case of SJF, it will have the lowest execution time. The transaction with the highest priority is then returned to the system for scheduling. While the transactions are executing, the preempted and/or aborted transactions are added to a priority queue in the backend comparator for future consideration.

Figure 4.1: Sequence of transaction selection process without APAP

When APAP is introduced into the DRTTPS architecture, it acts as a mediator between the system and the priority protocol engine (Figure 4.2). The selection of a transaction from the list of transactions waiting to be scheduled proceeds as follows: the APAP engine receives a load value from the system and determines the most suitable priority protocol based on that value. This modification ensures the dynamic selection of the priority protocol. The APAP engine receives the list of transactions from the system instead of the priority protocol engine.
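The load-range decision that drives this mediation (Table 4.1) can be sketched in a few lines; the enum, class, and method names below are invented for illustration.

```java
// Minimal sketch of the switch decision driven by Table 4.1: EDF for loads up to 1.25,
// FCFS for loads in [2.2, 2.3], and SJF for all other loads.
enum Protocol { EDF, FCFS, SJF }

final class ApapSwitch {
    private Protocol current = Protocol.EDF;

    /** Called when a new transaction arrives and a fresh load value is available. */
    Protocol onNewLoad(double load) {
        Protocol target = selectFor(load);
        if (target != current) {
            current = target;   // switch operation: replace the executing priority protocol
        }
        return current;
    }

    private static Protocol selectFor(double load) {
        if (load <= 1.25) return Protocol.EDF;
        if (load >= 2.2 && load <= 2.3) return Protocol.FCFS;
        return Protocol.SJF;
    }
}
```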
The APAP engine communicates with the priority protocol engine to get the values of the transactions in the list and sends the list with the transactions' values to the backend comparator so that transaction selection can be completed and returned to the system. The communication between the APAP engine and the priority protocol engine is transparent to the system, as it only communicates directly with the APAP engine.

Figure 4.2: Sequence of transaction selection process with APAP

In order to minimize the switching operation overhead, APAP only tests the load for switching upon arrival of a new transaction and not when transactions are pre-empted and/or aborted and added to a priority queue. However, it continues acting as a mediator to add transactions to the priority queue. The inclusion of APAP in the DRTTPS does not cause any overhead to the original workflow of transaction selection, as APAP simply selects transactions in the same manner (i.e. using the values for the transactions returned by the priority protocol engine).

4.2 Experiments and Results

In this section, we describe the experiments conducted in this study. For each experiment, the performance of APAP was compared with other priority protocols including EDF, SJF, FCFS, and MSF (discussed in Chapter 1). The key performance metric is the percentage of transactions completed on time (PTCT), that is, before the deadlines. In addition, we observed the switches between the protocols in APAP and present it as the percentage of usage (POU) of priority protocols given a particular system configuration.

4.2.1 Baseline Experiment

In a DRTDBS, there are many parameters that can change the system load, such as system resources (cache size, number of disks), transaction inter-arrival time, slack time, network topologies, etc. In our baseline experiment, we followed the parameter settings given in Table 4.2. Later we varied these parameters to change the overall system load. For all experiments, PTCT and POU are measured at a transaction inter-arrival time range of 5 to 55 ticks, because there was no observable result below or above this range. We used a binary tree topology for network connections among sites, where a node could have at most two child nodes. We assume bandwidth is unlimited and there is no network latency. In Figure 4.3, when the transaction inter-arrival time is between 5 and 30 ticks, SJF demonstrates significantly higher PTCT than EDF and FCFS, because in high load scenarios, the EDF and FCFS protocols suffer from a high number of transaction aborts. Under heavy load, EDF performs worse than the other protocols until an inter-arrival time of 40 ticks. Beyond this point, the transactions have enough time to complete and EDF climbs to a 100% completion rate. APAP performs better than all tested protocols by switching between the priority protocols.
Parameter Type          Parameter                       Value
Network                 Network Topology                Binary Tree
                        Bandwidth                       1000 bit/tick
                        Network Pipe Latency            0
Node                    Active Transaction Count        30
                        Disk Count per Node             2
                        Maximum Pages Held per Disk     100
                        Disk Access Time                35 ticks
                        Cache Size                      20
                        Swap Disk Access Time           35 ticks
                        Transaction Process Time        15 ticks
Transaction Generator   Pages per Transaction           4-12 pages
                        Slack Time                      720-2160
                        Inter-arrival time              5-55 ticks
                        Page Update Rate                100 percent
                        Transaction Count               100

Table 4.2: Parameter Settings

Figure 4.3: PTCT for the Baseline Experiment

Figure 4.4 shows the POUs of the priority protocols in APAP for the baseline experiment. When the inter-arrival time of the system is between 5 and 20 ticks, the system usually runs with SJF because of its superior performance during that time. When the inter-arrival time of the system increases, the load condition of the system decreases. Thus, APAP increases the usage of EDF rather than SJF. The inter-arrival time between 20 and 30 ticks shows a transition period during which each of the two protocols is used approximately 50% of the time. The usage then becomes more distinct in favour of EDF due to decreased system load. APAP follows FCFS during the inter-arrival times of 5 to 10 ticks and 15 to 30 ticks, with POUs up to 2.4% and 3.8%, respectively.

Figure 4.4: POU for the Baseline Experiment

4.2.2 Performance of APAP varying the transaction load

More transactions mean more data conflicts and more speculative executions, which increase the system load. We used the parameter settings in Table 4.2, except that we increased the number of transactions to 200. As a large number of transactions are executed, data contention and resource contention also increase, resulting in a large number of transaction rejections. Therefore, the performance of all priority protocols, including APAP, degrades. Figure 4.5 shows the PTCT of all protocols for this experiment. APAP continues to exhibit superior performance overall.

Figure 4.5: PTCT for 200 transactions

The POUs of all protocols with 200 transactions are shown in Figure 4.6. It is clear that APAP runs SJF for more time than in the baseline experiment. FCFS is used during the inter-arrival time ranges from 5 to 15 ticks (maximum 4% POU) and 20 to 30 ticks (maximum 2.2% POU).

Figure 4.6: POU for 200 transactions

4.2.3 Performance of APAP with reduced page update rate

The page update rate is an important factor in transaction processing and indicates the percentage of write operations in a transaction's execution. The write operations use exclusive locks on the data, thus blocking other transactions from accessing that data for a certain time. This blocking time increases the transaction execution time. Consequently, a high or low page update rate affects the system load. We show results from two experiments in this section. In the first experiment, we changed the page update rate to 0, which represents a read-only scenario.
Figure 4.7: PTCT for the zero page update rate

From Figure 4.7, EDF demonstrates the maximum PTCT at inter-arrival times of 5 and 10 ticks, because there are no data conflicts and thus no blocking of transactions. SJF exhibits higher PTCT than in the baseline experiment, but in comparison to the other protocols this is a poor performance. APAP demonstrates 6.9% less PTCT than EDF at an inter-arrival time of 5 ticks. However, when the system runs with an inter-arrival time of more than 10 ticks, APAP outperforms all priority protocols and reaches a PTCT of 100% before EDF does.

Figure 4.8 shows that APAP runs mostly with SJF until an inter-arrival time of 15 ticks. During that period, the usage of SJF varies from 85.2% to 76.1%, the usage of EDF increases from 13.8% to 20.3%, and the usage of FCFS varies from 1% to 5%. When the inter-arrival time is more than 25 ticks, APAP uses only EDF, because it shows 100% efficiency during that period.

Figure 4.8: POU for the zero page update rate

Next, we changed the page update rate to 50 percent, which implies that half of the pages accessed are also modified (Figure 4.9). The system load is higher in this case than with the zero page update rate of the previous experiment. Due to this higher load, EDF performs more poorly at the low inter-arrival times. However, at the high inter-arrival times, EDF outperforms SJF, MSF, and FCFS. APAP exploits all of the priority protocols and consistently performs better than the other protocols, except when the inter-arrival time is 20 and 25 ticks. At these inter-arrival times, EDF exhibits a PTCT 3.1% and 4.5% greater than APAP while the system transitions from high to low load. Beyond an inter-arrival time of 30 ticks, APAP performs the same as EDF at 100% PTCT.

Figure 4.9: PTCT for the 50 percent page update rate

The POU of APAP for the page update rate of 50 percent is shown in Figure 4.10. During the low inter-arrival times and high system load, APAP runs SJF frequently, up to an inter-arrival time of 25 ticks. The POUs of SJF and EDF are 83.52% and 14.8%, respectively, at the 5-tick inter-arrival time. After that, the usage of SJF gradually decreases and that of EDF gradually increases, with a crossover point at 25 ticks. The POU of FCFS remains low, varying from 0 to 5.35%. APAP uses only EDF beyond the inter-arrival time of 40 ticks.

Figure 4.10: POU for the 50 percent page update rate

4.2.4 Performance of APAP with larger cache size

In this experiment, we study the effect of cache size on APAP and the other protocols. Cache size is important to the performance of the ASL protocol because it must find space available in the cache or on the swap disk before it requests a page from the database. If there is not enough space, then the transaction needs to wait. Therefore, cache size affects the system load. When the cache size was increased from 20 to 50 pages (Figure 4.11), the PTCT of all the priority protocols improved because of the larger memory.
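The space check described above can be pictured with a short sketch. The following Java fragment is purely illustrative and is not the ASL or DRTTPS code; the class NodeStorage, its fields, and the method admitPageRequest are hypothetical names introduced for this example. It only encodes the rule the text describes: a page request proceeds if a free cache or swap-disk frame exists, and otherwise the transaction must wait.

    // Illustrative admission check (assumed names, not the ASL/DRTTPS code).
    class NodeStorage {
        private int freeCachePages;     // e.g. 20 pages in the baseline, 50 in this experiment
        private int freeSwapDiskPages;

        NodeStorage(int freeCachePages, int freeSwapDiskPages) {
            this.freeCachePages = freeCachePages;
            this.freeSwapDiskPages = freeSwapDiskPages;
        }

        /** Returns true if the page request can proceed; false means the transaction blocks. */
        boolean admitPageRequest() {
            if (freeCachePages > 0) {
                freeCachePages--;          // page will be fetched into the cache
                return true;
            }
            if (freeSwapDiskPages > 0) {
                freeSwapDiskPages--;       // fall back to the slower swap disk
                return true;
            }
            return false;                  // no space: the transaction must wait
        }
    }

A larger cache simply makes the first branch succeed more often, which is consistent with the reduced blocking observed for every priority protocol in the 50-page configuration.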
The PTCT of SJF increases linearly, unlike in the baseline experiment, which had a drop in PTCT at the 40-tick inter-arrival time. The PTCT of MSF improves by 4.45% on average. FCFS has a maximum jump of 20.8% over the baseline experiment at the 30-tick inter-arrival time. EDF displays a large increase in PTCT at 35 ticks and outperforms all other protocols. APAP is exceeded slightly (by 3.4%) by EDF at 35 ticks, but attains a PTCT of 100% at 40 ticks along with EDF.

Figure 4.11: PTCT for the 50-page cache size

Figure 4.12: POU for the 50-page cache size

Figure 4.12 indicates that the POU of SJF varies from 84.3% to 92.4% up to an inter-arrival time of 30 ticks. The POU of EDF varies from 5.6% to 11.7% over the same range. FCFS has a maximum POU of 5.34% at 30 ticks. When the PTCT of EDF increases sharply at 35 ticks during the transition from high to low system load, the POU of EDF also increases, to 48% at 35 ticks. Afterwards, the POU of EDF gradually reaches 100% at an inter-arrival time of 50 ticks. When we increased the cache size to 100 pages, the performance of all protocols was observed to be almost the same as with a cache size of 50 pages.

4.2.5 Performance of APAP with increased slack time

An increase in slack time relaxes the deadlines and allows enough time for a transaction to complete. When the slack time of the system was increased to 720-3600 ticks, the PTCT of all priority protocols improved (Figure 4.13), since the system now had enough time to execute all transactions. All priority protocols have a PTCT between 30% and 50% when the inter-arrival time is 5 ticks. As the inter-arrival time increases, the performance of all protocols also improves. EDF outperforms SJF at the 20-tick inter-arrival time, earlier than in the baseline experiment. FCFS shows 11.4% and 14.5% more PTCT than EDF at the inter-arrival times of 5 and 10 ticks, respectively. However, after 15 ticks FCFS is outperformed by EDF. MSF performed the same as in the baseline experiment, indicating that it is not affected by the increased slack time.

Figure 4.13: PTCT for the 720-3600 ticks slack time

One noticeable finding is that the performance of the protocols increases rapidly until the 20-tick inter-arrival time, after which it increases slowly. APAP outperforms all protocols under all load conditions.

Figure 4.14 shows the POUs of the priority protocols in APAP for a slack time of 720-3600 ticks. The POUs of SJF and EDF in APAP at the inter-arrival time of 5 ticks are now closer, at 30% and 70%. The POUs then change quickly, with the POU of EDF becoming greater than that of SJF after an inter-arrival time of 15 ticks. As the PTCT of EDF slowly increases to 100%, the POU of EDF also slowly rises to 100%. In APAP, the POU of FCFS is up to 4.3% between the inter-arrival times of 5 and 20 ticks. FCFS is also used at the inter-arrival times between 20 and 30 ticks, but only with a POU of 1%.
Figure 4.14: POU for the 720-3600 ticks slack time

4.2.6 Effects of system disk space on the performance of APAP

In this experiment, we study the effect of system disk space on the performance of APAP and the other protocols. The number of disks affects the data availability for a transaction during execution. If we increase the number of disks, a transaction has a higher probability of finding the required data on a local disk, which reduces the blocking and execution times of the transaction. Therefore, the number of disks affects the system load.

We increased the number of disks from 2 to 4 for this experiment. Because of the increase in resources, there is a large change in the PTCTs of all protocols compared to the baseline experiment (Figure 4.15). However, under high load, when the inter-arrival time is between 5 and 10 ticks, SJF outperforms EDF, FCFS, and MSF. After 10 ticks, the performance curve of EDF shows a steep rise, confirming its superior performance under low load. FCFS and MSF always perform close to each other and surpass SJF after 10 ticks. APAP outperforms all priority protocols under most load conditions, except at an inter-arrival time of 20 ticks, where EDF performs slightly (4%) better.

Figure 4.15: PTCT for 4 disks

The POUs of EDF and SJF at the lower inter-arrival times (5 to 10 ticks) are 37.6% and 56.5%, respectively (Figure 4.16). As the system becomes a suitable environment for EDF when the inter-arrival time is more than 10 ticks, APAP gradually switches to using EDF more than SJF. After 30 ticks, APAP runs only EDF. APAP also uses FCFS with a POU of 5.85% at the inter-arrival time of 5 ticks, which gradually levels off to around 1% between 15 and 20 ticks before dropping to zero at 25 ticks.

Figure 4.16: POU for 4 disks

4.2.7 Effect of system distribution on the performance of APAP

An increase in the number of nodes in a DRTDBS increases data availability and load distribution. Therefore, it reduces the overall system load. In the first experiment, we changed the number of nodes from 7 to 11 (Figure 4.17). Due to the high data locality, the PTCT of EDF is now almost the same as that of SJF, even at an inter-arrival time of 5 ticks. However, the performance of EDF does not increase at a high rate when the inter-arrival time is more than 5 ticks. FCFS shows 7.7% more PTCT than EDF at an inter-arrival time of 20 ticks. MSF exhibits the lowest PTCT until the 15-tick inter-arrival time, after which it performs close to SJF. In fact, MSF performs the same as it did in the baseline experiment, because slack time does not improve by increasing the number of nodes. APAP continues to outperform all other priority protocols.

Figure 4.17: PTCT for 11 nodes
Figure 4.18: POU for 11 nodes

Figure 4.18 indicates that the POU of SJF decreases from 73.4% to 66.5% over the inter-arrival times between 5 and 10 ticks, whereas the POU of EDF increases from 23.2% to 28.6% over the same period. After that, the POUs of SJF and EDF remain more or less flat until the inter-arrival time of 20 ticks, when APAP clearly shows a preference for EDF over the other priority protocols. The POU of FCFS increases to 4.9% at 10 ticks, then decreases gradually until it becomes zero at 35 ticks.

Figure 4.19: PTCT for 15 nodes

Figure 4.19 shows the performance results when the number of nodes was further increased to 15. Since there is high data locality, EDF surpasses SJF, MSF, and FCFS consistently, and achieves a PTCT of 100% at the inter-arrival time of 15 ticks. APAP shows 8.23% more PTCT than EDF at 5 ticks, but 2% and 2.3% less PTCT than EDF at 10 and 15 ticks, respectively. Thereafter, APAP attains a PTCT of 100%. Because of the superior performance of EDF, APAP prefers EDF over SJF from the beginning, which demonstrates the adaptive nature of APAP (Figure 4.20).

Figure 4.20: POU for 15 nodes

4.2.8 Effect of increasing the number of processors

In this experiment, we study the effect on APAP of having more than one processor in a node. We increased the number of processors from 1 to 2, and no significant improvement in performance was observed for any of the protocols, including APAP (Figure 4.21). It should be noted that the number of processors is not the real bottleneck in our experiments. As we discussed, the performance of the protocols is greatly affected by the page update rate, cache size, slack time, etc.; increasing the number of processors does not affect the performance of the priority protocols. The PTCT of APAP increases by a maximum of 5.7% at the 20-tick inter-arrival time over the single-processor experiment. The maximum increase for EDF is 5.9% at the 35-tick inter-arrival time, and 4.6% for SJF at 40 ticks. The POUs in Figure 4.22 for two processors are visibly different from those for a single processor, because slight performance changes of the protocols increase the usage of EDF at the inter-arrival times from 5 to 10 ticks and decrease it at the inter-arrival time of 20 ticks.

Figure 4.21: PTCT for 2 processors

Figure 4.22: POU for 2 processors

4.2.9 Effect of network topologies on the performance of APAP

Network topologies affect the performance of the ASL protocol [54]. In order to study this with APAP, that is, with dynamic switching between priority protocols, we changed the network topology from a binary tree to a 2D-torus with 16 nodes. Due to the high data locality, all priority protocols perform better than in the baseline experiment with 7 nodes and the experiment with 15 nodes. Here SJF and FCFS perform close to each other and better than EDF when the inter-arrival time is between 5 and 10 ticks (Figure 4.23).
However, EDF outperforms the SJF, MSF, and FCFS priority protocols when the inter-arrival time is more than 10 ticks and the system load is low. APAP consistently demonstrates the maximum PTCT.

As shown in Figure 4.24, between the inter-arrival times of 5 and 10 ticks, APAP runs SJF around 75%-70% of the time and EDF around 25%-20% of the time. After the inter-arrival time of 10 ticks, the POU of SJF decreases and the POU of EDF increases rapidly, until EDF is used 100% of the time by APAP at the inter-arrival time of 25 ticks. The POU of FCFS increases to 7.9% at the inter-arrival time of 10 ticks and goes back to zero at 15 ticks. We obtained similar results for the 2D-mesh, Hypercube-4, and ring topologies.

Figure 4.23: PTCT for the 2D-torus network topology

Figure 4.24: POU for the 2D-torus network topology

4.3 Summary

> APAP outperformed all priority protocols, especially in high load conditions. However, in some low load conditions EDF completed 2% to 6% more transactions than APAP.

> Changing the page update rate from 100% to 0%, the number of disks from 2 to 4, and the number of nodes from 7 to 15 greatly improved the performance of all priority protocols.

> Increasing the number of processors from one to two in each node, and the cache size from 50 to 100 pages, did not have a significant effect on the performance of any priority protocol.

> The network topologies we tested had a similar effect on all priority protocols.

> When the inter-arrival time was increased, the POU of EDF increased, and always reached 100%.

Chapter 5

Conclusion

CCPs ensure the consistency of a database when multiple transactions request the same data. In a distributed environment, CCPs also need to coordinate between transactions and their sub-transactions, which execute at different sites. The ASL protocol is a CCP for a DRTDBS that follows the underlying structure of the Speculative Locking (SL) protocols and provides additional features (discussed in Chapter 2). ASL outperformed the SL protocols, but its performance degrades when the system is in a high load condition [9].

A database system uses priority protocols to order transactions while a CCP coordinates transaction processing. EDF is an optimal priority protocol for ordering transactions in a database system. However, EDF's performance also degrades in high load conditions. On the other hand, some other priority protocols (such as SJF) perform better than EDF in such conditions. A common approach to optimizing the performance of priority protocols is to dynamically change from one protocol to another according to the system load conditions. However, the existing solutions are not amenable to a DRTDBS.

Our proposed method, the Adaptive Priority Assignment Protocol (APAP), improved the performance of a CCP to a large extent in a DRTDBS under all load conditions. APAP switches between the priority protocols at run time according to the system load, using a load range table that contains the load values over which a given priority protocol is expected to perform better. The ASL protocol is used as the underlying CCP for all of our experiments.
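The switching decision itself amounts to a table lookup. The following Java sketch shows one possible shape of such a load range table; it is illustrative only, and both the class names and the numeric load thresholds in the commented usage example are invented for this illustration rather than taken from the thesis experiments.

    import java.util.List;

    enum PriorityProtocol { EDF, SJF, FCFS, MSF }

    /** One row of the load range table: a load interval and the protocol expected to do best there. */
    record LoadRange(double minLoad, double maxLoad, PriorityProtocol protocol) {}

    class LoadRangeTable {
        private final List<LoadRange> ranges;

        LoadRangeTable(List<LoadRange> ranges) {
            this.ranges = ranges;
        }

        /** Returns the protocol whose range contains the given load value. */
        PriorityProtocol protocolFor(double loadValue) {
            for (LoadRange r : ranges) {
                if (loadValue >= r.minLoad() && loadValue < r.maxLoad()) {
                    return r.protocol();
                }
            }
            return PriorityProtocol.EDF;   // default for low load
        }
    }

    // Example lookup with made-up thresholds: high load favours SJF, low load favours EDF.
    // LoadRangeTable table = new LoadRangeTable(List.of(
    //         new LoadRange(0.7, Double.MAX_VALUE, PriorityProtocol.SJF),
    //         new LoadRange(0.0, 0.7, PriorityProtocol.EDF)));
    // PriorityProtocol active = table.protocolFor(currentLoad);

As described in Section 4.1, this lookup is performed only when a new transaction arrives, which keeps the switching overhead low.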
We observed ASL's improved performance by varying the number of transactions, the cache size, the number of processors, the page update rate, the number of disks, and the network topology. APAP outperformed all priority protocols in most conditions. In some low load conditions, as the inter-arrival time varied, EDF exhibited 2% to 6% more completed transactions than APAP. We can summarize the observations in the following way:

1. APAP yields an overall superior performance when the system load is high.

2. Longer slack time and more resources (more disks, more nodes, or a larger cache) decrease the system load, improving the performance of all priority protocols. However, increasing the cache size beyond a certain value does not further improve the performance of the priority protocols.

3. In some low load conditions, EDF performs slightly better than APAP. However, any attempt to improve APAP specifically for these load conditions degrades its overall performance. The fact that EDF outperformed APAP in these cases is therefore a limitation of APAP, but one we consider negligible.

As APAP switches between priority protocols according to the system load, we also observed the percentage of usage (POU) of the priority protocols in APAP. The POU charts indicate that:

1. APAP uses mostly SJF when the system load is high.

2. When the system load decreases, APAP increases the use of EDF.

5.1 Future Work

We found that changing priority protocols at run time improves the performance of a CCP in distributed real-time database systems. However, in this research, we assumed that there exists a single workload generator. In a practical application, transactions can be generated at more than one node. Therefore, we can further our research by testing the performance of APAP under multiple workload generators.

Another area of improvement would be the use of replicated databases. They improve the performance of a transaction processing system by increasing data locality, where a data item has one or more copies at different nodes. However, the test bed of this research, the DRTTPS, is not designed to handle replicated databases. We can modify the DRTTPS for this scenario and test the performance of APAP.

Finally, a real-time distributed system is also susceptible to failure. In our experiments, the DRTTPS was not built as a fault-tolerant system, one which continues to function even when some components fail. Therefore, we can test how APAP performs in such an environment.

Bibliography

[1] U. Shanker, M. Misra and A. K. Sarje, "Distributed real time database systems: background and literature review," Distributed and Parallel Databases, vol. 23, pp. 127-149, January 2008.

[2] P. A. Bernstein and E. Newcomer, Principles of Transaction Processing (Second Edition), San Francisco, California: Morgan Kaufmann Publishers, Inc., 2009.

[3] B. Kao and H. Garcia-Molina, "An Overview of Real-Time Database Systems," in Proceedings of the NATO Advanced Study Institute on Real-Time Computing, St. Maarten, Netherlands, 1992, pp. 463-486.

[4] A. Buchmann, "Real-Time Databases," Encyclopedia of Database Technologies and Applications, pp. 524-529, 2005.

[5] C. F. Yeung and S. L. Hung, "A new deadlock detection algorithm for distributed real-time database systems," in 14th Symposium on Reliable Distributed Systems, Bad Neuenahr, 1995, pp. 146-153.

[6] P. A. Bernstein, V. Hadzilacos and N. Goodman, Concurrency Control and Recovery in Database Systems, M. A. Harrison, Ed., Menlo Park, California: Addison-Wesley Publishing Company, 1987, pp. 265-304.
[7] P. K. Reddy and M. Kitsuregawa, "Speculative Locking Protocols to Improve Performance for Distributed Database Systems," IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 154-169, February 2004.

[8] W. Haque and P. R. Stokes, "Adaptive speculative locking protocol for distributed real-time database systems," in Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, Cambridge, Massachusetts, 2007, pp. 382-390.

[9] P. R. Stokes, Design and Simulation of an Adaptive Concurrency Control Protocol for Distributed Real-Time Database Systems, Master's Thesis, University of Northern British Columbia, Prince George, Canada, 2007.

[10] O. Ulusoy, "Distributed Concurrency Control," in Real-Time Database Systems: Architecture and Techniques, Norwell, Massachusetts, USA: Kluwer Academic Publishers, 2001, pp. 205-215.

[11] D. Agrawal, A. El Abbadi and R. Jeffers, "Using delayed commitment in locking protocols for real-time databases," in Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, 1992, pp. 104-113.

[12] A. El Abbadi and S. Toueg, "Availability in partitioned replicated databases," in Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Cambridge, Massachusetts, United States, 1986, pp. 240-251.

[13] K. Ramamritham, "Real-Time Databases," International Journal of Distributed and Parallel Databases, vol. 1, pp. 199-226, 1996.

[14] R. Obermarck, "Distributed Deadlock Detection Algorithm," ACM Transactions on Database Systems, vol. 7, pp. 187-208, 1982.

[15] M. Singhal, "Deadlock Detection in Distributed Systems," Computer, vol. 22, no. 11, pp. 37-48, 1989.

[16] W. Ding and R. Guo, "Design and Evaluation of Sectional Real-Time Scheduling Algorithms Based on System Load," in The 9th International Conference for Young Computer Scientists, Hunan, 2008, pp. 14-18.

[17] D. Levine, C. Gill and D. Schmidt, "Dynamic Scheduling Strategies for Avionics Mission Computing," in Digital Avionics Systems Conference, Bellevue, WA, USA, 1998, pp. 1-8.

[18] L. Gruenwald, M. Montealegre and C. N. Lau, "Performance Comparison of Scheduling Techniques to Manage Transactions for Real-Time Mobile Databases in Ad Hoc Networks," [Online]. Available: http://paginas.usco.edu.co/proyeccion/Documentos/entornos/entornos20/ArticuloMatildeMontealegre.pdf.

[19] [Online]. Available: http://www.personal.kent.edu/~rmuhamma/OpSystems/os.html.

[20] J. R. Haritsa, M. Livny and M. J. Carey, "Earliest Deadline Scheduling for Real-Time Database Systems," in Proceedings of the IEEE Real-Time Systems Symposium, San Antonio, TX, USA, 1991, pp. 232-243.

[21] K. W. Lam and S. L. Hung, "A pre-emptive transaction scheduling protocol for controlling priority inversion," in Third International Workshop on Real-Time Computing Systems and Applications, Seoul, 1996, pp. 144-151.

[22] L. Sha, R. Rajkumar and J. P. Lehoczky, "Priority Inheritance Protocols: An Approach to Real-Time Synchronization," IEEE Transactions on Computers, vol. 39, pp. 1175-1185, September 1990.

[23] P. S. Yu, K.-L. Wu, K.-J. Lin and S. H. Son, "On real-time databases: Concurrency control and scheduling," Proceedings of the IEEE, pp. 140-157, 1994.

[24] Z. Kedem and A. Silberschatz, "Controlling concurrency using locking protocols," in 20th Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, 1979, pp. 274-285.

[25] Z. M. Kedem and A. Silberschatz, "Locking Protocols: From Exclusive to Shared Locks," Journal of the ACM (JACM), vol. 30, no. 4, pp. 787-804, October 1983.
[26] D. Skeen, "Nonblocking commit protocols," in Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, Ann Arbor, Michigan, 1981, pp. 133-142.

[27] R. Gupta, J. Haritsa and K. Ramamritham, "Revisiting Commit Processing in Distributed Database Systems," in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, vol. 26, no. 2, pp. 486-497, 1997.

[28] Y. Al-Houmaily, P. K. Chrysanthis and S. P. Levitan, "An argument in favor of the presumed commit protocol," in Proceedings of the IEEE International Conference on Data Engineering, Birmingham, 1997, pp. 255-265.

[29] J. R. Haritsa and K. Ramamritham, "Adding PEP to real-time distributed commit processing," in Proceedings of the 21st IEEE Real-Time Systems Symposium, Orlando, USA, 2000, pp. 37-46.

[30] M. Abdallah, R. Guerraoui and P. Pucheral, "One-Phase Commit: Does It Make Sense?," in Proceedings of the International Conference on Parallel and Distributed Systems, Tainan, Taiwan, 1998, pp. 14-16.

[31] M. Atif, "Analysis and Verification of Two-Phase Commit & Three-Phase Commit Protocols," in International Conference on Emerging Technologies, 2009, pp. 326-331.

[32] P. A. Bernstein and N. Goodman, "Concurrency Control in Distributed Database Systems," ACM Computing Surveys, vol. 13, no. 2, pp. 185-221, 1981.

[33] L. Sha, R. Rajkumar and J. P. Lehoczky, "Concurrency Control for Distributed Real-Time Databases," SIGMOD Record, vol. 17, pp. 82-98, March 1988.

[34] G. Buttazzo, M. Spuri and F. Sensini, "Value vs. Deadline Scheduling in Overload Conditions," in Proceedings of the 16th IEEE Real-Time Systems Symposium, Pisa, Italy, 1995, pp. 90-99.

[35] S. Asiaban, M. E. Moghaddam and M. Abbaspur, "A Real-Time Scheduling Algorithm for Soft Periodic Tasks," JDCTA: International Journal of Digital Content Technology and its Applications, vol. 3, pp. 100-111, 2009.

[36] W. Li, K. Kavi and R. Akl, "A non-preemptive scheduling algorithm for soft real-time systems," Computers and Electrical Engineering, vol. 33, no. 1, pp. 12-29, January 2007.

[37] J. R. Haritsa, K. Ramamritham and R. Gupta, "The PROMPT real-time commit protocol," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 2, pp. 160-181, 2000.

[38] M. Abdallah and P. Pucheral, "A Non-Blocking Single-Phase Commit Protocol for Rigorous Participants," in Proceedings of the National Conference Bases de Données Avancées, Grenoble, France, 1997.

[39] A. T. Tai and J. F. Meyer, "Performability Management in Distributed Database Systems: An Adaptive Concurrency Control Protocol," in Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, San Jose, CA, USA, 1996, pp. 212-216.

[40] P. K. Reddy and M. Kitsuregawa, "Improving performance in distributed database systems using speculative transaction processing," in Second IASTED European Parallel and Distributed Systems Conference, Vienna, Austria, 1998, pp. 275-285.

[41] D. T. Marr et al., "Hyper-Threading Technology Architecture and Microarchitecture," Intel Technology Journal, vol. 6, pp. 4-15, 2002.

[42] T. Ragunathan and P. K. Reddy, "Performance enhancement of read-only transactions using speculative locking protocol," in Sixth Annual Inter Research Institute Student Seminar in Computer Science (IRISS 2007), Hyderabad, India, 2007.
[43] T. Ragunathan and P. K. Reddy, "Improving the performance of read-only transactions through asynchronous speculation," in HPCS 2008, San Diego, CA, USA, 2008, pp. 467-474.

[44] T. Ragunathan and P. Krishna Reddy, "Exploiting Semantics and Speculation for Improving the Performance of Read-only Transactions," in International Conference on Management of Data (COMAD 2008), Mumbai, India, December 17-19, 2008, pp. 162-173.

[45] T. Ragunathan, P. Krishna Reddy and M. Goyal, "Semantics-Based Asynchronous Speculative Locking for Improving the Performance of Read-only Transactions," in Proceedings of the 2010 Spring Simulation Multiconference, Orlando, Florida, 2010, pp. 238:1-238:4.

[46] W. Haque and P. R. Stokes, "Simulation of a complex distributed real-time database system," in Proceedings of the 2007 Spring Simulation Multiconference, vol. 2, pp. 359-366, 2007.

[47] A. Maria, "Introduction to Modeling and Simulation," in Winter Simulation Conference, Atlanta, Georgia, United States, 1997, pp. 7-13.

[48] J. Banks and J. S. Carson, "Introduction to Discrete-Event Simulation," in Proceedings of the 1986 Winter Simulation Conference, pp. 17-23, 1986.

[49] A. Silberschatz, H. F. Korth and S. Sudarshan, Database System Concepts, New York, NY, USA: McGraw-Hill, Inc., 2006.

[50] Sun Microsystems, "Java Language Specification," Palo Alto, California: Sun Microsystems, vol. 1.4.2, 2003.

[51] C. S. Mullins, "Defining Database Performance," October 2010. [Online]. Available: http://www.dbta.com/Articles/Columns/DBA-Corner/Defining-Database-Performance-70236.aspx.

[52] G. C. Buttazzo and J. A. Stankovic, "RED: Robust Earliest Deadline Scheduling," in Proceedings of the 3rd International Workshop on Responsive Computing Systems, pp. 100-111, 1993.

[53] C. L. Liu and J. W. Layland, "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment," Journal of the ACM (JACM), vol. 20, no. 1, pp. 46-61, January 1973.

[54] W. Haque, Q. Pai and S. N. Mahmud, "Effect of Network Topology on the Performance of Adaptive Speculative Locking Protocol," in Parallel and Distributed Computing and Systems, Dallas, USA, 2011.