PREDICTION AND PREEMPTIVE CONTROL OF NETWORK CONGESTION IN DISTRIBUTED REAL-TIME ENVIRONMENT

by

Ramandeep Dhanoa

B.Tech., Rajasthan Technical University, Jaipur, India, 2012

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

January 2016

© Ramandeep Dhanoa, 2016

Abstract

Due to the ever-increasing demand for network capacity, the congestion problem continues to grow. Congestion results in queuing within the network, packet loss and increased delays. It should be controlled to increase system throughput and quality of service. Existing congestion control approaches, such as source throttling and re-routing, focus on controlling congestion after it has already happened. However, it is much more desirable to predict future congestion based on the current state and historical data, so that efficient control techniques can be applied to prevent congestion from happening. We have proposed a Neural Network Prediction-based Routing (NNPR) protocol to predict as well as control network traffic in a distributed real-time environment. A distributed real-time transaction processing simulator (DRTTPS) has been used as the test-bed. For predictions, a multi-step neural network model is developed in SPSS Modeler, which predicts future congestion. The ADAPA (Adaptive Decision and Predictive Analytics) scoring engine has been used for real-time scoring. An ADAPA wrapper calls the prediction model through web services and predicts congestion in real time. Once predicted results are obtained, messages are re-routed to prevent congestion. To compare our proposed work with existing techniques, two routing protocols are also implemented - Dijkstra's Shortest Path (DSP) and Routing Information Protocol (RIP). The main metric used to analyze the performance of our protocol is the percentage of transactions which complete before their deadline. The NNPR protocol is analyzed with various simulation runs having parameters both inside and outside the neural network input training range. Various parameters which can cause congestion were studied. These include bandwidth, worksize, latency, max active transactions, mean arrival time and update percentage. Through experimentation, it is observed that NNPR consistently outperforms DSP and RIP for all congestion loads.

Acknowledgement

Firstly, I would like to thank Dr. Waqar Haque for his guidance, motivation, enthusiasm, and immense knowledge in this field. His guidance and support helped me overcome many difficult situations in this journey. Throughout this research, his professionalism, attention to detail, and high standards of quality tremendously helped me to find myself in a better academic standing. I could not have achieved my goal without his constant support and generosity. I also would like to express appreciation to my supervisory committee, Dr. Alex Aravind and Dr. Iliya Bluskov, for their support and direction in my research. I would like to thank my colleagues for their ongoing support in discussing new ideas and problems. Last but not least, I would like to thank my parents, my siblings, and my husband for their patience and encouragement, and for the sacrifices they have made while I completed this research.

Contents

Abstract
Acknowledgement
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Transaction Execution
  1.2 Distribution of Data
  1.3 Transaction Deadlines and Priority Scheduling
  1.4 Network Congestion and Control Techniques
  1.5 Data Mining and Predictive Analytics
  1.6 Contribution
  1.7 Thesis Organization

2 Related Work
  2.1 Congestion Control Mechanisms
    2.1.1 Slow Start Congestion Control
    2.1.2 Fast Retransmit and Fast Recovery
    2.1.3 Random Early Detection
    2.1.4 Back Pressure Technique
    2.1.5 Choke Packet Technique
    2.1.6 Implicit and Explicit Congestion Notification
  2.2 Congestion Control using Predictive Analytics
    2.2.1 Time Series Prediction
    2.2.2 Neural Networks
      2.2.2.1 Congestion control by throttling the source
      2.2.2.2 Congestion control by Re-routing
    2.2.3 Fuzzy Logic
    2.2.4 Other Techniques and Applications
  2.3 Summary

3 Simulator Architecture, Neural Networks and Cloud-Computing
  3.1 The Simulator - DRTTPS
    3.1.1 Network Architecture
    3.1.2 Node Architecture
      3.1.2.1 Concurrency Control Protocol
      3.1.2.2 Preemption Protocol
      3.1.2.3 Routing Protocols
    3.1.3 Simulation Set-up
    3.1.4 Performance Analysis
  3.2 Neural Networks
    3.2.1 Types of Neural Networks
      3.2.1.1 Single Layer Perceptron
      3.2.1.2 Multi-layer Perceptron
    3.2.2 Neural Network Training Process
      3.2.2.1 Preparing Input Data
      3.2.2.2 Neural Network Type and Architecture
      3.2.2.3 Neural Network Training and Configuration
      3.2.2.4 Analyze Model Performance
  3.3 Cloud Computing
    3.3.1 Model Deployment Process
    3.3.2 ADAPA in the Amazon Cloud

4 Prediction Model and Implementation
  4.1 Network Congestion Prediction Model
    4.1.1 Determining Inputs of Network Model
    4.1.2 Neural Network Model
  4.2 Implementation
    4.2.1 Prediction Module
    4.2.2 Trace Module
      4.2.2.1 Tracer Class
    4.2.3 ADAPA Wrapper
      4.2.3.1 ModelExecution Class
    4.2.4 Routing Protocols
    4.2.5 Execution Flow
  4.3 Summary

5 Experiments and Analysis of Results
  5.1 Prediction Model Testing
    5.1.1 Model testing with non-trained parameters, but within the trained input range
      5.1.1.1 Run 1 - High Congestion Load
      5.1.1.2 Run 2 - Medium Congestion Load
      5.1.1.3 Run 3 - Negligible Congestion Load
    5.1.2 Model testing with parameters outside the input training range
    5.1.3 Summary
  5.2 Comparison between DSP, RIP, and NNPR
    5.2.1 Impact of Bandwidth
    5.2.2 Impact of Page Update Rate
    5.2.3 Impact of MAT
    5.2.4 Impact of Latency
    5.2.5 Impact of Work-size
    5.2.6 Summary

6 Conclusion and Future Work
  6.1 Future Work

Bibliography

List of Figures

1.1 Life Cycle of a Transaction
1.2 Partitioned vs. Replicated Data Distribution
1.3 Transaction Deadlines [1]
1.4 Data Mining Process
2.1 Slow Start Algorithm [2]
2.2 Fast Retransmit and Fast Recovery Algorithm [2]
2.3 Back Pressure Technique [3]
2.4 Choke Packet Technique [3]
2.5 Neural Network Structure [4]
3.1 Network Architecture [5]
3.2 Workload Generator Attributes
3.3 Shortest path between Node 1 and 8 using DSP Algorithm
3.4 Variation Container
3.5 Report Tool
3.6 Neuron Design [6]
3.7 Neural Network Training Process [7]
3.8 PMML Sample Code
3.9 Predictive Model Deployment Process
3.10 ADAPA Control Centre Interface
4.1 Bandwidth vs PTCT
4.2 MAT vs PTCT
4.3 Update vs PTCT
4.4 Latency vs PTCT
4.5 Worksize vs PTCT
4.6 Snapshot of links at 200th Tick
4.7 Neural Network Model
4.8 Type Node
4.9 Neural Network Model Summary View
4.10 Predictive Importance Chart
4.11 Architecture
4.12 Trace Module Class Diagram
4.13 ADAPA Wrapper Class Diagram
4.14 Input XML Format
4.15 Execution Flow
5.1 Hyper-Cube Network Topology
5.2 Analysis of Link 0-2
5.3 Analysis of Link 1-3
5.4 Analysis of Link 6-7
5.5 Analysis of Link 4-6
5.6 Analysis of Link 5-7
5.7 Analysis of Link 2-3
5.8 Analysis of Link 3-2
5.9 Analysis of Link 1-5
5.10 Analysis of Link 7-5
5.11 Analysis of Link 6-2
5.12 Analysis of Link 4-6
5.13 Analysis of Link 5-1
5.14 Analysis of Link 2-6
5.15 Performance of Prediction Model
5.16 Impact of Bandwidth (MAT - 30)
5.17 Impact of Bandwidth (MAT - 60)
5.18 Impact of Page Update Rate (Bandwidth - 15)
5.19 Impact of Page Update Rate (Bandwidth - 5)
5.20 Impact of MAT
5.21 Impact of Latency
5.22 Impact of Worksize

List of Tables

3.1 Routing table for Node 1
4.1 Parameters Settings
4.2 Congestion Load Ranges
4.3 Neural Network Training Simulation Runs
4.4 Neural Network Settings
5.1 Neural Network Testing Simulation Runs
5.2 Prediction Result Snapshot of Link 4-6
5.3 Neural Network Testing Simulation Runs (outside input training ranges)
5.4 Baseline Experiment Parameters Settings

Chapter 1

Introduction

A database system is a collection of information shared by many users [1]. In today's world, databases and database systems have become a crucial part of life. When a database system is accessed, read and write operations are executed. When these operations are executed on the database in a particular order, they form a transaction [1]; depositing money in a bank is one example. To maintain the integrity and consistency of the database, a transaction must follow the ACID properties: atomicity, consistency, isolation and durability [8]. Atomicity ensures that a transaction fails as a whole if any part of it fails. Consistency ensures that a transaction always leaves the database in a consistent state, whether it completes or not. When many transactions execute concurrently, isolation guarantees that they run independently of each other. Durability implies that a committed transaction cannot be undone, even if the system crashes. Database systems can be classified as centralized and distributed. In a centralized database system, data is stored and maintained on a single node, whereas in a distributed database system many nodes run independently at different locations, and they interact and share data with each other via a communication network [9]. In distributed database systems, the availability of data can be augmented by replication. In a fully replicated system, each node stores the entire database, thus allowing easy access to all the database entities. In such a system, there is no risk of data loss if any node fails.
However, full replication also raises issues related to concurrency control. In a partially replicated system, the database is replicated on a subset of the network nodes. In a real-time system, every transaction has a specified time allocated to complete. If a transaction does not complete in the assigned time, it is called a tardy transaction [8]. The system's throughput is determined by the transactions completed in the assigned time. A distributed real-time database system (DRTDBS) [10] is a collection of databases scattered over different data sites connected via a communication network, where transactions have both consistency and timing constraints. The primary objective of a DRTDBS [5] is to increase the number of transactions completed before their deadlines, while maintaining the serializability of transactions. If the outcome of transactions running concurrently is the same as that of running them serially, the transactions are said to be serializable. In a DRTDBS, nodes communicate through messages by finding an efficient link, as specified by the network. A link (pipe) has parameters such as latency, bandwidth, source node and destination node. Latency is the amount of time taken to send a message from the source to the destination node. Bandwidth represents the total amount of data which can be passed through the link at any given time. The number of messages that can be sent from source to destination thus depends on the link's bandwidth; if more messages arrive on the link than its bandwidth allows, they wait in a queue. Because of resource and time constraints, every computer tries to complete its operations in the assigned time. However, when the resource requirement surpasses the capacity of the network, congestion occurs [11]. In a distributed network, congestion occurs when a network link has queued messages waiting to get processed, thereby decreasing the quality of service. Effects of congestion include increased delay and the dropping of packets. Congestion should be controlled to improve system throughput and quality of service. There are many techniques to control congestion [11], but these techniques act only after congestion has happened. With the increased demand for highly efficient networks, it is important to analyze potential network traffic beforehand and predict it [12], so that efficient control techniques can be applied. Our hypothesis is that network congestion can be reduced by predicting traffic and dynamically re-routing messages, thus increasing the percentage of transactions completed before their deadlines.

1.1 Transaction Execution

A transaction is a list of operations executed as a basic unit of data processing in database systems. A transaction $T$'s operations to read and write a database object $O$ are denoted by $R_T(O)$ and $W_T(O)$, respectively. A transaction can be either local or global, depending on the way it needs to access data [1]. A local transaction demands data access at its local node only, whereas a global transaction requires data access at multiple nodes and may spawn many sub-transactions. In a DRTDBS, the execution of a transaction is managed by processes executing on different nodes, which collaborate between a transaction and its sub-transactions. The process executing on the node where the transaction originates is called the coordinator, and the processes executing at other nodes on behalf of the coordinator, which support the global execution of the transaction by synchronizing it with its sub-transactions, are called cohorts [1].
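To make this notation concrete, the sketch below shows one plausible way to represent a transaction's operations and derive its cohorts. This is an illustrative Java sketch, not code from DRTTPS; the class and field names are assumptions.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// One read or write operation on a database object, i.e. R_T(O) or W_T(O).
record Operation(char type, String object, int node) {} // type is 'R' or 'W'

class Transaction {
    final int homeNode;        // node where the transaction originates (the coordinator)
    final List<Operation> ops; // operations in execution order

    Transaction(int homeNode, List<Operation> ops) {
        this.homeNode = homeNode;
        this.ops = ops;
    }

    // Local if every operation touches data on the home node; otherwise global.
    boolean isLocal() {
        return ops.stream().allMatch(op -> op.node() == homeNode);
    }

    // A global transaction spawns one cohort per remote node it must access.
    Set<Integer> cohortNodes() {
        return ops.stream().map(Operation::node)
                  .filter(n -> n != homeNode)
                  .collect(Collectors.toSet());
    }
}
```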
When the cohorts complete their tasks, they send acknowledgments to their coordinator, resulting in the successful execution of the transaction. As a transaction executes, it goes through various stages, as shown in Figure 1.1.

Figure 1.1: Life Cycle of a Transaction

• Active State - In this stage, the transaction starts its execution.
• Partially Committed - In this state, as cohorts complete their assigned tasks, they send an acknowledgment to the coordinator process.
• Aborted - If the transaction's execution is interrupted by terminating the current ongoing operation, it gets aborted. The transaction is then rolled back, i.e., the database is returned to its starting state. Once the transaction is aborted, it can be restarted or terminated.
• Committed - When the coordinator receives an acknowledgment from all the cohorts, the transaction successfully completes, or commits. All the changes resulting from this transaction are permanently applied to the database.
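The life cycle above is essentially a small state machine. The hedged sketch below captures the states of Figure 1.1 and the transitions described in the bullets; it is illustrative Java, not DRTTPS code.

```java
// States from Figure 1.1 and the transitions between them.
enum TxState { ACTIVE, PARTIALLY_COMMITTED, ABORTED, COMMITTED }

class TransactionLifeCycle {
    private TxState state = TxState.ACTIVE;
    private final int cohortCount;
    private int acksReceived = 0;

    TransactionLifeCycle(int cohortCount) { this.cohortCount = cohortCount; }

    // Called when a cohort reports that its sub-transaction has finished.
    void onCohortAck() {
        if (state == TxState.ABORTED || state == TxState.COMMITTED) return;
        acksReceived++;
        state = (acksReceived < cohortCount)
                ? TxState.PARTIALLY_COMMITTED // some, but not all, cohorts acknowledged
                : TxState.COMMITTED;          // coordinator heard from every cohort
    }

    // Called when an ongoing operation is terminated: roll back and abort.
    void onInterrupt() {
        if (state == TxState.COMMITTED) return; // a committed transaction cannot be undone
        state = TxState.ABORTED;                // database returns to its starting state
    }
}
```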
In real-time systems, the availability of data at the required time is very significant. When the entire database is stored on each node, data is available to a transaction on its local node. However, when the database is partitioned across the nodes, the data may not be directly accessible. Hence, the type of data distribution decides the data's accessibility.

1.2 Distribution of Data

In a DRTDBS, the database is distributed across the system, and the system's performance significantly depends on the way the data is distributed. There are two ways in which data can be distributed: partitioned and replicated.

• In partitioned distribution, there is only one copy of the database, which is distributed across some or all nodes as shown in Figure 1.2. An advantage of such distribution is that it enhances system performance because of easy management, back-up, and restore of data. Since a small volume of data is accessed, data retrieval operations are also efficient. On the contrary, a drawback is that the failure of one node may lead to the breakdown of the entire system, because the failed node may hold high-value data that is accessed by different nodes.
• A partially replicated system stores copies of database fragments at each node. In a fully replicated system, there are multiple copies of the database and each node stores one copy of the entire database. This increases the availability of data, facilitating transaction execution. Even if one node fails, the system will not collapse because other nodes hold the required data. However, maintaining consistency is the main problem here. To prevent inconsistencies, if one copy of the database gets updated, that change should be reflected in the other copies as well. This can be achieved by concurrency control protocols [13].

Figure 1.2: Partitioned vs. Replicated Data Distribution

The primary factors affecting the decision to use data replication are database size and usage frequency. When the usage frequency is high, data replication decreases the cost of data retrieval operations [14].

1.3 Transaction Deadlines and Priority Scheduling

In a DRTDBS, the performance of the system depends on the number of transactions completing before their deadlines. In general, a transaction's deadline can be classified into three types, namely hard, soft and firm deadlines [1]. Each type of deadline has an associated value that declines as a function of completion time if the deadline is missed.

Figure 1.3: Transaction Deadlines [1] (value vs. time curves for hard, soft and firm deadlines)

• With a hard deadline, a transaction is assigned a rigid deadline to complete its operations. If the deadline is missed, the value becomes negative, indicating disastrous consequences for the system. Examples of hard-deadline applications include radar tracking systems and military applications. To handle hard-deadline applications effectively, there is a need to design a fault-tolerant system. Redundancy is a standard approach to attain fault tolerance [15]. There are three types of redundancy: hardware redundancy, software redundancy, and time redundancy [15]. Hardware redundancy means critical hardware components of the system are duplicated; when a failure occurs, the duplicated components replace the faulty ones. Software redundancy refers to the duplication of software components which are functionally identical; it comes into effect when the working software crashes or fails. Time redundancy gives extra time to re-execute failed operations, whatever the kind of failure.
• A soft deadline allows a transaction to miss its deadline and allocates extra time (slack time) to complete its operations. Once the deadline is missed, the value starts decreasing, but the transaction is still allowed to complete its tasks. If the transaction has still not completed within the additional allocated time, its value becomes zero. Examples are stock trading and payment of taxes.
• A firm deadline is the same as a soft deadline, except that it does not provide extra time for the transaction to complete its operations. Once the deadline is missed, the value becomes zero. For example, incorrect forecasting by a sales company can lead to a great loss in terms of revenue.
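The value curves of Figure 1.3 can be written down directly. The sketch below is a hedged Java illustration of the three deadline types; the linear decay for soft deadlines and all parameter names are assumptions, not definitions from the thesis.

```java
// Value of completing a transaction at time t for each deadline type (Figure 1.3).
enum DeadlineType { HARD, SOFT, FIRM }

final class TransactionValue {
    // fullValue: value when the deadline is met; slack: extra time a soft
    // deadline grants before the value decays to zero. Illustrative only.
    static double valueAt(DeadlineType type, double t, double deadline,
                          double fullValue, double slack) {
        if (t <= deadline) return fullValue;   // completed on time
        switch (type) {
            case HARD:
                return -fullValue;             // missing it is disastrous
            case FIRM:
                return 0.0;                    // a late result is worthless
            case SOFT: {
                double late = t - deadline;    // value decays over the slack time
                return late >= slack ? 0.0 : fullValue * (1.0 - late / slack);
            }
            default:
                return 0.0;
        }
    }
}
```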
Priorities are allocated to transactions in order to define their significance and order of execution. Priority assignment algorithms can be predominantly categorized into three types: static, dynamic and hybrid [1]. In a static priority scheduling algorithm, transactions' priorities are fixed and do not change once they are assigned. When priorities are assigned at run time, it is called dynamic scheduling of transactions. A dynamic scheduling algorithm assigns priority to a transaction considering factors such as deadline, slack time (extra time given to a transaction after its deadline to complete its operations), execution time, and others. A hybrid algorithm is a mixture of static and dynamic priority scheduling, i.e., some transactions have fixed priorities, while others are assigned priorities at run time. Some examples of priority scheduling algorithms are described below [8]:

• Earliest Deadline First (EDF): In EDF, the transaction with the closest deadline is allocated the highest priority. A limitation of this algorithm is that it may assign a higher priority to a transaction which has almost missed its deadline, thus blocking other transactions which could still complete before their deadlines.
• First Come First Serve (FCFS): According to FCFS, the transaction with the earlier arrival time is designated the higher priority. It does not consider the deadline information of the transactions, and may assign a higher priority to a transaction having a longer deadline and vice versa.
• Shortest Job First (SJF): SJF assigns the highest priority to the transaction having the minimum execution time. However, SJF does not perform well in applications where the execution time of transactions cannot be computed beforehand.
• Minimum Slack First (MSF): According to MSF, the transaction having the minimum slack time is assigned the highest priority.

These priority scheduling algorithms decide the order in which transactions execute in congested scenarios, hence increasing the overall performance of the system.

1.4 Network Congestion and Control Techniques

Network congestion is defined as a state in which the quality of service deteriorates because of an increase in network load [11]. It occurs when the data arriving on a node's link exceeds its bandwidth capacity. During congestion, data gets queued in the node's buffer, which increases packet delays and hence decreases the number of transactions completed before the deadline. Congestion cannot be determined from queue length alone; it also depends on the link's bandwidth and latency. For example, suppose Node A needs to send 40 message units to Node B. If link A-B has a bandwidth of 20 message units with 5 seconds latency, it will take 10 seconds to clear Node A's queue. But if the bandwidth is 5 and the latency 10 seconds, it will take 80 seconds to clear the queue. So, even though the queue length is the same in both scenarios, bandwidth and latency play a major role in determining the congestion of a link. Congestion can be classified as low, medium or high depending on a threshold time limit (the time taken to clear a node's queue). We define the congestion of a link $l$ as follows:

$$\text{Congestion}_l = \frac{\sum_{i=0}^{n} \text{size}_i}{BW_l} \times L_l \qquad (1.1)$$

where $\text{size}_i$ is the size of queued message $i$, $BW_l$ is the bandwidth of link $l$, and $L_l$ is the latency of link $l$.
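The following minimal Java sketch implements Equation 1.1 and reproduces the two scenarios from the example above; the class and field names are assumptions, not DRTTPS code.

```java
import java.util.ArrayList;
import java.util.List;

// A link with a queue of pending messages, per Equation 1.1.
final class Link {
    final double bandwidth;                        // message units per time unit (BW_l)
    final double latency;                          // time to traverse the link (L_l)
    final List<Integer> queue = new ArrayList<>(); // sizes of queued messages

    Link(double bandwidth, double latency) {
        this.bandwidth = bandwidth;
        this.latency = latency;
    }

    // Congestion_l = (sum of queued message sizes / BW_l) * L_l
    double congestion() {
        double total = queue.stream().mapToInt(Integer::intValue).sum();
        return total / bandwidth * latency;
    }

    public static void main(String[] args) {
        Link fast = new Link(20, 5);   // 40 queued units: (40/20)*5 = 10
        Link slow = new Link(5, 10);   // 40 queued units: (40/5)*10 = 80
        fast.queue.add(40);
        slow.queue.add(40);
        System.out.println(fast.congestion()); // 10.0
        System.out.println(slow.congestion()); // 80.0
    }
}
```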
In a DRTDBS, if the queues on individual nodes are not cleared for a prolonged time, transactions start missing their deadlines at a very high pace, consequently producing low system throughput. To be efficient, a DRTDBS should have a congestion control algorithm that can manage traffic efficiently. The aim of a congestion control algorithm is to increase network efficiency, i.e., the throughput of the network, by decreasing delay and packet drops. Congestion control schemes are classified as follows [11, 16]:

• Open loop control: Open loop control does not use any feedback to control the system; for example, a light switch automatically gets turned off after a certain duration [11]. It has no knowledge of the output state, i.e., it does not inspect the system dynamically. It is further divided into source control and destination control.
  - Source control [17] tries to control traffic at the source level by controlling the arrival rate of traffic. For example, the input buffer limit algorithm [18] controls traffic by applying a limit on the input buffer, which prevents excess input traffic from entering the system.
  - Destination control tries to control the traffic at intermediate or destination nodes. For example, the selective packet discarding policy [19] simply discards a packet when there are no empty buffers.
• Closed loop control: A closed loop system uses feedback from the output to control the system [20]; for example, when the room temperature is high, the thermostat sends a signal to the air conditioner to switch on the cooling unit. A closed loop system has complete knowledge of the output state, i.e., it analyzes network performance dynamically. It is further divided into global and local feedback control.
  - In global feedback control, feedback is attained from the complete path, i.e., destination to source. For example, rate-based control [21] examines the incoming traffic on each node and compares it with a threshold capacity. If the incoming traffic is greater than the threshold value, it tunes the rate of incoming traffic and broadcasts messages to all nodes in the network.
  - In local feedback control, feedback is attained just from the neighbour node, as in hop-by-hop control [16]. In this algorithm, each node monitors its outgoing link traffic to the neighbour node. It then determines the congestion status using feedback information from the neighbour and adjusts the traffic rate dynamically.

Many congestion control techniques (such as controlling the arrival rate of packets in the network, discarding packets, and routing techniques) have been proposed by researchers over many years [16, 18, 19, 22-24]. However, these algorithms come into effect only once congestion has been discovered by the system. An ideal approach would be to analyze the present data, predict network congestion and take corrective actions to prevent the congestion.

1.5 Data Mining and Predictive Analytics

Data mining is a mechanism to extract meaningful information from data through different mathematical algorithms [25]. This information can be discovered patterns or trends, correlations, or clusterings of data. Data mining is a crucial step in the process of predictive modelling. For example, to predict which customers are most likely to purchase a new product, data mining provides hidden and relevant information from the present data and identifies meaningful trends or patterns in it. By analyzing present scenarios and predicting events using statistical and mathematical techniques, predictive analytics determines uncertainty and risk within data, thereby deciding optimal outcomes for the system. Based on the business requirement, the predictive analytics life cycle has the following phases [25]:

• Data Understanding - This phase includes understanding business/system needs, determining system goals and planning accordingly.
• Data Preparation - Data preparation collects, selects and verifies data according to system objectives. In this phase, data cleaning, integration and transformation are the vital steps. Data cleaning handles missing or irrelevant information present in the data. Disparate data sources from different locations are integrated into one data source through data integration. Data transformation converts the format of data from one type to another.
• Data Modelling - In data modelling, a model is developed to achieve the business goals. Modelling techniques can be classified into the following categories [25]:
  - Clustering is the grouping of data objects based on similarity in one way or another (distance, nature, characteristics). Each group, called a cluster, is dissimilar to other clusters. Clustering helps to identify the natural groupings present in the data and is used in many applications such as pattern recognition and image processing.
  - Association modelling discovers relationships between one or more variables in a dataset. The relationships defined between attributes are called association rules. For example, customers who buy cheese in a grocery market are likely to buy bread as well; the bread and cheese relationship can be expressed as an association rule - cheese implies bread, with 90 percent probability.
  - Classification predicts and categorizes data based on its attributes. For example, a classification model can be used to identify a student as a good or bad scorer based on the student's attributes, such as attendance details and exam scores.
• Result Interpretation - The model is analyzed and evaluated in this phase. One way to evaluate a model is to randomly divide the large dataset into three sub-sets, which are then used for training, validation and testing. The training set is used to train the model; the validation set assesses the model's performance by analyzing the trained model; and the testing set is used to estimate the error rate of the model. If the results are not satisfactory, this phase goes back to the data preparation phase (Figure 1.4) to remove the model's deficiencies; otherwise, the next phase follows.
• Decision Optimization - In this final phase, an optimal decision is taken on the basis of the result evaluation phase. One best solution is chosen among many feasible solutions, considering factors like cost, profit, and others.

Figure 1.4: Data Mining Process

A desirable feature of analytics is to develop a predictive model which is readily available to users at different locations. The data generated by predictive models is extremely critical to businesses; therefore, the model should be accurate and accessible to decision makers on demand. The deployment of predictive models on the cloud allows users to deploy and score predictive models in real time (explained in Chapter 3).

1.6 Contribution

Due to the increase in network demand, the congestion problem is growing. Congestion results in queuing within the network, packet loss and increased delays. With an increased demand for high-performance networks, there is a need to increase the network's throughput and quality of service. Various congestion control techniques are discussed in [11, 16]. However, these techniques attempt to control congestion after it has already happened. It is important to proactively analyze network traffic and predict potential congestion before it occurs, so that efficient control techniques can be applied as a preemptive measure. We have proposed a protocol to predict congestion as well as control the network traffic in a distributed real-time environment, using the distributed real-time transaction processing simulator (DRTTPS) as a test-bed. For predictions, a multi-step neural network technique is used, which predicts congestion one step ahead, where one step is 100 ticks. After predicting the network traffic, the messages of links predicted to be congested are re-routed to other links. To compare the proposed work with other techniques, two routing protocols are implemented - Dijkstra's Shortest Path (DSP) algorithm [26] and the Routing Information Protocol (RIP) [27]. The primary metric used to analyze performance is the percentage of transactions completed before their deadline. Some of the key considerations of our work are listed below:

• To meet real-time system requirements, the proposed technique ensures that prediction results are attained in a reasonable time.
• Instead of reducing the traffic in the system or increasing the inter-arrival time of packets [17, 20, 21, 28, 29], network traffic is controlled by routing techniques.
• The invocation of the prediction model is a controllable parameter that is configured by the user according to the requirement (by default, it is set to 100 ticks).

1.7 Thesis Organization

The rest of the thesis is organized as follows.
In addition to the literature survey, Chapter 2 also contains some information about different data mining methods. Chapter 3 provides a brief discussion of our test-bed, DRTTPS, along with neural networks and cloud computing. Chapter 4 explains the prediction model developed to predict congestion and gives the implementation details. Chapter 5 presents the experimentation and analysis of results. Chapter 6 provides the conclusion and directions for future research.

Chapter 2

Related Work

With the growth in network applications, network congestion is growing rapidly. Congestion should be controlled not only to increase system throughput, but also to improve quality of service and data transmission reliability. Many congestion control schemes have been proposed by researchers to control and prevent congestion. This chapter is divided into two sections. The first section outlines general congestion control mechanisms proposed in the literature, and the second section describes congestion control using data mining and predictive algorithms.

2.1 Congestion Control Mechanisms

In the literature, many approaches have been proposed to handle congestion, whether by preventing, controlling or avoiding it. Some of these congestion control mechanisms are discussed in the following subsections.

2.1.1 Slow Start Congestion Control

Slow start [22] is a congestion control mechanism in which the sending rate of packets is proportional to the transmitting rate of the network. Initially, the sender determines the network's capacity by transmitting one packet to the receiver. When a node receives the packet, it sends an acknowledgement (ACK) to the sender. The sender then increases the congestion window (cwnd) size to 2 and sends 2 packets, as shown in Figure 2.1. After receiving each ACK from the receiver, it increases the congestion window size by 1. For example, when ACK 1 and ACK 2 are received, the congestion window size becomes 3 and 4, respectively (as shown in Figure 2.1). This results in exponential growth of the congestion window size over round trips. Once cwnd exceeds the threshold value, the slow start algorithm increases cwnd linearly rather than exponentially (i.e., the window size is increased by 1 for each round trip time).

Figure 2.1: Slow Start Algorithm [2]

Slow start is crucial in avoiding congestion collapse [30]. Congestion collapse occurs when the sending rate of packets exceeds the network capacity. Thus, when a sender sends a large file, the network can be overloaded and packets may start dropping. The slow start algorithm overcomes this problem but can cause delays in transmission, which can be a significant limitation for real-time applications.
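The window growth just described is easy to see in a few lines of code. The sketch below is an illustrative Java simulation of cwnd growth per round trip; the threshold value of 16 is an assumption.

```java
// Slow-start window growth: exponential until ssthresh, then linear.
public class SlowStart {
    public static void main(String[] args) {
        int cwnd = 1;       // congestion window, in packets
        int ssthresh = 16;  // threshold separating the two growth phases (assumed)
        for (int rtt = 1; rtt <= 10; rtt++) {
            System.out.printf("RTT %d: cwnd = %d%n", rtt, cwnd);
            if (cwnd < ssthresh) {
                cwnd *= 2;  // +1 per ACK, so the window doubles each round trip
            } else {
                cwnd += 1;  // linear growth once past the threshold
            }
        }
    }
}
```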
2.1.2 Fast Retransmit and Fast Recovery

Fast retransmit and fast recovery [22] is used to promptly recover lost packets. Unlike TCP's standard time-out mechanism (TCP being a connection-oriented protocol responsible for setting up connections and handling data communication over the network layer), it does not use a transmission time-out to resend packets. If a node receives a data segment which is not in sequence, it sends a duplicate acknowledgement (DUP ACK), i.e., an acknowledgement sent by a node to the sender upon receiving an out-of-order data packet. For example, when the sender transmits packet 1 to the receiver, the receiver sends ACK 1 to the sender and expects packet 2 next. Now, if packet 2 gets lost and the receiver receives packet 3, it sends ACK 1 to the sender again (as shown in Figure 2.2); this is a duplicate acknowledgement. If the sender receives three duplicate acknowledgements from the receiver, it considers the data packet (the packet number greater than the duplicate ACK number) lost and retransmits it. This algorithm works efficiently when data packet losses are not frequent.

Figure 2.2: Fast Retransmit and Fast Recovery Algorithm [2]

2.1.3 Random Early Detection

Random Early Detection (RED) is a congestion avoidance technique proposed by Floyd and Jacobson [23], which controls congestion by dropping packets when the queue size exceeds a threshold value. In this technique, once the average queue length surpasses the maximum threshold value, newly arriving packets are discarded, or marked by setting a bit in the packet's header, with a certain probability, where the probability is a function of the average queue size. If the average queue size is less than the minimum threshold value, packets are allowed to enter the queue. Contrarily, May et al. [31] argued and experimentally showed that when the average queue size exceeds the maximum threshold value, dropping good packets does not increase the performance of the system. They demonstrated that RED implemented with small buffers does not improve system throughput significantly, whereas large buffers increase system throughput but make parameter setting challenging.
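The RED drop decision can be sketched compactly. The following Java fragment is illustrative only; the thresholds, maximum drop probability and averaging weight are assumptions, not the values studied in [23].

```java
import java.util.Random;

// Sketch of the RED decision: accept below MIN_TH, drop above MAX_TH, and
// drop probabilistically in between, based on an averaged queue length.
public class RedQueue {
    static final double MIN_TH = 5, MAX_TH = 15, MAX_P = 0.1, W = 0.2; // assumed
    static double avg = 0;              // moving average of the queue length
    static final Random rng = new Random();

    // Returns true if the arriving packet should be dropped (or marked).
    static boolean onArrival(int queueLength) {
        avg = (1 - W) * avg + W * queueLength;  // update the average queue size
        if (avg < MIN_TH) return false;         // below minimum threshold: accept
        if (avg >= MAX_TH) return true;         // above maximum threshold: drop
        // Between thresholds: drop probability rises linearly toward MAX_P.
        double p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH);
        return rng.nextDouble() < p;
    }
}
```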
2.1.4 Back Pressure Technique

In the back pressure technique [32], once a node becomes congested, it stops receiving packets from its immediate upstream node. This may result in congestion at the immediate upstream nodes, which consequently stop receiving packets from their preceding nodes, as shown in Figure 2.3. The technique starts at the congested node and propagates in the direction opposite to the packet flow, back to the source node. However, this technique can be of limited value, since it can only be used for connection-oriented networks.

Figure 2.3: Back Pressure Technique [3]

2.1.5 Choke Packet Technique

A choke packet [32] is a packet generated by the congested node and transmitted back to the source node as a congestion notification, as shown in Figure 2.4. The source then has to reduce its sending rate until it stops receiving these packets. Unlike the back pressure technique, the choke packet technique gives the congestion warning directly to the source node, skipping intermediate nodes. Therefore, it is a fast technique for notifying the sender to decrease its sending rate. This technique works well when there are few source nodes causing congestion at a particular time; but when the congested node has queued data from different sources, it is difficult to determine where to transmit the choke packets.

Figure 2.4: Choke Packet Technique [3]

2.1.6 Implicit and Explicit Congestion Notification

In implicit signalling, the source node waits for a hint or signal to take congestion control steps in the system [24]. From those signals, the sender infers that there is congestion in the system. For example, when the sender does not receive confirmation of sent packets, it assumes there is congestion. As a result, the sender reduces its transmission rate.

Explicit Congestion Notification [24] is a congestion avoidance technique which signals congestion by setting a congestion notification bit rather than dropping packets. A Congestion Experienced (CE) bit is included in each packet's header, which is set by routers to signal congestion. So, when the network starts approaching a congested state, the receiver sends an acknowledgement to the sender. In response to this acknowledgement, the sender reduces the sending rate of packets and further informs the receiver about the reduction in sending rate, so that it stops resending acknowledgements.

2.2 Congestion Control using Predictive Analytics

In the developing world of networking, more emphasis is placed on the network's reliability and efficiency. To ensure this, network congestion should be handled in a way that minimizes network breakdowns. Network problems need to be detected as early as possible, so that preventive measures can be taken accordingly. Hence, historical network traffic needs to be analyzed to predict network behaviour, so that efficient congestion control techniques can be applied. There are two approaches to building a prediction model: empirical and analytical. The empirical approach is based on observation and experimentation; the model is tested on a real system or on a simulator representing the real system. Simulator-based models are the most common models for predicting network congestion, because a simulator captures the details of the underlying system and provides a variety of real-time scenarios to test the model. Analytical models, on the other hand, are less expensive and easy to evaluate, but complex to build. Some of the techniques proposed in the literature are discussed in the following subsections.

2.2.1 Time Series Prediction

To forecast, past data is analyzed to identify trends and seasonality. Using this information, data is projected into the future using statistical modelling techniques. A time series is an ordering of data noted at regular intervals of time, for example, the monthly sales of a store. In time series analysis it is very important to analyze each point's correlation (a measure of the degree of association) with the previous points in the series. Two functions to analyze this correlation are discussed in [33] and presented below:

• Autocorrelation function (ACF): the correlation of an observed value with its past values. For example, the autocorrelation of time series X at lag 2 is the coefficient of correlation between X(t) and X(t-2), and similarly between X(t-2) and X(t-4); it is then also expected that there is a correlation between X(t) and X(t-4).
• Partial autocorrelation function (PACF): the correlation of the observed data with its lag after removing the observed data's correlation with lower-order lags.

In order to predict through time series, the stationarity assumption should be satisfied, which is characterized by the ACF and PACF functions [34]. If the stationarity assumption is not satisfied, the Mean Square Error (MSE) will be high, leading to inaccurate predicted results. MSE indicates the difference between predicted and estimated values. Some models for time series prediction are described in [12] and summarized below:
• Auto Regressive (AR) model: The AR model is based on the premise that each observed value in the time series is correlated to its previous lag values. For example, if the order of the AR model in time series X is 3, then X(t-3) is required to predict X(t). This model is used when the present data is related to the previous data; for example, a student's previous scores are required to predict present scores.
• Moving Average (MA) model: In this model, a moving average is measured to analyze the seasonality or trend of the data. For example, a moving average order of 2 indicates that the deviation of the previous two values from the mean should be inspected to predict the current value. This model is generally used to forecast financial data and stock prices.
• Auto Regressive Moving Average (ARMA) model: This model combines the AR and MA models and is called the ARMA(p, q) model, where p is the order of the AR model and q is the order of the MA model.
• Auto Regressive Integrated Moving Average (ARIMA) model: When the data is not stationary, some differencing or transformation is applied to the data to satisfy the time series stationarity assumption. If the time series data is differenced by order d, the model is called ARIMA(p, d, q).

Different time series models have been used to predict network traffic. It is, however, assumed that the data is stationary [34]. Stationary data has constant mean, constant variance, and covariance that is independent of time. The stationarity assumption can be checked with the ACF and PACF. To predict network packets in [12], traffic was monitored for one year by connecting to an intra-network, and the total number of packets was measured every hour. The ACF and PACF conditions were not satisfied when predictions were made with the collected data, or even after transforming the data. The data was then grouped first by monthly, then by weekly periods; still the stationarity assumption was not satisfied. Finally, when the data was grouped on a daily basis, the stationarity assumption was satisfied and network packets were predicted using the AR model.

Jung et al. [12] used the AR model to predict network congestion, followed by another method [33] which focuses on controlling the traffic using routing techniques. Each node has its own routing table, and routing decisions are made using Dijkstra's algorithm. The proposed algorithm checks whether the predicted packets exceed the given bandwidth and updates the routing table accordingly. Network Simulator 2 (NS-2) [35] has been used as a test-bed to compare the proposed algorithm with the OSPF routing protocol and establish the algorithm's efficiency. This work is similar to our work, with the following exceptions:

• Unlike Jung et al.'s research, a neural network is used in this research to predict network congestion, because DRTTPS has many parameters affecting the congestion value (queue length), and a neural network can learn the complicated relationships between the parameters and predict non-linear complex functions. If one parameter is tweaked, the congestion value changes, and the neural model analyzes the congestion successfully (shown in Chapter 5).
• Our prediction model is robust in the sense that it can analyze and predict different types of congestion loads (low, medium or high) in the system.
• The routing algorithm is dynamic, i.e., if congestion is not predicted, then routing tables are not updated.

Long-range dependence and self-similarity over larger time spans are characteristics which should be captured by a good traffic model [36]. Zhou et al. [36] proposed a model which is a combination of the linear time series model ARIMA and the non-linear time series model GARCH (a non-linear model used to capture variance that changes over time [36]). Three separate time scales were used to predict the network traffic from one-step-ahead to k-step-ahead prediction. It captures both long-range and short-range dependence. The model was compared with the FARIMA model (which can capture the long-range and short-range dependent behaviour of real traffic [37]) to prove its efficiency.
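As a concrete illustration of the AR model discussed above, the sketch below computes a one-step-ahead AR(p) prediction in Java. The coefficients are assumed to have been fitted elsewhere (e.g., by least squares on stationary data); all values are illustrative.

```java
// One-step-ahead AR(p) prediction:
// x_hat(t) = c + a1*x(t-1) + a2*x(t-2) + ... + ap*x(t-p)
public class ArPredictor {
    final double intercept;  // c
    final double[] coeffs;   // a1..ap, most recent lag first (assumed pre-fitted)

    ArPredictor(double intercept, double[] coeffs) {
        this.intercept = intercept;
        this.coeffs = coeffs;
    }

    // history[history.length - 1] is x(t-1), the most recent observation.
    double predictNext(double[] history) {
        double xHat = intercept;
        for (int k = 0; k < coeffs.length; k++) {
            xHat += coeffs[k] * history[history.length - 1 - k];
        }
        return xHat;
    }

    public static void main(String[] args) {
        // AR(3): the three most recent hourly packet counts predict the next one.
        ArPredictor ar = new ArPredictor(10.0, new double[]{0.5, 0.3, 0.1});
        double[] hourlyPackets = {100, 120, 110, 130};
        System.out.println(ar.predictNext(hourlyPackets)); // 10 + 65 + 33 + 12 = 120.0
    }
}
```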
It captures long as well as 2 GARCH (36] is a non-linear time series model used to capture the varying variance over time. 23 short-range dependence. The model was compared with FARIMA 3 model to prove its efficiency. 2.2.2 Neural Networks Artificial neural network is influenced by the biological nervous system, such as brain. The basic units are neurons and these units are organized in layers [4]. There are three layers in neural network (Figure 2.5) explained below: Input layer Hidden layer Output layer Input #1 ~ Input #2 -. Output Input #4 -, Figure 2.5 : Neural Network Structure [4] • Input Layer: Units in the input layer are the inputs of neural network. • Hidden Layer: Depending on the network structure, there can be one or more hidden layers in the neural network. Hidden layer transforms the input data and capture non-linear dep endencies in the data through activation functions (explained in Chapter 3). 3 FARIMA [37] model is capable to capture the property of real traffic with long-range and short- range dependent behavior. 24 • Output Layer: Output layer represents the output of the system. The edges through which these units are connected in the layers have some weights , representing a numeric value controlling the input. Neural network learns through training i.e model is fed with many datasets of known outputs. As training continues, the model keeps on adjusting its weights according to the input , and gradually becomes more accurate. Because of an efficient learning mechanism of neural network, it can predict network congestion with good accuracy [20, 28]. After prediction, congestion can be controlled by many ways. One way is to throttle the input arrival rate [20, 28], or apply different routing mechanisms [38- 40]. The following subsections explain the congestion prediction done through neural network model and congestion control by throttling the source or applying different routing techniques. 2.2.2.1 Congestion control by throttling the source Bivens et al. [28] have used simple feed forward neural network to predict the source of congestion. Once the congestion is predicted, flow rate 4 restriction is applied to the source node which is responsible for congestion. This technique is able to detect and correct congestion in almost 90 percent of the cases (out of 31 cases , congestion is correctly predicted 27 times) [28]. Network Simulator (NS), a discrete event simulator, is used to model the traffic patterns. In this simulator, a topology is chosen where all the network nodes are trying to send messages to one node at a random bit rate to depict the real network traffic. For each node, statistics are recorded during the simulation run, such as the packets received and queue size. The bandwidth allocation to links is random and the latency is set to a constant value. A control agent has been implemented that executes at a polling interval and contains two programs - C wrapper and MetaNeural Network Software. The communication between NS simulator and these two programs is done through files. Initially, C wrapper is called from the simulator, which reads the files updated by the simulator and then performs calculations like the average number of packets, the variance of 4 Flow rate is the number of messages moving in a link/ pipe in a given time frame 25 packets, and the third moment 5 . These values are normalized and then sent to neural network to make a decision. 
Neural network program generates the prediction result in another file which is converted to a readable format by C wrapper so that the simulator can process the file. Once the congested node is predicted, bit rate is reduced by adding a small amount of time to the sending interval. When the incoming traffic packets exceed the outgoing packets, congestion is reduced by controlling the traffic rate [17, 20, 21 , 28, 29]. A feedback control algorithm is proposed in [21] to predict the buffer occupancy L step-ahead through multi-step neural network. It also estimates the resources required through Back-Propagation (BP) neural network which is then used by the source node to adjust the sent-out rate accordingly. With the help of simulation model, it is shown that high prediction accuracy can be achieved by using fewer predictive steps. Thottethodi et al. [17] proposed a self-tuned mechanism which throttles the source upon congestion detection. Congestion is estimated by comparing the global information of network with the threshold value. If the global estimate is larger than the threshold value, packet injection is controlled. Threshold value is not static; it gets changed through self-tuned mechanism. Liu et al. [29] considered queue length as a measure to estimate the performance of Asynchronous Transfer Mode (ATM) network. The congestion control of Available Bit Rate (ABR) service in ATM networks is achieved by implementing predictor and controller in the system using BP neural networks. The future arrival rate of traffic is predicted with metrics, such as past arrival rates and bandwidth. The controller predicts the queue length by taking inputs like predictive available bandwidth, queue length, control law 6 and their historical values. By using fairness algorithm at different connections, dynamic fair rate is allocated to each virtual circuit. At last , the performance of predictor and controller is compared to FARIMA model to prove its efficiency. 5 Third moment is used to define the skewness of numbers . 6 Control law computes source arrival rate 26 Fan and Mars [20] predicted the video traffic by finite impulse response (FIR) neural network and controlled congestion by throttling the input arrival rate. FIR neural network is a modification of conventional neural network, where network's weights are replaced by FIR linear filter. 11 FIR means that for an input excitation of finite duration, the output of the filters will also be of finite duration 11 [20]. Ogras and Marculescu [41] pred.icted the congestion on Network-on-Chip (NOC) and proposed a flow control algorithm which controls the total number of packets in the network. The primary drawback of both the aforementioned work is that they are reducing the input arrival rate, rather than controlling the existing congestion. 2.2.2.2 Congestion control by Re-routing With the increasing complexity of network structure, it is difficult to find the best path [30]. In [38], two approaches are implemented using neural networks to resolve the routing and congestion control problem. The first approach uses feed-forward neural networks. This neural network indicates whether the link is congested or not by receiving input , such as the average number of packets, the variance of packets and the polling flag of sending packets. The second approach uses a recurrent neural network to decide the complete path from source node to destination node. 
The neural network's inputs are the source node, destination node, link time costs and congestion status (the output obtained from the first neural network). The best path is obtained as an output through several iterations. Both approaches were applied to two different network topologies and demonstrated promising results.

Prediction models are much more efficient compared to mathematical models [39]. Mohan et al. [39] implemented two approaches to predict a congestion-free path. In the first approach, association rule mining and a traditional artificial neural network are used. Association rule mining defines the constraints, rules and statements derived from the data. The neural network takes inputs like packet drop, response time and node degree, and yields as output a congestion weight that is used to determine the best path. The second approach is an improved version of the feed-forward neural network, called the self-motivated functional link feed-forward neural network. With this improvement in the architecture, the neural network was trained with additional inputs to give the best reliable path. Finally, the two approaches were compared to show that the prediction error of the proposed work is smaller than that of the traditional feed-forward neural network.

Barabas et al. [40] incorporated a neural network into a multipath routing algorithm (the Situation Aware Multipath algorithm) to improve the performance of the congested network. The proposed work was compared with the OSPF and EMP routing protocols. Experimental results showed that the percentage of lost packets is reduced significantly and hence the performance of the network is improved.

2.2.3 Fuzzy Logic

The fuzzy logic technique has been used by many researchers to predict network congestion [42-44]. In this context, fuzzy logic scales the degree of congestion, rather than declaring a binary congested/uncongested state. Xiang et al. [42] developed a fuzzy neural network (a combination of fuzzy logic and a neural network) to predict the future arrival rate of traffic. After predicting the arrival rate, the queue length is calculated by the Lindley equation [45]. If the queue length is estimated to overflow, the encoding rate of the source is reduced by 25 percent of the current sending rate. The fuzzy logic approach was examined against a BP network and a no-feedback control method, and its packet discarding rate was found to be much smaller than that of these methods.

Swathiga and Chandrasekar [44] developed a fuzzy logic system to predict the congestion level in wireless sensor networks. The congestion level is scaled as low (A1), medium (A2) and high (A3), based on threshold values. After a certain interval, each node in the wireless sensor network measures its node degree, data arrival rate and queue length. These three values are accepted as input by the fuzzy logic system, and the congestion level is yielded as the outcome. If the congestion level is A1, no control algorithm is applied.
It has been shown that the proposed technique is superior by analysing a number of performance metrics such as average packet delivery ratio and packet drops. 2.2.4 Other Techniques and Applications A Kalman-filter based prediction technique is proposed in [47], [48]. Haught et al. [47] extended the work done by Stuckey et al. [48] on Kalman-filter based prediction. Stuckey et al. worked with very small network whereas Haught et al. are dealing with complex network structure. Kalman filters are placed on network links to record data periodically. The sample rate of the filter is one second. Periodically, it records the queue size and current arrival rate. It predicts the queue size at next interval based on t he current and past queue sizes. So, the arrival rate of the packet is controlled by analyzing the predicted queue size with the help of control algorithm implemented in the network. NS-2 is used to implement the proposed algorithm. A new system called Network Bandwidth Predictor (NBP) to forecast the network bandwidth is proposed in [49]. NBP uses Network Weather Service (NWS) 8 to gather traffic statistics. The raw data is further processed by Network Traffic Pre-Processor and neural network is trained with the input such as timestamp, minimum and maximum number of bits in one second in that bin size (the period at which user wants to make prediction) , average number of bits in one second, and the predicted value. By testing many real-time datasets, NBP prediction mechanism is shown to be superior to NWS. A Graphical User Interface (GUI) is also provided, which gives a report for analysis and accuracy comparison with NWS . Neural networks have also been used to predict road traffic with high accuracy 8 NWS is a methodology which measures the hop-by-hop available bandwidth on all links. 29 [50 , 51]. Yu et al. [50] successfully captured bursty nature of traffic by developing a back-propagation feedforward neural network model. Hussein Dia [51] has developed dynamic neural network models (time-lag recurrent network and hybrid networks) to predict short-term traffic. The models are trained with the speed measurements from the historical time intervals. Through experimentation, it has been proved that high degree of accuracy is obtained when speed data was predicted up to 5 (or 15 minutes) into the future. 2.3 Summary Congestion Control techniques using prediction algorithms are far more superior than general congestion control techniques in terms of network's performance. For congestion prediction, it has been demonstrated that neural network is a very feasible method because of its highly sophisticated learning mechanism and complex computational capability. To predict congestion, the input parameters for a predictive model are generally based on the packet arrival rate and queue length. Many solutions are proposed to control congestion by reducing/ controlling the sending rate. But decreasing traffic rate is like avoiding congestion, rather than controlling. More efficient techniques are needed to control the congestion without reducing the existing traffic rate. 30 Chapter 3 Simulator Architecture, Neural Networks and Cloud-Computing This chapter contains three sections. The first section explains the design and architecture of the distributed transaction processing simulator (DRTTPS), which is used as a test-bed in this research to predict network congestion. The second section gives an overview of neural networks, and the training process. 
A new system called Network Bandwidth Predictor (NBP) to forecast network bandwidth is proposed in [49]. NBP uses the Network Weather Service (NWS), a methodology which measures the hop-by-hop available bandwidth on all links, to gather traffic statistics. The raw data is further processed by a Network Traffic Pre-Processor, and a neural network is trained with inputs such as the timestamp, the minimum and maximum number of bits per second within the bin size (the period over which the user wants to make a prediction), the average number of bits per second, and the predicted value. Tested on many real-time datasets, NBP's prediction mechanism is shown to be superior to that of NWS. A Graphical User Interface (GUI) is also provided, which gives a report for analysis and accuracy comparison with NWS. Neural networks have also been used to predict road traffic with high accuracy [50, 51]. Yu et al. [50] successfully captured the bursty nature of traffic by developing a back-propagation feed-forward neural network model. Hussein Dia [51] developed dynamic neural network models (a time-lag recurrent network and hybrid networks) to predict short-term traffic. The models are trained with speed measurements from historical time intervals. Experimentation showed that a high degree of accuracy is obtained when speed data is predicted up to 5 (or 15) minutes into the future. 2.3 Summary Congestion control techniques using prediction algorithms are far superior to general congestion control techniques in terms of network performance. For congestion prediction, it has been demonstrated that the neural network is a very feasible method because of its highly sophisticated learning mechanism and complex computational capability. To predict congestion, the input parameters of a predictive model are generally based on the packet arrival rate and queue length. Many solutions have been proposed to control congestion by reducing or throttling the sending rate, but decreasing the traffic rate avoids congestion rather than controlling it. More efficient techniques are needed to control congestion without reducing the existing traffic rate. Chapter 3 Simulator Architecture, Neural Networks and Cloud-Computing This chapter contains three sections. The first section explains the design and architecture of the distributed real-time transaction processing simulator (DRTTPS), which is used as a test-bed in this research to predict network congestion. The second section gives an overview of neural networks and the training process. The last section discusses the importance of deploying predictive models on the cloud, and further describes ADAPA on the Amazon cloud and the model deployment process. 3.1 The Simulator - DRTTPS Studying and conducting experiments directly on a real system is not feasible due to cost, time, complexity and the error-prone nature of such experiments. Therefore, simulation is used to understand, model and analyze the system. A simulation model describes the real system's workflow and the relationships between different entities. Simulation of a system can be either continuous or discrete [52]. When the state of the system changes continuously over time, it is called continuous event simulation. For example, an airplane has the state variables velocity and position, which change continuously with respect to time. In discrete event simulation, the state of the system is based on the occurrence of events, which occur at discrete points in time. A banking system is an example of discrete event simulation, where the total number of customers is one of the state variables. This variable changes only upon the occurrence of events such as the arrival of a new customer or the departure of a customer after being served. To predict and control network congestion in a distributed real-time database system, the distributed real-time transaction processing simulator (DRTTPS) is used as a test-bed. DRTTPS [5] is a discrete event simulator developed to simulate a distributed real-time transaction-based database system. It enables the user to configure various parameters through its highly interactive graphical user interface. Different protocols, such as routing protocols, concurrency protocols, pre-emption protocols and priority protocols, can be added to DRTTPS to test and analyze their performance. Its discrete event simulation engine consists of the tick (simulation clock), entities, events, the event queue, the event scheduler and the event processor. A tick is the unit used in the simulator to measure discrete amounts of time. Events are created by entities and inserted in the event queue based on their execution time [5]. Examples of events in DRTTPS include sending a message from one node to another, a transaction arriving at a node and a transaction committing. The event scheduler extracts an event from the top of the event queue and calls the event processor. Once the event is processed completely, the state of the system is updated. Different simulator modules and their vital features are explained in detail in the following subsections.
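Before turning to those modules, the engine loop just described (extract the earliest event, advance the tick, process, repeat) can be summarized in a minimal sketch. The class and method names here are hypothetical, not DRTTPS's actual API.

```java
import java.util.PriorityQueue;

// Minimal sketch of a discrete event engine as described above.
// Names are illustrative assumptions, not DRTTPS's actual classes.
abstract class Event implements Comparable<Event> {
    final long tick;                        // scheduled execution time
    Event(long tick) { this.tick = tick; }
    abstract void process();                // updates the system state
    public int compareTo(Event other) { return Long.compare(tick, other.tick); }
}

final class EventScheduler {
    private final PriorityQueue<Event> eventQueue = new PriorityQueue<>();
    private long clock = 0;  // the simulation "tick"

    void schedule(Event e) { eventQueue.add(e); }

    // Repeatedly extract the earliest event and process it until the queue drains.
    void run() {
        while (!eventQueue.isEmpty()) {
            Event e = eventQueue.poll();
            clock = e.tick;   // advance the clock to the event's time
            e.process();      // e.g., message delivery, transaction arrival/commit
        }
    }
}
```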
3.1.1 Network Architecture The network is the core component of DRTTPS, having one or more sites connected to each other through a wide area network. Each site can have many nodes, and each node can have one or more real-time databases, as shown in Figure 3.1. Nodes are connected to each other through a local area network and communicate through messages using network connections. These network connections are called links (pipes), which are bi-directional in nature. Each node has its own routing table, which stores paths to all reachable destination nodes. If a link gets congested due to heavy workload, the routing protocol updates the routing table to deliver messages in a timely manner. Figure 3.1: Network Architecture [5] Each link or network connection has the following components (a sketch combining these attributes into a delivery-delay estimate is given at the end of this subsection): • Source Node: the node initiating the network connection. • Destination Node: the final node receiving the messages. A message originating from the source node can travel through one or more intermediate nodes to reach its final destination node. • Bandwidth: the maximum capacity (number of messages) which can be transferred from one node to another in a particular time period. Bandwidth is one of the major factors that affect network performance. • Latency: the time taken by a message to travel from one node to another. A high-latency network connection suffers long delays. • Queue Length: the number of messages waiting in a link's queue to be processed. When bandwidth is fully utilized, messages start queuing, and once the queue length starts increasing, network delay results. Nodes' connections to each other and the data flowing within a network are determined by the network topology. Some of the network topologies implemented in DRTTPS are ring, star, tree, fully connected and hypercube.
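As a rough illustration of how these link attributes interact, the following sketch estimates the delivery delay of a newly queued message. The FIFO draining assumption and the formula are ours for illustration; DRTTPS's internal accounting may differ.

```java
// Illustrative model of a link's delivery delay, combining the attributes
// listed above. The FIFO queuing assumption and the formula are illustrative
// only; they are not DRTTPS's exact accounting.
final class Link {
    final int bandwidth;          // messages transferable per tick (assumed unit)
    final long latency;           // ticks for one message to traverse the link
    private int queueLength = 0;  // messages currently waiting

    Link(int bandwidth, long latency) {
        this.bandwidth = bandwidth;
        this.latency = latency;
    }

    /** Estimated ticks until a newly enqueued message is delivered. */
    long estimatedDelay() {
        // Messages ahead of us drain at `bandwidth` per tick; then we pay latency.
        long queuingDelay = (long) Math.ceil((double) queueLength / bandwidth);
        return queuingDelay + latency;
    }

    void enqueue() { queueLength++; }
    void dequeue() { if (queueLength > 0) queueLength--; }
}
```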
3.1.2 Node Architecture A node represents a single computer, which can perform computations and communicate with other nodes via messages. Messages can be sent directly to neighbouring nodes, whereas messages to non-neighbouring nodes have to pass through intermediate nodes. A node consists of several hardware components, such as the processor manager, disk manager, buffer and swap disk. These components are explained as follows: • Processor manager: manages all the processors in a node. A node can have more than one processor, and each processor can process one page at a time. The number of ticks taken by a processor to process one page is called the process time. More than one page can be processed at a particular instant if the hyperthreading feature is enabled. • Disk manager: manages the disks of a node. Each disk stores a specified set of pages. A disk can read or write one page at a time. The number of ticks taken by a disk to read or write a page is called the access time. The property that enables a disk to distribute data as partitioned or replicated is called data distribution (explained in Section 1.2). • Buffer: represents the memory storage of a node, which temporarily stores data (pages). The page limit of the buffer depends on the buffer size attribute. If a page is available in the buffer, a transaction accesses the page directly from the buffer instead of reading it from the disk. • Swap disk: represents the virtual memory of a node. It stores pages when the buffer is full. A node may also have a workload generator, which controls the workload of the system. The system's workload can be controlled by varying its various attributes, such as inter-arrival time, slack time, work-size, update percentage and workload size. Any distribution can be selected for these attributes, as shown in Figure 3.2. Arrival time represents the inter-arrival time, which is the time difference between two transactions arriving at a node. A low inter-arrival time results in high system load because many transactions enter the system within a short duration. The workload generator creates transactions with associated deadlines for completing their operations. Each transaction processes pages to perform read or write operations. The number of pages accessed by a transaction depends on an attribute called work-size. After processing, pages are updated on the disk, as determined by an attribute called update percent (the percentage of a transaction's operations required to be updated on the disk). An extra time called slack time is allocated to the transaction for completing its operations. The total number of transactions generated by the workload generator depends on the workload size attribute. When a transaction gets delayed (even after the slack time has passed) because of unavoidable problems like deadlock and congestion, it may be aborted. A transaction time-out attribute decides a transaction's maximum waiting time during its execution. Another node attribute, called maximum active transactions, represents the maximum number of transactions which can run simultaneously on a node. Figure 3.2: Workload Generator Attributes When multiple transactions are generated by different nodes' workload generators, numerous problems (such as concurrent access, resource sharing, and congestion) can arise. To handle these problems, various protocols are implemented in DRTTPS at the node level. Each node can run its own protocols based on its simulation environment. Some of these protocols are discussed hereunder: 3.1.2.1 Concurrency Control Protocol Concurrency control protocols are used to handle simultaneous access to the database by different users [13]. The aim is to coordinate concurrent access to the database system. These protocols control transactions' requests for locks on pages. Transactions need a shared lock (or an exclusive lock) to read/access pages, and an exclusive lock is required to perform write operations on pages. Various concurrency control protocols are implemented in DRTTPS, such as greedy locking, greedy locking all copies, speculative locking and adaptive speculative locking [53, 54]. The greedy locking all copies protocol has been used in this research to handle concurrency. In this protocol, when a transaction starts its execution, it requests locks on all the pages which will be required during its lifetime. The transaction does not start its operations until all the locks have been granted to it. Locks are released when the transaction is completed. Greedy locking and greedy locking all copies are similar, except that the latter protocol supports replication across nodes.
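The lock-acquisition discipline just described can be sketched as follows. The LockManager interface and all names are hypothetical; this illustrates the greedy, all-upfront pattern under our own assumptions, not DRTTPS's implementation.

```java
import java.util.Set;

// Sketch of the "greedy locking all copies" behaviour: a transaction requests
// every lock it will ever need before doing any work, and releases them all
// only after completion. The LockManager interface is a hypothetical stand-in.
interface LockManager {
    // Atomically grants shared locks on readPages and exclusive locks on
    // writePages (across all replicated copies), or grants nothing.
    boolean tryAcquireAll(long txnId, Set<Integer> readPages, Set<Integer> writePages);
    void releaseAll(long txnId);
}

final class GreedyTransaction {
    static void execute(LockManager locks, long txnId,
                        Set<Integer> readPages, Set<Integer> writePages) {
        // Wait (here: spin) until every required lock is granted together.
        while (!locks.tryAcquireAll(txnId, readPages, writePages)) {
            Thread.onSpinWait();  // placeholder for the simulator's wait queue
        }
        try {
            // ... perform all read/write operations ...
        } finally {
            locks.releaseAll(txnId);  // locks held until the transaction completes
        }
    }
}
```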
3.1.2.2 Preemption Protocol When two or more transactions try to access the same page, priority inversion may occur [1]. Priority inversion happens when a high-priority transaction waits for a low-priority transaction to commit. Preemption protocols handle these scenarios by controlling the preemption of transactions. The preemption protocols implemented in DRTTPS are listed below: • High Priority Preempts: a transaction holding a lock on a page can be preempted only if the transaction trying to preempt it has higher priority. • Never Preempts: no preemption will happen. • Priority Inheritance: priority inheritance is a method for avoiding priority inversion. When a high-priority transaction waits for a low-priority transaction, the low-priority transaction inherits the priority of the high-priority transaction so that it can complete instead of being aborted. 3.1.2.3 Routing Protocols In a network, nodes communicate with each other via messages. Routing algorithms determine the route of a message from one node to another, according to the network topology. In a fully connected network, routing is simple because there is a direct path from each node to every other node. But topologies like hypercube, ring and tree have intricate routing due to indirect paths between nodes. Messages from the source node (the node sending messages) are routed through intermediate nodes before reaching the final destination node (the node receiving messages), and these paths are determined by the routing algorithms. We have selected DSP and RIP for comparison with our proposed NNPR protocol. DSP is a classical routing protocol, whereas RIP is widely used in today's distributed networks. These protocols are described below: • Dijkstra's Shortest Path (DSP) [26]: Dijkstra's shortest path algorithm computes the shortest path between two nodes. The cost of a route is the sum of the latencies of all the intermediate links from the source to the destination node. For example, in Figure 3.3 there are many paths from Node 1 to Node 8, but the shortest path between these nodes is [1-2-5-8] with a total cost of 4. The routing table format for Node 1 is shown in Table 3.1. In the DSP algorithm, once the routing table is set up, it never changes during the simulation run; the cost of a route is not affected by the amount of congestion. (A compact sketch of the computation is given after this list.) Figure 3.3: Shortest path between Node 1 and 8 using DSP Algorithm Table 3.1: Routing table for Node 1 Source Node   Destination Node   Shortest Path   Cost 1             2                  [1-2]           1 1             3                  [1-3]           2 1             4                  [1-4]           2 1             5                  [1-2-5]         3 1             6                  [1-3-6]         3 1             7                  [1-4-7]         4 1             8                  [1-2-5-8]       4 • Routing Information Protocol (RIP): RIP is a distance vector routing protocol [27] in which each node shares routing information with its neighbouring nodes at set time intervals. In turn, the neighbours exchange the routing information with their nearest neighbours, and so on, until all the nodes have the same routing information for the network (this state is known as convergence). When generating a simulation with RIP, a parameter called updateTime defines how often each node sends its entire routing information to the neighbouring nodes. Unlike DSP, RIP takes into account both congestion and latency metrics while updating the routing tables. In RIP, when a path becomes congested, the network does not discover it immediately due to slow convergence. Another disadvantage of RIP is the count-to-infinity problem. It happens when a network link breaks and nodes mislead each other about the shortest path (a node sends a wrong and outdated shortest-path calculation to another node, which propagates through the entire network while the computed cost grows toward infinity). • Neural Network Prediction-based Routing protocol (NNPR): this is the main contribution of this thesis and is discussed in detail in Chapter 4.
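For reference, here is a compact sketch of how DSP's static costs could be computed from link latencies with Dijkstra's algorithm. The adjacency-matrix representation is an assumption, and path reconstruction (the shortest-path column of Table 3.1) is omitted for brevity.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Dijkstra's algorithm over link latencies, as used by DSP to build its
// static routing table. Representation details are illustrative assumptions.
final class Dsp {
    /** latency[i][j] > 0 is the cost of the direct link i->j; 0 means no link. */
    static long[] shortestCosts(long[][] latency, int source) {
        int n = latency.length;
        long[] dist = new long[n];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;
        // Entries are {node, distance-at-insertion}; stale entries are skipped.
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[1], b[1]));
        pq.add(new long[]{source, 0});
        while (!pq.isEmpty()) {
            long[] top = pq.poll();
            int u = (int) top[0];
            if (top[1] > dist[u]) continue;  // stale queue entry
            for (int v = 0; v < n; v++) {
                if (latency[u][v] > 0 && dist[u] + latency[u][v] < dist[v]) {
                    dist[v] = dist[u] + latency[u][v];
                    pq.add(new long[]{v, dist[v]});
                }
            }
        }
        // With nodes indexed 1..8 as in Figure 3.3 (index 0 unused), the cost
        // from Node 1 to Node 8 comes out as 4, matching Table 3.1.
        return dist;
    }
}
```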
3.1.3 Simulation Set-up DRTTPS's set-up tool provides the user with a graphical user interface to configure and simulate a real-time distributed database system. It involves setting up site components, the network architecture, node structure and other simulation settings. The user can save the configured settings and run the simulation from the set-up interface. The set-up tool has another important component, called the variation container, which allows the user to run and compare many simulations simultaneously. After setting up the required parameters, the variation tab can be opened to vary the desired parameters. For example, in Figure 3.4, a variation is created by varying the bandwidth parameter. Many variations can be generated through the variation tab.