Various Approaches for Dynamic Load Balancing for Multiprocessor Interconnection Network

Multiprocessor interconnection networks have become powerful parallel computing systems for real-time applications, and many researchers are now studying dynamic load scheduling in multiprocessor systems. Load balancing is the method of dividing the total load among the processors of a distributed system to improve task response time as well as resource utilization, while avoiding a condition where a few processors are overloaded and others are underloaded or only moderately loaded. A dynamic load balancing algorithm presumes no a priori information about the behavior of tasks or the global state of the system. Numerous issues arise when designing an efficient dynamic load balancing algorithm, including system utilization, the amount of information transferred among processors, the selection of tasks for migration, load evaluation, and the comparison of load levels. This paper presents a performance analysis of dynamic load balancing strategies.


Introduction
In computing, load balancing improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives [1]. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process.
When adaptive algorithms are used, after an interval of computation, the mesh may be refined (or coarsened) at some locations, usually based on an estimate of the discretization error. The refinement (or coarsening) process can generate widely varying numbers of mesh nodes on the processors. Subsequently, there is a need for dynamic load balancing. Load imbalance may also be caused by the use of local time stepping, local spatial approximation schemes of varying orders [2], or non-linear material properties.
Load balancing differs from channel bonding in that load balancing divides traffic between network interfaces on a network socket (OSI model layer 4) basis, while channel bonding implies a division of traffic between physical interfaces at a lower level, either per packet (OSI model Layer 3) or on a data link (OSI model Layer 2) basis with a protocol like shortest path bridging.
Sender Initiated Diffusion (SID) is a highly distributed local approach which makes use of near-neighbor load information to apportion extra load from heavily loaded processors to underloaded neighbors in the system. Receiver Initiated Diffusion (RID) is the converse of the SID strategy, where underloaded processors requisition load from heavily loaded neighbors. Hierarchical Balancing Method (HBM) is an asynchronous, global approach which organizes the system into a hierarchy of subsystems. Load balancing is initiated at the lowest levels in the hierarchy with small subsets of processors and ascends to the highest level, which encompasses the entire system. Gradient Model (GM) [6,15] employs a gradient map of the proximities of underloaded processors in the system to guide the migration of tasks between overloaded and underloaded processors. Dimension Exchange Method (DEM) [12,14] is a global, fully synchronous approach. Load balancing is performed

II. Example of Dynamic Load Balancing
As a simple example, Figure 1 shows a mesh of shape "A", partitioned into 8 subdomains. It has been refined in Figure 1(b). Due to the mesh refinement, subdomain 1 has more nodes than the other subdomains.
In general, dynamic load balancing algorithms should satisfy the following objectives:
1. Re-balance the load of each processor with speed and scalability.
2. Minimize the edge-cut (or more precisely, the communication cost of the application after the rebalance).
In order to satisfy the first objective, the dynamic load balancing algorithm should not only identify what to migrate efficiently; the amount of data required to be migrated should also be kept to a minimum. Various metrics such as TotalV and MaxV [6] have been used to model and minimize the data migration cost.
One way to re-balance the load is to repartition the mesh using one of the partitioning algorithms. Indeed, parallel algorithms such as JOSTLE or ParMETIS are able to partition large meshes very rapidly [7]. For example, ParMETIS was able to partition a mesh of the order of 1 million nodes in less than 2 seconds on 128 PEs of a Cray T3D [8]. However, it is important, but difficult, to ensure that the new partitioning will be "close" to the original partitioning. Should the new partitioning deviate considerably from the old one, the cost of transferring large amounts of data will be incurred [8]. It has been found that repartitioning is more appropriate when there has been a substantial localized refinement on the mesh.
An alternative strategy is to migrate the excessive nodes to neighboring processors, effectively shifting the boundaries to achieve a balanced load. This approach may cause less movement of data than repartitioning, although the edge-cut after the migration could be larger than that given by a global repartitioning. Therefore care must be taken to keep the edge-cut down when choosing the nodes to be migrated. It has been found [12] that this strategy is more suitable when the load imbalances caused by the refinement are low, or when localized high imbalances occur throughout the mesh. This is because in such cases the optimal partition will be relatively close to the initial partition.

III Categories of Dynamic Load Balancing
A. Client-side random load balancing
One approach to load balancing is to deliver a list of server IPs to the client, and then to have the client randomly select an IP from the list on each connection. This essentially relies on all clients generating similar loads, and on the Law of Large Numbers, to achieve a reasonably flat load distribution across servers. It has been claimed that client-side random load balancing tends to provide better load distribution than round-robin DNS; this has been attributed to caching issues with round-robin DNS, which in the case of large DNS caching servers tend to skew the distribution for round-robin DNS, while client-side random selection remains unaffected regardless of DNS caching.
With this approach, the method of delivery of the list of IPs to the client can vary, and may be implemented as a DNS list (delivered to all the clients without any round-robin), or via hard coding it to the list. If a "smart client" is used, detecting that a randomly selected server is down and connecting randomly again, it also provides fault tolerance.
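The "smart client" behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_server`, the `is_up` health probe, and the example addresses are hypothetical and do not come from any particular client library.

```python
import random

def pick_server(servers, is_up, rng=random):
    """Return a randomly chosen reachable server, or None if all are down.

    `is_up` is a hypothetical caller-supplied probe (for example, a TCP
    connect attempt) that returns True when the server answers.
    """
    candidates = list(servers)
    rng.shuffle(candidates)      # random order: retry randomly without repeats
    for server in candidates:
        if is_up(server):
            return server
    return None

if __name__ == "__main__":
    servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # example addresses
    down = {"10.0.0.2"}                              # pretend this one failed
    counts = {s: 0 for s in servers}
    rng = random.Random(0)
    for _ in range(3000):
        counts[pick_server(servers, lambda s: s not in down, rng)] += 1
    print(counts)   # the two live servers share the connections roughly evenly
```

Shuffling the list once and walking it is equivalent to repeated random selection among the servers not yet tried, so a failed server is never retried within the same connection attempt.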

B. Server-side load balancers
For Internet services, a server-side load balancer is usually a software program that is listening on the port where external clients connect to access services. The load balancer forwards requests to one of the "backend" servers, which usually replies to the load balancer. This allows the load balancer to reply to the client without the client ever knowing about the internal separation of functions. It also prevents clients from contacting back-end servers directly, which may have security benefits by hiding the structure of the internal network and preventing attacks on the kernel's network stack or unrelated services running on other ports. Some load balancers provide a mechanism for doing something special in the event that all backend servers are unavailable. This might include forwarding to a backup load balancer, or displaying a message regarding the outage.

International Journal of Trend in Scientific Research and Development, Volume 1(4), ISSN: 2456-6470, www.ijtsrd.com

IV Types of Dynamic Load Balancing Algorithm

A. Diffusion algorithm
One of the most popular approaches to the flow calculation problem is to use diffusion based algorithms [6,14]. In a heat diffusion process, the initial uneven temperature distribution in space causes the movement of heat, and the system eventually reaches a steady-state temperature.
The diffusion algorithm, as described in [6], is given as follows. At each iteration $k+1$ of the algorithm, processor $i$ will send an amount proportional to the difference between its load and its neighbor's load, $c_{ij}(l_i^{(k)} - l_j^{(k)})$, to its neighbor $j$. Assume $c_{ij} = c_{ji}$; the new load $l_i^{(k+1)}$ of processor $i$ is given by the combination of its own load $l_i^{(k)}$ and contributions from/to its neighboring vertices, namely

$$l_i^{(k+1)} = l_i^{(k)} - \sum_{j:\,(i,j)\in E} c_{ij}\left(l_i^{(k)} - l_j^{(k)}\right). \qquad (1)$$

Initially the load for vertex $i$ is $l_i^{(0)} = l_i$. In matrix form, the above equation can be rewritten as

$$l^{(k+1)} = \left(I - A D A^{T}\right) l^{(k)}, \qquad (2)$$

where $D$ is a diagonal matrix of size $|E| \times |E|$ that consists of the coefficients $c_{ij}$, and $A$ is the vertex-edge incidence matrix of the graph, of size $|V| \times |E|$. For the choice of the coefficients, Boillat [6] suggested

$$c_{ij} = \frac{1}{\max\{\deg(i), \deg(j)\} + 1},$$

where $\deg(i)$ is the degree of vertex $i$ in the graph.
The diffusion algorithm, being a stationary iterative method, can converge quite slowly on graphs with small connectivity. Boillat [6] proved that the worst case happens when the graph is, say, a line, and in such a case the number of iterations needed to reach a given tolerance is $O(p^2)$, with $p$ the number of vertices. There are other variations of the diffusion algorithm ([]). A special case of the following equation, is solved in []. The convergence of the diffusion algorithm can also be improved using the Chebyshev polynomial [9].
Many investigations of dynamic load balancing algorithms have used a diffusive approach, although the details vary. For example, in both the tiling algorithm and the iterative tree balancing algorithm [11], a processor selects amongst its neighbors the one with the highest load and posts a request. In the tiling algorithm the amount of load to be sent is decided by looking at the average of the loads in the neighborhood. In the iterative tree balancing algorithm the requests are viewed as a forest of trees. The flow along the branches of the tree is then calculated using a logarithmic time parallel scan operation.
There are two types of diffusion algorithm: Sender Initiated Diffusion (SID) and Receiver Initiated Diffusion (RID).

Sender Initiated Diffusion (SID)
The SID strategy is a local, near-neighbor diffusion approach which employs overlapping balancing domains to achieve global balancing. For an $N$ processor system with a total system load $L$, a diffusion approach, such as the SID strategy, will cause each processor's load to converge to $L/N$ [1,7,8]. Balancing is performed by each processor whenever it receives a load update message from a neighbor indicating that the neighbor's load $l_i$ is less than Ideal Load, where Ideal Load is a preset threshold. Each processor is limited to load information from within its own domain, which consists of itself and its immediate neighbors.

Receiver Initiated Diffusion (RID)
The RID strategy is the converse of SID and differs from it in three respects. (1) The balancing process is initiated by any processor whose load drops below a prespecified threshold ($L_{Low}$) [7,8]. (2) Upon receipt of a load request, a processor will fulfill the request only up to an amount equal to half of its current load. (3) The RID strategy differs from its counterpart SID in the task migration phase: here, an underloaded processor first sends out requests for load and then receives acknowledgment for each request.
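To illustrate why a near-neighbor diffusion scheme drives every processor's load toward $L/N$, the following Python sketch iterates the basic diffusion update, in which each processor exchanges with each neighbor an amount proportional to their load difference. It is a simplified, synchronous model rather than the SID or RID protocol itself: the function names are hypothetical, loads are treated as divisible real numbers, and the coefficient $1/(\max\{\deg(i),\deg(j)\}+1)$ is the choice usually attributed to Boillat and is taken here as an assumption.

```python
def diffusion_step(loads, neighbors):
    """One sweep: l_i <- l_i - sum_j c_ij * (l_i - l_j) over the neighbors j of i."""
    new_loads = list(loads)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            # Symmetric coefficient (assumed Boillat-style choice), so c_ij == c_ji
            c_ij = 1.0 / (max(len(neighbors[i]), len(neighbors[j])) + 1)
            new_loads[i] -= c_ij * (loads[i] - loads[j])
    return new_loads

def diffuse(loads, neighbors, tol=1e-6, max_iters=100_000):
    """Iterate until the spread between max and min load is within tol."""
    for k in range(max_iters):
        if max(loads) - min(loads) <= tol:
            return loads, k
        loads = diffusion_step(loads, neighbors)
    return loads, max_iters

if __name__ == "__main__":
    # A line of 4 processors, the slow-converging case mentioned in the text
    line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    final, iters = diffuse([8.0, 0.0, 0.0, 0.0], line)
    print(final, iters)   # every entry approaches L/N = 2.0
```

Because the coefficients are symmetric, whatever processor i gives up on an edge is exactly what processor j gains, so the total load is conserved at every sweep.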

B. Dimension Exchange Algorithm
Cybenko suggested a dimension exchange algorithm, in which the edges of the graph are colored so that no two edges of the same color share a vertex. Pairs of processors joined by edges of the same color were grouped, and a processor pair $(i, j)$ with loads $l_i$ and $l_j$ exchange their load, after which each has the load $(l_i + l_j)/2$. The algorithm was proved to converge in $d$ steps if the graph considered was a hypercube with dimension $d$. Xu and Lau [10] extended the dimension exchange algorithm so that after the exchange processor $i$ has load $a\,l_i + (1-a)\,l_j$. If $a = 0.5$ this is equivalent to Cybenko's algorithm. Based on an eigenvalue analysis of the underlying iterative matrices, they argued that for some graphs a factor other than 0.5 gives better convergence. On a graph with small connectivity, this algorithm suffers in convergence in the same way as the diffusion algorithm.
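A minimal Python sketch of dimension exchange on a hypercube is given below. It assumes $2^d$ processors numbered so that neighbors along dimension $k$ differ in bit $k$, treats loads as divisible real numbers, and exposes the exchange factor `a` of the generalized scheme; the function name is hypothetical.

```python
def dimension_exchange(loads, a=0.5):
    """Sweep once over the d dimensions of a hypercube with 2**d processors.

    After each pairwise exchange, processor i holds a*l_i + (1-a)*l_j and its
    partner j holds a*l_j + (1-a)*l_i, so the pair's total load is unchanged.
    """
    n = len(loads)
    d = n.bit_length() - 1
    if n != 1 << d:
        raise ValueError("number of processors must be a power of two")
    loads = list(loads)
    for dim in range(d):          # each dimension plays the role of one edge color
        bit = 1 << dim
        for i in range(n):
            j = i ^ bit           # partner of i across this dimension
            if i < j:             # handle each pair once
                li, lj = loads[i], loads[j]
                loads[i] = a * li + (1 - a) * lj
                loads[j] = a * lj + (1 - a) * li
    return loads

if __name__ == "__main__":
    print(dimension_exchange([8, 0, 0, 0, 0, 0, 0, 0]))
```

With `a=0.5` a single sweep over the $d$ dimensions leaves every processor with the average load, which matches the $d$-step convergence stated for the hypercube.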

C. Multilevel Algorithm
To speed up the diffusion algorithm, Horton [] suggested a multilevel diffusion method. The processor graph was bisected and the load imbalance between the two subgraphs was determined and transferred. This process was repeated recursively until the subgraphs could not be bisected any more. The advantage of the algorithm is that it is guaranteed to converge in log(p) bisections, and the final load will be almost exactly balanced even if the loads are integers. However, because it is not always possible to bisect a connected graph into two connected subgraphs, it was not clear from the paper how to proceed for such a case. Connectivity can of course be restored by adding new edges to a disconnected subgraph. However, this is equivalent to moving data between non-neighboring processors and should be avoided.
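The recursive bisection idea can be sketched in Python for the simplest topology in which both halves always stay connected: a linear array of processors. This is an illustrative simplification and not a reproduction of the method in the cited paper; the function name, the floor-based integer split, and the choice to book each transfer on the two processors adjacent to the cut are all assumptions of this sketch.

```python
def multilevel_balance(loads):
    """Recursively bisect a linear array and move the imbalance across each cut.

    Returns the balanced integer loads and the list of transfers
    (from_index, to_index, amount) scheduled across the cuts.
    """
    loads = list(loads)
    transfers = []

    def balance(lo, hi):
        n = hi - lo
        if n <= 1:
            return
        mid = lo + n // 2
        total = sum(loads[lo:hi])
        target_left = total * (mid - lo) // n      # integer share of the left half
        surplus = sum(loads[lo:mid]) - target_left
        if surplus > 0:
            transfers.append((mid - 1, mid, surplus))
        elif surplus < 0:
            transfers.append((mid, mid - 1, -surplus))
        # Book the flow on the processors next to the cut. An entry may be
        # transiently negative: this computes a flow schedule, it does not
        # simulate the order in which data actually moves.
        loads[mid - 1] -= surplus
        loads[mid] += surplus
        balance(lo, mid)
        balance(mid, hi)

    balance(0, len(loads))
    return loads, transfers

if __name__ == "__main__":
    final, moves = multilevel_balance([10, 0, 0, 0, 3, 7, 0, 4])
    print(final)
    print(moves)
```

The recursion depth is about log2(p) for p processors, and with integer loads the final values differ from one another by at most one unit.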
A similar linear programming based flow calculation has also been used. The formulation uses the number of vertices on subdomain $i$ that may be moved to subdomain $j$, obtained with a node selection strategy based on layering. This linear programming problem was solved using the simplex method to give the flow. The problem has $2|E|$ variables and $|V|+|E|$ constraints. A multilevel approach was used to group subdomains into super-partitions, thereby breaking the linear programming problem into smaller ones to be solved by subsets of processors. This reduced the overall complexity of solving the linear programming problem.

D. Hierarchical Balancing Method (HBM)
HBM is an asynchronous, global approach which organizes the system into a hierarchy of subsystems [1,7].
• Load balancing is initiated at the lowest levels in the hierarchy with small subsets of processors and ascends to the highest level, which encompasses the entire system.
• Specific processors are designated to control the balancing operations at different levels of the hierarchy.
The hierarchical balancing scheme functions asynchronously. The balancing process is triggered at different levels in the hierarchy by the receipt of load update messages indicating an imbalance between lower level domains. All load levels are initialized with each processor sending its load information up the tree.

E. The Gradient Model (GM)
The gradient model [5,13] is a demand driven approach. The basic concept is that underloaded processors inform other processors in the system of their state, and overloaded processors respond by sending a portion of their load to the nearest lightly loaded processor in the system. This model employs a gradient map of the proximities of underloaded processors in the system to guide the migration of tasks between overloaded and underloaded processors.
The resulting effect is a form of relaxation where tasks migrating through the system are guided by the proximity gradient and gravitate towards underloaded points. The scheme is based on two threshold parameters: the Low-Water-Mark (LWM) and the High-Water-Mark (HWM). A processor's state is considered light if its load is below the LWM, heavy if above the HWM, and moderate otherwise.
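The threshold classification and the gradient map can be sketched as follows. The sketch computes each processor's proximity as its hop distance to the nearest light processor using a breadth-first search from all light processors at once; this centralized computation, the function names, and the use of `None` for "no light processor reachable" are assumptions made for illustration, whereas a real system would maintain the map through neighbor-to-neighbor updates.

```python
from collections import deque

def classify(load, lwm, hwm):
    """Light below the Low-Water-Mark, heavy above the High-Water-Mark."""
    if load < lwm:
        return "light"
    if load > hwm:
        return "heavy"
    return "moderate"

def gradient_map(loads, neighbors, lwm):
    """Proximity of each processor: hop distance to the nearest light processor."""
    proximity = {p: None for p in neighbors}
    queue = deque()
    for p in neighbors:
        if loads[p] < lwm:            # light processors are the sources
            proximity[p] = 0
            queue.append(p)
    while queue:                       # multi-source breadth-first search
        p = queue.popleft()
        for q in neighbors[p]:
            if proximity[q] is None:
                proximity[q] = proximity[p] + 1
                queue.append(q)
    return proximity

def next_hop(p, proximity, neighbors):
    """Neighbor of p lying closest to a light processor, or None if none exists."""
    reachable = [q for q in neighbors[p] if proximity[q] is not None]
    return min(reachable, key=lambda q: proximity[q]) if reachable else None

if __name__ == "__main__":
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    loads = [9, 5, 5, 1, 5, 5]
    prox = gradient_map(loads, ring, lwm=2)
    print(prox, next_hop(0, prox, ring))
```

A heavy processor forwards a task to `next_hop`, and each intermediate processor repeats the same rule, so tasks follow decreasing proximity values until they reach a light processor.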

F. Central Queue Algorithm:
This algorithm stores new activities and unfulfilled requests in a cyclic FIFO queue. Each new activity is inserted in the queue. Then, whenever a request for an activity is received, the first activity is removed from the queue. If there is no activity in the queue, the request is buffered until a new activity is available. This is a centralized initiated algorithm and needs high communication among nodes.
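The queueing behavior described above can be sketched with two FIFO queues, one for waiting activities and one for buffered requests. The class and method names are hypothetical, and Python's `collections.deque` stands in for the cyclic FIFO queue mentioned in the text.

```python
from collections import deque

class CentralQueue:
    """Central manager holding waiting activities and unfulfilled requests."""

    def __init__(self):
        self.activities = deque()   # FIFO of activities not yet assigned
        self.pending = deque()      # FIFO of requesters waiting for work

    def add_activity(self, activity):
        """Insert a new activity; hand it straight to a buffered request if any.

        Returns (requester, activity) when a buffered request is served,
        otherwise None.
        """
        if self.pending:
            return self.pending.popleft(), activity
        self.activities.append(activity)
        return None

    def request(self, requester):
        """Serve a request with the first activity, or buffer the request."""
        if self.activities:
            return self.activities.popleft()
        self.pending.append(requester)
        return None

if __name__ == "__main__":
    q = CentralQueue()
    print(q.request("p1"))          # no activity yet, so the request is buffered
    print(q.add_activity("t1"))     # the buffered request from p1 is served
```

Every insertion and every request passes through the single manager, which is the source of the high communication cost noted in the text.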

G. Local Queue Algorithm:
This algorithm supports inter-process migration. The idea is static allocation of all new processes, with process migration initiated by a host when its load falls under the predefined minimum number of ready processes. When the host becomes underloaded, it requests activities from the remote hosts. The remote hosts then look up their local lists for ready activities, and some of the activities are passed on to the requesting host, which returns an acknowledgement. This is a distributed cooperative algorithm that requires inter-process communication, but less than the central queue algorithm.
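A rough Python sketch of this receiver-initiated exchange is shown below. The `Host` class, the `rebalance` function, and in particular the donation policy (a remote host gives away only what it holds above its own minimum) are assumptions of the sketch; the text does not specify how many activities a remote host passes on.

```python
from collections import deque

class Host:
    """A host with a local queue of ready activities."""

    def __init__(self, name, min_ready=1):
        self.name = name
        self.min_ready = min_ready   # predefined minimum number of ready processes
        self.ready = deque()

    def is_underloaded(self):
        return len(self.ready) < self.min_ready

    def donate(self, count):
        """Give away up to `count` activities while keeping min_ready locally."""
        spare = max(0, len(self.ready) - self.min_ready)
        return [self.ready.pop() for _ in range(min(count, spare))]

def rebalance(host, remote_hosts):
    """If `host` is underloaded, pull activities from remote hosts in turn.

    Returns the number of activities received, which stands in for the
    acknowledgement sent back to the donors.
    """
    received = 0
    for remote in remote_hosts:
        if not host.is_underloaded():
            break
        needed = host.min_ready - len(host.ready)
        batch = remote.donate(needed)
        host.ready.extend(batch)
        received += len(batch)
    return received

if __name__ == "__main__":
    a, b = Host("a", min_ready=2), Host("b", min_ready=2)
    b.ready.extend(["t1", "t2", "t3", "t4"])
    print(rebalance(a, [b]), list(a.ready), list(b.ready))
```

Messages are exchanged only when a host actually runs short of work, which is why this scheme needs less communication than a central queue.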

IJTSRD | May-Jun 2017 Available Online @www.ijtsrd.com

H. Least Connection Algorithm:
This algorithm decides the load distribution on the basis of the connections present on a node. The load balancer maintains a log of the number of connections on each node. The number increases when a new connection is established and decreases when a connection finishes or times out. The nodes with the least number of connections are selected first.
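The bookkeeping for least-connection selection can be sketched as a small Python class. The class and method names are hypothetical, and ties are broken by node order, which is an arbitrary choice of this sketch.

```python
class LeastConnectionBalancer:
    """Tracks open connections per node and picks the least loaded node."""

    def __init__(self, nodes):
        self.connections = {node: 0 for node in nodes}

    def open_connection(self):
        """Select the node with the fewest connections and count the new one."""
        node = min(self.connections, key=self.connections.get)
        self.connections[node] += 1
        return node

    def close_connection(self, node):
        """Call when a connection finishes or times out."""
        if self.connections[node] > 0:
            self.connections[node] -= 1

if __name__ == "__main__":
    lb = LeastConnectionBalancer(["a", "b"])
    first, second = lb.open_connection(), lb.open_connection()
    lb.close_connection(second)
    print(first, second, lb.open_connection())
```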

V Features of Load Balancer
Hardware and software load balancers may have a variety of special features. The fundamental feature of a load balancer is to be able to distribute incoming requests over a number of backend servers in the cluster according to a scheduling algorithm. Most of the following features are vendor specific:
Asymmetric load: A ratio can be manually assigned to cause some backend servers to get a greater share of the workload than others. This is sometimes used as a crude way to account for some servers having more capacity than others and may not always work as desired.
Priority activation: When the number of available servers drops below a certain number, or load gets too high, standby servers can be brought online.
SSL Offload and Acceleration: Depending on the workload, processing the encryption and authentication requirements of an SSL request can become a major part of the demand on the Web Server's CPU; as the demand increases, users will see slower response times, as the SSL overhead is distributed among Web servers. To remove this demand on Web servers, a balancer can terminate SSL connections, passing HTTPS requests as HTTP requests to the Web servers. If the balancer itself is not overloaded, this does not noticeably degrade the performance perceived by end users. The downside of this approach is that all of the SSL processing is concentrated on a single device (the balancer) which can become a new bottleneck. Some load balancer appliances include specialized hardware to process SSL. Instead of upgrading the load balancer, which is quite expensive dedicated hardware, it may be cheaper to forgo SSL offload and add a few Web servers. Also, some server vendors such as Oracle/Sun now incorporate cryptographic acceleration hardware into their CPUs such as the T2000. F5 Networks incorporates a dedicated SSL acceleration hardware card in their local traffic manager (LTM) which is used for encrypting and decrypting SSL traffic. One clear benefit to SSL offloading in the balancer is that it enables it to do balancing or content switching based on data in the HTTPS request.
Distributed Denial of Service (DDoS) attack protection: load balancers can provide features such as SYN cookies and delayed-binding (the back-end servers don't see the client until it finishes its TCP handshake) to mitigate SYN flood attacks and generally offload work from the servers to a more efficient platform.
HTTP compression: reduces amount of data to be transferred for HTTP objects by utilizing gzip compression available in all modern web browsers. The larger the response and the further away the client is, the more this feature can improve response times. The tradeoff is that this feature puts additional CPU demand on the load balancer and could be done by web servers instead.
TCP offload: different vendors use different terms for this, but the idea is that normally each HTTP request from each client is a different TCP connection. This feature utilizes HTTP/1.1 to consolidate multiple HTTP requests from multiple clients into a single TCP socket to the back-end servers.
TCP buffering: the load balancer can buffer responses from the server and spoon-feed the data out to slow clients, allowing the web server to free a thread for other tasks faster than it would if it had to send the entire response to the client directly.
Direct Server Return: an option for asymmetrical load distribution, where request and reply have different network paths.
Health checking: the balancer polls servers for application layer health and removes failed servers from the pool.
HTTP caching: the balancer stores static content so that some requests can be handled without contacting the servers.
Content filtering: some balancers can arbitrarily modify traffic on the way through.
HTTP security: some balancers can hide HTTP error pages, remove server identification headers from HTTP responses, and encrypt cookies so that end users cannot manipulate them.
Priority queuing: also known as rate shaping, the ability to give different priority to different traffic.
Content-aware switching: most load balancers can send requests to different servers based on the URL being requested, assuming the request is not encrypted (HTTP) or if it is encrypted (via HTTPS) that the HTTPS request is terminated (decrypted) at the load balancer.
Client authentication: authenticate users against a variety of authentication sources before allowing them access to a website.
Programmatic traffic manipulation: at least one balancer allows the use of a scripting language to allow custom balancing methods, arbitrary traffic manipulations, and more.
Firewall: direct connections to backend servers are prevented for network security reasons. A firewall is a set of rules that decide whether the traffic may pass through an interface or not.
Intrusion prevention system: offers application layer security in addition to the network/transport layer security offered by a firewall.

VI. MODEL FOR DYNAMIC LOAD SCHEDULING APPROACH
We have developed a general model for dynamic load balancing.

A. Processor Load Evaluation
• A load value is estimated for each processor in the system.
• These values are used as input to the load balancer to detect load imbalances and make load migration decisions.
B. Load Balancing Profitability Determination:
• The imbalance factor quantifies the degree of load imbalance within a processor domain.
• It is used as an estimate of potential speedup obtainable through load balancing.
• It is weighed against the load balancing overhead to determine whether or not load balancing is profitable at that time.

C. Task Migration Strategy:
Sources and destinations for task migration are determined. Sources are notified of the quantity and destination of tasks for load balancing.

D. Task Selection Strategy:
Source processors select the most suitable tasks for efficient and effective load balancing and send them to the appropriate destinations.
• The first and fourth phases of the model are application dependent and purely distributed. Both of these phases can be executed independently on each individual processor.
• Our focus is on the Profitability Determination and Task Migration phases, the second and third phases of the load balancing process.
• As the program execution evolves, the inaccuracy of the task requirement estimates leads to unbalanced load distributions.
• The imbalance must be detected and measured (Phase 2) and an appropriate migration strategy devised to correct the imbalance (Phase 3).
• During the Profitability Determination Phase a decision is made as to whether or not to invoke the load balancer.
• The load imbalance factor $\Phi(t)$ is an estimate of the potential speedup obtainable through load balancing at time $t$.
• It is defined as the difference between the maximum processor loads before and after load balancing, $L_{max}$ and $L_{bal}$, respectively: $\Phi(t) = L_{max} - L_{bal}$.
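The profitability test can be illustrated with a short Python sketch. Two assumptions are made here that the text does not state: the post-balancing maximum $L_{bal}$ is estimated as the average load (that is, perfect balance is assumed reachable), and balancing is declared profitable when the imbalance factor strictly exceeds an overhead estimate expressed in the same load units. The function names are hypothetical.

```python
def imbalance_factor(loads):
    """Phi = L_max - L_bal, with L_bal estimated as the average load (assumption)."""
    l_max = max(loads)
    l_bal = sum(loads) / len(loads)
    return l_max - l_bal

def is_profitable(loads, overhead):
    """Invoke the balancer only if the estimated gain outweighs its overhead."""
    return imbalance_factor(loads) > overhead

if __name__ == "__main__":
    domain = [10, 2, 0, 0]
    print(imbalance_factor(domain), is_profitable(domain, overhead=5))
```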

VII CONCLUSIONS
In this paper, dynamic load balancing strategies designed to support highly parallel systems have been presented and compared. The different strategies exemplify some of the main issues and tradeoffs that exist in dynamic load balancing, specifically in reference to highly parallel systems. Two major issues were discussed: the load balancing overhead and the degree of knowledge used in balancing decisions. Also considered were the concept of balancing domains, the aging of information, and the form of balancing initiation. Of the five strategies considered, the DEM strategy tended to outperform the rest for all granularities. The efficiency of the DEM and HBM strategies depends heavily on the system interconnection topology. The hypercube topology is ideally suited to match the communication dependencies of these two strategies. Furthermore, the system sizes tested were very small in the context of highly parallel systems. The synchronization overhead of the DEM approach, which scales as O(N log N), and the aging period and non-uniform overhead distributions of the HBM approach may deteriorate their performance when the number of processors is large (1000 processors). The RID strategy, on the other hand, is easily ported to simpler topologies and can scale gracefully for larger systems. Finally, for a wider variety of applications exhibiting local communication dependencies between tasks, the RID scheme is able to maintain task locality. Therefore, since its performance was shown to be comparable to those of the DEM and HBM approaches, the RID strategy may be best suited for a broader range of systems supporting a large variety of applications.