Problems in Task Scheduling in Multiprocessor System

Contemporary computer systems are multiprocessor or multicomputer machines. Their efficiency depends on good methods of managing the work they execute. Fast processing of a parallel application is possible only when its parts are appropriately ordered in time and space, which calls for efficient scheduling policies in parallel computer systems. In this work deterministic scheduling problems are considered. Classical scheduling theory assumed that an application is executed by only one processor at any moment of time. This assumption has been weakened recently, especially in the context of parallel and distributed computer systems. This monograph is devoted to the deterministic scheduling of applications (or tasks, in scheduling terminology) that require more than one processor simultaneously. We call such applications multiprocessor tasks. In this work the complexity of open multiprocessor task scheduling problems is established, and algorithms for scheduling multiprocessor tasks on parallel and dedicated processors are proposed. For the special case of applications with a regular structure, which can be divided into parts of arbitrary size that are processed independently in parallel, a method of finding the optimal distribution of work in a distributed computer system is proposed. Applications with such regular characteristics are called divisible tasks. The concept of a divisible task enables the creation of tractable computation models for a wide class of computer architectures such as chains, stars, meshes, hypercubes and multistage networks. The divisible task method also provides a basis for evaluating computer system performance, and examples of such performance evaluation are presented. This work summarizes earlier works of the author and contains new original results.


II. PARALLEL COMPUTER SYSTEMS

A. Hardware
It is common to start a description of parallel systems by classifying types of parallelism and types of parallel machines. A useful view of parallelism distinguishes between data parallelism and code parallelism. Another classification of parallel processing considers the granularity of parallelism. Classical computers execute instructions in the order dictated by the sequence in the program code. This approach is called control-driven or von Neumann architecture.
Control-driven computers have been divided into four classes: SISD (single instruction stream, single data stream), SIMD (single instruction stream, multiple data streams), MISD (multiple instruction streams, single data stream) and MIMD (multiple instruction streams, multiple data streams). A multicomputer consists of a set of processors with local memories, interconnected by some kind of network. We call a processor with local memory and a network interface a processing element (PE). Tightly-coupled computers can be further differentiated by the type of PE interconnection. In this work we limit the considered interconnection types to bus(es), point-to-point networks (also called single-stage networks) and multistage networks.
In multistage networks PEs are connected by several layers of switches, and the switches of the internal layers have no PEs attached. Multistage networks are divided here into trees and multistage cube networks.

B. Software
In many common applications (programs) great potential for parallelism can be found. Thus, programs can be executed as many concurrent threads (the notions of an application, a thread and a task are mutually related). Computer systems should provide support for implementing the parallelism of an application, including the issues posed by scheduling. Parallel operating systems are evolving from previously existing systems, and many ideas have been "naturally" inherited. Based on the acceptable response time, two load types have been distinguished in single-processor systems: terminal (or interactive) load and batch load. Since batch tasks are submitted to the computer system far earlier than their actual execution begins, deterministic scheduling algorithms can be applied to them. For the terminal load, which requires immediate response, access to processors is granted on the basis of FCFS (first-come, first-served), Round-Robin, multi-level priority queues, etc.
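To make the difference between two of the policies named above concrete, the following minimal sketch simulates FCFS and Round-Robin on a single processor. The task names, burst times and the quantum of 2 time units are illustrative assumptions, not values from this work, and all tasks are assumed to arrive at time 0.

```python
from collections import deque


def fcfs(bursts):
    """Completion time of each task under first-come, first-served."""
    clock, done = 0, {}
    for name, burst in bursts:
        clock += burst          # each task runs to completion in arrival order
        done[name] = clock
    return done


def round_robin(bursts, quantum):
    """Completion time of each task under Round-Robin with a fixed quantum."""
    queue = deque(bursts)
    clock, done = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        if remaining > run:
            # task is preempted and rejoins the end of the queue
            queue.append((name, remaining - run))
        else:
            done[name] = clock
    return done


# Hypothetical workload: (task name, processing time)
tasks = [("A", 5), ("B", 2), ("C", 3)]
fcfs_done = fcfs(tasks)
rr_done = round_robin(tasks, quantum=2)
print(fcfs_done)
print(rr_done)
```

In this example the short task B finishes earlier under Round-Robin than under FCFS, which is the kind of response-time behavior an interactive load calls for, while the long task A finishes no earlier.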

III. COMMUNICATION MODE
The next element of the architecture is the commutation mode. The commutation mode is a physical protocol for message routing. We describe commutation modes here because routing functions are increasingly executed by dedicated hardware. The methods we refer to in this section are also called switching or routing techniques. Among the various commutation (or routing) modes we distinguish store-and-forward, circuit-switched and packet-switched.
In the store-and-forward mode, when a PE sends a message to another PE located at distance d, the message (either as a whole or in packets) is sent to the closest PE on the path and stored there. It is then forwarded in the same way until it reaches the destination.
In the circuit-switched mode, a header of the message is sent from the transmitter to the receiver and reserves all the links of a communication path, forming a circuit between the two PEs.
In the packet-switched modes the message is split into packets, which consist of flits. Flits are also called flow control digits. These are words passed over a link in one control cycle.
Among the packet-switched modes three sub-types can be identified: wormhole, virtual-cut-through and buffered wormhole. These modes differ in the behavior of flits and packets when the packet cannot move forward (e.g. there is no free link). In the wormhole mode, the progress of the message in the pipe is stopped. All the flits remain in the intermediate buffers, thus blocking the links.
In the virtual-cut-through mode flits continue on their way until they reach the site where the first flit is stopped. There the whole packet waits for the release of the link. This mode assumes infinite capacity of buffers. In the buffered wormhole mode, flits of some packets move until they reach the stopped flit, and then the whole packet is stored there. Yet, the number of packets that can be stored is limited by the buffer capacity.
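The practical effect of the commutation mode can be illustrated with a simplified, contention-free latency model. This model is a common textbook approximation and an assumption of this sketch, not a result of this work: one flit crosses one link per control cycle, start-up costs are ignored, no link is ever blocked, and the function and parameter names are hypothetical.

```python
def store_and_forward_cycles(message_flits, hops):
    """Whole message is received and stored at every PE before moving on."""
    return hops * message_flits


def wormhole_cycles(message_flits, hops):
    """Flits are pipelined: the first flit needs `hops` cycles to arrive,
    and the remaining flits follow it one cycle apart."""
    return hops + message_flits - 1


# Hypothetical message of 100 flits sent to a PE at distance d = 4
saf = store_and_forward_cycles(100, 4)
wh = wormhole_cycles(100, 4)
print(saf, wh)
```

Under these assumptions the store-and-forward time grows with the product of distance and message length, while the pipelined time grows with their sum. For a distance of one hop the two expressions coincide.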

IV. FORMULATION OF TASK SCHEDULING PROBLEM
A task scheduling problem consists of an application model, a system computing model and performance evaluation metrics. This section discusses the application model, then the system computing model, followed by the performance evaluation metrics.

A. Application model
The task scheduling problem for a given application is represented by a directed acyclic graph (DAG) G1 = (T, E), where T is the finite set of m tasks {T1, T2, T3, …, Tm} and E is the set of edges {eij} between the tasks Ti and Tj. Each edge represents a precedence constraint between the tasks Ti and Tj such that Tj cannot start until Ti completes its execution. Each task Ti is associated with an execution time ET(Ti), and each edge eij is associated with a communication time CT(Ti, Tj) for data transfer from Ti to Tj. If there is a direct path from Ti to Tj, then Ti is a predecessor of Tj and Tj is a successor of Ti. An entry task does not have any predecessors; similarly, an exit task does not have any successors. The layout of a DAG with six tasks is shown in figure 4 [16], where T1, T2, T3, T4, T5 and T6 are the tasks of the given DAG.
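The application model can be sketched in code as follows. Since figure 4 is not reproduced here, the edges, execution times and communication times below are hypothetical. The earliest finish times are computed under the additional assumption that every communication time is always paid, as if each task ran on its own processor. The sketch uses the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical execution times ET(Ti) for six tasks
ET = {"T1": 2, "T2": 3, "T3": 4, "T4": 2, "T5": 3, "T6": 1}

# Hypothetical communication times CT(Ti, Tj), one entry per edge eij
CT = {
    ("T1", "T2"): 1, ("T1", "T3"): 2,
    ("T2", "T4"): 1, ("T3", "T5"): 1,
    ("T4", "T6"): 2, ("T5", "T6"): 1,
}

# Build predecessor and successor sets from the edge list
preds = {t: set() for t in ET}
succs = {t: set() for t in ET}
for u, v in CT:
    preds[v].add(u)
    succs[u].add(v)

entry_tasks = sorted(t for t in ET if not preds[t])  # no predecessors
exit_tasks = sorted(t for t in ET if not succs[t])   # no successors

# Earliest finish time of each task, visiting tasks in precedence order:
# Tj cannot start until every predecessor Ti has finished and sent its data.
finish = {}
for t in TopologicalSorter(preds).static_order():
    start = max((finish[p] + CT[(p, t)] for p in preds[t]), default=0)
    finish[t] = start + ET[t]

critical_path_length = max(finish[t] for t in exit_tasks)
print(entry_tasks, exit_tasks, critical_path_length)
```

The value of critical_path_length is a lower bound on the schedule length only under the stated assumption. A scheduler that places a task and its predecessor on the same processor could avoid the corresponding communication time.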

B. Algorithm Rules
Design objectives of the algorithm:
• Dynamically regulate task allocation to each processor according to changes in the system load, and optimize the utilization of processor resources.
• Minimize the effect on the working performance of the processor.
Each processor node in the algorithm is in one of four states: heavy load, busy, light load or no load.
R I: Suppose A1, A2, B1 and B2 represent the sets of processor nodes in the heavy load, busy, light load and no load states respectively. Load balancing is carried out in the following steps:
• An equilibrium operation is conducted between a heavily loaded node and a lightly loaded node.
• An equilibrium operation is conducted between a busy node and a no-load node when there is no heavily loaded node.
• Steps 1 and 2 are repeated until no migratable task can be selected or the loads on the processor nodes are relatively balanced.
R II: The execution time of the parallel program is always restricted by the task with the slowest execution speed. L denotes the load. So when load L moves from A1 to B1.
IJTSRD | May-Jun 2017 Available Online @www.ijtsrd.com
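The steps of rule R I can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: load is measured as a number of tasks, the thresholds LIGHT_MAX and BUSY_MAX that separate the four states are hypothetical, one task is migrated per equilibrium operation, and every task is assumed to be migratable.

```python
LIGHT_MAX = 2  # hypothetical: loads 1..2 count as light load (B1)
BUSY_MAX = 4   # hypothetical: loads 3..4 count as busy (A2), above is heavy (A1)


def state(load):
    """Map a node load to one of the four sets A1, A2, B1, B2."""
    if load == 0:
        return "B2"  # no load
    if load <= LIGHT_MAX:
        return "B1"  # light load
    if load <= BUSY_MAX:
        return "A2"  # busy
    return "A1"      # heavy load


def balance(loads):
    """Apply steps 1 and 2 repeatedly until neither step applies."""
    loads = dict(loads)
    migrations = []
    while True:
        groups = {"A1": [], "A2": [], "B1": [], "B2": []}
        for node, load in loads.items():
            groups[state(load)].append(node)
        if groups["A1"] and groups["B1"]:
            # Step 1: heavy-load node gives work to a light-load node
            sources, targets = groups["A1"], groups["B1"]
        elif not groups["A1"] and groups["A2"] and groups["B2"]:
            # Step 2: no heavy node left, busy node gives work to an idle node
            sources, targets = groups["A2"], groups["B2"]
        else:
            break  # Step 3: stop when no equilibrium operation is possible
        src = max(sources, key=loads.get)
        dst = min(targets, key=loads.get)
        loads[src] -= 1
        loads[dst] += 1
        migrations.append((src, dst))
    return loads, migrations


# Hypothetical initial loads on four processor nodes
final_loads, moves = balance({"P1": 6, "P2": 4, "P3": 1, "P4": 0})
print(final_loads, moves)
```

Because the sketch follows the two pairing rules literally, it stops as soon as neither pairing exists, even if the loads are not perfectly equal. This matches the "relatively balanced" stopping condition only loosely, and a real implementation would need its own definition of that condition.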

C. Load Evaluation
It is common to evaluate the load of a node by the length of its CPU queue. This load evaluation mode is concise and rapid. In order to evaluate the load state of each node, we let C represent the processing capability of each node in a homogeneous multiprocessor system, namely the maximum number of processes that the CPU can handle per unit time.
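One possible way to turn this load evaluation into a node state is sketched below. The ratio of queue length to C, which estimates how many time units the node needs to drain its queue, and the two thresholds are assumptions made for illustration. The text does not specify how the four states are derived from the queue length and C.

```python
def normalized_load(queue_length, capacity):
    """Queue length divided by C: time units needed to drain the queue."""
    if capacity <= 0:
        raise ValueError("capacity C must be positive")
    return queue_length / capacity


def classify(queue_length, capacity, light_max=0.5, busy_max=1.0):
    """Map a node to a load state; light_max and busy_max are hypothetical."""
    load = normalized_load(queue_length, capacity)
    if load == 0:
        return "no load"
    if load <= light_max:
        return "light load"
    if load <= busy_max:
        return "busy"
    return "heavy load"


# Homogeneous system: every node has the same hypothetical capability C = 8
C = 8
states = [classify(q, C) for q in (0, 3, 6, 12)]
print(states)
```

Dividing by C keeps the thresholds independent of the absolute processing capability, which is convenient if the same rules are later reused on nodes with a different C.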

V. CONCLUSIONS
In this work we considered selected methods of scheduling in multiprocessor computer systems. With the advent of modern computer systems it turned out that classical scheduling methods are in many cases not satisfactory. Therefore, two new scheduling models were analyzed here: multiprocessor tasks and divisible tasks. Multiprocessor tasks require several processors simultaneously, and thus allow task parallelism to be expressed at a high level of abstraction. The divisible task model allows for finding simple solutions of problems which in other settings are intractable. Moreover, the divisible task concept permits introducing the computer architecture context, which in the classical approach is often highly generalized to make problems manageable. In this way scheduling problems have been combined with communication optimization problems. Experimental simulation has shown that the algorithm handles the load balancing problem in a multiprocessor system well and significantly improves system performance and task processing. The proposed algorithm is characterized by a lower load balancing frequency and lower overhead. The results of the experiment indicate that the proposed algorithm achieves good performance.