15: Parallel Programming Problems

Lecture

For about forty years parallel programming has been a promising direction, and it has always remained a "new" topic in programming. Such a prolonged infancy suggests that all is not well with this child. Therefore we first critically analyze the basic concepts that need to be mastered, and only then sit down to write a parallel program.

Natural parallelism of algorithms

When we considered programming styles, it turned out that the linear order of execution of program statements is often imposed on an algorithm from the outside and serves only as scaffolding needed for implementation on a particular system.

Let us turn to classical examples in which the computational algorithm parallelizes well.

Example 15.1.1. Matrix multiplication is the case on which fairly broad masses of programmers and customers of computational programs realized the potential advantage of parallelization. If the product matrix, of size km x ln, is divided into submatrices of size m x n, then to compute each such submatrix it suffices to have m rows of the first factor and n columns of the second. By distributing the computations over independent processors we can, at the cost of some duplication of the original data, obtain the result much faster.
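
As a minimal sketch of this block scheme in Python (the concrete splitting into m x n blocks and the use of a standard process pool are illustrative assumptions), each block of the product is computed by an independent worker that receives only the m rows and n columns it needs:

  # Sketch: block-parallel matrix multiplication (illustrative, not optimized).
  from multiprocessing import Pool

  def multiply_block(args):
      """Compute one m x n block from m rows of A and n columns of B."""
      rows, cols = args
      return [[sum(r[t] * c[t] for t in range(len(r))) for c in cols] for r in rows]

  def parallel_matmul(A, B, m, n):
      """A has k*m rows; the product is split into blocks of size m x n."""
      k = len(A) // m
      cols_B = list(zip(*B))                 # columns of the second factor
      blocks_per_row = len(cols_B) // n
      tasks = [(A[i*m:(i+1)*m], cols_B[j*n:(j+1)*n])
               for i in range(k) for j in range(blocks_per_row)]
      with Pool() as pool:                   # each block goes to an independent worker
          blocks = pool.map(multiply_block, tasks)
      result = []                            # reassemble the blocks band by band
      for i in range(k):
          band = [[] for _ in range(m)]
          for j in range(blocks_per_row):
              for r in range(m):
                  band[r].extend(blocks[i*blocks_per_row + j][r])
          result.extend(band)
      return result

  if __name__ == "__main__":
      A = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 x 2
      B = [[1, 0, 2, 0], [0, 1, 0, 2]]       # 2 x 4
      print(parallel_matmul(A, B, m=2, n=2))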

Similarly, the problem of solving a system of linear equations is parallelized.

Example 15.1.2. Let the task be represented as a set of several interacting automata (so that it fits, in principle, into the framework of automaton programming, but the single system is structured into subsystems, and most of the actions take place within the subsystems). Then we have natural parallelism.

Many game tasks in which several characters are involved are of this kind, and so are most parsing tasks.

Already these two examples show that there are fundamentally different cases. In the first example the task is indifferent to how the result will be computed. In such a case we exploit, for parallelization, the properties of a specific computational algorithm, which admits many potential control structures suitable for realizing the information links in the corresponding data network (a purely sequential one necessarily among them).

In the second example parallelism is naturally dictated by the task itself, since the system is divided into subsystems. I hope you are not surprised that in this case the solution that is natural from a theoretical point of view is almost never feasible. Current parallel systems have a fixed (usually small) number of processors, and if there are many processors, they are tied into a rigid structure. Therefore tasks with natural parallelism very rarely fit into the Procrustean bed of such a system. It is much easier to squeeze in something that is not parallel by nature but can be parallelized, since then the structure of the implementation of the algorithm can be tailored to the requirements of a particular system.

Thus the main problem with concurrency is that programming and machine architecture come into conceptual contradiction.

In everyday life each of you has met the simplest kind of parallelism: a long and relatively autonomous print job usually runs in parallel with the execution of other programs. The case when you simply run several programs at once is, as a rule, not real parallelism (this is discussed below).

Types of parallelism

The result of the team is the result of the participant who came last.

(from the rules of competitions in military-applied sports)

- Doctor, what am I sick with?

- Don't worry, patient, the autopsy will show.

(Russian joke)

The type of parallelism that arises when matrices are multiplied is natural for structured programming but is, in fact, not parallelism. In structured programming the data are organized into a network over which the program moves. This network is partially ordered, and if it contains several subnets that are weakly connected to one another and each passes through a certain segment of the source network from beginning to end, then these subnets can be executed on independent processors that interact only at critical moments. Usually such a partitioning of the data network is not unique, and therefore one and the same algorithm can be executed on parallel systems of different architectures.

Example 15.2.1. Let the data network have the following form (Fig. 15.1).


Fig. 15.1. Fork

Then it is natural to execute it on a one-, two- or three-processor machine. On a multiprocessor machine the computation proceeds sequentially at the beginning and at the final stage, while the three branches are distributed among the processors. In particular, if the two side branches take less time to compute than the main branch, both can be assigned to the same processor on a two-processor machine.

In the parallelized program the final stage of execution begins only after all three branches that supply its initial data have completed. Such parallelism, in which all parallel branches must complete successfully for the block of parallel processes to complete, typically arises in computational problems and is called &-parallelism. How successful the result of parallelization is depends on how evenly the computational load is distributed among the branches (see the first epigraph).
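
The fork of Fig. 15.1 can be sketched in Python as follows (the branch functions and the thread pool are illustrative assumptions); the point is only that the final stage starts after all branches have joined:

  # Sketch of &-parallelism: the final stage waits for ALL branches (fork-join).
  from concurrent.futures import ThreadPoolExecutor

  def prologue(x):
      return x + 1                     # sequential initial stage

  def branch_a(x): return x * 2        # three independent branches; their load
  def branch_b(x): return x * 3        # should be balanced, otherwise the
  def branch_c(x): return x * 5        # slowest branch dominates the result

  def epilogue(a, b, c):
      return a + b + c                 # sequential final stage

  x = prologue(10)
  with ThreadPoolExecutor(max_workers=3) as pool:
      fa = pool.submit(branch_a, x)
      fb = pool.submit(branch_b, x)
      fc = pool.submit(branch_c, x)
      # result() blocks: the parallel block completes only when
      # every branch has completed successfully.
      print(epilogue(fa.result(), fb.result(), fc.result()))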

Another type of concurrency was first realized in search tasks. For example, if we know that a lion that escaped from a zoo is in one of four quarters and cannot move from one to another, then to speed up the search we can (if we have enough search teams) send a team to each quarter and finish the search as soon as one of the teams finds the fugitive. Thus arises ∨-parallelism (OR-parallelism), in which it is enough, for a parallel block to complete successfully, that one of the running processes succeeds.

Consider now the general conditions under which ∨-parallelism is applicable. The particular example of the lion shows that, in principle, one could present a "procedure" for finding the fugitive in the form of the following conditional statement:

  if
     Lion quarter 1 -> Search quarter 1,
     Lion quarter 2 -> Search quarter 2,
     Lion quarter 3 -> Search quarter 3,
     Lion quarter 4 -> Search quarter 4
 fi 

But the trouble is that to find out whether a guard is true, you need to execute the guarded command itself, i.e. find the lion. Thus this conditional operator can serve only as a phantom.

Further, there is another implicit condition, illustrated by the second epigraph. If executing a guarded command whose guard is actually false can cause harm (not merely consume resources, but something more substantial, like treatment according to an incorrect diagnosis), then we will most likely learn what went wrong only at the post-mortem, after the sudden and painful death of the program.

In this way, ∨-parallelism is a means of effectively implementing a phantom conditional operator that satisfies two requirements:

  1. checking a guard is no easier than executing the corresponding guarded command;
  2. executing a command from the wrong alternative cannot do any harm other than wasting resources.

Then the various alternatives can be run in parallel on different processors to see which one succeeds.
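
A sketch of such a run in Python (the search functions and the random placement of the lion are invented for illustration); the block finishes as soon as any one branch reports success, and the losing branches only waste resources:

  # Sketch of OR- (∨-) parallelism: the first successful branch decides the result.
  from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
  import random, time

  LION_QUARTER = random.randint(1, 4)        # where the fugitive actually is

  def search_quarter(q):
      """Search quarter q; return q on success, None on failure."""
      time.sleep(random.uniform(0.1, 0.5))   # the search itself takes time
      return q if q == LION_QUARTER else None

  with ThreadPoolExecutor(max_workers=4) as pool:
      pending = {pool.submit(search_quarter, q) for q in range(1, 5)}
      found = None
      while pending and found is None:
          done, pending = wait(pending, return_when=FIRST_COMPLETED)
          for f in done:
              if f.result() is not None:
                  found = f.result()         # one success completes the whole block
      print("Lion found in quarter", found)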

As an example of a problem that naturally admits ∨-parallelism and is not a search task, consider a situation that often arises in differential games (in particular, in the game of "interceptor and carrier") when there is a point of choice. After this point several incompatible strategies are available, and to determine which strategy leads to capturing the target, the behavior of the system under the considered strategy must be computed to the end. So if a choice point lies ahead and the decision must be made in advance, it is advisable to start several modeling processes at once.

The third type of parallelism is actually not parallelism either, and it is perhaps the most common in programming practice. This is quasi-parallelism. You encounter it when your program spawns a subprocess with a command such as thread. In this case several processes are executed as a single process in which the steps of the subprocesses are interleaved, so that the programmer has the impression that the processes run in parallel. From the outside, quasi-parallelism can look like either of the two types of parallelism discussed above. To a first approximation quasi-parallelism gives only a net loss, but in fact it allows the waiting intervals that naturally occur in one process (for example, during data input or output) to be filled with the work of other processes. This is how your media player works while you puzzle over the errors in your program.

Since here all control is actually concentrated in one place, the methods of working with quasi-parallel processes are the most developed.
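
A tiny sketch of why quasi-parallelism pays off (the one-second "input/output" waits are invented): two tasks that mostly wait finish in roughly the time of one when their waiting intervals are interleaved by threads:

  # Sketch: quasi-parallelism wins only because waiting intervals overlap.
  import threading, time

  def io_bound_task(name):
      time.sleep(1.0)                        # stands for waiting on input/output
      print(name, "done")

  start = time.time()
  threads = [threading.Thread(target=io_bound_task, args=(n,))
             for n in ("print job", "compilation")]
  for t in threads: t.start()
  for t in threads: t.join()
  # about 1 second instead of about 2: the steps of the two processes were
  # interleaved on one processor, filling one process's waits with the other's work.
  print("elapsed:", round(time.time() - start, 1), "s")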

And finally there is the last case of parallelism, when processes are not just run in parallel with one another but run on different, often loosely coupled, computers. For example, the server is running a database search while the main program runs for you. This is distributed execution. Recently this case, too, has been worked out well, because in such a difficult situation any shortcoming can lead to the agony of the program.

Example 15.2.2. A classic example of the effective use of ∨-parallelism on a loosely interconnected distributed system is a massive hacker attack on a password. Thousands of hackers, dividing the search space among themselves, launch brute-force programs on their machines. Someone will find the password, and the fact that the others may spend a few more days (since hackers usually "work" at night and sleep during the day) running their brute-force programs does not matter: the gain is enormous anyway.

Interaction of processes and parallelization

So far we have considered the ideal case when processes are mutually independent and the question of their interaction arises only at the moment of their completion. In practice, of course, this is extremely rare. Processes are almost always interconnected, and therefore the question of organizing connections between them must be considered.

The first kind of connection is shared data. Consider the simplest case, when the processes have a common memory field into which they write updated data and from which they read it when necessary. Even in this case there are difficulties. If one process has begun updating the data in the field while another is reading it at exactly that time, the reader may end up with a jumble of old and updated data (in distributed systems and in databases such a case is regarded as a violation of data integrity 1).

Here the concept of a critical interval should be introduced: while occupying a certain resource, a program must be sure that no one else can access that resource, so that its "friends" working in parallel can neither harm it nor be disoriented by it.

Of course, the easiest way to establish exclusive use of a shared resource is this: the program that accesses the resource sets a flag, and while the flag is raised no one else can access it. But imagine, for example, collective work on a bulky technical specification. I need, say, to make changes to Chapter 3, but I cannot, because for a couple of hours someone has been working on Chapter 13, which hardly depends on Chapter 3 at all. In such cases one resorts to corporate systems for supporting distributed work (Lotus Works can be cited as a positive example of a stable system, although the author has not worked with it since 2002, so improvements since then could have led to a loss of reliability). They solve subtle problems of maintaining data integrity when changes arrive from multiple sources and when the server, the network, etc. fail. All this is done so that the work of one employee practically does not interfere with the work of the others (and if it does, it is because both are trying to get into the same place, and for these cases a conflict-resolution mechanism is provided).

The critical interval, as often happens in programming, is a concept that exists in two forms: hardware and logical. The logical critical interval is determined (or rather, in principle should be determined) by the needs of the program system and can be set by ordinary software means. The trouble is that for the hardware even such an elementary action as checking a resource's busy flag and setting its new value may be divisible, and in the interval between your read and your write someone else may record that the resource is busy, and then... Therefore there are hardware critical intervals, hidden from the ordinary user inside synchronization instructions. The system tries to minimize these intervals, since stopping a process inside such an interval and transferring control to another is extremely undesirable. The logical critical interval is a different matter: here you may pause the process and start another one, as long as it does not access the locked resource.
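
The danger can be sketched in Python (the counter, the flag and the figures are illustrative): a "check the flag, then set it" sequence written in ordinary code is divisible, so two threads can both slip into the critical interval, whereas a synchronization primitive such as threading.Lock hides the indivisible hardware check-and-set:

  # Sketch: a divisible check-then-set flag versus an indivisible lock.
  import threading

  counter = 0
  busy = False                      # naive busy flag: checking and setting it is divisible
  lock = threading.Lock()           # hides the hardware-level indivisible operation

  def unsafe_increment(n):          # shown only for contrast, not used below
      global counter, busy
      for _ in range(n):
          while busy:               # another thread may set busy right after our check...
              pass
          busy = True               # ...so two threads can both enter the critical interval
          counter += 1
          busy = False

  def safe_increment(n):
      global counter
      for _ in range(n):
          with lock:                # acquire/release delimit the logical critical interval
              counter += 1

  threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
  for t in threads: t.start()
  for t in threads: t.join()
  print(counter)                    # 400000 with the lock; the flag version may lose updates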

When there is a single shared resource, everything is more or less clear. But when there are many of them, several may be needed at once...

Example 15.3.1. Five European philosophers walk deep in thought around a hall, paying no attention to one another. In the middle of the hall is a round table with a dish of spaghetti, five chairs and five forks, one between every two chairs. When a philosopher feels hungry, he sits down at a random free place, picks up the forks on both sides of it (waiting, if necessary, until they are free), eats his fill and gets up to continue thinking. Being European philosophers, they will not eat spaghetti with one fork, and being self-absorbed philosophers, they completely ignore the fact that the forks are not washed 2.

If all five philosophers sit down at the table at the same time and each takes the fork on his right, they will starve to death at that table.

This example is the classic parallel programming problem. All theoretical disciplines of parallel process control are tested on it.
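
One classic remedy, sketched in Python (the numbering rule is only one of several known solutions): if every philosopher always picks up the lower-numbered fork first, the circular wait of the "everyone grabs the right-hand fork" scenario cannot arise:

  # Sketch: dining philosophers with deadlock avoided by ordering the forks.
  import threading

  N = 5
  forks = [threading.Lock() for _ in range(N)]

  def philosopher(i, meals=3):
      left, right = i, (i + 1) % N
      first, second = sorted((left, right))   # always take the lower-numbered fork first
      for _ in range(meals):
          with forks[first]:
              with forks[second]:
                  pass                         # eating (with unwashed forks)
          # thinking happens here, outside the critical intervals

  threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
  for t in threads: t.start()
  for t in threads: t.join()
  print("all philosophers survived")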

The second kind of connection between parallel processes is critical points. When a process has reached a critical point, this means that it can move on only when the other processes have also reached the corresponding critical point. For example, to compute the next iteration you need to be sure that the previous iteration has already been finished by everyone. For the next step of a search you need to know that at least one of the processes running in parallel has already found the previous data.
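
A critical point of the first kind can be sketched with a barrier (the number of workers and iterations is illustrative): no process starts iteration k+1 until every process has finished iteration k:

  # Sketch: a critical point implemented as a barrier between iterations.
  import threading

  WORKERS, ITERATIONS = 3, 4
  barrier = threading.Barrier(WORKERS)

  def worker(wid):
      for it in range(ITERATIONS):
          # ... compute this worker's share of iteration `it` here ...
          barrier.wait()        # critical point: wait until everyone finished iteration it

  threads = [threading.Thread(target=worker, args=(w,)) for w in range(WORKERS)]
  for t in threads: t.start()
  for t in threads: t.join()
  print("all iterations completed in lock-step")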

There is one important special case of &-parallelism in which critical points and critical intervals cause no problems. This is pipelining. With pipelining, most of the time the processes run independently. Then the work of all the individual processes is suspended, data are exchanged and events are examined. Such an organization of work resembles a factory conveyor: while the conveyor stands still, the workers perform their operations; then the conveyor moves, passing the result of each operation to the next performer.
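
In software the conveyor is often coded with queues between the stages (a sketch with invented stage functions; the strictly synchronous conveyor described above could equally be coded with a barrier, as in the previous sketch): each stage works on its own item and interacts with its neighbours only when handing a result on:

  # Sketch: a three-stage pipeline; stages interact only when passing results on.
  import threading, queue

  def stage(worker, inbox, outbox):
      while True:
          item = inbox.get()
          if item is None:            # end marker: shut this stage down and pass it on
              outbox.put(None)
              break
          outbox.put(worker(item))

  q1, q2, q3, out = (queue.Queue() for _ in range(4))
  stages = [
      threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2)),
      threading.Thread(target=stage, args=(lambda x: x * 2, q2, q3)),
      threading.Thread(target=stage, args=(lambda x: x - 3, q3, out)),
  ]
  for t in stages: t.start()
  for item in [1, 2, 3, 4]:
      q1.put(item)
  q1.put(None)
  results = []
  while (r := out.get()) is not None:
      results.append(r)
  print(results)                      # (x + 1) * 2 - 3 for x = 1..4 -> [1, 3, 5, 7]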

That was an informal introduction to the theoretical subtleties arising from the interaction of processes (whether really parallel or quasi-parallel). In practice, however, other details are often more important.

Good parallelization is impossible without a detailed analysis of the program. Indeed, if one of the processors carries 99% of the computational load, then parallelization only loses, since the overhead of process synchronization is added. So the first problem of practical parallelization is predicting the computational cost of the various blocks of the program. Parallelization gives a gain only if the computation can be distributed evenly among the processors, and this uniformity must be maintained within each interval between interactions of the processes. In parallel programming the author has seen cases when a seemingly innocuous improvement of one block, absolutely correct from the point of view of ordinary programming, degraded the work of the entire program. So optimization is not always monotonic with respect to the partitioning into subsystems.

Let us now consider one method of quasi-parallel programming that is often well suited for modeling the natural parallelism of automaton programming and the potential parallelism of event-driven programming. This is simulation in a system with logical discrete time.

The concept of time is one of the most important in many fields. Moreover, in the well-known American aphorism "Time is money" an absolute value is reduced to the level of a conditional one. We cannot receive time from anyone and cannot save it; we can only spend it.

When modeling the behavior of objects on a computer, it makes sense to distinguish two aspects:

  • the images of the acting objects;
  • the modeling of time.

If time is not modeled but taken from the real world, one speaks of real-time systems. If time is part of the virtual world of the system, it is logical time. Emphasizing its applied aspects, logical time is often called conditional, or model, time.

Consider the simplest (but exceptionally important) case, when time can be represented as a linearly ordered set of steps and all processes taking place between steps can be considered indivisible. Then on the time axis we have only a discrete set of points at which the integrity of the system and of each process must be maintained. Such a discipline is also possible for real-time systems, if some delay in responding to an event is acceptable (this case is called soft real time). Then one can wait for the natural completion of a step of our process and examine events only in the interval between steps.

If time is not only discrete but also logical, then we are speaking of a system with discrete events. This situation is well suited for modeling both pipelining and ∨-parallelism.

Example 15.3.2. Consider a model problem. Suppose we need to find the shortest path in a weighted directed graph, each arc of which is labeled with a number interpreted as its length. Traditionally this is interpreted as finding the shortest path between cities A and B connected by a network of one-way roads.

Let us show that a solution to the problem can be built as a system of interacting processes with discrete time. This is a good method of structuring the task and separating details from essential features: first we build an abstract representation, and then we make it concrete, for example by fitting an essentially parallel algorithm into the framework of a sequential program 3. The idea of this approach goes back to O.-J. Dahl and C. A. R. Hoare, who used this very problem to demonstrate the capabilities of the discrete-event system provided by the languages SIMULA 60 and SIMULA 67, which coincided in their time model and their structure of process control.

The basic concept in this problem is naturally ∨-parallelism. Finding the shortest distance can be represented as a competition of acting agents 4 "wandering off" along different roads. Two classes of algorithms are immediately distinguished: forward ones, in which the overall process starts from A, and backward ones, for which B is designated as the starting city. To show the points of principle it is enough to consider one forward and one backward algorithm.

The forward algorithm of wandering agents can be described as the behavior of each agent according to the following scheme, whose parameter is the agent's location:

  1. If the agent is standing in a city, then
    • If the agent's location is B, the goal has been reached. The traversed path is output as the result.
    • The agent checks whether the city is forbidden. If it is, the agent is eliminated (clearly, no information about the system as a whole is lost here - the other agents keep acting).
    • The city in which the agent stands is declared forbidden.
    • As many successors of the agent are spawned as there are roads leading out of the agent's current location. The path from A to the current location, remembered by the parent agent, is given to the new agents as their local data (it does not matter whether the parent agent is destroyed or becomes one of the successor instances). If there are no roads out of the current location, the agent is eliminated - it has reached a dead end.
  2. Transition to the next moment of time: each agent makes one step of movement along its current road. This can be interpreted as delaying the execution of the program by the time needed to move to the next point (the measure of time here is the distance between points).
  3. Go back to step 1.

Clearly, the algorithm terminates when a path from A to B has been found or when all agents have been eliminated.
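
A quasi-parallel sketch of this scheme in Python (the road graph is invented, and the agents are plain records rather than processes): at every moment of discrete time each agent makes one step, forbidden cities and dead ends eliminate agents, and arrival at B ends the run:

  # Sketch of the forward algorithm with discrete time steps (quasi-parallel).
  ROADS = {                                     # illustrative one-way roads with lengths
      "A": [("C", 2), ("D", 3)],
      "C": [("B", 4), ("D", 1)],
      "D": [("B", 2)],
      "B": [],
  }

  def forward_search(start="A", goal="B"):
      forbidden = set()
      # an agent: (remaining distance on its current road, destination city, path so far)
      agents = [(0, start, [start])]
      while agents:
          arrived = [a for a in agents if a[0] == 0]
          moving = [a for a in agents if a[0] > 0]
          for _, city, path in arrived:
              if city == goal:
                  return path                   # goal reached: output the traversed path
              if city in forbidden:
                  continue                      # this agent is eliminated, others keep acting
              forbidden.add(city)
              for nxt, dist in ROADS.get(city, []):   # one successor per outgoing road
                  moving.append((dist, nxt, path + [nxt]))
          # transition to the next moment: every agent makes one step along its road
          agents = [(d - 1, city, path) for d, city, path in moving]
      return None                               # all agents eliminated

  print(forward_search())                       # -> ['A', 'D', 'B'] (total length 5)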

As usual, stricter conditions on the final program entail, in many respects, milder requirements on the prototype (its possibilities for being reworked into different variants of the final program must be broadened), while at the same time it is desirable to raise the level of the prototype's concepts as far as possible. If in the end the program is implemented quasi-parallel, there is no need to pay attention to how many agents can act simultaneously. In particular, if a path from A to B is known to exist, the forbidden-city check may be omitted or weakened to the case when an ancestor of the given agent has visited the city, so that the city lies on the remembered path. In that case there is no need to store global information, or more precisely information common to all agents, about the graph of roads and cities.

The algorithm above needs to store information about the traversed path in the agents' local data. Thus the local data of every agent that will eventually be eliminated is stored in vain. But if this data is forgotten, the sought path cannot be obtained; only some of its characteristics can. The solution based on backward wandering, i.e. from B to A, is free of this drawback. For it everything works in exactly the same way, with the following exceptions:

  • local information about the agents and their paths is not stored;
  • the visit mark records the city from which the agent arrived in the given city (this can be called a recommendation mark).

As a result, when one of the agents reaches A, the sequence that starts at A and is built up by following the recommendation marks gives the sought path.

Fig. 15.2 illustrates the operation of all three algorithms: the forward one (a), the forward one with marks (b), and the backward one with recommendation marks (c). The labels on the arcs (roads) denote agents, and their superscripts give the order in which the agents were spawned. The locally stored information about visited cities is written between slashes. Crossed-out labels indicate the destruction of an agent as a result of using the information about city visits.


Fig. 15.2. The logic of the path-search algorithms

Both the forward and the backward algorithm are workable if the simultaneous operation of all agent processes can be ensured. If the computer allows multiprocessing with a potentially unlimited 5 number of processors, this condition is satisfied automatically.

If parallelism is possible but there are not enough processes (for example, if you decide to use the Prolog system Muse to implement these algorithms), then we find ourselves in a difficult situation that requires creative decisions and a combination of the scientific and the engineering approach 6.

The question arises of the means for emulating discrete events. There are packages for working with discrete events, but it was shown back in the 1960s that thinking in terms of time really requires a language of its own. The classical example here is still the language SIMULA 67. It explicitly proclaims the imitation of parallelism and consequently lets a person think in terms of acting agents, which appears to be a more flexible and natural way of expressing many algorithms. A system with discrete events can be regarded as a general method of overcoming the contradiction between parallelism in principle and the real sequentiality of computation.

The classical implementation of systems with discrete events is a common process-support environment. Each process can be in one of four states, which are defined with the help of a so-called control list: a strictly ordered queue of processes assigned for execution. In a good implementation of the classical model this list is hidden.

For each process a data structure is defined that is sufficient to store its state completely at the moment a discrete step is finished 7. The stored state is called a resumption point.

A process state is called:

  • active, when the process program is (really) being executed;
  • suspended, when execution of the process program is interrupted, but the resumption point is stored and the process is in the control list;
  • passive, when the process is not running and is not in the control list, but the resumption point is stored;
  • completed, when execution of its program is interrupted and the resumption point is not stored.

The discipline of the control list simulates time. The first process in the list is always active, and it is the only active process. If it interrupts its execution, the suspended process following it becomes active. A process can be inserted into the control list (before any process in the list or after it, or after a certain time) or removed from it. A process can also be scheduled for a specific time: it is inserted before the process whose scheduled time is minimally greater than that of the process being scheduled. A random (pseudo-random) insertion of a process into one place or another in the control list is also possible.

We may assume that the control list assigns relative priorities to the processes.

It is postulated that all operations on the control list and on the activity states of processes are the result of discrete events in the system. Until some event occurs, the state of a process and its position in the control list cannot change.

It can be seen that this is a simulation model that creates, for an external observer, a structure practically indistinguishable from real parallelism with an unlimited number of processors. Since we have a system with discrete events, logical time and time delays are only a means of restructuring the control list.
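
A minimal sketch of such a control list in Python (the names and the use of generators are illustrative): processes are generators that yield the delay before their next step, the scheduler keeps them ordered by model time, and the first process in the list is the single active one:

  # Sketch: a control list for discrete events; the head of the list is the active process.
  import bisect

  class ControlList:
      def __init__(self):
          self._items = []                       # (model time, seq, process), ordered by time
          self._seq = 0                          # tie-breaker keeps scheduling order stable

      def schedule(self, process, at_time):
          bisect.insort(self._items, (at_time, self._seq, process))
          self._seq += 1

      def run(self):
          while self._items:
              now, _, proc = self._items.pop(0)  # head of the list = active process
              try:
                  delay = next(proc)             # execute one discrete step of the process
              except StopIteration:
                  continue                       # the process completed: not re-inserted
              self.schedule(proc, now + delay)   # suspended: re-inserted after its delay

  def ticker(name, period, steps):
      for i in range(steps):
          print(f"t={i * period}: {name} step {i}")
          yield period                           # hold: wait `period` units of model time

  cl = ControlList()
  cl.schedule(ticker("fast", 1, 3), at_time=0)
  cl.schedule(ticker("slow", 2, 2), at_time=0)
  cl.run()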

As follows from the above, with respect to the problem being solved, the spawning of agents means:

  • creating the local data structures of the agent processes;
  • placing the agent processes in the control list;
  • moving each active agent process along the control list by the amount of its delay.

According to the event postulate, the execution of any agent process cannot change the planned order of execution of the other processes until an event occurs. In this case the events are the expiration of a time delay and the self-destruction of the active agent, as a result of which the next process in the control list becomes active.

For example, for the backward algorithm with recommendation marks, the scheme of the program of an agent that has come to a city is as follows:

  1. Wait for the time needed to move from the previous location to the given city (in fact this means only a corresponding rearrangement of oneself in the control list, followed by suspension of the process);
  2. If this city has already been visited (there is a recommendation mark in the city), then eliminate yourself, i.e. complete the process;
  3. Place a recommendation mark in this city, recording the location from which the agent came;
  4. If the given city is A, then build the path using the recommendation marks and complete the process;
  5. For each road leading into the given city, spawn a new agent process with the given city as its parameter. Schedule the activation of each new process immediately after the end of the parent process's activity;
  6. Complete the process.

Fig. 15.3. The control list

Fig. 15.3 depicts the beginning of the sequence of states of the control list obtained by executing the algorithm on the data shown in Fig. 15.2c. The upper part of the figure shows the time model: on the horizontal axis are the moments at which the states of the agents change (a function of the travel time between cities; a, b, c and d are moments referred to in the explanation below). The lower part of the figure shows the sequence of changes of the control list (its states are indicated by rectangular blocks in which, for clarity, vertical arrows link processes assigned to the same time, and horizontal arrows indicate the ordering of processes by model time).

The dispersal of agents begins with the spawning of processes A1 and A2, which correspond to the two roads leading to city B. This is the moment of model time marked a. Moment b corresponds to four states replacing one another, c corresponds to three, and d to one state. Comparison of Fig. 15.2 and Fig. 15.3 shows that no special modeling of time, let alone of delays, is required: time changes "instantly" once all events assigned to earlier moments have fired and an event assigned to a later moment starts. In principle, not even a time attribute is required here: the ordering relation between events is sufficient.

Even the looping of some agents will not interfere with the execution of the algorithm, provided the capacity of the control list is sufficient.
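
Putting the pieces together, here is a quasi-parallel sketch of the backward algorithm with recommendation marks (the road graph is invented, and a priority queue plays the role of the control list): each agent is scheduled at the model time of its arrival, leaves a mark naming the city it came from, and the first agent to reach A reconstructs the path by following the marks:

  # Sketch: backward wandering agents with recommendation marks, driven by
  # a control list modeled as a priority queue ordered by arrival time.
  import heapq
  from itertools import count

  ROADS = [("A", "C", 2), ("A", "D", 3),        # one-way roads: (from, to, length)
           ("C", "B", 4), ("C", "D", 1), ("D", "B", 2)]

  def backward_shortest_path(roads, start="A", goal="B"):
      incoming = {}                              # city -> [(source city, length), ...]
      for src, dst, length in roads:
          incoming.setdefault(dst, []).append((src, length))

      marks = {}                                 # recommendation marks: city -> came from
      seq = count()                              # tie-breaker for the control list
      control_list = [(0, next(seq), goal, None)]   # (arrival time, seq, city, came from)

      while control_list:
          now, _, city, came_from = heapq.heappop(control_list)   # the active agent
          if city in marks:
              continue                           # already visited: the agent eliminates itself
          marks[city] = came_from                # leave the recommendation mark
          if city == start:                      # an agent reached A: follow the marks back to B
              path, c = [], city
              while c is not None:
                  path.append(c)
                  c = marks[c]
              return now, path
          for src, length in incoming.get(city, []):   # spawn an agent per incoming road
              heapq.heappush(control_list, (now + length, next(seq), src, city))
      return None

  print(backward_shortest_path(ROADS))           # -> (5, ['A', 'D', 'B']) for this graph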

It is necessary to emphasize a number of important points:

  1. There is no real parallelism in a system with discrete events, but there is the effect of parallelism, free of many of its most unpleasant features, which demand special care about synchronization.
  2. At each moment the set of simultaneously acting agents is initially only partially ordered; it becomes totally ordered through its placement in the control list.
  3. The control list is a data structure common to the agent processes, but it cannot be considered global, since there is no explicit access to the control list.

The book [14] can be recommended as a good presentation of the current state of affairs in the technique of practical parallel programming. In particular, the MPI system described in that book, and the now very widely and unreasonably advertised OpenMP system, include means for quasi-parallel execution of parallel programs that differ from a system with discrete events. We do not describe these means, because for real programs they are needed only as a way of debugging parallel programs on a machine of insufficient configuration (and a way that does not inspire much enthusiasm).

